The new safebrowsing CVD (starting with version 48473) seems to be sorted
in a way that increases the load time of that file by roughly an order of
magnitude.
I have a previous version from February where the entries in the gdb
section are sorted like this:
S2:F:0000917787cff7b0993917209809ff3d94bec7e1de7188b323d9b88e0273cb71
S2:F:000149794d90dc5bce4f685deed6076d00c9209bd81cef4cbdf8a4e41f0a2153
S2:F:00042c895c912fd567afa35450cfe5d321d0d68eb3833156925c4e27d2c29aa2
S2:F:0006d4dcb0d939d725e676a9e68aaeb303e04478e6861d2a77469d1b6a0a0f7d
S2:F:0007bf7c1808d12177f0ae90d336d60c5a7a3d89703806955b75c56f898dd919
...
S2:P:00009177
S2:P:00014979
S2:P:00042c89
S2:P:0006d4dc
S2:P:0007bf7c
...
S:F:00000860493997b798861956e06d3d3606f82384259b971bb922f94f886a4b55
S:F:00000bddafae162a7a2f1249b3b38c8e4b6d3cb8bf0c30c26cc354ebcba16b37
S:F:000046cad35fbecbcc8dd4ebb244bd08aa6dbf1078279115c82f8e21b2cf8478
S:F:0000684200da7b11f38a6f4719bda4ec6c6ae8b2be1f7e12a16605b2d3a5d490
S:F:000072f3f33e47a2f97b8711d240267462aa3f0a5f8130845b119a2ad3798292
...
S:P:00000860
S:P:00000bdd
S:P:000046ca
S:P:00006842
S:P:000072f3
That loads into clamd (and clamscan) in under 5 seconds for the 3041760
entries in it.
Versions 48473 and 48474 are sorted like this:
S2:P:00009177
S2:F:0000917787cff7b0993917209809ff3d94bec7e1de7188b323d9b88e0273cb71
S2:P:00014979
S2:F:000149794d90dc5bce4f685deed6076d00c9209bd81cef4cbdf8a4e41f0a2153
...
That version loads in 50+ seconds for the 3229612 entries in it.
If I flip the order of the entries so that each :F: entry comes before its
corresponding :P: entry, it loads the same number of entries in 5 - 10
seconds.
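
For anyone who wants to repeat the pairwise flip, here is a minimal sketch.
It assumes the gdb section has already been extracted and read into a list
of lines, and that each line has the three colon-separated fields shown
above. The function names (is_prefix_pair, flip_pairs) are made up for this
example and are not part of ClamAV.

```python
def is_prefix_pair(p_line, f_line):
    """True if p_line is a :P: entry and f_line is the matching :F: entry.

    "Matching" here means: same section (S or S2) and the :F: hash starts
    with the :P: prefix. This is an assumption based on the sample lines.
    """
    p = p_line.split(":")
    f = f_line.split(":")
    return (
        len(p) == 3
        and len(f) == 3
        and p[1] == "P"
        and f[1] == "F"
        and p[0] == f[0]
        and f[2].startswith(p[2])
    )


def flip_pairs(lines):
    """Swap every :P: line with the matching :F: line directly after it.

    Lines that are not part of such a pair are copied through unchanged.
    """
    out = []
    i = 0
    while i < len(lines):
        if i + 1 < len(lines) and is_prefix_pair(lines[i], lines[i + 1]):
            out.append(lines[i + 1])  # :F: entry first
            out.append(lines[i])      # then its :P: prefix
            i += 2
        else:
            out.append(lines[i])
            i += 1
    return out
```

The sketch only touches adjacent pairs, so the overall interleaving of the
file stays the same.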
If I reorder the entire file so that _all_ the :F: entries for each section
(S or S2) come before the :P: entries for that section, it loads in under 5
seconds again.
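
The full reorder can be sketched the same way. Again this assumes the gdb
lines are already in a list and have the section:type:hash layout shown
above; the function name group_full_before_prefix is hypothetical. Lines
that do not match that layout are kept at the front, which is a choice made
for this sketch and not something the database format is known to require.

```python
def group_full_before_prefix(lines):
    """Reorder so that, per section, all :F: entries precede all :P: entries.

    Sections are emitted in the order they first appear in the input, and
    the relative order of entries inside each group is preserved.
    """
    sections = {}      # section name -> {"F": [...], "P": [...]}
    unrecognised = []  # anything that is not section:F|P:hash

    for line in lines:
        parts = line.split(":")
        if len(parts) == 3 and parts[1] in ("F", "P"):
            groups = sections.setdefault(parts[0], {"F": [], "P": []})
            groups[parts[1]].append(line)
        else:
            unrecognised.append(line)

    out = list(unrecognised)
    for groups in sections.values():  # dicts keep insertion order (3.7+)
        out.extend(groups["F"])
        out.extend(groups["P"])
    return out
```

Applied to the interleaved layout of 48473, this produces the same kind of
layout as the February file: a block of :F: lines followed by a block of
:P: lines for each section.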
Earlier today it was mentioned that 'the next version of the CVD' would fix
it (when 48473 was the current version). That does not seem to have been the
case, since 48474 didn't fix it. Is there a plan to fix it? Or will we have
to live with the enormous load times for this database?
--Maarten