Just a curiosity...
It's my understanding that passing "optimize => 1" to finish() after
making a lot of changes will result in an index that's optimized for
speed. However, in addition to that, I'm finding that it's having an
effect on search results as well, albeit a positive one. My problem
is that some queries only work on an optimized index.
Consider a document with the following content:
"salad.robot mercenary"
Just random words that won't be gobbled up by the stop list. Consider
also that the tokenizing expression just looks for words. The content
would be split like: "salad|robot|mercenary".
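To make the tokenizing concrete, here is a minimal sketch in Python rather than KinoSearch's own code. It assumes the tokenizing expression is simply `\w+`; the `tokenize` helper is hypothetical and only illustrates the split described above.

```python
import re

# Assumption: the tokenizer keeps runs of word characters and treats
# everything else (periods, spaces, punctuation) as a token barrier.
TOKEN_RE = re.compile(r"\w+")

def tokenize(text):
    """Return the word tokens found in text, in order of appearance."""
    return TOKEN_RE.findall(text)

tokens = tokenize("salad.robot mercenary")
print("|".join(tokens))
```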
After adding this document to my index for the first time, I can find
it with any of the following queries:
salad
robot
mercenary
salad.robot
"robot mercenary"
"salad.robot mercenary"
If I then re-index the document without making any changes to the
content, essentially just remove it and add it, and then call the
non-optimizing finish(), all of the above queries continue to work
except for "salad.robot". That query does work if I optimize the
index after re-adding the document, however.
Perhaps I don't fully understand what KinoSearch is doing with that
query, but I suspect "salad.robot" is equivalent to asking for
"token salad followed by token robot". Indeed, I should be able to
replace the period with any other token barrier. For example, this
should work equally well:
salad!?!!!robot
and indeed it does, but only after optimizing the index.
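Here is a rough sketch, in Python rather than KinoSearch itself, of the behaviour I would expect. It assumes the query string goes through the same `\w+` tokenizer as the document and is then treated as "these tokens, consecutively". The `tokenize` and `phrase_matches` helpers are hypothetical and say nothing about how KinoSearch actually evaluates the query internally.

```python
import re

# Assumption: documents and queries share one word-only tokenizer.
TOKEN_RE = re.compile(r"\w+")

def tokenize(text):
    """Return the word tokens found in text, in order of appearance."""
    return TOKEN_RE.findall(text)

def phrase_matches(query, document):
    """True if the query's tokens occur consecutively in the document."""
    q = tokenize(query)
    d = tokenize(document)
    if not q:
        return False
    # Slide a window of len(q) tokens across the document's tokens.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

doc = "salad.robot mercenary"
for query in ["salad.robot", "salad!?!!!robot", "robot salad"]:
    print(query, phrase_matches(query, doc))
```

Under these assumptions, any run of non-word characters between "salad" and "robot" yields the same two tokens, so both forms of the query should behave identically whether or not the index has been optimized.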
Granted, this may seem like an odd sort of search to perform. If the
period were important to me, I could change the tokenizer so that it
includes it in the list of characters to keep, and I may end up doing
that anyway. I guess I'm just curious to know why that query only
works after using optimize.
I should point out that I'm using KinoSearch 0.15.