Thank you for taking the time to write such a detailed answer, Avi. And
apologies for not providing more info from the get go.
What I am trying to achieve here is supporting autocomplete (no pun
intended) in a web form field, hence the -i case insensitive example in
my initial question.
Your points are all good, and my original question was a bit rushed. I
guess that the problem was that I saw this video: https://www.youtube.com/watch?v=gRvZbYtwTeo&ab_channel=NextDayVideo
The idea that someone types into an input field and matches start
dancing in the browser made me think that this was exactly what I
needed, and hence I figured that asking here about Whoosh would be a
good idea. I now realize that Whoosh would be overkill for my use case,
as a simple (case-insensitive) substring query would get me 90% of what
I want. Speed is on the order of a few milliseconds out of the box,
which is chump change in the context of a web UI.
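A minimal sketch of that kind of case-insensitive substring lookup, assuming
the lines of all_cars_unique.csv have already been read into a list. The
function name, the limit parameter and the sample rows are made up for
illustration:

```python
def suggest(query, entries, limit=10):
    """Return up to `limit` entries that contain `query`, ignoring case."""
    needle = query.casefold()
    if not needle:
        # An empty input field should not match every entry.
        return []
    matches = []
    for entry in entries:
        if needle in entry.casefold():
            matches.append(entry)
            if len(matches) == limit:
                break
    return matches


# Hypothetical sample rows; the real list would come from all_cars_unique.csv.
cars = [
    "Volvo,V60",
    "Volvo,V60 Cross Country",
    "Dodge,Durango",
    "Toyota,Land Cruiser",
]
print(suggest("v60", cars))
```

Capping the number of results keeps the response small for short inputs such
as a single letter, which would otherwise match a large share of the list.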
Thank you again for taking the time to look at my question.
On 3/5/2023 10:56 PM, firstname.lastname@example.org wrote:
> Dino,
> Sending lots of data to an archived forum is not a great idea. I
> snipped most of it out below so as not to replicate it.
> Your question does not look difficult unless your real question is about
> speed. Realistically, much of the time spent generally is in reading in a
> file and the actual search can be quite rapid with a wide range of methods.
> The data looks boring enough and seems to not have much structure other than
> one comma possibly separating two fields. Do you want the data as one wide
> field or perhaps in two parts, which a CSV file is normally used to
> represent? Do you ever have questions like tell me all cars whose name
> begins with the letter D and has a V6 engine? If so, you may want more than
> a vanilla search.
> What exactly do you want to search for? Is it a set of built-in searches or
> something the user types in?
> The data seems to be sorted by the first field and then by the second and I
> did not check if some searches might be ambiguous. Can there be many entries
> containing III? Yep. Can the same words like Cruiser or Hybrid appear?
> So is this a one-time search or multiple searches once loaded as in a
> service that stays resident and fields requests? The latter may be worth
> speeding up.
> I don't NEED to know any of this but want you to know that the answer may
> depend on this and similar factors. We had a long discussion lately on
> whether to search using regular expressions or string methods. If your data
> is meant to be used once, you may not even need to read the file into
> memory, but read something like a line at a time and test it. Or, if you end
> up with more data like how many cylinders a car has, it may be time to read
> it in not just to a list of lines or such data structures, but get
> numpy/pandas involved and use their many search methods in something like a
> Of course if you are worried about portability, keep using Global Regular
> Expression Print.
> Your example was:
> $ grep -i v60 all_cars_unique.csv
> You seem to have wanted case folding and that is NOT a normal search. And
> your search is matching anything on any line. If you wanted only a complete
> field, such as all text after a comma to the end of the line, you could use
> grep specifications to say that.
> But once inside python, you would need to make choices depending on what
> kind of searches you want to allow but also things like do you want all
> matching lines shown if you search for say "a" ...