Mailing List Archive

Re: Fast full-text searching in Python (job for Whoosh?)
On 5/03/23 5:12 pm, Dino wrote:
> I can do a substring search in a list of 30k elements in less than 2ms
> with Python. Is my reasoning sound?

I just did a similar test with your actual data and got
about the same result. If that's fast enough for you,
then you don't need to do anything fancy.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/5/2023 1:19 AM, Greg Ewing wrote:
> I just did a similar test with your actual data and got
> about the same result. If that's fast enough for you,
> then you don't need to do anything fancy.

Thank you, Greg. That is, in fact, what I am going to do.

Re: RE: Fast full-text searching in Python (job for Whoosh?)
Thank you for taking the time to write such a detailed answer, Avi. And
apologies for not providing more info from the get go.

What I am trying to achieve here is supporting autocomplete (no pun
intended) in a web form field, hence the case-insensitive (-i) example
in my initial question.

Your points are all good, and my original question was a bit rushed. I
guess that the problem was that I saw this video:

https://www.youtube.com/watch?v=gRvZbYtwTeo&ab_channel=NextDayVideo

The idea that someone types into an input field and matches start
dancing in the browser made me think that this was exactly what I
needed, and hence I figured that asking here about Whoosh would be a
good idea. I now realize that Whoosh would be overkill for my use case,
as a simple case-insensitive substring query would get me 90% of what
I want. Speed is on the order of a few milliseconds out of the box,
which is chump change in the context of a web UI.
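
A minimal sketch of the kind of case-insensitive substring lookup
described above. The function name suggest, the limit parameter, and the
sample list are illustrative assumptions, not code from this thread; only
the two V60 rows come from the grep example quoted below.

```python
def suggest(query, entries, limit=10):
    """Return up to `limit` entries that contain `query`, ignoring case."""
    needle = query.casefold()
    if not needle:
        # An empty query would match every entry, so return nothing.
        return []
    matches = []
    for entry in entries:
        if needle in entry.casefold():
            matches.append(entry)
            if len(matches) == limit:
                break
    return matches


# Hypothetical sample data; the real list has about 30k entries.
cars = ["Genesis,GV60", "Volvo,V60", "Volvo,V90", "Toyota,Land Cruiser"]

print(suggest("v60", cars))  # ['Genesis,GV60', 'Volvo,V60']
```

If the list never changes between requests, the casefolded strings could
be computed once at load time rather than on every query.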

Thank you again for taking the time to look at my question.

Dino

On 3/5/2023 10:56 PM, avi.e.gross@gmail.com wrote:
> Dino, sending lots of data to an archived forum is not a great idea. I
> snipped most of it out below so as not to replicate it.
>
> Your question does not look difficult unless your real question is about
> speed. Realistically, most of the time is generally spent reading in a
> file, and the actual search can be quite rapid with a wide range of methods.
>
> The data looks boring enough and seems not to have much structure other than
> one comma possibly separating two fields. Do you want the data as one wide
> field or perhaps in two parts, which is what a CSV file is normally used to
> represent? Do you ever have questions like "tell me all cars whose name
> begins with the letter D and that have a V6 engine"? If so, you may want more
> than a vanilla search.
>
> What exactly do you want to search for? Is it a set of built-in searches or
> something the user types in?
>
> The data seems to be sorted by the first field and then by the second and I
> did not check if some searches might be ambiguous. Can there be many entries
> containing III? Yep. Can the same words like Cruiser or Hybrid appear?
>
> So is this a one-time search, or multiple searches once loaded, as in a
> service that stays resident and fields requests? The latter may be worth
> speeding up.
>
> I don't NEED to know any of this but want you to know that the answer may
> depend on this and similar factors. We had a long discussion lately on
> whether to search using regular expressions or string methods. If your data
> is meant to be used once, you may not even need to read the file into
> memory, but can read something like a line at a time and test it. Or, if you
> end up with more data, like how many cylinders a car has, it may be time to
> read it in not just to a list of lines or a similar data structure, but to get
> numpy/pandas involved and use their many search methods on something like a
> DataFrame.
>
> Of course if you are worried about portability, keep using Get Regular
> Expression Print.
>
> Your example was:
>
> $ grep -i v60 all_cars_unique.csv
> Genesis,GV60
> Volvo,V60
>
> You seem to have wanted case folding and that is NOT a normal search. And
> your search is matching anything on any line. If you wanted only a complete
> field, such as all text after a comma to the end of the line, you could use
> grep specifications to say that.
>
> But once inside python, you would need to make choices depending on what
> kind of searches you want to allow but also things like do you want all
> matching lines shown if you search for say "a" ...
>
>
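
Two of the options mentioned in the quoted text, testing one line at a
time without loading the whole file, and matching only a complete field,
might be sketched roughly as follows. The name search_lines and the
whole_field flag are hypothetical, and the sample rows are the two lines
from the grep output above.

```python
import csv
import io


def search_lines(lines, query, whole_field=False):
    """Yield matching lines from an iterable of text lines, ignoring case.

    With whole_field=False the query may appear anywhere on the line,
    much like `grep -i`. With whole_field=True it must equal one
    complete comma-separated field.
    """
    needle = query.casefold()
    for line in lines:
        text = line.rstrip("\r\n")
        if whole_field:
            # Parse just this line so quoted CSV fields are handled.
            fields = next(csv.reader([text]), [])
            hit = any(field.casefold() == needle for field in fields)
        else:
            hit = needle in text.casefold()
        if hit:
            yield text


# Stand-in for an open file; a real file object can be passed the same way.
sample = io.StringIO("Genesis,GV60\nVolvo,V60\n")
print(list(search_lines(sample, "v60")))  # ['Genesis,GV60', 'Volvo,V60']

sample.seek(0)
print(list(search_lines(sample, "v60", whole_field=True)))  # ['Volvo,V60']
```

Because the function is a generator, only one line is held in memory at a
time, which suits a one-off search. A resident service that answers many
queries would more likely load the data once and search the in-memory list.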
Re: Fast full-text searching in Python (job for Whoosh?)
On Mon, 6 Mar 2023 07:40:29 -0500, Dino wrote:

> The idea that someone types into an input field and matches start
> dancing in the browser made me think that this was exactly what I
> needed, and hence I figured that asking here about Whoosh would be a
> good idea. I now realize that Whoosh would be overkill for my use case,
> as a simple case-insensitive substring query would get me 90% of what
> I want. Speed is on the order of a few milliseconds out of the box,
> which is chump change in the context of a web UI.

For a web application, the round trips to the server for the next set of
suggestions swamp the actual lookups. Use the developer console in your
browser to look at the network traffic and you'll see how busy it is.