Mailing List Archive

Searching Lucene FAQ with Lucene
Hi

I am working on a web app called "Katie" to detect duplicate
questions:

https://ukatie.com/

As a test case I have imported the Lucene FAQ

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ

to

https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en

and made them available at

https://lucene-faq.ukatie.com/

where the FAQs are loaded as JSON from the REST interface of Katie

https://ukatie.com/swagger-ui/?urls.primaryName=API%20V2#/faq-controller-v-2/getFAQUsingGET_1

and the JavaScript can be found at

https://github.com/wyona/katie-4-faq

I am currently "experimenting" with different search algorithms, e.g.

- Lucene only
- SentenceBERT + Lucene Vector Search
- SentenceBERT only
- Weaviate
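The embedding-based variants above (SentenceBERT with or without Lucene vector search) boil down to comparing a query vector against the stored question vectors. Below is a minimal brute-force sketch in plain Java, assuming the embeddings have already been computed by some model; `VectorFaqSearch` is a hypothetical name and not part of Katie or Lucene. Lucene's vector search replaces the linear scan with an approximate nearest-neighbor (HNSW) index.

```java
import java.util.List;

/** Hypothetical helper: brute-force cosine-similarity search over precomputed question embeddings. */
final class VectorFaqSearch {

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // undefined for zero vectors; treat as "no similarity"
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the index of the stored question most similar to the query, or -1 if none are stored. */
    static int nearest(float[] query, List<float[]> questions) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < questions.size(); i++) {
            double s = cosine(query, questions.get(i));
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }
}
```

Note that `nearest` returns some index whenever at least one question is stored, so a pure nearest-neighbor search will always "match something".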

The goal is to find the right answer for "similar" questions, e.g.

- "Are there mailing lists?"
- "How can I ask questions re Lucene?"

regardless of whether the question was trained/indexed or not, and
regardless of whether the answer contains keywords of the question.

The answer in this particular case is

https://ukatie.com/#/read-answer?domain-id=9f206aec-5223-4e03-a2fc-c16e4b885ef8&uuid=e19b6f48-62ac-427a-9d5e-d4e4eb110769

and another meaningful answer could be

https://ukatie.com/#/read-answer?domain-id=9f206aec-5223-4e03-a2fc-c16e4b885ef8&uuid=154d9aa7-29e6-457e-a2ad-315b1a67599f

There is still a lot to be improved :-) but it is a lot of fun to use
Lucene for this!

Any feedback is very welcome, and please let me know if you want to know
more about the implementation details.

Thanks

Michael



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Searching Lucene FAQ with Lucene
Interesting. It always matches *something*, I guess? It might be
helpful to show not only the answer but also the question that was
matched.

On Mon, Dec 20, 2021 at 5:05 AM Michael Wechner
<michael.wechner@wyona.com> wrote:

Re: Searching Lucene FAQ with Lucene
On 21.12.21 at 18:49, Michael Sokolov wrote:
> interesting -- it always matches *something* I guess?

Yes, but this is something I would like to improve: it should know when
it does not know :-)

I understand that Lucene provides a score, but simply defining a threshold
doesn't really solve the problem, or am I misunderstanding something?

It seems to me one has to implement some kind of "understanding /
reasoning" in order to solve this. Or what would be your approach?
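One common heuristic, sketched here as an assumption rather than as Katie's actual behavior, is to work with normalized scores (e.g. cosine similarity of embeddings; raw Lucene/BM25 scores are generally not comparable across queries) and to answer only when the best match is both above an absolute floor and clearly ahead of the runner-up. `AnswerGate`, `minScore` and `minMargin` are hypothetical names, and their values would have to be tuned on held-out questions.

```java
/** Hypothetical "know when you don't know" gate over similarity scores of candidate FAQ questions. */
final class AnswerGate {
    private final double minScore;  // absolute floor the best match must reach
    private final double minMargin; // required lead of the best match over the runner-up

    AnswerGate(double minScore, double minMargin) {
        this.minScore = minScore;
        this.minMargin = minMargin;
    }

    /**
     * Returns the index of the accepted candidate question, or -1 for "no confident answer".
     * scores[i] is the normalized (e.g. cosine) similarity of the query to candidate question i.
     */
    int accept(double[] scores) {
        if (scores.length == 0) {
            return -1;
        }
        // Find the best-scoring candidate.
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        // Find the runner-up score (any candidate other than the best).
        double runnerUp = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (i != best && scores[i] > runnerUp) {
                runnerUp = scores[i];
            }
        }
        boolean confident = scores[best] >= minScore
                && (scores.length == 1 || scores[best] - runnerUp >= minMargin);
        return confident ? best : -1;
    }
}
```

Returning the index of the matched question, rather than only the answer, also lets a client display which question was matched. A gate like this only mitigates the problem: paraphrases that embed poorly will still be rejected, and confidently wrong matches will still pass.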

> It might be
> helpful to show not only the answer, but also the question that was
> matched?

Yes, definitely. The Katie frontend already provides this
functionality

https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en

but I have to enhance the JavaScript client used at

https://lucene-faq.ukatie.com/

Thanks

Michael
