Mailing List Archive

Get distinct fields values from lucene index
Hello,

I am using Lucene in my organization and want to know how to get the distinct values of a field from a Lucene index. I tried the "GroupingSearch" API, but it doesn't serve the purpose: it returns all the documents that contain the distinct values. I used the code below.


final GroupingSearch groupingSearch = new GroupingSearch(groupField);

Sort sort = new Sort(new SortField(groupField, SortField.Type.STRING_VAL, false));
groupingSearch.setSortWithinGroup(sort);
Query query = new MatchAllDocsQuery();
TopGroups<BytesRef> topGroups = null;

try {
    topGroups = groupingSearch.search(searcher, query, 0, 10);
} catch (final IOException e) {
    System.out.println("Can't execute group search because of an IOException. " + e);
}

Re: Get distinct fields values from lucene index
In Solr and ES this is done with faceting and aggregations,
respectively, based on Lucene's low-level APIs. Have you looked at
TermsEnum? You can use that to get all distinct terms for a segment,
and then it is up to you to coalesce terms across segments ("leaves").
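
The coalescing step mentioned above can be sketched without Lucene. This is only a minimal illustration under stated assumptions: each inner list is a stand-in for the terms read from one segment, and the names DistinctTermsSketch and coalesce are made up for the example rather than taken from any library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class DistinctTermsSketch {

    // Merges per-segment term lists into one sorted set of distinct values.
    // Each inner list stands in for the terms read from one segment ("leaf").
    public static SortedSet<String> coalesce(List<List<String>> termsPerSegment) {
        SortedSet<String> distinct = new TreeSet<>();
        for (List<String> segmentTerms : termsPerSegment) {
            // The set silently ignores values it already holds.
            distinct.addAll(segmentTerms);
        }
        return distinct;
    }

    public static void main(String[] args) {
        // Two assumed segments that share some terms.
        List<List<String>> segments = Arrays.asList(
                Arrays.asList("berlin", "london", "pune"),
                Arrays.asList("london", "mumbai", "pune"));
        System.out.println(coalesce(segments)); // prints [berlin, london, mumbai, pune]
    }
}
```

A sorted set is used here because it removes duplicates and keeps the values ordered in one step. For a field with a very large number of distinct terms, holding them all in memory this way may not be practical.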

On Thu, Nov 21, 2019 at 1:15 AM Amol Suryawanshi
<amol.suryawanshi@qualitiasoft.com> wrote:
>
> Hello,
>
> I am using Lucene in my organization and want to know how to get the distinct values of a field from a Lucene index. I tried the “GroupingSearch” API, but it doesn’t serve the purpose: it returns all the documents that contain the distinct values. I used the code below.
>
>
> final GroupingSearch groupingSearch = new GroupingSearch(groupField);
>
> Sort sort = new Sort(new SortField(groupField, SortField.Type.STRING_VAL, false));
> groupingSearch.setSortWithinGroup(sort);
> Query query = new MatchAllDocsQuery();
> TopGroups<BytesRef> topGroups = null;
>
> try {
>     topGroups = groupingSearch.search(searcher, query, 0, 10);
> } catch (final IOException e) {
>     System.out.println("Can't execute group search because of an IOException. " + e);
> }
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: Get distinct fields values from lucene index
Hello Michael,

Thanks for the response.

I tried the approach you suggested (TermsEnum), but it is not working for me. I used the code below.


String field = "address";
try (IndexReader reader = Utils.getIndexReader(indexDirectoryPath)) {
    List<LeafReaderContext> leaves = reader.leaves();
    for (LeafReaderContext leaf : leaves) {
        Terms _terms = leaf.reader().terms(field);
        if (_terms == null) {
            continue;
        }

        TermsEnum termsEnum = _terms.iterator();
        System.out.println(termsEnum);
    }
} catch (IOException e) {
    e.printStackTrace();
}
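
For reference, printing a TermsEnum only prints the enumerator object. The terms themselves come out by calling next() repeatedly until it returns null; in Lucene, next() yields a BytesRef, which utf8ToString() converts to a String. The sketch below shows only that loop shape. TermCursor and cursorOver are hypothetical stand-ins so the example runs without an index; they are not Lucene classes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class TermIterationSketch {

    // Hypothetical stand-in for a per-segment term enumerator: next() returns
    // the next term, or null once the segment's terms are exhausted.
    interface TermCursor {
        String next();
    }

    // Builds a cursor over a fixed list so the sketch runs without an index.
    public static TermCursor cursorOver(List<String> terms) {
        Iterator<String> it = terms.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Steps through every cursor until next() returns null, collecting each
    // term into a sorted set so duplicates across segments collapse.
    public static SortedSet<String> collectDistinct(List<TermCursor> cursors) {
        SortedSet<String> distinct = new TreeSet<>();
        for (TermCursor cursor : cursors) {
            String term;
            while ((term = cursor.next()) != null) {
                distinct.add(term);
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        // Two assumed segments of an "address" field with one shared value.
        List<TermCursor> cursors = Arrays.asList(
                cursorOver(Arrays.asList("12 high st", "3 park ave")),
                cursorOver(Arrays.asList("3 park ave", "9 mill rd")));
        System.out.println(collectDistinct(cursors)); // prints [12 high st, 3 park ave, 9 mill rd]
    }
}
```

With a real index, the inner while loop would sit where the println call is in the snippet above, one pass per leaf.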



________________________________
From: Michael Sokolov <msokolov@gmail.com>
Sent: Friday, November 22, 2019 8:11:25 PM
To: java-user@lucene.apache.org <java-user@lucene.apache.org>
Subject: Re: Get distinct fields values from lucene index

In Solr and ES this is done with faceting and aggregations,
respectively, based on Lucene's low-level APIs. Have you looked at
TermsEnum? You can use that to get all distinct terms for a segment,
and then it is up to you to coalesce terms across segments ("leaves").

RE: Get distinct fields values from lucene index
Thanks Michael,

Appreciate your feedback. It's working for me now.


________________________________
From: Amol Suryawanshi <amol.suryawanshi@qualitiasoft.com>
Sent: Monday, November 25, 2019 7:05:36 PM
To: java-user@lucene.apache.org <java-user@lucene.apache.org>
Subject: RE: Get distinct fields values from lucene index

Hello Michael,

Thanks for the response.

I tried the approach you suggested (TermsEnum), but it is not working for me. I used the code below.


String field = "address";
try (IndexReader reader = Utils.getIndexReader(indexDirectoryPath)) {
    List<LeafReaderContext> leaves = reader.leaves();
    for (LeafReaderContext leaf : leaves) {
        Terms _terms = leaf.reader().terms(field);
        if (_terms == null) {
            continue;
        }

        TermsEnum termsEnum = _terms.iterator();
        System.out.println(termsEnum);
    }
} catch (IOException e) {
    e.printStackTrace();
}

