Dear all,
I am currently indexing text fields that contain XML text. The Solr input may therefore look like this:
<field name="tagged_text"><sec sec-type="Introduction" id="SECID0E4F">
<title>Introduction</title>
</sec>
</field>
with all “<” and “>” inside the field escaped.
I wrote a tokenizer that indexes the tag attributes (e.g. sec-type="Introduction") at the position of the tagged word (“Introduction” in this case), so I need the tags at index time. However, I want to strip the tags from the stored string that is shown to the user in query results. So far, I have figured out that the index and the stored string are separate. I therefore thought it should be possible to manipulate the stored string after indexing.
Is there a way to do so? I would prefer to manipulate the stored string rather than introduce a second field with the plain text in the input file.
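To make the goal concrete, here is a minimal Python sketch of the transformation I would like to see applied to the stored value. It runs outside of Solr and is only an illustration: strip_tags and TAG_PATTERN are hypothetical names, and the simple regular expression assumes well-formed tags without ">" inside attribute values.

```python
import html
import re

# Hypothetical helper, not part of Solr: it only illustrates the
# stored value I would like users to see.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(escaped_value: str) -> str:
    """Unescape the field value, drop the tags, and collapse whitespace."""
    unescaped = html.unescape(escaped_value)
    without_tags = TAG_PATTERN.sub(" ", unescaped)
    return " ".join(without_tags.split())


# The field content from the example above, with "<" and ">" escaped.
escaped = (
    '&lt;sec sec-type="Introduction" id="SECID0E4F"&gt;\n'
    "&lt;title&gt;Introduction&lt;/title&gt;\n"
    "&lt;/sec&gt;"
)

print(strip_tags(escaped))  # prints "Introduction"
```

The index should still receive the tagged text; only the stored string should look like the output of this sketch.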
I am glad for any help!
Best Regards,
Adrian
-------------------------------------------------------
Adrian Pachzelt
- Fachinformationsdienst Biodiversitaetsforschung -
- Hosting von Open Access-Zeitschriften -
Universitaetsbibliothek Johann Christian Senckenberg
Bockenheimer Landstr. 134-138
60325 Frankfurt am Main
Tel. 069/798-39382
a.pachzelt@ub.uni-frankfurt.de
-------------------------------------------------------