I am creating a custom Pattern Tokenizer to change the type of the
generated tokens. My incrementToken() function looks like the code below:
    public boolean incrementToken() {
        if (index >= str.length()) return false;
        clearAttributes();
        if (group >= 0) {
            // match a specific group
            while (matcher.find()) {
                index = matcher.start(group);
                final int endIndex = matcher.end(group);
                if (index == endIndex) continue;
                termAtt.setEmpty().append(str, index, endIndex);
                offsetAtt.setOffset(correctOffset(index), correctOffset(endIndex));
                // Changing Token Type based on the pattern matcher
                Pattern pattern = Pattern.compile("\\p{Alnum}+");
                Matcher matcher = pattern.matcher(input.toString());
                boolean matchFound = matcher.find();
                if (matchFound) {
                    typeAttribute.setType("some_random_type".toLowerCase());
                }
                return true;
            }
        }
    }
I'm trying to use typeAttribute to change the type of a generated token
whenever that token matches a particular regex. Here I am using the pattern
"\p{Alnum}+", so every alphanumeric token should get the new type.
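A minimal stand-alone sketch of the intended per-token behavior, using only java.util.regex instead of Lucene so it runs on its own: each token's type is decided from that token's own text, not from the whole input. The class and method names (TokenTypeSketch, typeFor, tokenize) are hypothetical, and the whitespace pattern `(\S+)` with group 1 is an assumption that only imitates a PatternTokenizer configured with a group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenTypeSketch {
    // Compiled once, instead of once per token inside the loop.
    private static final Pattern ALNUM = Pattern.compile("\\p{Alnum}+");

    public static final String DEFAULT_TYPE = "word";
    public static final String CUSTOM_TYPE = "some_random_type";

    /** Decides the type of ONE token from that token's own text. */
    public static String typeFor(CharSequence tokenText) {
        // matches() requires the whole token to be alphanumeric;
        // find() would also accept "foo-bar" because it contains "foo".
        return ALNUM.matcher(tokenText).matches() ? CUSTOM_TYPE : DEFAULT_TYPE;
    }

    /** Hypothetical stand-in for the group-matching loop: returns {term, type} pairs. */
    public static List<String[]> tokenize(String str, Pattern tokenPattern, int group) {
        List<String[]> tokens = new ArrayList<>();
        Matcher m = tokenPattern.matcher(str);
        while (m.find()) {
            int start = m.start(group);
            int end = m.end(group);
            if (start == end) continue; // skip empty matches, as the original loop does
            String term = str.substring(start, end);
            tokens.add(new String[] {term, typeFor(term)});
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern tokenPattern = Pattern.compile("(\\S+)");
        for (String[] t : tokenize("testing foo-bar 42", tokenPattern, 1)) {
            System.out.println(t[0] + " -> " + t[1]);
        }
        // prints:
        // testing -> some_random_type
        // foo-bar -> word
        // 42 -> some_random_type
    }
}
```

Inside the real incrementToken(), the equivalent would presumably be to call `typeFor(termAtt)` after the term has been appended (CharTermAttribute is a CharSequence) and pass the result to setType(). clearAttributes() already resets the type to the default "word", so only the matching case needs setType(). Note that \p{Alnum} is ASCII-only unless the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS.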
Currently, I am getting the token as:
"tokens" : [ { "token" : "testing", "start_offset" : 0, "end_offset" : 7,
"type" : "word", "position" : 0 } ]
I want the above token to be like:
"tokens" : [ { "token" : "testing", "start_offset" : 0, "end_offset" : 7,
"type" : "some_random_type", "position" : 0 } ]
Since the token matches the pattern "\p{Alnum}+", its type should be
changed to the one passed to typeAttribute.setType().
But my code gives every token the type "some_random_type": even tokens
that do not match the pattern "\p{Alnum}+" get that type.
How can I make only the tokens that match the pattern "\p{Alnum}+" get
the type "some_random_type"?
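A likely cause, hedged because only part of the tokenizer is shown: the check runs against input.toString() instead of the current token. In a Lucene Tokenizer, input is the Reader, and java.io.Reader does not override Object.toString(), so unless the concrete Reader does, the searched string is a class name followed by a hash code. That string always contains alphanumeric characters, so find() succeeds for every token. (The local `Matcher matcher` also shadows the matcher field used in the while condition; that is legal Java but easy to misread.) The stand-alone demo below reproduces the effect with a StringReader; ReaderToStringDemo, checkReader and checkToken are hypothetical names.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

public class ReaderToStringDemo {
    private static final Pattern ALNUM = Pattern.compile("\\p{Alnum}+");

    /** What the question's check effectively does: search the Reader's toString(). */
    public static boolean checkReader(Reader input) {
        return ALNUM.matcher(input.toString()).find();
    }

    /** What was intended: test the current token's own text. */
    public static boolean checkToken(String tokenText) {
        return ALNUM.matcher(tokenText).matches();
    }

    public static void main(String[] args) {
        Reader input = new StringReader("--- ???");
        // StringReader does not override toString(), so this prints something like
        // "java.io.StringReader@" plus a hash code, not the text "--- ???".
        System.out.println(input.toString());
        System.out.println("reader check: " + checkReader(input)); // true: "java" is alphanumeric
        System.out.println("token check:  " + checkToken("---"));  // false
    }
}
```

Because the reader-based check is true regardless of the text, every token receives the custom type; checking the token text itself restores the distinction.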