Mailing List Archive

Some questions about implementation of LevenshteinAutomata
Hi,
Recently,I have read the paper <<Fast String Correction with
Levenshtein-Automata>>. And then I tried to read its implementation of the
Lucene.But I found it's too difficult to figure it out.
I want to ask some questions about the implementation of
Levenshtein-Automata.

*1. What does state mean in LevenshteinAutomata?*
I found the comment of the state in Lev1ParametricDescription.java is like
below

> // state map
> // 0 -> [(0, 0)]
> // 1 -> [(0, 1)]
> // 2 -> [(0, 1), (1, 1)]
> // 3 -> [(0, 1), (1, 1), (2, 1)]
> // 4 -> [(0, 1), (2, 1)]
>
> Does state mean A^i, B^i .... in the paper?
eg.
0 -> A , 1 -> B, 2 -> C.
[image: ??2022-07-07 12.20.02.png]
If so, what does [(0, 0)] [(0, 1), (1, 1), (2, 1)] in the comments mean?

*2. Why can minErrors have a negative number?*

> Lev1ParametricDescription minErrors is [0, 1, 0, -1, -1]
>
> minErrors is used to calculate if a state is final or not.
the code is

boolean isAccept(int absState) {
// decode absState -> state, offset
int state = absState / (w + 1);
int offset = absState % (w + 1);
assert offset >= 0;
return w - offset + minErrors[state] <= n;
}

In the paper the final definition is wordlen - offset <=
max-edit-operations - current-used-edit-operations

So minErrrors should be the min current-used-edit-operations? But why
could it be negative?

*Final Question*
Any good resources could help me better understand the implementation?