Mailing List Archive

Question Regarding Computing Vocabulary Size
Hello, I am a master's student currently working on a search engine project
that uses BM25 similarity. My question is about computing the vocabulary
size of a single document. I have looked through the code base but have not
found anything useful for this purpose. Is there a way to compute the number
of distinct terms in a single document? Please let me know if you can help
me with this. Many thanks.
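
P.S. To make the quantity concrete, here is a minimal sketch of what I mean
by the vocabulary size of one document. It is only an illustration: the
function name `vocabulary_size` and the regex-based tokenizer are my own
assumptions, not part of any library's API, and a real analyzer (stemming,
stop words) would likely give a different count.

```python
import re


def vocabulary_size(text: str) -> int:
    """Return the number of distinct terms in a single document."""
    # Hypothetical tokenizer: lowercase the text, then take runs of
    # word characters as terms. A real search engine analyzer may differ.
    terms = re.findall(r"\w+", text.lower())
    # A set keeps each term once, so its length is the vocabulary size.
    return len(set(terms))


doc = "the quick brown fox jumps over the lazy dog the end"
print(vocabulary_size(doc))  # 11 tokens, 9 distinct terms -> prints 9
```

What I am looking for is the equivalent of this count computed from the
index for a given document, rather than from the raw text.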