
[Bug 8026] New: t/extracttext.t tesseract test fails on some installations
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8026

Bug ID: 8026
Summary: t/extracttext.t tesseract test fails on some installations
Product: Spamassassin
Version: 4.0.0
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P2
Component: Regression Tests
Assignee: dev@spamassassin.apache.org
Reporter: sidney@sidney.com
Target Milestone: Undefined

On my copy of FreeBSD 13.1-RELEASE, running in a VirtualBox VM with tesseract
5.1.0 installed from FreeBSD's pkg repository, test t/extracttext.t
consistently fails because tesseract reads the "XJ" characters in the test JPEG
file as "X]J".
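To check whether a particular tesseract installation shows this misread, one could run the tesseract CLI on the test image and look for "XJ" in its output. The following is a minimal Python sketch under that assumption. The helper names `ocr_text` and `contains_token` are hypothetical, and the image path is supplied on the command line because the report does not name the file. t/extracttext.t itself is a Perl test and does not use this code.

```python
import shutil
import subprocess
import sys


def ocr_text(image_path: str) -> str:
    """Run the tesseract CLI on image_path and return the recognized text.

    Passing "stdout" as the output base makes tesseract print the result
    instead of writing it to a .txt file.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def contains_token(text: str, token: str) -> bool:
    """Return True if the expected token appears verbatim in the OCR output.

    A misread such as "X]J" for "XJ" makes this return False.
    """
    return token in text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_ocr.py path/to/test-image.jpg
    if shutil.which("tesseract") is None:
        sys.exit("tesseract not found on PATH")
    text = ocr_text(sys.argv[1])
    print("OK" if contains_token(text, "XJ") else f"MISREAD: {text!r}")
```

A plain substring check is deliberately strict: any extra character inserted between "X" and "J", as in the failure described above, counts as a mismatch.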

Recreating the test file with a font that is more tesseract-friendly seems to
help. Since the test is not meant to probe the limits of tesseract's OCR
capabilities, this seems like a proper fix. I've redone the test data using the
TeX Gyre Bonum font, based on the results in https://superuser.com/a/1543382

--
You are receiving this mail because:
You are the assignee for the bug.