
Indexing Adobe Captivate content?
Hello,

I'm quite new to Solr, but I'm wondering whether it can help me index
training content on my LMS (Moodle). The catch is that many of our
training modules are created in Adobe Captivate, which means they are
basically zip files with HTML inside (some use Flash, but we can limit them
to HTML5 if that makes things easier).

I've been reading up on whether (and how) one could index this sort of
content in Solr, and... I'm still a bit confused. It looks like Solr Cell
<https://wiki.apache.org/solr/ExtractingRequestHandler> *could* do this,
since it can handle zip-based formats like Word docs (a .docx file is itself
a zip archive). But there's a Jira issue from several years ago
<https://issues.apache.org/jira/browse/SOLR-2416> with a patch for indexing
zip files more generally, and it's still Open, so... I guess Word is a
special case?

Basically, does anyone have experience with this sort of thing and, if it's
possible, are there any examples or more specific instructions I should
look at?

Thanks!
--Brad