Ticket #958 (new Feature Request)

Opened 5 years ago

Apache Tika integration as it worked in DM3

Reported by: Malte Owned by:
Milestone: Release 4.8
Version: 4.7
Keywords: Cc: jri, JuergeN
Complexity: 8 Area:


While thinking about improving access to file topics through DeepaMehta (#957), the following came to mind. In DeepaMehta 3, documents in the file repo were indexed and made searchable through an Apache Tika integration. I found this a very valuable aid, both for standard "office work" with DeepaMehta and for researching scientific papers I had archived.

As a first step, support for PDF and other popular office formats would be nice.
Would Apache Tika still be a good fit for realizing such a feature?

I realize that the new "file repo" concept is probably hard to align with this goal, as DM 4 would need to maintain an index of files on the hard disk before they become file topics (first-class DM objects).

