Ticket #958 (new Feature Request)
Opened 8 years ago
Apache Tika integration as it was working in dm3
Reported by: | Malte | Owned by: | |
---|---|---|---|
Priority: | Major | Milestone: | Release 4.8 |
Component: | DeepaMehta Standard Distribution | Version: | 4.7 |
Keywords: | | Cc: | jri, JuergeN |
Complexity: | 8 | Area: | |
Module: | | | |
Description
In connection with improving access to file topics through DeepaMehta (#957), the following came to mind. In DeepaMehta 3, documents in the file repo were indexed and searchable through an Apache Tika integration. I found this a very valuable aid when doing standard "office work" with DeepaMehta, and also when researching scientific papers I had archived.
As a first step, support for PDF and other popular office formats would be nice.
Would Apache Tika still be a good choice/fit for realizing such a feature?
I realize that the new "file repo" concept is probably hard to align with this goal, as DM 4 would need to maintain an index of files on the hard disk before they become file topics (first-class DM objects).
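To make the indexing concern more concrete, here is a minimal sketch of an index kept over repository files independently of file topics. It is not DeepaMehta code: `FileRepoIndex`, `TextExtractor`, and `PLAIN_TEXT` are hypothetical names, and it uses only the Java standard library. The stand-in extractor reads plain UTF-8 text. A Tika-backed extractor for PDF and office formats is assumed to plug in behind the same interface.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: a word index over repository files that are not (yet) file topics. */
class FileRepoIndex {

    /** Hypothetical extension point; a Tika-backed implementation is assumed to go here. */
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    /** Stand-in extractor that only handles plain UTF-8 text files. */
    static final TextExtractor PLAIN_TEXT =
        file -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8);

    // Maps a lower-cased word to the files that contain it.
    private final Map<String, Set<Path>> index = new HashMap<>();
    private final TextExtractor extractor;

    FileRepoIndex(TextExtractor extractor) {
        this.extractor = extractor;
    }

    /** Walks the repository directory and indexes every regular file found. */
    void indexDirectory(Path repoRoot) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(repoRoot)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            indexFile(file);
        }
    }

    /** Extracts the text of one file and records each word it contains. */
    void indexFile(Path file) throws IOException {
        String text = extractor.extract(file);
        // Split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(file);
            }
        }
    }

    /** Returns the files containing the given word (case-insensitive). */
    Set<Path> search(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

With this sketch, `new FileRepoIndex(FileRepoIndex.PLAIN_TEXT).indexDirectory(repoRoot)` would build the index, and `search("tika")` would return the matching paths. Keeping the index in sync when files change on disk is left open here, which is exactly the difficulty described above.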