Ticket #957 (new Feature Request)

Opened 4 years ago

Last modified 4 years ago

Enable by text search in file contents across a file repository

Reported by: Malte
Owned by:
Priority: Major
Milestone:
Component: DeepaMehta Standard Distribution
Version:
Keywords:
Cc: jri, JuergeN, dgf
Complexity: 8
Area: Application Framework / API
Module: deepamehta-filemanager

Description (last modified by Malte)

When working with file repositories in deepamehta, it would be very useful to be able to query the contents of all documents in the respective file repository. This was possible in deepamehta3 with its couchdb storage layer (which, I guess, was built on Apache Tika).

The Apache Tika Project Page is at:
https://tika.apache.org/

Java Example integrating Apache Tika with Lucene:
https://github.com/rickcrawford/lucene-example

Change History

comment:1 follow-up: ↓ 4 Changed 4 years ago by jri

Duplicate of #861?

comment:2 Changed 4 years ago by Malte

  • Description modified
  • Cc dgf added
  • Summary changed from Enable file name search across a file repository to Enable by text search in file contents across a file repository
  • Complexity changed from 3 to 8
  • Version 4.7 deleted
  • Type changed from Task to Feature Request

Thanks for the information; this is indeed a duplicate. I will update the ticket description so that this ticket becomes the home for another enhancement (one that thrilled me when I saw and used it in deepamehta3).

comment:3 follow-up: ↓ 5 Changed 4 years ago by Malte

I wanted to post some questions regarding this issue, since a friend and I are considering developing this feature for a group of knowledge workers. The group first needs a search across an existing intranet filesystem (while continuing to use it) but, in the end, they want to move their written data from office documents into a structured database.

Can we as plugin developers currently

  1. leverage the dm4-storage layer so that we can make use of dm4's existing Lucene storage?
  2. To index the file contents, does it make sense to create and maintain a separate index (like the spatial layer in Neo4j used by the dm4-geospatial module), and could we implement this right away?
  3. Can we hook our search into the "By Text" search of the standard distro's webclient?
  4. How would a smart integration of "file topics" and Lucene-indexed "file contents" look, if you were to envision it?

For now I just wanted to inform myself about the task and the challenges ahead. Thanks for your help.

Nice greetings.

comment:4 in reply to: ↑ 1 Changed 4 years ago by jri

Replying to jri:

Duplicate of #861?

OK, this ticket (#957) is now about file content search.
#861 is about file name search.

comment:5 in reply to: ↑ 3 Changed 4 years ago by jri

Replying to Malte:

Can we as plugin developers currently ...

A plugin can get direct access to the underlying Neo4j database this way:

import org.neo4j.graphdb.GraphDatabaseService;

GraphDatabaseService neo4j = (GraphDatabaseService) dm4.getDatabaseVendorObject();

This gives you access to the Neo4j instance as run by DM Core. From there you can do anything Neo4j offers, e.g. create indexes.

You can also access the Neo4j Node underlying any DeepaMehtaObject:

import org.neo4j.graphdb.Node;

Node neo4jNode = (Node) anyDMObject.getDatabaseVendorObject();

These are the facilities e.g. the dm4-geospatial module uses to put the Neo4j Spatial library in charge.

To keep tasks separate, I would start by implementing a new Webclient search mode, as e.g. dm4-typesearch does: dm4-filesearch.
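
To make the idea of a separate content index (question 2 above) a bit more concrete, here is a minimal sketch in plain Java. It is not DeepaMehta, Neo4j, or Lucene API: the class FileContentIndex and its methods add and search are hypothetical names chosen for illustration, and the sketch assumes the text has already been extracted from the file (e.g. by Apache Tika). It only shows the mapping a dm4-filesearch module would need to maintain, from words to the IDs of the file topics whose contents contain them.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory content index: lower-cased word -> IDs of file topics containing it. */
class FileContentIndex {

    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Indexes already-extracted plain text under the ID of the corresponding file topic. */
    void add(long fileTopicId, String extractedText) {
        for (String token : tokenize(extractedText)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(fileTopicId);
        }
    }

    /** Returns the IDs of all file topics whose text contains every word of the query. */
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String token : tokenize(query)) {
            Set<Long> ids = postings.getOrDefault(token, Collections.<Long>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);   // first word: start with its file topics
            } else {
                result.retainAll(ids);         // further words: keep only common file topics
            }
        }
        return result == null ? Collections.<Long>emptySet() : result;
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    private static Set<String> tokenize(String text) {
        Set<String> tokens = new TreeSet<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

In a real plugin the in-memory map would presumably be replaced by a Lucene index, and the IDs returned by search would be resolved back to file topics for display in the Webclient.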
