Ticket #309 (closed Feature Request: fixed)

Opened 12 years ago

Last modified 12 years ago

string contains search instead of substring

Reported by: Malte
Owned by: jri
Priority: Critical
Milestone: Release 4.1
Component: 3rd Party Plugins
Version: 4.0.11
Keywords:
Cc: dgf
Complexity: 3
Area: Application Framework / API
Module:

Description

We need to find values with a "contains" search rather than a plain substring match, so that, e.g., a topic named "Algebraische Kongruente Funktionen" is returned as a search result for the query "Funktionen".

Is this configurable at the moment, or do we need to implement our own search?

All the respective fields we are searching through were modelled with the following declarative settings:

"index_mode_uris": ["dm4.core.fulltext", "dm4.core.fulltext_key"],

Any hint is appreciated.

Change History

comment:1 Changed 12 years ago by jri

Just to clarify: do you mean the DM Webclient, the Java API, or the REST API?

comment:2 Changed 12 years ago by Malte

Thanks for your quick reply. I am writing a new client whose search is currently based on the search_topics() method of dm4-webclient's rest_client, which queries /core/topic?{parameter}

So, I want to do a "contains"-search (which searches for matches over topics of three different topic types), using the REST-API.

comment:3 Changed 12 years ago by jri

OK.
The REST API already supports what you call a "contains" search.
In fulltext-indexed values (dm4.core.fulltext or dm4.core.fulltext_key) you can search for every single word.
So, you *will* find "Algebraische Kongruente Funktionen" when you search for "Funktionen".
What does your request look like?

Example: create a Person with Last Name "Algebraische Kongruente Funktionen".
(I use Person here only because its Last Name is indexed exactly like your fields.)
This request will find that Last Name topic:

GET /core/topic?search=Funktionen&field=dm4.contacts.last_name

BTW: the "wholeword" parameter is false by default. So this search returns a match as well:

GET /core/topic?search=Funk&field=dm4.contacts.last_name

To match whole words only, add wholeword=true to the request (no result here):

GET /core/topic?search=Funk&field=dm4.contacts.last_name&wholeword=true

Note: the "field" parameter is optional. When you omit it, you search in *all* fields indexed as dm4.core.fulltext.
With the field parameter specified, you search only in that field. This requires the index mode dm4.core.fulltext_key.
You can use both index modes at the same time (as you do).
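
For illustration, the three request variants above can be assembled programmatically. This is only a minimal Python sketch: the helper name build_search_url is hypothetical, while the path /core/topic and the parameters search, field and wholeword are the ones used in the requests above.

```python
from urllib.parse import urlencode


def build_search_url(term, field=None, wholeword=False, base="/core/topic"):
    """Build a search request path (hypothetical helper, not part of DM)."""
    params = {"search": term}
    if field is not None:
        # Restricts the search to one field; per the note above this
        # requires the index mode dm4.core.fulltext_key.
        params["field"] = field
    if wholeword:
        # wholeword is false by default, so it is only sent when true.
        params["wholeword"] = "true"
    return base + "?" + urlencode(params)


print(build_search_url("Funktionen", "dm4.contacts.last_name"))
print(build_search_url("Funk", "dm4.contacts.last_name", wholeword=True))
```

Omitting the field argument yields a request that searches all fields indexed as dm4.core.fulltext.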

However, what you call "matches over topics of three different topic types" is not possible in a single request.
You have to perform 3 searches consecutively and then combine the results manually.
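
Combining the results on the client side could look roughly like the following Python sketch. It rests on assumptions: search_fn stands for whatever function performs one REST search, and each result is assumed to be a dict with an "id" key. The actual response format is not shown in this ticket, and the field URI tub.eduzen.example_title in the usage part is invented for the example.

```python
def search_fields(search_fn, term, field_uris):
    """Run one search per field and merge the results without duplicates.

    search_fn(term, field_uri) is a hypothetical callable returning a list
    of topics; each topic is assumed to be a dict with an "id" key.
    """
    seen_ids = set()
    merged = []
    for field_uri in field_uris:
        for topic in search_fn(term, field_uri):
            if topic["id"] not in seen_ids:
                seen_ids.add(topic["id"])
                merged.append(topic)
    return merged


# Usage with a stand-in search function (no server involved).
def fake_search(term, field_uri):
    data = {
        "tub.eduzen.excercise_name": [{"id": 1}, {"id": 2}],
        "tub.eduzen.excercise_description": [{"id": 2}, {"id": 3}],
    }
    return data.get(field_uri, [])


fields = [
    "tub.eduzen.excercise_name",
    "tub.eduzen.excercise_description",
    "tub.eduzen.example_title",  # made-up field URI for illustration
]
print(search_fields(fake_search, "Funktionen", fields))
```

Results keep the order in which the fields were searched, and a topic found through more than one field appears only once.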

Hope this helps.

Last edited 12 years ago by jri

comment:4 Changed 12 years ago by jri

One more thing: what is supported is not exactly a "contains" search, but rather a "beginning of each word" search.
A true "contains" search would match "ktionen" as well. This is not supported by Lucene. You *can* use wildcards ("*", "?"), but they are not allowed as the first character of a word. (I don't think this is what you asked for either.)
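
To make the distinction concrete, here is a small Python model of the three matching behaviours discussed in this ticket. It is a simplified illustration, not Lucene's implementation: splitting on whitespace and comparing case-insensitively are assumptions, and the function names are made up.

```python
def matches_word_prefix(value, query):
    """'Beginning of each word' match (models wholeword=false)."""
    q = query.lower()
    return any(word.startswith(q) for word in value.lower().split())


def matches_whole_word(value, query):
    """Whole-word match (models wholeword=true)."""
    return query.lower() in value.lower().split()


def matches_contains(value, query):
    """True 'contains' match, which the search does not offer."""
    return query.lower() in value.lower()


name = "Algebraische Kongruente Funktionen"
for query in ("Funktionen", "Funk", "ktionen"):
    print(query,
          matches_word_prefix(name, query),
          matches_whole_word(name, query),
          matches_contains(name, query))
```

Only the query "ktionen" separates the word-prefix model from a true "contains" search: it matches inside a word but not at the beginning of any word.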

Last edited 12 years ago by jri

comment:5 Changed 12 years ago by Malte

Thanks for your insightful clarifications. I now also see the real issue behind my assumption that this search is not supported yet: our usage of the REST API is already correct, but sometimes the search fails with an internal server error. Here is what I found out. Maybe you can give me an idea of what is happening here. Thanks very much in advance!

http://stable.eduzen.tu-berlin.de:8080/core/topic?search=Funktionen&field=tub.eduzen.excercise_name&whole_word=false

which causes my DM service to log:

18.09.2012 11:40:40 de.deepamehta.core.impl.service.EmbeddedService searchTopics
WARNUNG: ROLLBACK!
18.09.2012 11:40:40 com.sun.jersey.spi.container.ContainerResponse logException
SCHWERWIEGEND: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException: java.lang.RuntimeException: Searching topics failed (searchTerm="Funktionen", fieldUri="tub.eduzen.excercise_name", wholeWord=false, clientState={dm4_topicmap_id=10631, dm4_workspace_id=9676, mjx.fontWarn=warned%3Atrue, dm4_username=admin})
	at de.deepamehta.plugins.webservice.WebservicePlugin.searchTopics(WebservicePlugin.java:109)
	at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
	at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
	at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
	at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.ResourceObjectRule.accept(ResourceObjectRule.java:100)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.apache.felix.http.base.internal.handler.ServletHandler.doHandle(ServletHandler.java:96)
	at org.apache.felix.http.base.internal.handler.ServletHandler.handle(ServletHandler.java:79)
	at org.apache.felix.http.base.internal.dispatch.ServletPipeline.handle(ServletPipeline.java:42)
	at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:49)
	at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:33)
	at org.apache.felix.http.base.internal.dispatch.FilterPipeline.dispatch(FilterPipeline.java:48)
	at org.apache.felix.http.base.internal.dispatch.Dispatcher.dispatch(Dispatcher.java:39)
	at org.apache.felix.http.base.internal.DispatcherServlet.service(DispatcherServlet.java:67)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.RuntimeException: Searching topics failed (searchTerm="Funktionen", fieldUri="tub.eduzen.excercise_name", wholeWord=false, clientState={dm4_topicmap_id=10631, dm4_workspace_id=9676, mjx.fontWarn=warned%3Atrue, dm4_username=admin})
	at de.deepamehta.core.impl.service.EmbeddedService.searchTopics(EmbeddedService.java:152)
	at de.deepamehta.plugins.webservice.WebservicePlugin.searchTopics(WebservicePlugin.java:107)
	... 41 more
Caused by: java.lang.IllegalArgumentException: ID 26986 refers not to a MehtaNode but to a MehtaEdge
	at de.deepamehta.mehtagraph.impl.Neo4jBase.buildMehtaNode(Neo4jBase.java:71)
	at de.deepamehta.mehtagraph.impl.Neo4jMehtaGraph.queryMehtaNodes(Neo4jMehtaGraph.java:113)
	at de.deepamehta.core.impl.storage.MGStorageBridge.searchTopics(MGStorageBridge.java:146)
	at de.deepamehta.core.impl.service.EmbeddedService.searchTopics(EmbeddedService.java:147)
	... 42 more

whereas searching another topic type with the same index mode succeeds, as you described:

http://stable.eduzen.tu-berlin.de:8080/core/topic?search=Graphen&field=tub.eduzen.excercise_description&wholeword=false

comment:6 Changed 12 years ago by jri

  • Cc dgf added

Oh, yes, that's a different story ;-)

I encounter this search issue too once in a while.
It looks like DM doesn't maintain the Lucene indexes properly in every situation. I guess ID 26986 is a stale index entry: the respective topic has meanwhile been deleted, and Neo4j has reassigned that ID to another object, in this case a DM association (a MehtaEdge).

Sorry, no solution for the moment.
I have to investigate it further.

Can you confirm that searching in tub.eduzen.excercise_name with other terms works correctly?

comment:7 Changed 12 years ago by jri

  • Status changed from new to closed
  • Resolution set to fixed

There is now a workaround in Neo4j MehtaGraph. See #302.
This should work for the moment.
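
The actual workaround is tracked in #302 and is not shown in this ticket. Purely as an illustration of the stale-entry problem described in comment:6, a defensive lookup could skip index hits that no longer refer to a node instead of failing the whole search. The Python sketch below is a toy model under assumptions: the graph is a plain dict, and the function name and the "node"/"edge" labels are invented.

```python
def resolve_index_hits(hit_ids, graph):
    """Resolve index hits to nodes, skipping stale entries.

    graph is a toy stand-in for the store: a dict mapping an ID to a
    (kind, payload) tuple, where kind is "node" or "edge".
    """
    nodes = []
    stale_ids = []
    for hit_id in hit_ids:
        entry = graph.get(hit_id)
        if entry is None or entry[0] != "node":
            # The ID was deleted or reassigned to an edge: treat the
            # index entry as stale rather than raising an exception.
            stale_ids.append(hit_id)
            continue
        nodes.append(entry[1])
    return nodes, stale_ids


graph = {
    100: ("node", "Exercise A"),
    26986: ("edge", "some association"),  # ID reassigned to an edge
}
print(resolve_index_hits([100, 26986, 555], graph))
```

Returning the stale IDs alongside the results would let the caller log them or remove them from the index later.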
