Ticket #302 (closed Defect: fixed)
"By-text" search doesn't work with special characters
Reported by: | joern | Owned by: | jri |
---|---|---|---|
Priority: | Critical | Milestone: | |
Component: | DeepaMehta Standard Distribution | Version: | 4.0.11 |
Keywords: | | Cc: | dgf, Malte, tsc |
Complexity: | 3 | Area: | |
Module: | | | |
Description
When trying to search for e.g. latex-code, the webclient returns an error in some cases.
Example query:
$z_{1}^{2}=\left(2\,\exp\left(i\,\frac{\pi}{4}\right)\right)^{2}=2^{2}\,\left(\exp\left(i\,\frac{\pi}{4}\right)\right)^{2}=4\,\exp\left(2\, i\,\frac{\pi}{4}\right)=4\,\exp\left(i\,\frac{\pi}{2}\right)$
Error Msg:
ERROR while searching:
"RESTClientError: GET request failed (error: Internal Server Error)"
Change History
comment:4 Changed 12 years ago by jri
Phew, that's a wild search string ;-)
It considerably confuses the query parser of Lucene (the search engine used by DM).
In your search term you have to escape these characters when you want to search for them literally:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
These characters have special meanings for Lucene:
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/queryparsersyntax.html
Escaping works via backslash \ character.
So, when you want to search for e.g. a curly brace {, you have to write it as
\{
To search for backslash \ itself:
\\
Also consider enclosing phrases in double quotes ("...") when searching for them.
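The escaping rule above can be sketched in a few lines of Java. This is only an illustration and not DM code: the class name LuceneEscapeSketch and its escape() method are made up for this example, and the character set is simply the list given above, with && and || handled by escaping each & and | individually. Lucene itself also ships a static QueryParser.escape(String) helper that serves the same purpose.

```java
// Hypothetical helper, not part of DeepaMehta or Lucene.
final class LuceneEscapeSketch {

    // Characters the Lucene query parser treats as syntax (taken from the list above).
    // "&&" and "||" are covered by escaping every single '&' and '|'.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Returns the term with every special character prefixed by a backslash. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("z_{1}^{2}")); // prints z_\{1\}\^\{2\}
        System.out.println(escape("\\frac"));    // prints \\frac
    }
}
```

Note that escaping everything this way also disables wildcards and boolean operators in the term, so it only fits searches that are meant literally.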
NOTE: the DM webclient passes the search term you entered almost literally to Lucene. Thus, you can e.g. work with wildcards and boolean operators. However, there is one exception: DM adds a * to your search term. This is intended to make begin-of-word searches easy.
To avoid the search problem you described, DM could automatically escape all special characters. On the other hand, this would rule out the advanced query possibilities offered by Lucene. How to reconcile these conflicting requirements and how to improve the Webclient's search GUI remains to be discussed. For the moment, please resort to escaping.
Thanks for your report!
comment:5 Changed 12 years ago by Jörg Richter
Core fix: Search with Lucene query syntax (#302).
In particular phrase search ("..") and escaping (\) are working.
Note: the Webclient's default search is now a whole-word search.
For a begin-of-word search you have to add the asterisk * manually, e.g. Hell*
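To illustrate the difference between the two search modes, here is a small Java sketch. It does not use Lucene: toBeginOfWord() and matches() are hypothetical helpers, and the toy matcher only imitates whole-word versus trailing-asterisk matching against a single indexed word, assuming case-insensitive comparison.

```java
// Hypothetical illustration, not DeepaMehta or Lucene code.
final class PrefixSearchSketch {

    /** Turns a single word into a begin-of-word query by appending '*' once. */
    static String toBeginOfWord(String word) {
        String w = word.trim();
        if (w.isEmpty() || w.endsWith("*")) {
            return w;
        }
        return w + "*";
    }

    /** Toy matcher: exact match for plain words, prefix match for a trailing '*'. */
    static boolean matches(String query, String indexedWord) {
        String q = query.toLowerCase(java.util.Locale.ROOT);
        String w = indexedWord.toLowerCase(java.util.Locale.ROOT);
        if (q.endsWith("*")) {
            return w.startsWith(q.substring(0, q.length() - 1));
        }
        return w.equals(q);
    }

    public static void main(String[] args) {
        System.out.println(matches("Hell", "Hello"));                // false: whole-word search
        System.out.println(matches(toBeginOfWord("Hell"), "Hello")); // true: begin-of-word search
    }
}
```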
IMPORTANT for plugin developers: the "wholeword" parameter is globally dropped.
Core Service Java API:
- dms.searchTopics() has no "wholeWord" parameter anymore
Core REST API:
- GET /core/topic expects no "wholeword" query parameter anymore
RESTClient:
- dm4c.restc.search_topics() has no "whole_word" and no "index" parameters anymore
Webclient REST API:
- GET /webclient/search expects no "wholeword" query parameter anymore
RESTClient:
- dm4c.restc.search_topics_and_create_bucket() has no "whole_word" parameter anymore
See ticket 302.
comment:6 Changed 12 years ago by joern
- Status changed from new to assigned
- Priority changed from Major to Critical
- Owner set to jri
I found another huge problem with the search. This probably just emerged with the update:
I get REST API errors when searching for "komplexe zahlen", "komplex" and "zahlen". Even stranger, a query for "komplexezahlen" works fine. I changed the ticket's priority to critical because we will need to use the search extensively.
comment:7 Changed 12 years ago by Jörg Richter
Deploy Neo4j MehtaGraph 1.1-SNAPSHOT (#302).
The parent POM is updated to version 4 in order to deploy a fixed Neo4j MehtaGraph.
This error doesn't appear anymore when searching:
java.lang.RuntimeException: Searching topics failed (searchTerm="komplexe", fieldUri="null", clientState={dm4_topicmap_id=10631, JSESSIONID=17vsf61a7y90guti3xxblerxc, dm4_workspace_id=9676})
    at de.deepamehta.core.impl.service.EmbeddedService.searchTopics(EmbeddedService.java:145)
    at de.deepamehta.plugins.webclient.WebclientPlugin.searchTopics(WebclientPlugin.java:79)
    ... 47 more
Caused by: java.lang.IllegalArgumentException: Reference error: ID 76173 refers to a MehtaEdge when the caller expects a MehtaNode
    at de.deepamehta.mehtagraph.impl.Neo4jBase.buildMehtaNode(Neo4jBase.java:71)
    at de.deepamehta.mehtagraph.impl.Neo4jMehtaGraph.queryMehtaNodes(Neo4jMehtaGraph.java:113)
    at de.deepamehta.core.impl.storage.MGStorageBridge.searchTopics(MGStorageBridge.java:143)
    at de.deepamehta.core.impl.service.EmbeddedService.searchTopics(EmbeddedService.java:140)
At client-side this error appears as
ERROR while searching: "RESTClientError: GET request failed (error: Internal Server Error)"
As a workaround, MehtaEdge results are simply dropped.
The underlying problem remains to be investigated.
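The idea behind the workaround can be sketched roughly as follows. This is not the actual MehtaGraph code: the Element, Node and Edge types and the keepNodes() method are hypothetical stand-ins, used only to show index hits being filtered by type instead of raising an exception when an edge turns up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for graph elements, not the real MehtaGraph types.
final class DropEdgesSketch {

    interface Element {
        long id();
    }

    static final class Node implements Element {
        private final long id;
        Node(long id) { this.id = id; }
        public long id() { return id; }
    }

    static final class Edge implements Element {
        private final long id;
        Edge(long id) { this.id = id; }
        public long id() { return id; }
    }

    /** Keeps only the node hits; edge hits are silently dropped instead of throwing. */
    static List<Node> keepNodes(List<? extends Element> hits) {
        List<Node> nodes = new ArrayList<>();
        for (Element hit : hits) {
            if (hit instanceof Node) {
                nodes.add((Node) hit);
            }
            // Anything else (e.g. an Edge returned by the index) is skipped.
        }
        return nodes;
    }
}
```

Dropping the unexpected hits keeps the search usable, but it also hides why edges show up in a node query in the first place.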
See ticket 302.