Ticket #302 (closed Defect: fixed)

Opened 12 years ago

Last modified 12 years ago

"By-text" search doesn't work with special characters

Reported by: joern
Owned by: jri
Priority: Critical
Milestone:
Component: DeepaMehta Standard Distribution
Version: 4.0.11
Keywords:
Cc: dgf, Malte, tsc
Complexity: 3
Area:
Module:

Description

When trying to search for e.g. latex-code, the webclient returns an error in some cases.

Example query:

$z_{1}{2}=\left(2\,\exp\left(i\,\frac{\pi}{4}\right)\right){2}=2{2}\,\left(\exp\left(i\,\frac{\pi}{4}\right)\right){2}=4\,\exp\left(2\, i\,\frac{\pi}{4}\right)=4\,\exp\left(i\,\frac{\pi}{2}\right)$

Error Msg:

ERROR while searching:

"RESTClientError: GET request failed (error: Internal Server Error)"

Change History

comment:1 Changed 12 years ago by jri

  • Cc dgf, Malte added

comment:2 Changed 12 years ago by jri

  • Cc Malt, tsc added; Malte removed

comment:3 Changed 12 years ago by jri

  • Cc Malte added; Malt removed

comment:4 Changed 12 years ago by jri

Phew, that's a wild search string ;-)
It confuses the query parser of Lucene (the search engine used by DM) considerably.

In your search term you have to escape these characters when you want to search for them literally:

+  -  &&  ||  !  (  )  {  }  [  ]  ^  "  ~  *  ?  :  \

These characters have special meanings for Lucene:
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/queryparsersyntax.html

Escaping works via backslash \ character.
So, when you want to search for e.g. a curly brace {, you have to write it as

\{

To search for backslash \ itself:

\\

Also consider enclosing phrases in double quotes "..." when searching for them.
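
The escaping rule above can be sketched in a few lines of Java. This is only an illustration, not DM code: the class and method names are made up for this example. It prefixes every character from the list above with a backslash; the two-character operators && and || are covered by escaping each & and | individually. Lucene itself ships a static helper for the same purpose (QueryParser.escape), which is probably the better choice wherever the Lucene classes are on the classpath.

```java
// Hypothetical helper (not part of DeepaMehta): escapes the characters
// that the Lucene query parser treats as syntax.
final class LuceneEscapeSketch {

    // The single characters behind the special tokens listed above.
    // "&&" and "||" are handled by escaping each '&' and '|'.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');   // prefix the special character with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("z_{1}"));            // prints z_\{1\}
        System.out.println(escape("\\frac{\\pi}{4}"));  // prints \\frac\{\\pi\}\{4\}
    }
}
```

Note that a fully escaped term is searched literally, so wildcards and boolean operators in it no longer have any effect.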

NOTE: the DM webclient passes the search term you entered almost literally to Lucene. Thus, you can e.g. work with wildcards and boolean operators. However, there is one exception: DM adds a * to your search term. This is intended to make begin-of-word searches easy.

To avoid the search problem you described, DM could automatically escape all special characters. On the other hand, this would rule out the advanced query features Lucene offers. How to reconcile these conflicting requirements and how to improve the Webclient's search GUI remains to be discussed. For the moment, please resort to escaping.

Thanks for your report!

comment:5 Changed 12 years ago by Jörg Richter

Core fix: Search with Lucene query syntax (#302).

In particular, phrase search ("...") and escaping (\) now work.

Note: the Webclient's default search is now a whole-word search.
For a begin-of-word search you have to add the asterisk * manually, e.g. Hell*
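
A caller that wants the begin-of-word behavior could add the asterisk itself before passing the term on. The following is a minimal sketch under that assumption; the class and method names are hypothetical and not part of the DM API.

```java
// Hypothetical helper (not part of DeepaMehta): turns a single word
// into a begin-of-word (prefix) query by appending '*'.
final class PrefixSearchSketch {

    static String beginOfWord(String word) {
        String w = word.trim();
        // Leave empty input and terms that already end with '*' untouched.
        // Caveat: a term ending in an escaped asterisk (\*) is not detected here.
        if (w.isEmpty() || w.endsWith("*")) {
            return w;
        }
        return w + "*";
    }

    public static void main(String[] args) {
        System.out.println(beginOfWord("Hell"));   // prints Hell*
        System.out.println(beginOfWord("Hell*"));  // prints Hell*
    }
}
```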

IMPORTANT for plugin developers: the "wholeword" parameter is globally dropped.

Core Service Java API:

  • dms.searchTopics() has no "wholeWord" parameter anymore

Core REST API:

  • GET /core/topic expects no "wholeword" query parameter anymore

RESTClient:

  • dm4c.restc.search_topics() has no "whole_word" and no "index" parameters anymore

Webclient REST API:

  • GET /webclient/search expects no "wholeword" query parameter anymore

RESTClient:

  • dm4c.restc.search_topics_and_create_bucket() has no "whole_word" parameter anymore

See ticket 302.
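
For plugin developers who build the request URL by hand: a Lucene query that contains quotes, backslashes or spaces also has to be URL-encoded before it goes into a GET request. The sketch below illustrates this with the JDK's URLEncoder. The query parameter name "search" and the base URL are assumptions made for this example; this ticket only states that "wholeword" is no longer expected.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical client-side helper (not part of DeepaMehta).
final class SearchUrlSketch {

    // ASSUMPTION: the search term is passed as a "search" query parameter.
    // No "wholeword" parameter is appended anymore.
    static String searchUrl(String baseUrl, String luceneQuery) {
        try {
            return baseUrl + "/core/topic?search="
                    + URLEncoder.encode(luceneQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every Java platform supports UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Begin-of-word search: the asterisk is added by the caller.
        System.out.println(searchUrl("http://localhost:8080", "Hell*"));
        // Phrase search: quotes and the space are percent/plus-encoded.
        System.out.println(searchUrl("http://localhost:8080", "\"komplexe zahlen\""));
    }
}
```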

comment:6 Changed 12 years ago by joern

  • Status changed from new to assigned
  • Priority changed from Major to Critical
  • Owner set to jri

I found another huge problem with the search. This probably just emerged with the update:

I get REST API errors when searching for "komplexe zahlen", "komplex" and "zahlen". Even stranger, a query for "komplexezahlen" works fine. I changed the ticket's priority to Critical because we will need to use the search extensively.

comment:7 Changed 12 years ago by Jörg Richter

Deploy Neo4j MehtaGraph 1.1-SNAPSHOT (#302).

The parent POM is updated to version 4 in order to deploy a fixed Neo4j MehtaGraph.

This error doesn't appear anymore when searching:

java.lang.RuntimeException: Searching topics failed (searchTerm="komplexe", fieldUri="null", clientState={dm4_topicmap_id=10631, JSESSIONID=17vsf61a7y90guti3xxblerxc, dm4_workspace_id=9676})
	at de.deepamehta.core.impl.service.EmbeddedService.searchTopics(EmbeddedService.java:145)
	at de.deepamehta.plugins.webclient.WebclientPlugin.searchTopics(WebclientPlugin.java:79)
	... 47 more
Caused by: java.lang.IllegalArgumentException: Reference error: ID 76173 refers to a MehtaEdge when the caller expects a MehtaNode
    at de.deepamehta.mehtagraph.impl.Neo4jBase.buildMehtaNode(Neo4jBase.java:71)
    at de.deepamehta.mehtagraph.impl.Neo4jMehtaGraph.queryMehtaNodes(Neo4jMehtaGraph.java:113)
    at de.deepamehta.core.impl.storage.MGStorageBridge.searchTopics(MGStorageBridge.java:143)
    at de.deepamehta.core.impl.service.EmbeddedService.searchTopics(EmbeddedService.java:140)

On the client side this error appears as

ERROR while searching:

"RESTClientError: GET request failed (error: Internal Server Error)"

As a workaround, MehtaEdge results are just dropped.
The underlying problem remains to be investigated.

See ticket 302.
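
The idea of the workaround can be sketched as follows. This is not the actual MehtaGraph code: the Hit and Kind types and the method name are invented for illustration. Index hits that refer to an edge are skipped instead of raising the IllegalArgumentException shown in the stack trace above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "drop edge results" (not MehtaGraph code).
final class DropEdgeHitsSketch {

    enum Kind { NODE, EDGE }

    // A raw index hit: an ID plus the kind of element it refers to.
    static final class Hit {
        final long id;
        final Kind kind;

        Hit(long id, Kind kind) {
            this.id = id;
            this.kind = kind;
        }
    }

    // Returns the IDs of all node hits; edge hits are silently dropped.
    static List<Long> nodeIdsOnly(List<Hit> hits) {
        List<Long> ids = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.kind == Kind.NODE) {
                ids.add(hit.id);
            }
            // EDGE hits fall through: this is the workaround, not a fix
            // for why edges show up in the result in the first place.
        }
        return ids;
    }
}
```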

comment:8 Changed 12 years ago by jri

  • Status changed from assigned to accepted

The search should now be fixed on the EduZEN installation.
Please check it out.

comment:9 Changed 12 years ago by jri

  • Status changed from accepted to closed
  • Resolution set to fixed