Ticket #776 (closed Enhancement: worksforme)

Opened 7 years ago

Last modified 7 years ago

dm4-wdtk: for bulk import, allow periodic writing of new lucene indexes

Reported by: Malte Owned by:
Priority: Major Milestone: Release 4.5
Component: DeepaMehta Standard Distribution Version: 4.5
Keywords: Cc: jri
Complexity: 3 Area: Application Framework / API
Module: deepamehta-core

Description

Concerning:
http://wikidata-topics.beta.wmflabs.org/de.deepamehta.webclient

Source Code:
https://github.com/mukil/dm4-wikidata-toolkit

I have written a naive import-mechanism for wikidata dumps and deepamehta4. After running over the wikidata json-dump for 10seconds, this process starts.

INFO: Processed 328500 items from the wikidata json-dump.
Mar 02, 2015 5:43:34 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor printProcessingStatus
INFO: Identified 91695 human beings, 2714 institutions, 3128 cities and 47 countries
Mar 02, 2015 5:43:34 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor stop
INFO: ### Start creating Topics ..

Then, later, the following error is thrown and no single topic was imported/written to db.

### Starting to process dump for 10secs .. on a fresh database.
Mar 14, 2015 5:09:17 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor printProcessingStatus
INFO: Processed 23900 items from the wikidata json-dump.
Mar 14, 2015 5:09:17 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor printProcessingStatus
INFO: Identified 8611 human beings, 198 institutions, 589 cities and 38 countries.
Mar 14, 2015 5:09:17 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor stop
INFO: ### Start creating Topics ...
...
WARNING: object 369252 (typeUri="dm4.core.aggregation") is not assigned to any workspace -- READ permission is granted
Mar 14, 2015 5:58:12 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor stop
INFO: Skipping the import of relations, this works just with (and among) already imported items.
...
### Starting to process dump for 10secs .. for the second time on a database.
Mar 14, 2015 6:09:51 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor printProcessingStatus
INFO: Processed 30400 items from the wikidata json-dump.
Mar 14, 2015 6:09:51 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor printProcessingStatus
INFO: Identified 13160 human beings, 219 institutions, 634 cities and 38 countries.
Mar 14, 2015 6:09:51 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor stop
INFO: ### Start creating Topics ..
...
INFO: Created new "org.deepamehta.wikidata.mentor_of" relationship for Q65466 to 366694 (Heinrich Magnus) with GUID: "Q65466$028350e2-4545-9dc2-ac05-bce12837b281"
Mar 14, 2015 6:25:06 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor stop
INFO: DeepaMehta now recognizes 13160 human beings, 219 institutions, 634 cities, 38 countries by name.
Additionally DeepaMehta recorded 8399 associations among these items.
Mar 14, 2015 6:25:06 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor stop
INFO: Finished importing.
...
### Starting to process dump for 30secs .. for third time on the database.
...
INFO: Processed 92300 items from the wikidata json-dump.
Mar 14, 2015 6:53:04 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor printProcessingStatus
INFO: Identified 30436 human beings, 720 institutions, 1228 cities and 44 countries.
Mar 14, 2015 6:53:04 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor countEntity
INFO: Timeout. Aborting processing.

(approx. 80 minutes later between each "getAssocations()"-call there are 7 seconds pause, in which a person topic gets created)

Mar 14, 2015 8:17:44 PM de.deepamehta.core.impl.EmbeddedService getAssociations
INFO: topic1Id=1031830, topic2Id=3182, assocTypeUri="dm4.core.aggregation"
Mar 14, 2015 8:17:44 PM de.deepamehta.core.impl.AccessControlImpl hasPermission
WARNING: object 1031869 (typeUri="dm4.core.aggregation") is not assigned to any workspace -- READ permission is granted
Mar 14, 2015 8:17:50 PM de.deepamehta.core.impl.EmbeddedService getAssociations
INFO: topic1Id=1031871, topic2Id=3182, assocTypeUri="dm4.core.aggregation"
Mar 14, 2015 8:17:50 PM de.deepamehta.core.impl.AccessControlImpl hasPermission
WARNING: object 1031910 (typeUri="dm4.core.aggregation") is not assigned to any workspace -- READ permission is granted
Mar 14, 2015 8:17:57 PM de.deepamehta.core.impl.EmbeddedService getAssociations
INFO: topic1Id=1031912, topic2Id=3182, assocTypeUri="dm4.core.aggregation"
Mar 14, 2015 8:17:57 PM de.deepamehta.core.impl.AccessControlImpl hasPermission
WARNING: object 1031951 (typeUri="dm4.core.aggregation") is not assigned to any workspace -- READ permission is granted
Mar 14, 2015 8:18:04 PM de.deepamehta.core.impl.EmbeddedService getAssociations
INFO: topic1Id=1031953, topic2Id=3182, assocTypeUri="dm4.core.aggregation"
Mar 14, 2015 8:18:04 PM de.deepamehta.core.impl.AccessControlImpl hasPermission
WARNING: object 1031992 (typeUri="dm4.core.aggregation") is not assigned to any workspace -- READ permission is granted
Mar 14, 2015 8:18:09 PM de.deepamehta.core.impl.EmbeddedService getAssociations
INFO: topic1Id=1031994, topic2Id=3182, assocTypeUri="dm4.core.aggregation"
Mar 14, 2015 8:18:09 PM de.deepamehta.core.impl.AccessControlImpl hasPermission

(until after 140 minutes the following two error start to occur repeatingly... )

...
WARNING: object 1032033 (typeUri="dm4.core.aggregation") is not assigned to any workspace -- READ permission is granted
Mar 14, 2015 9:00:45 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor createPersonTopic
WARNING: Creating topic failed (topic (id=-1, uri="org.wikidata.entity.Q277049", typeUri="dm4.contacts.person", value="", childTopics={dm4.contacts.notes=topic (id=-1, uri="null", typeUri="dm4.contacts.notes", value="<p>Polish swimmer</p><p class="wd-item-footer">For more infos visit this items <a href="http://www.wikidata.org./entity/Q277049" target="_blank" title="Wikidata page for this item">page</a> on wikidata.org.</p>", childTopics={}), dm4.contacts.person_name=topic (id=-1, uri="null", typeUri="dm4.contacts.person_name", value="null", childTopics={dm4.contacts.first_name=topic (id=-1, uri="null", typeUri="dm4.contacts.first_name", value="Artur", childTopics={}), dm4.contacts.last_name=topic (id=-1, uri="null", typeUri="dm4.contacts.last_name", value="Wojdat", childTopics={})})}))
java.lang.RuntimeException: Creating topic failed (topic (id=-1, uri="org.wikidata.entity.Q277049", typeUri="dm4.contacts.person", value="", childTopics={dm4.contacts.notes=topic (id=-1, uri="null", typeUri="dm4.contacts.notes", value="<p>Polish swimmer</p><p class="wd-item-footer">For more infos visit this items <a href="http://www.wikidata.org./entity/Q277049" target="_blank" title="Wikidata page for this item">page</a> on wikidata.org.</p>", childTopics={}), dm4.contacts.person_name=topic (id=-1, uri="null", typeUri="dm4.contacts.person_name", value="null", childTopics={dm4.contacts.first_name=topic (id=-1, uri="null", typeUri="dm4.contacts.first_name", value="Artur", childTopics={}), dm4.contacts.last_name=topic (id=-1, uri="null", typeUri="dm4.contacts.last_name", value="Wojdat", childTopics={})})}))
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:718)
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:155)
	at org.deepamehta.plugins.wdtk.WikidataEntityProcessor.createPersonTopic(WikidataEntityProcessor.java:494)
	at org.deepamehta.plugins.wdtk.WikidataEntityProcessor.stop(WikidataEntityProcessor.java:710)
	at org.deepamehta.plugins.wdtk.WikidataToolkitPlugin.startProcesssingWikidataDumpfile(WikidataToolkitPlugin.java:193)
	at org.deepamehta.plugins.wdtk.WikidataToolkitPlugin.importWikidataEntities(WikidataToolkitPlugin.java:149)
	at org.deepamehta.plugins.wdtk.WikidataToolkitPlugin.importEntitiesFromWikidataDump(WikidataToolkitPlugin.java:82)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
	at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
	at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
	at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.ResourceObjectRule.accept(ResourceObjectRule.java:100)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1480)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1411)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1360)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1350)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
	at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:69)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
	at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:240)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:77)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:370)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot flush
	at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3599)
	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3588)
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
	at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:428)
	at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:448)
	at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:391)
	at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:497)
	at org.apache.lucene.index.IndexReader.reopen(IndexReader.java:681)
	at org.neo4j.index.impl.lucene.FullTxData.searcher(FullTxData.java:348)
	at org.neo4j.index.impl.lucene.FullTxData.internalQuery(FullTxData.java:199)
	at org.neo4j.index.impl.lucene.FullTxData.get(FullTxData.java:410)
	at org.neo4j.index.impl.lucene.TxDataHolder.get(TxDataHolder.java:56)
	at org.neo4j.index.impl.lucene.LuceneTransaction.getAddedIds(LuceneTransaction.java:214)
	at org.neo4j.index.impl.lucene.LuceneIndex.query(LuceneIndex.java:253)
	at org.neo4j.index.impl.lucene.LuceneIndex.get(LuceneIndex.java:207)
	at de.deepamehta.storage.neo4j.Neo4jStorage.checkUriUniqueness(Neo4jStorage.java:1113)
	at de.deepamehta.storage.neo4j.Neo4jStorage.storeTopic(Neo4jStorage.java:156)
	at de.deepamehta.core.impl.StorageDecorator.storeTopic(StorageDecorator.java:100)
	at de.deepamehta.core.impl.EmbeddedService.topicFactory(EmbeddedService.java:739)
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:714)
	... 52 more
Mar 14, 2015 9:01:27 PM org.deepamehta.plugins.wdtk.WikidataEntityProcessor createPersonTopic
WARNING: Creating topic failed (topic (id=1044697, uri="org.wikidata.entity.Q303259", typeUri="dm4.contacts.person", value="", childTopics={dm4.contacts.notes=topic (id=-1, uri="null", typeUri="dm4.contacts.notes", value="<p>Figure skater</p><p class="wd-item-footer">For more infos visit this items <a href="http://www.wikidata.org./entity/Q303259" target="_blank" title="Wikidata page for this item">page</a> on wikidata.org.</p>", childTopics={}), dm4.contacts.person_name=topic (id=1044698, uri="", typeUri="dm4.contacts.person_name", value="", childTopics={dm4.contacts.first_name=topic (id=1044699, uri="", typeUri="dm4.contacts.first_name", value="Sergei", childTopics={}), dm4.contacts.last_name=topic (id=-1, uri="null", typeUri="dm4.contacts.last_name", value="Sakhnovski", childTopics={})})}))
java.lang.RuntimeException: Creating topic failed (topic (id=1044697, uri="org.wikidata.entity.Q303259", typeUri="dm4.contacts.person", value="", childTopics={dm4.contacts.notes=topic (id=-1, uri="null", typeUri="dm4.contacts.notes", value="<p>Figure skater</p><p class="wd-item-footer">For more infos visit this items <a href="http://www.wikidata.org./entity/Q303259" target="_blank" title="Wikidata page for this item">page</a> on wikidata.org.</p>", childTopics={}), dm4.contacts.person_name=topic (id=1044698, uri="", typeUri="dm4.contacts.person_name", value="", childTopics={dm4.contacts.first_name=topic (id=1044699, uri="", typeUri="dm4.contacts.first_name", value="Sergei", childTopics={}), dm4.contacts.last_name=topic (id=-1, uri="null", typeUri="dm4.contacts.last_name", value="Sakhnovski", childTopics={})})}))
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:718)
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:155)
	at org.deepamehta.plugins.wdtk.WikidataEntityProcessor.createPersonTopic(WikidataEntityProcessor.java:494)
	at org.deepamehta.plugins.wdtk.WikidataEntityProcessor.stop(WikidataEntityProcessor.java:710)
	at org.deepamehta.plugins.wdtk.WikidataToolkitPlugin.startProcesssingWikidataDumpfile(WikidataToolkitPlugin.java:193)
	at org.deepamehta.plugins.wdtk.WikidataToolkitPlugin.importWikidataEntities(WikidataToolkitPlugin.java:149)
	at org.deepamehta.plugins.wdtk.WikidataToolkitPlugin.importEntitiesFromWikidataDump(WikidataToolkitPlugin.java:82)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
	at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
	at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
	at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.ResourceObjectRule.accept(ResourceObjectRule.java:100)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1480)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1411)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1360)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1350)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
	at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:69)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
	at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:240)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:77)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:370)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Storing the child topics of object 1044697 failed ({dm4.contacts.notes=topic (id=-1, uri="null", typeUri="dm4.contacts.notes", value="<p>Figure skater</p><p class="wd-item-footer">For more infos visit this items <a href="http://www.wikidata.org./entity/Q303259" target="_blank" title="Wikidata page for this item">page</a> on wikidata.org.</p>", childTopics={}), dm4.contacts.person_name=topic (id=1044698, uri="", typeUri="dm4.contacts.person_name", value="", childTopics={dm4.contacts.first_name=topic (id=1044699, uri="", typeUri="dm4.contacts.first_name", value="Sergei", childTopics={}), dm4.contacts.last_name=topic (id=-1, uri="null", typeUri="dm4.contacts.last_name", value="Sakhnovski", childTopics={})})})
	at de.deepamehta.core.impl.ValueStorage.storeChildTopics(ValueStorage.java:277)
	at de.deepamehta.core.impl.ValueStorage.storeValue(ValueStorage.java:107)
	at de.deepamehta.core.impl.EmbeddedService.topicFactory(EmbeddedService.java:740)
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:714)
	... 52 more
Caused by: java.lang.RuntimeException: Creating topic failed (topic (id=1044698, uri="", typeUri="dm4.contacts.person_name", value="", childTopics={dm4.contacts.first_name=topic (id=1044699, uri="", typeUri="dm4.contacts.first_name", value="Sergei", childTopics={}), dm4.contacts.last_name=topic (id=-1, uri="null", typeUri="dm4.contacts.last_name", value="Sakhnovski", childTopics={})}))
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:718)
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:155)
	at de.deepamehta.core.impl.ValueStorage.storeCompositionOne(ValueStorage.java:310)
	at de.deepamehta.core.impl.ValueStorage.storeChildTopics(ValueStorage.java:290)
	at de.deepamehta.core.impl.ValueStorage.storeChildTopics(ValueStorage.java:274)
	... 55 more
Caused by: java.lang.RuntimeException: Storing the child topics of object 1044698 failed ({dm4.contacts.first_name=topic (id=1044699, uri="", typeUri="dm4.contacts.first_name", value="Sergei", childTopics={}), dm4.contacts.last_name=topic (id=-1, uri="null", typeUri="dm4.contacts.last_name", value="Sakhnovski", childTopics={})})
	at de.deepamehta.core.impl.ValueStorage.storeChildTopics(ValueStorage.java:277)
	at de.deepamehta.core.impl.ValueStorage.storeValue(ValueStorage.java:107)
	at de.deepamehta.core.impl.EmbeddedService.topicFactory(EmbeddedService.java:740)
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:714)
	... 59 more
Caused by: java.lang.RuntimeException: Creating topic failed (topic (id=1044699, uri="", typeUri="dm4.contacts.first_name", value="Sergei", childTopics={}))
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:718)
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:155)
	at de.deepamehta.core.impl.ValueStorage.storeCompositionOne(ValueStorage.java:310)
	at de.deepamehta.core.impl.ValueStorage.storeChildTopics(ValueStorage.java:290)
	at de.deepamehta.core.impl.ValueStorage.storeChildTopics(ValueStorage.java:274)
	... 62 more
Caused by: java.lang.RuntimeException: Associating topic 1044699 with topic type "dm4.contacts.first_name" failed
	at de.deepamehta.core.impl.EmbeddedService.createTopicInstantiation(EmbeddedService.java:594)
	at de.deepamehta.core.impl.EmbeddedService.topicFactory(EmbeddedService.java:741)
	at de.deepamehta.core.impl.EmbeddedService.createTopic(EmbeddedService.java:714)
	... 66 more
Caused by: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot flush
	at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3599)
	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3588)
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
	at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:428)
	at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:448)
	at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:391)
	at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:497)
	at org.apache.lucene.index.IndexReader.reopen(IndexReader.java:681)
	at org.neo4j.index.impl.lucene.FullTxData.searcher(FullTxData.java:348)
	at org.neo4j.index.impl.lucene.FullTxData.internalQuery(FullTxData.java:199)
	at org.neo4j.index.impl.lucene.FullTxData.get(FullTxData.java:410)
	at org.neo4j.index.impl.lucene.TxDataHolder.get(TxDataHolder.java:56)
	at org.neo4j.index.impl.lucene.LuceneTransaction.getAddedIds(LuceneTransaction.java:214)
	at org.neo4j.index.impl.lucene.LuceneIndex.query(LuceneIndex.java:253)
	at org.neo4j.index.impl.lucene.LuceneIndex.get(LuceneIndex.java:207)
	at de.deepamehta.storage.neo4j.Neo4jStorage.fetchTopicNodeByUri(Neo4jStorage.java:1024)
	at de.deepamehta.storage.neo4j.Neo4jStorage.fetchTopicPlayerNode(Neo4jStorage.java:937)
	at de.deepamehta.storage.neo4j.Neo4jStorage.fetchPlayerNode(Neo4jStorage.java:927)
	at de.deepamehta.storage.neo4j.Neo4jStorage.storePlayerRelationship(Neo4jStorage.java:917)
	at de.deepamehta.storage.neo4j.Neo4jStorage.storeAssociation(Neo4jStorage.java:275)
	at de.deepamehta.core.impl.StorageDecorator.storeAssociation(StorageDecorator.java:269)
	at de.deepamehta.core.impl.EmbeddedService.createTopicInstantiation(EmbeddedService.java:590)
	... 68 more

As jri once already mentioned: A mechanism for bulk-importing topics and associations might be needed to prevent errors like this, or? Maybe we could/should need periodic write operations to prevent errors like this (unable to write more indexes).

Question: May the writing of new lucene indexes be related to unfinished/open transactions? Please note that all 100 000 topics get created within a GET request (which itself is annotated as @Transactional).

What do you think?
Thanks for your time & Cheers!

Change History

comment:1 Changed 7 years ago by Malte

Having filelogging only configured, i found another exception thrown to the gogo shell.

g! Exception in thread "Lucene Merge Thread #5962" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:89)
        at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:62)
        at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132)
        at org.apache.lucene.store.RAMOutputStream.writeBytes(RAMOutputStream.java:118)
        at org.apache.lucene.index.TermInfosWriter.writeTerm(TermInfosWriter.java:232)
        at org.apache.lucene.index.TermInfosWriter.add(TermInfosWriter.java:196)
        at org.apache.lucene.index.FormatPostingsDocsWriter.finish(FormatPostingsDocsWriter.java:123)
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:614)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:541)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:473)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:109)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4295)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3940)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
Exception in thread "Lucene Merge Thread #1121" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:89)
        at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:62)
        at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132)
        at org.apache.lucene.store.RAMOutputStream.writeByte(RAMOutputStream.java:107)
        at org.apache.lucene.store.DataOutput.writeVInt(DataOutput.java:74)
        at org.apache.lucene.index.FormatPostingsDocsWriter.addDoc(FormatPostingsDocsWriter.java:98)
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:593)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:541)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:473)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:109)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4295)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3940)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)

May it help us to better track down the issue.

comment:2 Changed 7 years ago by Malte

Please also note that running this plugin (or at least compiling it) requires to run DM 4 with Java 7 since the WDTK in use (https://github.com/Wikidata/Wikidata-Toolkit) depends on that (see also https://www.mediawiki.org/wiki/Wikidata_Toolkit).

BTW: Compiling and running DM 4 with Java 7 doesn't seem to be a problem in itself, operation looks smooth.

comment:3 Changed 7 years ago by jri

Yes, when creating 100.000 topics in a single transaction I can imagine that you run out-of-memory as the transaction log becomes huge.

Try this: avoid DM from creating the per-request transaction by removing the @Transactional annotation from your resource method. Instead create the transactions manually (via dms.beginTx()), for example one for every single topic.

The bulk updates I've once mentioned (#325) are related to /Migrations/. There the developer has no chance to avoid DM from creating the outer transaction. In contrast, in your case -- a resource method -- you have.

Last edited 7 years ago by jri (previous) (diff)

comment:4 Changed 7 years ago by Malte

  • Status changed from new to closed
  • Resolution set to worksforme

I changed the implementation according to your hint. The endurance test is yet to be done but i am very optimistic that this will solve the issue i described. The first release of this new wikidata-plugin is therefore about to come.

Thanks!

Note: See TracTickets for help on using tickets.