Ticket #913 (closed Defect: worksforme)

Opened 5 years ago

Last modified 5 years ago

seting up the dm4 database on a NFS looks to be problematic

Reported by: Malte Owned by:
Priority: Major Milestone: Release 4.8
Component: DeepaMehta Standard Distribution Version: 4.7
Keywords: Cc: jri, dgf, JuergeN
Complexity: 3 Area:
Module: deepamehta-storage-neo4j

Description

In an environment where resources like e.g the hard disk is mounted via NFS, neo4j might throw some I/O Errors and thus the platform needs an immediate restart.

Here is the responsible line for the file system where the deepamehta-db folder is located, taken from t "/etc/fstab/" output.

store.svc.fue.xyznet:/project/some-project/home	/home	nfs	rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,lookupcache=none,nofsc	0	0

I read a bit about each of the the mounting options present here but none of those seem related to the deployment requirements mentioned at http://neo4j.com/docs/1.9.9/deployment-requirements.html

For proper ACID behavior, the filesystem must support flush (fsync, fdatasync).
Minimum
ext4 (or similar)
Recommended
ext4, ZFS
Nov 24, 2015 9:38:34 AM de.deepamehta.core.impl.AttachedTopic _update
INFO: Updating topic 32633 (new topic (id=-1, uri="null", typeUri="dm4.topicmaps.translation", value="null", childTopics={dm4.topicmaps.translation_y=topic (id=-1, uri="null", typeUri="dm4.topicmaps.translation_y", value="320", childTopics={}), relating association (id=-1, uri="null", typeUri="null", value="null", childTopics={}, null, null), dm4.topicmaps.translation_x=topic (id=-1, uri="null", typeUri="dm4.topicmaps.translation_x", value="187", childTopics={}), relating association (id=-1, uri="null", typeUri="null", value="null", childTopics={}, null, null)}), relating association (id=-1, uri="null", typeUri="null", value="null", childTopics={}, null, null))
Nov 24, 2015 9:38:34 AM de.deepamehta.core.impl.AttachedTopic _update
INFO: Updating topic 32634 (new topic (id=-1, uri="null", typeUri="dm4.topicmaps.translation_x", value="187", childTopics={}), relating association (id=-1, uri="null", typeUri="null", value="null", childTopics={}, null, null))
Nov 24, 2015 9:38:34 AM de.deepamehta.core.impl.AttachedDeepaMehtaObject updateSimpleValue
INFO: ### Changing simple value of topic 32634 from "-154" -> "187"
Nov 24, 2015 9:38:34 AM de.deepamehta.core.impl.AttachedTopic _update
INFO: Updating topic 32643 (new topic (id=-1, uri="null", typeUri="dm4.topicmaps.translation_y", value="320", childTopics={}), relating association (id=-1, uri="null", typeUri="null", value="null", childTopics={}, null, null))
Nov 24, 2015 9:38:34 AM de.deepamehta.core.impl.AttachedDeepaMehtaObject updateSimpleValue
INFO: ### Changing simple value of topic 32643 from "252" -> "320"
Nov 24, 2015 9:38:34 AM de.deepamehta.core.util.UniversalExceptionMapper logException
SEVERE: Request "PUT /topicmap/32613/translation/187/320" failed. Responding with 500 (Internal Server Error). The original exception/error is:
org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
	at org.neo4j.kernel.TopLevelTransaction.finish(TopLevelTransaction.java:134)
	at de.deepamehta.storage.neo4j.Neo4jTransactionAdapter.finish(Neo4jTransactionAdapter.java:39)
	at de.deepamehta.core.impl.TransactionFactory$TransactionResourceFilter$2.filter(TransactionFactory.java:91)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1427)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1360)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1350)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.apache.felix.http.base.internal.handler.ServletHandler.doHandle(ServletHandler.java:339)
	at org.apache.felix.http.base.internal.handler.ServletHandler.handle(ServletHandler.java:300)
	at org.apache.felix.http.base.internal.dispatch.ServletPipeline.handle(ServletPipeline.java:93)
	at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:50)
	at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
	at org.apache.felix.http.base.internal.dispatch.FilterPipeline.dispatch(FilterPipeline.java:76)
	at org.apache.felix.http.base.internal.dispatch.Dispatcher.dispatch(Dispatcher.java:49)
	at org.apache.felix.http.base.internal.DispatcherServlet.service(DispatcherServlet.java:67)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:370)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.neo4j.graphdb.TransactionFailureException: commit threw exception but status is committed?
	at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:476)
	at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:400)
	at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:119)
	at org.neo4j.kernel.TopLevelTransaction.finish(TopLevelTransaction.java:118)
	... 40 more
Caused by: javax.transaction.xa.XAException
	at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:558)
	at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:438)
	... 43 more
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: Unable to write record[3] @[27]
	at org.neo4j.kernel.impl.nioneo.store.PersistenceRow.writeContents(PersistenceRow.java:182)
	at org.neo4j.kernel.impl.nioneo.store.PersistenceRow.force(PersistenceRow.java:198)
	at org.neo4j.kernel.impl.nioneo.store.LockableWindow.writeOutAndClose(LockableWindow.java:64)
	at org.neo4j.kernel.impl.nioneo.store.LockableWindow.writeOutAndCloseIfFree(LockableWindow.java:146)
	at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.release(PersistenceWindowPool.java:237)
	at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.releaseWindow(CommonAbstractStore.java:469)
	at org.neo4j.kernel.impl.nioneo.store.NeoStore.setRecord(NeoStore.java:484)
	at org.neo4j.kernel.impl.nioneo.store.NeoStore.setLastCommittedTx(NeoStore.java:422)
	at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.applyCommit(WriteTransaction.java:498)
	at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.doCommit(WriteTransaction.java:440)
	at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:319)
	at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:482)
	at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
	at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:552)
	... 44 more
Caused by: java.io.IOException: Input/output error
	at sun.nio.ch.FileDispatcherImpl.pwrite0(Native Method)
	at sun.nio.ch.FileDispatcherImpl.pwrite(FileDispatcherImpl.java:66)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.FileChannelImpl.writeInternal(FileChannelImpl.java:739)
	at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:724)
	at org.neo4j.kernel.impl.nioneo.store.PersistenceRow.writeContents(PersistenceRow.java:176)
	... 57 more
Nov 24, 2015 9:38:35 AM de.deepamehta.core.util.UniversalExceptionMapper logException
SEVERE: Request "PUT /topicmap/32613/translation/187/320" failed. Responding with 500 (Internal Server Error). The original exception/error is:
org.neo4j.graphdb.TransactionFailureException: Unable to get transaction.
	at org.neo4j.kernel.InternalAbstractGraphDatabase.transactionRunning(InternalAbstractGraphDatabase.java:769)
	at org.neo4j.kernel.InternalAbstractGraphDatabase.beginTx(InternalAbstractGraphDatabase.java:742)
	at org.neo4j.kernel.TransactionBuilderImpl.begin(TransactionBuilderImpl.java:39)
	at org.neo4j.kernel.InternalAbstractGraphDatabase.beginTx(InternalAbstractGraphDatabase.java:737)
	at de.deepamehta.storage.neo4j.Neo4jTransactionAdapter.<init>(Neo4jTransactionAdapter.java:22)
	at de.deepamehta.storage.neo4j.Neo4jStorage.beginTx(Neo4jStorage.java:529)
	at de.deepamehta.core.impl.StorageDecorator.beginTx(StorageDecorator.java:633)
	at de.deepamehta.core.impl.EmbeddedService.beginTx(EmbeddedService.java:581)
	at de.deepamehta.core.impl.TransactionFactory$TransactionResourceFilter$1.filter(TransactionFactory.java:70)
	at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:281)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.ResourceObjectRule.accept(ResourceObjectRule.java:100)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1480)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1411)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1360)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1350)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.apache.felix.http.base.internal.handler.ServletHandler.doHandle(ServletHandler.java:339)
	at org.apache.felix.http.base.internal.handler.ServletHandler.handle(ServletHandler.java:300)
	at org.apache.felix.http.base.internal.dispatch.ServletPipeline.handle(ServletPipeline.java:93)
	at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:50)
	at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
	at org.apache.felix.http.base.internal.dispatch.FilterPipeline.dispatch(FilterPipeline.java:76)
	at org.apache.felix.http.base.internal.dispatch.Dispatcher.dispatch(Dispatcher.java:49)
	at org.apache.felix.http.base.internal.DispatcherServlet.service(DispatcherServlet.java:67)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:370)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.transaction.SystemException: TM has encountered some problem, please perform neccesary action (tx recovery/restart)
	at org.neo4j.kernel.impl.transaction.TxManager.assertTmOk(TxManager.java:349)
	at org.neo4j.kernel.impl.transaction.TxManager.getTransaction(TxManager.java:700)
	at org.neo4j.kernel.InternalAbstractGraphDatabase.transactionRunning(InternalAbstractGraphDatabase.java:765)
	... 52 more

Do we have someone with expert knowledge about this here to confirm these speculations?
Thanks for your support!

Change History

comment:1 Changed 5 years ago by jri

  • What makes you think NFS would not support fsync()?
  • Can you confirm that you didn't delete any files from deepamehta-db? (#912)

comment:2 Changed 5 years ago by jri

• How do you relate the errors to NFS at all?

javax.transaction.xa.XAException

javax.transaction.SystemException: TM has encountered some problem, please perform neccesary action (tx recovery/restart)

To me it looks rather like the transaction manager has a problem. Possibly due to missing transaction logs?
My hint would be to never delete transaction logs while Neo4j is running (#912), if deleting them is OK at all (I don't know about this at the moment).

Have you ever used DM/Neo4j on a NFS volume before?

comment:3 Changed 5 years ago by Malte

What makes you think NFS would not support fsync()?

I did not state so.

Can you confirm that you didn't delete any files from deepamehta-db? (#912)

Yes, absolutely. This log is from 24th of November and #912 i just discovered.

How do you relate the errors to NFS at all?

As i wrote, all just speculations and intuition.

It was the only change i made, moving the database onto the NFS (due to disk space running out in system space). Furthermore, the error occurred to me after a day and first everything seemed to be fine. Time might be a factor here too. I also thought of a not closed transaction but everything what happens is wrapped in a @Transactional statement and otherwise would have thrown an Exception (and thus ended that transaction).

Subsequently i was further speculating that, because no other activity happened during that timeframe of 24hous, the file-system might have been unmounted and mounted and this might produce such an error.

Its just a suspicion and to make it very clear again: i do not know what i am talking about here i am not an ops expert.

What works for me: I moved the DB back to its old place and i never had such an error again (as before).

When i am there the next time i will try to reproduce this.

comment:4 follow-up: ↓ 5 Changed 5 years ago by Malte

Regarding your question:

What makeswro you think NFS would not support fsync()?

In contrary, i wrote that i could not make up any relation between the requirements (after reading those docs at neo4j and informing myself about) and any of the mounting options (present in fstab on the NFS mount commands).

Originally:

I read a bit about each of the the mounting options present here but none of those seem related to the deployment requirements mentioned at ​http://neo4j.com/docs/1.9.9/deployment-requirements.html

comment:5 in reply to: ↑ 4 Changed 5 years ago by jri

Thank you for providing this background info!

To track this further I would search for the actual error message first:

org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: Unable to write record[3] @[27]

FYI: DM uses Neo4j 1.8.1

comment:6 Changed 5 years ago by jri

As the root cause is a generic I/O error chances are for, like you said, an unmounted drive, hardware failure, or something.

java.io.IOException: Input/output error

Did the error reoccur when you restart DM?

comment:7 Changed 5 years ago by jri

  • Status changed from new to closed
  • Resolution set to worksforme
Note: See TracTickets for help on using tickets.