Ticket #912 (new Enhancement)

Opened 5 years ago

Last modified 5 years ago

what are these "lucene.log.v" files for and can we delete them once in a while?

Reported by: Malte
Owned by:
Priority: Major
Milestone: Release 4.8
Component: DeepaMehta Standard Distribution
Version: 4.7
Keywords:
Cc: jri, JuergeN
Complexity: 3
Area:
Module:

Description

In the "index" folder of each "deepamehta-db" there are, in my case, currently 219 files named "lucene.log.n" on disk. Are these Lucene "segments"? If so, we could probably easily reduce their number and size, but I guess these are some other kind of file. So what are they?

In my example these binaries make up 3.6GB of the 5.7GB overall, and I am just curious about that. If we can do something to reduce the overall size of the DB, I think we should try it (as it is cumbersome to work with).
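
To check the ratio on another installation, the share of these files can be measured with a short script. This is only a sketch: the function name `log_share` and the default pattern `lucene.log.v*` are assumptions chosen for illustration, and the path to the DB folder has to be supplied by the caller.

```python
from fnmatch import fnmatch
from pathlib import Path


def log_share(db_dir, pattern="lucene.log.v*"):
    """Return (bytes in files whose name matches pattern, total bytes) under db_dir.

    The pattern is an assumption; adjust it to the file names actually on disk.
    """
    log_bytes = 0
    total_bytes = 0
    for path in Path(db_dir).rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total_bytes += size
        # Match on the file name only, so the folder depth does not matter.
        if fnmatch(path.name, pattern):
            log_bytes += size
    return log_bytes, total_bytes
```

Calling `log_share("deepamehta-db")` returns both byte counts, so the share is the first value divided by the second (when the total is not zero).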

I found the following post on Stack Overflow. It seems related, and it points to the neo4j-lucene implementation being responsible for these files:
http://stackoverflow.com/questions/29656457/neo4j-database-exploding-due-to-lucene-logs-when-properties-are-added-to-nodes

Can we do something here? Thanks for your support!

Change History

comment:1 Changed 5 years ago by Malte

Another post [1] on the web suggests that these files are related to a neo4j setting called "keep_logical_logs".

This page looks like "the source of truth" about these files:
http://neo4j.com/docs/1.9.9/configuration-logical-logs.html

As briefly mentioned in [1], this setting should also apply to these lucene logs.
If we are doing backups of our DB, it should be safe to get rid of them (from time to time).

[1] http://grokbase.com/t/gg/neo4j/131fmvgg0s/does-this-seem-normal-for-the-index-folder-size

comment:2 Changed 5 years ago by Malte

Yes!

So, as it looks, it might be possible to just remove all of those lucene.log.vxxx files (below the index folder) and also all the files starting with nioneo_logical.log.vXX in the root dir of deepamehta-db.

Though, to find out which of those are safe to delete, one has to look at the "last modified" times of the respective files and guess which ones were not touched lately (which is vague and depends on neo4j internals).
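
The "last modified" check described above could be sketched as follows. It only lists candidates and deletes nothing. The function name `stale_logs`, the age threshold, and the two file name patterns are assumptions for illustration (the second pattern assumes the root-level files are named nioneo_logical.log.v*); whether a listed file is really safe to remove still depends on neo4j internals and on having a backup.

```python
import time
from fnmatch import fnmatch
from pathlib import Path

# Assumed name patterns for rotated logs; adjust to what is actually on disk.
PATTERNS = ("lucene.log.v*", "nioneo_logical.log.v*")


def stale_logs(db_dir, min_age_days, now=None):
    """Return rotated log files not modified for min_age_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    hits = []
    for path in Path(db_dir).rglob("*"):
        if path.is_file() and any(fnmatch(path.name, p) for p in PATTERNS):
            mtime = path.stat().st_mtime
            if mtime < cutoff:
                hits.append((mtime, path))
    # Sort by modification time only, oldest first.
    return [path for _, path in sorted(hits, key=lambda hit: hit[0])]
```

For example, `stale_logs("deepamehta-db", 30)` would list the matching files that have not been touched for 30 days, which can then be reviewed by hand before anything is removed.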

The size of my deepamehta-db folder has now shrunk from 5.9GB to 1.5GB and I have not yet run into any trouble with this. I am so glad that I asked.

Maybe the wisest option would be to pass this neo4j configuration option through to dm4 users. That would be the most flexible for all kinds of dm4 operators.

Cheers!

comment:3 Changed 5 years ago by Malte

Here is the information from the source for neo4j version 1.8.3:
http://neo4j.com/docs/1.8.3/configuration-logical-logs.html

Logical logs in Neo4j are the journal of which operations happens and are the 
source of truth in scenarios where the database needs to be recovered after a 
crash or similar. Logs are rotated every now and then (defaults to when they 
surpass 25 Mb in size) and the amount of legacy logs to keep can be configured.
Purpose of keeping a history of logical logs include being able to serve 
incremental backups as well as keeping an HA cluster running. Regardless of 
configuration at least the latest non-empty logical log will be kept.

For any given configuration at least the latest non-empty logical log will 
be kept, but configuration can be supplied to control how much more to keep. 
There are several different means of controlling it and the format in which 
configuration is supplied is:

keep_logical_logs=<true/false>
keep_logical_logs=<amount> <type>

For example:
# Will keep logical logs indefinitely
keep_logical_logs=true
 
# Will keep only the most recent non-empty log
keep_logical_logs=false
 
# Will keep logical logs which contains any transaction committed within 30 days
keep_logical_logs=30 days
 
# Will keep logical logs which contains any of the most recent 500 000 transactions
keep_logical_logs=500k txs
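
If this option were passed through to dm4 users, the value format quoted above could be validated before handing it to neo4j. The following is a rough sketch only: it is not neo4j's own parser, the function name `parse_keep_logical_logs` and the returned mode labels are made up for illustration, and it covers only the forms shown in the examples above (true, false, and "<amount> <type>" with an optional "k" suffix on the amount).

```python
def parse_keep_logical_logs(value):
    """Interpret a keep_logical_logs value as a (mode, amount, unit) tuple.

    Hypothetical helper; handles only the forms shown in the quoted examples.
    """
    text = value.strip()
    if text.lower() == "true":
        return ("keep_all", None, None)
    if text.lower() == "false":
        return ("keep_latest", None, None)
    parts = text.split()
    if len(parts) != 2:
        raise ValueError("expected true, false, or '<amount> <type>': %r" % value)
    amount_text, unit = parts
    multiplier = 1
    # "500k" means 500 000, as in the "500k txs" example.
    if amount_text[-1:].lower() == "k":
        multiplier = 1000
        amount_text = amount_text[:-1]
    if not amount_text.isdigit():
        raise ValueError("amount is not a number: %r" % value)
    return ("keep_recent", int(amount_text) * multiplier, unit)
```

The unit is returned unchecked on purpose, since the list of types that a given neo4j version accepts is not stated in the excerpt above.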