Before using the tool on gigantic datasets, here are a few things I just learned while importing millions of nodes in a few minutes (Neo4j Community Edition for Windows).
Regarding Neo4j import tips:
Don't use the web interface to import such big datasets; memory overload is inevitable.
Instead, use a programming language to interact with Neo4j (I recently used the official Python module and it's simple to learn, but you can do the same with good old Java).
Before using LOAD CSV, remember to add the USING PERIODIC COMMIT instruction so that big sets of data are imported in batches, with one commit per batch.
Before importing relationships from CSV, remember to use CREATE CONSTRAINT ON <...> ASSERT <...> IS UNIQUE for the key properties of your labels. The constraint also creates an index on those properties, so it has a huge impact on relationship creation.
Use MATCH(...), not CREATE(...), to find the endpoint nodes when creating relationships. It avoids duplicates.
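The import tips above can be sketched in Python roughly as follows. This is a minimal sketch, not my actual import script: the Person label, the id key, the KNOWS relationship type, the from_id/to_id CSV columns and the file URLs are all hypothetical, and the Cypher syntax is the Neo4j 3.x one used in this answer (newer Neo4j versions changed both the constraint and the periodic-commit syntax).

```python
def constraint_query(label, key):
    # Unique constraint on the key property (Neo4j 3.x syntax).
    return "CREATE CONSTRAINT ON (n:{label}) ASSERT n.{key} IS UNIQUE".format(
        label=label, key=key
    )


def node_import_query(csv_url, label, key, batch_size=10000):
    # USING PERIODIC COMMIT makes Neo4j commit every `batch_size` rows
    # instead of holding the whole file in a single transaction.
    return (
        "USING PERIODIC COMMIT {batch}\n"
        "LOAD CSV WITH HEADERS FROM '{url}' AS row\n"
        "CREATE (:{label} {{{key}: row.{key}}})"
    ).format(batch=batch_size, url=csv_url, label=label, key=key)


def relationship_import_query(csv_url, label, key, rel_type, batch_size=10000):
    # Endpoint nodes are looked up with MATCH, never CREATEd again,
    # so no duplicate nodes appear. Column names are assumptions.
    return (
        "USING PERIODIC COMMIT {batch}\n"
        "LOAD CSV WITH HEADERS FROM '{url}' AS row\n"
        "MATCH (a:{label} {{{key}: row.from_{key}}})\n"
        "MATCH (b:{label} {{{key}: row.to_{key}}})\n"
        "CREATE (a)-[:{rel}]->(b)"
    ).format(batch=batch_size, url=csv_url, label=label, key=key, rel=rel_type)


def run_import(uri, user, password, queries):
    # Imported lazily so the query builders work without the driver installed.
    # With the older 1.x driver the import is `from neo4j.v1 import GraphDatabase`.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            for query in queries:
                # session.run() is an auto-commit transaction, which
                # USING PERIODIC COMMIT requires.
                session.run(query).consume()
    finally:
        driver.close()


if __name__ == "__main__":
    queries = [
        constraint_query("Person", "id"),
        node_import_query("file:///people.csv", "Person", "id"),
        relationship_import_query("file:///knows.csv", "Person", "id", "KNOWS"),
    ]
    for q in queries:
        print(q, end="\n\n")
    # run_import("bolt://localhost:7687", "neo4j", "password", queries)
```

The order matters: constraint first, then nodes, then relationships, so the MATCH lookups can use the index behind the constraint.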
Regarding Neo4j performance:
First of all: read the official Neo4j page for tuning performance: https://neo4j.com/docs/operations-manual/current/performance/
Set a proper memory configuration for your Windows machine: manually configure the dbms.memory.pagecache.size parameter (in the neo4j.conf file), if necessary.
Remember: the Java Virtual Machine is not a black box; you can improve its performance specifically for your application (editing the neo4j-community.vmoptions file).
For example, you can set the max memory usage for the JVM (-Xmx parameter), and you can also set the -XX:+UseG1GC parameter to use the G1 Garbage Collector (high performance, suggested by Oracle for production environments) (https://docs.oracle.com/cd/E40972_01/doc.70/e40973/cnf_jvmgc.htm#autoId0)
Here are the custom lines from my neo4j.conf (just for reference; beware, this may be the wrong setup for your application):
dbms.memory.pagecache.size=3g
dbms.jvm.additional=-XX:+UseG1GC
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow
dbms.jvm.additional=-XX:+AlwaysPreTouch
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields
dbms.jvm.additional=-XX:+DisableExplicitGC
And my neo4j-community.vmoptions custom lines (again, just for reference):
-Xmx1024m
-XX:+UseG1GC
-XX:-OmitStackTraceInFastThrow
-XX:+AlwaysPreTouch
-XX:+UnlockExperimentalVMOptions
-XX:+TrustFinalNonStaticFields
-XX:+DisableExplicitGC
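As a rough sanity check on numbers like these, you can add up the page cache and the JVM heap and see what is left for Windows and everything else. The sketch below is only an illustration: the helper names are made up, it assumes the page cache and the heap are the two dominant consumers, and it ignores the JVM's own native overhead.

```python
# Binary multipliers for the size suffixes used in neo4j.conf and -Xmx.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    # Turn "3g" or "1024m" into a number of bytes; plain digits are bytes.
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def memory_left(total_ram, page_cache, heap):
    # Bytes remaining for the OS and other processes.
    return parse_size(total_ram) - parse_size(page_cache) - parse_size(heap)


if __name__ == "__main__":
    # Values from the setup in this answer: 8GB of RAM, 3g page cache, -Xmx1024m.
    left = memory_left("8g", "3g", "1024m")
    print("Left for the OS: {:.1f} GiB".format(left / UNITS["g"]))
```

If the remainder is small or negative, the machine will start swapping and the import slows down badly, so lower the page cache or the heap first.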
My test machine is a weak notebook with a dual-core Core i3, 8GB of RAM, Windows 10 and Neo4j 3.2.1 Community Edition.
I'm able to import 7 million nodes in less than 3 minutes and 3.5 million relationships in less than 5 minutes (no recursive relationships).
On a more capable machine, with a specifically crafted setup, Neo4j can do WAY better than this. Hope it helps.