Neo4j + Pandas = Inline Image

Sometimes Neo4j data contains image URLs, and as a Data Engineer / Data Scientist, we would like to see the images themselves. However, the Neo4j query result set cannot display images.

Let's assume we have a dataset that stores the URL of an image as an attribute.

The Jupyter notebook, CSV, and loader script can be found on GitHub.

My Python library of choice for working with Neo4j is py2neo, and, as always, pandas is the most popular library for any data engineer / data scientist. To display inline images in a pandas DataFrame, we also need the HTML class from the IPython.display module.

import pandas as pd
from py2neo.database import Graph
import requests
import sys
from IPython.display import HTML, display

Connect to the graph database:

# initialize Graph context
graph = Graph('bolt://<ip_address>:7687', auth=('neo4j', '<password>'), name="<db_name>")

Create a DataFrame from the Cypher query result:

books = pd.DataFrame(graph.run("match (b:book) return b.isbn, b.title, b.yop, b.url").to_table(), columns=['isbn', 'title', 'yop', 'url'])


A small helper function wraps each URL in an HTML <img> tag:

def path_to_image_html(path):
    return '<img src="'+ path + '"  >'

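Now we use IPython's HTML class to render the DataFrame with the images. Passing escape=False stops to_html() from escaping the generated <img> tags, and formatters applies the helper to the url column.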

pd.set_option('display.max_colwidth', None)
display(HTML(books.to_html(escape=False, formatters=dict(url=path_to_image_html))))

The resulting DataFrame shows the images fetched directly from the internet, not from local storage.
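The helper above builds the tag by plain string concatenation, so a URL containing a double quote would break the markup. Here is a slightly more defensive variant, sketched with Python's standard html module. It escapes the URL and returns an empty string for missing values (anything that is not a non-empty string, such as None or NaN). It also adds a width attribute, which is an illustrative assumption and not part of the original notebook.

```python
import html


def path_to_image_html(path, width=100):
    """Return an <img> tag for an image URL, or "" for missing values.

    The URL is HTML-escaped (including quotes) so characters such as
    '"' or '&' cannot break the generated markup. `width` is an
    illustrative default, not part of the original helper.
    """
    if not isinstance(path, str) or not path:
        # None, NaN (a float in pandas) or an empty string: render nothing
        return ""
    src = html.escape(path, quote=True)
    return f'<img src="{src}" width="{width}">'


# Drop-in usage with the DataFrame from this post (needs pandas + IPython):
#   display(HTML(books.to_html(escape=False,
#                              formatters={"url": path_to_image_html})))
```

Because the function has the same name and first parameter as the original helper, it can be passed to formatters unchanged.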

Neo4j – Seed Docker with Data

Sometimes during a project lifecycle, we need to quickly start a Neo4j Docker container with seeded data for QA or UAT environments. Creating a “vanilla” Neo4j container and then executing all the data loader Cypher queries takes a huge amount of time.

To save time, we can bootstrap or seed the docker with all the required data.

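The Dockerfile COPY instruction copies a file from the build context (here, the current working directory) into a folder inside the image.

The CMD instruction runs a shell command when the container starts. RUN, by contrast, executes at build time, when Neo4j is not running. Note: only one CMD takes effect; if a Dockerfile lists several, the last one wins.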


To load the data at startup, the initial Neo4j password must be set before Neo4j is started.

For the demo, we load the countries.csv file into Neo4j. The file is saved in the current directory (excerpt):

AS,American Samoa
AG,Antigua And Barbuda

To load countries.csv, we write the corresponding Cypher query and save it as data_loader.cypher:

LOAD CSV WITH HEADERS FROM 'file:///countries.csv' AS row
MERGE (c:Country {,countryName:});

Then we create a Dockerfile that copies the CSV file and the Cypher script into the image and runs cypher-shell at startup.

FROM neo4j

ENV NEO4J_HOME="/var/lib/neo4j"
COPY countries.csv ${NEO4J_HOME}/import/
COPY data_loader.cypher ${NEO4J_HOME}/import/

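# set the initial password (NEO4J_PASSWD must be supplied at run time, e.g. docker run -e NEO4J_PASSWD=<password> ...),
# sleep 10 seconds so Neo4j has started before cypher-shell connects,
# delete the import files after a successful load, and keep the container alive with /bin/bash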

CMD bin/neo4j-admin set-initial-password ${NEO4J_PASSWD} && \
    bin/neo4j start && sleep 10 && \
    if [ -f "${NEO4J_HOME}/import/data_loader.cypher" ]; then  \
        cat ${NEO4J_HOME}/import/data_loader.cypher | NEO4J_USERNAME=neo4j NEO4J_PASSWORD=${NEO4J_PASSWD} bin/cypher-shell --fail-fast && rm ${NEO4J_HOME}/import/*; \
    fi && /bin/bash
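The CMD above relies on a fixed sleep 10 to give Neo4j time to start. On the client side, for example in a CI job that starts the seeded container before running QA tests, a more robust pattern is to poll the Bolt port until it accepts connections. This is a minimal Python sketch under a few assumptions. The container's port 7687 must be published to the host (e.g. with -p 7687:7687, which the docker run command in this post does not do). wait_for_port is a hypothetical helper. An open port only shows that Neo4j is listening, not that the loader script has finished.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True as soon as the port accepts a connection, or False
    once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Successful TCP handshake: something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline is reached.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage, assuming the container publishes 7687 to the host:
#   if not wait_for_port("localhost", 7687, timeout=60):
#       raise SystemExit("Neo4j did not start in time")
```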

Build the image:

docker build -t neo4j:seed . 

[+] Building 11.7s (9/9) FINISHED
 => [internal] load build definition from Dockerfile                 0.0s
 => => transferring dockerfile: 686B                                 0.0s
 => => transferring context: 2B                                      0.0s
 => [internal] load metadata for                                    11.3s
 => [auth] library/neo4j:pull token for                              0.0s
 => [1/3] FROM                                                       0.0s
 => [internal] load build context                                    0.0s
 => => transferring context: 230B                                    0.0s
 => CACHED [2/3] COPY countries.csv /var/lib/neo4j/import/           0.0s
 => [3/3] COPY data_loader.cypher /var/lib/neo4j/import/             0.1s
 => exporting to image                                               0.1s
 => => exporting layers                                              0.1s
 => => writing image sha256:ac7113b7e0ae6abe7145f2d112dfbbe9b45aa6c6eb4e4147cfffbff691185cde   0.0s
 => => naming to                                                     0.0s

Once the build succeeds, run the tagged “neo4j:seed” image, passing the password used by the loader:

docker run -it -d -e NEO4J_PASSWD=<password> neo4j:seed

Verify the data –

PS C:\Users\domin> docker ps
 CONTAINER ID   IMAGE        COMMAND                  CREATED         STATUS         PORTS                     NAMES
 6c848fee3c72   neo4j:seed   "/sbin/tini -g -- /d…"   7 seconds ago   Up 6 seconds   7473-7474/tcp, 7687/tcp   ecstatic_neumann

 PS C:\Users\domin> docker exec -it 6c848fee3c72 cypher-shell
 username: neo4j
 password: **
 Connected to Neo4j 4.2.0 at neo4j://localhost:7687 as user neo4j.
 Type :help for a list of available commands or :exit to exit the shell.
 Note that Cypher queries must end with a semicolon.
 neo4j@neo4j> match (n) return (n);
 | n                                                         |
 | (:Country {id: "AF", countryName: "Afghanistan"})         |
 | (:Country {id: "AL", countryName: "Albania"})             |
 | (:Country {id: "DZ", countryName: "Algeria"})             |
 | (:Country {id: "AS", countryName: "American Samoa"})      |
 | (:Country {id: "AD", countryName: "Andorra"})             |
 | (:Country {id: "AO", countryName: "Angola"})              |
 | (:Country {id: "AI", countryName: "Anguilla"})            |
 | (:Country {id: "AQ", countryName: "Antarctica"})          |
 | (:Country {id: "AG", countryName: "Antigua And Barbuda"}) |
 9 rows available after 42 ms, consumed after another 4 ms

The output confirms that the database was seeded with the data at startup.

Happy Graphing …..


Neo4j Blog – Featured Community Member

Featured as a Community Member in Neo4j

Neo4j Cluster(apoc+gds) Docker with Portainer

Like most RDBMS and NoSQL databases, Neo4j provides clustering. Clustering offers three main features:

High Availability – the database remains available even if some nodes fail.

Horizontal Scalability – read replicas distribute the read load away from the core nodes that handle writes.

Causal Consistency – when enabled, a client application is guaranteed to read at least its own successful writes.

causal clustering

More documentation on Neo4j Causal Clustering can be found in the official Neo4j documentation.

For the demo, we create a 3-node Neo4j cluster with the latest APOC and Graph Data Science plugins using Docker, along with Portainer.

Portainer is an open-source, lightweight management UI that makes it easy to manage Docker environments.

Github link -> docker-compose.yml

version: "3.8"

services:
  core1:
    hostname: core1
    image: neo4j:enterprise
    networks:
      - neo4j_cluster_ntx
    container_name: core1
    volumes:
      - ./core1/neo4j/data:/var/lib/neo4j/data
      - ./core1/neo4j/import:/var/lib/neo4j/import
    environment:
      # the Enterprise image does not start unless the license is accepted
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4JLABS_PLUGINS=["apoc","graph-data-science"]
      - NEO4J_dbms_default__listen__address=
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*
      - NEO4J_dbms_mode=CORE
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__runtime=3
      - NEO4J_causal__clustering_discovery__advertised__address=core1:5000
      - NEO4J_causal__clustering_transaction__advertised__address=core1:6000
      - NEO4J_causal__clustering_raft__advertised__address=core1:7000
      - NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000
      - NEO4J_causal__clustering_disable__middleware__logging=false
    ports:
      - 7474:7474
      - 6477:6477
      - 7687:7687

  core2:
    hostname: core2
    image: neo4j:enterprise
    networks:
      - neo4j_cluster_ntx
    container_name: core2
    volumes:
      - ./core2/neo4j/data:/var/lib/neo4j/data
      - ./core2/neo4j/import:/var/lib/neo4j/import
    environment:
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4JLABS_PLUGINS=["apoc","graph-data-science"]
      - NEO4J_dbms_default__listen__address=
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*
      - NEO4J_dbms_mode=CORE
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__runtime=3
      - NEO4J_causal__clustering_discovery__advertised__address=core2:5000
      - NEO4J_causal__clustering_transaction__advertised__address=core2:6000
      - NEO4J_causal__clustering_raft__advertised__address=core2:7000
      - NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000
      - NEO4J_causal__clustering_disable__middleware__logging=false
      - NEO4J_dbms_connector_http_listen__address=:7475
      - NEO4J_dbms_connector_https_listen__address=:6478
      - NEO4J_dbms_connector_bolt_listen__address=:7688
    ports:
      - 7475:7475
      - 6478:6478
      - 7688:7688

  core3:
    hostname: core3
    image: neo4j:enterprise
    networks:
      - neo4j_cluster_ntx
    container_name: core3
    volumes:
      - ./core3/neo4j/data:/var/lib/neo4j/data
      - ./core3/neo4j/import:/var/lib/neo4j/import
    environment:
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4JLABS_PLUGINS=["apoc","graph-data-science"]
      - NEO4J_dbms_default__listen__address=
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*
      - NEO4J_dbms_mode=CORE
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__runtime=3
      - NEO4J_causal__clustering_discovery__advertised__address=core3:5000
      - NEO4J_causal__clustering_transaction__advertised__address=core3:6000
      - NEO4J_causal__clustering_raft__advertised__address=core3:7000
      - NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000
      - NEO4J_causal__clustering_disable__middleware__logging=false
      - NEO4J_dbms_connector_http_listen__address=:7476
      - NEO4J_dbms_connector_https_listen__address=:6479
      - NEO4J_dbms_connector_bolt_listen__address=:7689
    ports:
      - 7476:7476
      - 6479:6479
      - 7689:7689

  portainer:   # service name assumed
    image: portainer/portainer
    networks:
      - neo4j_cluster_ntx
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./pt-data:/data
    ports:
      - "9000:9000"

networks:
  neo4j_cluster_ntx:
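The environment entries above follow the naming rule of the official Neo4j Docker images. You strip the NEO4J_ prefix, write each underscore in the setting name as a double underscore, and write each period as a single underscore. A mistyped name does not stop Neo4j from starting; the intended setting is simply not applied. For example, a dotted name such as NEO4J_dbms.memory.heap.initial_size would at best become dbms.memory.heap.initial.size, and some shells drop variables whose names contain dots. The two helpers below are a hypothetical sketch of that rule, useful for checking a name before putting it into docker-compose.yml.

```python
def env_to_setting(name, prefix="NEO4J_"):
    """Translate a NEO4J_* environment variable name into a neo4j.conf key.

    Rule: strip the prefix, '__' becomes '_', every other '_' becomes '.'.
    """
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    body = name[len(prefix):]
    placeholder = "\0"  # protects escaped underscores during the swap
    return body.replace("__", placeholder).replace("_", ".").replace(placeholder, "_")


def setting_to_env(setting, prefix="NEO4J_"):
    """Inverse mapping: escape '_' as '__' first, then turn '.' into '_'."""
    return prefix + setting.replace("_", "__").replace(".", "_")


print(env_to_setting("NEO4J_dbms_memory_heap_initial__size"))
# dbms.memory.heap.initial_size
print(setting_to_env("causal_clustering.raft_advertised_address"))
# NEO4J_causal__clustering_raft__advertised__address
```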

docker-compose up takes some time to fully start all Neo4j nodes and bootstrap the data on them.

Note: while the stack is starting, log in to Portainer at localhost:9000 and create the admin user within 5 minutes.

Neo4j Cluster

Portainer Setup

Portainer setup –

Create a user. On the next screen, select “Local” (Portainer then manages the Docker engine on this host through the mounted /var/run/docker.sock) and click “Connect”.

Once the setup is finished, the Portainer dashboard is displayed.

Selecting “Containers” in the left menu lists the running containers.

Portainer also has a useful performance-metrics view: it displays CPU, memory and network usage, with a refresh rate from 1 second to 1 minute.

Neo4j Cluster

After a few minutes of bootstrapping and syncing the 3 nodes, the Neo4j cluster is ready to use (wait for the “Remote interface available” message from each core):

 core1        | 2021-01-18 00:37:01.407+0000 INFO  Called db.clearQueryCaches(): Query cache already empty.
 core2        | 2021-01-18 00:37:04.411+0000 INFO  Started downloading snapshot for database 'neo4j'…
 core3        | 2021-01-18 00:37:04.415+0000 INFO  Started downloading snapshot for database 'neo4j'…
 core2        | 2021-01-18 00:37:08.827+0000 INFO  Download of snapshot for database 'neo4j' complete.
 core3        | 2021-01-18 00:37:08.912+0000 INFO  Download of snapshot for database 'neo4j' complete.
 core3        | 2021-01-18 00:37:12.648+0000 INFO  Called db.clearQueryCaches(): Query cache already empty.
 core2        | 2021-01-18 00:37:13.007+0000 INFO  Called db.clearQueryCaches(): Query cache already empty.
 core3        | 2021-01-18 00:37:16.529+0000 INFO  Connected to core2/ [raft version:3.0]
 core2        | 2021-01-18 00:37:16.611+0000 INFO  Connected to core3/ [raft version:3.0]
 core1        | 2021-01-18 00:37:20.275+0000 INFO  Sending metrics to CSV file at /var/lib/neo4j/metrics
 core1        | 2021-01-18 00:37:20.349+0000 INFO  Bolt enabled on
 core1        | 2021-01-18 00:37:22.063+0000 INFO  Remote interface available at http://localhost:7474/
 core1        | 2021-01-18 00:37:22.064+0000 INFO  Started.
 core2        | 2021-01-18 00:37:31.467+0000 INFO  Sending metrics to CSV file at /var/lib/neo4j/metrics
 core3        | 2021-01-18 00:37:31.485+0000 INFO  Sending metrics to CSV file at /var/lib/neo4j/metrics
 core2        | 2021-01-18 00:37:31.553+0000 INFO  Bolt enabled on
 core3        | 2021-01-18 00:37:31.561+0000 INFO  Bolt enabled on
 core2        | 2021-01-18 00:37:33.468+0000 INFO  Remote interface available at http://localhost:7475/
 core2        | 2021-01-18 00:37:33.469+0000 INFO  Started.
 core3        | 2021-01-18 00:37:33.471+0000 INFO  Remote interface available at http://localhost:7476/
 core3        | 2021-01-18 00:37:33.472+0000 INFO  Started.

Open a web browser and navigate to localhost:7474 to access the core1 node (the default username and password are both “neo4j”). After setting a new password, run CALL dbms.cluster.overview():

│"id"                                  │"addresses"                                      │"databases"                             │"groups"│
│"a861c553-4cd5-4f8b-baae-3279fa92c65a"│["bolt://localhost:7688","http://localhost:7475"]│{"neo4j":"FOLLOWER","system":"FOLLOWER"}│[]      │
│"e38564bb-1f42-4738-9629-29e5cf1f0bb3"│["bolt://localhost:7689","http://localhost:7476"]│{"neo4j":"LEADER","system":"FOLLOWER"}  │[]      │
│"2614662c-7f3a-4024-8900-f3555e26630d"│["bolt://localhost:7687","http://localhost:7474"]│{"neo4j":"FOLLOWER","system":"LEADER"}  │[]      │
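To pick out the leader of each database from this overview programmatically, for example for monitoring, the result rows can be scanned for the LEADER role. The sketch below copies the three rows above in as sample input (trimmed to the columns it uses) and uses leaders_by_database as a hypothetical helper. Leadership can move between cores at any time. For ordinary application writes, the official drivers' neo4j:// routing scheme already sends writes to the current leader.

```python
def leaders_by_database(overview_rows):
    """Map each database name to the Bolt address of its LEADER member.

    Each row is assumed to be a dict with the "addresses" (list of URIs)
    and "databases" (database name -> role) columns of
    CALL dbms.cluster.overview().
    """
    leaders = {}
    for row in overview_rows:
        bolt = next((a for a in row["addresses"] if a.startswith("bolt://")), None)
        for db, role in row["databases"].items():
            if role == "LEADER":
                leaders[db] = bolt
    return leaders


# The three rows shown above.
overview = [
    {"addresses": ["bolt://localhost:7688", "http://localhost:7475"],
     "databases": {"neo4j": "FOLLOWER", "system": "FOLLOWER"}},
    {"addresses": ["bolt://localhost:7689", "http://localhost:7476"],
     "databases": {"neo4j": "LEADER", "system": "FOLLOWER"}},
    {"addresses": ["bolt://localhost:7687", "http://localhost:7474"],
     "databases": {"neo4j": "FOLLOWER", "system": "LEADER"}},
]
print(leaders_by_database(overview))
# {'neo4j': 'bolt://localhost:7689', 'system': 'bolt://localhost:7687'}
```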

Neo4j 4.x + GraphAware UUID

Since November 2020, GraphAware supports the GraphAware Framework and the UUID module on Neo4j 4.x, although the rest of its products, such as recommendation-engine, elasticsearch, expire, resttest, timetree and triggers, still support only Neo4j 3.x.

Neo4j natively supports creating (version 4) UUIDs in Cypher with randomUUID(). However, they either have to be set during the data insert or added afterwards through bulk updates via APOC, and there isn't much control over the UUID format.

Neo4j's default UUIDs contain hyphens (“-”) between the groups of hex digits.
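To illustrate the two formats, here is a short snippet using Python's standard uuid module. It is only an illustration: GraphAware generates its IDs inside Neo4j. A hyphen-free UUID is the same 128-bit value written as 32 hex characters, which is the form stored when GraphAware's stripHyphens option is enabled.

```python
import uuid

u = uuid.uuid4()          # random (version 4) UUID, the same kind as Cypher's randomUUID()
with_hyphens = str(u)     # canonical 8-4-4-4-12 form, 36 characters
stripped = u.hex          # same value without hyphens, 32 hex characters

print(with_hyphens)
print(stripped)

# The hyphen-free form parses back to the identical UUID.
assert uuid.UUID(stripped) == u
```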

One alternative is the GraphAware UUID plugin. It can assign UUIDs to all nodes or only to nodes with specific labels, and to relationships, as specified in the graphaware.conf file.

For the demo, Docker is used to create a Neo4j 4.2.2 Community Edition instance. For Neo4j Enterprise, you need to contact GraphAware for a license.

Installation without Docker is straightforward: add the GraphAware Framework and GraphAware UUID JARs matching your Neo4j version to the Neo4j plugins folder.

Direct Download Link ->

I have created a Dockerfile and docker-compose.yml that will automatically download the plugins and start the database.



FROM neo4j

ENV NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
    NEO4J_HOME="/var/lib/neo4j"

COPY ./graphaware.conf "${NEO4J_HOME}"/conf
ADD "${NEO4J_HOME}"/plugins
ADD "${NEO4J_HOME}"/plugins


version: '3'

services:
  neo4j:
    build: .
    hostname: neo4j_graphaware
    container_name: neo4j_graphaware
    volumes:
      - ./neo4j/data:/var/lib/neo4j/data
      - ./neo4j/import:/var/lib/neo4j/import
    environment:
      - NEO4J_dbms_default__listen__address=
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*
    ports:
      - "7474:7474"
      - "7687:7687"
      - "7473:7473"

The next step is to configure UUID generation in graphaware.conf, which Docker copies into the conf folder. For the demo, we generate UUIDs for nodes labelled Person and for REPORTS_TO relationships, without hyphens (“-”):


#UIDM becomes the module ID:

#optional, default is uuid:

#optional, default is false:

#optional, default is all nodes:

#optional, default is no relationships:

Start Neo4j with docker-compose up and watch for errors during startup. A successful startup of Neo4j with the GraphAware Framework and UUID module looks like this:

Creating network "graphaware_default" with the default driver
Building neo4j
Step 1/5 : FROM neo4j

Step 2/5 : ENV NEO4J_ACCEPT_LICENSE_AGREEMENT=yes     NEO4J_HOME="/var/lib/neo4j"

 ---> a1acb75ca51d
Step 3/5 : COPY ./graphaware.conf "${NEO4J_HOME}"/conf

 ---> 74b7274bd560
Step 4/5 : ADD "${NEO4J_HOME}"/plugins

 ---> 644f9f5eba63
Step 5/5 : ADD "${NEO4J_HOME}"/plugins

 ---> e097a691e2e4

Successfully built e097a691e2e4
Successfully tagged graphaware_neo4j:latest
Creating neo4j_graphaware ... done
Attaching to neo4j_graphaware
neo4j_graphaware | Directories in use:
neo4j_graphaware |   home:         /var/lib/neo4j
neo4j_graphaware |   config:       /var/lib/neo4j/conf
neo4j_graphaware |   logs:         /logs
neo4j_graphaware |   plugins:      /var/lib/neo4j/plugins
neo4j_graphaware |   import:       /var/lib/neo4j/import
neo4j_graphaware |   data:         /var/lib/neo4j/data
neo4j_graphaware |   certificates: /var/lib/neo4j/certificates
neo4j_graphaware |   run:          /var/lib/neo4j/run
neo4j_graphaware | Starting Neo4j.
neo4j_graphaware | 2021-01-13 00:14:04.193+0000 WARN  Unrecognized setting. No declared setting with name: metrics.enabled
neo4j_graphaware | 2021-01-13 00:14:04.207+0000 INFO  Starting...
neo4j_graphaware | 2021-01-13 00:14:05.807+0000 INFO  ======== Neo4j 4.2.2 ========
neo4j_graphaware | 2021-01-13 00:14:07.095+0000 INFO  GraphAware Runtime disabled for database system.
neo4j_graphaware | 2021-01-13 00:14:08.931+0000 INFO  Performing postInitialization step for component 'security-users' with version 2 and status CURRENT
neo4j_graphaware | 2021-01-13 00:14:08.931+0000 INFO  Updating the initial password in component 'security-users'
neo4j_graphaware | 2021-01-13 00:14:09.516+0000 INFO  GraphAware Runtime enabled for database neo4j, bootstrapping...
neo4j_graphaware | 2021-01-13 00:14:09.528+0000 INFO  Bootstrapping module with order 1, ID UIDM, using com.graphaware.module.uuid.UuidBootstrapper for database neo4j
neo4j_graphaware | 2021-01-13 00:14:09.531+0000 INFO  Node Inclusion Policy set to com.graphaware.common.policy.inclusion.composite.CompositeNodeInclusionPolicy@21e78ee8       
neo4j_graphaware | 2021-01-13 00:14:09.533+0000 INFO  Relationship Inclusion Policy set to com.graphaware.common.policy.inclusion.composite.CompositeRelationshipInclusionPolicy@658e6daa@658e6daa
neo4j_graphaware | 2021-01-13 00:14:09.534+0000 INFO  uuidProperty set to uuid
neo4j_graphaware | 2021-01-13 00:14:09.534+0000 INFO  stripHyphens set to true
neo4j_graphaware | 2021-01-13 00:14:09.535+0000 INFO  Registering module UIDM with GraphAware Runtime for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.536+0000 INFO  GraphAware Runtime bootstrapped for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.578+0000 INFO  Starting GraphAware Runtime for database neo4j...
neo4j_graphaware | 2021-01-13 00:14:09.578+0000 INFO  Starting GraphAware Runtime modules for database neo4j...
neo4j_graphaware | 2021-01-13 00:14:09.579+0000 INFO  Starting module UIDM for database neo4j...
neo4j_graphaware | 2021-01-13 00:14:09.584+0000 INFO  Started module UIDM for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.586+0000 INFO  GraphAware Runtime modules started for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.586+0000 INFO  Started GraphAware Runtime for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.668+0000 INFO  Bolt enabled on
neo4j_graphaware | 2021-01-13 00:14:10.518+0000 INFO  Remote interface available at http://localhost:7474/
neo4j_graphaware | 2021-01-13 00:14:10.519+0000 INFO  Started.                                                                                                                  

Node UUID Generation Testing

1. Create sample data with the label Person to verify that a UUID is created.

MERGE (n:Person {name: "Dominic"}) RETURN n;
│"n"               │

match (n:Person) return labels(n),n
│"labels(n)"│"n"                                                         │
│["Person"] │{"name":"Dominic","uuid":"9a77dfec7b95484eb9f22c848b0ad8f5"}│

MERGE (n:Person {name: "Kumar"}) RETURN n;
│"n"             │

match (n:Person) return labels(n),n;
│"labels(n)"│"n"                                                         │
│["Person"] │{"name":"Kumar","uuid":"5f36188e4079406d9c202c4e0feaf785"}  │
│["Person"] │{"name":"Dominic","uuid":"9a77dfec7b95484eb9f22c848b0ad8f5"}│

The output above shows that a UUID is created automatically for the Person label, as specified in the GraphAware configuration file.

2. Next, we test whether a UUID is created for a label that is not specified in the GraphAware configuration file.

merge (s:State{name:"Texas"}) return s
│"s"             │

match (n:State) return labels(n),n
│"labels(n)"│"n"             │
│["State"]  │{"name":"Texas"}│

The output shows that no UUID is created for the State label, because it is not specified in the GraphAware configuration file.

Relationship UUID Generation Testing

To test automatic UUID generation for relationships, we create two relationship types: (1) RESIDES_IN, which is not specified in the graphaware.conf configuration file, and (2) REPORTS_TO, which is.

match (p:Person{name:"Dominic"}), (s:State{name:"Texas"}) merge (p)-[:RESIDES_IN]->(s)

Created 1 relationship, completed after 33 ms.

match (p1:Person{name:"Dominic"}), (p2:Person{name:"Kumar"}) merge (p1)-[:REPORTS_TO]->(p2)

Created 1 relationship, completed after 8 ms.

match (m)-[r]->(n)   return labels(m) as labels_1 , ,type(r) as rel_label ,r as rel ,labels(n) as labels_2 ,

│"labels_1"│"" │"rel_label" │"rel"                                      │"labels_2"│""│
│["Person"]│"Dominic"│"RESIDES_IN"│{}                                         │["State"] │"Texas" │
│["Person"]│"Dominic"│"REPORTS_TO"│{"uuid":"2596a1bf5bd14a148571550ce275d743"}│["Person"]│"Kumar" │

As you can see, a UUID is created automatically only for the REPORTS_TO relationship.

Happy Graphing …..

Neo4j Spatial Docker

Docker Hub

Starting from Neo4j 4.x, the Neo4j Spatial plugin is incompatible and prevents the database from starting. So I created a Docker image based on Neo4j 3.5.25 that includes all the plugins required for spatial queries.

Docker pull command

docker pull dominicvivek06/neo4j_spatial


version: '3'

services:
  neo4j_35_spatial:   # service name assumed
    image: dominicvivek06/neo4j_spatial
    hostname: neo4j_35_spatial
    container_name: neo4j_35_spatial
    volumes:
      - ./neo4j/data:/var/lib/neo4j/data
      - ./neo4j/import:/var/lib/neo4j/import
    environment:
      # the image is based on neo4j:3.5-enterprise, which requires accepting the license
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_dbms_connectors_default__listen__address=
      - NEO4J_metrics_enabled=false
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_apoc_spatial_geocode_provider=osm
      - NEO4J_apoc_spatial_geocode_osm_throttle=5000
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*,spatial.*
      - NEO4J_dbms_security_procedures_whitelist=gds.*,apoc.*,spatial.*
    ports:
      - "7474:7474"
      - "7687:7687"
      - "7473:7473"


FROM alpine:latest
ADD /tmp
RUN unzip

FROM neo4j:3.5-enterprise
ADD "${NEO4J_HOME}"/plugins
ADD "${NEO4J_HOME}"/plugins
COPY --from=0 /tmp/*.jar "${NEO4J_HOME}"/plugins

The files can be downloaded from my GitHub.

Note – The image above is based on Neo4j 3.5 Enterprise Edition and assumes you have a valid Neo4j license for production. If you don't, please contact Neo4j.

Neo4j – Pivot Functionality

Neo4j lacks pivot functionality, so I created a simple demo in a Python Jupyter notebook that illustrates how to work around it.
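The notebook itself is on GitHub, but the core idea can be sketched in plain Python. This assumes the Cypher query returns long-format records as a list of dicts (py2neo's Cursor.data() returns this shape). pivot turns them into one row per row_key value and one column per distinct col_key value. The function and the sample data (author, yop, books) are hypothetical. If the same row/column pair appears twice, the later value wins, so aggregate in Cypher first (e.g. with count() or sum()) when needed. With pandas, DataFrame.pivot_table does the same job and can aggregate duplicates itself.

```python
def pivot(records, row_key, col_key, value_key, fill=None):
    """Pivot long records (list of dicts) into a header and wide rows.

    Columns appear in first-seen order; missing cells get `fill`.
    A later duplicate (row, column) pair overwrites an earlier one.
    """
    columns = []
    table = {}
    for rec in records:
        r, c, v = rec[row_key], rec[col_key], rec[value_key]
        if c not in columns:
            columns.append(c)
        table.setdefault(r, {})[c] = v
    header = [row_key] + columns
    rows = [[r] + [cells.get(c, fill) for c in columns] for r, cells in table.items()]
    return header, rows


# Hypothetical records, e.g. from graph.run("...").data()
records = [
    {"author": "author_a", "yop": 2019, "books": 3},
    {"author": "author_a", "yop": 2020, "books": 5},
    {"author": "author_b", "yop": 2020, "books": 2},
]
header, rows = pivot(records, "author", "yop", "books")
print(header)   # ['author', 2019, 2020]
for row in rows:
    print(row)  # ['author_a', 3, 5] then ['author_b', None, 2]
```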

Github URL ->


Neo4j 4.2.0 (17th Nov 2020)



  • ALIGNED store format – A new store format that reduces the total number of I/O operations can be set at startup for new databases.
  • Procedures to observe the internal scheduler – New functions to observe the execution of background tasks have been introduced.
  • Dynamic settings at startup – Configuration values can be set by executing calls to external components prior to startup, enabled with the new --expand-commands argument.
  • WAIT/NOWAIT in Database Management – Commands that manage databases (CREATE/DROP/START/STOP) can be executed with a WAIT or NOWAIT option. With WAIT, a user can wait until the command completes, up to a timeout or a given number of seconds.
  • Index and constraint administration commands – Cypher provides commands to create, drop and view indexes and constraints (CREATE/DROP/SHOW INDEX/CONSTRAINT).
  • Filtering in SHOW commands – Cypher SHOW commands provide a new, simple way to retrieve a selection of columns, filter and aggregate the results.
  • Backup/Restore improvements – DBAs can back up and restore databases in a multi-database environment with more sophisticated features: metadata can be saved alongside a database and applied when restoring it on demand, and multiple or all databases can be backed up or restored.
  • Compress metrics on rotation – CSV metric files can be compressed on rotation.
  • Database namespace for metrics – Metrics can be organized per database on request.
  • neo4j-admin improvements – The tool has improvements in operations for “copy”, “store-info” and “memrec”.
  • HTTP port selective settings – The HTTP ports can be enabled or disabled separately for Browser, HTTP API, transactional endpoints, management endpoints and unmanaged extensions.

Causal Cluster

  • Run/Pause Read Replicas – The replication of individual databases can be paused or resumed in read replicas.
  • Database quarantine – Databases with internal errors can be selectively quarantined on a member of the cluster.


  • Planner improvements – Cypher planner has extended the use of the “index-backed order by” feature, i.e. it may set more efficient plans in more cases when an ORDER BY clause is present and it is supported by an index.
  • Octal literals – Octal numbers in Cypher queries in Neo4j 4.2 start with ‘0o’.

Functions and Procedures

  • round() function – round() has been improved to select the precision of the returned value.
  • dbms.functions() procedure – functions are organised in categories.


  • Procedure and user-defined function privileges – DBAs can grant, deny or revoke users' access to specific procedures and user-defined functions. Users may also be given boosted privileges compared to their current security profile.
  • Role-Based Access Control Default graph – Permissions can be granted, denied or revoked against the default graph, regardless of the default setting.
  • PLAINTEXT and ENCRYPTED password in user creation – Passwords can be set in plain text or in a one-way encrypted format on request.
  • SHOW CURRENT USER – Users can see the profile of their current user.
  • SHOW PRIVILEGES as commands – DBAs can visualize the commands to execute to recreate a security profile.
  • OCSP stapling support for Java driver – The driver provides support for OCSP stapling.

Important Changes

We made some changes to the behavior of new Neo4j installations. In general, upgrades are not affected, with two exceptions (but please check this list carefully):

  1. The metrics.filter setting has been introduced and, by default, it reduces the number of metrics collected, also in upgraded installations.
  2. JMX metrics, governed by metrics.jmx.enabled, are disabled by default.

This is the complete list of changes:

  • metrics.csv.interval – In new installations, the default value is 30 seconds (it was 3 seconds).
  • metrics.csv.rotation.compression – This is a new parameter, CSV metric files are compressed on rotation by default (they were not compressed in previous versions).
  • metrics.jmx.enabled – In new installations, JMX metrics are disabled by default.
  • metrics.filter – The number of metrics set by default has been reduced. The current default set includes:
    • bolt.connections
    • bolt.messages_received
    • bolt.messages_started
    • *dbms.pool.bolt.total_size
    • *dbms.pool.bolt.total_used
    • *dbms.pool.bolt.used_heap
    • *causal_clustering.core.is_leader
    • *causal_clustering.core.last_leader_message
    • *causal_clustering.core.replication_attempt
    • *causal_clustering.core.replication_fail
    • *check_point.duration
    • *check_point.total_time
    • *cypher.replan_events
    • *ids_in_use.node
    • *
    • *ids_in_use.relationship
    • *pool.transaction..total_used
    • pool.transaction..used_heap
    • pool.transaction..used_native
    • store.size
    • transaction.active_read
    • *transaction.active_write
    • *transaction.committed
    • transaction.last_committed_tx_id
    • *transaction.peak_concurrent
    • *transaction.rollbacks
    • page_cache.hit
    • page_cache.page_faults
    • *page_cache.usage_ratio
    • *vm.file.descriptors.count
    • *vm.gc.time
    • vm.heap.used
    • *
    • *vm.memory.pool.g1_eden_space
    • *vm.memory.pool.g1_old_gen
    • *vm.pause_time
    • *vm.thread


The following settings, procedures, functions and syntax have been deprecated:

  • “whitelist” settings have been replaced by “allowlist” settings: dbms.dynamic.setting.whitelist, dbms.memory.pagecache.warmup.preload.whitelist, and
  • Log rotation delay settings:, dbms.logs.user.rotation.delay, dbms.logs.debug.rotation.delay. These settings are no longer needed after replacing the custom logging framework with log4j.
  • Log and logger interfaces: debugLogger, infoLogger, warnLogger, errorLogger and the bulk method on the Log interface. The debug, info, warn and error methods on the Log interface can be used directly instead.
  • Octal syntax: writing octals with syntax 0… is deprecated in favor of new 0o….
  • Hexadecimal syntax: writing hexadecimals with syntax 0X… is deprecated since 0x… is favored
  • db.createIndex(): deprecated in favor of the CREATE INDEX command
  • db.createNodeKey(): deprecated in favor of the CREATE CONSTRAINT … IS NODE KEY command
  • db.createUniquePropertyConstraint(): deprecated in favor of the CREATE CONSTRAINT … IS UNIQUE command
  • db.indexes(): replaced by SHOW INDEXES
  • db.indexDetails(): replaced by SHOW INDEXES VERBOSE OUTPUT
  • db.constraints(): replaced by SHOW CONSTRAINTS
  • replaced by EXECUTE BOOSTED privileges (for both procedures and user-defined functions)
  • replaced by EXECUTE BOOSTED privileges (for both procedures and user-defined functions)
  • When an expression is returned by a subquery, a warning is raised if it does not have an alias
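When upgrading, it can help to scan existing Cypher scripts for the deprecated procedures above. The sketch below is a simple regex-based check in Python, not a Cypher parser. The helper name find_deprecated_calls is hypothetical. The check can report false positives, for example procedure names inside string literals or comments.

```python
import re
from typing import Dict, List

# Deprecated procedures from the list above, mapped to the commands that replace them.
REPLACEMENTS: Dict[str, str] = {
    "db.createIndex": "CREATE INDEX",
    "db.createNodeKey": "CREATE CONSTRAINT ... IS NODE KEY",
    "db.createUniquePropertyConstraint": "CREATE CONSTRAINT ... IS UNIQUE",
    "db.indexes": "SHOW INDEXES",
    "db.indexDetails": "SHOW INDEXES VERBOSE OUTPUT",
    "db.constraints": "SHOW CONSTRAINTS",
}


def find_deprecated_calls(cypher: str) -> List[str]:
    """Return deprecated procedure names called in a Cypher script, in order of appearance."""
    hits = []
    for name in REPLACEMENTS:
        # The name must not be preceded by a word character or a dot, and must be
        # followed by "(", so db.indexes does not match inside db.indexDetails(...).
        pattern = r"(?<![\w.])" + re.escape(name) + r"\s*\("
        for match in re.finditer(pattern, cypher):
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


script = "CALL db.indexes();\nCALL db.indexDetails('person_name');"
for name in find_deprecated_calls(script):
    print(f"{name}() is deprecated, use {REPLACEMENTS[name]} instead")
```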

MS SQL Server Database Logical Entity-Relationship Model in Neo4j

From Neo4j 4.0 onward, a single Neo4j instance can serve multiple databases.

Sometimes we need to store the entity-relationship (E-R) metadata of a SQL Server database in Neo4j.

With Neo4j, we can create a separate metadata database for a given SQL Server database, extract the metadata of its entities (tables) and their relationships from SQL Server, and load it into Neo4j as a reference.
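Multi-database administration commands run against the system database. This means the separate metadata database can be created from Python with py2neo, the same library used in the pandas example. The minimal sketch below makes two assumptions: Neo4j 4.x Enterprise Edition (Community Edition allows only one user database), and the hypothetical database name adventureworks-meta.

```python
def create_database_statement(name: str) -> str:
    """Build a CREATE DATABASE command; backticks allow names with dashes or dots."""
    return "CREATE DATABASE `" + name.replace("`", "``") + "` IF NOT EXISTS"


def create_metadata_database(system_graph, name: str) -> None:
    """Run the command on a py2neo Graph that is connected to the "system" database."""
    system_graph.run(create_database_statement(name))


# Usage (requires a running Neo4j 4.x Enterprise server):
# from py2neo.database import Graph
# system = Graph('bolt://<ip_address>:7687', auth=('neo4j', '<password>'), name="system")
# create_metadata_database(system, "adventureworks-meta")
```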

USE <database>;

/* extract all tables as (:table {table_name}) nodes, the shape the relationship query matches */
SELECT 'MERGE (t:table {table_name:"' + TABLE_NAME + '"});'
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';

/* extract all relationships (sys.foreign_key_columns has one row per key column, so DISTINCT removes duplicates) */
SELECT DISTINCT 'MATCH (a:table {table_name:"' + OBJECT_NAME(referenced_object_id) + '"}), (b:table {table_name:"' + OBJECT_NAME(parent_object_id) + '"}) MERGE (a)-[r:' + OBJECT_NAME(constraint_object_id) + ']->(b);'
FROM sys.foreign_key_columns;

For the demo, I used the “AdventureWorks” sample database from SQL Server and executed the above queries in SQL Server Management Studio (SSMS). Make sure you have selected “Results to Text” in SSMS so the results are easy to copy.



Copy the results and execute them in the Neo4j database.


  1. The script will also work in Neo4j 3.x.
  2. Some relationship names from SQL Server are longer than 25 characters, and creating those relationships in Neo4j will fail. In my next release of the code, I will generate unique relationship names.
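Instead of copying results out of SSMS, the same Cypher statements can be generated in Python from the table and foreign-key metadata rows. The sketch below follows the (:table {table_name}) convention of the relationship query above and escapes quotes in names. It also shows one possible way to produce unique relationship names under a length cap: long names are cut and suffixed with a short hash of the full name. The 25-character limit is taken from the note above as given. The helper names and sample rows are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple


def short_rel_type(name: str, max_len: int = 25) -> str:
    """Return name unchanged if it fits; otherwise cut it and append an 8-character hash.

    The hash is computed from the full name, so two long names sharing a prefix
    still get different relationship types (barring an unlikely hash collision).
    """
    if max_len < 10:
        raise ValueError("max_len must leave room for the 9-character hash suffix")
    if len(name) <= max_len:
        return name
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return name[: max_len - 9] + "_" + digest


def cypher_string(value: str) -> str:
    """Double-quoted Cypher string literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_identifier(name: str) -> str:
    """Backtick-quote a relationship type (backticks inside are doubled)."""
    return "`" + name.replace("`", "``") + "`"


def table_statements(tables: Iterable[str]) -> List[str]:
    """One MERGE per table, using the (:table {table_name}) convention."""
    return [f"MERGE (t:table {{table_name:{cypher_string(t)}}});" for t in tables]


def fk_statements(fks: Iterable[Tuple[str, str, str]]) -> List[str]:
    """fks holds (constraint_name, referenced_table, parent_table) rows."""
    statements = []
    for constraint, referenced, parent in fks:
        statements.append(
            f"MATCH (a:table {{table_name:{cypher_string(referenced)}}}), "
            f"(b:table {{table_name:{cypher_string(parent)}}}) "
            f"MERGE (a)-[r:{quote_identifier(short_rel_type(constraint))}]->(b);"
        )
    return statements


# Hypothetical metadata rows, standing in for the SQL Server query results.
tables = ["Customer", "SalesOrderHeader"]
fks = [("FK_SalesOrderHeader_Customer_CustomerID", "Customer", "SalesOrderHeader")]
for statement in table_statements(tables) + fk_statements(fks):
    print(statement)
```

Each generated string could then be run through py2neo's graph.run against the metadata database, instead of pasting it into the Neo4j Browser.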



Generate Demo Data in Neo4j using neo4j-faker


There are times when we need to quickly generate demo datasets in Neo4j to brainstorm solutions for a critical application before creating or loading the actual data.

GraphAware’s GraphGen is useful for generating small-to-medium datasets. But if we need higher volumes and more flexibility on labels and properties, it becomes a challenge.

For such situations, we can use the neo4j-faker contribution from Neo4j.


  1. Download the latest release from GitHub into a temp directory.
  2. Unzip the distribution zip file into a directory.
  3. Copy all the contents of the dist directory to the Neo4j server’s plugins directory.
  4. The dist directory has the following contents:
    1. neo4jFaker-x.x.x.jar – The main jar file for faker.
    2. ddgres – Directory containing the loader (property) files and the CSV and text files used by faker (for example, cities.txt contains a list of cities used for data generation).
  5. Modify neo4j.conf as described below, and restart the Neo4j service.
    • Add the following line to the neo4j.conf file to allow access to the faker functions.
    • Add the following line to your neo4j.conf file to register the faker unmanaged extension under /testdata:
      • dbms.unmanaged_extension_classes=org.neo4j.faker.ump=/testdata
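Steps 3 and 5 can also be scripted. The minimal Python sketch below assumes hypothetical local paths for the unzipped dist directory, the plugins directory and neo4j.conf. It adds only the unmanaged-extension line quoted above, because the line for the faker functions is not shown in this post. The Neo4j service still has to be restarted afterwards.

```python
import shutil
from pathlib import Path
from typing import List

# The unmanaged-extension line quoted in step 5 above.
FAKER_CONF_LINES = ["dbms.unmanaged_extension_classes=org.neo4j.faker.ump=/testdata"]


def install_faker(dist_dir, plugins_dir) -> None:
    """Copy everything in the faker dist directory (jar and ddgres folder) into plugins."""
    # dirs_exist_ok requires Python 3.8+; files with the same name are overwritten.
    shutil.copytree(dist_dir, plugins_dir, dirs_exist_ok=True)


def ensure_conf_lines(conf_path, lines: List[str]) -> List[str]:
    """Append any of `lines` missing from neo4j.conf and return the ones that were added."""
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [line for line in lines if line.strip() not in existing]
    if missing:
        # Start on a new line if the file does not already end with one.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as conf:
            conf.write(prefix + "\n".join(missing) + "\n")
    return missing


# Hypothetical paths; adjust them to your installation, then restart Neo4j.
# install_faker("/tmp/neo4j-faker/dist", "/root/neo4j-enterprise-4.1.0/plugins")
# ensure_conf_lines("/root/neo4j-enterprise-4.1.0/conf/neo4j.conf", FAKER_CONF_LINES)
```

Because ensure_conf_lines skips lines that are already present, running the script twice does not duplicate the setting.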

Installation Verification

  1. Create a test database named “demo” in Neo4j.
  2. For demonstration purposes, let’s quickly create a dataset using an existing property file in the ddgres folder: example1.props.
  3. Browse to the URL on localhost, or use curl.
    • curl
  4. If the installation has no errors, 2,830 nodes, 5,600 relationships and 28,090 properties are created, as shown at the end of the log file.
DemoDataGen version 0.9.0
STARTING on Tue Jun 30 20:45:55 EDT 2020
tdgRoot /root/neo4j-enterprise-4.1.0/plugins/ Property file to be loaded /root/neo4j-enterprise-4.1.0/plugins/ddgres/example1.props
Tue Jun 30 20:45:55 EDT 2020 start :
Commit size 2000
Reading node list
tgd_id INDEX : true
processing node definition p1:Person:2000
vals.length 3
-- 0 p1 -- 1 Person -- 2 2000
found property ~Person for label Person
found property caption for label Person
found property address for label Person
found property credit for label Person
processing node definition p2:Person:800
vals.length 3
-- 0 p2 -- 1 Person -- 2 800
found property ~Person for label Person
found property caption for label Person
found property address for label Person
found property credit for label Person
found property caption for alias p2
processing node definition cit:City:30
vals.length 3
-- 0 cit -- 1 City -- 2 30
found property country for label City
found property name for label City
Reading lookup nodes
Reading relation definitions
found property weight for relation personPerson1
found property weight for relation personPerson2
Initializing faker library
Initializing Name generator
found node -alias p1 -label:Person -indexProperty: tdg_id -amount:2000
found node -alias p2 -label:Person -indexProperty: tdg_id -amount:800
found node -alias cit -label:City -indexProperty: tdg_id -amount:30
processing relation personPerson1
START node identifier p1
END node identifier p2
processing p1 many to one 2000 - p2
processing relation personPerson2
START node identifier p2
END node identifier p1
processing p2 many to one 800 - p1
processing relation personCity1
START node identifier p1
END node identifier cit
processing p1 many to one 2000 - cit
processing relation personCity2
START node identifier p2
END node identifier cit
processing p2 many to one 800 - cit
nodeCount 2830
relation count 5600
property count 28090
finished total time was (ms): 1070
load speed node and relations per second: 7000.0
Tue Jun 30 20:45:56 EDT 2020 END :

5. Query the Neo4j database from the browser to verify the data created by faker.

6. Query the schema of the created dataset.
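The totals in the log can be cross-checked for steps 4 to 6. The node total is the sum of the counts in the node definitions (2,000 + 800 + 30 = 2,830). Each "many to one" relation appears to create one relationship per start node (2,000 + 800 + 2,000 + 800 = 5,600), which agrees with the log. The Python sketch below computes both totals from the log text and compares them with the database through py2neo's Graph.evaluate. It assumes the demo database was empty before the load. The function names and the log file name are hypothetical.

```python
import re
from typing import Tuple


def expected_counts(log_text: str) -> Tuple[int, int]:
    """Expected (nodes, relationships) from a DemoDataGen log."""
    # "processing node definition p1:Person:2000" -> 2000 nodes
    nodes = sum(int(n) for n in re.findall(r"processing node definition \w+:\w+:(\d+)", log_text))
    # "processing p1 many to one 2000 - p2" -> one relationship per start node
    rels = sum(int(n) for n in re.findall(r"many to one (\d+) -", log_text))
    return nodes, rels


def matches_database(graph, nodes: int, rels: int) -> bool:
    """Compare the expected totals with a py2neo Graph connected to the demo database."""
    actual_nodes = graph.evaluate("MATCH (n) RETURN count(n)")
    actual_rels = graph.evaluate("MATCH ()-[r]->() RETURN count(r)")
    return actual_nodes == nodes and actual_rels == rels


# Usage (needs the server):
# from py2neo.database import Graph
# demo = Graph('bolt://<ip_address>:7687', auth=('neo4j', '<password>'), name="demo")
# nodes, rels = expected_counts(open("faker.log").read())
# print(matches_database(demo, nodes, rels))
```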


Now, let’s create demo data using the faker procedures and functions provided by the plugin.

For our project, let’s create three labels: Person, CreditCard and City.

Each Person owns one or more Credit Cards and resides in a City.

First, let’s create the Person data:

foreach (i in range(1,100) |
  create (p:Person { uid : i })
  set p += fkr.person('1960-01-01','2001-01-01')
  set p.ssn = fkr.code('### ### ####')
)

The above command:

  • Creates 100 Person nodes, as specified by the range.
  • Sets each Person’s date of birth between 1960-01-01 and 2001-01-01.
  • Sets the Person’s ssn in the “### ### ####” format using fkr.code.
    • fkr.code generates a random code in the specified format.
  • fkr.person automatically generates a first name and a last name, and concatenates them into a full name.
  • Email address is generated with first name, last name with domain names like, etc.
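The plain-Python imitation below makes the Person command concrete. It uses a '#' placeholder format for the ssn, a random date of birth in the given range, and a full name built from first and last names. This is not the neo4j-faker implementation. It assumes fkr.code replaces each '#' with a digit, and the name lists and property names are hypothetical. Note that Cypher's range() includes both bounds, while Python's range() excludes the upper one.

```python
import random
from datetime import date, timedelta

# Hypothetical name lists; neo4j-faker uses its own data sources.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev"]
LAST_NAMES = ["Lopez", "Smith", "Nguyen", "Okafor"]


def fake_code(fmt: str, rng: random.Random) -> str:
    """Replace every '#' in fmt with a random digit; other characters are kept."""
    return "".join(str(rng.randint(0, 9)) if ch == "#" else ch for ch in fmt)


def random_date(start: date, end: date, rng: random.Random) -> date:
    """Pick a date between start and end, both inclusive."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def fake_person(uid: int, rng: random.Random) -> dict:
    first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
    return {
        "uid": uid,
        "firstName": first,
        "lastName": last,
        "fullName": f"{first} {last}",  # first and last name concatenated
        "dateOfBirth": random_date(date(1960, 1, 1), date(2001, 1, 1), rng).isoformat(),
        "ssn": fake_code("### ### ####", rng),
    }


rng = random.Random(42)
# Python's range(1, 101) yields 100 values, like Cypher's range(1, 100).
people = [fake_person(i, rng) for i in range(1, 101)]
print(people[0])
```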

Next, let’s create the Credit Card data:

foreach (a in range(1,500) |
        create (c:CreditCard {cardnum : fkr.code('#### #### #### ####')})
        set c.limit = fkr.longElement('5000,7500,8000,2500,1200,2000,4000,10000')
        set c.balance = fkr.number(10,1000)
        set c.card_type = fkr.nextStringElement('VISA,MasterCard,Paypal,Discover,AMEX')
)

The above code:

  • Creates 500 Credit Cards, as specified by the range.
  • Sets each card’s limit to a value from the list passed to fkr.longElement.
  • Sets a balance on each Credit Card between 10 and 1,000, the bounds passed to fkr.number.
  • Sets a card type from the list of types passed to fkr.nextStringElement.
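If the plugin cannot be installed, similar CreditCard nodes can be generated on the client and loaded in one UNWIND statement with py2neo. This swaps in client-side generation in place of the fkr functions. The picks below are uniformly random, which may differ from how fkr.longElement and fkr.nextStringElement choose values. The helpers card_rows and load_cards are hypothetical.

```python
import random
from typing import Dict, List

LIMITS = [5000, 7500, 8000, 2500, 1200, 2000, 4000, 10000]
CARD_TYPES = ["VISA", "MasterCard", "Paypal", "Discover", "AMEX"]


def card_rows(count: int, rng: random.Random) -> List[Dict]:
    """Build property maps shaped like the CreditCard nodes above."""
    return [
        {
            "cardnum": " ".join(f"{rng.randint(0, 9999):04d}" for _ in range(4)),
            "limit": rng.choice(LIMITS),
            "balance": rng.randint(10, 1000),  # inclusive bounds
            "card_type": rng.choice(CARD_TYPES),
        }
        for _ in range(count)
    ]


def load_cards(graph, rows: List[Dict]) -> None:
    """Create one CreditCard node per row in a single UNWIND statement via py2neo."""
    graph.run("UNWIND $rows AS row CREATE (c:CreditCard) SET c = row", rows=rows)


rows = card_rows(500, random.Random(7))
# from py2neo.database import Graph
# load_cards(Graph('bolt://<ip_address>:7687', auth=('neo4j', '<password>'), name="demo"), rows)
```

Sending all rows as one parameter keeps the load to a single round trip instead of one query per card.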

Then, let’s create the Cities:

foreach (ci in range(1,40) |
     merge (cit:City { name : fkr.stringFromFile("cities.txt") })
)

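The above code:

  • Creates up to 40 City nodes with names randomly picked from “cities.txt” in the ddgres folder. Because MERGE reuses an existing City with the same name, duplicate picks collapse into one node, so the count can be lower than 40.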
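How many City nodes to expect depends on how many names cities.txt holds. Assume fkr.stringFromFile picks names uniformly at random; this is an assumption, not documented here. Then the expected number of distinct names after k picks from N names is N(1 - (1 - 1/N)^k). The Python sketch below computes this and also simulates the MERGE behavior with a set. For example, with a hypothetical file of 100 names, about 33 of the 40 picks would be distinct on average.

```python
import random
from typing import Sequence


def expected_distinct(n_names: int, picks: int) -> float:
    """Expected number of distinct names after `picks` uniform draws (with replacement)."""
    return n_names * (1 - (1 - 1 / n_names) ** picks)


def simulate_city_nodes(names: Sequence[str], picks: int, rng: random.Random) -> int:
    """Mimic MERGE on name: a repeated pick reuses the existing node, so count distinct names."""
    return len({rng.choice(names) for _ in range(picks)})


print(round(expected_distinct(100, 40), 1))
print(simulate_city_nodes([f"city{i}" for i in range(100)], 40, random.Random(0)))
```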

Finally, we will create relationships between the Person, CreditCard and City nodes.

The command below uses fkr.createRelations to create the relationships.

match (person:Person) with collect(person) as persons
match (card:CreditCard) with persons, collect(card) as cards
match (city:City) with persons, cards, collect(city) as cities
call fkr.createRelations(persons, "HAS_CARDS", cards, "1-n") yield relationships as cardsRelations
call fkr.createRelations(persons, "LIVES_IN", cities, "n-1") yield relationships as citiesRelations
return persons, cards, cities;

The fkr.createRelations procedure accepts four parameters:

  • startNodes
  • relationshipType
  • endNodes
  • cardinality, where
    • ‘1-n’ -> an end node may have at most one relationship from the start nodes
    • ‘n-1’ -> a start node may have at most one relationship to the end nodes
    • ‘1-1’ -> the start and end nodes may each have at most one relationship of this type
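The sketch below illustrates the three cardinality values by pairing start and end nodes in plain Python so that each rule holds. It is not the plugin's algorithm:

- '1-n' connects every end node to exactly one random start node.
- 'n-1' connects every start node to exactly one random end node.
- '1-1' pairs nodes after shuffling.

With random assignment under '1-n', some persons may end up without any card. The function pair_nodes is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pair_nodes(starts: Sequence[str], ends: Sequence[str], cardinality: str,
               rng: random.Random) -> List[Tuple[str, str]]:
    """Return (start, end) pairs that respect the given cardinality rule."""
    if cardinality == "1-n":
        # every end node gets exactly one start node: at most one incoming relationship
        return [(rng.choice(starts), end) for end in ends]
    if cardinality == "n-1":
        # every start node gets exactly one end node: at most one outgoing relationship
        return [(start, rng.choice(ends)) for start in starts]
    if cardinality == "1-1":
        # shuffle both sides and zip them: each node is used at most once
        s, e = list(starts), list(ends)
        rng.shuffle(s)
        rng.shuffle(e)
        return list(zip(s, e))
    raise ValueError(f"unsupported cardinality: {cardinality}")


rng = random.Random(1)
persons = [f"person{i}" for i in range(100)]
cards = [f"card{i}" for i in range(500)]
cities = [f"city{i}" for i in range(40)]
has_cards = pair_nodes(persons, cards, "1-n", rng)
lives_in = pair_nodes(persons, cities, "n-1", rng)
print(len(has_cards), len(lives_in))
```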

Query Neo4j for Data

As you can see, generating demo data using neo4j-faker is pretty easy and quick.

The above method uses the faker procedures and functions. In my next blog post, we will use the property files in ddgres to generate data.

Stay tuned …..
