Categories
Data Engineering Neo4j Pandas

Neo4j + Pandas = Inline Image

Sometimes Neo4j data contains image URLs, and as a data engineer or data scientist we would like to see the actual images. The query result view in Neo4j cannot display them.

Let's assume we have a dataset that stores the URLs of images from amazon.com as an attribute.

The Jupyter notebook, CSV and loader script can be found on GitHub.

My Python library of choice for interacting with Neo4j is py2neo, and pandas is the go-to library for most data engineers and data scientists. To display inline images in a pandas dataframe, we need to import the HTML class from the IPython.display module.

import pandas as pd
from py2neo.database import Graph
import requests
import sys
from IPython.display import HTML

Connect to the graph database using

# initialize Graph context
graph = Graph('bolt://<ip_address>:7687', auth=('neo4j', '<password>'), name="<db_name>")

Create a dataframe

books = pd.DataFrame(graph.run("match (b:book) return b.isbn,b.title,b.yop,b.url").to_table(),columns=['isbn','title','yop','url']) 

books.head()

A small helper function wraps each image URL in an HTML <img> tag.

def path_to_image_html(path):
    return '<img src="'+ path + '"  >'

Now we will use the HTML class from IPython to display the dataframe with images.

pd.set_option('display.max_colwidth', None)
display(HTML(books.to_html(escape=False ,formatters=dict(url=path_to_image_html))))

The resulting dataframe shows the images inline, fetched directly from the internet rather than from local storage.
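
Because escape=False switches off the HTML escaping pandas normally applies, a slightly more defensive variant of the helper could escape the URL itself. The sketch below is only an illustration: the name path_to_image_html_safe and the max_width parameter are hypothetical and not part of the original notebook.

```python
import html


def path_to_image_html_safe(url, max_width=120):
    """Wrap a URL in an <img> tag, escaping characters that could break the attribute.

    Hypothetical variant of path_to_image_html; max_width is an assumed
    convenience parameter that caps the rendered image width in pixels.
    """
    safe_url = html.escape(str(url), quote=True)
    return f'<img src="{safe_url}" style="max-width:{int(max_width)}px">'


if __name__ == "__main__":
    print(path_to_image_html_safe("https://example.com/cover.jpg"))
```

It can be passed to to_html in the same way, for example formatters=dict(url=path_to_image_html_safe).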


Categories
Cloud GCP

Google Cloud Platform – Compute Engine Management through CLI

Sometimes it is much easier to create GCP Compute Engine instances from the command-line interface than to go through the slow GCP portal.

One prerequisite for creating compute engines is that the gcloud command-line tools are installed. The GCP Cloud SDK can be installed from https://cloud.google.com/sdk

To create a compute engine in the default zone, execute

gcloud compute instances create <name>

More options are available at https://cloud.google.com/sdk/gcloud/reference/compute/instances

Now that we know how to create a compute engine with the bare-minimum options, it is better to wrap the command in reusable code than to provide the various options every time we create a compute engine. A shell script makes this quick and easy. For demo purposes, the n1-standard-1 machine type and the europe-west2-a zone are used.

-----------
Bash Script
-----------

create_vm.sh

echo "creating "$1 " compute engine"
gcloud compute instances create "$1" --machine-type n1-standard-1 --zone europe-west2-a

chmod 755 create_vm.sh

Now it is very easy and quick to create compute engines.

CREATE DEV VM
-------------

./create_vm.sh dev
creating dev  compute engine
Created [https://www.googleapis.com/compute/v1/projects/abc-xyz/zones/europe-west2-a/instances/dev].
NAME  ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP   STATUS
dev   europe-west2-a  n1-standard-1               10.154.0.9   34.89.55.187  RUNNING

CREATE TEST VM
---------------

./create_vm.sh test
creating test  compute engine
Created [https://www.googleapis.com/compute/v1/projects/abc-xyz/zones/europe-west2-a/instances/test].
NAME  ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
test  europe-west2-a  n1-standard-1               10.154.0.10  35.197.212.108  RUNNING

CREATE PROD VM
---------------

./create_vm.sh prod

creating prod  compute engine
Created [https://www.googleapis.com/compute/v1/projects/abc-xyz/zones/europe-west2-a/instances/prod].
NAME  ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
prod  europe-west2-a  n1-standard-1               10.154.0.11  35.197.198.16  RUNNING

To list the instances, execute gcloud compute instances list

gcloud compute instances list

NAME  ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
dev   europe-west2-a  n1-standard-1               10.154.0.9   34.89.55.187    RUNNING
prod  europe-west2-a  n1-standard-1               10.154.0.11  35.197.198.16   RUNNING
test  europe-west2-a  n1-standard-1               10.154.0.10  35.197.212.108  RUNNING

To delete compute engines, execute gcloud compute instances delete

-----------
Bash Script
-----------

cat destroy_vm.sh

echo "delete  "$1 "compute engine"
gcloud compute instances delete "$1" --zone europe-west2-a


DELETE DEV VM
-------------

./destroy_vm.sh dev

delete  dev compute engine
The following instances will be deleted. Any attached disks configured to be auto-deleted will be deleted unless they are attached to any other instances or the --keep-disks flag is given and specifies them for keeping. Deleting a disk is irreversible and any data on the disk will be lost.
[dev] in [europe-west2-a] 
Do you want to continue (Y/n)?  Y
Deleted [https://www.googleapis.com/compute/v1/projects/abc-xyz/zones/europe-west2-a/instances/dev].

DELETE TEST VM
--------------

./destroy_vm.sh test
delete  test compute engine
The following instances will be deleted. Any attached disks configured to be auto-deleted will be deleted unless they are attached to any other instances or the --keep-disks flag is given and specifies them for keeping. Deleting a disk is irreversible and any data on the disk will be lost.
[test] in [europe-west2-a] 
Do you want to continue (Y/n)?  Y
Deleted [https://www.googleapis.com/compute/v1/projects/abc-xyz/zones/europe-west2-a/instances/test].

DELETE PROD VM
--------------

./destroy_vm.sh prod
delete  prod compute engine
The following instances will be deleted. Any attached disks configured to be auto-deleted will be deleted unless they are attached to any other instances or the --keep-disks flag is given and specifies them for keeping. Deleting a disk is irreversible and any data on the disk will be lost.
[prod] in [europe-west2-a] 
Do you want to continue (Y/n)?  Y
Deleted [https://www.googleapis.com/compute/v1/projects/abc-xyz/zones/europe-west2-a/instances/prod].
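
The two shell scripts can also be mirrored in Python. The sketch below only builds the argument lists for the same gcloud commands used above; the function names and the assume_yes parameter are hypothetical. It relies on --quiet, the gcloud-wide flag that disables interactive prompts such as the delete confirmation.

```python
import shlex

# Defaults taken from the demo scripts in this post.
DEFAULT_MACHINE_TYPE = "n1-standard-1"
DEFAULT_ZONE = "europe-west2-a"


def create_command(name, machine_type=DEFAULT_MACHINE_TYPE, zone=DEFAULT_ZONE):
    """Build the argv for 'gcloud compute instances create'."""
    return ["gcloud", "compute", "instances", "create", name,
            "--machine-type", machine_type, "--zone", zone]


def delete_command(name, zone=DEFAULT_ZONE, assume_yes=False):
    """Build the argv for 'gcloud compute instances delete'.

    assume_yes (hypothetical parameter) appends --quiet so that gcloud
    does not stop at the (Y/n) confirmation prompt.
    """
    cmd = ["gcloud", "compute", "instances", "delete", name, "--zone", zone]
    if assume_yes:
        cmd.append("--quiet")
    return cmd


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for env in ("dev", "test", "prod"):
        print(shlex.join(create_command(env)))
        print(shlex.join(delete_command(env, assume_yes=True)))
```

To actually run a command, the list can be handed to subprocess.run(cmd, check=True) on a machine where gcloud is installed and authenticated.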


Categories
Data Science python

Python – Cheat sheet


Categories
CentOS OS VM

VMware Workstation 16 Pro – VMware Tools – CentOS 7

VMware Tools is a set of drivers and utilities that improves the performance of the virtual machine's guest operating system and enhances the interaction between the guest and host operating systems. Installing VMware Tools is optional.

VMware Tools has three main components –

  1. VMware Device Drivers – ensure smooth mouse and keyboard operation and folder sharing, and improve the performance of sound, graphics and networking.
  2. VMware User Process – gives users a shared clipboard for copy and paste between the guest and host operating systems. The program file for the VMware User Process is called vmtoolsd.exe on Windows guest operating systems.
  3. VMware Services – improve the communication of resources between the guest and host operating systems. They also keep the time synchronized between guest and host. On Windows guests, vmtoolsd.exe runs in the background to perform this synchronization.

Install VMware Tools in CentOS 7

  1. Make sure the Linux VM is powered on.
  2. Run yum update to make sure the Linux kernel and other packages are up to date.
  3. Right-click on the VM, and select “Install VMware Tools”
  4. To create a mount point, run: mkdir /mnt/cdrom
  5. To mount the CDROM, run: mount /dev/cdrom /mnt/cdrom
  6. Copy the gzip tar file to /tmp folder. run: cp /mnt/cdrom/VMwareTools-version.tar.gz /tmp/
  7. Change to the /tmp folder: cd /tmp
  8. Untar the VMwareTools-version.tar.gz, using tar -zxvf VMwareTools-version.tar.gz
  9. Once extracted, change to vmware-tools-distrib directory and run: ./vmware-install.pl

While executing vmware-install.pl, there is a possibility of encountering an error like below –

./vmware-install.pl: ./vmware-install.real.pl: /usr/bin/perl: bad interpreter: No such file or directory

The above error occurs because CentOS 7 was installed in “minimal” installation mode, so the perl package is not installed.

Install perl package by running yum install perl.

Once the perl package is installed, run the tools script again by executing ./vmware-install.pl from the vmware-tools-distrib directory.

Accept the default options during the setup. The defaults are already set for best performance. If needed, they can be modified according to the user's preferences.

There is no need to reboot the CentOS VM, because the installer automatically detects the guest operating system. Still, it is advisable to reboot the guest afterwards for best performance and optimization.
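
The manual steps above, including the perl check, can be sketched as a dry-run plan in Python. This is only an illustration: the function names are hypothetical, the tarball name is left as a parameter because the real file carries a version number, and nothing is executed unless the plan is passed on to subprocess.

```python
import shutil


def install_steps(tarball, mount_point="/mnt/cdrom", work_dir="/tmp"):
    """Return (cwd, argv) pairs mirroring steps 4-9 of the list above.

    cwd is None when the working directory does not matter.
    """
    return [
        (None, ["mkdir", mount_point]),
        (None, ["mount", "/dev/cdrom", mount_point]),
        (None, ["cp", f"{mount_point}/{tarball}", f"{work_dir}/"]),
        (work_dir, ["tar", "-zxvf", tarball]),
        # The installer is started from inside its own directory.
        (f"{work_dir}/vmware-tools-distrib", ["./vmware-install.pl"]),
    ]


def missing_prerequisites(required=("perl",)):
    """List required tools that are not on PATH (perl is absent on minimal installs)."""
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    for tool in missing_prerequisites():
        print(f"missing: {tool} (install it with: yum install {tool})")
    for cwd, argv in install_steps("VMwareTools-version.tar.gz"):
        print(cwd or ".", " ".join(argv))
```

Each pair could then be executed as root with subprocess.run(argv, cwd=cwd, check=True).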


Categories
Docker Neo4j

Neo4j – Seed Docker with Data

Sometimes, during the project lifecycle, there is a need to quickly start a Neo4j Docker container with seeded data for QA or UAT environments. Creating a “vanilla” Neo4j container and executing all the data-loader Cypher queries takes a huge amount of time.

To save time, we can bootstrap or seed the docker with all the required data.

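The Dockerfile “COPY” instruction copies a file from the build context (here, the current working directory) into a folder inside the image.

The Dockerfile “CMD” instruction sets the shell command that runs when the container starts. Note – only ONE “CMD” takes effect in a Dockerfile; if several are listed, the last one wins. (“RUN”, by contrast, executes at build time and may appear many times.)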

github

To load the data during startup, the Neo4j initial password needs to be set before Neo4j is started.

For the demo, we will load the countries.csv file into Neo4j. This file is saved in the current directory.

id,name
AF,Afghanistan
AL,Albania
DZ,Algeria
AS,American Samoa
AD,Andorra
AO,Angola
AI,Anguilla
AQ,Antarctica
AG,Antigua And Barbuda

To load the countries.csv file, we create the corresponding Cypher query (saved as data_loader.cypher).

LOAD CSV WITH HEADERS FROM 'file:///countries.csv' AS row
WITH row WHERE row.id IS NOT NULL
MERGE (c:Country {id:row.id,countryName: row.name});

Then we create a Dockerfile that copies the CSV and the Cypher query into the image and executes cypher-shell when the container starts.

FROM neo4j

ENV NEO4J_HOME="/var/lib/neo4j" \
    NEO4J_PASSWD=neo4j_seed
    
COPY countries.csv ${NEO4J_HOME}/import/
COPY data_loader.cypher ${NEO4J_HOME}/import/

# set initial-password to start loading the data
# sleep for 10 secs for neo4j to start without any overlapping

CMD bin/neo4j-admin set-initial-password ${NEO4J_PASSWD} && \
    bin/neo4j start && sleep 10 && \
    if [ -f "${NEO4J_HOME}/import/data_loader.cypher" ]; then  \
        cat ${NEO4J_HOME}/import/data_loader.cypher | NEO4J_USERNAME=neo4j NEO4J_PASSWORD=${NEO4J_PASSWD} bin/cypher-shell --fail-fast && rm ${NEO4J_HOME}/import/*; \
    fi && /bin/bash
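
The fixed 10-second sleep in the Dockerfile is a guess at how long Neo4j needs to start. As a hedged alternative, a script could poll the Bolt port until it accepts connections. The sketch below is not part of the original setup: wait_for_port is a hypothetical helper, it assumes port 7687 is reachable from wherever the script runs (for instance published with -p 7687:7687, which the run command in this post does not do), and an open port only shows that Bolt is listening, not that every database is fully online.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_port("localhost", 7687, timeout=5.0):
        print("Bolt port is accepting connections")
    else:
        print("Timed out waiting for the Bolt port")
```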

Build the image from the Dockerfile

docker build -t neo4j:seed . 

[+] Building 11.7s (9/9) FINISHED
  => [internal] load build definition from Dockerfile                                                                                                                 0.0s 
  => => transferring dockerfile: 686B                                                                                                                                 0.0s 
  => => transferring context: 2B                                                                                                                                      0.0s 
  => [internal] load metadata for docker.io/library/neo4j:latest                                                                                                     11.3s 
  => [auth] library/neo4j:pull token for registry-1.docker.io                                                                                                         0.0s 
  => [1/3] FROM docker.io/library/neo4j@sha256:c7f24de1dc1d2020ab24a884b8a39538937c1b14bc0ca1da3ddb2573b6fc412f                                                       0.0s 
  => [internal] load build context                                                                                                                                    0.0s 
  => => transferring context: 230B                                                                                                                                    0.0s 
  => CACHED [2/3] COPY countries.csv /var/lib/neo4j/import/                                                                                                           0.0s 
  => [3/3] COPY data_loader.cypher /var/lib/neo4j/import/                                                                                                             0.1s 
  => exporting to image                                                                                                                                               0.1s 
  => => exporting layers                                                                                                                                              0.1s 
  => => writing image sha256:ac7113b7e0ae6abe7145f2d112dfbbe9b45aa6c6eb4e4147cfffbff691185cde                                                                         0.0s 
  => => naming to docker.io/library/neo4j:seed                                                                                                                        0.0s 

Once the build is successful, run the tagged “neo4j:seed” image

docker run -it -d  neo4j:seed
6c848fee3c728333deff359ed8ec5ef400c4e063ad610e2ebb42f046d9009561

Verify the data –

PS C:\Users\domin> docker ps
 CONTAINER ID   IMAGE        COMMAND                  CREATED         STATUS         PORTS                     NAMES
 6c848fee3c72   neo4j:seed   "/sbin/tini -g -- /d…"   7 seconds ago   Up 6 seconds   7473-7474/tcp, 7687/tcp   ecstatic_neumann


 PS C:\Users\domin> docker exec -it 6c848fee3c72 cypher-shell
 username: neo4j
 password: **
 Connected to Neo4j 4.2.0 at neo4j://localhost:7687 as user neo4j.
 Type :help for a list of available commands or :exit to exit the shell.
 Note that Cypher queries must end with a semicolon.
 neo4j@neo4j>
 neo4j@neo4j>
 neo4j@neo4j> match (n) return (n);
 +-----------------------------------------------------------+
 | n                                                         |
 +-----------------------------------------------------------+
 | (:Country {id: "AF", countryName: "Afghanistan"})         |
 | (:Country {id: "AL", countryName: "Albania"})             |
 | (:Country {id: "DZ", countryName: "Algeria"})             |
 | (:Country {id: "AS", countryName: "American Samoa"})      |
 | (:Country {id: "AD", countryName: "Andorra"})             |
 | (:Country {id: "AO", countryName: "Angola"})              |
 | (:Country {id: "AI", countryName: "Anguilla"})            |
 | (:Country {id: "AQ", countryName: "Antarctica"})          |
 | (:Country {id: "AG", countryName: "Antigua And Barbuda"}) |
 +-----------------------------------------------------------+
 9 rows available after 42 ms, consumed after another 4 ms
 neo4j@neo4j>

The query output confirms that the Neo4j database was seeded / bootstrapped with the data.

Happy Graphing …..


Categories
Neo4j

Neo4j Blog – Featured Community Member

Featured as a Community Member in Neo4j


Categories
Docker Neo4j

Neo4j Cluster(apoc+gds) Docker with Portainer

Like most RDBMS and NoSQL databases, Neo4j also provides clustering. Clustering provides three main features –

High Availability – Always available even if there are node failures.

Horizontal Scalability – Read-only replicas serve read loads in isolation from the write nodes.

Consistency – when enabled, a client application is guaranteed to read at least its own successful writes.

causal clustering

More documentation on Neo4j Causal Clustering can be found in the official Neo4j documentation.

For demo purposes, we will create a 3-node Neo4j cluster with the latest APOC and Graph Data Science plugins, using Docker along with Portainer.

Portainer is an open-source, lightweight management UI that makes it easy to manage Docker environments.

Github link -> docker-compose.yml

version: "3.8"

services:
  core1:
    hostname: core1
    image: neo4j:enterprise
    networks:
      - neo4j_cluster_ntx
    container_name: core1
    volumes:
      - ./core1/neo4j/data:/var/lib/neo4j/data
      - ./core1/neo4j/import:/var/lib/neo4j/import
    environment:
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4JLABS_PLUGINS=["apoc","graph-data-science"]
      - NEO4J_dbms_default__listen__address=0.0.0.0
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*
      - NEO4J_dbms_mode=CORE
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__runtime=3
      - NEO4J_causal__clustering_discovery__advertised__address=core1:5000
      - NEO4J_causal__clustering_transaction__advertised__address=core1:6000
      - NEO4J_causal__clustering_raft__advertised__address=core1:7000
      - NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000
      - NEO4J_causal__clustering_disable__middleware__logging=false
    ports:
      - 7474:7474
      - 6477:6477
      - 7687:7687
  core2:
    hostname: core2
    image: neo4j:enterprise
    networks:
      - neo4j_cluster_ntx
    container_name: core2
    volumes:
      - ./core2/neo4j/data:/var/lib/neo4j/data
      - ./core2/neo4j/import:/var/lib/neo4j/import
    environment:
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4JLABS_PLUGINS=["apoc","graph-data-science"]
      - NEO4J_dbms_default__listen__address=0.0.0.0
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*
      - NEO4J_dbms_mode=CORE
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__runtime=3
      - NEO4J_causal__clustering_discovery__advertised__address=core2:5000
      - NEO4J_causal__clustering_transaction__advertised__address=core2:6000
      - NEO4J_causal__clustering_raft__advertised__address=core2:7000
      - NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000
      - NEO4J_causal__clustering_disable__middleware__logging=false
      - NEO4J_dbms_connector_http_listen__address=:7475
      - NEO4J_dbms_connector_https_listen__address=:6478
      - NEO4J_dbms_connector_bolt_listen__address=:7688
    ports:
      - 7475:7475
      - 6478:6478
      - 7688:7688
  core3:
    hostname: core3
    image: neo4j:enterprise
    networks:
      - neo4j_cluster_ntx
    container_name: core3
    volumes:
      - ./core3/neo4j/data:/var/lib/neo4j/data
      - ./core3/neo4j/import:/var/lib/neo4j/import
    environment:
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4JLABS_PLUGINS=["apoc","graph-data-science"]
      - NEO4J_dbms_default__listen__address=0.0.0.0
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*
      - NEO4J_dbms_mode=CORE
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3
      - NEO4J_causal__clustering_minimum__core__cluster__size__at__runtime=3
      - NEO4J_causal__clustering_discovery__advertised__address=core3:5000
      - NEO4J_causal__clustering_transaction__advertised__address=core3:6000
      - NEO4J_causal__clustering_raft__advertised__address=core3:7000
      - NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000
      - NEO4J_causal__clustering_disable__middleware__logging=false
      - NEO4J_dbms_connector_http_listen__address=:7476
      - NEO4J_dbms_connector_https_listen__address=:6479
      - NEO4J_dbms_connector_bolt_listen__address=:7689
    ports:
      - 7476:7476
      - 6479:6479
      - 7689:7689
  portainer:
    image: portainer/portainer
    networks:
      - neo4j_cluster_ntx
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./pt-data:/data
    ports:
      - "9000:9000"
networks:
  neo4j_cluster_ntx:
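
The environment entries in this file follow the naming convention of the Neo4j Docker image: the setting name is prefixed with NEO4J_, every underscore is doubled, and every period becomes a single underscore. A minimal sketch of that mapping (neo4j_env_name is a hypothetical helper, not something shipped with Neo4j):

```python
def neo4j_env_name(setting):
    """Convert a neo4j.conf setting name into its Docker environment variable name.

    Underscores are doubled first, then periods are turned into single
    underscores, so the two cases stay distinguishable.
    """
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")


if __name__ == "__main__":
    for name in (
        "dbms.default_listen_address",
        "dbms.memory.heap.initial_size",
        "causal_clustering.raft_advertised_address",
    ):
        print(name, "->", neo4j_env_name(name))
```

A name that still contains a period, or a single underscore where the setting name has one, is a sign that the variable does not follow this convention.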

docker-compose up takes some time to fully set up and bootstrap all the Neo4j nodes.

Note: while the stack is starting, once the Portainer container is up, you have to log in to localhost:9000 and create the admin user within 5 minutes.

Neo4j Cluster

Portainer Setup

Portainer setup –

Create a user, and in the next screen, select “Local” as we are using an open-source portainer setup, and click “Connect”.

Once the setup is finished, the Portainer dashboard is displayed.

Selecting Containers from the left menu shows the various containers.

Portainer has a great feature for performance metrics. It can display CPU, memory and network usage with a refresh rate ranging from 1 second to 1 minute.

Neo4j Cluster

After a few minutes of bootstrapping and syncing across the 3 nodes, the Neo4j cluster is ready to use (wait for the “Remote interface available” log line).

 core1        | 2021-01-18 00:37:01.407+0000 INFO  Called db.clearQueryCaches(): Query cache already empty.
 core2        | 2021-01-18 00:37:04.411+0000 INFO  Started downloading snapshot for database 'neo4j'…
 core3        | 2021-01-18 00:37:04.415+0000 INFO  Started downloading snapshot for database 'neo4j'…
 core2        | 2021-01-18 00:37:08.827+0000 INFO  Download of snapshot for database 'neo4j' complete.
 core3        | 2021-01-18 00:37:08.912+0000 INFO  Download of snapshot for database 'neo4j' complete.
 core3        | 2021-01-18 00:37:12.648+0000 INFO  Called db.clearQueryCaches(): Query cache already empty.
 core2        | 2021-01-18 00:37:13.007+0000 INFO  Called db.clearQueryCaches(): Query cache already empty.
 core3        | 2021-01-18 00:37:16.529+0000 INFO  Connected to core2/172.21.0.4:7000 [raft version:3.0]
 core2        | 2021-01-18 00:37:16.611+0000 INFO  Connected to core3/172.21.0.5:7000 [raft version:3.0]
 core1        | 2021-01-18 00:37:20.275+0000 INFO  Sending metrics to CSV file at /var/lib/neo4j/metrics
 core1        | 2021-01-18 00:37:20.349+0000 INFO  Bolt enabled on 0.0.0.0:7687.
 core1        | 2021-01-18 00:37:22.063+0000 INFO  Remote interface available at http://localhost:7474/
 core1        | 2021-01-18 00:37:22.064+0000 INFO  Started.
 core2        | 2021-01-18 00:37:31.467+0000 INFO  Sending metrics to CSV file at /var/lib/neo4j/metrics
 core3        | 2021-01-18 00:37:31.485+0000 INFO  Sending metrics to CSV file at /var/lib/neo4j/metrics
 core2        | 2021-01-18 00:37:31.553+0000 INFO  Bolt enabled on 0.0.0.0:7688.
 core3        | 2021-01-18 00:37:31.561+0000 INFO  Bolt enabled on 0.0.0.0:7689.
 core2        | 2021-01-18 00:37:33.468+0000 INFO  Remote interface available at http://localhost:7475/
 core2        | 2021-01-18 00:37:33.469+0000 INFO  Started.
 core3        | 2021-01-18 00:37:33.471+0000 INFO  Remote interface available at http://localhost:7476/
 core3        | 2021-01-18 00:37:33.472+0000 INFO  Started.

Open a web browser and navigate to localhost:7474 to access the core1 node (the default username and password are both “neo4j”). After setting a new neo4j password, execute call dbms.cluster.overview()

╒══════════════════════════════════════╤═════════════════════════════════════════════════╤════════════════════════════════════════╤════════╕
│"id"                                  │"addresses"                                      │"databases"                             │"groups"│
╞══════════════════════════════════════╪═════════════════════════════════════════════════╪════════════════════════════════════════╪════════╡
│"a861c553-4cd5-4f8b-baae-3279fa92c65a"│["bolt://localhost:7688","http://localhost:7475"]│{"neo4j":"FOLLOWER","system":"FOLLOWER"}│[]      │
├──────────────────────────────────────┼─────────────────────────────────────────────────┼────────────────────────────────────────┼────────┤
│"e38564bb-1f42-4738-9629-29e5cf1f0bb3"│["bolt://localhost:7689","http://localhost:7476"]│{"neo4j":"LEADER","system":"FOLLOWER"}  │[]      │
├──────────────────────────────────────┼─────────────────────────────────────────────────┼────────────────────────────────────────┼────────┤
│"2614662c-7f3a-4024-8900-f3555e26630d"│["bolt://localhost:7687","http://localhost:7474"]│{"neo4j":"FOLLOWER","system":"LEADER"}  │[]      │
└──────────────────────────────────────┴─────────────────────────────────────────────────┴────────────────────────────────────────┴────────┘
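
The overview output can also be processed in code, for example to find which node currently leads each database. The sketch below works on plain dictionaries that repeat the three rows shown above; the leaders function is a hypothetical helper, and the roles will differ from run to run because leader election is dynamic.

```python
def leaders(overview_rows):
    """Map each database name to the bolt address of its current LEADER."""
    result = {}
    for row in overview_rows:
        # Every row lists a bolt:// and an http:// address for the node.
        bolt = next(a for a in row["addresses"] if a.startswith("bolt://"))
        for database, role in row["databases"].items():
            if role == "LEADER":
                result[database] = bolt
    return result


# The three rows from the output above (ids and groups omitted).
OVERVIEW = [
    {"addresses": ["bolt://localhost:7688", "http://localhost:7475"],
     "databases": {"neo4j": "FOLLOWER", "system": "FOLLOWER"}},
    {"addresses": ["bolt://localhost:7689", "http://localhost:7476"],
     "databases": {"neo4j": "LEADER", "system": "FOLLOWER"}},
    {"addresses": ["bolt://localhost:7687", "http://localhost:7474"],
     "databases": {"neo4j": "FOLLOWER", "system": "LEADER"}},
]

if __name__ == "__main__":
    print(leaders(OVERVIEW))
```

With py2neo, rows in this shape could presumably be obtained from graph.run("CALL dbms.cluster.overview()").data(), which returns one dictionary per row.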


Categories
Docker Neo4j

Neo4j 4.x + GraphAware UUID

Since November 2020, GraphAware has supported the GraphAware Framework and the UUID module on Neo4j 4.x, although the rest of its products, such as recommendation-engine, elasticsearch, expire, resttest, timetree and triggers, still support Neo4j 3.x only.

Neo4j natively supports creating UUIDs (v4) through Cypher, but they have to be created either during data insert or by running bulk inserts via APOC, and there isn't much control over the format of the generated UUIDs.

By default, Neo4j's UUIDs contain “-” separators between the groups of the ID.
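
To make the difference between the two formats concrete, here is a small sketch using Python's standard uuid module. It only illustrates hyphenated versus hyphen-free v4 UUIDs and is not how Neo4j or GraphAware generate them; the function names are hypothetical.

```python
import uuid


def new_uuid(strip_hyphens=False):
    """Return a random (v4) UUID, either as 36 characters with hyphens or 32 without."""
    value = uuid.uuid4()
    return value.hex if strip_hyphens else str(value)


def restore_hyphens(hex_uuid):
    """Turn a 32-character hyphen-free UUID back into the 8-4-4-4-12 form."""
    return str(uuid.UUID(hex_uuid))


if __name__ == "__main__":
    print(new_uuid())                    # hyphenated form
    print(new_uuid(strip_hyphens=True))  # hyphen-free form
    print(restore_hyphens("9a77dfec7b95484eb9f22c848b0ad8f5"))
```

Stripping the hyphens loses no information, so a hyphen-free value can always be converted back.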

One alternative is to add the GraphAware UUID plugin. GraphAware UUID can create UUIDs on all nodes or only on nodes with specific labels, and on relationships, as specified in the graphaware.conf file.

For the purpose of the demo, Docker is used to create a Neo4j 4.2.2 Community Edition instance. For Neo4j Enterprise, you need to contact GraphAware for a license.

(For non-Docker setups) Installation is pretty easy: just add the matching jar versions of GraphAware Framework and GraphAware UUID to the Neo4j plugins folder.

Direct Download Link -> https://products.graphaware.com/

I have created a Dockerfile and docker-compose.yml that will automatically download the plugins and start the database.

Github

Dockerfile

FROM neo4j
ENV NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
    NEO4J_HOME="/var/lib/neo4j"

COPY ./graphaware.conf "${NEO4J_HOME}"/conf
ADD https://neo4j-plugins-public.s3.eu-west-1.amazonaws.com/graphaware-server-community-4.2.0.58.jar "${NEO4J_HOME}"/plugins
ADD https://neo4j-plugins-public.s3.eu-west-1.amazonaws.com/graphaware-uuid-4.1.4.58.19.jar "${NEO4J_HOME}"/plugins

docker-compose.yml

version: '3'

services:
  neo4j:
    build: .
    hostname: neo4j_graphaware
    container_name: neo4j_graphaware
    volumes:
      - ./neo4j/data:/var/lib/neo4j/data
      - ./neo4j/import:/var/lib/neo4j/import
    environment:
      - NEO4J_dbms_default__listen__address=0.0.0.0
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*
    ports:
      - "7474:7474"
      - "7687:7687"
      - "7473:7473"

The next step is to configure the UUID generation. For demo purposes, we will generate UUIDs without “-” for Person nodes and REPORTS_TO relationships. This is configured in graphaware.conf, which Docker later copies into the conf folder.

graphaware.conf

#UIDM becomes the module ID:
com.graphaware.module.neo4j.UIDM.1=com.graphaware.module.uuid.UuidBootstrapper

#optional, default is uuid:
com.graphaware.module.neo4j.UIDM.uuidProperty=uuid

#optional, default is false:
com.graphaware.module.neo4j.UIDM.stripHyphens=true

#optional, default is all nodes:
com.graphaware.module.neo4j.UIDM.node=hasLabel('Person')

#optional, default is no relationships:
com.graphaware.module.neo4j.UIDM.relationship=isType('REPORTS_TO')

Start Neo4j with docker-compose up and watch for any errors during startup. A successful startup of Neo4j along with the GraphAware Framework and UUID module should look like the output below.

Creating network "graphaware_default" with the default driver
Building neo4j
Step 1/5 : FROM neo4j

Step 2/5 : ENV NEO4J_ACCEPT_LICENSE_AGREEMENT=yes     NEO4J_HOME="/var/lib/neo4j"

 ---> a1acb75ca51d
Step 3/5 : COPY ./graphaware.conf "${NEO4J_HOME}"/conf

 ---> 74b7274bd560
Step 4/5 : ADD https://neo4j-plugins-public.s3.eu-west-1.amazonaws.com/graphaware-server-community-4.2.0.58.jar "${NEO4J_HOME}"/plugins


 ---> 644f9f5eba63
Step 5/5 : ADD https://neo4j-plugins-public.s3.eu-west-1.amazonaws.com/graphaware-uuid-4.1.4.58.19.jar "${NEO4J_HOME}"/plugins

 
 ---> e097a691e2e4

Successfully built e097a691e2e4
Successfully tagged graphaware_neo4j:latest
Creating neo4j_graphaware ... done
Attaching to neo4j_graphaware
neo4j_graphaware | Directories in use:
neo4j_graphaware |   home:         /var/lib/neo4j
neo4j_graphaware |   config:       /var/lib/neo4j/conf
neo4j_graphaware |   logs:         /logs
neo4j_graphaware |   plugins:      /var/lib/neo4j/plugins
neo4j_graphaware |   import:       /var/lib/neo4j/import
neo4j_graphaware |   data:         /var/lib/neo4j/data
neo4j_graphaware |   certificates: /var/lib/neo4j/certificates
neo4j_graphaware |   run:          /var/lib/neo4j/run
neo4j_graphaware | Starting Neo4j.
neo4j_graphaware | 2021-01-13 00:14:04.193+0000 WARN  Unrecognized setting. No declared setting with name: metrics.enabled
neo4j_graphaware | 2021-01-13 00:14:04.207+0000 INFO  Starting...
neo4j_graphaware | 2021-01-13 00:14:05.807+0000 INFO  ======== Neo4j 4.2.2 ========
neo4j_graphaware | 2021-01-13 00:14:07.095+0000 INFO  GraphAware Runtime disabled for database system.
neo4j_graphaware | 2021-01-13 00:14:08.931+0000 INFO  Performing postInitialization step for component 'security-users' with version 2 and status CURRENT
neo4j_graphaware | 2021-01-13 00:14:08.931+0000 INFO  Updating the initial password in component 'security-users'
neo4j_graphaware | 2021-01-13 00:14:09.516+0000 INFO  GraphAware Runtime enabled for database neo4j, bootstrapping...
neo4j_graphaware | 2021-01-13 00:14:09.528+0000 INFO  Bootstrapping module with order 1, ID UIDM, using com.graphaware.module.uuid.UuidBootstrapper for database neo4j
neo4j_graphaware | 2021-01-13 00:14:09.531+0000 INFO  Node Inclusion Policy set to com.graphaware.common.policy.inclusion.composite.CompositeNodeInclusionPolicy@21e78ee8       
neo4j_graphaware | 2021-01-13 00:14:09.533+0000 INFO  Relationship Inclusion Policy set to com.graphaware.common.policy.inclusion.composite.CompositeRelationshipInclusionPolicy@658e6daa@658e6daa
neo4j_graphaware | 2021-01-13 00:14:09.534+0000 INFO  uuidProperty set to uuid
neo4j_graphaware | 2021-01-13 00:14:09.534+0000 INFO  stripHyphens set to true
neo4j_graphaware | 2021-01-13 00:14:09.535+0000 INFO  Registering module UIDM with GraphAware Runtime for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.536+0000 INFO  GraphAware Runtime bootstrapped for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.578+0000 INFO  Starting GraphAware Runtime for database neo4j...
neo4j_graphaware | 2021-01-13 00:14:09.578+0000 INFO  Starting GraphAware Runtime modules for database neo4j...
neo4j_graphaware | 2021-01-13 00:14:09.579+0000 INFO  Starting module UIDM for database neo4j...
neo4j_graphaware | 2021-01-13 00:14:09.584+0000 INFO  Started module UIDM for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.586+0000 INFO  GraphAware Runtime modules started for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.586+0000 INFO  Started GraphAware Runtime for database neo4j.
neo4j_graphaware | 2021-01-13 00:14:09.668+0000 INFO  Bolt enabled on 0.0.0.0:7687.
neo4j_graphaware | 2021-01-13 00:14:10.518+0000 INFO  Remote interface available at http://localhost:7474/
neo4j_graphaware | 2021-01-13 00:14:10.519+0000 INFO  Started.                                                                                                                  

Node UUID Generation Testing

1. Create a sample node with the label Person to verify that a UUID is generated.

MERGE (n:Person {name: "Dominic"}) RETURN n;
╒══════════════════╕
│"n"               │
╞══════════════════╡
│{"name":"Dominic"}│
└──────────────────┘

match (n:Person) return labels(n),n
╒═══════════╤════════════════════════════════════════════════════════════╕
│"labels(n)"│"n"                                                         │
╞═══════════╪════════════════════════════════════════════════════════════╡
│["Person"] │{"name":"Dominic","uuid":"9a77dfec7b95484eb9f22c848b0ad8f5"}│
└───────────┴────────────────────────────────────────────────────────────┘

MERGE (n:Person {name: "Kumar"}) RETURN n;
╒════════════════╕
│"n"             │
╞════════════════╡
│{"name":"Kumar"}│
└────────────────┘


match (n:Person) return labels(n),n;
╒═══════════╤════════════════════════════════════════════════════════════╕
│"labels(n)"│"n"                                                         │
╞═══════════╪════════════════════════════════════════════════════════════╡
│["Person"] │{"name":"Kumar","uuid":"5f36188e4079406d9c202c4e0feaf785"}  │
├───────────┼────────────────────────────────────────────────────────────┤
│["Person"] │{"name":"Dominic","uuid":"9a77dfec7b95484eb9f22c848b0ad8f5"}│
└───────────┴────────────────────────────────────────────────────────────┘

The above output shows that a UUID is created automatically for nodes with the label Person, as specified in the GraphAware configuration file.
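
The startup log reports stripHyphens set to true, which is why each uuid value is a 32-character hexadecimal string without hyphens. As a quick sanity check, a value copied from the query output can be validated and converted back to the canonical hyphenated form with Python's standard uuid module. This is only a sketch; the helper name is_stripped_uuid is made up for illustration.

```python
import re
import uuid

# Hypothetical helper: checks for a UUID written as 32 hex digits, no hyphens.
STRIPPED_UUID = re.compile(r"[0-9a-fA-F]{32}")

def is_stripped_uuid(value):
    """Return True if value is a string of exactly 32 hexadecimal digits."""
    return isinstance(value, str) and STRIPPED_UUID.fullmatch(value) is not None

sample = "9a77dfec7b95484eb9f22c848b0ad8f5"  # uuid of the Dominic node above
if is_stripped_uuid(sample):
    # uuid.UUID accepts the hyphen-free form and prints the canonical one.
    print(uuid.UUID(hex=sample))
```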

2. Next, test whether a UUID is created for a label that is not specified in the GraphAware configuration file.

merge (s:State{name:"Texas"}) return s
╒════════════════╕
│"s"             │
╞════════════════╡
│{"name":"Texas"}│
└────────────────┘

match (n:State) return labels(n),n
╒═══════════╤════════════════╕
│"labels(n)"│"n"             │
╞═══════════╪════════════════╡
│["State"]  │{"name":"Texas"}│
└───────────┴────────────────┘

The above output shows that no UUID is created for the label State, because it was not specified in the GraphAware configuration file.

Relationship UUID Generation Testing

To test automatic UUID generation for relationships, we will create two relationship types: (1) RESIDES_IN and (2) REPORTS_TO. The former is not specified in the graphaware.conf configuration file, whereas the latter is.

match (p:Person{name:"Dominic"}), (s:State{name:"Texas"}) merge (p)-[:RESIDES_IN]->(s)

Created 1 relationship, completed after 33 ms.


match (p1:Person{name:"Dominic"}), (p2:Person{name:"Kumar"}) merge (p1)-[:REPORTS_TO]->(p2)

Created 1 relationship, completed after 8 ms.


match (m)-[r]->(n)   return labels(m) as labels_1 ,m.name ,type(r) as rel_label ,r as rel ,labels(n) as labels_2 ,n.name


╒══════════╤═════════╤════════════╤═══════════════════════════════════════════╤══════════╤════════╕
│"labels_1"│"m.name" │"rel_label" │"rel"                                      │"labels_2"│"n.name"│
╞══════════╪═════════╪════════════╪═══════════════════════════════════════════╪══════════╪════════╡
│["Person"]│"Dominic"│"RESIDES_IN"│{}                                         │["State"] │"Texas" │
├──────────┼─────────┼────────────┼───────────────────────────────────────────┼──────────┼────────┤
│["Person"]│"Dominic"│"REPORTS_TO"│{"uuid":"2596a1bf5bd14a148571550ce275d743"}│["Person"]│"Kumar" │
└──────────┴─────────┴────────────┴───────────────────────────────────────────┴──────────┴────────┘

As you can see, a UUID is automatically created only for the REPORTS_TO relationship.
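
The same check can be scripted. The sketch below assumes the rows come from a query such as MATCH ()-[r]->() RETURN type(r) AS rel_type, r.uuid AS uuid and have been fetched as a list of dictionaries (for example with py2neo's graph.run(...).data()). The function name uuid_coverage and the column aliases are illustrative, and the sample rows simply mirror the table above.

```python
from collections import defaultdict

def uuid_coverage(rows):
    """Map each relationship type to (with_uuid, without_uuid) counts."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        # Index 0 counts relationships that carry a uuid, index 1 those that do not.
        index = 0 if row.get("uuid") else 1
        counts[row["rel_type"]][index] += 1
    return {rel_type: tuple(pair) for rel_type, pair in counts.items()}

# Sample rows copied from the result table; a missing property comes back as None.
rows = [
    {"rel_type": "RESIDES_IN", "uuid": None},
    {"rel_type": "REPORTS_TO", "uuid": "2596a1bf5bd14a148571550ce275d743"},
]
print(uuid_coverage(rows))
```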

Happy Graphing …..


Categories
Data Engineering Docker Neo4j

Neo4j Spatial Docker

Docker Hub

Starting from Neo4j 4.x, the Neo4j Spatial plugin is incompatible, and the database fails to start when it is installed. So, I have created a Docker image based on Neo4j 3.5.25 that includes all the plugins required for spatial queries.

Docker pull command

docker pull dominicvivek06/neo4j_spatial

docker-compose.yml

version: '3'

services:
  neo4j:
    image: dominicvivek06/neo4j_spatial
    hostname: neo4j_35_spatial
    container_name: neo4j_35_spatial
    volumes:
      - ./neo4j/data:/var/lib/neo4j/data
      - ./neo4j/import:/var/lib/neo4j/import
    environment:
      - NEO4J_dbms_connectors_default__listen__address=0.0.0.0
      - NEO4J_metrics_enabled=false 
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=2G
      - NEO4J_dbms_memory_heap_max__size=4G
      - NEO4J_dbms_directories_import=/var/lib/neo4j/import
      - NEO4J_apoc_spatial_geocode_provider=osm
      - NEO4J_apoc_spatial_geocode_osm_throttle=5000
      - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*,spatial.*
      - NEO4J_dbms_security_procedures_whitelist=gds.*,apoc.*,spatial.*
    ports:
      - "7474:7474"
      - "7687:7687"
      - "7473:7473"
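
The official Neo4j Docker image maps NEO4J_ environment variables to neo4j.conf settings: in the variable name, a single underscore stands for a dot and a double underscore stands for a literal underscore. The small Python sketch below (the function names are made up for illustration) derives the variable name from a setting name and back, which can help catch a misspelled entry in the environment list.

```python
def conf_to_env(setting):
    """Convert a neo4j.conf setting name to its NEO4J_ environment variable."""
    # Escape literal underscores first, then turn dots into underscores.
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")

def env_to_conf(variable):
    """Convert a NEO4J_ environment variable name back to a setting name."""
    prefix = "NEO4J_"
    if not variable.startswith(prefix):
        raise ValueError("not a NEO4J_ variable: " + variable)
    name = variable[len(prefix):]
    # Protect double underscores with a placeholder before mapping "_" to ".".
    return name.replace("__", "\0").replace("_", ".").replace("\0", "_")

print(conf_to_env("dbms.memory.heap.initial_size"))
print(env_to_conf("NEO4J_dbms_memory_heap_max__size"))
```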

Dockerfile

FROM alpine:latest
WORKDIR /tmp
ADD https://s3-eu-west-1.amazonaws.com/com.neo4j.graphalgorithms.dist/graph-data-science/neo4j-graph-data-science-1.1.6-standalone.zip /tmp
RUN unzip neo4j-graph-data-science-1.1.6-standalone.zip


FROM neo4j:3.5-enterprise
ENV NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
    NEO4J_HOME="/var/lib/neo4j"
    
ADD https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.5.0.13/apoc-3.5.0.13-all.jar "${NEO4J_HOME}"/plugins/
ADD https://github.com/neo4j-contrib/spatial/releases/download/0.26.2-neo4j-3.5.2/neo4j-spatial-0.26.2-neo4j-3.5.2-server-plugin.jar "${NEO4J_HOME}"/plugins/
COPY --from=0 /tmp/*.jar "${NEO4J_HOME}"/plugins/

The files can be downloaded from my GitHub.

Note: The above Docker image assumes you have a valid Neo4j license for production use. If not, please contact Neo4j.


Categories
SQL Server

SQL Join
