Categories
Neo4j python

Neo4j – Pivot Functionality

Neo4j lacks pivot functionality, so I have created a simple demo in a Python Jupyter notebook to illustrate how it can be worked around.

Github URL -> https://github.com/dominicvivek06/neo4j/tree/master/community/pivot_neo4j


Categories
aws Data Engineering Neo4j

AWS Translate + Python + Neo4j

When working with multi-region applications, there is sometimes a need to store translated text for each region in the database so it can be displayed in the frontend.

In my previous blog -> Link, I explained how to use the AWS Translate function.

In the current article, I have created a sample data engineering process that loads a list of countries from a CSV file into Neo4j. Using the update_lang_property function, any node and its properties can then be updated with a translation in the language passed to the function.

For full languages and codes refer to -> AWS Translate documentation

update_lang_property(node_name, column_name, src_lang_code, language, lang_code) takes 5 required arguments (a minimal sketch follows the list):

  • node_name -> name of the node or label to be updated (required)
  • column_name -> attribute name to be translated (required)
  • src_lang_code -> source language code; for codes, refer to the AWS Translate documentation
  • language -> full text of the language; for the full text, refer to the AWS Translate documentation
  • lang_code -> the target language code for the translation
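
Below is a minimal sketch (not the notebook code itself) of how such a function can be wired together with boto3 and the Neo4j Python driver. The connection details, credentials and the column_name:lang_code property naming are assumptions here; the notebook on my GitHub page remains the reference implementation.

import boto3
from neo4j import GraphDatabase

translate = boto3.client("translate", region_name="us-east-1")
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def update_lang_property(node_name, column_name, src_lang_code, language, lang_code):
    # Translate `column_name` on every `node_name` node and store the result
    # under a "<column_name>:<lang_code>" property; `language` is only logged here.
    with driver.session() as session:
        rows = list(session.run(
            f"MATCH (n:`{node_name}`) WHERE n.`{column_name}` IS NOT NULL "
            f"RETURN id(n) AS id, n.`{column_name}` AS text"
        ))
        for row in rows:
            translated = translate.translate_text(
                Text=row["text"],
                SourceLanguageCode=src_lang_code,
                TargetLanguageCode=lang_code,
            )["TranslatedText"]
            print(f"{language}: {row['text']} -> {translated}")
            session.run(
                f"MATCH (n:`{node_name}`) WHERE id(n) = $id "
                f"SET n.`{column_name}:{lang_code}` = $value",
                id=row["id"], value=translated,
            )

# Example: translate the "name" property on Country nodes from English to Tamil
update_lang_property("Country", "name", "en", "Tamil", "ta")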

A complete Jupyter notebook can be downloaded from my -> github page.

After loading the CSV file into the Neo4j database –

All the attributes/properties on the “Country” node –

After the function is executed, the new attribute/property is added to the node under a column_name:lang_code key.

and the result is (filtered for readability) –

For the demo, I have translated English to Tamil (my native language).

More information on Tamil language –

Wikipedia -> Tamil

Tamil Anthem -> YouTube

Neo4j Global Database Celebration Day (2019), Chennai, Tamil Nadu, India. -> Github


Categories
aws machine learning

AWS Language Translator

Amazon Translate is a neural machine translation service for translating text to and from English across a breadth of supported languages. Powered by deep-learning technologies, Amazon Translate delivers fast, high-quality, and affordable language translation. It provides a managed, continually trained solution so you can easily translate company and user-authored content or build applications that require support across multiple languages. The machine translation engine has been trained on a wide variety of content across different domains to produce quality translations that serve any industry need.

AWS Translate Developer Guide

  • Prerequisites –
    • AWS SDK for Python (boto3) installed. Documentation
    • An AWS IAM account (programmatic access) with the TranslateFullAccess permission.

boto3's Translate client provides the translate_text() method, which takes the string to translate, its source language code and the target language code. AWS Method Documentation
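
A minimal example of calling translate_text(); the region here is an assumption, and credentials are expected to come from the usual boto3 configuration (environment variables, ~/.aws or an IAM role):

import boto3

# The region is an assumption; credentials come from the standard boto3 configuration.
translate = boto3.client("translate", region_name="us-east-1")

response = translate.translate_text(
    Text="Hello, world",
    SourceLanguageCode="en",
    TargetLanguageCode="ta",
)
print(response["TranslatedText"])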

blog_post


Categories
python ubuntu

Ubuntu Python-3 Configuration

# Remove existing python2 version
sudo apt purge -y python2.7-minimal

# Soft link "python3" to the python executable
sudo ln -s /usr/bin/python3 /usr/bin/python

# Install pip3 and soft link pip3 to pip
sudo apt install -y python3-pip
sudo ln -s /usr/bin/pip3 /usr/bin/pip

# Confirm the new version of Python: 3
python --version


Categories
Neo4j

Neo4j 4.2.0 (17th Nov 2020)

Highlights

Administration

  • ALIGNED store format – A new store format that reduces the total number of I/O operations can be set at startup for new databases.
  • Procedures to observe the internal scheduler – New functions to observe the execution of background tasks have been introduced.
  • Dynamic settings at startup – Configuration can be set using the new --expand-commands argument and by executing calls to external components prior to startup.
  • WAIT/NOWAIT in Database Management – Commands to manage databases (CREATE/DROP/START/STOP) can be executed with a WAIT or NOWAIT option. With WAIT, a user can wait for the completion of the command, either indefinitely or up to a given number of seconds.
  • Index and constraint administration commands – Cypher provides commands to create, drop and view indexes and constraints (CREATE/DROP/SHOW INDEX/CONSTRAINT); a short example follows this list.
  • Filtering in SHOW commands – Cypher SHOW commands provide a new, simple way to retrieve a selection of columns, filter and aggregate the results.
  • Backup/Restore improvements – DBAs can create backups and restore databases in a multi-database environment with more sophisticated features: metadata can be saved alongside a database and applied on restore on demand, and multiple or all databases can be backed up or restored.
  • Compress metrics on rotation – CSV metric files can be compressed on rotation.
  • Database namespace for metrics – Metrics can be organized per database on request.
  • neo4j-admin improvements – The tool has improvements in operations for “copy”, “store-info” and “memrec”.
  • HTTP port selective settings – The HTTP ports can be enabled or disabled separately for Browser, HTTP API, transactional endpoints, management endpoints and unmanaged extensions.
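
As a quick illustration of the index administration and SHOW commands mentioned above, here is a small sketch run through the Neo4j Python driver (the URI, credentials and index name are placeholders; the same Cypher can be run directly in the Browser):

from neo4j import GraphDatabase

# URI, credentials and the index name are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Create a named index with the Cypher administration command
    session.run("CREATE INDEX person_name FOR (p:Person) ON (p.name)")
    # List indexes with the new SHOW command
    for record in session.run("SHOW INDEXES"):
        print(record["name"], record["state"])
    # Drop it again
    session.run("DROP INDEX person_name")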

Causal Cluster

  • Run/Pause Read Replicas – The replication of individual databases can be paused or resumed in read replicas.
  • Database quarantine – Databases with internal errors can be selectively quarantined on a member of the cluster.

Cypher

  • Planner improvements – The Cypher planner has extended the use of the “index-backed order by” feature, i.e. it may choose more efficient plans in more cases when an ORDER BY clause is present and supported by an index.
  • Octal literals – Octal numbers in Cypher queries in Neo4j 4.2 start with ‘0o’ (see the example after the next list).

Functions and Procedures

  • round() function – round() has been improved to let you select the precision of the returned value (see the example below).
  • dbms.functions() procedure – functions are organised in categories.
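
A quick check of two of the Cypher additions above – the new 0o octal literals and the round() precision argument – run here through the Python driver (URI and credentials are placeholders):

from neo4j import GraphDatabase

# URI and credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    record = session.run(
        "RETURN 0o17 AS octal, round(3.14159, 2) AS rounded"
    ).single()
    print(record["octal"])    # 15
    print(record["rounded"])  # 3.14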

Security

  • Procedures and user defined function privileges – DBAs can grant, deny or revoke access for users to specific procedures and user-defined functions. Users may also have access with boosted privileges, compared to their current security profile.
  • Role-Based Access Control Default graph – Permissions can be granted, denied or revoked against the default graph, regardless of the default setting.
  • PLAINTEXT and ENCRYPTED password in user creation – Passwords can be set in plain text or in a one-way encrypted format on request.
  • SHOW CURRENT USER – Users can see the profile of their current user.
  • SHOW PRIVILEGES as commands – DBAs can visualize the commands to execute to recreate a security profile.
  • OCSP stapling support for Java driver – The driver provides support for OCSP stapling.

Important Changes

We made some changes in the behavior of new Neo4j installations. In general, upgrades are not affected by these changes, but please check this list carefully; there are two exceptions:

  1. The metrics.filter setting has been introduced; by default it reduces the number of metrics collected, also in upgraded installations.
  2. JMX metrics, governed by metrics.jmx.enabled, are disabled by default (the setting is now false).

This is the complete list of changes:

  • metrics.csv.interval – In new installations, the default value is 30 seconds (it was 3 seconds).
  • metrics.csv.rotation.compression – This is a new parameter, CSV metric files are compressed on rotation by default (they were not compressed in previous versions).
  • metrics.jmx.enabled – In new installations, JMX metrics are disabled by default.
  • metrics.filter – The number of metrics set by default has been reduced. The current default set includes:
    • bolt.connections
    • bolt.messages_received
    • bolt.messages_started
    • dbms.pool.bolt.free
    • dbms.pool.bolt.total_size
    • dbms.pool.bolt.total_used
    • dbms.pool.bolt.used_heap
    • causal_clustering.core.is_leader
    • causal_clustering.core.last_leader_message
    • causal_clustering.core.replication_attempt
    • causal_clustering.core.replication_fail
    • check_point.duration
    • check_point.total_time
    • cypher.replan_events
    • ids_in_use.node
    • ids_in_use.property
    • ids_in_use.relationship
    • pool.transaction..total_used
    • pool.transaction..used_heap
    • pool.transaction..used_native
    • store.size
    • transaction.active_read
    • transaction.active_write
    • transaction.committed
    • transaction.last_committed_tx_id
    • transaction.peak_concurrent
    • transaction.rollbacks
    • page_cache.hit
    • page_cache.page_faults
    • page_cache.usage_ratio
    • vm.file.descriptors.count
    • vm.gc.time
    • vm.heap.used
    • vm.memory.buffer.direct.used
    • vm.memory.pool.g1_eden_space
    • vm.memory.pool.g1_old_gen
    • vm.pause_time
    • vm.thread

Deprecations

The following settings, syntax, procedures and functions have been deprecated:

  • “whitelist” settings have been replaced by “allowlist” settings: dbms.dynamic.setting.whitelist, dbms.memory.pagecache.warmup.preload.whitelist, dbms.security.procedures.whitelist and dbms.security.http_auth_whitelist.
  • Log rotation delay settings: dbms.logs.security.rotation.delay, dbms.logs.user.rotation.delay, dbms.logs.debug.rotation.delay. These settings are no longer needed after replacing the custom logging framework with log4j.
  • Log and logger interfaces: debugLogger, infoLogger, warnLogger, errorLogger and the bulk method on the Log interface. The debug, info, warn and error methods on the Log interface can be used directly instead.
  • Octal syntax: writing octals with syntax 0… is deprecated in favor of new 0o….
  • Hexadecimal syntax: writing hexadecimals with syntax 0X… is deprecated since 0x… is favored
  • db.createIndex(): deprecated in favor of the CREATE INDEX command
  • db.createNodeKey(): deprecated in favor of the CREATE CONSTRAINT … IS NODE KEY command
  • db.createUniquePropertyConstraint(): deprecated in favor of the CREATE CONSTRAINT … IS UNIQUE command
  • db.indexes(): replaced by SHOW INDEXES
  • db.indexDetails(): replaced by SHOW INDEXES VERBOSE OUTPUT
  • db.constraints(): replaced by SHOW CONSTRAINTS
  • db.schemaStatements(): replaced by SHOW INDEXES VERBOSE OUTPUT and SHOW CONSTRAINTS VERBOSE OUTPUT
  • dbms.security.procedures.roles: replaced by EXECUTE BOOSTED privileges (for both procedures and user-defined functions)
  • dbms.security.procedures.default_allowed: replaced by EXECUTE BOOSTED privileges (for both procedures and user-defined functions)
  • When an expression is returned by a subquery, a warning is raised if it does not have an alias
Categories
Data Architecture

ISO 8000-61

Data quality management: Process reference model

Categories
devops

DevOps Periodic Table

https://digital.ai/periodic-table-of-devops-tools


Categories
Data Neo4j SQL Server

MS SQL Server Database Logical Entity-Relationship Model in Neo4j

From Neo4j 4.0 onward, a single Neo4j instance can serve multiple databases.

Sometimes, we need to store the E-R metadata of a SQL Server database in Neo4j.

So, with Neo4j, we can create a separate metadata database for a given SQL Server database, extract the metadata of the entities and attributes from SQL Server, and load it into Neo4j as a reference.


USE < database > 
go 

/*extract all tables */

SELECT
   'MERGE (' + TABLE_NAME + ':table {table_name:"' + TABLE_NAME + '"});' 
FROM
   INFORMATION_SCHEMA.TABLES Tab 
WHERE
   TABLE_TYPE = 'BASE TABLE' 
ORDER BY
   table_name;
go 


/* extract all relationships */ 

SELECT
   'MATCH (' + OBJECT_NAME(referenced_object_id) + ':table{table_name:"' + OBJECT_NAME(referenced_object_id) + '"}),(' + OBJECT_NAME(parent_object_id) + ':table{table_name:"' + OBJECT_NAME(parent_object_id) + '"}) MERGE (' + OBJECT_NAME(referenced_object_id) + ')-[r:' + OBJECT_NAME(constraint_object_id) + ']->(' + OBJECT_NAME(parent_object_id) + ');' 
FROM
   sys.foreign_key_columns 
ORDER BY
   referenced_object_id;

For the demo, I used the “AdventureWorks” sample database from SQL Server and executed the above queries in SQL Server Management Studio. Make sure you have selected “Results to Text” in SSMS for easy copying of the results.

Tables

Relationships

Copy the results and execute them in the Neo4j database.
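
Alternatively, the copy-paste step can be scripted. The sketch below is not part of the original demo: it pulls the generated statements with pyodbc and runs them through the Neo4j Python driver. The connection string, credentials and driver name are placeholders, and the same loop can be repeated for the relationships query.

import pyodbc
from neo4j import GraphDatabase

# The tables query from above; the relationships query can be handled the same way.
TABLES_SQL = """
SELECT 'MERGE (' + TABLE_NAME + ':table {table_name:"' + TABLE_NAME + '"});'
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_NAME;
"""

# Connection string, credentials and ODBC driver name are placeholders.
sql_conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=AdventureWorks;Trusted_Connection=yes;"
)
neo4j_driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with neo4j_driver.session() as session:
    for (statement,) in sql_conn.cursor().execute(TABLES_SQL):
        session.run(statement)  # each row is a standalone Cypher statement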

Note:

  1. The script will also work in Neo4j 3.x.
  2. Some of the relationship names from SQL Server are longer than 25 characters; creating those relationships in Neo4j will fail. In my next release of the code, I will generate unique relationship names.


Categories
Neo4j

Generate Demo Data in Neo4j using neo4j-faker

Introduction

There are times when we need to quickly generate demo datasets in Neo4j to brainstorm a solution for a critical application, before creating or loading the actual data.

GraphAware's GraphGen was useful for generating small-to-medium datasets. But if we need higher volume and more flexibility on labels and properties, it becomes a challenge.

For such situations, we can use the neo4j-faker contribution.

Installation

  1. Download the latest release from GitHub into a temp directory.
  2. Unzip the distribution zip file into a directory.
  3. Copy all the contents of the dist directory to the neo4j server plugin directory.
  4. The dist directory has the following contents
    1. neo4jFaker-x.x.x.jar – The main jar file for faker.
    2. ddgress – Directory where loader files, csv and text files are stored to be used along with faker. (Example – cities.txt contains a list of cities to be used for data generation).
  5. Modify the below lines in neo4j.conf, and restart neo4j service
    • Add the following line in the neo4j.conf file to allow access to the faker functions.
      • dbms.security.procedures.unrestricted=fkr.*
    • Add the following line to your neo4j.conf file
      • dbms.unmanaged_extension_classes=org.neo4j.faker.ump=/testdata

Installation Verification

  1. Create a test database named “demo” in Neo4j.
  2. For demo purposes, let's quickly create a demo dataset using an existing property file in the ddgress folder – example1.props
  3. Browse the URL from localhost or use curl.
    • curl http://127.0.0.1:7474/testdata/tdg/pfile/example1.props?dbname=demo
  4. If there are no errors in the installation, then 2,830 nodes, 5,600 relationships and 28,090 properties will be created, as shown at the end of the log output below.
DemoDataGen version 0.9.0
STARTING on Tue Jun 30 20:45:55 EDT 2020
tdgRoot /root/neo4j-enterprise-4.1.0/plugins/ Property file to be loaded /root/neo4j-enterprise-4.1.0/plugins/ddgres/example1.props
Tue Jun 30 20:45:55 EDT 2020 start :
Commit size 2000
Reading node list
tgd_id INDEX : true
processing node definition p1:Person:2000
vals.length 3
-- 0 p1 -- 1 Person -- 2 2000
found property ~Person for label Person
found property caption for label Person
found property address for label Person
found property credit for label Person
processing node definition p2:Person:800
vals.length 3
-- 0 p2 -- 1 Person -- 2 800
found property ~Person for label Person
found property caption for label Person
found property address for label Person
found property credit for label Person
found property caption for alias p2
processing node definition cit:City:30
vals.length 3
-- 0 cit -- 1 City -- 2 30
found property country for label City
found property name for label City
Reading lookup nodes
Reading relation definitions
found property weight for relation personPerson1
found property weight for relation personPerson2
Initializing faker library
Initializing Name generator
found node -alias p1 -label:Person -indexProperty: tdg_id -amount:2000
found node -alias p2 -label:Person -indexProperty: tdg_id -amount:800
found node -alias cit -label:City -indexProperty: tdg_id -amount:30
processing relation personPerson1
START node identifier p1
END node identifier p2
processing p1 many to one 2000 - p2
processing relation personPerson2
START node identifier p2
END node identifier p1
processing p2 many to one 800 - p1
processing relation personCity1
START node identifier p1
END node identifier cit
processing p1 many to one 2000 - cit
processing relation personCity2
START node identifier p2
END node identifier cit
processing p2 many to one 800 - cit
nodeCount 2830
relation count 5600
property count 28090
finished total time was (ms): 1070
load speed node and relations per second: 7000.0
Tue Jun 30 20:45:56 EDT 2020 END :

5. Query the Neo4j database from the Browser to verify the data created by faker.

6. Query the schema of the created dataset.

Project

Let's create demo data using the faker procedures and functions available in the plugin.

For our project, let's create 3 labels – Person, CreditCard and City.

Each Person owns one or more CreditCards and resides in a City.

First, let's create the Person data –

foreach (i in range(0,100) |
create (p:Person { uid : i })
set p += fkr.person('1960-01-01','2001-01-01')
set p.ssn = fkr.code('### ### ####')
set p.email = fkr.email()
);

The above command creates

  • 100 Person nodes, as specified in the range.
  • Sets the Person's Date of Birth to a value between 1960-01-01 and 2001-01-01 via fkr.person.
  • Sets the Person's ssn in the “### ### ####” format using fkr.code, and the email using fkr.email().
    • fkr.code generates a random number in the format specified.
  • fkr.person also creates First Name and Last Name properties and concatenates them into a Full Name.
  • The email address is generated from the first name and last name with domain names like yahoo.com, google.com, etc.

Let's create the CreditCard data –

foreach (a in range(0,500) |
        create (c:CreditCard {cardnum : fkr.code('#### #### #### ####')})
        set c.limit = fkr.longElement('5000,7500,8000,2500,1200,2000,4000,10000')
        set c.balance = fkr.number(10,1000)
        set c.card_type = fkr.nextStringElement('VISA,MasterCard,Paypal,Discover,AMEX')
);

The above code creates

  • 500 CreditCard nodes, as specified in the range.
  • Sets the limit of each card using fkr.longElement.
  • Sets the balance on each CreditCard to a value between 10 and 1,000 using fkr.number.
  • Sets a card type from the list of types passed to fkr.nextStringElement.

Let's create the Cities –

foreach (ci in range(0,40) |
     merge (cit:City { name : fkr.stringFromFile("cities.txt") })
);

The above code creates

  • 40 City nodes with city names randomly picked from “cities.txt” inside the ddgress folder.

Finally, we will create Relationships between Person, CreditCard and City nodes.

The below command uses fkr.createRelations to create the Relationships.

match (person:Person) with collect(person) as persons
match (card:CreditCard) with persons, collect(card) as cards
match (city:City) with persons,cards, collect(city) as cities
   with persons, cards,cities
      call fkr.createRelations(persons, "HAS_CARDS", cards, "1-n") yield relationships as cardsRelations
      call fkr.createRelations(persons, "LIVES_IN", cities, "n-1") yield relationships as citieselations
return persons, cards, cities;

The fkr.createRelations procedure has a signature accepting 4 parameters:

  • startNodes
  • relationshipType
  • endNodes
  • cardinality, where
    • ‘1-n’ -> The end node may have max 1 relation from the start node
    • ‘n-1’ -> The start node may have max one relation to the end node
    • ‘1-1’ -> The start and end node may have max one relationship of this type

Query Neo4j for Data
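
As a quick sanity check (added here for illustration, not part of the original post), the generated graph can be queried through the Python driver; URI and credentials are placeholders:

from neo4j import GraphDatabase

# URI and credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    record = session.run(
        "MATCH (p:Person)-[:HAS_CARDS]->(c:CreditCard), (p)-[:LIVES_IN]->(ci:City) "
        "RETURN count(DISTINCT p) AS persons, count(DISTINCT c) AS cards, "
        "count(DISTINCT ci) AS cities"
    ).single()
    print(record.data())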

As you can see, generating demo data using neo4j-faker is pretty easy and quick.

The above method was implemented using procedures and functions. In my next blog, we will use the property files in ddgress to generate data.

Stay tuned …..


Categories
Neo4j

Neo4j 4.1.0

Neo4j 4.1.0 has been released. Below are a few highlights.


Graph privileges in Role-Based Access Control (RBAC) security – The new privileges provide fine-grained control over write operations in a graph: administrators can grant, deny or revoke users the privilege to create, delete or overwrite a graph element; create, remove or update a property; and create or remove a node label.

Database privileges for transaction management – Administrators can grant, deny or revoke users the privilege to show or terminate transactions executed by a given list of users in a given list of databases.

Database management privileges – Administrators can grant, deny or revoke users the privilege to create or remove databases.

User management privileges – Administrators can grant, deny or revoke users the privilege to create, modify, remove or show users in a standalone or clustered environment.

Role and privilege management privileges – Administrators can grant, deny or revoke users the privilege to create, modify, remove, assign or show roles in a standalone or clustered environment. Privileges are also set to assign, remove or show privileges in roles.

PUBLIC built-in role – The PUBLIC role is automatically assigned to all users; it allows administrators to set a default security model for a standalone or clustered environment.

SHOW commands improvements – The SHOW DATABASES, SHOW PRIVILEGES, SHOW ROLES and SHOW USERS commands provide new filtering and sorting capabilities for columns and rows.

Rolling upgrades improvements – Clustered installations can be upgraded without downtime. Rolling upgrades can be applied from any patch to the next patched version. This feature is also available when administrators upgrade Neo4j from any 4.0 version to 4.1.

Memory management settings and memory monitoring procedures – Administrators can set the amount of heap space used to manage transactions. Settings are per transaction, per database and for the whole DBMS. Transactions that exceed the threshold are killed. The amount of memory used by transactions or by other internal components can be monitored with the two procedures dbms.listTransactions() and dbms.listPools(), as shown below.
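
For example, the two monitoring procedures named above can be called like any other procedure; here through the Python driver, with URI and credentials as placeholders:

from neo4j import GraphDatabase

# URI and credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    for rec in session.run("CALL dbms.listPools()"):
        print(rec.data())
    for rec in session.run("CALL dbms.listTransactions()"):
        print(rec.data())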

Embedded Causal Clustering – Neo4j 4.1 can be used as a library embedded into custom applications, either as a standalone system or as a clustered environment using the Raft consensus protocol available with Causal Cluster.

Cluster Leadership Control – Neo4j 4.1 provides more control over leadership transfer with the Raft consensus protocol. Administrators can define preferable leaders in order to balance read-only and read-write operations according to the system infrastructure, for example when a cluster is formed by members in multiple data centers.

Cluster Leadership Balancing – In Neo4j 4.1, leaders of different Raft groups in a multidatabase environment can be automatically balanced among the members of the cluster.

Cypher Query Replanning Option – Query replanning can be controlled with pre-parser options of Cypher. Queries can be forced to be replanned, or the replanning can be skipped, overruling the threshold rules in place for planning. This is particularly useful when administrators set up batched replanning scripts and seek more predictable query response times during interactive operations.

Cypher PIPELINED Runtime operators – The PIPELINED runtime can now serve more than 85% of the cases in read queries. Query performance is generally better with the new runtime compared to the still-available SLOTTED runtime: tests show throughput improvements from 10% to 35%, depending on the scale factor.

Improvements in EXPLAIN and PROFILE commands – In Neo4j 4.1, EXPLAIN and PROFILE output provides more details of the query plan adopted by the Cypher runtime. Improvements include operator-specific information, multi-row descriptions, variables used in the query execution and memory usage.

Database routing in Cypher – Cypher DML and DDL commands can be preceded by a USE clause that allows a user to route a query to a specific database.

Automatic routing of administration commands – Cypher administration commands used for authentication and authorization management, database management, etc. are automatically routed to the system database. Prior to Neo4j 4.1, administrators had to instantiate a new session to the system database in order to issue an administration command.

Server-side routing for read/write queries in a clustered environment – In Neo4j 4.1 a client application can connect to any member of a cluster; the Cypher runtime routes read-write transactions to the leader of a Raft group, whilst read-only transactions can be executed by any member of the cluster. In order to use this feature, client applications must use 4.1-compatible drivers.

New configuration for Bolt connections – The internal Bolt server in Neo4j 4.1 provides settings to keep the connection alive during long-running transactions. This setting is particularly useful in environments where load balancers and firewalls have strict rules for killing idle connections. The new settings also provide better protection against malicious client attacks with unsuccessful authentication.