This is a course from Graphlet AI on full-stack graph machine learning taught by Russell Jurney.
This class uses a Docker image, `rjurney/graphml-class`. To bring it up as the `jupyter` service along with Neo4j, run:
```bash
# Pull the Docker images BEFORE class starts, or it can take a while on a shared connection
docker compose pull

# Run a Jupyter Notebook container in the background with all requirements.txt installed
docker compose up -d

# Tail the Jupyter logs to see the JupyterLab URL to connect to in your browser
docker logs jupyter -f --tail 100
```

To shut down Docker, be in this folder and type:

```bash
docker compose down
```

You say potato, I say patato... the `docker compose` command changed in recent versions :)

NOTE: older versions of Docker may use the command `docker-compose` rather than the two-word command `docker compose`.
To edit code in VSCode, you may want a local Anaconda Python environment with the class's PyPI libraries installed. This will enable VSCode to parse the code, understand APIs and highlight errors.

Note: if you do not use Anaconda, consider using it :) You can use a Python 3 venv in the same way as conda.
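If you go the venv route, a minimal equivalent of the conda steps below might look like this (assuming a local Python 3 is on your `PATH`; the `.venv` directory name is just a common convention):

```shell
# Create and activate a virtual environment in the project folder
python3 -m venv .venv
source .venv/bin/activate

# Install the project's libraries into it, as with conda
poetry install
```

Poetry installs into whichever environment is active, so activating the venv first keeps the class's dependencies out of your system Python.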
Create a new Anaconda environment:
```bash
conda create -n graphml python=3.10.11 -y
```

Activate the environment:

```bash
conda activate graphml
```

Install the project's libraries:

```bash
poetry install
```

You can select a Python environment in VSCode by typing `SHIFT-CMD-P` to bring up a command search window. Now type Python or Interpreter or, if you see it, select Python: Select Interpreter. Now choose the path to your conda environment. It will include the name of the environment, such as:

```
Python 3.10.11 ('graphml') /opt/anaconda3/envs/graphml/bin/python
```

Note: the Python version is set to 3.10.11 because the Jupyter Docker Stacks images have not been updated more recently.
We build a knowledge graph from the Stack Exchange Archive for the network motif section of the course.
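The Stack Exchange Archive files store one record per `<row/>` element, with every field encoded as an XML attribute — which is why the conversion step below uses spark-xml. As a minimal standard-library sketch of that format (the sample rows here are made up, not taken from the real dump):

```python
import xml.etree.ElementTree as ET

# Made-up sample in the Stack Exchange dump style: one <row/> per record,
# all fields as XML attributes.
SAMPLE = """<posts>
  <row Id="1" PostTypeId="1" Score="7" Title="What is a p-value?" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="12" />
</posts>"""


def rows_to_dicts(xml_text: str) -> list[dict]:
    """Parse Stack Exchange-style XML into a list of attribute dicts."""
    root = ET.fromstring(xml_text)
    return [dict(row.attrib) for row in root.iter("row")]


records = rows_to_dicts(SAMPLE)
print(records[0]["Title"])  # fields arrive as string-valued attributes
```

The real conversion is done at scale by spark-xml in `xml_to_parquet.py`; this sketch only shows the shape of the input data.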
To run a bash shell in the Jupyter container, type:

```bash
docker exec -it jupyter bash
```

Once you're there, you can run the following commands to download and prepare the data for the course.

First, download the data:

```bash
graphml_class/stats/download.py stats.meta
```

Then you will need to convert the data from XML to Parquet:
```bash
spark-submit --packages "com.databricks:spark-xml_2.12:0.18.0" graphml_class/stats/xml_to_parquet.py
```

The course covers knowledge graph construction in PySpark in `graphml_class/stats/graph.py`:
```bash
spark-submit graphml_class/stats/graph.py
```

This course now covers network motifs in property graphs (frequent patterns of structure) using PySpark / GraphFrames (see `motif.py`; no notebook yet).
GraphFrames supports directed motifs, not undirected ones. All of the 4-node motifs are outlined below. Note that GraphFrames can also filter the
paths returned by its `find()` method using any Spark DataFrame filter, enabling temporal and complex property graph motifs.
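GraphFrames expresses a motif as a pattern string over named vertices and edges, e.g. `g.find("(x)-[]->(y); (y)-[]->(z); (z)-[]->(x)")` for a directed 3-cycle. To show the idea without a Spark cluster, here is a toy brute-force matcher over a plain edge set — a sketch of the semantics only, not the GraphFrames implementation (the graph data is hypothetical, and a 3-node motif is used for brevity):

```python
from itertools import permutations

# Toy directed graph as a set of (src, dst) edges -- hypothetical sample data.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("d", "b")}
nodes = {n for edge in edges for n in edge}


def find_motif(pattern):
    """Enumerate node bindings that match a motif, given as a list of
    (src_var, dst_var) pairs -- the same idea as GraphFrames' find(),
    done here by brute force over all assignments of nodes to variables."""
    variables = sorted({v for pair in pattern for v in pair})
    matches = []
    for assignment in permutations(nodes, len(variables)):
        binding = dict(zip(variables, assignment))
        if all((binding[s], binding[d]) in edges for s, d in pattern):
            matches.append(binding)
    return matches


# Directed 3-cycle: (x)->(y); (y)->(z); (z)->(x).
# Each cycle is found once per rotation, as with GraphFrames' find().
cycles = find_motif([("x", "y"), ("y", "z"), ("z", "x")])
print(cycles)
```

In GraphFrames, `find()` returns a DataFrame of matched paths, so the filtering mentioned above is an ordinary DataFrame operation — for example, keeping only matches whose edge timestamps increase along the path to express a temporal motif.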

