Running a JSON Schema Validator in Python using the Java Native Interface
Feb 21, 2020 11:50 PM
In a quest to find an easy way to run a Java library from Python, I stumbled upon the pyjnius library. It hooks into the Java Native Interface (JNI) so that Python can call into Java classes. It’s pleasant to use, especially for string processing tasks that are easier to script in Python.
I used jnius and a few tricks to upgrade a validation workflow in a JSON Schema repository. We once used RapidJSON in our data platform for validation at ingestion time; it only supports v4 of the JSON Schema specification. However, we have since migrated to Google Cloud Platform and now use the everit-org.json-schema Java library for JSON Schema validation. This means we are no longer tied to v4 of the specification.
I chose Python as the language for putting together the validation suite for the schema repository: pytest makes it easy to dynamically generate cases from files, and I wanted to compare everit-org.json-schema to the native jsonschema Python library. In this post, I’ll illustrate the workflow needed to call everit-org.json-schema from Python.
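As a rough illustration of what "generating cases from files" can look like, the sketch below pairs each schema with its example documents. The directory layout (`schemas/<name>.schema.json` next to `examples/<name>/*.json`) and the helper names `collect_cases` and `load_case` are assumptions made up for this example, not the layout of the real repository.

```python
import json
from pathlib import Path


def collect_cases(root):
    """Return sorted (schema_path, example_path) pairs found under root.

    Assumed (hypothetical) layout:
        root/schemas/<name>.schema.json
        root/examples/<name>/*.json
    """
    root = Path(root)
    suffix = ".schema.json"
    cases = []
    for schema_path in sorted(root.glob("schemas/*" + suffix)):
        name = schema_path.name[: -len(suffix)]
        example_dir = root / "examples" / name
        if not example_dir.is_dir():
            # a schema without examples simply contributes no cases
            continue
        for example_path in sorted(example_dir.glob("*.json")):
            cases.append((schema_path, example_path))
    return cases


def load_case(schema_path, example_path):
    """Read both files and return them as parsed JSON."""
    schema = json.loads(Path(schema_path).read_text())
    example = json.loads(Path(example_path).read_text())
    return schema, example
```

The resulting list can be handed to `pytest.mark.parametrize("schema_path,example_path", collect_cases("."))` so that every pair shows up as its own test case.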
We will get the following code snippet to run:
#!/usr/bin/env python3
import json
# omitted: adding jars into the class path
from jnius import autoclass
# validation logic
JSONObject = autoclass("org.json.JSONObject")
SchemaLoader = autoclass("org.everit.json.schema.loader.SchemaLoader")
schema_data = JSONObject(json.dumps(...))
payload_data = JSONObject(json.dumps(...))
schema = SchemaLoader.load(schema_data)
# validate() returns nothing; it raises a Java ValidationException if the payload is invalid
schema.validate(payload_data)
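In everit-org.json-schema, `validate` reports failure by throwing a `ValidationException` rather than by returning a value, so a test suite usually wants to turn that exception into a result. Below is a minimal sketch of such a wrapper. The name `check` is hypothetical, and the sketch assumes that pyjnius surfaces the Java exception as `jnius.JavaException`, which would be passed in as `error_type`.

```python
def check(schema, payload, error_type=Exception):
    """Call schema.validate(payload) and report the outcome as a tuple.

    Returns (True, None) when validate() returns normally and
    (False, message) when it raises error_type.
    """
    try:
        schema.validate(payload)
    except error_type as exc:
        return False, str(exc)
    return True, None
```

With pyjnius this would presumably be called as `check(schema, payload_data, error_type=JavaException)` after `from jnius import JavaException`. Keeping the exception type as a parameter means the same wrapper can also be pointed at a pure-Python validator.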
First we need to make sure any dependencies are available to the JVM. The easiest way to manage these is through Maven, a build and dependency management tool for Java.
In a pom.xml, we create a minimal example.
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>example</artifactId>
  <version>1</version>
  <dependencies>
    <dependency>
      <groupId>com.github.everit-org.json-schema</groupId>
      <artifactId>org.everit.json.schema</artifactId>
      <version>1.12.1</version>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>jitpack.io</id>
      <url>https://jitpack.io</url>
    </repository>
  </repositories>
</project>
Now run Maven to copy the dependencies into the local directory.
mvn dependency:copy-dependencies
This downloads everit-org.json-schema and its transitive dependencies (e.g. org.json) into a target/dependency folder. We make the downloaded JARs available to our Python script by setting the CLASSPATH variable. Each JAR is separated by a colon (:) on Unix-like systems.
#!/usr/bin/env python3
# omitted: other imports
import os
from pathlib import Path
os.environ["CLASSPATH"] = ":".join(
    [p.resolve().as_posix() for p in Path("target").glob("**/*.jar")]
)
# import jnius only after CLASSPATH is set, since the JVM picks it up at startup
from jnius import autoclass
# omitted: validation logic
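The colon separator only applies on Unix-like systems. As a slightly more portable variant, the sketch below joins the JAR paths with `os.pathsep` and sorts them so that the classpath is the same from run to run. The function name `build_classpath` is hypothetical.

```python
import os
from pathlib import Path


def build_classpath(root="target"):
    """Join the absolute paths of all JARs under root into a CLASSPATH string.

    os.pathsep is ":" on Unix-like systems and ";" on Windows.
    """
    jars = sorted(str(p.resolve()) for p in Path(root).glob("**/*.jar"))
    return os.pathsep.join(jars)
```

It would be used as `os.environ["CLASSPATH"] = build_classpath()`, and that assignment still has to happen before `jnius` is imported.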
We also need to make sure that JAVA_HOME is set up correctly. To avoid issues in our local environment, we will set up a Docker container instead.
FROM centos:centos8
RUN dnf -y update && \
    dnf -y install epel-release && \
    dnf -y install \
        which \
        python36 \
        java-1.8.0-openjdk-devel \
        maven \
    && dnf clean all
# run the remaining steps from /app, where pom.xml and main.py live
WORKDIR /app
COPY main.py /app/main.py
COPY pom.xml /app/pom.xml
RUN mvn dependency:copy-dependencies
RUN pip3 install pyjnius
CMD python3 main.py
Building and running this image lets us validate a document against a schema.
#!/bin/bash
tag=validator-blog:latest
docker build -t $tag .
docker run -it $tag
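For anyone running the script outside the container, it can help to fail early when JAVA_HOME is missing or points at the wrong place. The sketch below only checks that a `java` executable exists under `JAVA_HOME/bin`. The helper name `find_java` is hypothetical, and pyjnius itself may require more of the JDK layout than this check covers.

```python
import os
from pathlib import Path


def find_java(env=None):
    """Return the path of the java executable under JAVA_HOME, or None.

    env defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None
    # "java.exe" covers Windows installations
    for name in ("java", "java.exe"):
        candidate = Path(java_home) / "bin" / name
        if candidate.is_file():
            return candidate
    return None
```

A script could call `find_java()` before setting CLASSPATH and exit with a readable message when it returns `None`.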
If you would like to see a real-world example, see this PR to mozilla-pipeline-schemas.
I’m satisfied with the end result, since it was easier to set up than I expected. We now have continuous integration that runs our schemas against examples using two independent implementations of the JSON Schema specification, with the help of Python and the JNI.