Running a JSON Schema Validator in Python using the Java Native Interface

Feb 21, 2020 11:50 PM

In a quest to find an easy way to run a Java library from Python, I stumbled upon the pyjnius library. It hooks into the Java Native Interface (JNI) so Python can call into Java classes. It’s pleasant to use, especially for string processing tasks that are easier to script in Python.

I used jnius and a few tricks to upgrade a validation workflow in a JSON Schema repository. We once used RapidJSON in our data platform for validation at ingestion-time, which only supports v4 of the JSON Schema specification. However, we migrated over to Google Cloud Platform and now use the everit-org.json-schema Java library for JSON Schema. This means we are not longer tied down to v4 of the specification.

I chose to use Python as the language for putting together the validation suite for the schema repository; pytest makes it easy to dynamically generate cases from files and I wanted to compare everit-org.json-schema to the native jsonschema Python library. In this post, I’ll illustrate the workflow that’s necessary to call everit-org.json-schema from Python.

We will get the following code snippet to run:

#!/usr/bin/env python3
import json

# omitted: adding jars into the class path
from jnius import autoclass

# validation logic
JSONObject = autoclass("org.json.JSONObject")
SchemaLoader = autoclass("org.everit.json.schema.loader.SchemaLoader")

schema_data = JSONObject(json.dumps(...))
payload_data = JSONObject(json.dumps(...))

schema = SchemaLoader.load(schema_data)
assert schema.validate(JSONObject(payload_data)

First we need to make any dependencies are available to the JVM. The easiest way to manage these are through Maven, a Java package manager.

In a pom.xml, we create a minimal example.

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>example</artifactId>
    <version>1</version>
    <dependencies>
        <dependency>
            <groupId>com.github.everit-org.json-schema</groupId>
            <artifactId>org.everit.json.schema</artifactId>
            <version>1.12.1</version>
        </dependency>
    </dependencies>
    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>
</project>

Now run maven to copy dependencies into the local directory.

mvn dependency:copy-dependencies

This downloads the transitive dependencies needed for everit-org.json-schema e.g. org.JSON into a target/dependencies folder. We include the downloaded JARs into our python script by setting the CLASSPATH variable. Each jar is separated by a colon, :.

#!/usr/bin/env python3
# omitted: other imports
import os
from pathlib import Path

os.environ["CLASSPATH"] = ":".join(
  [p.resolve().as_posix() for p in Path("target").glob("**/*.jar")]

from jnius import autoclass

# omitted: validation logic

We also need to make sure that JAVA_HOME is set up correctly. To avoid issues in our local environment, we will set up a docker container instead.

FROM centos:centos8

RUN dnf -y update && 
    dnf -y install epel-release && 
    dnf -y install 
        which 
        python36 
        java-1.8.0-openjdk-devel 
        maven 
    && dnf clean all

COPY main.py /app/main.py
COPY pom.xml /app/pom.xml
RUN mvn dependency:copy-dependencies
RUN pip3 install pyjnius

CMD python3 main.py

Running this will let us successfully validate a schema.

#!/bin/bash

tag=validator-blog:latest
docker build -t $tag .
docker run -it $tag

If you would like to see a real world example, see this PR to mozilla-pipeline-schemas.

I’m satisfied with the end result, since it was easier than I expected to set up. We now have continuous integration that runs our schemas against examples using two independent implementations of the JSON Schema specification, through the help of Python and the JNI.