Reference

Python Types

Python4j and python types

Types overview

By default, python4j supports all of the python primitive types. These types include:

1. STR: the string type, mapped to java's String type
2. INT: the integer type, mapped to java's long type
3. FLOAT: the float type, mapped to java's double type
4. BOOL: the boolean type, mapped to java's boolean type
5. BYTES: the bytes type, mapped to java's byte array

These built-in types are available to users through the PythonTypes class, for example PythonTypes.STR.

Numpy arrays

If a user wants direct support for numpy arrays as types, a separate dependency, python4j-numpy, is required. Instructions for including python4j-numpy in your project, for a number of project setups, can be found here.

Our numpy support provides seamless interop with the numpy format: numpy arrays can be used inside and outside of python scripts through the exact same memory references, with zero copy. This lets users benefit from their numpy expertise while getting reasonable performance from the nd4j framework, which provides the core data structure in java.

Note that only dense arrays are supported at this time. When nd4j supports sparse and complex types, we will update this page. Please feel free to ask about this on our support forums.

Custom types

If a user needs to specify a custom type, all they need to do is extend our PythonType class.

Python4j discovers which custom types are available through the java service loader interface. This means a user needs to have a META-INF/services folder on the classpath (usually under src/main/resources) in their project, containing a file named after the fully qualified class name org.nd4j.python4j.PythonType. Each type listed in that file should be the fully qualified class name of a class that extends the abstract class org.nd4j.python4j.PythonType.
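As a sketch of the layout described above, the service file could look roughly like this. The class name com.example.types.MyCustomType is hypothetical and only stands in for your own PythonType subclass.

```text
# File: src/main/resources/META-INF/services/org.nd4j.python4j.PythonType
# One fully qualified implementation class per line.
# com.example.types.MyCustomType is a hypothetical name used for illustration.
com.example.types.MyCustomType
```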

For sample implementations of PythonType, please see the inline declarations in our PythonTypes top level class and use those as a starting point. You will need to implement an adapt method and specify the appropriate mapping of python type to java type.

Python Path

Python4j and custom python path

The default python path

By default, javacpp provides a python path for us by listing its bundled dependencies that are provided in the javacpp jar files. This includes numpy as well if the user is using our python4j-numpy module. However, in real world applications many users will need additional libraries in order to run the scripts they built in other environments.

Specifying a custom python path

Python4j allows a user to specify a custom python path. This python path should come from the same python version as the one provided by python4j. In order to specify a custom python path, a user should be aware of the following properties:

1. org.eclipse.python4j.path: This system property is where a user can specify which python path to use. A user can obtain this python path with this small snippet:

import sys
import os
print(os.pathsep.join(sys.path))

2. org.eclipse.python4j.path.append: This system property controls how the custom python path interoperates with the python path provided by javacpp. A user can select none, before, or after. This affects the loading order for all libraries. A dependency clash can happen if, for example, a user uses a different version of numpy from the one in javacpp. In order to avoid clashes, specify none for the system property.
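To show how the two properties fit together, here is a minimal sketch that extends the snippet above and prints both properties as -D JVM arguments. The jvm_flags helper is a hypothetical name introduced here and is not part of python4j. Passing the properties as -D arguments is an assumption; it is one common way to set java system properties before a library initializes.

```python
import os
import sys

# Values the text above lists for org.eclipse.python4j.path.append.
APPEND_MODES = ("none", "before", "after")


def jvm_flags(path_entries, append_mode="none"):
    """Build -D arguments for the two python4j path properties.

    Hypothetical helper for illustration; not part of python4j.
    """
    if append_mode not in APPEND_MODES:
        raise ValueError(f"append_mode must be one of {APPEND_MODES}")
    # Skip empty entries ('' in sys.path stands for the current directory).
    python_path = os.pathsep.join(p for p in path_entries if p)
    return [
        f"-Dorg.eclipse.python4j.path={python_path}",
        f"-Dorg.eclipse.python4j.path.append={append_mode}",
    ]


# Run this with the interpreter whose libraries python4j should see.
print(" ".join(jvm_flags(sys.path)))
```

Quote the printed arguments if any path entry contains spaces. The sketch defaults to none only because the text above recommends it for avoiding clashes.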

Creating a custom python path setup using miniconda

It is recommended to use an embedded miniconda zipped up as an archive for distributing any dependencies needed for a target platform. In order to set up miniconda, please see anaconda's installation guide.

Afterwards, run the needed conda install commands from the miniconda install directory on the target system. From there, run the snippet shown above in our custom python path section, using the miniconda python interpreter.

This will print the python path you need to pass to python4j before it initializes.

Garbage Collection

Python4j Garbage Collection and interactions with the JVM

Python garbage collection

Python garbage collection uses reference counting. This happens entirely within the cpython runtime, which python4j bundles.
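A minimal sketch of reference counting in cpython, using only the standard library. The Payload class is a placeholder name introduced for illustration. The absolute numbers returned by sys.getrefcount vary between python versions, so only the difference between two readings is meaningful.

```python
import sys
import weakref


class Payload:
    """Placeholder object; a bare object() cannot be weakly referenced."""


obj = Payload()
probe = weakref.ref(obj)           # a weak reference does not keep obj alive

before = sys.getrefcount(obj)
alias = obj                        # add a second strong reference
with_alias = sys.getrefcount(obj)
print(with_alias - before)         # the count grew by one

del alias                          # drop the extra reference
del obj                            # drop the last strong reference
print(probe() is None)             # True once cpython has freed the object
```

Because the object is freed as soon as its last reference goes away, anything that still points at it from outside the interpreter has to be managed carefully.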

Javacpp's memory management

Javacpp uses reference counters and phantom references for all of its pointer objects. Please see here for the core behavior.

Javacpp garbage collection with python GC

End users should be aware of potential race conditions between javacpp's memory management and python's GC deleting pointers. When accessing in-memory python variables from java, depending on what code is written in python and how variables are managed, a variable may be garbage collected before it is referenced, causing a crash.

In order to avoid issues, try to use python scripts transactionally. This means running them within a try-with-resources block that locks the GIL, as described in the overview.

If python variables need to be kept in memory, ensure proper context management of the in-memory python variables. Context management in this case means using a separate python context (essentially a separate interpreter) if a user wants full isolation. Full isolation is not required in most cases, but may be desirable for certain use cases. For now, if you would like more information on this topic, please ask on our forums and see our unit tests for usage.

Python Script Execution

Python4j Python Script Execution

Script execution overview

Python4j runs and manages python interpreters as well as the GIL for a user. When a user attempts to execute a python script, they are calling into cpython's PyRun routines. This invokes cpython directly. This functionality can be found in the PythonExecutioner.

When invoking cpython, we also wrap the code the user specifies in a python script. This script can be found here. It mainly ensures that python stack traces are printed properly if an error occurs, by making sure stdout/stderr for the given in-memory python interpreter are flushed properly.
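The idea behind such a wrapper can be sketched as follows. This is not the actual python4j wrapper script. The name run_user_code is hypothetical, and the sketch only assumes the behavior described above: run the user's code, print the stack trace on failure, and flush stdout/stderr.

```python
import sys
import traceback


def run_user_code(code, variables=None):
    """Execute `code` and return the namespace it ran in.

    Hypothetical illustration only; not the script python4j ships.
    """
    namespace = dict(variables or {})
    try:
        exec(code, namespace)
    except Exception:
        # Print the full python stack trace to stderr, then re-raise.
        traceback.print_exc()
        raise
    finally:
        # Flush both streams so buffered output is not lost.
        sys.stdout.flush()
        sys.stderr.flush()
    return namespace


result = run_user_code("z = x + y", {"x": "Hello ", "y": "World"})
print(result["z"])
```

Flushing in the finally clause means the streams are flushed on both the success and the failure path.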

Initialization

Note that before a python script can execute, python4j needs to initialize itself. This happens in the static initialization block of PythonExecutioner. Be aware of this when executing python scripts.

Python variable input and output

Python4j allows users to pass in and retrieve in-memory python variables. A simple example:

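import java.util.ArrayList;
import java.util.List;
import org.nd4j.python4j.*;

// Lock the GIL for the duration of the script.
try (PythonGIL pythonGIL = PythonGIL.lock()) {
    List<PythonVariable> inputs = new ArrayList<>();
    inputs.add(new PythonVariable<>("x", PythonTypes.STR, "Hello "));
    inputs.add(new PythonVariable<>("y", PythonTypes.STR, "World"));
    String code = "print(x + y)";
    // No output variables are requested, so the last argument is null.
    PythonExecutioner.exec(code, inputs, null);
}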

In our case here, we're not retrieving variables, but just passing 2 strings in for printing.

For retrieving variables, we can do either of the following:

try (PythonGIL pythonGIL = PythonGIL.lock()) {
    String code = "a = 5\nb = '10'\nc = 20.0";
    List<PythonVariable> vars = PythonExecutioner.execAndReturnAllVariables(code);
}

execAndReturnAllVariables allows us to retrieve, by name, any variable that was created during the python execution. The returned PythonVariables will be named as they were in the python script.
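On the python side, "every variable created by the script" corresponds to the names left in the script's namespace after it runs. The following plain python sketch illustrates that idea for the same script. It is a conceptual illustration only, not how python4j implements execAndReturnAllVariables.

```python
code = "a = 5\nb = '10'\nc = 20.0"

namespace = {}
exec(code, namespace)

# exec() adds __builtins__ to the namespace; keep only the script's own names.
created = {name: value for name, value in namespace.items()
           if not name.startswith("__")}

for name, value in created.items():
    print(name, type(value).__name__, repr(value))
```

Here a is an int, b is a str and c is a float, which correspond to the INT, STR and FLOAT types listed in the types overview.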

Optionally, a user may also specify a list of variables to be returned. This can be achieved by passing in an output variable list to PythonExecutioner.exec(..) as follows:

try (PythonGIL pythonGIL = PythonGIL.lock()) {
    List<PythonVariable> inputs = new ArrayList<>();
    inputs.add(new PythonVariable<>("x", PythonTypes.STR, "Hello "));
    inputs.add(new PythonVariable<>("y", PythonTypes.STR, "World"));
    // Declare the output variable by name and type; it has no value yet.
    PythonVariable out = new PythonVariable<>("z", PythonTypes.STR);
    String code = "z = x + y";
    // The script assigns z, which is requested here as the only output.
    PythonExecutioner.exec(code, inputs, Collections.singletonList(out));
}

Execution in a multi threaded environment

Python4j is capable of multi-threaded execution of python scripts. The user can manage multiple threads by ensuring that all python calls are wrapped in the double try-with-resources block mentioned in the overview. When every python call locks the GIL and watches garbage collection in this way, GIL management is handled automatically for the user.

Optionally, a user may also use the PythonContextManager - creating one context per thread. A "context" is essentially a separate python interpreter with its own variables, memory etc.