Python4j and python types
Python4j supports all of the Python primitive types by default. These types are:
1. STR: the string type, mapped to Java's String type
2. INT: the integer type, mapped to Java's long type
3. FLOAT: the float type, mapped to Java's double type
4. BOOL: the boolean type, mapped to Java's boolean type
5. BYTES: the bytes type, mapped to Java's byte array
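As a quick reference, the mapping above can be written down as a plain-Java lookup table. This is only a sketch: the class name PrimitiveTypeMap and its javaTypeFor method are hypothetical, and the code does not use Python4j's own PythonTypes constants.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrimitiveTypeMap {

    // Python type name -> Java type, following the list of built-in primitives.
    private static final Map<String, Class<?>> MAPPING = new LinkedHashMap<>();

    static {
        MAPPING.put("str", String.class);
        MAPPING.put("int", long.class);
        MAPPING.put("float", double.class);
        MAPPING.put("bool", boolean.class);
        MAPPING.put("bytes", byte[].class);
    }

    /** Looks up the Java type for a built-in Python primitive; rejects anything else. */
    public static Class<?> javaTypeFor(String pythonTypeName) {
        Class<?> type = MAPPING.get(pythonTypeName);
        if (type == null) {
            throw new IllegalArgumentException("Not a built-in primitive type: " + pythonTypeName);
        }
        return type;
    }

    public static void main(String[] args) {
        // Prints lines such as "int -> long" and "bytes -> byte[]".
        MAPPING.forEach((pythonName, javaType) ->
                System.out.println(pythonName + " -> " + javaType.getSimpleName()));
    }
}
```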
These built-in types are made available to users through the PythonTypes class.
Direct support for NumPy arrays as types requires a separate dependency, python4j-numpy. Instructions for including python4j-numpy in your project can be found here.
Our NumPy support provides seamless interop with the NumPy format: NumPy arrays can be used inside and outside of Python scripts with zero copy, because both sides reference exactly the same memory. This lets users benefit from their NumPy expertise while still getting reasonable performance, with ND4J serving as the core data structure on the Java side.
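The zero-copy idea can be illustrated with nothing but Java NIO: two buffer views over one block of direct memory, where a write through one view is immediately visible through the other because no data was copied. This is an analogy only; it does not touch NumPy or ND4J, and ZeroCopyViews and its method are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

public class ZeroCopyViews {

    /** Writes through one view of a shared buffer and reads the same slot through another view. */
    public static double writeThenReadElsewhere(int index, double value) {
        // One block of off-heap (direct) memory, analogous to a native array buffer.
        ByteBuffer memory = ByteBuffer.allocateDirect(8 * Double.BYTES).order(ByteOrder.nativeOrder());

        // Two views over the same bytes. Neither call copies the data.
        DoubleBuffer writerView = memory.asDoubleBuffer();
        // duplicate() resets the byte order to big-endian, so set it again to match.
        DoubleBuffer readerView = memory.duplicate().order(ByteOrder.nativeOrder()).asDoubleBuffer();

        writerView.put(index, value);   // absolute put: does not move the position
        return readerView.get(index);   // sees the write because the memory is shared
    }

    public static void main(String[] args) {
        System.out.println(writeThenReadElsewhere(3, 42.5)); // prints 42.5
    }
}
```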
Note that only dense arrays are supported at this time. When ND4J supports sparse and complex types, we will update this page. Please feel free to ask about this on our support forums.
To define a custom type, extend our PythonType class.
Python4j discovers which custom types are available using Java's service loader mechanism. This means your project needs a META-INF/services folder on the classpath (usually under src/main/resources) containing a file named after the fully qualified class name org.nd4j.python4j.PythonType. Each entry in that file should be the fully qualified class name of one of your custom types, in the same form as the name of the abstract class org.nd4j.python4j.PythonType.
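The discovery mechanism itself is plain java.util.ServiceLoader. The sketch below builds a throwaway resources root containing a META-INF/services file and lets ServiceLoader find the provider listed in it. To keep it self-contained, the CustomType interface stands in for org.nd4j.python4j.PythonType and DecimalType is a hypothetical custom type; in a real project you would instead add a file named META-INF/services/org.nd4j.python4j.PythonType under src/main/resources and list your own fully qualified class names in it, one per line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceRegistrationDemo {

    /** Stand-in for the abstract org.nd4j.python4j.PythonType (NOT the real class). */
    public interface CustomType {
        String pythonName();
    }

    /** Hypothetical user-defined type to be discovered. */
    public static class DecimalType implements CustomType {
        public DecimalType() { }

        @Override
        public String pythonName() { return "decimal.Decimal"; }
    }

    /** Builds a temporary resources root with a services file, then discovers providers from it. */
    public static List<String> discover() {
        try {
            Path resources = Files.createTempDirectory("resources");
            Path servicesDir = resources.resolve("META-INF").resolve("services");
            Files.createDirectories(servicesDir);

            // File name = fully qualified name of the service type;
            // contents = one fully qualified provider class name per line ('#' starts a comment).
            Path servicesFile = servicesDir.resolve(CustomType.class.getName());
            Files.write(servicesFile, Arrays.asList(
                    "# custom types to register",
                    DecimalType.class.getName()), StandardCharsets.UTF_8);

            List<String> found = new ArrayList<>();
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {resources.toUri().toURL()},
                    ServiceRegistrationDemo.class.getClassLoader())) {
                for (CustomType type : ServiceLoader.load(CustomType.class, loader)) {
                    found.add(type.pythonName());
                }
            } finally {
                // Clean up the temporary directory tree, deepest entry first.
                Files.deleteIfExists(servicesFile);
                Files.deleteIfExists(servicesDir);
                Files.deleteIfExists(servicesDir.getParent());
                Files.deleteIfExists(resources);
            }
            return found;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover()); // prints [decimal.Decimal]
    }
}
```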
For sample implementations of PythonType, please see the inline declarations in our top-level PythonTypes class and use those as a starting point. You will need to implement an adapt method and specify the appropriate mapping from the Python type to the Java type.
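Below is a minimal sketch of what an adapt step can look like, written against a hypothetical SimpleType base class rather than the real PythonType, whose exact signature is not reproduced here. The assumption is that adapt takes an arbitrary Java value and either converts it to the type's Java representation or rejects it; IntType shows this for Python int mapped to Java long.

```java
public class AdaptSketch {

    /** Hypothetical, simplified stand-in for a PythonType: a Python name, a Java type, and adapt(). */
    public abstract static class SimpleType<T> {
        private final String pythonName;
        private final Class<T> javaType;

        protected SimpleType(String pythonName, Class<T> javaType) {
            this.pythonName = pythonName;
            this.javaType = javaType;
        }

        public String pythonName() { return pythonName; }

        public Class<T> javaType() { return javaType; }

        /** Converts an incoming Java value to this type's Java representation, or throws. */
        public abstract T adapt(Object javaObject);
    }

    /** Example mapping: Python int to Java Long, also accepting narrower integral boxes. */
    public static final class IntType extends SimpleType<Long> {
        public IntType() {
            super("int", Long.class);
        }

        @Override
        public Long adapt(Object javaObject) {
            if (javaObject instanceof Long) {
                return (Long) javaObject;
            }
            if (javaObject instanceof Integer || javaObject instanceof Short || javaObject instanceof Byte) {
                return ((Number) javaObject).longValue(); // widening, so no information is lost
            }
            throw new IllegalArgumentException("Cannot adapt " + javaObject + " to Python int");
        }
    }

    public static void main(String[] args) {
        IntType intType = new IntType();
        System.out.println(intType.pythonName() + " <- " + intType.adapt(7)); // prints "int <- 7"
    }
}
```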
Python4j Garbage Collection and interactions with the JVM
Python's garbage collection uses reference counting, and this happens entirely within the CPython runtime. Python4j bundles this runtime.
JavaCPP uses reference counters and phantom references for all of its pointer objects. Please see here for the core behavior.
End users should be aware of potential race conditions between JavaCPP's memory management and Python's garbage collector deleting the same pointers. When in-memory Python variables are accessed from Java, depending on the Python code and how its variables are managed, a variable may be garbage collected before Java references it, causing a crash.
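The lifetime hazard can be shown with a toy reference-counted object. This is a hypothetical plain-Java model, not JavaCPP or CPython code: here a read after the count drops to zero throws an exception, whereas in the real native runtime it is undefined behavior and typically crashes the process. It only illustrates why the order in which references are taken and released matters.

```java
public class RefCountHazard {

    /** Toy model of a natively allocated object whose lifetime is governed by a reference count. */
    static final class NativeObject {
        private int refCount = 1;     // the creator (e.g. the Python side) starts with one reference
        private boolean freed = false;

        synchronized void incRef() {
            checkAlive();
            refCount++;
        }

        synchronized void decRef() {
            checkAlive();
            if (--refCount == 0) {
                freed = true;         // a real runtime would release the memory here
            }
        }

        synchronized String read() {
            checkAlive();
            return "payload";
        }

        private void checkAlive() {
            // In native code this would be undefined behavior (often a crash), not an exception.
            if (freed) throw new IllegalStateException("use after free");
        }
    }

    /** Java reads the object after the owner dropped the only reference. */
    public static boolean unsafeReadFails() {
        NativeObject obj = new NativeObject();
        obj.decRef();                 // owner is done: count reaches zero, object is freed
        try {
            obj.read();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Java takes its own reference first, so the owner's release does not free the object. */
    public static String safeRead() {
        NativeObject obj = new NativeObject();
        obj.incRef();                 // Java-side reference keeps the object alive
        obj.decRef();                 // owner releases its reference; count is still 1
        String value = obj.read();
        obj.decRef();                 // Java releases when done; now the object is freed
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unsafeReadFails() + " " + safeRead()); // prints "true payload"
    }
}
```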
To avoid these issues, use Python scripts transactionally, that is, inside a try-with-resources block that locks the GIL as described in the overview.
If Python variables need to be kept in memory, ensure proper context management of those in-memory variables. Context management in this case means using a separate Python context (essentially a separate interpreter) if a user wants full isolation. Full isolation is not required in most cases, but may be desirable for certain use cases. For now, if you would like more information on this topic, please ask on our forums and see our unit test for usage.
Python4j Python Script Execution
Python4j runs and manages Python interpreters, as well as the GIL, for the user. When a user executes a Python script, Python4j calls into CPython's PyRun routines, invoking CPython directly. This functionality can be found in the PythonExecutioner.
When invoking CPython, we also wrap the user's code in a Python script of our own, which can be found here. This script mainly ensures that Python stack traces are printed properly if an error occurs, by making sure that stdout/stderr of the in-memory Python interpreter are flushed.
Note that Python4j needs to initialize itself before any Python script can execute. This happens in the static initialization block of PythonExecutioner, so it runs once, when that class is first used. Be aware of this when executing Python scripts.
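Because the setup happens in a static initializer, it runs exactly once, the first time the class is initialized, and anything it reads (such as system properties) is fixed from then on. The sketch below demonstrates that general Java behavior with a hypothetical EmbeddedRuntime class and a made-up property name; it does not call Python4j.

```java
public class StaticInitOrder {

    // Hypothetical property name, used only for this demonstration.
    static final String PROPERTY = "example.embedded.path";

    /** Stand-in for a class that configures itself once, in its static initializer. */
    static final class EmbeddedRuntime {
        // Evaluated exactly once, when the class is first initialized.
        static final String CONFIGURED_PATH = System.getProperty(PROPERTY, "<default>");

        static String configuredPath() {
            return CONFIGURED_PATH;
        }
    }

    /** Returns the value seen on first use and the value seen after changing the property. */
    public static String[] demo() {
        System.setProperty(PROPERTY, "/opt/first");
        String atInit = EmbeddedRuntime.configuredPath();   // first use triggers initialization
        System.setProperty(PROPERTY, "/opt/second");        // too late: the initializer already ran
        String afterChange = EmbeddedRuntime.configuredPath();
        return new String[] {atInit, afterChange};
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println(seen[0] + " " + seen[1]); // prints "/opt/first /opt/first"
    }
}
```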
Python4j allows users to pass in and retrieve in-memory Python variables. A simple example:
In our case here, we're not retrieving variables, just passing 2 strings in for printing.
For retrieving variables, we can do either of the following:
Executing and returning all variables lets us retrieve, by name, any variable that was created during the Python execution. The returned PythonVariables keep the names they had in the Python script.
Optionally, a user may also specify a list of variables to be returned. This can be achieved by passing in an output variable list to PythonExecutioner.exec(..) as follows:
Python4j supports multi-threaded execution of Python scripts. Users can manage multiple threads by wrapping every Python call in the double try-with-resources block mentioned in the overview. Because that block locks the GIL and watches garbage collection around each Python call, GIL management is handled automatically for the user.
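The shape of that double try-with-resources block can be sketched in plain Java. Here a ReentrantLock stands in for the GIL and a GcScope that counts tracked objects stands in for the garbage-collection watch; GilScope, GcScope, and runOnThreads are hypothetical names, not Python4j classes. Because every call takes the lock before touching shared state, concurrent threads never interleave inside a call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class GilPattern {

    // Stand-in for the interpreter lock: only one thread at a time may run "Python" code.
    private static final ReentrantLock GIL = new ReentrantLock();
    private static final AtomicInteger RELEASED = new AtomicInteger();
    private static int scriptsRun = 0; // shared state, only touched while holding GIL

    /** Hypothetical scope: acquires the lock on creation, releases it on close. */
    static final class GilScope implements AutoCloseable {
        GilScope() { GIL.lock(); }

        @Override public void close() { GIL.unlock(); }
    }

    /** Hypothetical scope: tracks objects created inside it and releases them all on close. */
    static final class GcScope implements AutoCloseable {
        private final List<Object> tracked = new ArrayList<>();

        <T> T track(T obj) {
            tracked.add(obj);
            return obj;
        }

        @Override public void close() {
            RELEASED.addAndGet(tracked.size()); // a real runtime would free native memory here
            tracked.clear();
        }
    }

    /** One "Python call", wrapped in the nested lock-then-watch pattern. */
    static void runScript() {
        try (GilScope gil = new GilScope()) {
            try (GcScope gc = new GcScope()) {
                gc.track(new Object()); // a temporary created during the call
                scriptsRun++;           // safe: we hold the lock
            }
        }
    }

    /** Runs the call from several threads and returns how many calls completed. */
    public static int runOnThreads(int threads, int callsPerThread) {
        scriptsRun = 0;
        RELEASED.set(0);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    runScript();
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return scriptsRun;
    }

    public static int releasedCount() { return RELEASED.get(); }

    public static void main(String[] args) {
        int calls = runOnThreads(4, 1000);
        // prints "4000 calls, 4000 objects released"
        System.out.println(calls + " calls, " + releasedCount() + " objects released");
    }
}
```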
Optionally, users may also use the PythonContextManager to create one context per thread. A "context" is essentially a separate Python interpreter with its own variables, memory, etc.
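One context per thread maps naturally onto Java's ThreadLocal. In the sketch below each thread lazily gets its own variable map, standing in for its own interpreter context. This is an analogy, not the PythonContextManager API, and PerThreadContext with its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerThreadContext {

    // Each thread lazily gets its own variable namespace, standing in for its own context.
    private static final ThreadLocal<Map<String, Object>> CONTEXT = ThreadLocal.withInitial(HashMap::new);

    public static void setVariable(String name, Object value) {
        CONTEXT.get().put(name, value);
    }

    public static Object getVariable(String name) {
        return CONTEXT.get().get(name);
    }

    /** Two threads assign the same variable name; each reads back only its own value. */
    public static Map<String, Object> isolationDemo() {
        Map<String, Object> seen = new ConcurrentHashMap<>();
        Thread a = new Thread(() -> {
            setVariable("x", "from A");
            seen.put("A", getVariable("x"));
        });
        Thread b = new Thread(() -> {
            setVariable("x", "from B");
            seen.put("B", getVariable("x"));
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, Object> seen = isolationDemo();
        System.out.println(seen.get("A") + ", " + seen.get("B")); // prints "from A, from B"
    }
}
```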
Python4j and custom python path
By default, JavaCPP provides a Python path for us by listing the bundled dependencies shipped in the JavaCPP jar files. This includes NumPy if the user is using our python4j-numpy module. However, in real-world applications many users will need additional libraries in order to run scripts they built in other environments.
Specifying a custom python path
Python4j allows a user to specify a custom Python path. This path should come from the same Python version as the one provided by Python4j. To specify a custom Python path, a user should be aware of the following system properties:
1. org.eclipse.python4j.path: this system property specifies which Python path to use. A user can obtain this Python path with this small snippet:
2. org.eclipse.python4j.path.append: this system property controls how the custom path interoperates with the Python path provided by JavaCPP. A user can select none, before, or after; the choice affects the loading order of all libraries.
A dependency clash can happen, for example, if a user uses a different version of NumPy from the one bundled with JavaCPP.
To avoid such clashes, specify none for this system property.
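The effect of the three append modes on the final path can be sketched as a small string-joining function. This is hypothetical: in particular, we assume that the mode describes where JavaCPP's bundled path is placed relative to the custom path, and that entries are joined with File.pathSeparator. Check the Python4j source before relying on that ordering. Since Python searches its path in order, whichever entries come first win when the same package exists in both places.

```java
import java.io.File;
import java.util.Locale;

public class PythonPathMerge {

    /** The three values accepted by org.eclipse.python4j.path.append. */
    enum AppendMode { NONE, BEFORE, AFTER }

    /**
     * ASSUMPTION: the mode says where JavaCPP's bundled path goes relative to the custom path.
     * This is an illustration, not Python4j's implementation.
     */
    public static String effectivePath(String customPath, String bundledPath, String mode) {
        switch (AppendMode.valueOf(mode.toUpperCase(Locale.ROOT))) {
            case NONE:
                return customPath;                                     // custom path only: no clashes
            case BEFORE:
                return bundledPath + File.pathSeparator + customPath;  // bundled entries searched first
            case AFTER:
                return customPath + File.pathSeparator + bundledPath;  // custom entries searched first
            default:
                throw new IllegalArgumentException(mode);
        }
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"none", "before", "after"}) {
            System.out.println(mode + ": "
                    + effectivePath("/custom/site-packages", "/javacpp/site-packages", mode));
        }
    }
}
```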
We recommend distributing any dependencies needed for a target platform as an embedded Miniconda installation, zipped up as an archive. To set up Miniconda, please see Anaconda's installation guide.
Afterwards, run the needed conda install commands from the Miniconda install directory on the target system. From there, run the command specified above in our custom python path section.
This will print the python path you need to pass to python4j before it initializes.
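Putting this together, the printed path can be handed to Python4j through the two system properties, as long as that happens before PythonExecutioner is first used. The sketch below assumes the property names given above; ConfigurePythonPath and the example path are hypothetical. Passing the same values on the command line with -Dorg.eclipse.python4j.path=<path> and -Dorg.eclipse.python4j.path.append=none achieves the same thing without code.

```java
public class ConfigurePythonPath {

    // System property names as given in the custom python path section.
    static final String PATH_PROPERTY = "org.eclipse.python4j.path";
    static final String APPEND_PROPERTY = "org.eclipse.python4j.path.append";

    /** Sets both properties; call this before any Python4j class is first used. */
    public static void configure(String printedPythonPath) {
        String path = printedPythonPath.trim(); // drop the trailing newline from the printed output
        if (path.isEmpty()) {
            throw new IllegalArgumentException("Python path is empty");
        }
        System.setProperty(PATH_PROPERTY, path);
        System.setProperty(APPEND_PROPERTY, "none"); // rely only on the Miniconda packages
    }

    public static void main(String[] args) {
        // Hypothetical value: paste or load the output printed on the target system.
        configure("/opt/miniconda/lib/site-packages\n");
        System.out.println(System.getProperty(PATH_PROPERTY)); // prints /opt/miniconda/lib/site-packages
        // Only after this point should the application start using Python4j.
    }
}
```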