A Gentle Introduction to Computer Vision, Part 2: Building Tensorflow for the Jetson Xavier NX

Simeon Trieu
Jan 23, 2021
Tensorflow Logo

In the previous article in this series, we compiled OpenCV, a computer vision library. While OpenCV can run inference, training a model requires a dedicated framework. Tensorflow is a modern tool for designing, training, and running inference on neural networks, a technology that has been revolutionizing computer vision since Yann LeCun developed the convolutional neural network (CNN) in 1989. In this author’s opinion, the application of the CNN to computer vision is what finally ended the AI winter and kickstarted the AI revolution. The study of neural networks will be imperative in our introduction to computer vision.

Introduction

To get the most out of Tensorflow, it must be compiled for your particular CPU and GPU architecture. While Google does not recommend compiling from source (it’s not very accessible), it’s important to be able to do so, especially when the precompiled package doesn’t actually work on your development platform. Let’s get started.

Tensorflow will only build under certain versions of supporting libraries. You must respect this, or your build will fail. I’ve successfully built the following configuration:

  • Tensorflow v2.3.1
  • Python 3.8.7
  • GCC 7.5.0 compiler
  • Bazel 3.1.0 build tool
  • NVIDIA cuDNN 8.0
  • NVIDIA CUDA 10.2

Prepare the Build Environment

Verify Library Versions

$ cat /sys/module/tegra_fuse/parameters/tegra_chip_id
25 # Xavier NX and AGX Xavier
$ echo /usr/lib/aarch64-linux-gnu/libnvinfer.so.?
/usr/lib/aarch64-linux-gnu/libnvinfer.so.7 # TensorRT Version 7
$ cat /usr/local/cuda/version.txt
CUDA Version 10.2.89
$ cat /usr/include/cudnn_version.h | grep "#define CUDNN_MAJOR"
#define CUDNN_MAJOR 8 # CuDNN Version 8

Install Package Dependencies

$ sudo apt install python3-dev python3-pip npm node-gyp nodejs-dev libssl1.0-dev openjdk-11-jdk-headless build-essential zip unzip git wget autoconf automake libtool curl make g++ libhdf5-serial-dev hdf5-tools libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libfftw3-dev libopenblas-dev

Clone the Tensorflow Repository

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout v2.3.1
$ cd ..

Upgrade Python

Python Logo

As of January 2021, L4T ships with Python 3.6.9. I upgraded mine to the latest Python version recommended by Google, Python 3.8.7:

$ git clone https://github.com/python/cpython
$ cd cpython
$ git checkout v3.8.7
$ ./configure --prefix=/usr/local --enable-shared --enable-static
$ make -j$(nproc)
$ sudo make altinstall
$ cd ..

This will install python3.8 to /usr/local without overwriting your system’s interpreter (v3.6.9).
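
To confirm the new interpreter landed where expected without shadowing the system one, here’s a quick check (a minimal sketch, run with the freshly installed binary):

# run this with the freshly installed interpreter: /usr/local/bin/python3.8
import sys
print(sys.version)     # expect 3.8.7
print(sys.executable)  # expect /usr/local/bin/python3.8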

Setup Virtual Environment

We will be installing Tensorflow into a virtual environment. I like to keep versioned environments for each configuration. Let’s set that up:

$ mkdir envs
$ python3.8 -m venv envs/py3.8tf2.3.1
$ source envs/py3.8tf2.3.1/bin/activate

(py3.8tf2.3.1) $ pip install -U pip
(py3.8tf2.3.1) $ pip install -U six Cython wheel setuptools mock 'future>=0.17.1' 'gast==0.3.3' typing_extensions
(py3.8tf2.3.1) $ pip install -U h5py
(py3.8tf2.3.1) $ pip install -U keras_applications --no-deps
(py3.8tf2.3.1) $ pip install -U keras_preprocessing --no-deps
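
Before installing anything heavy, it’s worth confirming the environment is actually active (a minimal check using only the standard library):

# run inside the activated environment
import sys
print(sys.prefix)                     # expect a path ending in envs/py3.8tf2.3.1
print(sys.prefix != sys.base_prefix)  # True inside a virtual environment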

Build Numpy

Numpy Logo

We need Numpy v1.18.5 because Numpy v1.19 and on introduced breaking ABI changes that will cause the build to fail. From the previous article on compiling OpenCV, I found that the Numpy from pip caused errors when importing cv2. To be safe, I built my own Numpy:

(py3.8tf2.3.1) $ git clone https://github.com/numpy/numpy
(py3.8tf2.3.1) $ cd numpy
(py3.8tf2.3.1) $ git checkout v1.18.5
(py3.8tf2.3.1) $ python setup.py build -j $(nproc) install
(py3.8tf2.3.1) $ cd ..
(py3.8tf2.3.1) $ ln -sf $PWD/envs/py3.8tf2.3.1/lib/python3.8/site-packages/numpy-1.18.5-py3.8-linux-aarch64.egg/numpy \
envs/py3.8tf2.3.1/lib/python3.8/site-packages/numpy

Tensorflow cannot find numpy in the egg directory it’s installed in, hence the symlink to the plain numpy directory.
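
A quick import check confirms the symlink resolves (a minimal sketch; the exact paths depend on where you created the environment):

import numpy
print(numpy.__version__)  # expect 1.18.5
print(numpy.__file__)     # should resolve through site-packages/numpy/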

Setup Paths

Set up the library search paths and configure the tmp directory:

(py3.8tf2.3.1) $ export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
(py3.8tf2.3.1) $ export TMP=/tmp

Build Protobuf

Protobuf Logo

Upgrade protobuf to >3.8.0. This solves the “extremely long model loading time” problem of TF-TRT. JK Jung has this to say regarding the root cause:

the default ‘python implementation’ of the python3 ‘protobuf’ module runs too slowly on the Jetson platforms. And the solution is simply to replace it with the ‘cpp implementation’ of that same module.

So, let’s build and install protobuf and protoc also:

(py3.8tf2.3.1) $ wget https://github.com/protocolbuffers/protobuf/releases/download/v3.9.2/protobuf-python-3.9.2.zip
(py3.8tf2.3.1) $ wget https://github.com/protocolbuffers/protobuf/releases/download/v3.9.2/protoc-3.9.2-linux-aarch_64.zip
(py3.8tf2.3.1) $ unzip -o protoc-3.9.2-linux-aarch_64.zip -d protoc-3.9.2
(py3.8tf2.3.1) $ sudo cp protoc-3.9.2/bin/protoc /usr/local/bin/protoc
(py3.8tf2.3.1) $ unzip -o protobuf-python-3.9.2.zip
(py3.8tf2.3.1) $ cd protobuf-3.9.2
(py3.8tf2.3.1) $ ./autogen.sh
(py3.8tf2.3.1) $ ./configure --prefix=/usr/local
(py3.8tf2.3.1) $ make -j$(nproc)
(py3.8tf2.3.1) $ make check
(py3.8tf2.3.1) $ sudo make install
(py3.8tf2.3.1) $ sudo ldconfig
(py3.8tf2.3.1) $ cd python
(py3.8tf2.3.1) $ python setup.py build --cpp_implementation
(py3.8tf2.3.1) $ python setup.py test --cpp_implementation
(py3.8tf2.3.1) $ sudo python setup.py install --cpp_implementation
(py3.8tf2.3.1) $ cd ../..
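
To verify that the C++ implementation is the one actually in use, the protobuf module can report its backend (a minimal check; expect ‘cpp’ after this install):

from google.protobuf.internal import api_implementation
print(api_implementation.Type())  # expect 'cpp', not 'python'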

Build Bazel

Bazel Logo

Tensorflow 2.3.1 requires Bazel 3.1.0, so let’s build it:

(py3.8tf2.3.1) $ mkdir bazel-3.1.0-dist && cd bazel-3.1.0-dist
(py3.8tf2.3.1) $ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-dist.zip
(py3.8tf2.3.1) $ unzip -o bazel-3.1.0-dist.zip
(py3.8tf2.3.1) $ ./compile.sh
(py3.8tf2.3.1) $ cd ..

The binary should be found in bazel-3.1.0-dist/output/bazel.

Build Tensorflow

Tensorflow Logo

Next, clean out the Tensorflow project directory, if necessary:

(py3.8tf2.3.1) $ cd tensorflow
(py3.8tf2.3.1) $ git clean -fxd
(py3.8tf2.3.1) $ ../bazel-3.1.0-dist/output/bazel clean --expunge

Configure Tensorflow. The configure.py script will prioritize virtual environment paths:

(py3.8tf2.3.1) $ python configure.py                        
WARNING: current bazel installation is not a release version.
Make sure you are running at least bazel 3.1.0
Please specify the location of python. [Default is /home/username/dev/envs/py3.8tf2.3.1/bin/python]: [enter]
Found possible Python library paths:
/home/username/dev/envs/py3.8tf2.3.1/lib/python3.8/site-packages
Please input the desired Python library path to use. Default is [/home/username/dev/envs/py3.8tf2.3.1/lib/python3.8/site-packages] [enter]
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]: Y
TensorRT support will be enabled for TensorFlow.
Found CUDA 10.2 in:
/usr/local/cuda-10.2/targets/aarch64-linux/lib
/usr/local/cuda-10.2/targets/aarch64-linux/include
Found cuDNN 8 in:
/usr/lib/aarch64-linux-gnu
/usr/include
Found TensorRT 7 in:
/usr/lib/aarch64-linux-gnu
/usr/include/aarch64-linux-gnu
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 7.2
Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: [enter]
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: [enter]
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.

Now we can finally begin to build Tensorflow. I stopped counting the hours, but I believe this build took me over 60 hours:

(py3.8tf2.3.1) $ sudo nvpmodel -m 0
(py3.8tf2.3.1) $ ../bazel-3.1.0-dist/output/bazel build \
--config=opt \
--config=cuda \
--config=noaws \
--local_cpu_resources=2 \
--local_ram_resources=5800 \
--verbose_failures \
//tensorflow/tools/pip_package:build_pip_package

  • config=opt optimizes the build for your hardware.
  • config=cuda builds Tensorflow with CUDA support.
  • config=noaws disables building AWS S3 filesystem support, which I did not plan on using.
  • local_cpu_resources is the number of CPU cores to use during compilation. Tensorflow requires a lot of RAM to build… more than exists on the Xavier NX. Rather than create a gigantic swap file, I set nvpmodel to mode 0 and limited the build to two 1.9GHz cores to cap RAM consumption.
  • local_ram_resources is the amount of RAM (in MB) to use during compilation. This value was the free RAM I had left according to my system monitor.
  • verbose_failures shows the details of any failures during the build process, which you hopefully should not have to deal with.
Go for a nap. You deserve it.

Finally, build and install Tensorflow’s Python wheel:

(py3.8tf2.3.1) $ bazel-bin/tensorflow/tools/pip_package/build_pip_package wheel/tensorflow_pkg
(py3.8tf2.3.1) $ pip install wheel/tensorflow_pkg/tensorflow-2.3.1-*.whl

Import Tensorflow

(py3.8tf2.3.1) $ python
Python 3.8.7 (tags/v3.8.7:6503f05dd5, Jan 18 2021, 18:17:40)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-01-22 21:35:24.589798: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
>>> tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
>>> print('tensorflow version: %s' % tf.__version__)
tensorflow version: 2.3.1
>>> print('tensorflow.test.is_built_with_cuda(): %s' % tf.test.is_built_with_cuda())
tensorflow.test.is_built_with_cuda(): True
>>> print('tensorflow.test.is_gpu_available(): %s' % tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None))
2021-01-22 21:35:57.083354: W tensorflow/core/platform/profile_utils/cpu_utils.cc:108] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-01-22 21:35:57.084673: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c232200 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-22 21:35:57.084757: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-01-22 21:35:57.093901: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-01-22 21:35:57.192654: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-01-22 21:35:57.193151: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555b0602d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-22 21:35:57.193321: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Xavier, Compute Capability 7.2
2021-01-22 21:35:57.193933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-01-22 21:35:57.194132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.109GHz coreCount: 6 deviceMemorySize: 7.59GiB deviceMemoryBandwidth: 66.10GiB/s
2021-01-22 21:35:57.194302: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2021-01-22 21:35:57.198842: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-01-22 21:35:57.202260: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-01-22 21:35:57.203044: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-01-22 21:35:57.207169: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-01-22 21:35:57.210435: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-01-22 21:35:57.211082: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2021-01-22 21:35:57.211485: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-01-22 21:35:57.211818: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-01-22 21:35:57.211990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-22 21:35:57.212113: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2021-01-22 21:35:59.174719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-22 21:35:59.174851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-01-22 21:35:59.174920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-01-22 21:35:59.175401: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-01-22 21:35:59.175741: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:949] ARM64 does not support NUMA - returning NUMA node zero
2021-01-22 21:35:59.175917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 2627 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
tensorflow.test.is_gpu_available(): True

We’ve successfully built Tensorflow from source!
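
As a final smoke test, a small computation confirms that ops actually land on the GPU (a minimal sketch; tf.config.experimental.get_device_details is available as of TF 2.3):

import tensorflow as tf

# confirm the GPU is visible and report the compute capability we built for
gpus = tf.config.list_physical_devices('GPU')
print(tf.config.experimental.get_device_details(gpus[0]))  # expect compute_capability (7, 2)

# run a small matrix multiply and verify it executed on the GPU
with tf.device('/GPU:0'):
    a = tf.random.normal([1024, 1024])
    b = tf.random.normal([1024, 1024])
    c = tf.matmul(a, b)
print(c.device)  # expect a path ending in device:GPU:0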

Summary

  • Built Python 3.8.7.
  • Set up a virtual environment for use with Tensorflow.
  • Built Numpy v1.18.5.
  • Built Protobuf 3.9.2.
  • Configured and built Tensorflow 2.3.1.
