
DeepFaceLab in Linux – How to handle cuda libraries

I’ve been struggling with CUDA libraries and DeepFaceLab for the last day, and failed to make it work properly in Linux. The most common cause of failures like this on Linux (at least in this case) is that people tend to be happy with their systems as they are, while the rest of the world keeps moving forward in development. So a lot of old instructions quickly become outdated. This was exactly the case with DeepFaceLab.

So I ended up restarting my little project with FULL instructions for making DeepFaceLab work as of July 2021, because developers usually do not maintain documentation properly. This probably includes me too. When I first started this, I ended up with an nvidia-machine-learning package. During today’s session this package did not show up anywhere, which is why I believe it is not actually required. So here we go!

Environment

  • Linux Mint 19.3 (Tricia), based on Ubuntu 18.04 (Bionic)
  • NVIDIA GeForce GTX 1060

In this guide we will try to install DeepFaceLab with the drivers recommended for the graphics card. First of all, I’d like to find the recommended graphics driver. For me that turned out to be 470. Example:

# export LC_ALL=C.UTF-8
# export LANG=C.UTF-8
# ubuntu-drivers devices

Output:

WARNING:root:_pkg_get_support nvidia-driver-390: package has invalid Support Legacyheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C03sv00001043sd000085AEbc03sc00i00
vendor   : NVIDIA Corporation
model    : GP106 [GeForce GTX 1060 6GB]
driver   : nvidia-driver-460-server - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-470 - distro non-free recommended
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-460 - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
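
If you want to pick the recommended driver out of that output in a script instead of reading it by eye, something like the sketch below could work. It only assumes that the output keeps the "driver : name - flags" layout shown above; the function name recommended_driver is my own and not part of any tool.

```python
import re


def recommended_driver(ubuntu_drivers_output):
    """Return the package on the 'driver :' line flagged 'recommended', or None."""
    for line in ubuntu_drivers_output.splitlines():
        # Expected layout: "driver   : <package> - <flags...>"
        match = re.match(r"\s*driver\s*:\s*(\S+)\s+-\s+(.*)", line)
        if match and "recommended" in match.group(2).split():
            return match.group(1)
    return None


# A few lines copied from the output above.
sample = """\
driver   : nvidia-driver-460 - distro non-free
driver   : nvidia-driver-470 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""
print(recommended_driver(sample))  # nvidia-driver-470
```

On a real system you would feed it the text printed by ubuntu-drivers devices instead of the sample string.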

Complications with dependencies may be a problem here, since I still had other drivers installed from the last round. In my case it was solved by installing one of the packages first. I assume that on “normal systems” libnvidia-compute does not have to be installed first.

apt-get -y install libnvidia-compute-470
apt-get -y install nvidia-driver-470

You probably want to reboot now, since nvidia-smi (/usr/bin/nvidia-smi) will not work properly until then. After rebooting, the header of the nvidia-smi output should look like this.

NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4
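
A small sketch for pulling the version numbers out of that header line, in case you want to compare them in a script. The helper name parse_smi_header is made up for illustration. Keep in mind that the CUDA version printed by nvidia-smi is the highest version the driver supports, not the toolkit version that is installed.

```python
import re


def parse_smi_header(line):
    """Extract the version numbers from the nvidia-smi header line."""
    fields = {}
    for key in ("NVIDIA-SMI", "Driver Version:", "CUDA Version:"):
        # Each label is followed by whitespace and a dotted version number.
        match = re.search(re.escape(key) + r"\s+([\d.]+)", line)
        if match:
            fields[key.rstrip(":")] = match.group(1)
    return fields


header = "NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4"
info = parse_smi_header(header)
print(info["Driver Version"], info["CUDA Version"])  # 470.57.02 11.4
```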

Now, TensorFlow will be your greatest enemy. For Python runs, this is the component that tries to find out whether you can run DeepFaceLab with a GPU or not. So first of all, we need as many of the CUDA libraries as possible, since Python needs to find them when it’s time for TensorFlow to find the graphics card. Go to https://developer.nvidia.com/cuda-toolkit and download the CUDA toolkit. While I was trying to make this work manually, the deb packages seemed to be “problematic”. However, running the local repo package actually seems to work! During the local install I saw a couple of libraries being installed that had been missing during my last struggle with the libs.

At this point, when the installation is done, we should have a /usr/local/cuda which, the last time, was missing all the files necessary for Python to find the required libraries. I can now see that it contains a lib64 too, which wasn’t the case last time. You can run the command below to check the libraries.

# ldconfig -p | grep cuda

And this is what we look for:

At this point, following the “easy-tensorflow” instructions, we also add this to .bashrc:

export CUDA_ROOT=/usr/local/cuda/bin/
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/
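
To see whether the directories in LD_LIBRARY_PATH actually contain the CUDA runtime, a quick check along these lines may help. This is only a sketch: the function name find_libs and the choice of libcudart.so as the file to look for are my assumptions, not something TensorFlow or NVIDIA prescribes.

```python
import os
from pathlib import Path


def find_libs(search_path, prefix="libcudart.so"):
    """List files starting with `prefix` in each directory of a ':'-separated path."""
    hits = []
    for entry in search_path.split(os.pathsep):
        directory = Path(entry)
        # Skip empty entries and directories that do not exist.
        if entry and directory.is_dir():
            hits.extend(sorted(p.name for p in directory.iterdir()
                               if p.name.startswith(prefix)))
    return hits


# Prints an empty list if the variable is unset or no runtime library is found.
print(find_libs(os.environ.get("LD_LIBRARY_PATH", "")))
```

If the list is empty in a fresh shell, the export lines have probably not been picked up yet.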

However, I also chose to put this in /etc/profile so that it becomes system-global. According to the instructions on that page, we also need cuDNN (the CUDA Deep Neural Network library), which is found at https://developer.nvidia.com/cudnn. As described there, an account (which is free) is required. At this moment I have no idea whether this is required for DeepFaceLab or not, but I chose to install it anyway. If someone knows anything about this, please tell. The deb files may or may not work, but according to easy-tensorflow they used the tar archive. When I tried apt’s fix-broken, it removed the drivers again rather than installing them.

apt-get install libcupti-dev

Before going any further, we should now check whether TensorFlow works properly. I realized this after getting an error from Python that looked like this:

Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64/
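
TensorFlow tends to print one such line per library it cannot find, so it can be handy to collect them all at once. The sketch below assumes the messages keep the exact "Could not load dynamic library '...'" wording shown above; the helper name missing_libraries is hypothetical.

```python
import re

# Matches the library name inside the single quotes of the warning.
PATTERN = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the sorted, de-duplicated library names TensorFlow failed to load."""
    return sorted(set(PATTERN.findall(log_text)))


log = ("Could not load dynamic library 'libnvinfer.so.6'; dlerror: "
       "libnvinfer.so.6: cannot open shared object file: No such file or "
       "directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64/")
print(missing_libraries(log))  # ['libnvinfer.so.6']
```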

TensorRT can be found here: https://developer.nvidia.com/tensorrt-getting-started.
This is something that Nvidia themselves link to in their error messages. So now we must make sure that this is also in place. While testing in Python (below), I also had to do some symlink magic, since some of the files were missing their links. The files ARE there, but versioned differently. The exception is libcudnn.so.7, which is needed in my case and has to be pointed at libcudnn.so.8.

The next step was necessary (at least for me) because of the symlinks that TensorFlow asks for. The links may be different for you if you install this on another system.

ln -s libcudart.so libcudart.so.10.0
ln -s libcublas.so libcublas.so.10.0
ln -s libcufft.so libcufft.so.10.0
ln -s libcurand.so libcurand.so.10.0
ln -s libcusolver.so libcusolver.so.10.0
ln -s libcusolver.so libcusolver.so.10
ln -s libcusparse.so libcusparse.so.10.0
ln -s libcudnn.so.8 libcudnn.so.7
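
The same links can be created from Python in a way that skips anything that already exists, which makes it safe to run more than once. This is a sketch under assumptions: the function name create_links is mine, and the directory is left as a parameter because it depends on where the libraries ended up on your system. Also note that linking one major version to another (libcudnn.so.8 as libcudnn.so.7) only satisfies the loader; it does not guarantee that the library is compatible.

```python
import os
from pathlib import Path

# Link name -> existing file, mirroring the ln -s commands above.
LINKS = {
    "libcudart.so.10.0": "libcudart.so",
    "libcublas.so.10.0": "libcublas.so",
    "libcufft.so.10.0": "libcufft.so",
    "libcurand.so.10.0": "libcurand.so",
    "libcusolver.so.10.0": "libcusolver.so",
    "libcusolver.so.10": "libcusolver.so",
    "libcusparse.so.10.0": "libcusparse.so",
    "libcudnn.so.7": "libcudnn.so.8",
}


def create_links(lib_dir, links):
    """Create relative symlinks in lib_dir; return the names actually created."""
    created = []
    for link_name, target in links.items():
        link_path = Path(lib_dir) / link_name
        # Leave existing files and links (even broken ones) untouched.
        if link_path.exists() or link_path.is_symlink():
            continue
        # Do not create a dangling link if the target is missing.
        if not (Path(lib_dir) / target).exists():
            continue
        os.symlink(target, link_path)
        created.append(link_name)
    return created
```

You would call it as root with the directory that holds the libraries, for example create_links("/usr/local/cuda/lib64", LINKS) if that is where they live on your machine.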

Make sure you install tensorflow like this before trying the next step:

python -m pip install tensorflow-gpu==2.0.0

A higher version of tensorflow-gpu may be quieter about missing libraries. Start Python and run this:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

The output result should look like this:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 4200197191832695499
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 7983059117765893941
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 315957471147119305
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 5468454912
locality {
  bus_id: 1
  links {
  }
}
incarnation: 11492666565116267717
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
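
Instead of scanning that list by eye, you can filter it for real GPU entries. The sketch below only relies on the name and device_type fields visible in the output above. The stand-in objects are there so the snippet runs without TensorFlow installed, and gpu_names is a name I made up.

```python
from types import SimpleNamespace


def gpu_names(devices):
    """Return the names of entries whose device_type is exactly 'GPU'."""
    # XLA_GPU entries are deliberately not counted as usable GPUs here.
    return [d.name for d in devices if d.device_type == "GPU"]


# Stand-ins shaped like the entries printed above.
fake_devices = [
    SimpleNamespace(name="/device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/device:XLA_GPU:0", device_type="XLA_GPU"),
    SimpleNamespace(name="/device:GPU:0", device_type="GPU"),
]
print(gpu_names(fake_devices))  # ['/device:GPU:0']
```

With TensorFlow installed, passing device_lib.list_local_devices() to gpu_names should give an empty list when the card was not detected.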

When I last tested this, it failed. But now the devices finally show up here. So now it is time for Anaconda. We will use the recommended installer at https://www.anaconda.com/products/individual. Currently they offer Anaconda with Python 3.8. It should be quite straightforward to install Anaconda.

Now go for DeepFaceLab for Linux at https://github.com/nagadit/DeepFaceLab_Linux. This may be a bit tricky, since the instructions point at other versions than the ones we just installed. If you are unsure, run this after the installation to make sure that your environment is ready. You could also try rebooting once.

# source ~/.bashrc

Then go for the source and do the rest as described in the GitHub repo, with some minor updates. They are explained after the code block.

# conda create -n deepfacelab -c main python=3.8 cudnn=8.2.1 cudatoolkit=11.3.1
# conda activate deepfacelab
# git clone --depth 1 https://github.com/nagadit/DeepFaceLab_Linux.git
# cd DeepFaceLab_Linux
# git clone --depth 1 https://github.com/iperov/DeepFaceLab.git

As you can see above, we use Python 3.8, cuDNN 8.2.1 and cudatoolkit 11.3.1 instead of 3.7/7.6.5/10.1.243. The instructions on GitHub seem to have become a bit old, so this has to be changed first. Secondly, you also have to change some values in requirements-cuda.txt before you can proceed, since the Python ecosystem also changes over time. The following line is the one that gave me errors during the pip install:

opencv-python==4.1.0.25

Should instead be:

opencv-python

Otherwise you’d get at least one error (at least I did) that looks like this:

ERROR: No matching distribution found for opencv-python==4.1.0.25
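
If you would rather not edit the file by hand, removing the version pin can be sketched like this. The function name unpin is hypothetical, and the sample text is illustrative only, not the real contents of requirements-cuda.txt.

```python
import re


def unpin(requirements_text, package):
    """Replace '<package>==<version>' lines with the bare package name."""
    pattern = re.compile(rf"^{re.escape(package)}\s*==.*$", re.MULTILINE)
    return pattern.sub(package, requirements_text)


# Illustrative requirements text, not copied from the repository.
sample_requirements = "tqdm\nopencv-python==4.1.0.25\nnumpy==1.19.3\n"
print(unpin(sample_requirements, "opencv-python"))
```

Only the named package loses its pin; every other line is left exactly as it was.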

Now, install the requirements:

# python -m pip install -r ./DeepFaceLab/requirements-cuda.txt

At this point you may be finished. However, the environment file delivered in the current Linux repo still points to the Python version that they use there. This has to be changed. Look for this row:

export DFL_PYTHON="python3.7"

and change it to

export DFL_PYTHON="python3.8"
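
The same change can be scripted, which is handy if you have to redo the setup later. A sketch, assuming the row keeps the export DFL_PYTHON="..." form shown above; the helper name set_dfl_python is mine, and it works on the file’s text so you decide which file to read and write.

```python
import re


def set_dfl_python(env_text, interpreter):
    """Point the DFL_PYTHON export at another interpreter name."""
    return re.sub(
        r'^(export DFL_PYTHON=)"[^"]*"',
        rf'\1"{interpreter}"',
        env_text,
        flags=re.MULTILINE,
    )


before = '#!/usr/bin/env bash\nexport DFL_PYTHON="python3.7"\n'
print(set_dfl_python(before, "python3.8"))
```

Lines that do not match the export are passed through unchanged.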

If Anaconda changes its supported versions, you need to remember to update this too, or you will get further errors when starting DeepFaceLab.

And now, you should be finished!

