
Create Linux GPU server

A guide to creating a Proxmox VM with GPU passthrough.

Warning

Make sure the underlying Proxmox host is configured for GPU passthrough. You can check that guide here.

STEP 1 - Deploy VM

Create a Proxmox Virtual Machine with the Optimal (or Working) options as described in the GPU VM Settings section at the end of this guide.

Warning

The GPU (PCI) device should not be connected yet.

STEP 2 - Install the preferred OS

For our purposes, we will deploy Ubuntu 24.04 LTS or Ubuntu 22.04 LTS.

You can run through the installer as you prefer.

Do NOT install a desktop environment (keep the system in multi-user (CLI) mode), as there have been issues with desktop environments once the GPU device is attached.

STEP 3 - Update OS packages & install qemu-agent

We first want to ensure that the OS is up to date and that the VM can communicate with the Proxmox host.

# Sync repo information to local
sudo apt update
# Upgrade all packages
sudo apt upgrade -y
# Install guest tools
sudo apt install -y qemu-guest-agent
# Enable guest tools
sudo systemctl enable --now qemu-guest-agent
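
If you want to confirm the agent is up before moving on, a quick check from inside the VM (using the service name installed above):

```shell
# Should print "active" once the guest agent is running
systemctl is-active qemu-guest-agent
```

Once the agent is running, the VM's IP addresses should also appear on the VM's Summary page in the Proxmox UI.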

STEP 4 - Blacklist Nouveau GPU driver

We need to blacklist the Nouveau driver so that it does not automatically take ownership of the GPU when it is attached.

# Blacklist Nouveau
sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo update-initramfs -u

Shut down the VM

STEP 5 - Install the latest official drivers

Information

You can now attach a GPU to the VM as a raw PCI device (if GPU passthrough management is not yet available).

We can now start the VM and install the relevant nvidia drivers.

# Add graphics repo to sources
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
# Get and install latest Nvidia drivers
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall

Please restart the VM again.
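
Before moving on, you can sanity-check that the blacklist took effect and that the NVIDIA kernel module (rather than Nouveau) was loaded after the reboot:

```shell
# Should print nothing - nouveau must not be loaded
lsmod | grep -i nouveau
# Should list nvidia modules once the driver install succeeded
lsmod | grep '^nvidia'
```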

STEP 6 - Check that the GPU is online

Running the command below should now show an NVIDIA GPU as available for use.

# Testing that GPU is now managed by nvidia driver
nvidia-smi
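
nvidia-smi also supports machine-readable queries, which is handy for scripted health checks. A small sketch (field names as listed by `nvidia-smi --help-query-gpu`):

```shell
# Print GPU name, driver version and total VRAM as CSV (no header row)
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
```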

CUDA Testing (Optional)

This will install the CUDA toolkit, build a simple test program, and run it on both the GPU and the CPU.

Create the following files

You can also download the files from the GitLab project.

do-simple-cuda-test

#!/bin/bash
echo " - Installing CUDA toolkit..."
sudo apt install -y nvidia-cuda-toolkit
echo " - Verify CUDA toolkit..."
nvcc --version
nvidia-smi
echo " - Install build utilities..."
sudo apt install -y gcc
gcc --version
echo " - Build CUDA binary..."
nvcc simple-cuda-test-source.cu -o gpu_test
echo " - Run GPU test binary..."
./gpu_test

simple-cuda-test-source.cu

#include <stdio.h> 

__global__ void runOnGPU(void) {
    // Each thread prints its unique ID within the block
    printf("Run on [GPU] thread %d!\n", threadIdx.x);
} 

int main(void) {
    printf("---\nRun on [CPU]!\n---\n");
    // Launch runOnGPU kernel on GPU with 1 block and 10 threads
    runOnGPU<<<1, 10>>>();
    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();
    // Explicitly destroys and cleans up all resources associated with the current device in the current process.
    cudaDeviceReset();
    printf("---\n");
    return 0;
}

Run the test

chmod +x do-simple-cuda-test
./do-simple-cuda-test
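
Assuming the toolkit installed cleanly, the test binary should print one line from the CPU followed by one line per GPU thread. Note that CUDA does not guarantee the ordering of printf output across threads, so the thread IDs may not appear strictly in sequence:

```
---
Run on [CPU]!
---
Run on [GPU] thread 0!
Run on [GPU] thread 1!
Run on [GPU] thread 2!
Run on [GPU] thread 3!
Run on [GPU] thread 4!
Run on [GPU] thread 5!
Run on [GPU] thread 6!
Run on [GPU] thread 7!
Run on [GPU] thread 8!
Run on [GPU] thread 9!
---
```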

GPU VM Settings

*Enable the Advanced checkbox to see all settings.

For optimal/functional performance, use the option in the leftmost populated column. If no option is listed, use the default.

System

| Feature | Optimal | Working | Degraded | Not Working | Untested |
|---|---|---|---|---|---|
| Graphic card | VirtIO-GPU | Default | - | - | * |
| Machine | q35 | - | - | i440fx | - |
| BIOS | OVMF (UEFI) | - | - | SeaBIOS | - |
| SCSI Controller | - | VirtIO SCSI single | - | - | * |
| Qemu Agent | Enabled | Disabled | - | - | - |
| Add TPM | - | Enabled/Disabled | - | - | - |

Disks

| Feature | Optimal | Working | Degraded | Not Working | Untested |
|---|---|---|---|---|---|
| Bus/Device | - | SCSI | - | - | * |
| Storage | - | local-lvm | - | - | Pure Storage |
| Disk size | - | - | - | - | - |
| Cache | - | Default (No cache) | - | - | * |
| Discard | Enabled | Disabled | - | - | - |
| IO thread | - | Enabled | - | - | Disabled |
| SSD emulation | Enabled | - | Disabled | - | - |
| Read-only | - | Disabled | - | - | Enabled |
| Backup | - | Enabled | - | - | Disabled |
| Skip replication | - | Disabled | - | - | Enabled |
| Async IO | - | Default (io_uring) | - | - | *All others |

CPU

| Feature | Optimal | Working | Degraded | Not Working | Untested |
|---|---|---|---|---|---|
| Sockets | - | - | - | - | - |
| Cores | - | - | - | - | - |
| Type | host | - | x86-64-v2-AES | Intel: Skylake-Server* | * |
| VCPUs | - | Default | - | - | Off/On |
| CPU limit | - | Default | - | - | Off/On |
| CPU Affinity | - | Default | - | - | Off/On |
| CPU units | - | Default | - | - | Off/On |
| Enable NUMA | Enabled | Disabled | - | - | - |
| md-clear | - | Default | - | - | Off/On |
| pcid | - | Default | - | - | Off/On |
| spec-ctrl | - | Default | - | - | Off/On |
| ssbd | - | Default | - | - | Off/On |
| ibpb | - | Default | - | - | Off/On |
| virt-ssbd | - | Default | - | - | Off/On |
| amd-ssbd | - | Default | - | - | Off/On |
| amd-no-ssb | - | Default | - | - | Off/On |
| pdpe1gb | - | Default | - | - | Off/On |
| hv-tlbflush | - | Default | - | - | Off/On |
| hv-evmcs | - | Default | - | - | Off/On |
| aes | - | Default | - | - | Off/On |

Memory

| Feature | Optimal | Working | Degraded | Not Working | Untested |
|---|---|---|---|---|---|
| Memory (MiB) | - | - | - | - | - |
| Minimum memory (MiB) | - | Default | - | - | * |
| Shares | - | - | - | - | - |
| Ballooning Device | - | Enabled | - | - | Disabled |

Network

| Feature | Optimal | Working | Degraded | Not Working | Untested |
|---|---|---|---|---|---|
| No network device | - | Enabled | - | - | Disabled |
| Bridge | - | - | - | - | - |
| VLAN Tag | - | - | - | - | - |
| Firewall | - | Enabled | - | - | Disabled |
| Model | - | VirtIO (paravirtualized) | - | - | * |
| MAC address | - | - | - | - | - |
| Disconnect | - | Disabled | - | - | Enabled |
| MTU | - | Default | - | - | - |
| Rate limit (MB/s) | - | - | - | - | - |
| Multiqueue | =VCPU count | Default | 1 | - | - |