Built for GPU sharing

Carve one NVIDIA GPU into memory-isolated slices for multiple containers — no Kubernetes, no driver patches. Run the installer on any Ubuntu host with an NVIDIA driver and the backend, frontend, and HAMi-core libvgpu image are wired up for you.

$ curl -fsSL http://gpu-lambda-blog.vercel.app/install.sh | bash

Prefer to do it yourself? Follow the manual installation instructions.

Isolated shards

Split one GPU into fixed memory slices so each container gets a hard, enforced limit — not a best-effort share.
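To make "fixed slices" concrete, here is a minimal sketch of the bookkeeping behind a hard limit: each container reserves a set amount of memory, and a plan that exceeds the card is refused outright. The `Shard` class, the `plan_shards` function, and the 24 GiB card size are hypothetical names and numbers chosen for illustration, not part of GPU Shards itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    """One fixed memory slice reserved for a single container (hypothetical type)."""
    name: str
    memory_mib: int


def plan_shards(total_mib: int, requests: dict[str, int]) -> list[Shard]:
    """Reserve a fixed slice per container and refuse to oversubscribe the card."""
    if any(size <= 0 for size in requests.values()):
        raise ValueError("every shard needs a positive memory size")
    requested = sum(requests.values())
    if requested > total_mib:
        raise ValueError(
            f"requested {requested} MiB but the card only has {total_mib} MiB"
        )
    return [Shard(name, size) for name, size in requests.items()]


# Example: a card with 24 GiB of memory split three ways (sizes are illustrative).
shards = plan_shards(24576, {"training": 12288, "inference": 8192, "notebook": 4096})
```

Rejecting an oversized plan up front is what separates a hard limit from a best-effort share: no container is ever promised memory that another container already holds.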

No Kubernetes

One installer wires up Docker, the NVIDIA toolkit, and the panel. No cluster, no operators, no driver patches.

Full CUDA

Workloads run against the real driver with stock CUDA images — the memory cap is transparent to the code inside.
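For the curious, the sketch below shows one way a memory cap could be attached to a stock CUDA image without touching the code inside it. It assumes the HAMi-core convention of preloading `libvgpu.so` via `LD_PRELOAD` and reading the cap from `CUDA_DEVICE_MEMORY_LIMIT`. The host path, the image tag, and the `docker_run_args` helper are assumptions for illustration; the installer may wire things up differently.

```python
import shlex


def docker_run_args(
    image: str,
    memory_mib: int,
    command: list[str],
    libvgpu_host_path: str = "/usr/local/vgpu/libvgpu.so",  # assumed location
) -> list[str]:
    """Build a `docker run` argv that caps GPU memory for an unmodified image."""
    container_lib = "/usr/local/vgpu/libvgpu.so"
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                                  # expose the GPU via the NVIDIA toolkit
        "-v", f"{libvgpu_host_path}:{container_lib}:ro",  # mount the interposer library read-only
        "-e", f"LD_PRELOAD={container_lib}",              # load it ahead of the CUDA libraries
        "-e", f"CUDA_DEVICE_MEMORY_LIMIT={memory_mib}m",  # cap read by HAMi-core (assumed format)
        image,
        *command,
    ]


# Example: run nvidia-smi in a stock CUDA image under a 4 GiB cap.
args = docker_run_args("nvidia/cuda:12.4.1-base-ubuntu22.04", 4096, ["nvidia-smi"])
print(shlex.join(args))
```

Because the cap arrives through environment variables and a preloaded library, the image and the workload stay unchanged, which is the sense in which the limit is transparent.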

Secure by default

Each container is confined to its own slice, so one tenant can't reach another's memory or saturate the card.

Maximize every GPU

One card, many workloads. GPU Shards partitions your NVIDIA GPU into memory-isolated slices, so idle memory stops going to waste and every container runs in its own lane.

The in-panel code editor and ML artifacts view

From editor to endpoint

Write a Python handler right in the browser, upload the models and datasets it needs, and run it on a CPU or GPU shard. Happy with the output? Deploy the same code as an HTTP endpoint in one click.
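As a rough idea of what such a handler might look like, here is a minimal sketch. The `handler(event)` signature and the `status`/`body` response shape are hypothetical, since the panel's actual handler contract is not described here; treat it as an illustration of the workflow rather than the real interface.

```python
import json


def handler(event: dict) -> dict:
    """Hypothetical handler: return the mean of the numbers in the request."""
    values = event.get("values", [])
    if not values:
        # Reject empty input instead of dividing by zero.
        return {"status": 400, "body": json.dumps({"error": "no values provided"})}
    mean = sum(values) / len(values)
    return {"status": 200, "body": json.dumps({"count": len(values), "mean": mean})}


# Try it locally with the same kind of payload an HTTP request would carry.
response = handler({"values": [1.0, 2.0, 3.0]})
```

Keeping the handler a plain function of its input means the same code can be run by hand in the editor and then served behind an endpoint.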

The GPU Shards panel: pick a GPU instance, allocate memory shards, configure the container, and deploy.

Open source. Self-hosted. Yours.

GPU Shards runs entirely on hardware you own or control — no cloud, no account, no telemetry. Install it with one command, share a single card across your team, and keep every workload on your own box.