Deploying Retrieval-Augmented Generation (RAG) applications has become simpler thanks to NVIDIA's NIM inference microservices and the RAG Blueprint. In this post, I'll walk you through how I deployed the LLaMA 3-8B language model using NIMs and connected it with the RAG Playground using Docker Compose — all on a single node.
Architecture Overview
We’re deploying two main components:
- NVIDIA NIMs: Containerized inference microservices for the LLM, embedding, and reranking models.
- RAG Playground: A sample app from the NVIDIA RAG Blueprint.
System Prerequisites
- Ubuntu 22.04+ node with NVIDIA GPU
- An NGC API key, Docker, Docker Compose, and the NVIDIA Container Toolkit (see the quick check after this list)
- Internet access to pull containers from nvcr.io
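Before pulling anything, it's worth confirming the GPU and container stack are wired up correctly. A minimal sanity check, assuming a standard Docker and NVIDIA Container Toolkit install:

nvidia-smi
docker --version
docker compose version
docker run --rm --gpus all ubuntu nvidia-smi

If the last command prints the same GPU table from inside a container, the toolkit is passing GPUs through correctly.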
Step 1: Deploy LLaMA 3-8B (or Any Other Model) Using NIMs
export NGC_API_KEY=<your-ngc-api-key>
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
docker run -d --gpus all -e NGC_API_KEY -p 8000:8000 --name nim-llm \
nvcr.io/nim/nvidia/llama-3-8b-instruct-4bit-awq:1.3.0
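Once the container is running, a quick sanity check against the endpoint (this assumes the NIM's usual OpenAI-compatible API on port 8000; the exact model name comes back in the /v1/models response):

curl -s http://localhost:8000/v1/health/ready
curl -s http://localhost:8000/v1/models

The first startup downloads the model weights, so give the container a few minutes before expecting a ready response.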
Step 2: Deploy Embedding & Reranking Microservices
docker run -d --gpus all \
-e NGC_API_KEY \
-p 9080:8000 \
--name nemo-retriever-embedding-microservice \
nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.3.0
docker run -d --gpus all \
-e NGC_API_KEY \
-p 1976:8000 \
--name nemo-retriever-ranking-microservice \
nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.3.0
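The retriever NIMs follow the same convention, so you can poll their health endpoints (assuming the standard /v1/health/ready route) before wiring them into the Playground:

curl -s http://localhost:9080/v1/health/ready
curl -s http://localhost:1976/v1/health/ready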
Step 3: Deploy RAG Playground
git clone https://github.com/NVIDIA-AI-Blueprints/rag.git
cd rag
git checkout v1.0.0
cd deploy/compose
Create the .env file dynamically:
export HOST_IP=$(hostname -I | awk '{print $1}')
cat <<EOF > .env
EMBEDDING_MS_BASE=http://$HOST_IP:9080
RANKING_MS_BASE=http://$HOST_IP:1976
LLM_MS_BASE=http://$HOST_IP:8000
EOF
Start services:
docker compose --env-file .env up -d
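A quick check that all the Compose services came up and stayed up:

docker compose --env-file .env ps
docker compose --env-file .env logs -f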
Step 4: Validate Setup
Open the RAG Playground UI in your browser at:
http://<YOUR_NODE_IP>:8090
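If the UI loads, you can also confirm generation end to end by querying the LLM NIM directly (assuming the OpenAI-compatible chat endpoint; replace the model name with whatever /v1/models reported in Step 1):

curl -s http://<YOUR_NODE_IP>:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "<model-from-v1-models>", "messages": [{"role": "user", "content": "What is retrieval-augmented generation?"}], "max_tokens": 64}'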
Bonus Tips
- Use --gpus "device=N" in Docker to pin workloads to specific GPUs (see the example after this list).
- Run nvidia-smi to monitor utilization and memory.
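For example, assuming two GPUs, you could pin the LLM to GPU 0 and the embedder to GPU 1, then watch utilization (the device indices are illustrative; the container names mirror the earlier steps):

docker run -d --gpus "device=0" -e NGC_API_KEY -p 8000:8000 --name nim-llm \
nvcr.io/nim/nvidia/llama-3-8b-instruct-4bit-awq:1.3.0
docker run -d --gpus "device=1" -e NGC_API_KEY -p 9080:8000 --name nemo-retriever-embedding-microservice \
nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.3.0
watch -n 2 nvidia-smi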
Final Thoughts
I have also created an Ansible playbook project to configure NVIDIA AI Enterprise and RAG. NVIDIA's NIM services provide a powerful way to run LLMs in production, and combined with the RAG Blueprint they make for a production-grade GenAI setup that is well suited to internal POCs and prototyping.
Links
- NVIDIA Blog for RAG
- NVIDIA AI Enterprise