How to Build Self-Improving AI Agents Locally with Hermes and NVIDIA Hardware

Overview

Agentic AI is transforming productivity by enabling autonomous task execution. Following the success of frameworks like OpenClaw, the open-source community has embraced Hermes Agent—a new framework that has garnered over 140,000 GitHub stars in under three months and, as of last week, is the most used agent on OpenRouter. Developed by Nous Research, Hermes is designed for reliability and self-improvement, two qualities historically challenging to achieve. It is provider- and model-agnostic, optimized for always-on local use, making NVIDIA RTX PCs, NVIDIA RTX PRO workstations, and NVIDIA DGX Spark the ideal hardware to run it at full speed, 24/7.

This guide will walk you through setting up Hermes Agent locally using NVIDIA hardware and the Qwen 3.6 series models from Alibaba, which are high-performance, open-weight LLMs that outperform previous-generation larger models. By the end, you'll have a self-improving local AI agent that can run continuously, learn from tasks, and execute complex workflows.

Prerequisites

Before you begin, ensure you have the following:

- An NVIDIA GPU with roughly 20GB of VRAM for the Qwen 3.6 35B model (less for the 27B), such as a GeForce RTX card, an RTX PRO workstation GPU, or an NVIDIA DGX Spark
- A recent NVIDIA driver, verifiable with nvidia-smi
- Docker, with the NVIDIA Container Toolkit on Linux, or WSL2 plus Docker Desktop integration on Windows
- A Hugging Face account and access token for downloading the model weights
- Sufficient free disk space for the model files (tens of gigabytes)

Step-by-Step Instructions

Step 1: Set Up Your NVIDIA Environment

First, verify your GPU is recognized. Open a terminal and run:

nvidia-smi

You should see your GPU model, driver version, and available memory. Next, install the NVIDIA Container Toolkit and register it as a Docker runtime so containers can access the GPU:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

For Windows, ensure you have WSL2 and Docker Desktop with WSL2 integration enabled.
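
On Windows, a quick sanity check is to confirm the GPU is visible from inside WSL2, since Docker Desktop runs its containers there:

wsl.exe -e nvidia-smi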

Step 2: Download Qwen 3.6 Model

Choose either the 27B or the 35B parameter model. The 35B model runs in roughly 20GB of GPU memory and outperforms 120B-class models (which require 70GB or more). The 27B is a dense model that matches the accuracy of 400B-class models. Use huggingface-cli:

pip install huggingface_hub
huggingface-cli download Qwen/Qwen3.6-35B-Instruct --local-dir ./qwen35b

Replace with the correct repository name if needed. Ensure you have a Hugging Face token set (huggingface-cli login).
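
Before launching the agent, it is worth confirming the download completed. The exact file names depend on the repository, but you should see tokenizer files and safetensors shards:

du -sh ./qwen35b
ls ./qwen35b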

Step 3: Launch Hermes Agent with Docker

Pull the Hermes Agent Docker image optimized for NVIDIA GPUs:

docker pull nousresearch/hermes-agent:latest-cuda

Run the container with GPU access and mount the model directory:

docker run --gpus all -d --name hermes-agent \
  -v $(pwd)/qwen35b:/models \
  -e MODEL_PATH=/models \
  -p 8080:8080 \
  nousresearch/hermes-agent:latest-cuda

This launches a web interface at http://localhost:8080 and a REST API for integration.
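
To confirm the container started cleanly and is listening on port 8080, follow the startup logs and probe the port. The root URL below is only a reachability check; consult the Hermes documentation for the actual API routes:

docker logs -f hermes-agent
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/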

Step 4: Configure Self-Evolving Skills

Hermes automatically saves learnings from complex tasks as skills. To enable this, edit the configuration file (hermes_config.yaml inside the container or mount it):

skills:
  auto_learn: true
  max_skills: 50
  memory_dir: /data/skills

Restart the container to apply changes. Skills are stored as JSON files that can be reviewed and manually curated.
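
One way to keep the configuration and learned skills on the host, so they survive container restarts, is to mount both into the container. The in-container config path /app/hermes_config.yaml below is an assumption; check the image documentation for the actual location:

# Remove the old container, then relaunch with the config and skills directory mounted.
docker rm -f hermes-agent
docker run --gpus all -d --name hermes-agent \
  -v $(pwd)/hermes_config.yaml:/app/hermes_config.yaml \
  -v $(pwd)/skills:/data/skills \
  -v $(pwd)/qwen35b:/models \
  -e MODEL_PATH=/models \
  -p 8080:8080 \
  nousresearch/hermes-agent:latest-cuda

With the directory mounted, each learned skill can be reviewed directly on the host, for example with python3 -m json.tool skills/<skill-name>.json.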

Step 5: Integrate with Messaging and Files

Hermes supports Slack, Discord, and file access. For Slack integration, set environment variables:

docker run --gpus all -d --name hermes-agent \
  -e SLACK_BOT_TOKEN=xoxb-... \
  -e SLACK_APP_TOKEN=xapp-... \
  ...

For local file access, mount directories:

-v /path/to/files:/data/files

Now your agent can read, write, and process files.
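
A quick way to confirm the mount is visible from inside the running container:

docker exec hermes-agent ls /data/files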

Common Mistakes

Insufficient GPU Memory

The Qwen 3.6 35B model requires ~20GB of GPU memory. If you have less, use the 27B model or enable 4-bit quantization. Check with nvidia-smi during startup; if the container crashes with OOM, reduce model size.
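
One way to watch GPU memory while the model loads, and to check the logs for CUDA out-of-memory errors if the container exits:

watch -n 1 nvidia-smi --query-gpu=memory.used,memory.total --format=csv
docker logs hermes-agent --tail 50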

Missing NVIDIA Container Toolkit

Without the toolkit, Docker cannot access the GPU, leading to very slow inference on the CPU. Verify GPU access with a CUDA base image (substitute a tag that matches your installed driver):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

Skill Overload

If auto-learning creates too many skills (max_skills too high), the agent may become slower. Set a reasonable limit and periodically review skills via the web UI. Remove duplicates or outdated ones.
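
If the skills directory is not mounted on the host, you can still list the learned skills (newest first) from inside the running container:

docker exec hermes-agent ls -lt /data/skills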

Firewall Blocking Ports

If you cannot access the web interface, ensure port 8080 is open in your firewall. On Ubuntu and other ufw-based distributions, run sudo ufw allow 8080.
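
Before adjusting the firewall, confirm something is actually listening on the port:

ss -tln | grep 8080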

Summary

Hermes Agent combined with Qwen 3.6 on NVIDIA RTX hardware delivers a powerful, self-improving AI that runs entirely locally. Key takeaways:

- Hermes is provider- and model-agnostic and optimized for always-on local use, making RTX PCs, RTX PRO workstations, and DGX Spark a natural fit.
- The open-weight Qwen 3.6 models fit in roughly 20GB of GPU memory (35B) or less (27B) while competing with much larger models.
- Docker plus the NVIDIA Container Toolkit gives the agent GPU-accelerated inference, a local web UI, and a REST API on port 8080.
- Auto-learned skills are stored as reviewable JSON files; cap max_skills and curate them periodically to keep the agent fast.
- Slack, Discord, and mounted file directories connect the agent to your everyday workflows.

By following this guide, you have set up a local agent that not only performs tasks but improves over time, making it ideal for power users who demand privacy, speed, and adaptability.
