NVIDIA RTX AI Garage: How to Run Popular LLMs Locally on PCs


The age of personal AI has arrived — and it’s powered by NVIDIA RTX. With cutting-edge open-weight models, free developer tools, and powerful hardware acceleration, running large language models (LLMs) locally is no longer just possible — it’s becoming the new standard for AI hobbyists, students, and pros alike.

No subscriptions. No data sharing. No limits. Just fast, private AI, right on your desktop or laptop.

Why Local AI Is Booming

With the cost of cloud-based AI services rising and privacy concerns growing, local LLMs offer a compelling alternative. Whether you’re building an AI assistant, studying for finals, or developing a custom chatbot — RTX-powered PCs provide the horsepower and tools needed to run models like gpt-oss, Gemma 3, and Qwen 3 directly on your machine.

NVIDIA’s latest RTX AI Garage blog dives into this movement and the tools making it all possible.

The Local LLM Toolset for RTX PCs

Ollama – Your Gateway to Local AI

One of the easiest ways to get started, Ollama is an open-source, local-first app that lets you:

  • Run LLMs with a drag-and-drop interface
  • Chat with models in real time
  • Drop in PDFs or use multimodal prompts (text + image)

Latest Updates for RTX:

  • Up to 50% boost for gpt-oss-20B
  • 60% faster Gemma 3 models
  • Smarter model scheduling and improved multi-GPU stability

Explore Ollama with RTX
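Beyond the chat interface, Ollama also exposes a local REST API (by default at port 11434) that your own scripts can call. Below is a minimal sketch using only the Python standard library; it assumes an Ollama server is running and that a Gemma 3 model has already been pulled (e.g. `ollama pull gemma3` — the model name is whatever you have installed locally):

```python
import json
import urllib.request

# Ollama's default local chat endpoint
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a non-streaming payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return a single JSON object instead of a token stream
    }

def chat(model: str, prompt: str) -> str:
    """Send one chat turn to the local Ollama server and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["message"]["content"]

# Example (requires a running Ollama server with the model pulled):
# print(chat("gemma3", "Summarize why local LLMs preserve privacy."))
```

Because everything runs on localhost, no prompt or document ever leaves your machine.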

AnythingLLM – Build Your Own Study Buddy or Assistant

Stacked on top of Ollama, AnythingLLM transforms local models into powerful custom AI assistants:

  • Load notes, syllabi, slide decks
  • Generate flashcards, quizzes, and summaries
  • Ask contextual questions tied to your materials

RTX Acceleration = Instant responses + local data privacy

Use cases for students include:

  • “Generate flashcards from my biology lecture.”
  • “Explain this problem from my calculus homework.”
  • “Create and grade a quiz from chapters 5–6.”

Whether you’re prepping for a midterm or a new certification, AnythingLLM + RTX is a game-changer.

LM Studio – A Playground for AI Tinkering

Based on the powerful llama.cpp framework, LM Studio lets you:

  • Load dozens of open models
  • Run inference in real time
  • Serve LLMs as local API endpoints for your own tools or apps

Optimizations for RTX:

  • Supports Nemotron Nano v2 9B
  • Flash Attention enabled by default (+20% performance)
  • CUDA kernel tweaks boost inference speed up to 9%

LM Studio is ideal for developers building agentic AI, chatbots, or integrating AI into creative workflows.
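The local API endpoint feature is what makes LM Studio useful for app developers: its built-in server speaks an OpenAI-compatible protocol, by default on port 1234. The sketch below, using only the standard library, assumes the server is running with a model loaded; the model identifier `nemotron-nano` is a placeholder for whatever model you have active:

```python
import json
import urllib.request

# LM Studio's default local server address (OpenAI-compatible API)
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_completion_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }

def complete(model: str, prompt: str) -> str:
    """POST a chat completion to the local LM Studio server and return the text."""
    payload = json.dumps(build_completion_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        LMSTUDIO_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return the reply at choices[0].message.content
    return body["choices"][0]["message"]["content"]

# Example (requires LM Studio's local server running with a model loaded):
# print(complete("nemotron-nano", "Explain Flash Attention in two sentences."))
```

Because the protocol matches the OpenAI API shape, existing tools and SDKs can usually be pointed at the local endpoint with just a base-URL change.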

Project G-Assist: AI-Controlled PC Tuning

Project G-Assist, NVIDIA’s experimental AI assistant, lets you control your RTX PC with voice or text. The latest v0.1.18 update adds new laptop-specific features, including:

  • 🔋 BatteryBoost controls for longer unplugged sessions
  • 🔇 WhisperMode to reduce fan noise
  • ⚙️ App profiles to balance performance & efficiency

Plus, with the new Plug-In Builder and Plug-In Hub, users can extend G-Assist with their own commands and integrations — perfect for power users and tinkerers.

Download G-Assist via the NVIDIA App

Windows ML + TensorRT: Up to 50% Faster AI on Windows 11

Microsoft has officially rolled out Windows ML with TensorRT, making it easier than ever to run AI models like:

  • LLMs (via llama.cpp, transformers, etc.)
  • Diffusion models for generative art
  • Other ONNX-supported models

The result? Up to 50% faster inference and easier deployment on RTX-powered Windows 11 PCs.
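Windows ML selects the best execution backend automatically; when driving ONNX Runtime directly from Python you pick the execution providers yourself. A minimal sketch of that pattern, preferring TensorRT, then CUDA, then CPU (the path `model.onnx` is a placeholder, and `onnxruntime-gpu` must be installed for the GPU providers to appear):

```python
def pick_providers(available: list[str]) -> list[str]:
    """Order ONNX Runtime execution providers fastest-first: TensorRT > CUDA > CPU."""
    preferred = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]
    return [p for p in preferred if p in available]

def create_session(model_path: str):
    """Open an ONNX model with the fastest providers this machine supports."""
    import onnxruntime as ort  # pip install onnxruntime-gpu for TensorRT/CUDA

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)

# Example (requires onnxruntime and a local ONNX model file):
# session = create_session("model.onnx")  # placeholder path
# print("Running on:", session.get_providers()[0])
```

ONNX Runtime falls back down the provider list automatically, so the same script runs on a CPU-only machine and simply gets faster on an RTX GPU.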

BONUS: NVIDIA Nemotron Powers Open Model Development

The NVIDIA Nemotron collection (including Nano v2 9B) is fueling open-source AI. From general-purpose LLMs to domain-specific tools, these models are optimized for agentic AI — and ready to run on RTX GPUs.

In Case You Missed It: RTX AI Garage Highlights

  • Ollama on RTX — What’s new: 50–60% faster model performance, smarter memory usage
  • Llama.cpp — What’s new: optimized CUDA kernels, Flash Attention on by default
  • Project G-Assist — What’s new: v0.1.18 update adds laptop tuning and improved UX
  • Windows ML + TensorRT — What’s new: official launch, up to 50% inference boost
  • AnythingLLM — What’s new: build personal AI tutors with PDF/slide input

RTX = Your Personal AI Powerhouse

Forget the cloud. With NVIDIA’s latest updates and tools, your RTX PC is now an AI workstation, tutor, assistant, and creative lab — all in one box.

Learn more, experiment, and build — no subscriptions needed.

Stay updated: