In my ongoing war against shadow AI, I’ve been testing alternatives that almost anyone can use, no matter their technical expertise, though what you can run is somewhat limited by the resources available (laptop CPU/GPU, memory, etc.). If you want to try the newest open models without sending your prompts to the cloud, or you just want a controllable sandbox for demos, the new offering of Ollama on Windows 11 is a great way to run LLMs locally. You’ll trade some performance and convenience compared to Copilot or ChatGPT, but you gain privacy, offline capability, and a lot of tinkering power. The truth is, if you need privacy, this is the way to go, and if you’re learning, playing around on a laptop isn’t the worst option when you’re working with company or private data.
Below is a practical guide: what to expect, how to install, which models to try, and how to squeeze the most out of a Windows laptop or desktop.
Why local instead of public:
- Privacy & control: Everything runs on your machine; prompts and documents don’t leave the PC. That’s attractive for regulated, company data and internal prototypes. Windows-focused outlets have been making the same case, why not use local tools, which can be a smart alternative for many scenarios.
- Offline + cost: No subscription required; you can experiment even without internet.
- Caveat: Copilot (and Microsoft 365 Copilot) integrates deeply with your tenant’s permissions and data boundaries, which is useful in enterprise, if you’re okay with cloud inference.
What you’ll need (realistic expectations)
- CPU-only works, especially with small models (0.5B–7B). A recent article even shows usable results on a 7-year-old laptop, but just temper expectations (think “helper,” not “superhuman coder”).
- GPU helps a lot. Ollama supports NVIDIA GPUs on Windows (CUDA); AMD Radeon acceleration has been introduced for Windows and Linux and continues to mature.
- Model size matters. Smaller, quantized models (e.g., Q4_K_M) load and respond faster but may lose some quality vs. higher-precision variants like Q8_0.
- Context length costs RAM/VRAM. Huge context windows (e.g., 32k–64k tokens) can tank performance; dial them back to keep things a bit snappier.
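If you’re not sure what you’re working with, a quick PowerShell inventory of CPU, RAM, and GPU helps you pick a sensible first model. This is just a convenience sketch using standard CIM classes, not anything Ollama-specific:

# Quick inventory of CPU, RAM, and GPU before picking a model size
$cpu   = (Get-CimInstance Win32_Processor | Select-Object -First 1).Name
$ramGB = [math]::Round((Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB, 1)
$gpus  = (Get-CimInstance Win32_VideoController).Name -join ', '

"CPU : $cpu"
"RAM : $ramGB GB"
"GPU : $gpus"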
Once we run through the install, the GUI is quite intuitive if you’ve used other generative text AI in the past, so I’m going to include the CLI options as well. I appreciate them, and to be honest, Ollama performs better when run from the CLI alone, which should be expected when running on a local laptop.
Install Ollama on Windows 11
The easiest ways to install are the Windows installer or, from the command line, WinGet. Here’s the WinGet command:
- Install via WinGet
winget install --id=Ollama.Ollama -e
Note: Ollama runs a local service at http://localhost:11434 and gives you a CLI (ollama) plus a friendly Windows GUI app that makes chatting with local models easy (no terminal required).
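Because the service listens on http://localhost:11434, you can also hit the REST API straight from PowerShell. Here’s a minimal sketch against the generate endpoint; it assumes you’ve already pulled llama3.2:3b (which we do in the next section), and the API quickstart in the references covers the full set of fields:

# Send a single prompt to the local Ollama API and print the reply
$body = @{
    model  = "llama3.2:3b"
    prompt = "Why is the sky blue? Answer in one sentence."
    stream = $false
} | ConvertTo-Json

(Invoke-RestMethod -Method Post -Uri http://localhost:11434/api/generate -ContentType "application/json" -Body $body).response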
Pull a model and run your first prompt
Browse the Ollama model library to pick something light (e.g., llama3.2:3b, phi4:14b, qwen2.5:7b, or a coding model like qwen2.5-coder:7b). Then:
- Example: pull a small general model
ollama pull llama3.2:3b
Output:
C:\Windows\System32>ollama pull llama3.2:3b pulling manifest pulling dde5aa3fc5ff: 100% ▕██████████████████████████████████████████████████████████▏ 2.0 GB pulling 966de95ca8a6: 100% ▕██████████████████████████████████████████████████████████▏ 1.4 KB pulling fcc5a6bec9da: 100% ▕██████████████████████████████████████████████████████████▏ 7.7 KB pulling a70ff7e570d9: 100% ▕██████████████████████████████████████████████████████████▏ 6.0 KB pulling 56bb8bd477a5: 100% ▕██████████████████████████████████████████████████████████▏ 96 B pulling 34bb5ab01051: 100% ▕██████████████████████████████████████████████████████████▏ 561 B verifying sha256 digest writing manifest success
- Chat from the CLI
ollama run llama3.2:3b "Summarize ACID in relational databases in two sentences."
Output:
C:\Windows\System32>ollama run llama3.2:3b "Summarize ACID in relational databases in two sentences." ACID stands for Atomicity, Consistency, Isolation, and Durability, which are four fundamental principles that ensure the reliability and integrity of database transactions in relational databases. By adhering to these principles, ACID guarantees that database operations are processed reliably, even in the presence of failures or concurrency issues, ensuring data consistency and accuracy.
I was curious how long it took, so I ran it again in my PowerShell console (I’m such a terminal girly…) and this is what I received, minus the larger time fields, which were all zero:
PS C:\WINDOWS\system32> Measure-Command {ollama run llama3.2:3b "Summarize ACID in relational databases in two sentences."}
Seconds           : 12
Milliseconds      : 383
Ticks             : 123839645
TotalMinutes      : 0.206399408333333
TotalSeconds      : 12.3839645
TotalMilliseconds : 12383.9645
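One caveat on that number: if the model wasn’t already sitting in memory, part of those 12 seconds is model load time. My understanding from the API docs is that posting just the model name with no prompt preloads it, so a hedged way to time a “warm” run looks like this:

# Preload the model into memory (a generate call with no prompt just loads it)
Invoke-RestMethod -Method Post -Uri http://localhost:11434/api/generate -ContentType "application/json" -Body '{"model":"llama3.2:3b"}' | Out-Null

# Now time the prompt with the model already resident
Measure-Command { ollama run llama3.2:3b "Summarize ACID in relational databases in two sentences." }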
So how did this compare to running it in ChatGPT 5? I ran it from a browser and updated the prompt to request the time in milliseconds:
ChatGPT brought back more than requested (overachiever!), and I ended up going back to PowerShell to get a real timing beyond the 59 seconds reported inside the browser. A quick check turned up the following command to measure the time without the extra browser overhead:
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$body = @{ prompt = "Summarize ACID in two sentences." } | ConvertTo-Json
Invoke-RestMethod -Method Post -Uri http://localhost:8000/chat -ContentType "application/json" -Body $body | Out-Null
$sw.Stop(); "$($sw.ElapsedMilliseconds) ms"
The total time for ChatGPT 5, a public LLM, to do the same was 4283 ms. In other words:
| Source | Time in milliseconds/seconds |
| --- | --- |
| Ollama locally on a CPU-only Windows 11 laptop | 12383 ms / 12 seconds |
| ChatGPT 5 Pro plan | 4283 ms / 4 seconds |
| Difference (ChatGPT was faster by) | 8100 ms / 8 seconds |
Yes, ChatGPT was quicker to respond once the browser and extra output were removed, but that should be expected. We’re just running this on a laptop, and I don’t have a GPU in sight.
Tuning for speed on Windows
If responses feel sluggish, try the following, in order. There’s no guarantee you’ll get breakneck speeds, but these steps will help if Ollama isn’t performing at an acceptable pace:
- Pick the right model size & quantization.
Start with ~3B–7B parameter models quantized to Q4_K_M for a good size/quality balance; step up only if you need more reasoning.
- Reduce the context length.
Don’t default to 32k or 64k unless you need it; 4k–8k often performs much better on typical PCs (see the sketch after this list).
- Use your GPU (if NVIDIA).
Install current NVIDIA drivers; Ollama will automatically accelerate with CUDA if supported. You can also set Windows “Graphics settings” to force the Ollama app to use the high-performance GPU.
- Keep memory under control.
Limit simultaneous loaded models and how long they stay in memory:
- OLLAMA_MAX_LOADED_MODELS=1
- OLLAMA_KEEP_ALIVE=5m
These are standard Ollama server environment variables you can set in Windows “Environment Variables” (or with setx) before launching Ollama; there’s a short sketch of these and the context-window option after this list.
- Use the new Windows GUI for quick toggles.
The app exposes basic options (like context window) without editing files, which I found handy when I was experimenting.
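To put a couple of these together, here’s a small sketch: the first two lines persist the memory settings for the current user with setx (restart Ollama afterward so it picks them up), and the interactive /set parameter command shrinks the context window for a single session. The exact values are just starting points, not recommendations:

# Persist the memory-related settings for the current user (restart Ollama so they take effect)
setx OLLAMA_MAX_LOADED_MODELS 1
setx OLLAMA_KEEP_ALIVE 5m

# Inside an interactive session, drop the context window for just that session
ollama run llama3.2:3b
>>> /set parameter num_ctx 4096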
When Not to Use Ollama
Consider Copilot when:
- Deep Microsoft 365 integration: If you need tenant-aware grounding across SharePoint, Outlook, Teams, etc., Copilot’s data-boundary logic and permissions are built in.
- Bigger models / turnkey accuracy: Cloud services still win for the largest, most capable frontier models, with no local VRAM constraints.
Consider ChatGPT, Perplexity, Claude, etc. (PAID versions) when:
- Generic questions regarding business/organizational work. No critical data or PII is involved.
- Help reformatting or creating crisper, more concise wording in correspondence/documents.
- Creating what I call “filler content” around any proprietary or intellectual property.
- Remember to never upload or paste critical data/intellectual property into public generative AI.
Troubleshooting quick hits
- “It’s all CPU.” Make sure you’re on a supported NVIDIA GPU with current drivers whenever possible; AMD acceleration on Windows exists but is newer and may vary by model/driver.
- “The app works, but API calls fail.” Check the service at http://localhost:11434/api/tags. If it’s not responding, start Ollama and wait a few seconds.
- “Model downloads are huge.” Choose smaller/quantized variants (e.g., :3b, Q4_K_M) and lower the context window.
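When I’m debugging, these two quick checks cover most of the above (a sketch; exact output columns may differ by version):

# Is the local service answering? If so, this lists the models you have pulled.
(Invoke-RestMethod http://localhost:11434/api/tags).models | Select-Object name, size

# Is the loaded model running on CPU or GPU? Check the PROCESSOR column.
ollama ps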
Summary
If Copilot, ChatGPT (paid, not free), OpenAI, Perplexity, etc. is your daily driver for generative AI, think of Ollama as a local test bench. It’s perfect for validating prompts, exploring new open models, and building small end-to-end prototypes, both privately and offline. Start with a 3B–7B model, keep the context window modest, and let the new Windows app handle the basics while you get hands-on with the API for deeper experiments.
References
These are the go-to resources I used to understand, install, and troubleshoot. They’ve been the most helpful, and I’d be lying if I didn’t say I’ve asked ChatGPT and IntelliSense for help when stuck on challenges a Google search couldn’t solve.
- Download Ollama for Windows (installer & CLI/GUI). Ollama
- API quickstart (generate/chat examples). Ollama
- Model library (browse/choose models). Ollama
- NVIDIA GPU support (what’s supported). ollama.qubitpi.org
- AMD on Windows (support introduced and evolving). Ollama
- PowerShell documentation (support as I learn). Microsoft Learn
- PowerShell timing using the Measure-Command cmdlet.
- Context length and performance tips. Windows Central
- Why local AI can be a smart choice. Windows Central