Ditch the Token Bill: Run AI Stock Analysis Free with Ollama + FinSignal

Run AI Stock Analysis Locally — FinSignal with Ollama, LM Studio, and Claude

FinSignal is a Chrome extension (and standalone web app) that runs 8 specialist financial agents — technical, fundamental, sentiment, risk, earnings, and more — through a single LLM call and returns a BUY / SELL / HOLD signal with cited sources and a confidence score.

It supports three LLM backends out of the box:

Claude API — cloud, web-search grounded, highest quality
Ollama — fully local, runs on your machine, no data leaves your network
LM Studio — fully local, great GUI for model management

This post walks through setting up each one.

Install the Extension

Grab it from the Chrome Web Store
Pin it via the puzzle-piece menu so the ⬡ icon stays in your toolbar

The extension is free to install — the source repo is private, but everything you need is bundled in the published extension.

Option A — Claude API (Cloud, Recommended for Best Results)

Claude has live web_search access, so analysis points can be grounded in recent headlines and filings rather than in training data alone. This is the highest-fidelity path.

Setup:

Get an API key at console.anthropic.com/settings/api-keys (keys start with sk-ant-api03-)
Click the ⬡ icon → paste your key → click Connect

That's it. Your key is stored in chrome.storage.session and cleared automatically when Chrome closes — it never leaves your browser except in direct calls to api.anthropic.com.
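As a rough illustration of the storage behaviour described here, the sketch below validates the key prefix and round-trips the key through a session-scoped store. It is not FinSignal's code: the helper names and the anthropicApiKey field are hypothetical, and the storage object is passed in so the same functions work with chrome.storage.session or a stand-in.

```javascript
// Hypothetical helpers, not FinSignal's actual code. `storage` is anything
// shaped like chrome.storage.session: set(items) and get(key) return Promises.
const KEY_PREFIX = "sk-ant-api03-"; // prefix quoted in the setup steps
const STORAGE_FIELD = "anthropicApiKey"; // assumed field name

// Cheap sanity check before storing: right prefix and something after it.
function looksLikeAnthropicKey(key) {
  return (
    typeof key === "string" &&
    key.startsWith(KEY_PREFIX) &&
    key.length > KEY_PREFIX.length
  );
}

async function saveApiKey(storage, key) {
  const trimmed = String(key).trim();
  if (!looksLikeAnthropicKey(trimmed)) {
    throw new Error(`API key should start with ${KEY_PREFIX}`);
  }
  // Session storage lives in memory and is cleared when the browser session ends.
  await storage.set({ [STORAGE_FIELD]: trimmed });
}

async function loadApiKey(storage) {
  const items = await storage.get(STORAGE_FIELD);
  return items[STORAGE_FIELD] ?? null;
}
```

Inside an extension page this would be called as saveApiKey(chrome.storage.session, pastedKey). Because nothing is written to disk or synced, the key has to be pasted again after a browser restart.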

Settings → Provider should show Claude (Sonnet) selected. Add a ticker like NVDA, hit Run all, and you'll get a full multi-agent report in ~10 seconds.

Note: Claude is the only provider with live web search. For Ollama and LM Studio, the extension swaps in a different prompt that drops web-search references — more on what local models actually bring to the table below.

Option B — Ollama (Local, Privacy-First)

Ollama lets you run open-weight models entirely on your machine. No API key, no usage costs, no data leaving your network.

1. Install Ollama

# macOS
brew install ollama

# Or download from https://ollama.com

Start the server:

ollama serve
# Runs at http://localhost:11434

2. Pull a model

Gemma 3 worked really well for this use case — it follows the JSON schema reliably and produces coherent multi-section financial analysis:

ollama pull gemma3:4b       # ~3GB, runs on most laptops
ollama pull gemma3:12b      # better quality, needs ~8GB VRAM

Other good options:

ollama pull llama3.2:3b     # fast, lighter
ollama pull mistral:7b      # solid instruction following
ollama pull qwen2.5:7b      # strong at structured output

3. Configure in FinSignal

Open the extension → Settings tab
Provider → select Ollama
Ollama URL → http://localhost:11434 (default, leave as-is)
Model → type the model name exactly as pulled, e.g. gemma3:4b
Click Save

Now run an analysis. It will hit your local Ollama server instead of any cloud API.
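For a sense of what talking to the local server involves, here is a minimal sketch against Ollama's HTTP API. GET /api/tags lists the models you have pulled and POST /api/chat runs a chat completion. The function names and the choice of options (stream: false, format: "json") are assumptions for illustration, not the extension's actual request.

```javascript
// Sketch only: names and request options are assumptions, not FinSignal's code.

// Ollama's GET /api/tags responds with { models: [{ name: "gemma3:4b", ... }] }.
// Use this to confirm the model name typed in Settings matches a pulled model.
function hasModel(tagsResponse, modelName) {
  const models = Array.isArray(tagsResponse?.models) ? tagsResponse.models : [];
  return models.some((m) => m.name === modelName);
}

// Build a non-streaming POST /api/chat request that asks for JSON output.
function buildOllamaRequest(baseUrl, model, systemPrompt, userMessage) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        stream: false, // one complete response instead of chunks
        format: "json", // ask Ollama to constrain the reply to valid JSON
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userMessage },
        ],
      }),
    },
  };
}

// Send the request and return the assistant text (still a JSON string).
async function callOllamaSketch(baseUrl, model, systemPrompt, userMessage, fetchImpl = fetch) {
  const { url, init } = buildOllamaRequest(baseUrl, model, systemPrompt, userMessage);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.message.content;
}
```

If hasModel returns false for the name in Settings, the usual cause is a missing tag: a model pulled as gemma3:4b is listed under that exact name.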

Troubleshooting Ollama

CORS error in the extension popup?

The extension popup is on a chrome-extension:// origin. You need to tell Ollama to allow it:

OLLAMA_ORIGINS="chrome-extension://*" ollama serve

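Or set it for the Ollama macOS app, then restart Ollama so it picks up the change (the value lasts until you log out or reboot):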

# macOS launchd
launchctl setenv OLLAMA_ORIGINS "chrome-extension://*"

Model returns garbled or non-JSON output?

Smaller models sometimes fail to adhere to a strict JSON schema on the first try. Hit Retry — the orchestrator strips markdown fences and re-parses. If it fails repeatedly, try a larger variant (gemma3:12b over gemma3:4b).
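The strip-and-reparse step can be sketched roughly as follows. This is an illustration of the technique, not the orchestrator's actual code: the function names are hypothetical, and the fallback that extracts the outermost {...} span is an extra assumption beyond what is described above.

```javascript
// Three backticks, built with repeat() so this listing stays readable.
const FENCE = "`".repeat(3);

// Remove a leading fence line (with optional language tag) and a trailing fence.
function stripMarkdownFences(text) {
  let out = text.trim();
  if (out.startsWith(FENCE)) {
    const firstNewline = out.indexOf("\n");
    out = firstNewline === -1 ? out.slice(FENCE.length) : out.slice(firstNewline + 1);
    const lastFence = out.lastIndexOf(FENCE);
    if (lastFence !== -1) out = out.slice(0, lastFence);
  }
  return out.trim();
}

// Parse model output as JSON, tolerating fences and surrounding chatter.
function parseModelJson(raw) {
  const cleaned = stripMarkdownFences(raw);
  try {
    return { ok: true, value: JSON.parse(cleaned) };
  } catch (err) {
    // Assumed fallback: try the outermost {...} span if the model added prose.
    const start = cleaned.indexOf("{");
    const end = cleaned.lastIndexOf("}");
    if (start !== -1 && end > start) {
      try {
        return { ok: true, value: JSON.parse(cleaned.slice(start, end + 1)) };
      } catch (inner) {
        // Fall through to the failure result below.
      }
    }
    return { ok: false, error: err.message };
  }
}
```

A caller can treat ok: false as the cue to show Retry instead of surfacing a half-parsed report.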

Option C — LM Studio (Local, Great for Model Discovery)

LM Studio gives you a GUI for browsing, downloading, and running GGUF models. If you prefer not to use the CLI, this is the smoothest local experience.

1. Install LM Studio

Download from lmstudio.ai — available for macOS, Windows, and Linux.

2. Load a model

In LM Studio:

Go to the Discover tab → search gemma-3 or mistral
Download a Q4 or Q5 quantization (good balance of size vs quality)
Go to the Local Server tab → select your model → click Start Server

LM Studio runs an OpenAI-compatible server at http://localhost:1234 by default.

3. Configure in FinSignal

Open the extension → Settings tab
Provider → select LM Studio
LM Studio URL → http://localhost:1234
Model → paste the model identifier shown in LM Studio's server tab (e.g. lmstudio-community/gemma-3-4b-it-GGUF)
Click Save
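Since LM Studio exposes an OpenAI-compatible server, a request to it follows the chat completions shape. The sketch below shows one plausible way to build that request and pull the text out of the response. The function names and the temperature value are assumptions, not the extension's actual settings.

```javascript
// Sketch only: names and the temperature value are assumptions.

// Build a POST /v1/chat/completions request for LM Studio's local server.
function buildLMStudioRequest(baseUrl, model, systemPrompt, userMessage) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model, // the identifier shown in LM Studio's server tab
        temperature: 0.2, // assumed: low temperature for steadier structured output
        stream: false,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userMessage },
        ],
      }),
    },
  };
}

// OpenAI-style responses carry the text at choices[0].message.content.
function extractChatText(completion) {
  const text = completion?.choices?.[0]?.message?.content;
  if (typeof text !== "string") {
    throw new Error("Unexpected chat completion shape");
  }
  return text;
}
```

The only differences from the Ollama path are the endpoint and the response shape, which is why a single orchestrator can switch between the two with a small adapter per provider.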

Provider Comparison

Feature	Claude API	Ollama	LM Studio
Web search grounding	✅ Live headlines & filings	❌ Training data only	❌ Training data only
Fundamental depth	Strong	Strong	Strong
Recency (last earnings, news)	✅ Current	⚠️ Cutoff-limited	⚠️ Cutoff-limited
Privacy	Data sent to Anthropic	100% local	100% local
Cost	Pay per token	Free	Free
Setup	Paste API key	CLI + model pull	GUI download
Best model for this	claude-sonnet-4	gemma3:4b / 12b	Gemma 3 Q4/Q5
JSON schema adherence	Excellent	Excellent (gemma3)	Excellent (gemma3)

How the Analysis Works

All 8 agents run in a single LLM call — not 8 separate requests. The orchestrator builds a combined system prompt assigning each agent role, sends one message, and parses the structured JSON response.

User → orchestrator.js
         ↓
   buildSystemPrompt()  ← 8 agent roles combined
   buildUserMessage()   ← ticker + JSON schema
         ↓
   callClaude() | callOllama() | callLMStudio()
         ↓
   Parse JSON → normalize signal
         ↓
   Zustand store → React UI
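The flow in the diagram can be sketched in a few functions. Everything here is illustrative: the prompt wording, the JSON schema, the partial role list (only the roles named in this post), and the rule that unknown signals fall back to HOLD are assumptions, not what orchestrator.js actually does.

```javascript
// Roles named in this post. The extension runs 8, so this list is partial.
const AGENT_ROLES = ["technical", "fundamental", "sentiment", "risk", "earnings"];

// Combine every role into one system prompt so a single call covers them all.
function buildSystemPromptSketch(roles) {
  const lines = roles.map((role, i) => `${i + 1}. ${role} analyst`);
  return `You are a panel of financial agents. Answer as each role:\n${lines.join("\n")}`;
}

// Hypothetical schema, included only to make the example concrete.
function buildUserMessageSketch(ticker) {
  return (
    `Analyze ${ticker}. Reply with JSON only: ` +
    `{"signal": "BUY|SELL|HOLD", "confidence": 0-99, ` +
    `"points": [{"agent": "...", "text": "...", "source": "..."}]}`
  );
}

// Map whatever the model returned onto the three allowed signals.
function normalizeSignal(raw) {
  const s = String(raw ?? "").trim().toUpperCase();
  return ["BUY", "SELL", "HOLD"].includes(s) ? s : "HOLD"; // assumed fallback
}

// One request per analysis: pick the provider's caller, send, parse, normalize.
// `callers` maps provider ids to async (systemPrompt, userMessage) => string.
async function analyze(ticker, provider, callers) {
  const call = callers[provider];
  if (!call) throw new Error(`Unknown provider: ${provider}`);
  const text = await call(
    buildSystemPromptSketch(AGENT_ROLES),
    buildUserMessageSketch(ticker)
  );
  const parsed = JSON.parse(text);
  return { ...parsed, signal: normalizeSignal(parsed.signal) };
}
```

Passing the provider callers in as a plain object keeps the dispatch step trivial and makes it easy to swap in a stub when testing without a model running.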

Every analysis point in the response must include a source field. The UI silently drops any point without one — a basic anti-hallucination guardrail. Confidence is capped at 99 and calibrated to drop when agents produce conflicting signals.
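A minimal sketch of the two mechanical guardrails, dropping unsourced points and capping confidence, might look like this. The source field comes from the description above. The points and confidence field names and the function name are assumptions, and the conflict-based calibration is not modeled here because its formula is not given.

```javascript
// Sketch only: `points` and `confidence` are assumed field names.
function applyGuardrails(report) {
  const points = Array.isArray(report.points) ? report.points : [];

  // Keep only points that cite a non-empty source string.
  const sourced = points.filter(
    (p) => typeof p.source === "string" && p.source.trim() !== ""
  );

  // Clamp confidence into 0..99. Non-numeric values become 0.
  const rawConfidence = Number(report.confidence);
  const confidence = Number.isFinite(rawConfidence)
    ? Math.min(99, Math.max(0, Math.round(rawConfidence)))
    : 0;

  return { ...report, points: sourced, confidence };
}
```

For example, a report with confidence 120 and three points, one of which has no source and one a blank source, comes back with confidence 99 and a single point.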

When running locally (Ollama / LM Studio), the prompt drops web-search instructions and adds:

"You have NO live web access. Base analysis on your training knowledge. Prefix uncertain values with 'approximately' or 'estimated'."

This is an honesty instruction, not a capability ceiling. Models like Gemma 3 are trained on large general-purpose corpora that likely include a great deal of public financial text: SEC filings such as 10-Ks, earnings transcripts, analyst commentary, and financial news. For well-documented tickers, that's years of synthesized coverage baked into the weights.
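The provider-dependent prompt swap can be sketched as a small lookup. The local notice is the text quoted above. The provider ids and the web-search wording are hypothetical placeholders, since the actual Claude prompt is not shown in this post.

```javascript
// Text quoted in this post for local providers.
const LOCAL_NOTICE =
  "You have NO live web access. Base analysis on your training knowledge. " +
  "Prefix uncertain values with 'approximately' or 'estimated'.";

// Placeholder wording: the real web-search instruction is not published.
const WEB_SEARCH_NOTICE =
  "Use web search to ground each point in a recent, citable source.";

// Assumed provider ids: "claude", "ollama", "lmstudio".
function isLocalProvider(provider) {
  return provider === "ollama" || provider === "lmstudio";
}

function groundingInstruction(provider) {
  return isLocalProvider(provider) ? LOCAL_NOTICE : WEB_SEARCH_NOTICE;
}

// Append the provider-appropriate instruction to the shared agent prompt.
function withGrounding(basePrompt, provider) {
  return `${basePrompt}\n\n${groundingInstruction(provider)}`;
}
```

Keeping the agent roles in a shared base prompt and varying only this suffix means the 8-agent structure stays identical across providers, and only the claims the model is allowed to make about recency change.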

What the 8-agent framework does with a local model is structured knowledge extraction — forcing the model to surface what it already knows across technical, fundamental, sentiment, risk, and compliance lenses simultaneously. The result can be genuinely high-quality analysis, especially for fundamentals, sector context, business moat, and historical risk patterns.

The gap vs. Claude is specifically recency: last quarter's earnings beat, an analyst downgrade from last week, yesterday's macro event. For longer-horizon views where the fundamental picture matters more than last week's news, local models hold up well.