The Problem
Stock catalysts — FDA approvals, insider buying clusters, activist filings, earnings surprises — move prices before most retail investors notice them. By the time you read about it on Reddit or see the Seeking Alpha headline, the move is half over.
I wanted a system that would surface these catalysts 30–90 days early. Not a screener with static filters, but something that could discover opportunities I wasn't even looking for. The tools that exist are either too basic (finviz screens) or too expensive (Bloomberg terminal). None of them learn from their own mistakes.
The Architecture
Bull Scouter runs two loops. The daily signal loop runs 3x per day: an assessor discovers opportunities blindly via web research, a 9-step scanner scores them against market data, and an analyst synthesizes everything into verdicts. The nightly maintenance loop is where it gets interesting: the analyst flags what the scanner got wrong, a developer agent writes code fixes, and a reviewer agent posts inline comments on the PR. I wake up to a pull request with the fix already reviewed.
There are 10 AI agents total — 5 top-level agents and 5 discovery sub-agents. They run as scheduled launchd jobs (macOS's equivalent of cron) on a Mac Mini. No cloud infrastructure. No Kubernetes. Just Python, SQLite, and Claude Opus talking to each other through the filesystem.
The Agents
Each agent is a Claude Opus instance with a specialized system prompt, running via claude -p on the CLI. They're named after characters that fit their roles:
Rick, the Assessor: Chaotic multi-dimensional explorer. Sends 5 expert sub-agents — Homer, Marge, Bart, Lisa, and Antonis — on blind discovery missions with zero scanner data. Whatever they find independently that the scanner also found? That's the highest-conviction signal.
Oracle, the Analyst: Sees all possible futures. Reviews all scanner paths — daily signals, contrarian plays, watchlist dips, catalyst heatmap — and issues verdicts. Produces adjustment files that feed back into the next scan cycle.
Tony, the Developer: Builds and fixes everything. Reads Oracle's assessment, identifies bugs and config issues, implements fixes on a branch, runs the full test suite, and creates a PR. Max 5 changes per run.
Linus, the Reviewer: Legendary code reviewer with uncompromising standards. Four parallel review sub-agents check Tony's PR for bugs and compliance violations. Posts inline comments with committable suggestions. Auto-approves or requests changes.
There's also Darwin the Optimizer — evolves configuration parameters through natural selection. And Rick's 5 discovery experts each specialize in a domain: general catalysts, geopolitics, AI/tech, insider/flow signals, and contrarian value.
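Mechanically, each of these agents is a thin wrapper around the CLI. A minimal sketch, assuming a headless claude -p invocation; the helper names and prompt-file layout are hypothetical, and the exact flags may differ between CLI versions:

```python
import subprocess

def build_agent_cmd(task: str, system_prompt: str) -> list[str]:
    # Flag names are an assumption -- check `claude --help` for the exact spelling.
    return ["claude", "-p", task, "--append-system-prompt", system_prompt]

def run_agent(task: str, prompt_file: str) -> str:
    # Each agent is just a prompt file plus a task string; the CLI does the rest
    # and prints the response to stdout.
    with open(prompt_file) as f:
        cmd = build_agent_cmd(task, f.read())
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

The loops are then just these calls chained in order, with each agent reading the previous one's output from disk.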
Blind Discovery
This is the part I'm most proud of. Rick's 5 experts search the web for stock opportunities with zero knowledge of what the scanner found. They don't see scores, recommendations, or even the ticker list. They start from scratch every time — scanning news, SEC filings, options flow, macro trends.
After discovery, the results merge with the scanner's picks. Tickers found by both systems independently get tagged as “overlap” — the highest conviction signal in the system. Tickers found only by the experts become “scanner blind spots” that feed back into the scanner's improvement cycle.
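In code, the merge reduces to set operations over the two ticker lists. A sketch with hypothetical function and tag names, matching the post's terminology:

```python
def merge_discovery(scanner_picks: set[str], expert_picks: set[str]) -> dict[str, str]:
    """Tag each ticker by which side(s) found it (illustrative, not the real code)."""
    tags = {}
    for t in scanner_picks | expert_picks:
        if t in scanner_picks and t in expert_picks:
            tags[t] = "overlap"             # found independently by both: highest conviction
        elif t in expert_picks:
            tags[t] = "scanner_blind_spot"  # feeds the scanner's improvement cycle
        else:
            tags[t] = "scanner_only"
    return tags
```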
The Data Pipeline
The scanner pulls from 8+ sources every run: Reddit mentions (ApeWisdom), FDA calendars, space launch manifests, earnings dates, SEC filings, Seeking Alpha & Benzinga RSS, and Perplexity AI for structured event discovery. Each source contributes tickers and catalyst events that feed into the scoring engine.
On top of that, three monitors run during and around market hours. The price monitor polls every 15 minutes for dip entries and cross-path opportunities. The options flow monitor captures put/call ratios, IV skew, and unusual volume twice daily — institutional positioning that's invisible to price and news scanners. Insider buying is tracked via SEC Form 4 filings, with Telegram alerts for purchases above $500K. A market regime gauge combines 7 indicators (VIX, SPY momentum, breadth, options flow, insider activity, news sentiment, yield curve) into a single -100/+100 composite that tells the system whether we're in risk-on or risk-off — the analyst adjusts its aggressiveness accordingly.
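The composite gauge is, at heart, a clamped weighted average. A minimal sketch, assuming each of the 7 indicators is pre-normalized to the -100..+100 range; the indicator names and weights below are illustrative, not the real config:

```python
def regime_score(indicators: dict[str, float], weights: dict[str, float]) -> float:
    """Blend normalized indicators into one -100 (risk-off)..+100 (risk-on) composite."""
    total = sum(weights.values())
    score = sum(indicators[k] * w for k, w in weights.items()) / total
    return max(-100.0, min(100.0, score))  # clamp to the documented range
```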
The Scoring Engine
Not all opportunities are the same, so the scanner runs multiple scoring frameworks in parallel and picks the best fit. Tier 1 targets binary catalysts in small caps — FDA decisions, contract announcements, the setups that move 50%+ overnight. Tier 2 catches fallen angels and reacceleration plays. Track B scores large-cap momentum breakouts. Fundamental quality frameworks evaluate value (8 components) and growth (7 components) separately. And the Buffett quality screen scores ~900 S&P stocks across 7 components — ROE quality, debt safety, margin strength, FCF generation, earnings consistency, moat proxy, and shareholder return — surfacing the kind of durable, high-quality businesses Warren Buffett would buy.
A stock needs more than a high score to earn a BUY. Three independent gates must pass: the composite score (≥75), a confidence score that measures signal reliability (≥50), and a hysteresis check requiring 2 consecutive qualifying scans. This prevents one-day spikes from triggering false signals. The system also decays stale evidence over time — yesterday's catalyst is worth less than today's.
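The three-gate check and evidence decay could look roughly like this. The thresholds (75, 50, 2 consecutive qualifying scans) come from the post; the function shapes, the WATCH label for a first qualifying scan, and the decay half-life are my assumptions:

```python
def verdict(score: float, confidence: float, prior_qualifying_scans: int) -> str:
    """All three gates must pass before a BUY is issued."""
    if score < 75 or confidence < 50:
        return "HOLD"
    # Hysteresis: this scan plus at least one prior qualifying scan = 2 consecutive.
    return "BUY" if prior_qualifying_scans >= 1 else "WATCH"

def decayed_weight(days_old: int, half_life_days: float = 7.0) -> float:
    """Exponential decay of stale evidence (half-life value is an assumption)."""
    return 0.5 ** (days_old / half_life_days)
```

The hysteresis gate is what keeps a one-day spike from ever reaching BUY on its first appearance.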
Notifications
The system pushes alerts through two channels. Telegram for real-time, actionable signals during market hours. Email for the daily summary after the final scan.
Telegram alerts fire throughout the day: price dips on watchlist stocks (every 15 min), insider buys above $500K (every 30 min), unusual options flow (twice daily), and Rick's high-conviction blind discoveries. Each alert includes the current price, score, and key context so you can act without opening the dashboard.
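For illustration, here is a hedged sketch of the Telegram path using the Bot API's sendMessage endpoint. The message format and helper names are invented, and nothing is actually sent; the sketch only builds the request URL:

```python
import urllib.parse

def format_alert(ticker: str, price: float, score: float, context: str) -> str:
    """One-line alert with price, score, and key context (format is illustrative)."""
    return f"\N{POLICE CARS REVOLVING LIGHT} {ticker} @ ${price:.2f} | score {score:.0f} | {context}"

def sendmessage_url(bot_token: str, chat_id: str, text: str) -> str:
    """Build the Telegram Bot API sendMessage URL; no request is made here."""
    query = urllib.parse.urlencode({"chat_id": chat_id, "text": text})
    return f"https://api.telegram.org/bot{bot_token}/sendMessage?{query}"
```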
The daily email goes out after the evening scan — a single unified report combining all paths. Top picks with full breakdowns, contrarian candidates, watchlist dip alerts, and a catalyst heatmap. It's the one place where everything the system found that day comes together.
The Schedule
Everything runs as scheduled launchd jobs, weekdays only.
The daily signal loop runs 3 iterations timed around the US market session.
The schedule is designed so each agent's output feeds the next. Rick discovers before the scan runs. The scan finishes before Oracle reviews. Oracle's assessment is ready before Tony starts coding fixes.
The Self-Improvement Loop
Every night after the final scan, Oracle reviews the day's results and writes an assessment. Tony reads this assessment, identifies what to fix — maybe a scoring threshold is too aggressive, maybe a data source is returning stale data — and implements the change on a branch. Linus reviews the PR within 15 minutes. If it passes, I merge it in the morning. If not, Linus posts inline comments explaining why.
The scanner has made hundreds of self-improvements this way. Config tuning, bug fixes, data pipeline repairs — all without me writing a line of code. I review the PRs, but the system proposes and validates its own fixes.
The Stack
Deliberately simple. Python 3.11, SQLite (one file, no migrations — tables auto-create), yfinance for market data, Claude Opus via the claude CLI (subscription, not API — no per-call billing). Scheduled with macOS launchd plists. The website is static HTML + Tailwind + vanilla JS reading from JSON files, hosted on GitHub Pages.
The whole thing runs on a Mac Mini sitting on my desk. Total infrastructure cost: $0 beyond the Claude subscription. The scanner, all 10 agents, the database, the exports — all local.
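The "no migrations" claim boils down to CREATE TABLE IF NOT EXISTS at connection time. A sketch with an illustrative schema, not the project's real tables:

```python
import sqlite3

def get_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the single-file database, creating tables on first use.

    Table and column names here are hypothetical; the point is that
    CREATE TABLE IF NOT EXISTS replaces a migration system entirely.
    """
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS signals (
            ticker TEXT NOT NULL,
            scan_date TEXT NOT NULL,
            score REAL,
            verdict TEXT,
            PRIMARY KEY (ticker, scan_date)
        )
    """)
    return conn
```

Every process opens the same file; SQLite's locking handles the rest at this scale.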
What's Next
Right now: collecting data and letting the calibration system prove itself. The nightly backfill tracks forward returns (5d, 20d, 60d, 120d) for every signal the scanner has ever produced. As that dataset grows, the system can measure which scoring components actually predict returns and which are noise — then tune weights accordingly. The agents already propose config changes based on Oracle's assessments; calibration data will make those proposals evidence-based instead of heuristic.
The architecture scales to N agents. The constraint isn't compute — it's designing the right prompts and feedback loops so the agents actually make each other better instead of just adding noise.