Building an MCP Agentic Stock Trading System - Part 5: Backtesting All Three Agents
This is Part 5 of a series on building an MCP trading system. Read Part 1, Part 2, Part 3, and Part 4 for the architecture, MCP servers, agentic loop, and decision comparison.
I ran all three agents over 2 months of real market data to see how MCP handles different "brains" with the same tools. The results surprised me—but not in the way I expected.
Backtest Setup:
- Period: Sep 15 - Nov 24, 2025 (50 trading days)
- Initial Capital: $100,000
- Stocks: AAPL, GOOGL, MSFT, AMZN, TSLA
- Same data, same MCP tools, three different brains (config sketched below)
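Here's a minimal sketch of that setup as a single config object. The names (`BacktestConfig` and its fields) are hypothetical, not the repo's actual schema; the point is that one config drives all three runs.

```python
# Hypothetical config sketch -- field names are illustrative,
# not the repo's actual schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class BacktestConfig:
    start: str = "2025-09-15"        # first trading day
    end: str = "2025-11-24"          # last trading day (~50 sessions)
    initial_capital: float = 100_000.0
    symbols: tuple[str, ...] = ("AAPL", "GOOGL", "MSFT", "AMZN", "TSLA")

# One config object, reused for every agent, so only the "brain" varies.
CONFIG = BacktestConfig()
```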
The Results:
🥇 Rules-Based: +6.25% ($106,251), 100% win rate, 26 trades, Sharpe 15.87, $0 cost
🥈 Local LLM: +6.13% ($106,129), 100% win rate, 13 trades, Sharpe 11.22, $0 cost
🥉 Claude Haiku: +0.88% ($100,875), 50% win rate, 9 trades, Sharpe 1.52, ~$0.15 cost
The surprise? Simple rules beat sophisticated AI reasoning. But the real story is how MCP let me test this hypothesis with zero architectural changes.
The rules-based agent was aggressive: it made 26 trades, took profits at +15%, re-entered positions, and closed 90% of its positions profitably. It executed a formula without second-guessing.
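The post only specifies the +15% take-profit and the re-entry behavior, so here's a hedged sketch of what that decision rule might look like. The 3% re-entry pullback and the `Position` fields are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Position:
    shares: int = 0
    entry_price: float = 0.0
    last_exit_price: float | None = None

TAKE_PROFIT = 0.15   # stated in the post: take profits at +15%
RE_ENTRY_DIP = 0.03  # assumption: re-enter after a 3% pullback from the exit

def decide(position: Position, price: float) -> str:
    """Return 'BUY', 'SELL', or 'HOLD' for a single symbol."""
    if position.shares > 0:
        gain = (price - position.entry_price) / position.entry_price
        if gain >= TAKE_PROFIT:
            return "SELL"  # lock in the profit, no second-guessing
    elif position.last_exit_price is not None:
        if price <= position.last_exit_price * (1 - RE_ENTRY_DIP):
            return "BUY"   # re-enter the position on a pullback
    return "HOLD"
```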
The Local LLM came within 0.12 percentage points of the rules-based return. Zero cost. 100% private. Full reasoning transparency. It was more conservative (13 trades vs. 26) and held positions longer, but still achieved a 100% win rate on closed positions.
Claude struggled: one bad trade (AMZN, -11%) destroyed its performance. It was too conservative overall (only 9 trades), yet when it did trade, the reasoning looked perfect on paper. The market just disagreed.
2-Month Backtest Results (Sep 15 - Nov 24):

| Metric       | 🥇 Rules-Based | 🥈 Local LLM | 🥉 Claude Haiku |
|--------------|----------------|--------------|------------------|
| Return       | +6.25%         | +6.13%       | +0.88%           |
| Final Value  | $106,251       | $106,129     | $100,875         |
| Win Rate     | 100%           | 100%         | 50%              |
| Sharpe       | 15.87          | 11.22        | 1.52             |
| Max Drawdown | 0%             | 0%           | -2.12%           |
| Trades       | 26             | 13           | 9                |
| API Cost     | $0             | $0           | ~$0.15           |
| Strategy     | Aggressive: takes profits, re-enters | Conservative: holds winners, buy & hold | Very conservative: holds too long, bad AMZN entry |

Three agents, same MCP tools, one architecture, three different outcomes.
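For reference, the Sharpe figures above are presumably computed from daily portfolio returns; the post doesn't state the exact formula, but the standard annualized version looks like this:

```python
import math

def sharpe_ratio(daily_returns: list[float], risk_free_daily: float = 0.0) -> float:
    """Annualized Sharpe from daily returns (assumes 252 trading days/year).
    The backtest's exact formula isn't stated; this is the textbook version."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return (mean - risk_free_daily) / math.sqrt(variance) * math.sqrt(252)
```

Numbers like 15.87 are normal for a 50-day window with zero drawdown: annualizing a short, smooth equity curve inflates Sharpe, so treat these as relative rankings rather than absolute quality scores.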
Full analysis: BACKTEST-COMPARISON.md
This experiment validated MCP's core promise: write tools once, test different intelligence types without rewriting anything.
The technical win isn't that rules-based performed best—it's that I could run this comparison at all. Same 11 MCP tools, three completely different decision-making approaches, identical data. One architecture, three brains.
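In code, that separation might look something like the sketch below. The `Brain` protocol and the class names are hypothetical, not the repo's actual API; what matters is that the backtest loop and the MCP tool calls never change between runs.

```python
from typing import Protocol

class Brain(Protocol):
    def decide(self, symbol: str, market_data: dict) -> str:
        """Return 'BUY', 'SELL', or 'HOLD'."""
        ...

class RulesBrain:
    def decide(self, symbol: str, market_data: dict) -> str:
        ...  # threshold logic, like the take-profit sketch earlier

class LocalLLMBrain:
    def decide(self, symbol: str, market_data: dict) -> str:
        ...  # prompt a local model and parse its answer

class ClaudeBrain:
    def decide(self, symbol: str, market_data: dict) -> str:
        ...  # same prompt shape, sent to the Anthropic API

def run_backtest(brain: Brain, config) -> dict:
    """Identical loop for all three agents: fetch data via MCP tools,
    ask the brain, execute via MCP tools. Only `brain` changes."""
    ...
```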
The Local LLM result (98% of rules-based performance) backs up MCP's portability claim. For zero cost and full privacy, you get near-identical results PLUS reasoning transparency. Not because the trading strategy is great (it's basic), but because the architecture supports experimentation.
Claude's underperformance doesn't invalidate cloud AI—it shows prompt engineering matters more than I expected. The MCP architecture worked perfectly (tools were called correctly, data flowed properly). The strategy needs tuning.
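For example, the kind of guardrail that might have capped the AMZN loss is a prompt-level constraint. This is a hypothetical illustration, not the prompt used in the backtest:

```python
# Hypothetical system prompt tweak -- not the prompt used in the backtest.
SYSTEM_PROMPT = """You are a trading agent. Hard rules:
- SELL any position whose unrealized loss exceeds 5%.
- Take profits at +15%.
- When a rule's condition is met, act; do not hold and wait.
Respond with exactly one of BUY, SELL, or HOLD, plus one sentence of reasoning."""
```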
What other experiments could you run? Bear markets, sideways markets, longer timeframes, different stock mixes, market crashes, sector rotations. The point isn't to build a production trading system—it's to demonstrate how MCP makes this kind of comparative testing trivial. That's the real experiment here.
agentic-stock-trader - View on GitHub
This post is part of my daily AI journey blog at Mosaic Mesh AI. Building in public, learning in public, sharing the messy middle of AI development.
