Building an Agentic Personal Trainer - Part 6: Memory and Learning

This is Part 6 of a series on building an agentic personal trainer. Read Part 1 for architecture, Part 2 for tools, Part 3 for the system prompt, Part 4 for Garmin integration, and Part 5 for duplicate detection.

"Didn't we talk about my knee yesterday?" If your AI coach can't remember last session, it's not coaching—it's starting over every time.

LangChain provides BufferMemory for conversation history. But that's session-only. The agent needs persistence across sessions, so every conversation gets saved to SQLite (db.js:42-50):

export function saveConversation(userMessage, agentResponse, llmProvider) {
  const stmt = db.prepare(`
    INSERT INTO conversations (timestamp, user_message, agent_response, llm_provider)
    VALUES (datetime('now'), ?, ?, ?)
  `);
  return stmt.run(userMessage, agentResponse, llmProvider);
}
// Note: Simplified for clarity - actual function has optional context parameter
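For context, here's roughly what the conversations table behind that insert looks like. This is inferred from the INSERT columns; the id primary key is an assumption, and the actual schema.js may differ:

CREATE TABLE conversations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed; not visible in the insert
  timestamp TEXT NOT NULL,
  user_message TEXT NOT NULL,
  agent_response TEXT NOT NULL,
  llm_provider TEXT
);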

Memory architecture: session-level BufferMemory for the current conversation, plus persistent SQLite for cross-session continuity.

On startup, the system loads the last 5 conversations into memory (trainer-agent.js:70-75):

const recentConvos = getConversationContext(5);
for (const conv of recentConvos) {
  await memory.chatHistory.addUserMessage(conv.user);
  await memory.chatHistory.addAIChatMessage(conv.assistant);
}

Why 5? It's a prototype parameter—enough continuity for coaching context without loading ancient conversations that burn tokens. The function signature shows it's configurable: getConversationContext(limit = 5). A production system would need to justify this number or make it smarter.

The database schema (schema.js) has 6 tables: conversations, injuries, injury_updates, planned_workouts, completed_workouts, and learned_preferences. That last one is infrastructure for future personalization—it's not being used yet, but the schema is in place:

CREATE TABLE learned_preferences (
  category TEXT NOT NULL,
  key TEXT NOT NULL,
  value TEXT NOT NULL,
  confidence REAL DEFAULT 0.5,
  last_confirmed TEXT DEFAULT (datetime('now')),
  UNIQUE(category, key)
);

The design: when the agent learns "prefers morning workouts" or "doesn't like treadmill running," it could persist that with a confidence score. The savePreference function (db.js:205-217) uses ON CONFLICT to update existing preferences:

export function savePreference(category, key, value, confidence = 0.5) {
  const stmt = db.prepare(`
    INSERT INTO learned_preferences (category, key, value, confidence)
    VALUES (?, ?, ?, ?)
    ON CONFLICT(category, key) DO UPDATE SET
      value = excluded.value,
      confidence = excluded.confidence,
      last_confirmed = datetime('now')
  `);
  return stmt.run(category, key, value, confidence);
}

The intended behavior: higher confidence = more weight in future suggestions. A preference confirmed multiple times would increase in confidence; contradicted preferences would decrease.
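That reinforcement loop isn't built yet, but a minimal sketch might look like this. The adjustPreferenceConfidence helper is hypothetical, not in db.js:

export function adjustPreferenceConfidence(category, key, confirmed) {
  // Nudge confidence toward 1.0 on confirmation, toward 0.0 on contradiction,
  // clamping to [0, 1] so repeated signals saturate rather than overflow.
  const delta = confirmed ? 0.1 : -0.2;
  const stmt = db.prepare(`
    UPDATE learned_preferences
    SET confidence = MAX(0.0, MIN(1.0, confidence + ?)),
        last_confirmed = CASE WHEN ? THEN datetime('now') ELSE last_confirmed END
    WHERE category = ? AND key = ?
  `);
  return stmt.run(delta, confirmed ? 1 : 0, category, key);
}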

Context windows are expensive—and limited. Loading 5 conversations is arbitrary, balancing continuity with token cost and context window size. Local models like Llama 3.1 8B typically have ~8K token context windows. Gemini 2.0 Flash has 1M tokens. That difference matters: 5 conversations might be 2K tokens (comfortable for local) or could grow to 10K+ with verbose coaching exchanges (pushing local limits, trivial for Gemini). Too few conversations and the agent forgets recent context ("didn't we just talk about my knee?"). Too many and you're either burning tokens on ancient history or exceeding your model's context window entirely.
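One way to make the load count adaptive instead of fixed, sketched here under a rough 4-characters-per-token heuristic (none of this exists in the prototype): load conversations newest-first until a token budget is exhausted.

export function getConversationsWithinBudget(maxTokens = 2000) {
  const estimateTokens = (text) => Math.ceil(text.length / 4); // rough heuristic
  // A real implementation would LIMIT this query rather than scan everything
  const rows = db.prepare(`
    SELECT user_message, agent_response
    FROM conversations
    ORDER BY timestamp DESC
  `).all();

  const selected = [];
  let used = 0;
  for (const row of rows) {
    const cost = estimateTokens(row.user_message) + estimateTokens(row.agent_response);
    if (used + cost > maxTokens) break;
    selected.push({ user: row.user_message, assistant: row.agent_response });
    used += cost;
  }
  return selected.reverse(); // chronological order for replay
}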

The real limitation: this approach loads conversations by recency, not relevance. A conversation about your knee injury from 3 weeks ago might be more important than yesterday's chat about race day nutrition. A production system could use vector embeddings to retrieve the most relevant past conversations based on the current topic, not just the most recent ones. I explored this approach in detail in my 5-part series on building a local semantic search engine just before starting this project.
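A rough sketch of what relevance-based retrieval could look like, assuming each stored conversation already has an embedding vector attached. The query embedding would come from whatever embedding model you use; none of this is in the prototype:

// Score stored conversations against the current query embedding, keep top k
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export function getRelevantConversations(queryEmbedding, storedConvos, k = 5) {
  // storedConvos: [{ user, assistant, embedding }] with precomputed vectors
  return storedConvos
    .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}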

But even relevance-based retrieval still loads full conversations—expensive in tokens. That's where learned_preferences comes in, though it's not being used yet in this prototype. The vision: extract key insights from conversations—"prefers morning workouts," "dislikes treadmill running," "right knee sensitive to back-to-back run days"—and make them instantly available without consuming context window space. A few dozen preferences (15 tokens each) give you months of learned behavior at a fraction of the cost of loading even a single full conversation (500+ tokens). The two approaches complement each other: use vector embeddings when you need conversation context, use learned preferences for distilled knowledge.
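Once preferences exist, injecting them is cheap: format the high-confidence ones into a few lines of system prompt. A sketch of that idea; buildPreferenceContext and the confidence threshold are assumptions, not current code:

export function buildPreferenceContext(minConfidence = 0.6) {
  const rows = db.prepare(`
    SELECT category, key, value, confidence
    FROM learned_preferences
    WHERE confidence >= ?
    ORDER BY confidence DESC
  `).all(minConfidence);
  if (rows.length === 0) return '';
  // e.g. "- schedule/workout_time: morning (confidence 0.8)"
  const lines = rows.map(
    (r) => `- ${r.category}/${r.key}: ${r.value} (confidence ${r.confidence.toFixed(1)})`
  );
  return 'Known athlete preferences:\n' + lines.join('\n');
}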

Next: how the LLM provider abstraction lets me switch between local and cloud models.


Part 6 of 9 in the Agentic Personal Trainer series.


agentic-personal-trainer - View on GitHub


This post is part of my daily AI journey blog at Mosaic Mesh AI. Building in public, learning in public, sharing the messy middle of AI development.
