# Prompt Assemble

## Overview

A standardized, token-safe prompt assembly framework that guarantees API stability. It implements Two-Phase Context Construction and a Memory Safety Valve to prevent token overflow while maximizing relevant context.
**Design Goals:**

- ✅ Never fail due to memory-related token overflow
- ✅ Memory is always a discardable enhancement, never a rigid dependency
- ✅ Token budget decisions are centralized at the prompt assembly layer
## When to Use

Use this skill when:

- Building or modifying any agent that constructs prompts
- Implementing memory retrieval systems
- Adding new prompt-related logic to existing agents
- Any scenario where token budget safety is required
## Core Workflow

```text
User Input
    ↓
Need-Memory Decision
    ↓
Minimal Context Build
    ↓
Memory Retrieval (Optional)
    ↓
Memory Summarization
    ↓
Token Estimation
    ↓
Safety Valve Decision
    ↓
Final Prompt → LLM Call
```
## Phase Details

### Phase 0: Base Configuration

```python
# Model Context Windows (2026-02-04)
# - MiniMax-M2.1: 204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o: 128,000 tokens
MAX_TOKENS = 204_000                    # Set to your model's context limit
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)  # Conservative: 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3                        # Max 3 memories
MEMORY_SUMMARY_MAX = 3                  # Max 3 lines per memory
```
**Design Philosophy:**

- Leave a 25% buffer for safety (model overhead, estimation errors, spikes)
- Better to underutilize capacity than to overflow
### Phase 1: Minimal Context

- System prompt
- Recent N messages (N=3, trimmed)
- Current user input
- No memory by default
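The minimal context above can be sketched as a plain builder. The message-dict shape and the `build_minimal_context` name are illustrative assumptions, not the API of `prompt_assemble.py`:

```python
def build_minimal_context(system_prompt, recent_messages, user_input, n=3):
    """Phase 1: system prompt + last N messages + current input, no memory."""
    context = [{"role": "system", "content": system_prompt}]
    context.extend(recent_messages[-n:])  # keep only the most recent N messages
    context.append({"role": "user", "content": user_input})
    return context
```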
### Phase 2: Memory Need Decision

```python
def need_memory(user_input: str) -> bool:
    """Return True only when the user explicitly references past context."""
    triggers = [
        "previously",
        "earlier we discussed",
        "do you remember",
        "as I mentioned before",
        "continuing from",
        "before we",
        "last time",
        "previously mentioned",
    ]
    text = user_input.lower()
    return any(trigger.lower() in text for trigger in triggers)
```
### Phase 3: Memory Retrieval (Optional)

```python
memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
summarized_memories = []
for mem in memories:
    summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
```
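`summarize` is assumed to exist in the surrounding codebase; as a sketch, a trivial stand-in that keeps only the first `max_lines` non-empty lines (a real implementation would likely call an LLM or an extractive summarizer) could look like:

```python
def summarize(memory_text: str, max_lines: int = 3) -> str:
    """Naive summarizer: keep the first max_lines non-empty lines.
    Placeholder for an LLM-based or extractive summarizer."""
    lines = [ln.strip() for ln in memory_text.splitlines() if ln.strip()]
    return "\n".join(lines[:max_lines])
```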
### Phase 4: Token Estimation

Calculate the estimated token count for `base_context + summarized_memories`.
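The estimation strategy lives in `references/token_estimation.md`; one common heuristic, shown here purely as an assumption, is roughly 4 characters per token for English text (a model tokenizer such as `tiktoken` gives exact counts):

```python
def estimate_tokens(texts: list[str]) -> int:
    """Rough estimate: ~4 characters per token, plus a small per-message
    overhead for role/formatting tokens. Exact counts need the model tokenizer."""
    total_chars = sum(len(t) for t in texts)
    return total_chars // 4 + len(texts)
```

Because the Safety Valve compares this estimate against a 75% threshold, a slight overestimate here is harmless; a large underestimate is what the 25% buffer protects against.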
### Phase 5: Safety Valve (Critical)

```python
if estimated_tokens > SAFETY_MARGIN:
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)
```
**Hard Rules:**

- ❌ Never downgrade the system prompt
- ❌ Never truncate user input
- ❌ No "lucky splicing" (no ad-hoc partial inclusion in the hope that it fits)
- ✅ Only the memory layer is expendable
### Phase 6: Final Assembly

```python
final_prompt = assemble(base_context + summarized_memories)
return final_prompt
```
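Phases 1–6 compose into a single entry point. This sketch uses a simplified signature (not the one in `scripts/prompt_assemble.py`): `need_memory`, `memory_search`, and `estimate_tokens` are passed in as callables, and `assemble` is treated as plain concatenation:

```python
MAX_TOKENS = 204_000
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)
MEMORY_TOP_K = 3

def assemble(parts: list[str]) -> str:
    return "\n\n".join(parts)

def build_prompt(user_input, memory_search, estimate_tokens, need_memory):
    base_context = ["<system prompt>", user_input]        # Phase 1 (simplified)
    if not need_memory(user_input):                       # Phase 2
        return assemble(base_context)
    memories = memory_search(user_input, MEMORY_TOP_K)    # Phase 3
    if estimate_tokens(base_context + memories) > SAFETY_MARGIN:  # Phases 4-5
        base_context.append("[System Notice] Relevant memory skipped due to token budget.")
        return assemble(base_context)
    return assemble(base_context + memories)              # Phase 6
```

Note that every return path goes through `assemble(base_context ...)`: memory is only ever appended, never required, which is what makes it a discardable enhancement.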
## Memory Data Standards

### Allowed in Long-Term Memory

- ✅ User preferences / identity / long-term goals
- ✅ Confirmed important conclusions
- ✅ System-level settings and rules

### Forbidden in Long-Term Memory

- ❌ Raw conversation logs
- ❌ Reasoning traces
- ❌ Temporary discussions
- ❌ Information recoverable from chat history
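These standards can be enforced mechanically at write time. The kind labels below are hypothetical (the real taxonomy belongs in `references/memory_standards.md`); the point is the whitelist pattern, under which raw logs, reasoning traces, and temporary discussions never match:

```python
# Hypothetical memory kinds for illustration only.
ALLOWED_KINDS = {"preference", "identity", "long_term_goal", "conclusion", "system_rule"}

def can_store(memory: dict) -> bool:
    """Gate long-term memory writes: only whitelisted kinds persist."""
    return memory.get("kind") in ALLOWED_KINDS
```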
## Quick Start

Copy `scripts/prompt_assemble.py` into your agent and use:

```python
from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)
```
## Resources

- `scripts/`
  - `prompt_assemble.py` - Complete implementation with all phases (`PromptAssembler` class)
- `references/`
  - `memory_standards.md` - Detailed memory content guidelines
  - `token_estimation.md` - Token counting strategies