# Prompt Assemble

## Overview

A standardized, token-safe prompt assembly framework that guarantees API stability. It implements Two-Phase Context Construction and a Memory Safety Valve to prevent token overflow while maximizing relevant context.
**Design Goals:**

- ✅ Never fail due to memory-related token overflow
- ✅ Memory is always a discardable enhancement, never a rigid dependency
- ✅ Token budget decisions are centralized at the prompt assembly layer
## When to Use

Use this skill when:

- Building or modifying any agent that constructs prompts
- Implementing memory retrieval systems
- Adding new prompt-related logic to existing agents
- Any scenario where token budget safety is required
## Core Workflow

```text
User Input
    ↓
Need-Memory Decision
    ↓
Minimal Context Build
    ↓
Memory Retrieval (Optional)
    ↓
Memory Summarization
    ↓
Token Estimation
    ↓
Safety Valve Decision
    ↓
Final Prompt → LLM Call
```
## Phase Details

### Phase 0: Base Configuration

```python
# Model Context Windows (2026-02-04)
# - MiniMax-M2.1: 204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o: 128,000 tokens
MAX_TOKENS = 204_000                    # Set to your model's context limit
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)  # Conservative: 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3                        # Max 3 memories
MEMORY_SUMMARY_MAX = 3                  # Max 3 lines per memory
```
**Design Philosophy:**

- Leave a 25% buffer for safety (model overhead, estimation errors, spikes)
- Better to underutilize capacity than to overflow
### Phase 1: Minimal Context

- System prompt
- Recent N messages (N=3, trimmed)
- Current user input
- No memory by default
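The minimal context above can be sketched as a plain builder. The message-dict shape and the `build_minimal_context` name are illustrative assumptions, not the API of `prompt_assemble.py`:

```python
def build_minimal_context(system_prompt, recent_messages, user_input, n=3):
    """Phase 1: system prompt + last N messages + current input, no memory."""
    context = [{"role": "system", "content": system_prompt}]
    context.extend(recent_messages[-n:])  # keep only the most recent N messages
    context.append({"role": "user", "content": user_input})
    return context
```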
### Phase 2: Memory Need Decision

```python
def need_memory(user_input: str) -> bool:
    """Return True only when the user explicitly references past context."""
    triggers = [
        "previously",
        "earlier we discussed",
        "do you remember",
        "as I mentioned before",
        "continuing from",
        "before we",
        "last time",
        "previously mentioned",
    ]
    text = user_input.lower()
    return any(trigger.lower() in text for trigger in triggers)
```
### Phase 3: Memory Retrieval (Optional)

```python
memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
summarized_memories = []
for mem in memories:
    summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
```
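`summarize` is assumed to exist in the surrounding codebase; as a sketch, a trivial stand-in that keeps only the first `max_lines` non-empty lines (a real implementation would likely call an LLM or an extractive summarizer) could look like:

```python
def summarize(memory_text: str, max_lines: int = 3) -> str:
    """Naive summarizer: keep the first max_lines non-empty lines.
    Placeholder for an LLM-based or extractive summarizer."""
    lines = [ln.strip() for ln in memory_text.splitlines() if ln.strip()]
    return "\n".join(lines[:max_lines])
```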
### Phase 4: Token Estimation

Calculate the estimated token count for `base_context + summarized_memories`.
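The estimation strategy lives in `references/token_estimation.md`; one common heuristic, shown here purely as an assumption, is roughly 4 characters per token for English text (a model tokenizer such as `tiktoken` gives exact counts):

```python
def estimate_tokens(texts: list[str]) -> int:
    """Rough estimate: ~4 characters per token, plus a small per-message
    overhead for role/formatting tokens. Exact counts need the model tokenizer."""
    total_chars = sum(len(t) for t in texts)
    return total_chars // 4 + len(texts)
```

Because the Safety Valve compares this estimate against a 75% threshold, a slight overestimate here is harmless; a large underestimate is what the 25% buffer protects against.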
### Phase 5: Safety Valve (Critical)

```python
if estimated_tokens > SAFETY_MARGIN:
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)
```
**Hard Rules:**

- ❌ Never downgrade the system prompt
- ❌ Never truncate user input
- ❌ No "lucky splicing" (no ad-hoc partial inclusion in the hope that it fits)
- ✅ Only the memory layer is expendable
### Phase 6: Final Assembly

```python
final_prompt = assemble(base_context + summarized_memories)
return final_prompt
```
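Phases 1–6 compose into a single entry point. This sketch uses a simplified signature (not the one in `scripts/prompt_assemble.py`): `need_memory`, `memory_search`, and `estimate_tokens` are passed in as callables, and `assemble` is treated as plain concatenation:

```python
MAX_TOKENS = 204_000
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)
MEMORY_TOP_K = 3

def assemble(parts: list[str]) -> str:
    return "\n\n".join(parts)

def build_prompt(user_input, memory_search, estimate_tokens, need_memory):
    base_context = ["<system prompt>", user_input]        # Phase 1 (simplified)
    if not need_memory(user_input):                       # Phase 2
        return assemble(base_context)
    memories = memory_search(user_input, MEMORY_TOP_K)    # Phase 3
    if estimate_tokens(base_context + memories) > SAFETY_MARGIN:  # Phases 4-5
        base_context.append("[System Notice] Relevant memory skipped due to token budget.")
        return assemble(base_context)
    return assemble(base_context + memories)              # Phase 6
```

Note that every return path goes through `assemble(base_context ...)`: memory is only ever appended, never required, which is what makes it a discardable enhancement.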
## Memory Data Standards

### Allowed in Long-Term Memory

- ✅ User preferences / identity / long-term goals
- ✅ Confirmed important conclusions
- ✅ System-level settings and rules

### Forbidden in Long-Term Memory

- ❌ Raw conversation logs
- ❌ Reasoning traces
- ❌ Temporary discussions
- ❌ Information recoverable from chat history
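These standards can be enforced mechanically at write time. The kind labels below are hypothetical (the real taxonomy belongs in `references/memory_standards.md`); the point is the whitelist pattern, under which raw logs, reasoning traces, and temporary discussions never match:

```python
# Hypothetical memory kinds for illustration only.
ALLOWED_KINDS = {"preference", "identity", "long_term_goal", "conclusion", "system_rule"}

def can_store(memory: dict) -> bool:
    """Gate long-term memory writes: only whitelisted kinds persist."""
    return memory.get("kind") in ALLOWED_KINDS
```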
## Quick Start

Copy `scripts/prompt_assemble.py` into your agent and use:

```python
from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)
```
## Resources

- `scripts/`
  - `prompt_assemble.py` - Complete implementation with all phases (`PromptAssembler` class)
- `references/`
  - `memory_standards.md` - Detailed memory content guidelines
  - `token_estimation.md` - Token counting strategies