Oct 21, 2025

Jaroslaw Nowosad
Pretraining gives LLMs knowledge. Finetuning gives them behavior. But neither allows models to explicitly revise how they think. System Prompt Learning (SPL) explores a third paradigm — where models iteratively update their own system prompts based on performance feedback. This text outlines the concept, early experimental results, and open questions around control, evaluation, and governance.
The Context: LLMs Without a Whiteboard
Modern LLMs are powerful, but largely static.
Once trained or fine-tuned, their reasoning style is frozen until the next retraining cycle.
Human learning, by contrast, involves explicit reflection: we leave ourselves notes, adjust strategies, and build upon prior reasoning.
LLMs currently lack this layer — they don’t “take notes.”
This gap inspired exploration into System Prompt Learning: enabling a model to adjust its own reasoning instructions without retraining weights.
Defining System Prompt Learning
A System Prompt defines how a model behaves — tone, rules, and tool usage.
These are typically static, handcrafted instructions. For example:
Claude’s system prompt is roughly 16,739 words (about 110 KB), around 13× longer than that of OpenAI’s o4-mini.
Around 80% of Claude’s prompt consists of tool-use instructions, such as when and how to search or cite.
A real example from Claude’s prompt illustrates its procedural nature:
“If Claude is asked to count words, letters, and characters, it thinks step by step before answering. It explicitly counts each item before responding.”
These “micro-policies” define how reasoning unfolds — they are not learned, but written by engineers.
System Prompt Learning proposes to replace some of this manual writing with an automated, iterative update process based on task outcomes.
Instead of gradient updates, the model edits its own prompt text — forming a self-referential feedback loop:
Output → Evaluation → Extracted Insight → Prompt Edit → Next Iteration.
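To make the loop concrete, here is a minimal sketch in Python, assuming three caller-supplied helpers (solve, evaluate, edit_prompt); these names are placeholders and do not correspond to any published SPL implementation.

```python
from typing import Callable

# A minimal sketch of the loop above. The three helpers are supplied by the
# caller; none of these names come from a published SPL implementation.

def run_spl(
    system_prompt: str,
    tasks: list[str],
    solve: Callable[[str, str], str],        # (system_prompt, task) -> model output
    evaluate: Callable[[str, str], str],     # (task, output) -> extracted insight, "" if none
    edit_prompt: Callable[[str, str], str],  # (system_prompt, insight) -> revised prompt
    iterations: int = 3,
) -> str:
    for _ in range(iterations):
        for task in tasks:
            output = solve(system_prompt, task)                      # Output
            insight = evaluate(task, output)                         # Evaluation -> Insight
            if insight:
                system_prompt = edit_prompt(system_prompt, insight)  # Prompt Edit
    return system_prompt  # the revised prompt seeds the next iteration
```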
Early Experiment: The Memento Prototype
The Memento proof-of-concept tested whether a model could refine its own prompt using structured feedback from programming tasks.
The framework ran cycles of:
Problem-solving: algorithmic and structural coding tasks
Evaluation: assessing correctness, efficiency, readability, and other criteria
Reflection: summarizing insights
Prompt editing: incorporating insights into the system text
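The reflection and prompt-editing steps can be sketched as follows. The criteria mirror those listed above, but the threshold, the "Learned principles" section, and the edit format are assumptions; the Memento prototype's internals are not published.

```python
# Hedged sketch of the Reflection and Prompt-editing steps; the threshold and
# the "Learned principles" section are assumptions, not Memento internals.

def reflect(scores: dict[str, float], threshold: float = 0.7) -> str:
    """Summarize criteria that scored poorly into a short, prompt-ready insight."""
    weak = sorted(name for name, score in scores.items() if score < threshold)
    if not weak:
        return ""
    return "When solving coding tasks, pay particular attention to: " + ", ".join(weak) + "."

def edit_prompt(system_prompt: str, insight: str) -> str:
    """Append the insight under a dedicated section, avoiding duplicate lessons."""
    marker = "\n\nLearned principles:\n"
    if marker not in system_prompt:
        system_prompt += marker
    if insight and insight not in system_prompt:
        system_prompt += f"- {insight}\n"
    return system_prompt

# Example: scores from one evaluation pass (same criteria as the table below)
scores = {"correctness": 0.8, "maintainability": 0.3,
          "error_handling": 0.3, "documentation": 0.5}
new_prompt = edit_prompt("You are a careful coding assistant.", reflect(scores))
```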
Observed changes (illustrative, not benchmarked):
| Metric | Initial | After Iterations |
|---|---|---|
| Correctness | 0.8 | 0.9 |
| Maintainability | 0.3 | 0.8 |
| Error Handling | 0.3 | 0.9 |
| Documentation | 0.5 | 0.9 |
The goal was not absolute performance, but to observe whether prompt updates produced measurable, consistent improvements across runs.
Results indicated moderate gains and suggested that meta-level textual edits can alter model behavior meaningfully — without retraining.
However, no peer-reviewed validation or replication is yet available.
Technical Open Questions
The concept raises several unresolved issues:
Evaluation reliability: How can a model assess its own performance without compounding bias?
Optimization stability: How to prevent overfitting to local prompt variants?
Prompt drift: How to maintain logical consistency after many edits?
Scalability: Can this process generalize beyond structured programming tasks to more subjective domains?
These questions remain open; resolving them is a prerequisite for applying SPL to operational systems.
Governance and Risk Considerations
Allowing models to modify their own reasoning layer introduces governance challenges:
Auditability: Each change must be logged, versioned, and reviewable (a minimal log sketch follows this list).
Compliance: Self-modifying instructions complicate certification under frameworks like ISO 42001 or the EU AI Act.
Security: A compromised feedback loop could unintentionally alter core model behavior.
Human oversight: Any practical use would require strict human-in-the-loop control.
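As an illustration of what auditable prompt edits could look like, here is a minimal append-only log sketch; the record fields and file format are assumptions, not requirements drawn from ISO 42001 or the EU AI Act.

```python
import hashlib
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical append-only record of one prompt edit; field names are
# illustrative only and not drawn from any certification framework.

@dataclass
class PromptEditRecord:
    version: int          # monotonically increasing prompt version
    timestamp: float      # when the edit was applied
    insight: str          # the extracted lesson that motivated the edit
    prompt_sha256: str    # hash of the full prompt after the edit
    approved_by: str      # human reviewer, keeping a human in the loop

def log_edit(log_path: str, version: int, insight: str,
             new_prompt: str, approver: str) -> None:
    record = PromptEditRecord(
        version=version,
        timestamp=time.time(),
        insight=insight,
        prompt_sha256=hashlib.sha256(new_prompt.encode()).hexdigest(),
        approved_by=approver,
    )
    with open(log_path, "a") as f:  # append-only: earlier entries are never rewritten
        f.write(json.dumps(asdict(record)) + "\n")
```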
The Grok incident on X, where a system-prompt misconfiguration led to unrelated and controversial responses, demonstrates how powerful — and fragile — these layers can be.
Research Directions
Current explorations focus on mitigating these risks through:
Hybrid human–AI prompt evolution (human approval before edits)
Cross-agent verification (multiple LLMs review each other’s updates)
Automated test suites for behavioral regression detection (sketched after this list)
Transfer learning studies to observe whether “learned principles” migrate between domains
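One way to picture such a regression suite is the sketch below: it replays a small set of probe questions against the old and the new prompt and reports answers that were correct before the edit but lost after it. The ask callable, the probe set, and the simple substring check are all assumptions.

```python
from typing import Callable

# Hedged sketch of a behavioral regression check. `ask` stands in for any
# model call (system_prompt, question) -> answer; the probes and the simple
# substring criterion are assumptions, not an established benchmark.

PROBES = [
    ("What is 2 + 2?", "4"),
    ("Name the capital of France.", "Paris"),
]

def regression_check(ask: Callable[[str, str], str],
                     old_prompt: str, new_prompt: str) -> list[str]:
    """Return questions whose expected answer is lost after the prompt edit."""
    regressions = []
    for question, expected in PROBES:
        old_ok = expected.lower() in ask(old_prompt, question).lower()
        new_ok = expected.lower() in ask(new_prompt, question).lower()
        if old_ok and not new_ok:  # behavior that used to be correct has broken
            regressions.append(question)
    return regressions
```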
Such measures could make System Prompt Learning auditable and bounded, but they remain experimental.
Conclusion
System Prompt Learning represents a conceptual step toward self-reflective AI systems — not by changing neural weights, but by editing the textual rules that guide reasoning.
Whether this becomes a stable learning paradigm or remains a research curiosity depends on future validation, safety mechanisms, and clear governance frameworks.
For now, Memento is best viewed as a thought experiment: a model exploring how to keep its own notebook.