LLM Learning Paradigms: Exploring System Prompt Learning as a New Layer of Adaptation

LLM Learning Paradigms: Exploring System Prompt Learning as a New Layer of Adaptation

LLM Learning Paradigms: Exploring System Prompt Learning as a New Layer of Adaptation

LLM Learning Paradigms: Exploring System Prompt Learning as a New Layer of Adaptation

Oct 21, 2025

Jaroslaw Nowosad

Pretraining gives LLMs knowledge. Finetuning gives them behavior. But neither allows models to explicitly revise how they think. System Prompt Learning (SPL) explores a third paradigm — where models iteratively update their own system prompts based on performance feedback. This text outlines the concept, early experimental results, and open questions around control, evaluation, and governance.


The Context: LLMs Without a Whiteboard


Modern LLMs are powerful, but largely static.
Once trained or fine-tuned, their reasoning style is frozen until the next retraining cycle.
Human learning, by contrast, involves explicit reflection: we leave ourselves notes, adjust strategies, and build upon prior reasoning.


LLMs currently lack this layer — they don’t “take notes.”


This gap inspired exploration into System Prompt Learning: enabling a model to adjust its own reasoning instructions without retraining weights.

Defining System Prompt Learning


A System Prompt defines how a model behaves — tone, rules, and tool usage.
These are typically static, handcrafted instructions. For example:

  • Claude’s system prompt is roughly 16,739 words (110 KB), about 13× longer than OpenAI’s o4-mini.

  • Around 80% of Claude’s prompt consists of tool-use instructions, such as when and how to search or cite.


A real example from Claude’s prompt illustrates its procedural nature:

“If Claude is asked to count words, letters, and characters, it thinks step by step before answering. It explicitly counts each item before responding.”


These “micro-policies” define how reasoning unfolds — they are not learned, but written by engineers.
System Prompt Learning proposes to replace some of this manual writing with an automated, iterative update process based on task outcomes.


Instead of gradient updates, the model edits its own prompt text — forming a self-referential feedback loop:

Output → Evaluation → Extracted Insight → Prompt Edit → Next Iteration.


Early Experiment: The Memento Prototype


The Memento proof-of-concept tested whether a model could refine its own prompt using structured feedback from programming tasks

The framework ran cycles of:

  1. Problem-solving: algorithmic and structural coding tasks

  2. Evaluation: assessing correctness, efficiency, readability, and other criteria

  3. Reflection: summarizing insights

  4. Prompt editing: incorporating insights into the system text


Observed changes (illustrative, not benchmarked):


Metric

Initial

After Iterations

Correctness

0.8

0.9

Maintainability

0.3

0.8

Error Handling

0.3

0.9

Documentation

0.5

0.9


The goal was not absolute performance, but to observe whether prompt updates produced measurable, consistent improvements across runs.
Results indicated moderate gains and suggest that meta-level textual edits can alter model behavior meaningfully — without retraining.


However, no peer-reviewed validation or replication is yet available.


Technical Open Questions


The concept raises several unresolved issues:

  • Evaluation reliability: How can a model assess its own performance without compounding bias?

  • Optimization stability: How to prevent overfitting to local prompt variants?

  • Prompt drift: How to maintain logical consistency after many edits?

  • Scalability: Can this process generalize beyond structured programming tasks to more subjective domains?


These questions remain open and are prerequisites before SPL could be applied to operational systems.


Governance and Risk Considerations


Allowing models to modify their own reasoning layer introduces governance challenges:

  • Auditability: Each change must be logged, versioned, and reviewable.

  • Compliance: Self-modifying instructions complicate certification under frameworks like ISO 42001 or EU AI Act.

  • Security: A compromised feedback loop could unintentionally alter core model behavior.

  • Human oversight: Any practical use would require strict human-in-the-loop control.


The Grok incident on X, where a system-prompt misconfiguration led to unrelated and controversial responses, demonstrates how powerful — and fragile — these layers can be.


Research Directions


Current explorations focus on mitigating these risks through:

  • Hybrid human–AI prompt evolution (human approval before edits)

  • Cross-agent verification (multiple LLMs review each other’s updates)

  • Automated test suites for behavioral regression detection

  • Transfer learning studies to observe whether “learned principles” migrate between domains


Such measures could make System Prompt Learning auditable and bounded, but they remain experimental.


Conclusion


System Prompt Learning represents a conceptual step toward self-reflective AI systems — not by changing neural weights, but by editing the textual rules that guide reasoning.
Whether this becomes a stable learning paradigm or remains a research curiosity depends on future validation, safety mechanisms, and clear governance frameworks.


For now, Memento is best viewed as a thought experiment: a model exploring how to keep its own notebook.

Copy link

Copy link

Copy link

Copy link

Stay Up to Date

© 2025 basebox GmbH, Utting am Ammersee, Germany. All rights reserved.

Made in Bavaria | EU-compliant

© 2025 basebox GmbH, Utting am Ammersee, Germany. All rights reserved.

Made in Bavaria | EU-compliant

© 2025 basebox GmbH, Utting am Ammersee, Germany. All rights reserved.

Made in Bavaria | EU-compliant

© 2025 basebox GmbH, Utting am Ammersee, Germany. All rights reserved.

Made in Bavaria | EU-compliant