Pre-task agent memory construction

PREPING: Building Agent Memory without Tasks

Yumin Choi¹, Sangwoo Park¹, Minki Kang¹, Jinheon Baek¹ᐧ†, Sung Ju Hwang¹ᐧ²ᐧ†

¹KAIST ²DeepAuto.ai †Equal advising

PREPING builds reusable procedural memory before deployment, using self-generated synthetic practice to prepare agents for new executable environments before any target-environment task experience is available.

PREPING builds memory before the first user task, narrowing the tool-coverage gap that online methods face at cold start.
PREPING targets the cold-start gap in agent memory construction. Offline methods require prior human tasks or demonstrations, while online methods begin deployment with empty memory. PREPING instead constructs procedural memory before the first user task through self-generated synthetic practice, giving the agent broader tool coverage before online experience accumulates.

Abstract

Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. However, regardless of how it is built, an agent faces a cold-start gap when first introduced to a new environment without any task-specific experience available. In this paper, we study pre-task memory construction: whether an agent can build procedural memory before observing any target-environment tasks, using only self-generated synthetic practice. Synthetic interaction alone is insufficient, however: without control over what to practice and what to store, synthetic tasks become redundant, infeasible, and ultimately uninformative, and unfiltered trajectories quickly degrade memory. To overcome this, we present PREPING, a proposer-guided memory construction framework. At its core is proposer memory, a structured control state that shapes future practice. A Proposer generates synthetic tasks conditioned on this state, a Solver executes them, and a Validator determines which trajectories are eligible for memory insertion while also providing feedback to guide future proposals. Experiments on AppWorld, BFCL v3, and MCP-Universe show that PREPING substantially improves over a no-memory baseline and achieves performance competitive with strong playbook-based methods built from offline or online experience, with deployment cost 2.99x lower on AppWorld and 2.23x lower on BFCL v3 than online memory construction. Further analyses reveal that the main benefit does not come from synthetic volume alone, but from proposer-side control over feasibility, redundancy, and coverage, combined with selective memory updates.

Method

Controlled Synthetic Practice

PREPING treats memory construction as two coupled control problems: deciding what to practice before deployment, and deciding what synthetic experience is safe to store.

Overview of the PREPING proposer, solver, validator, proposer-memory, and solver-memory loop.

Proposer

Generates synthetic task-level objectives from documentation and proposer memory, expanding practice toward under-covered environment behavior while avoiding repeated infeasible goals.

Solver

Executes synthetic tasks in the target environment and produces trajectories that expose procedures, preconditions, tool compositions, and failure modes.

Validator

Filters task-trajectory pairs for feasibility and completion, admitting only reliable synthetic experience into deployment-facing solver memory.
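The three roles above form one closed loop: the Proposer reads proposer memory to pick under-covered, not-yet-known-infeasible goals; the Solver executes them; the Validator admits only completed trajectories into solver memory and feeds failures back into proposer memory. A minimal sketch of that loop, with all class and function names hypothetical (not from the paper's code) and the environment reduced to a feasibility lookup:

```python
from dataclasses import dataclass, field

@dataclass
class ProposerMemory:
    """Control state shaping future practice: goals found infeasible,
    and environment behaviors already covered by admitted trajectories."""
    infeasible: set = field(default_factory=set)
    covered: set = field(default_factory=set)

def propose(docs, pmem):
    """Proposer: pick a documented goal that is neither covered nor known-infeasible."""
    for goal in docs:
        if goal not in pmem.covered and goal not in pmem.infeasible:
            return goal
    return None  # practice space exhausted under current control state

def solve(goal, env):
    """Solver: execute the synthetic task; here a stub returning a trajectory record."""
    return {"goal": goal, "success": env.get(goal, False)}

def validate(traj, pmem, smem):
    """Validator: admit only completed trajectories into deployment-facing
    solver memory; record failures as feedback for the Proposer."""
    if traj["success"]:
        smem.append(traj)
        pmem.covered.add(traj["goal"])
    else:
        pmem.infeasible.add(traj["goal"])

def preping(docs, env, budget):
    pmem, smem = ProposerMemory(), []
    for _ in range(budget):
        goal = propose(docs, pmem)
        if goal is None:
            break
        validate(solve(goal, env), pmem, smem)
    return smem

# Toy run: two feasible documented behaviors, one infeasible.
env = {"send_email": True, "create_event": True, "teleport": False}
memory = preping(list(env), env, budget=10)
```

After this run, `memory` holds only the two successful trajectories, and the infeasible goal is recorded so it is never re-proposed; this is the redundancy and feasibility control the paper attributes the gains to.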

Results

Memory Before the First User Task

PREPING improves downstream execution across stateful app workflows, executable function calling, and MCP-server tool use without using target-environment task data during construction.

+17.1 AppWorld average points over Base
+19.3 BFCL v3 average points over Base
+5.4 MCP-Universe average points over Base
2.99x lower AppWorld deployment cost than ACE-Online
2.23x lower BFCL v3 deployment cost than ACE-Online
Coverage cold-start comparison between PREPING, PREPING plus ACE, and ACE-Online.
PREPING enters deployment with broad tool coverage already in memory. ACE-Online must discover that coverage through user-facing tasks, creating a coverage cold start.
AppWorld cold-start prefix success curve for Base, ACE-Online, PREPING, and PREPING plus ACE.
PREPING improves early-task success before online memory has enough interactions to catch up. Initializing online adaptation with PREPING further strengthens the first deployment window.
Deployment time cost per task comparing PREPING and ACE-Online on AppWorld and BFCL v3.
Frozen pre-task memory avoids per-task online memory-update calls during deployment. This shifts memory construction into an amortizable pre-deployment phase.
PREPING construction budget curve showing AppWorld Test-Normal TGC by synthetic task count.
Performance improves as synthetic practice increases. Meaningful gains appear even at modest construction budgets, well below the 100-task setting used in the main experiments.

Online Initialization

PREPING also warm-starts online memory construction.

Instead of starting ACE-Online from empty memory, PREPING+ACE begins deployment with pre-task memory and continues updating online. This improves performance on both AppWorld and BFCL v3 while preserving the standard online adaptation pathway.
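The warm-start idea can be sketched in a few lines. This toy model (all names hypothetical, and the online method reduced to a set of seen task types) shows why pre-task memory pays off in the first deployment window: the warm-started run already covers task types the cold-started run must first discover through user-facing failures.

```python
def run_online(tasks, memory):
    """Stand-in for an online memory method (e.g. ACE-style) that both
    reads the memory store and keeps updating it per task."""
    successes = 0
    for t in tasks:
        if t in memory:      # pre-task coverage converts directly to success
            successes += 1
        memory.add(t)        # standard online update pathway continues as usual
    return successes

pre_task_memory = {"send_email", "create_event"}    # built by PREPING pre-deployment
tasks = ["send_email", "create_event", "pay_bill", "pay_bill"]

cold = run_online(tasks, set())                     # online-only, empty memory
warm = run_online(tasks, set(pre_task_memory))      # warm-started with pre-task memory
```

Here `warm > cold`: the cold-started run succeeds only on the repeated task type, while the warm-started run also succeeds on the two types already in pre-task memory.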

AppWorld Avg. 71.3 → 76.3 +5.0 points over ACE-Online
BFCL v3 Base/Ctx Avg. 58.7 → 64.9 +6.2 points over ACE-Online

BibTeX

@misc{choi2026preping,
  title  = {PREPING: Building Agent Memory without Tasks},
  author = {Choi, Yumin and Park, Sangwoo and Kang, Minki and Baek, Jinheon and Hwang, Sung Ju},
  year   = {2026},
  eprint = {2605.13880},
  archivePrefix = {arXiv},
  url    = {https://arxiv.org/abs/2605.13880}
}