Understanding DAAF

From first concepts to confident user -- how DAAF thinks, decides, and collaborates with you

This guide is designed to turn a new user into a confident one -- it walks through how DAAF works, what it produces, what to expect, and where it can fail. If you have questions about anything you read here, you can paste any confusing passage into your Claude Code session and ask for help. Claude has access to all of DAAF's documentation and can help you understand it as you go.

What is DAAF?

DAAF is an AI-powered research assistant that helps you go from a research question to a completed analysis -- including data acquisition, cleaning, statistical analysis, visualizations, and a written report -- while keeping you in control of every decision. It runs inside a tool called Claude Code (Anthropic's AI coding assistant) on your computer. You interact with it by typing instructions in plain English, and DAAF handles the technical work while checking in with you at key decision points. Claude Code is a powerful general-purpose AI coding assistant, but it wasn't designed specifically for research -- DAAF adds the structure, the domain knowledge, and the safety guardrails that turn it into a rigorous research tool.

At its core, DAAF automates and simplifies the prompt-engineering process specifically for data analysis and research. Everything about how DAAF is designed is fundamentally about telling Claude exactly what it needs to know, when it needs to know it, so it does what you want more often and with higher quality on average. Set aside the fancy terms like agents, subagents, skills, and orchestrators, and DAAF is a series of pre-built "recipes" of context that get fed to Claude before it tries to do what you ask, with the goal of making it more successful at working transparently, rigorously, and reproducibly.

Three Dimensions of AI Capability

One useful way to think about where AI is right now -- and why people seem to disagree so strongly about how capable it is -- is to think about AI capability as having three interdependent dimensions:

  1. The Mind -- the base model's raw intelligence and reasoning ability. This is what Anthropic, Google, and OpenAI are competing on with each new model release.
  2. The Body -- the orchestration frameworks and tooling that let the model actually do things: read files, run code, search the web, delegate tasks to other models. Claude Code is the "body" that lets Claude's "mind" interact with your computer. DAAF adds a much more structured and capable body on top of that.
  3. The Instructions -- your skill in communicating what you want, plus whatever pre-built instructions the system provides.
AI Output Quality = Base AI Model Capability (The Mind) × Frameworks and Tooling (The Body) × Direct User Input and Skill (The Instructions)

Each dimension is necessary but insufficient on its own. A brilliant model with no tools can only chat. Powerful tools connected to a weak model will produce sophisticated-looking garbage. A strong model with great tools but vague instructions will go confidently in the wrong direction.
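To make the multiplicative framing concrete, here is a toy calculation. The 0-to-1 scores are invented for illustration (an assumption, not a real metric), and `output_quality` is a hypothetical name. The point is only that a product collapses when any one factor is near zero, which a sum would not.

```python
def output_quality(mind: float, body: float, instructions: float) -> float:
    """Toy multiplicative model; each score is on an invented 0-1 scale."""
    for score in (mind, body, instructions):
        if not 0.0 <= score <= 1.0:
            raise ValueError("each score must be between 0 and 1")
    return mind * body * instructions


# Invented scores for the three failure patterns described above.
scenarios = {
    "brilliant model, no tools": (0.9, 0.0, 0.8),
    "powerful tools, weak model": (0.2, 0.9, 0.8),
    "strong model and tools, vague instructions": (0.9, 0.9, 0.1),
    "all three reasonably strong": (0.8, 0.8, 0.8),
}

for name, scores in scenarios.items():
    print(f"{name:<44} {output_quality(*scores):.3f}")
```

Under an additive model, "brilliant model, no tools" would still score about 70% of the balanced scenario; under the multiplicative one it scores zero, which matches the "can only chat" experience.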

This brings us to one more concept worth knowing: context engineering. You may have encountered this term in the AI space recently -- it refers to the practice of designing systems, instructions, and processes that help an AI intelligently manage and assemble its own context for each specific task. It's a step beyond simply writing good prompts into something more architectural.

That's what DAAF is, at its core -- a context engineering framework designed specifically for research workflows. The DAAF Field Guide has more detail on these concepts.

This framework also explains why people have such wildly different experiences with AI. Someone chatting casually with a basic web interface is experiencing one narrow slice of what's possible. Someone using Claude Code with DAAF and well-crafted prompts is operating in a genuinely different capability regime -- not because the underlying model is different, but because the other two dimensions are dramatically more developed. The information gradient is steep: people who have invested in tooling and instruction quality are seeing capabilities that are invisible to casual users. This is a significant part of why discourse around AI can feel so polarized -- people are often talking past each other because they're working with very different combinations of these three dimensions.

The Mental Model: Orchestrator, Agents, Skills

I'm going to use an analogy that I think captures it well: DAAF is intended to mirror the workflows of a well-run research lab with you as the lead researcher.

The Orchestrator: Your Lab Director

When you type a message to DAAF, you're talking to the orchestrator. Think of the orchestrator as a lab director -- the person who takes your research question, figures out what needs to be done, decides who on the team should do each piece, coordinates the whole effort, and reports back to you at key milestones.

The orchestrator should NOT be doing the hands-on work itself, because its primary contribution is coordination and workflow management. It doesn't write analysis scripts, it doesn't clean data, it doesn't run regressions. What it does is scope your request, decide which specialist should handle each piece, coordinate their work, and report back to you at key milestones.

Specialized Agents: Your Research Team

Agent | Role in the Lab Analogy | What They Actually Do
research-executor | Technician/Analyst | Executes one data task at a time with meticulous pre/post validation
code-reviewer | Senior Technician/Analyst | Reviews every single script, looking for bugs, methodology errors, and data quality issues
source-researcher | Research Assistant | Deep-dives into a specific data source's documentation, caveats, and gotchas
data-planner | Research Design Lead | Synthesizes all preliminary findings into a detailed, executable research plan

Getting Claude to do everything equally well is impossible given fixed context window limitations, and trying will ultimately confuse it and cause the dreaded context rot (where an LLM becomes unpredictable and erratic due to overfilled or poorly structured context). So we need to split responsibilities across "versions" of Claude, each given very different instructions and behavioral protocols.

Agent vs. Subagent: An "agent" is the general term for any tailored set of behavioral protocols for an LLM assistant. As a user, you could ask Claude directly to take on an agent persona. However, in DAAF's default workflows, the orchestrator calls up and tasks each agent itself -- agents become "subagents" when called by the orchestrator rather than directly by you.

Skills: Your Team's Reference Library

If agents define behavior ("how should I work?"), then skills define knowledge ("what do I need to know?"). Skills are structured knowledge documents that agents load into their own context on demand. Think of them as specialized reference manuals that your research team pulls off the shelf when they need domain-specific information.

Skill categories include:

In DAAF, skills are generally intended to be loaded by agents, not by the orchestrator. When the orchestrator delegates a task to the research-executor, it tells the agent which skills to load. The agent pulls up the relevant reference material, uses it to guide its work, and returns findings to the orchestrator. This keeps the orchestrator's context lean and ensures each agent gets exactly the knowledge it needs for its specific task.

Key takeaway

DAAF works like a research lab. You give direction to a lab director (the orchestrator), who delegates work to specialists (agents) who consult reference materials (skills).

The Nine Engagement Modes

DAAF first classifies every request you make into one of nine engagement modes. Each mode triggers a fundamentally different workflow, different outputs, and different expectations. Understanding these modes is the single most useful thing you can do to work with DAAF effectively, because it helps you frame your questions in the way most likely to get you what you actually want.

Before doing anything else, DAAF will tell you which mode it's classifying your request into, explain why, and ask you to confirm. You should always have the chance to say "actually, I just wanted a quick lookup" or "actually, let's go deeper on this."

Mode | When to Use | What You Get
Data Lookup | Quick variable/dataset question | Direct answer with supporting context
Data Discovery | Scoping what data exists | Feasibility assessment, available sources
Ad Hoc Collaboration | Flexible working session | Thought partner for code, debugging, planning
Full Pipeline | Complete research analysis | Plan, scripts, notebook, report, all artifacts
Revision & Extension | Modify an existing analysis | New versioned artifacts, full QA
Data Onboarding | Profile a new dataset | Reusable data source skill
Reproducibility Verification | Verify a completed analysis | Reproduction Report with verdict
Framework Development | Modify DAAF itself | New/updated skills, agents, modes
User Support | Questions about DAAF/tools | Conversational help, no formal outputs

Each mode is described below, with trigger words, a detailed description, and guidance on when to use it -- and when not to:

Data Lookup

Trigger words: what are the values for, how is X defined, lookup, what does this variable mean, explain this table

A quick, focused lookup about available data tables and variables -- think of it as a data documentation oracle. DAAF loads the single most relevant data source skill and gives you what you need quickly.

What you get: A direct, specific answer with supporting context (e.g., coded value definitions) and pointers to relevant documentation.

Expected time: Seconds. One question, one answer.

When NOT to use it: When your question is actually broader than you realize. If you find yourself asking five Data Lookup questions in a row, you probably want Data Discovery mode instead. DAAF will suggest this if it notices the pattern.

Data Discovery

Trigger words: what data exists, is it possible, feasibility, what's available, explore

A focused investigation into what data is available and whether an analysis is feasible -- think of it as a scoping partner. DAAF explores the landscape and reports back with available sources, year ranges, geographic coverage, key caveats, and a feasibility assessment.

Expected time: A few minutes of conversation, usually one or two exchanges. It may launch subagents to do some background research.

When NOT to use it: When you already know what data exists and you're ready to analyze it with a specific research question. In that case, jump straight to Full Pipeline.

Escalation: If Data Discovery turns up promising data, DAAF will suggest: "Based on these findings, would you like me to proceed with a Full Pipeline analysis?"

Ad Hoc Collaboration

Trigger words: help me with, review this, debug this, how do I, think through this with me

A flexible, multi-turn working session where DAAF acts as a thought partner. Review code, debug scripts, brainstorm analytic approaches, investigate a data source, or write a one-off analysis script. The conversation flows naturally -- change topics, ask follow-ups, and go wherever the work takes you.

What you get: A lightweight workspace for anything produced, plus access to all of DAAF's specialized capabilities on demand -- code execution, debugging, data source research, code review, analysis planning.

Expected time: As long as you need. No mandatory checkpoints or gates.

When NOT to use it: When you want a complete, formal analysis with a Plan, Notebook, and Report -- that's Full Pipeline. When you just need a quick variable definition -- that's Data Lookup.

Escalation: If the conversation evolves toward a full analysis, DAAF will suggest formalizing it into a Full Pipeline. Your workspace artifacts carry forward either way.

Full Pipeline

Trigger words: analyze, research, create, generate, what's the relationship between

DAAF takes your research question and runs a complete analytic workflow across 5 phases: exploring available data, creating a detailed research plan, fetching and cleaning data, running analyses and creating visualizations, and delivering a comprehensive report with all supporting artifacts. This is what DAAF was fundamentally built to do.

What you get: A detailed research plan, all raw and processed data files, validated Python scripts for every step, statistical analysis results and visualizations, a compiled Marimo notebook, a stakeholder report, and a lessons-learned document.

Expected time: About 5-10 minutes of active engagement spread across four check-in points where DAAF pauses for your review, a few hours of DAAF working independently, then whatever time you dedicate to reviewing final outputs -- plus API fees.

When NOT to use it: When you just need a quick answer, a variable definition, or want to know if certain data exists. Using Full Pipeline for a simple question is like driving a semi-truck to the corner store.

Revision & Extension

Trigger words: fix, update, change, modify the analysis, revise, extend

Modify or extend an existing analysis. Point DAAF to the project folder, and it reads the Plan and creates new versions of relevant artifacts -- it never modifies the originals. Versioning uses date suffixes: the original might be 2026-01-24, revision 1 becomes 2026-01-24a, revision 2 becomes 2026-01-24b, and so on.
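The versioning rule is simple enough to state as code. This minimal sketch (the function name `next_version` is hypothetical, not part of DAAF) computes the next label from the current one; because each revision gets a fresh label, nothing with an earlier label is overwritten.

```python
import re

VERSION_LABEL = re.compile(r"^(\d{4}-\d{2}-\d{2})([a-z]?)$")


def next_version(label: str) -> str:
    """Return the next revision label: 2026-01-24 -> 2026-01-24a -> 2026-01-24b."""
    match = VERSION_LABEL.match(label)
    if match is None:
        raise ValueError(f"not a date-suffixed label: {label!r}")
    date_part, suffix = match.groups()
    if suffix == "":
        return date_part + "a"
    if suffix == "z":
        # The guide doesn't say what comes after 'z'; this sketch just stops.
        raise ValueError("no single-letter suffixes left")
    return date_part + chr(ord(suffix) + 1)


label = "2026-01-24"
for _ in range(3):
    label = next_version(label)
    print(label)  # 2026-01-24a, then 2026-01-24b, then 2026-01-24c
```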

Expected time: Depends on scope. Changing a year range might take 15 minutes; fundamentally rethinking the methodology could be nearly as long as a new analysis.

When NOT to use it: When the existing analysis is fundamentally flawed or you want a substantially different research question. Starting a new Full Pipeline analysis with better-targeted prompts will produce cleaner results than trying to revise the original into something it wasn't designed to answer.

Data Onboarding

Provide a raw data file (CSV, Parquet, Excel) and DAAF runs a thorough profiling protocol across 3 phases (Setup, Profiling, Skill Creation). You review findings and confirm interpretations before DAAF creates a standalone data source skill that future analyses can reference.

Checkpoints: 2 -- one after project setup, one after profiling completes to review preliminary interpretations.

When NOT to use it: When you want to analyze the dataset rather than profile it -- that's Full Pipeline. Data Onboarding is about expanding DAAF's knowledge base, not running an analysis.
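To give a feel for what "profiling" means in practice, here is a toy column profiler using only Python's standard library. It is not DAAF's profiling protocol: the missing-value markers, the leading-zero heuristic, and the made-up rows are all assumptions for illustration, and it handles CSV only (not Parquet or Excel).

```python
import csv
import io
from collections import Counter

MISSING_MARKERS = {"", "NA"}  # assumption: what counts as missing varies by file


def infer_type(values):
    """Guess a column type from its non-missing string values."""
    if not values:
        return "unknown"
    if any(len(v) > 1 and v.startswith("0") and v.isdigit() for v in values):
        return "string"  # leading zeros suggest an ID code, not a quantity
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in values:
                caster(v)
        except ValueError:
            continue
        return name
    return "string"


def profile_csv(text):
    """Per-column type guess, missing rate, distinct count, and top values."""
    reader = csv.DictReader(io.StringIO(text), restval="")
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames:
        raw = [row[column] for row in rows]
        present = [v for v in raw if v not in MISSING_MARKERS]
        profile[column] = {
            "type": infer_type(present),
            "missing_rate": 1 - len(present) / len(raw) if raw else 0.0,
            "distinct": len(set(present)),
            "top_values": Counter(present).most_common(3),
        }
    return profile


# Made-up rows standing in for a file like election_returns_2024.csv.
sample = """county_fips,state,total_votes,turnout_flag
01001,AL,1200,
01003,AL,NA,low
06037,CA,5400,high
"""

for column, stats in profile_csv(sample).items():
    print(column, stats)
```

A real onboarding run goes much further (coded values, suppression, relationships between columns), and the interpretation step is exactly where your domain knowledge is needed.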

Reproducibility Verification

Trigger words: verify, reproduce, does this replicate, check reproducibility

DAAF decompiles the Marimo notebook back into standalone scripts, re-executes each one, compares new outputs against the originals, and cross-references the Report's claims against actual results. The verdict is one of: FULLY REPRODUCED, PARTIALLY REPRODUCED, or NOT REPRODUCED.

Two key decisions: Whether to re-fetch data (default: yes) and methodological review depth (default: light). A deep review additionally scrutinizes statistical assumptions and interpretation quality.

When NOT to use it: When you already know the analysis needs changes -- use Revision & Extension instead.
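The output-comparison step can be sketched in a few lines. This is a simplified illustration, not DAAF's verification logic: the rule mapping matches to a verdict (all match, some match, none match), the floating-point tolerance, and the output names and values are all assumptions, and it ignores the cross-check of the Report's claims, which involves judgment.

```python
import math


def compare_outputs(original, reproduced, rel_tol=1e-6):
    """Split output names into those that match and those that don't."""
    matched, mismatched = [], []
    for name, expected in original.items():
        actual = reproduced.get(name)
        if isinstance(expected, float) and isinstance(actual, float):
            same = math.isclose(expected, actual, rel_tol=rel_tol)
        else:
            same = expected == actual
        (matched if same else mismatched).append(name)
    return matched, mismatched


def verdict(original, reproduced):
    matched, mismatched = compare_outputs(original, reproduced)
    if not mismatched:
        return "FULLY REPRODUCED"
    if matched:
        return "PARTIALLY REPRODUCED"
    return "NOT REPRODUCED"


# Invented summary outputs from an original run and a re-execution.
original = {"n_schools": 1200, "mean_poverty_rate": 0.4312, "trend_slope": -0.018}
rerun = {"n_schools": 1200, "mean_poverty_rate": 0.4312000001, "trend_slope": -0.021}
print(verdict(original, rerun))
```

The tolerance matters: re-executed floating-point work rarely matches to the last digit, so an exact-equality comparison would flag harmless noise as a failure.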

Framework Development

Trigger words: create a skill, add an agent, modify DAAF, extend the framework

A structured collaboration mode for modifying DAAF itself -- its skills, agents, modes, reference files, templates, and configuration. The orchestrator scopes the current state, presents findings, then authors or modifies framework artifacts following canonical templates.

Checkpoints: 2 -- one after scoping (confirm approach), one after review (approve final state).

When NOT to use it: When you want to onboard a dataset by profiling it (use Data Onboarding) or run an analysis (use Full Pipeline).

User Support

Trigger words: what is DAAF, how does this work, help me understand, Docker, Git, Claude Code help

A lightweight, conversational mode for questions about DAAF itself and the tools it runs on -- Docker, Git, and Claude Code. No subagents are dispatched, no workspace is created, and no formal deliverables are produced. This is the only mode where DAAF itself is the subject, rather than your data or analysis.

Expected time: As long as you need. No checkpoints, no gates, no deliverables -- just a conversation.

When NOT to use it: When you already know what you want to do. If you have a specific data question or research task, jump straight into the relevant mode.

Escalation: When your questions evolve into action ("OK, I think I'm ready to try an analysis"), DAAF will suggest the appropriate mode. It routes, it doesn't gatekeep.

Mode Transitions

From | To | When it happens
Data Discovery | Full Pipeline | Your exploration revealed a feasible, interesting analysis
Full Pipeline | Revision & Extension | You completed an analysis and want to adjust or extend it
Data Onboarding | Full Pipeline | You profiled a dataset and now want to analyze it
Ad Hoc Collaboration | Full Pipeline | Your working session evolved into something worth formalizing
Full Pipeline | Reproducibility Verification | You want to verify that a completed analysis reproduces
Any mode | User Support | You have questions about how DAAF or its tools work

DAAF supports clean transitions between any pair of modes where the shift makes sense. You don't need to memorize the transitions -- DAAF will suggest the right mode at natural breakpoints and wait for your confirmation. It should never silently switch modes on you.
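As a toy model only (DAAF has no such functions; the orchestrator handles this conversationally), the transition table and the "never switch silently" rule look roughly like this. The function names are hypothetical, and the dictionary covers only the transitions listed in the table, not every sensible switch DAAF allows.

```python
# Common transitions, transcribed from the table above.
COMMON_TRANSITIONS = {
    "Data Discovery": ["Full Pipeline"],
    "Full Pipeline": ["Revision & Extension", "Reproducibility Verification"],
    "Data Onboarding": ["Full Pipeline"],
    "Ad Hoc Collaboration": ["Full Pipeline"],
}


def suggested_modes(current):
    """Modes worth suggesting at a breakpoint; User Support is reachable from any mode."""
    options = COMMON_TRANSITIONS.get(current, []) + ["User Support"]
    return [mode for mode in options if mode != current]


def switch_mode(current, proposed, confirm):
    """Move to `proposed` only if the user explicitly says yes; never switch silently."""
    if confirm(f"Switch from {current} to {proposed}?"):
        return proposed
    return current


# The user declines, so the session stays in Data Discovery.
mode = switch_mode("Data Discovery", "Full Pipeline", confirm=lambda question: False)
print(mode)
```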

Key takeaway

Think of these as different types of conversations, from a quick factual question to a complete multi-hour research project. You choose the scope; DAAF adapts.

Try It Yourself: A Guided Progression

Rather than jumping straight into a complete Full Pipeline analysis, I strongly recommend trying the simpler engagement modes first. Here's a concrete progression I'd recommend, designed to let you assess DAAF's knowledge and capabilities at each level of complexity:

Level 1: Quick Ask (Data Lookup Mode)

Ask DAAF to explain a single dataset or variable you're already familiar with. This tests DAAF's domain knowledge against your own expertise.

Example: How is free/reduced-price lunch eligibility defined in the CCD data? What are the coded values?

What you're testing: Does DAAF know the data as well as you do? Are there gaps in its knowledge? Does it mention the right caveats? Feel free to ask follow-ups -- this is a safe, low-stakes way to calibrate your trust.

Level 2: Thorough Documentation Review (Data Discovery Mode)

Ask DAAF to help figure out what's available within a broad conceptual category of data. This tests its ability to explore multiple options, consider trade-offs, and notice year overlaps or gaps.

Example: I'm considering a research project looking at college and university finances. Can you help me explore what datasets and variables are likely to be of interest?

What you're testing: How does DAAF surface relevant information when faced with broader options and less explicit direction? Does it recognize strengths and pitfalls of each possibility?

Level 3: Data Onboarding

If you have your own dataset, try profiling it with Data Onboarding mode. This is a great way to expand DAAF's capabilities with your own data -- and to contribute back to the community by sharing new data source skills.

Example: I have a CSV of county-level election returns I'd like to profile and add as a data source. The file is at: /daaf/data/county-elections/election_returns_2024.csv

What you're testing: Can DAAF systematically profile a dataset you know well, detect its structure, identify coded values and quality issues, and produce a reusable skill? Do its preliminary interpretations match your domain knowledge?

Level 4: Single Variable Analysis (Simple Full Pipeline)

Ask DAAF to analyze a single variable from a single dataset you already know well. This kicks off a Full Pipeline run, but a very simple and approachable one.

Example: Can you analyze the distribution of school-level poverty rates across all public elementary schools in California for the most recent year available? I'm interested in basic descriptive statistics and a histogram.

What you're testing: Can DAAF correctly fetch, clean, and describe a dataset you're already familiar with? Do the descriptive statistics match what you'd expect? Is the cleaning approach reasonable? This is where you start validating DAAF's execution quality, not just its knowledge.

Level 5: Simple Correlational/Longitudinal Analysis

Ask DAAF to look at the relationship between two variables of interest, possibly over time.

Example: Help me understand how average school-level poverty rates have changed over the past decade for public high schools, broken out by urbanicity (city, suburb, town, rural). Show me the trends and any notable patterns.

What you're testing: Can DAAF handle multi-year data, create meaningful groupings, and produce time-series visualizations? Are the trends sensible? Does it properly handle years with data quality issues (COVID years, for instance)?

Level 6: Multivariate Analysis

Get more abstract and complex. Ask about relationships between multiple variables that require joining data sources and more sophisticated statistical approaches.

Example: What linkages exist between school-level resources (per-pupil expenditure, teacher-student ratio), student socioeconomic status, and access to advanced coursework? Can you tease apart these relationships?

What you're testing: Can DAAF correctly join multiple data sources, handle the complexity of multi-variable analysis, and produce interpretable results? This is where DAAF's rigorous validation pipeline really earns its keep -- there are many more places for subtle errors to creep in.

Pro tip: You can even ask DAAF what you should ask it: I'm trying to think of moderately complex research questions I could use to test the DAAF system, based on the education data available. Can you suggest a few options related to educational equity?

Level 7: Replication Exercises

The ultimate test: can DAAF reproduce results from published research? The Urban Institute's Learning Curve series leverages the same Education Data Portal datasets DAAF currently has access to, and many studies have open-source code available for direct comparison.

What you're testing: The gold standard -- can DAAF produce results consistent with published, expert-produced research? This is the most rigorous test possible and will surface any systematic issues in the pipeline.

If you run replication exercises, the community would genuinely benefit from hearing about your results. Share your findings by opening an issue.

Level 8: Charting Your Own Path

Once you're comfortable with the framework, start asking your own original research questions. You've built enough experience to know what DAAF handles well and where you need to pay extra attention. DAAF has strengths and limitations -- the goal is not a single end-all-be-all tool for everyone, but a unified, solid starting point with sensible defaults and opinionated standards of rigor.

If you find ways to make it work better for your context, the community would benefit from sharing that knowledge. See Extending DAAF for more.

Anatomy of a Completed Analysis

When a Full Pipeline analysis completes, your project folder will look something like this:

research/2026-01-24_School_Poverty_Analysis/
+-- 2026-01-24_School_Poverty_Analysis_Plan.md
+-- 2026-01-24_School_Poverty_Analysis_Plan_Tasks.md
+-- 2026-01-24_School_Poverty_Analysis_Notebook.py
+-- 2026-01-24_School_Poverty_Analysis_Report.md
+-- LEARNINGS.md
+-- STATE.md
+-- logs/
+-- scripts/
|   +-- stage5_fetch/
|   +-- stage6_clean/
|   +-- stage7_transform/
|   +-- stage8_analysis/
|   +-- cr/
|   +-- debug/
+-- data/
|   +-- raw/
|   +-- processed/
+-- output/
    +-- analysis/
    +-- figures/
    +-- preliminary_notes/

Each artifact serves a specific purpose. The Plan.md is the single most important artifact in the project -- it captures everything about what was done and why. It includes the research question (verbatim), Research Outcomes (specific, measurable topics the analysis must investigate -- these define what must be examined, not what the answer should be), data sources with rationale, methodology with justification, a risk register, and a key decisions log. When you review the Plan, flag any outcomes that read like hypotheses (that is, ones that predict a direction). Plan_Tasks.md contains the detailed machine-readable task specifications: the exact transformation sequence, dependencies, wave assignments for parallel execution, and input/output file paths.
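The guide doesn't say how wave assignments are computed. One standard way to derive them from task dependencies is topological layering: a task's wave is one later than the latest wave among the tasks it depends on. This sketch shows that idea with hypothetical task names; it is not DAAF's Plan_Tasks.md format or its planner's code.

```python
def assign_waves(dependencies):
    """Assign each task to a wave one later than its latest dependency.

    `dependencies` maps task -> list of tasks whose outputs it needs. Tasks
    that share a wave don't depend on each other, so they could run in parallel.
    """
    unknown = {d for deps in dependencies.values() for d in deps} - set(dependencies)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    waves = {}
    remaining = dict(dependencies)
    while remaining:
        ready = [t for t, deps in remaining.items() if all(d in waves for d in deps)]
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        for task in ready:
            deps = remaining.pop(task)
            waves[task] = 1 + max((waves[d] for d in deps), default=0)
    return waves


# Hypothetical tasks loosely following the stage5-stage8 script folders.
tasks = {
    "fetch_ccd": [],
    "fetch_finance": [],
    "clean_ccd": ["fetch_ccd"],
    "clean_finance": ["fetch_finance"],
    "join_sources": ["clean_ccd", "clean_finance"],
    "describe_poverty": ["join_sources"],
    "plot_trends": ["join_sources"],
}

for task, wave in sorted(assign_waves(tasks).items(), key=lambda item: item[1]):
    print(f"wave {wave}: {task}")
```

The two fetches share wave 1, the two cleaning steps share wave 2, and the two final tasks share wave 4, because nothing within a wave waits on anything else in it.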

The scripts/ directory is the real work product -- not the notebook. Get a sense of how these scripts are actually written and run: this is where DAAF's value lives. Without the core engine of data analysis being transparent, rigorous, and reproducible, nothing else that comes out of this process is valuable. Spend time here. Each script reads top-to-bottom like a lab notebook with clear section headers, inline audit trail comments explaining intent and reasoning, embedded validation assertions, and an appended execution log showing exactly what happened when the script ran. When a script fails QA and needs revision, the original keeps its output (it's part of the audit trail), and the revised version gets a letter suffix: 01_task.py → 01_task_a.py → 01_task_b.py. The cr/ subdirectory contains the code-reviewer's independent QA inspection scripts for every analysis script.
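For orientation, here is a rough, hypothetical sketch of that top-to-bottom structure: section headers, audit comments, assertions, and an execution log. It is not an actual DAAF script. The rows, the -3 suppression code, and the file description are invented; it uses inline data instead of reading Parquet, and where DAAF appends the execution log to the script itself, this sketch just prints it.

```python
"""
Stage 6 cleaning (hypothetical example): drop suppressed poverty rates.

INTENT: keep schools with a reported poverty rate. In this made-up extract,
suppressed values are coded as -3.
"""
from datetime import datetime, timezone

# --- SECTION 1: LOAD --------------------------------------------------------
# AUDIT: a real script would read from data/raw/; inline rows keep this runnable.
rows = [
    {"school_id": "A1", "poverty_rate": 0.42},
    {"school_id": "A2", "poverty_rate": -3},
    {"school_id": "A3", "poverty_rate": 0.17},
]
rows_in = len(rows)

# --- SECTION 2: FILTER ------------------------------------------------------
# AUDIT: -3 marks a suppressed cell (assumption for this example), so it is
# treated as missing rather than as a real negative rate.
clean = [r for r in rows if r["poverty_rate"] != -3]

# --- SECTION 3: VALIDATE ----------------------------------------------------
assert clean, "no rows survived filtering"
assert all(0 <= r["poverty_rate"] <= 1 for r in clean), "rate outside [0, 1]"
assert len({r["school_id"] for r in clean}) == len(clean), "duplicate school_id"

# --- SECTION 4: EXECUTION LOG -----------------------------------------------
log = [
    f"run_at={datetime.now(timezone.utc).isoformat(timespec='seconds')}",
    f"rows_in={rows_in} rows_out={len(clean)} dropped={rows_in - len(clean)}",
]
print("\n".join(log))
```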

The Notebook.py is a Marimo notebook assembled from the completed scripts -- it's the presentation layer, not where analysis was done. What you won't see in the notebook: new analysis code, interactive dashboards, filter widgets, or additional transformations. This ensures what you see is exactly what was executed and validated, with nothing added or changed. The Report.md is a stakeholder-ready narrative synthesizing key findings, methodology, limitations, visualizations, and a references section (data sources, methodological references, software, and reporting standards -- DAAF tracks these automatically, though you should verify accuracy).

All data files use Apache Parquet format rather than CSV. Parquet preserves data types (integers stay integers, dates stay dates), compresses efficiently, and is fast to read. CSV files lose type information -- everything becomes a string -- which introduces subtle bugs. Parquet prevents an entire category of data quality issues.
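A CSV round trip with Python's standard library shows the "everything becomes a string" problem directly. The record is invented. Showing the Parquet side would need a library such as pyarrow, so this sketch covers only what goes wrong with CSV.

```python
import csv
import io
from datetime import date

# One invented record with three different types.
record = {"school_id": "0601234", "enrollment": 512, "opened": date(1998, 9, 1)}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)

buffer.seek(0)
row = next(csv.DictReader(buffer))
print({key: type(value).__name__ for key, value in row.items()})
# Every value comes back as 'str': the integer and the date are gone.

# A reader that guesses "all digits, so it's a number" then corrupts the ID:
guessed_id = int(row["school_id"])
print(guessed_id)  # 601234 -- the leading zero is lost
```

Identifiers with leading zeros are common in education and geographic data, which is why this particular failure is worth ruling out by construction.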

STATE.md tracks the current state of the analysis -- which tasks are completed, which are pending, and any issues encountered. This is critical for multi-session work. LEARNINGS.md captures data source quirks, surprising findings, and methodological notes that might help future analyses.

Sample Projects

The repository includes sample projects in the research/ folder to illustrate what DAAF produces. The College Graduation Rate & Selectivity Analysis demonstrates a complete Full Pipeline run -- browse the Report, the Plan, a data fetch script, or a statistical analysis script to see real artifacts. A companion Reproducibility Verification shows what independent re-execution and verification looks like.

These projects are presented as-is -- some interpretation is arguably overblown, and some analytical choices could be questioned. That's the point: DAAF produces work that is worth reviewing, not work that can be trusted blindly.

Dual-Layer Validation

Layer 1: Primary Validation (CP1-CP4)

Checkpoint | What It Catches
CP1: After data fetch | Empty datasets, wrong data types, >90% missing values in critical fields
CP2: After data cleaning | Invalid coded values, suppression rates above 50%, impossible analysis types
CP3: After each transformation | Unexpected row loss (>90%), broken joins, surprise null values
CP4: Before final output | Missing required outputs, deviations from the plan
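Two of the primary checks can be sketched as simple gates that raise an error to halt execution. The 90% thresholds come from the CP1 and CP3 rows of the table above; the function and exception names are hypothetical, the other checks (data types, coded values, plan deviations) are omitted, and the `try`/`except` only gestures at DAAF's real behavior of reporting the failure and then fixing or escalating.

```python
class CheckpointFailure(Exception):
    """Raised to halt execution when a primary checkpoint fails."""


def cp1_after_fetch(rows, critical_fields, max_missing=0.90):
    """Empty dataset, or more than 90% missing in a critical field, stops the run."""
    if not rows:
        raise CheckpointFailure("CP1: fetched dataset is empty")
    for field in critical_fields:
        missing = sum(row.get(field) is None for row in rows) / len(rows)
        if missing > max_missing:
            raise CheckpointFailure(f"CP1: {field!r} is {missing:.0%} missing")


def cp3_after_transform(rows_before, rows_after, max_loss=0.90):
    """Losing more than 90% of rows in one transformation stops the run."""
    if rows_before == 0:
        raise CheckpointFailure("CP3: transformation received no rows")
    loss = 1 - rows_after / rows_before
    if loss > max_loss:
        raise CheckpointFailure(f"CP3: lost {loss:.0%} of rows")


# Invented fetch result: 10 of 11 schools have no poverty rate.
rows = [{"school_id": f"S{i}", "poverty_rate": None} for i in range(10)]
rows.append({"school_id": "S10", "poverty_rate": 0.3})

try:
    cp1_after_fetch(rows, critical_fields=["poverty_rate"])
except CheckpointFailure as failure:
    print(f"Stopping: {failure}")  # a real run would report and fix or escalate
```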

Layer 2: Secondary Validation (QA1-QA4b)

QA Checkpoint | What It Catches
QA1: After fetch scripts | Schema problems, ID uniqueness violations, suspicious distributions
QA2: After cleaning scripts | Incorrect coded value handling, flawed filtering logic
QA3: After transformation scripts | Bad join cardinality, aggregation errors, derived column mistakes
QA4a: After analysis scripts | Invalid statistical methods, violated assumptions, unreliable results
QA4b: After visualization scripts | Misleading charts, incorrect data sources, missing labels

If a primary checkpoint fails, execution stops -- DAAF doesn't try to power through. It reports the failure and either attempts a fix or escalates to you. The code-reviewer approaches each script with an adversarial mindset and inspects it immediately after execution, not batched at the end -- because an error in script 1 that goes undetected will silently propagate through scripts 2, 3, and 4.

Why two layers? Because they catch different types of errors. Primary validation catches operational failures -- the data is empty, the types are wrong, something clearly broke. Secondary QA catches methodological errors -- the code runs fine and produces output, but the methodology is wrong, the join logic is subtly off, or the interpretation doesn't match what the data actually shows. These are the insidious errors that are easy to miss when reviewing AI-generated code, and they're exactly the errors that matter most for research integrity.
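A join that is "subtly off" can be made concrete. This minimal sketch uses invented school IDs and a hypothetical function name; it checks a join that is supposed to match each school to exactly one lookup row, the kind of cardinality problem QA3 looks for.

```python
from collections import Counter


def check_many_to_one(left_keys, right_keys):
    """QA-style check for a join meant to match each left row to exactly one right row."""
    right_counts = Counter(right_keys)
    return {
        "duplicated_right_keys": sorted(k for k, n in right_counts.items() if n > 1),
        "unmatched_left_keys": sorted(set(left_keys) - set(right_counts)),
        "inner_join_rows": sum(right_counts[k] for k in left_keys),
    }


schools = ["S1", "S2", "S3"]            # one row per school (invented)
lookup_school_ids = ["S1", "S2", "S2"]  # S2 listed twice, S3 missing

report = check_many_to_one(schools, lookup_school_ids)
print(report)
```

The inner join yields exactly as many rows as went in, so a row-count check like CP3 passes, yet S2 is double-counted and S3 is silently dropped. Nothing crashes and the output looks plausible, which is why a separate methodological review layer is needed.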

The Full Pipeline Flow

The orchestrator coordinates several specialized agents for different tasks. You don't need to memorize these names -- DAAF manages them automatically. The important thing is understanding the overall flow and where your review points are.

  1. You ask a research question
  2. The orchestrator classifies the request as Full Pipeline and confirms with you
  3. The orchestrator delegates data exploration to a subagent
  4. The orchestrator delegates source deep-dives to source-researcher agents
  5. A synthesis agent consolidates all findings
  6. The orchestrator pauses for your review (Phase Status Update 1) ← your checkpoint
  7. A planning agent creates a detailed Plan, validated by a plan-checking agent
  8. The orchestrator pauses for your review again (Phase Status Update 2) ← your checkpoint
  9. An execution agent works through each task, with a code-reviewing agent inspecting each script
  10. The orchestrator pauses twice more for your review (Phase Status Updates 3 and 4) ← your checkpoints
  11. A notebook assembly agent compiles all scripts
  12. A report-writing agent creates the stakeholder report
  13. A verification agent performs adversarial final verification
  14. The orchestrator delivers everything to you

Context Windows and Prompt Engineering 101

Now that you understand what DAAF does and how it's structured, let's cover the foundational concept that makes it all work: how large language models (LLMs) -- the AI technology behind tools like ChatGPT and Claude -- process information.

  1. LLMs are designed to be really good at predicting the next word when given a sequence of words. They learn how to do this "well" in a variety of ways, but that's really the crux of it: everything about their current functionality, no matter how fancy or surprising (e.g., making PowerPoint decks, searching the web, writing and running code), still rests on that one simple premise.
  2. How "well" LLMs work and what they can do depends heavily on the words you provide them before you ask them to predict the next one. These preliminary words are known as "context". Context is absolutely mission-critical for any work with LLMs, for two key reasons:
    • Each LLM can only "digest" a certain amount of context at a time before predicting the next word; this limit is known as its "context window". To put this in perspective: GPT-3 could only "read" ~1,500 words before predicting the next one. Claude Opus 4.6 has a context window of ~150,000 words by default, and Anthropic is beta testing context windows of ~750,000 words at time of writing. The more context a model can incorporate, the more expertise, skills, and framing it can use for the tasks you give it.
    • Another frontier of LLM advancement is teaching models to pay attention to the different parts of their context more judiciously. Not all context is treated equally, so we need to account for that when deciding what to provide. LLMs can get strangely confused and become erratic when their context windows are filled in ways they can't really process: this is known as context rot, and it needs to be avoided at all costs.
  3. The complex task of trying to maximize an LLM's performance at a requested task by carefully deciding (a) exactly what context, and how much, to give an LLM given its current context window limitations, and (b) how to structure that provided context strategically and optimally for the task at hand to prevent confusion/distraction, is what is known as prompt engineering.
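The word counts above are token limits translated into words. As a rough sketch, assuming the common rule of thumb that one token is about 3/4 of a word (real tokenizers vary with the text), the conversion is simple arithmetic. The token sizes used below (2,048 for GPT-3, 200k and 1M for the default and beta windows) are our own assumptions, chosen because they line up with the word counts quoted above.

```python
# Rule of thumb: one token is roughly 3/4 of an English word.
# The real ratio depends on the tokenizer and the text; treat results as estimates.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Approximate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate how many tokens a given number of words will consume."""
    return round(words / WORDS_PER_TOKEN)


# Assumed context-window sizes, in tokens (illustrative, not official figures).
windows = {
    "GPT-3": 2_048,
    "200k-token window (default)": 200_000,
    "1M-token window (beta)": 1_000_000,
}

for name, tokens in windows.items():
    print(f"{name}: {tokens:,} tokens ~ {tokens_to_words(tokens):,} words")

# Would 120,000 words of documents fit in a 200k-token window?
needed = words_to_tokens(120_000)
print(f"{needed:,} tokens needed; fits: {needed <= 200_000}")
```

At this ratio, 120,000 words need about 160,000 tokens: they fit in a 200k-token window, but leave little room for instructions, code, or conversation.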

So with that in mind, DAAF can be thought of as a way of automating and simplifying the prompt-engineering process for the core steps of data analysis and research. Everything about DAAF's design is aimed at telling Claude exactly what it needs to know, when it needs to know it, so that it does what we want more often and with higher quality on average.

Four things to keep in mind as you use DAAF:

  • Beyond the prompt engineering DAAF orchestrates behind the scenes, what you ask Claude to do, and how you ask it, are immensely important for getting high-quality output from DAAF and Claude.
  • The system is designed to select the right context and inject it into Claude's prompt alongside your query, question, or chat message, based on what you provide. But this is NOT foolproof, and it cannot account for every possibility.
  • DAAF really only works with cutting-edge models such as Opus 4.6, and it pushes them to their limits, using their full context windows where possible. This is why it is SO expensive to use at this time.
  • DAAF's reference files, skills, and workflow instructions are all designed to be loaded at specific moments, but Claude may occasionally fail to load them, skip a step, or deviate from its instructions in subtle ways. When this happens, the agent falls back on its general training, and that's when hallucinations and plausible-sounding-but-wrong details creep in. Turning on verbose output in Claude Code's /config settings is particularly useful here: it lets you see what DAAF's agents are actually thinking behind the scenes, including which files they're reading and which skills they're loading.
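To make the second point concrete, here is a deliberately naive sketch of context selection: match keywords in the query, rank the matching context pieces, and add them until a token budget is reached. Everything in it (ContextPiece, select_context, and the piece names, keywords, and token counts) is hypothetical. It is not how DAAF or Claude Code actually choose context, but it shows why any selection scheme can miss.

```python
from dataclasses import dataclass


@dataclass
class ContextPiece:
    """A hypothetical unit of context, such as a skill file or reference doc."""
    name: str
    keywords: set[str]
    tokens: int


def select_context(query: str, pieces: list[ContextPiece], budget: int) -> list[str]:
    """Pick pieces whose keywords appear in the query, best match first,
    without exceeding the token budget."""
    query_words = set(query.lower().split())
    scored = []
    for piece in pieces:
        score = len(piece.keywords & query_words)
        if score > 0:
            scored.append((score, piece))
    # Highest keyword overlap first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)

    chosen, used = [], 0
    for _, piece in scored:
        # Skip any piece that would overflow the budget.
        if used + piece.tokens <= budget:
            chosen.append(piece.name)
            used += piece.tokens
    return chosen


pieces = [
    ContextPiece("regression_skill", {"regression", "ols", "coefficients"}, 6_000),
    ContextPiece("plotting_skill", {"plot", "chart", "visualize"}, 4_000),
    ContextPiece("survey_weights_ref", {"survey", "weights", "weighted"}, 8_000),
]

print(select_context("Run a weighted regression on the survey data", pieces, 12_000))
print(select_context("Make a bar graph of enrollment", pieces, 12_000))
```

With a 12,000-token budget, the first query gets the survey-weights reference but loses the regression skill to the budget, and "bar graph" matches nothing because the plotting piece only knows "plot", "chart", and "visualize". Real systems are far less brittle than this toy, but the same kinds of misses are why clear phrasing, and checking what was actually loaded, still matter.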

Session Management

DAAF monitors its own context utilization to manage long-running analyses gracefully:

Utilization | What Happens
Below 40% and below 150k tokens | Normal operation
>= 40% or >= 150k tokens | DAAF starts delegating more to subagents
>= 60% or >= 200k tokens | DAAF finishes current work and warns that a restart may be needed
>= 75% or >= 250k tokens | DAAF finalizes STATE.md and recommends a restart

Tokens are roughly equivalent to words; one token is about 3/4 of a word.
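Because each row uses "or", whichever threshold is reached first triggers that tier. A minimal sketch of that logic follows; the function name, the action strings, and the window sizes in the examples are assumptions for illustration, not DAAF's own code.

```python
def context_tier(tokens_used: int, window_tokens: int) -> str:
    """Map context usage to the action in the utilization table.

    A tier applies when EITHER its percentage OR its absolute token
    threshold is reached, so the stricter limit wins.
    """
    utilization = tokens_used / window_tokens
    # (fraction threshold, token threshold, action), checked from highest tier down.
    tiers = [
        (0.75, 250_000, "finalize STATE.md and recommend restart"),
        (0.60, 200_000, "finish current work, warn restart may be needed"),
        (0.40, 150_000, "delegate more to subagents"),
    ]
    for fraction, token_limit, action in tiers:
        if utilization >= fraction or tokens_used >= token_limit:
            return action
    return "normal operation"


print(context_tier(90_000, 200_000))     # 45% of a 200k window -> delegate more to subagents
print(context_tier(90_000, 1_000_000))   # 9% and under 150k tokens -> normal operation
print(context_tier(160_000, 1_000_000))  # only 16%, but over 150k tokens -> delegate more to subagents
```

In a 200k-token window the percentage limits always trigger first (40% is only 80k tokens), while in a 1M-token window the absolute token limits do, which is plausibly why the table specifies both.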

When a session ends (whether through context exhaustion or by your choice), DAAF writes a comprehensive STATE.md that records exactly where the work stands and provides a restart prompt: a pre-written message summarizing what has been done and what needs to happen next.

How to Restart a Session

  1. Type /clear in the Claude Code terminal to reset the session (this clears the context window but does not affect any files on disk)
  2. Paste the restart prompt that DAAF provided
  3. DAAF reads STATE.md, picks up where it left off, and continues working

If you closed your terminal entirely or the session crashed, start a new Claude Code session and point DAAF to the project folder -- it will read STATE.md and figure out where to resume.

Note: Reproducibility Verification mode uses Reproduction_Report.md as its session state document instead of STATE.md.
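If you have several projects and aren't sure which folder to point DAAF to, a small script can list the project folders that contain a session state file, most recently updated first. This is a hypothetical helper, not part of DAAF; it assumes each project is a direct subfolder of research/ with its state file at the top level. Passing state_name="Reproduction_Report.md" covers Reproducibility Verification mode.

```python
from pathlib import Path


def find_resumable_projects(research_dir="research", state_name="STATE.md"):
    """List project folders under research_dir that contain a session
    state file, most recently updated first.

    Hypothetical helper: assumes each project is a direct subfolder of
    research_dir with its state file at the top level.
    """
    root = Path(research_dir)
    if not root.is_dir():
        return []
    # Collect state files from immediate subfolders only.
    state_files = [
        folder / state_name
        for folder in root.iterdir()
        if (folder / state_name).is_file()
    ]
    # Newest modification time first.
    state_files.sort(key=lambda f: f.stat().st_mtime, reverse=True)
    return [f.parent for f in state_files]


if __name__ == "__main__":
    for project in find_resumable_projects():
        print(project)
```

Run it from the DAAF root folder; the first path printed is the project you most likely want to resume.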

Don't panic if a session ends mid-analysis. This is expected for complex analyses; the STATE.md system exists precisely for this reason. A session restart is not a failure state but a deliberate pressure valve that balances giving Claude enough context to do good work against the risk of context rot. Complex analyses may take several sessions, each one picking up seamlessly from where the last one left off.

Two practical tips: try to let DAAF finish its current "atomic unit" (e.g., executing + QA-reviewing a script) before stopping -- interrupting mid-script is recoverable but creates a messier restart. And you can always check progress at any point by asking DAAF: "What's the current status of the analysis?"

Where Things Live

You don't need to know what's in most of these directories. The two that matter to you are research/ (where your analyses live) and user_reference/ (where documentation lives). Everything else is DAAF's internal machinery.

Directory | What's In It | Who It's For
research/ | Your analysis projects | You
user_reference/ | User documentation | You
.claude/agents/ | Agent protocols (14 definitions) | DAAF (and curious users)
agent_reference/ | Workflow documentation, templates | DAAF
.claude/skills/ | Skill definitions | DAAF (and skill creators)
scripts/ | Shared utility scripts | DAAF
scripts/host/ | Host-side convenience scripts | You

Everything you need to review, share, or reproduce is inside the project folder. You can copy the entire folder to a colleague and they'd have everything needed to understand and verify the analysis -- that's the whole point of reproducibility.

Browsing and Viewing Your Work

DAAF includes convenience scripts for viewing your files, notebooks, and session logs outside of the terminal: