Best Practices

Practical wisdom for getting the most out of DAAF while maintaining research quality

Key takeaways
  1. Write clear, scoped prompts -- describe what you want to learn, not exactly how to do it
  2. Review the research plan carefully -- it's your last chance to shape the analysis before execution begins
  3. Never skip human review -- DAAF is a powerful assistant, not an autonomous system you can walk away from
  4. Check the validation results -- they tell you whether each step passed, flagged warnings, or hit a critical problem
  5. Frame revision requests around your research goal -- not just the technical fix
  6. Know the boundaries -- DAAF is best for exploratory research with expert oversight, not for high-stakes decisions without independent verification

Writing Effective Prompts

When you start a DAAF analysis, you type a research request -- called a "prompt" -- describing what you want to investigate. Writing this request well is the single most impactful thing you can do to improve the quality of what DAAF produces. I realize that "write better prompts" has become almost cliche advice at this point, but I want to be very concrete here about what that actually means in practice -- because the specifics really matter for a structured research system like DAAF. (For background on how DAAF manages context and why prompt quality matters technically, see Understanding DAAF.)

A good request steers the analysis in the right direction and makes it more likely that DAAF does what you actually want. When framing a research request, be specific along these dimensions:

You do not need to know the exact dataset names, variable codes, or statistical methods. If you know them, great, but if not, that's fine -- that is genuinely part of what DAAF is designed to handle rigorously. What you do need to provide is a clear enough picture that DAAF can make intelligent decisions about those things -- decisions you'll then review and approve before anything gets executed.

Reviewing the Plan Before Execution

The research plan (Plan.md) is arguably the most important document DAAF produces. It is your last chance to shape the entire analysis before any data is fetched, any code is written, or any compute is spent. I cannot overstate this: time spent carefully reviewing the research plan is the single highest-leverage activity in the entire DAAF workflow.

After DAAF creates the Plan and validates it internally (via the plan-checker agent), it will present a Phase Status Update (PSU2) that summarizes the Plan and gives you the exact file path to read it. Read the actual file. The PSU2 summary is helpful but it is a summary -- the full Plan.md contains critical details about methodology, risk, and scope that the summary necessarily condenses.

A companion file, Plan_Tasks.md, contains the detailed machine-readable task definitions that DAAF uses to execute each step. It is available for auditing specific task definitions if you want to inspect the exact transformation sequence, dependencies, and file paths.

Five Key Sections to Review

  1. Research Question -- Does the stated research question match what you actually asked? Misinterpretation happens, especially when the original request was somewhat open-ended. If the research question has been narrowed or reframed in a way that does not match your intent, flag it now.
  2. Research Outcomes (Must-Haves) -- These define what the analysis must rigorously investigate and report on -- not what the answer should be. Good outcomes specify the measurement required without pre-determining the direction of the finding. There should be at least 3 research outcomes. If any read like hypotheses (predicting a specific result), push back -- those belong in the optional Hypotheses section.
    Good outcome: "Year-over-year enrollment change from 2018-2022 for Texas charter vs. traditional public schools is measured and characterized (direction, magnitude, significance)"
    Bad -- this is a hypothesis, not an outcome: "Charter schools show higher enrollment growth than traditional public schools"
    Also bad -- too vague to verify: "Analysis is comprehensive and covers enrollment trends"
  3. Transformation Sequence (in Plan_Tasks.md) -- The step-by-step execution plan. Look for: Does the sequence make logical sense? Are join keys specified (which columns, what kind of join)? Are file paths explicit, not placeholders? Do the verification criteria have concrete thresholds rather than "data looks correct"?
  4. Risk Register -- The Plan should identify at least one risk with a mitigation strategy. Common risks include data suppression reducing sample size, COVID-era data gaps, coded value changes across years, and join key mismatches. A Plan with zero identified risks is a red flag, not a sign of confidence.
  5. Data Sources and Year Ranges -- Are the right datasets being used? Are the years appropriate? Pay particular attention to known data gaps (e.g., COVID disruptions in 2020-2021) and whether the geographic scope matches your intent.

Red Flags to Watch For

| Red Flag | What It Might Mean | What to Do |
| --- | --- | --- |
| Research Outcomes are vague, subjective, or confirmatory | Final verification will not be rigorous | Ask for more specific, measurable outcomes |
| No risks identified | Plan may be overconfident | Ask about suppression rates (where data values are hidden to protect individual privacy -- common in education data), data gaps, and how datasets will be combined |
| Placeholder file paths | Plan may not be fully specified | Ask DAAF to complete the paths before proceeding |
| Very large scope | Analysis may run very long and incur high API costs | Consider narrowing scope first |
| No description of how datasets will be combined | When DAAF merges two datasets, rows can accidentally be duplicated or dropped if the merge logic is wrong | Ask DAAF how many rows to expect after combining the datasets, and whether any records will be lost |
| No mention of suppressed or missing data | Plan may not account for data quality realities | Ask about expected rates of hidden or missing values and how they will be handled |
| Statistical method seems inappropriate | May not match data structure or research question | Ask DAAF to justify its methodological choice |

How to Request Changes

When you want to change something in the Plan, be specific about what and why, while leaving room for discussion:

"Can we change the year range to 2019-2022 instead of 2016-2022? I want to avoid pre-ESSA data."

"I think we should add urbanicity as a control variable in the regression. The poverty-enrollment relationship likely differs significantly between urban and rural schools, right?"

"The research outcome about suppression rates should specify a threshold -- I'd say that suppression rates below 30% are acceptable for proceeding."

"I do not think OLS regression is the right approach here given the panel structure of the data. Can you consider a fixed-effects model instead?"

What is easy to change at this stage: year ranges, geographic scope, control variables, output format, research outcome language, risk register additions, file naming.

What requires more thought: statistical methodology changes, adding or removing data sources, changing the unit of analysis (e.g., from schools to districts), fundamentally restructuring the transformation sequence.

When in doubt, just tell DAAF what you are thinking. It will let you know if the change is straightforward or if it requires a more significant Plan revision.

Interpreting Validation Checkpoints and STOP Conditions

DAAF runs a lot of validation -- the core philosophy is "every transformation has a validation, no exceptions." But as a user, you do not need to understand every internal check. What you need to understand is: what the results mean and when you need to act.

Understanding Checkpoint Results (CP1-CP4)

DAAF has four primary validation checkpoints embedded directly in its code scripts. These run automatically during execution and check for operational problems -- things like empty data, corrupted values, suppression (where data values are hidden to protect individual privacy -- common in education data), or data loss.

CP1: Post-Fetch Validation -- "Did we get the data we expected?"

Runs right after DAAF downloads data from a source. Checks whether data actually came back, whether expected columns are present, whether data types are correct, and what the missingness rate is for critical fields.
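
To make the CP1 checks concrete, here is a minimal sketch of what a post-fetch check could look like. It is not DAAF's actual checkpoint code: the function name `post_fetch_check`, the row format, and the column names are all assumptions made for illustration, and the data-type check mentioned above is left out to keep the example short.

```python
from typing import Any


def post_fetch_check(
    rows: list[dict[str, Any]],
    expected_columns: set[str],
    critical_fields: list[str],
) -> dict[str, Any]:
    """Summarize a freshly fetched table: emptiness, missing columns, missingness."""
    if not rows:
        # Nothing came back at all, so every expected column is missing.
        return {"empty": True, "missing_columns": sorted(expected_columns), "missing_rate": {}}

    # Collect every column name that appears in at least one row.
    present = set().union(*(row.keys() for row in rows))
    # Share of rows where each critical field is absent or None.
    missing_rate = {
        field: sum(row.get(field) is None for row in rows) / len(rows)
        for field in critical_fields
    }
    return {
        "empty": False,
        "missing_columns": sorted(expected_columns - present),
        "missing_rate": missing_rate,
    }


# Hypothetical fetched rows; the column names are illustrative only.
fetched = [
    {"school_id": "A1", "year": 2021, "enrollment": 412},
    {"school_id": "B2", "year": 2021, "enrollment": None},
]
report = post_fetch_check(
    fetched,
    expected_columns={"school_id", "year", "enrollment", "charter"},
    critical_fields=["enrollment"],
)
print(report)
# {'empty': False, 'missing_columns': ['charter'], 'missing_rate': {'enrollment': 0.5}}
```

In this example the check reports a missing `charter` column and a 50% missingness rate on `enrollment`, which is the kind of result you would want to see surfaced before any cleaning begins.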

CP2: Post-Cleaning Validation -- "Is the cleaned data usable?"

Runs after DAAF has processed the raw data -- filtering out coded values (like -1 for "missing," -2 for "not applicable"), handling suppression, and applying data quality rules.
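
As an illustration of the coded-value handling described above, the sketch below replaces coded values with `None` and counts how often each code appeared. The two codes come from the examples in the text; the function name `clean_coded` is hypothetical, and real datasets may use different or additional codes.

```python
from typing import Optional

# Assumed coded values, taken from the examples in the text; real datasets may use others.
CODED_VALUES = {-1: "missing", -2: "not applicable"}


def clean_coded(
    values: list[Optional[float]],
) -> tuple[list[Optional[float]], dict[str, int]]:
    """Replace coded values with None and count how many of each code were seen."""
    counts = {label: 0 for label in CODED_VALUES.values()}
    cleaned: list[Optional[float]] = []
    for value in values:
        if value in CODED_VALUES:
            counts[CODED_VALUES[value]] += 1
            cleaned.append(None)
        else:
            cleaned.append(value)
    return cleaned, counts


cleaned, counts = clean_coded([350, -1, 128, -2, -1])
print(cleaned)  # [350, None, 128, None, None]
print(counts)   # {'missing': 2, 'not applicable': 1}
```

Keeping the per-code counts matters because a column where most values were "not applicable" tells a different story than one where most were "missing".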

CP3: Post-Transformation Validation -- "Did the data transformation do what we intended?"

Runs after every join, aggregation, or derived-variable calculation. Checks whether row counts changed as expected, whether there are new unexpected null values, and whether derived variables have reasonable distributions.
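
The row-count check for a join can be sketched as follows. This assumes a left join on a single key and uses made-up school IDs; `check_join_keys` is a hypothetical helper, not part of DAAF. The idea is to predict how many rows the join should produce before running it, so that duplication or unmatched rows are visible as numbers rather than surprises.

```python
from collections import Counter


def check_join_keys(left_keys: list[str], right_keys: list[str]) -> dict[str, int]:
    """Predict the row count of a left join and flag duplication or unmatched rows."""
    right_counts = Counter(right_keys)
    # Each left row yields one output row per matching right row, and at least one
    # row overall, because a left join keeps unmatched left rows (with nulls).
    expected_rows = sum(max(right_counts[key], 1) for key in left_keys)
    unmatched = sum(1 for key in left_keys if right_counts[key] == 0)
    return {
        "left_rows": len(left_keys),
        "expected_rows": expected_rows,
        "unmatched_left_rows": unmatched,
        "extra_rows": expected_rows - len(left_keys),
    }


# Hypothetical keys: "B2" appears twice on the right, "C3" has no match.
summary = check_join_keys(["A1", "B2", "C3"], ["A1", "B2", "B2"])
print(summary)
# {'left_rows': 3, 'expected_rows': 4, 'unmatched_left_rows': 1, 'extra_rows': 1}
```

Here the duplicate `B2` key on the right side adds one extra row, and `C3` would come through with null values. Both are exactly the kinds of changes a post-transformation check is meant to surface.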

CP4: Pre-Output Validation -- "Does the final output meet our commitments?"

Runs during the synthesis phase, checking the complete output against what the Plan promised. Validates that all required columns are present, all promised output files exist (figures, analysis results, report), all Research Outcomes are rigorously addressed, and the report has all required sections.

When DAAF Stops and Asks for Guidance

STOP conditions are moments when DAAF pauses execution and escalates to you. This is a good thing -- it means the system is working as intended. DAAF does not power through problems silently. When it stops, it will present the issue in a structured format: what happened, what it tried, your options (with pros and cons), and its recommendation.

| STOP Condition | What Happened | Your Options |
| --- | --- | --- |
| Empty data returned | The data source had no data for your query | Adjust scope, try different source, or acknowledge limitation |
| Suppression >50% | More than half the data is suppressed or missing | Narrow geography, reduce subgroups, use different measure |
| Row loss >90% | A transformation (join, filter) dropped most rows | Check join keys, verify filter logic, adjust criteria |
| Cross-state assessment comparison | You asked to compare test scores across states | Reframe question (within-state trends are valid) |
| QA BLOCKER after 2 revisions | Code review found a problem that could not be resolved in 2 attempts | Guide DAAF's approach, simplify the task, or accept limitation |
| Data unavailable | The dataset does not exist for your scope | Choose a different data source or adjust scope |

You do not need to have a solution -- you just need to tell DAAF which direction to go. "Try option 2" or "Let us narrow to just California and see if that helps" are perfectly fine responses.
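
The three numeric STOP conditions above (empty data, suppression above 50%, row loss above 90%) can be expressed as a short check. This is only a sketch: the thresholds come from the table, but the function `triggered_stops`, its parameters, and the way the rates are computed are assumptions, not a description of how DAAF evaluates them internally.

```python
def triggered_stops(
    total_values: int,
    suppressed_values: int,
    rows_before: int,
    rows_after: int,
) -> list[str]:
    """Return the names of the numeric STOP conditions that the given counts trigger."""
    if total_values == 0:
        # With no data at all, the other rates are undefined.
        return ["Empty data returned"]

    stops = []
    if suppressed_values / total_values > 0.50:
        stops.append("Suppression >50%")
    if rows_before > 0 and (rows_before - rows_after) / rows_before > 0.90:
        stops.append("Row loss >90%")
    return stops


print(triggered_stops(1000, 620, 1000, 950))  # ['Suppression >50%']
print(triggered_stops(1000, 100, 1000, 80))   # ['Row loss >90%']
print(triggered_stops(0, 0, 0, 0))            # ['Empty data returned']
```

Running the same arithmetic yourself on the counts DAAF reports is a quick way to confirm that a STOP was warranted before choosing an option.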

QA Findings: BLOCKER vs. WARNING vs. INFO

In addition to the CP checkpoints, every script DAAF writes gets independently reviewed by a separate code-reviewer agent -- an adversarial reviewer whose job is to find problems the original code might have missed. The reviewer classifies findings by severity:

The key thing to understand is that DAAF catches and resolves most issues automatically. The vast majority of QA findings are INFO or WARNING. You only hear about BLOCKERs that could not be resolved, and those are rare.

Reviewing Notebooks, Reports, and Script Logs

When a Full Pipeline analysis completes, you receive several artifacts. Here is how to actually read and evaluate each one -- and just as importantly, what to look at first.

Where to Start

  1. Report -- Start here for the big picture. Does the narrative make sense? Do the findings answer your research question?
  2. Figures -- Look at the visualizations referenced in the report. Do they show what the report claims they show?
  3. Plan.md and STATE.md -- Skim Plan.md for methodology and key decisions. Check STATE.md for the Final Review Log and QA Findings Summary to see if DAAF flagged any deviations or concerns.
  4. Notebook -- Dive into specific stages if you want to verify how a particular result was derived.
  5. Script logs -- Go here for the deepest level of detail on any specific step.

You do not need to read everything in detail every time. The report is the synthesis; the notebook is the evidence; the scripts are the primary source. Go as deep as you need to based on how much you trust the results and how high-stakes the analysis is.

Tip: Before diving into individual artifacts, consider browsing the session visually using the DAAF Log Explorer. Run bash view_logs.sh (or .\view_logs.ps1 on Windows) from your daaf-docker folder to see an interactive timeline of every orchestrator action, subagent dispatch, and tool call. This gives you a high-level map of the entire session, making it easier to identify which stages or scripts deserve closer inspection.

Reading the Report

The report follows a standard structure (Executive Summary, Key Findings, Data & Methodology, Limitations, etc.). Focus on:

Reading the Notebook

The marimo notebook is a walkthrough tool -- it assembles the actual scripts that were executed (verbatim, not rewritten) alongside their execution logs. What to look for:

The notebook compiles scripts -- it does not create new analysis code. You will not see any transformations without embedded execution logs.

Reading Script Execution Logs

Every script file in the scripts/ directory has its execution log appended to the end of the file as comments. The execution log includes start/end timestamps, exit code (0 = success, non-zero = failure), stdout (everything the script printed during execution), and stderr (any warnings or errors).
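
To illustrate the idea of a log appended as comments, the sketch below runs a command, captures the four pieces of information listed above, and renders them as comment lines. The exact layout DAAF writes is not documented here, so the header text, field labels, and the function `run_and_render_log` are hypothetical.

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_and_render_log(command: list[str]) -> str:
    """Run a command and render its execution log as a block of comment lines."""
    started = datetime.now(timezone.utc)
    result = subprocess.run(command, capture_output=True, text=True)
    ended = datetime.now(timezone.utc)

    lines = [
        "=== EXECUTION LOG (hypothetical layout) ===",
        f"start: {started.isoformat()}",
        f"end: {ended.isoformat()}",
        f"exit code: {result.returncode}",  # 0 = success, non-zero = failure
        "stdout:",
        *result.stdout.splitlines(),
        "stderr:",
        *result.stderr.splitlines(),
    ]
    # Prefix every line with "# " so the log can sit at the end of a Python file.
    return "\n".join(f"# {line}" for line in lines)


# Stand-in for a real pipeline script: a one-line Python program.
log = run_and_render_log([sys.executable, "-c", "print('rows fetched: 3')"])
print(log)
```

Because the log is made of comments, appending it to the script leaves the script itself runnable, and the code and its recorded output travel together in one file.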

If a script failed, you will also find versioned revisions: 01_fetch-ccd.py (original with its failed log), 01_fetch-ccd_a.py (first revision), 01_fetch-ccd_b.py (second revision if the first fix did not work). The notebook only includes the final successful version, but all versions are preserved in the scripts/ directory for audit trail purposes.
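
If you want to find the most recent revision of each script when auditing the scripts/ directory, the naming pattern above can be parsed mechanically. The sketch assumes the suffix convention is exactly a single lowercase letter after an underscore (`_a`, `_b`, ...) and that the latest revision is the one of interest; `latest_versions` is a hypothetical helper, not a DAAF tool.

```python
import re

# Assumed pattern: "<base>.py" is the original; "<base>_a.py", "<base>_b.py", ...
# are revisions. A script whose real name ends in "_<letter>" would be misread.
REVISION = re.compile(r"^(?P<base>.+?)(?:_(?P<rev>[a-z]))?\.py$")


def latest_versions(filenames: list[str]) -> dict[str, str]:
    """Map each base script name to the filename of its latest revision."""
    latest: dict[str, tuple[str, str]] = {}
    for name in filenames:
        match = REVISION.match(name)
        if match is None:
            continue  # Not a .py file; ignore it.
        base, rev = match["base"], match["rev"] or ""
        # "" (original) sorts before "a", which sorts before "b", and so on.
        if base not in latest or rev > latest[base][0]:
            latest[base] = (rev, name)
    return {base: name for base, (_, name) in latest.items()}


files = ["01_fetch-ccd.py", "01_fetch-ccd_a.py", "01_fetch-ccd_b.py", "02_clean-ccd.py"]
print(latest_versions(files))
# {'01_fetch-ccd': '01_fetch-ccd_b.py', '02_clean-ccd': '02_clean-ccd.py'}
```

The earlier versions are still worth opening when you want to understand why a revision was needed, since each one keeps its own failed log.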

Reading QA Review Scripts

The scripts/cr/ directory contains the code-reviewer's inspection scripts for each stage, named by convention (e.g., stage5_01_cr1.py, stage7_02_cr2.py). These scripts contain the adversarial checks that the code-reviewer ran, along with their results. You generally do not need to read these unless you are investigating a specific concern -- but they are there for full transparency.

Human Oversight Responsibilities

DAAF is not an oracle. It is not an autonomous research system that you can walk away from and trust to get things right. It is not "fire-and-forget." Yes, it is a very powerful -- and sometimes surprisingly thorough -- assistant that operates under strict guardrails. But it is still an LLM-based system, which means it is fundamentally susceptible to the same limitations as all LLM systems: hallucination, sycophancy, over-confidence, and subtle logical errors that look plausible on the surface.

What makes DAAF different from using Claude (or any LLM) ad-hoc is the sheer volume of structured verification layered into the process. But those layers of verification do not eliminate the need for human judgment. They reduce the surface area of what you need to verify, and they make verification easier by giving you organized, traceable artifacts. That is the exoskeleton metaphor: DAAF amplifies your expertise, but your expertise is still the thing doing the real work.

What DAAF Validates Automatically

These safeguards run without your involvement throughout the pipeline:

| Safeguard | What It Does | Where It Happens |
| --- | --- | --- |
| Primary Checkpoints (CP1-CP4) | Validates data at fetch, clean, transform, and output stages -- catches empty data, type errors, data loss, missing outputs | Embedded in every script |
| Secondary QA (QA1-QA4b) | Independent adversarial review of every script by a separate code-reviewer agent | After every script execution |
| Iteration Protocol | Forces every transformation into small steps: DESCRIBE, CODE, EXECUTE, VALIDATE, DECIDE | During all data operations |
| Batch Size Limits | Maximum 1-2 transformations per execution cycle to prevent error accumulation | During data stages |
| STOP Conditions | Automatic pause when data quality thresholds are breached | Throughout execution |
| Version Control | Every file revision is saved separately -- nothing is ever overwritten | All stages |
| Plan-Checker Validation | Automated 6-dimension validation of the Plan before execution begins | Before execution |
| Citation Tracking | Tracks and attributes citations for data sources, methods, software, and reporting standards as each script executes -- best-effort, not guaranteed | Throughout execution and report generation |

That is a substantial amount of automated quality control. It means that the majority of operational errors -- wrong data types, broken joins, corrupted files, missing columns, data loss during transformation -- will be caught before you ever see the results.

What Requires Your Judgment

Automated validation cannot assess everything. Here is what still requires a human researcher with domain expertise:

Monitoring DAAF's Internal Reference Loading

There is one oversight responsibility that is easy to overlook because it is about DAAF's own internal mechanics: making sure DAAF actually loaded the reference files and skills it was supposed to load.

LLMs are non-deterministic, and DAAF's reference loading is orchestrated by an LLM. This means that occasionally -- not often, but not never -- an agent will proceed without loading a skill it was instructed to load, or the orchestrator will skip a reference file it was supposed to read. When this happens, the agent falls back on its general training, which produces output that looks correct but is built on plausible inference rather than curated knowledge.

Verbose output is your primary monitoring tool. When you set Verbose output to True in /config (which you should -- it is a required configuration setting), you can see which files DAAF's agents read and the reasoning behind each read. Here is what to watch for:

Signs that something may not have loaded:

What to do:

  1. Ask DAAF to verify ("Can you check whether the agent actually loaded the CCD skill before writing that script?").
  2. Request a re-run of that specific step with explicit instructions to load the relevant skill.
  3. Check the script execution logs -- failed scripts with KeyError or unexpected empty results often point back to a loading failure.
  4. Check session logs to verify reference loading sequences -- DAAF's built-in session log viewer was designed in part to help users monitor exactly whether and when DAAF loads proper reference files.

DAAF's dual-layer validation (CP checkpoints + QA code review) will catch many loading failures downstream, but catching them early saves time and prevents cascading revisions.

When and How to Request Revisions

One of DAAF's most practically useful features is the ability to revise and extend completed analyses without starting from scratch. The version control system means every revision creates new files alongside the originals -- nothing gets lost.

Types of Revisions

Framing Revision Requests

When requesting a revision, include:

  1. Which project -- by title, date, or both. "The Texas poverty analysis from 2026-02-10" or "the CRDC discipline study."
  2. What specifically to change -- the more precise, the better. "Change the enrollment threshold from 50 to 100" is better than "adjust the enrollment filter."
  3. Why (if it is not obvious) -- "I realized virtual schools are skewing the enrollment trends" helps DAAF understand the intent, not just the mechanics.
  4. Downstream expectations -- if you know the change should affect later stages, say so. "Re-run the regression after updating the filter" tells DAAF that you want the downstream analysis updated, not just the data cleaning step.

New Version vs. New Project

New version (same project folder, new suffixed files): the core research question stays the same, you are refining or extending the existing analysis, and the revision builds on the existing Plan's logic.

New project (new project folder, fresh start): the research question is fundamentally different, you are switching to entirely different data sources, or the unit of analysis has changed.

A useful rule of thumb: if the existing Plan would need more than 50% of its transformation sequence rewritten, you are probably better off with a new project. When you are on the fence, DAAF will offer its assessment.

Appropriate vs. Inappropriate Use Cases

DAAF is still in active development, and there is only so much that can be done to check guardrails and test robustness at this stage. It is important to be transparent about what that means in practice.

Appropriate Uses

Uses Requiring Extensive Additional Validation

Never Appropriate


The following sections cover technical tools that DAAF uses behind the scenes. You may not need these day-to-day, but they're here when you do.

Using Git Version Control

DAAF produces a lot of files and does a lot of things at once. Getting comfortable with Git for version control is strongly recommended -- this type of work with LLMs benefits immensely from having a full audit log of file edits and changes at all times, with the ability to roll back changes and identify issues quickly.

Making a private "fork" (your own copy) of the DAAF repository is good practice: it gives you a place to work in and to back up your research files. Note that by default DAAF will NOT back up Parquet data files, to avoid accidentally sharing data to the cloud. If you want a GitHub backup for your work, ask Claude how to make your own repository and save to it accordingly.

If Git is new to you, try asking Claude to explain the basics. Good starting questions:

Useful Git Commands

    git diff HEAD~1          # See exactly what changed in the last session
    git log --oneline -10    # See the 10 most recent commits in a compact format
    git stash                # Temporarily set aside uncommitted changes
    git stash pop            # Bring stashed changes back

git diff HEAD~1 is great for reviewing what DAAF produced overnight or after a long run -- it shows every file that was added, modified, or deleted, with specific changes highlighted. git stash / git stash pop is useful for experimenting with something (like testing a different analysis approach) without committing to it.

Because DAAF automatically saves snapshots at key pipeline milestones, your work is preserved even across sessions.

Using the Browser-Based Code Editor

Having a good file editor is essential for working with DAAF. DAAF ships with a built-in browser-based code editor (code-server -- VS Code in the browser) that handles file browsing, Markdown preview, Git tracking, and cross-file search with zero installation on your host machine. To launch it:

    bash run_vscode.sh     # macOS / Linux
    .\run_vscode.ps1       # Windows

Then open http://localhost:2720 in your browser. The password is displayed in the terminal output (default: daaf). The editor comes pre-loaded with extensions for Python syntax highlighting, Markdown preview, Git history, and CSV viewing.

Alternative: Desktop VS Code with Dev Containers

If you already have VS Code installed and prefer a native desktop experience, install the Dev Containers extension and use "Attach to Running Container" to open the DAAF container's filesystem directly.

There are also similar alternatives that come with coding agents built in (e.g., Cursor). Your mileage may vary -- find an interface that works for you and your workflow.

Safety with Claude Code

Claude Code is extremely powerful and capable -- which is useful when it is doing what you want, but also means expanded risk when it operates erratically or is manipulated by bad actors. DAAF uses Docker in part to protect users directly, and packages guardrails into its hooks and permission files as well.

The main recommendation is to not let Claude Code run fully unattended. Check back on it periodically, even if only to spot-check what it is outputting and reporting. Letting it go completely unsupervised for long periods of time is asking for trouble, even if problems are rare in practice.

Three primary attack surfaces to be aware of:

  1. Malicious content hidden in data -- Unvetted data files or documentation could contain instructions that cause Claude to act erratically. DAAF mitigates this through structured data handling, but always be cautious with data from untrusted sources.
  2. Prompt injection via online research -- If you ask DAAF to conduct deep research online, the websites it searches could contain malicious prompt-injection instructions designed to manipulate the AI's behavior.
  3. Compromised project files -- Hidden, malicious code or instructions sneaked into the project documentation. All edits and changes to the DAAF project are thoroughly vetted and reviewed for the benefit of all users.

The first two are the user's responsibility: be thorough and thoughtful about what you have Claude read, do, and search on your behalf. DAAF's permission rules and safety hooks are designed to block manipulation at the system level, regardless of what the prompt says, and the structured workflow and validation checkpoints help catch outputs that don't match the data.

Tips for Data Onboarding

Before You Start

During the Process

For a detailed step-by-step guide to adding your own data, see Extending DAAF: Data Onboarding.