Best Practices - Learn

Key takeaways

Write clear, scoped prompts -- describe what you want to learn, not exactly how to do it
Review the research plan carefully -- it's your last chance to shape the analysis before execution begins
Never skip human review -- DAAF is a powerful assistant, not an autonomous system you can walk away from
Check the validation results -- they tell you whether each step passed, flagged warnings, or hit a critical problem
Frame revision requests around your research goal -- not just the technical fix
Know the boundaries -- DAAF is best for exploratory research with expert oversight, not for high-stakes decisions without independent verification

Writing Effective Prompts

When you start a DAAF analysis, you type a research request -- called a "prompt" -- describing what you want to investigate. Writing that request well is the single most impactful thing you can do to improve the quality of what DAAF produces. I realize that "write better prompts" has become almost cliche advice, but I want to be concrete about what it means in practice, because the specifics really matter for a structured research system like DAAF. (For background on how DAAF manages context and why prompt quality matters technically, see Understanding DAAF.)

A good request helps steer the analysis in the right direction and enhances the likelihood of DAAF doing what you really want. When framing a research request, be specific along these dimensions:

Geography: Are you interested in a particular state, a region, or the nation as a whole? Or all of the above, separately? Be specific.
Time period: Which years? Something specific like "2018-2022" is ideal, though DAAF may need to adjust the range based on data availability. "The past few years" works but is vague and encourages DAAF to make assumptions you might not agree with. Explicit is better.
Data granularity: Are you interested in individual schools, school districts, or colleges/universities? This determines which datasets DAAF reaches for.
Analysis focus: What relationship, trend, or comparison are you trying to understand? "The relationship between poverty and enrollment" is much more actionable than "general socioeconomics."
Methodologies: What types of analytical methodologies do you think will be most relevant and useful for this analysis? Geospatial? Supervised machine learning? Basic descriptive analyses? Being clear about this will help direct DAAF to the right resources internally for better consultation results.
Priorities: What matters most to you about this analysis? If DAAF has to make trade-offs, what should come first? Every analysis involves complicated decision-making, so giving DAAF more insight here helps it align with what you want.
Desired insights: What are you really trying to say, or learn, or do with the data analysis? A sense of your goals will also help DAAF make better decisions.

You do not need to know the exact dataset names, variable codes, or statistical methods. If you know them, great, but if not, that's fine -- that is genuinely part of what DAAF is designed to handle rigorously. What you do need to provide is a clear enough picture that DAAF can make intelligent decisions about those things -- decisions you'll then review and approve before anything gets executed.

Reviewing the Plan Before Execution

The research plan (Plan.md) is arguably the most important document DAAF produces. It is your last chance to shape the entire analysis before any data is fetched, any code is written, or any computation is spent. I cannot overstate this: time spent carefully reviewing the research plan is the single highest-leverage activity in the entire DAAF workflow.

After DAAF creates the Plan and validates it internally (via the plan-checker agent), it will present a Phase Status Update (PSU2) that summarizes the Plan and gives you the exact file path to read it. Read the actual file. The PSU2 summary is helpful but it is a summary -- the full Plan.md contains critical details about methodology, risk, and scope that the summary necessarily condenses.

A companion file, Plan_Tasks.md, contains the detailed machine-readable task definitions that DAAF uses to execute each step. It is available for auditing specific task definitions if you want to inspect the exact transformation sequence, dependencies, and file paths.

Five Key Sections to Review

Research Question -- Does the stated research question match what you actually asked? Misinterpretation happens, especially when the original request was somewhat open-ended. If the research question has been narrowed or reframed in a way that does not match your intent, flag it now.
Research Outcomes (Must-Haves) -- These define what the analysis must rigorously investigate and report on -- not what the answer should be. Good outcomes specify the measurement required without pre-determining the direction of the finding. There should be at least 3 research outcomes. If any read like hypotheses (predicting a specific result), push back -- those belong in the optional Hypotheses section.
Good outcome: "Year-over-year enrollment change from 2018-2022 for Texas charter vs. traditional public schools is measured and characterized (direction, magnitude, significance)"

Bad -- this is a hypothesis, not an outcome: "Charter schools show higher enrollment growth than traditional public schools"

Also bad -- too vague to verify: "Analysis is comprehensive and covers enrollment trends"
Transformation Sequence (in Plan_Tasks.md) -- The step-by-step execution plan. Look for: Does the sequence make logical sense? Are join keys specified (which columns, what kind of join)? Are file paths explicit, not placeholders? Do the verification criteria have concrete thresholds rather than "data looks correct"?
Risk Register -- The Plan should identify at least one risk with a mitigation strategy. Common risks include data suppression reducing sample size, COVID-era data gaps, coded value changes across years, and join key mismatches. A Plan with zero identified risks is a red flag, not a sign of confidence.
Data Sources and Year Ranges -- Are the right datasets being used? Are the years appropriate? Pay particular attention to known data gaps (e.g., COVID disruptions in 2020-2021) and whether the geographic scope matches your intent.

Red Flags to Watch For

Red Flag	What It Might Mean	What to Do
Research Outcomes are vague, subjective, or confirmatory	Final verification will not be rigorous	Ask for more specific, measurable outcomes
No risks identified	Plan may be overconfident	Ask about suppression rates (where data values are hidden to protect individual privacy -- common in education data), data gaps, and how datasets will be combined
Placeholder file paths	Plan may not be fully specified	Ask DAAF to complete the paths before proceeding
Very large scope	Analysis may run very long and incur high API costs	Consider narrowing scope first (DAAFBench charts cost vs. reliability across models if budget is a factor)
No description of how datasets will be combined	When DAAF merges two datasets, rows can accidentally be duplicated or dropped if the merge logic is wrong	Ask DAAF how many rows to expect after combining the datasets, and whether any records will be lost
No mention of suppressed or missing data	Plan may not account for data quality realities	Ask about expected rates of hidden or missing values and how they will be handled
Statistical method seems inappropriate	May not match data structure or research question	Ask DAAF to justify its methodological choice

How to Request Changes

When you want to change something in the Plan, be specific about what and why, while leaving room for discussion:

"Can we change the year range to 2019-2022 instead of 2016-2022? I want to avoid pre-ESSA data."

"I think we should add urbanicity as a control variable in the regression. The poverty-enrollment relationship likely differs significantly between urban and rural schools, right?"

"The research outcome about suppression rates should specify a threshold -- I'd say that suppression rates below 30% are acceptable for proceeding."

"I do not think OLS regression is the right approach here given the panel structure of the data. Can you consider a fixed-effects model instead?"

What is easy to change at this stage: year ranges, geographic scope, control variables, output format, research outcome language, risk register additions, file naming.

What requires more thought: statistical methodology changes, adding or removing data sources, changing the unit of analysis (e.g., from schools to districts), fundamentally restructuring the transformation sequence.

When in doubt, just tell DAAF what you are thinking. It will let you know if the change is straightforward or if it requires a more significant Plan revision.

Interpreting Validation Checkpoints and STOP Conditions

DAAF runs a lot of validation -- the core philosophy is "every transformation has a validation, no exceptions." But as a user, you do not need to understand every internal check. What you need to understand is: what the results mean and when you need to act.

Understanding Checkpoint Results (CP1-CP4)

DAAF has four primary validation checkpoints embedded directly in its code scripts. These run automatically during execution and check for operational problems -- things like empty data, corrupted values, suppression (where data values are hidden to protect individual privacy -- common in education data), or data loss.

CP1: Post-Fetch Validation -- "Did we get the data we expected?"

Runs right after DAAF downloads data from a source. Checks whether data actually came back, whether expected columns are present, whether data types are correct, and what the missingness rate is for critical fields.

PASSED -- The data arrived, has the expected structure, and critical fields are mostly populated.
FAILED -- Something fundamental is wrong -- the data source returned nothing, critical columns are missing, or more than 90% of a critical field is null. DAAF will stop and explain the problem. Options typically include trying a different data source, adjusting the scope, or acknowledging a limitation and proceeding with caution.
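To make the CP1 checks concrete, here is a minimal Python sketch. It is not DAAF's actual implementation: the function name `check_post_fetch`, its parameters, and the row-of-dictionaries data shape are all assumptions made for illustration. Only the 90% null threshold comes from the description above, and the data type check is left out for brevity.

```python
from typing import Any

NULL_FAIL_THRESHOLD = 0.90  # from the text: >90% null in a critical field fails


def check_post_fetch(
    rows: list[dict[str, Any]],
    expected_columns: set[str],
    critical_fields: set[str],
) -> tuple[str, list[str]]:
    """Return ("PASSED" or "FAILED", reasons) for a freshly fetched dataset.

    Hypothetical helper: DAAF's real CP1 code may look quite different.
    """
    if not rows:
        return "FAILED", ["data source returned no rows"]

    reasons: list[str] = []

    # Every expected column must appear in every row.
    present = set.intersection(*(set(row) for row in rows))
    missing = expected_columns - present
    if missing:
        reasons.append(f"missing columns: {sorted(missing)}")

    # Missingness rate for each critical field not already reported missing.
    for field in sorted(critical_fields - missing):
        null_rate = sum(row.get(field) is None for row in rows) / len(rows)
        if null_rate > NULL_FAIL_THRESHOLD:
            reasons.append(f"{field} is {null_rate:.0%} null")

    return ("FAILED" if reasons else "PASSED"), reasons
```

Returning the reasons alongside the status mirrors the behavior described above, where a failed check stops and explains the problem rather than just halting.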

CP2: Post-Cleaning Validation -- "Is the cleaned data usable?"

Runs after DAAF has processed the raw data -- filtering out coded values (like -1 for "missing," -2 for "not applicable"), handling suppression, and applying data quality rules.

PASSED -- Cleaning worked as expected; remaining data is sufficient for analysis.
WARNING -- Suppression rates are elevated (typically 30-50%). Enough data remains for analysis, but results may be less precise, particularly for subgroup breakdowns. DAAF will document this but proceed.
FAILED -- Suppression exceeds 50%, meaning more than half the data is missing or suppressed. DAAF will stop -- analysis on data with >50% suppression is generally unreliable. You will need to narrow scope, change data source, or acknowledge this as a fundamental limitation.
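The CP2 thresholds above can be written as a small classification function. This is a sketch under stated assumptions, not DAAF's code: the names `suppression_rate` and `classify_cp2` are hypothetical, the coded values are just the two examples mentioned above, and nulls and coded values are treated alike as unusable.

```python
from typing import Optional

# Assumed coded values: the text gives -1 ("missing") and -2
# ("not applicable") as examples. Real codes vary by dataset.
CODED_VALUES = {-1, -2}


def suppression_rate(values: list[Optional[float]]) -> float:
    """Share of values that are null or carry a coded (non-data) value."""
    if not values:
        raise ValueError("no values to assess")
    unusable = sum(v is None or v in CODED_VALUES for v in values)
    return unusable / len(values)


def classify_cp2(rate: float) -> str:
    """Map a suppression rate to a CP2 status using the thresholds in the text."""
    if rate > 0.50:
        return "FAILED"   # more than half the data is missing or suppressed
    if rate >= 0.30:
        return "WARNING"  # elevated: documented, but execution proceeds
    return "PASSED"
```

With these boundaries a rate of exactly 50% is a WARNING and anything above it is FAILED. The text describes the warning band as "typically 30-50%", so treat the lower boundary in particular as approximate.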

CP3: Post-Transformation Validation -- "Did the data transformation do what we intended?"

Runs after every join, aggregation, or derived-variable calculation. Checks whether row counts changed as expected, whether there are new unexpected null values, and whether derived variables have reasonable distributions.

PASSED -- The transformation produced expected results. Row counts, null patterns, and distributions look reasonable.
FAILED -- Something went wrong -- row counts dropped by more than 90%, a join produced unexpected nulls, or derived values are clearly incorrect. DAAF will stop and investigate.
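As an illustration of the row-count part of CP3, the sketch below checks an inner join where many left rows are expected to match at most one right row each. Everything here is an assumption for illustration (the function name, the list-of-dictionaries data shape, and the choice of an inner join); only the 90% row-loss threshold comes from the text. It does not cover the null-pattern or distribution checks.

```python
from collections import Counter
from typing import Any

ROW_LOSS_FAIL = 0.90  # from the text: losing more than 90% of rows fails


def check_many_to_one_join(
    left: list[dict[str, Any]],
    right: list[dict[str, Any]],
    key: str,
) -> tuple[str, list[str]]:
    """Predict the row count of an inner join on `key` and flag problems.

    Hypothetical helper: DAAF's real CP3 code may look quite different.
    """
    reasons: list[str] = []
    right_counts = Counter(row[key] for row in right)

    # A many-to-one join needs unique keys on the right-hand side;
    # duplicates there multiply the matching left rows.
    dupes = sorted(k for k, n in right_counts.items() if n > 1)
    if dupes:
        reasons.append(f"duplicate right-side keys: {dupes}")

    # Each left row yields one output row per matching right row.
    rows_after = sum(right_counts[row[key]] for row in left)
    if rows_after > len(left):
        reasons.append(f"rows grew from {len(left)} to {rows_after}")
    elif left and 1 - rows_after / len(left) > ROW_LOSS_FAIL:
        reasons.append(f"rows dropped from {len(left)} to {rows_after}")

    return ("FAILED" if reasons else "PASSED"), reasons
```

Computing the expected row count before running the real join is the same idea as asking DAAF, during Plan review, how many rows to expect after combining two datasets.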

CP4: Pre-Output Validation -- "Does the final output meet our commitments?"

Runs during the synthesis phase, checking the complete output against what the Plan promised. Validates that all required columns are present, all promised output files exist (figures, analysis results, report), all Research Outcomes are rigorously addressed, and the report has all required sections.

PASSED -- Everything the Plan committed to investigate has been rigorously addressed.
FAILED -- Something is missing -- a figure was not generated, a report section is incomplete, or a Research Outcome was not addressed. DAAF will identify the gap and attempt to resolve it.
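The file-existence part of CP4 is easy to reproduce by hand if you want to double-check a finished project. The sketch below is a hypothetical helper, not part of DAAF: the function name and the idea of passing a list of promised relative paths are assumptions. It does not assess whether Research Outcomes were addressed or whether report sections are complete.

```python
from pathlib import Path


def find_missing_outputs(project_dir: Path, promised: list[str]) -> list[str]:
    """Return the promised relative paths that are absent or are empty files."""
    missing = []
    for rel_path in promised:
        target = project_dir / rel_path
        # A zero-byte file counts as missing: it exists but holds no output.
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(rel_path)
    return missing
```

You would call it with the project folder and the output paths listed in your Plan, for example `find_missing_outputs(Path("my_project"), ["report.md"])`, where both names are placeholders.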

When DAAF Stops and Asks for Guidance

STOP conditions are moments when DAAF pauses execution and escalates to you. This is a good thing -- it means the system is working as intended. DAAF does not power through problems silently. When it stops, it will present the issue in a structured format: what happened, what it tried, your options (with pros and cons), and its recommendation.

STOP Condition	What Happened	Your Options
Empty data returned	The data source had no data for your query	Adjust scope, try different source, or acknowledge limitation
Suppression >50%	More than half the data is suppressed or missing	Narrow geography, reduce subgroups, use different measure
Row loss >90%	A transformation (join, filter) dropped most rows	Check join keys, verify filter logic, adjust criteria
Cross-state assessment comparison	You asked to compare test scores across states	Reframe question (within-state trends are valid)
QA BLOCKER after 2 revisions	Code review found a problem that could not be resolved in 2 attempts	Guide DAAF's approach, simplify the task, or accept limitation
Data unavailable	The dataset does not exist for your scope	Choose a different data source or adjust scope

You do not need to have a solution -- you just need to tell DAAF which direction to go. "Try option 2" or "Let's narrow to just California and see if that helps" are perfectly fine responses.

QA Findings: BLOCKER vs. WARNING vs. INFO

In addition to the CP checkpoints, every script DAAF writes gets independently reviewed by a separate code-reviewer agent -- an adversarial reviewer whose job is to find problems the original code might have missed. The reviewer classifies findings by severity:

INFO -- An observation that does not indicate a problem but is worth noting. Example: "The dataset has 47 states represented instead of 50, which is expected given the query filters." You will generally not see these unless you dig into the QA scripts.
WARNING -- A potential issue that does not block progress but should be documented. Example: "Suppression rate for rural schools is 38%, which may limit subgroup analysis precision." Warnings are accumulated and presented to you at the Phase Status Update after analysis. They do not stop execution, but they flag things you should consider when interpreting results.
BLOCKER -- A genuine problem that must be fixed before proceeding. Example: "The join produced 40% more rows than expected, indicating a many-to-many join where a many-to-one was specified." Blockers trigger a revision cycle -- DAAF will attempt to fix the script (up to 2 attempts) and re-submit it for review. If the blocker persists after 2 fix attempts, DAAF escalates to you.

The key thing to understand is that DAAF catches and resolves most issues automatically. The vast majority of QA findings are INFO or WARNING. You only hear about BLOCKERs that could not be resolved, and those are rare.
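The BLOCKER revision cycle described above amounts to a bounded retry loop. Here is a minimal sketch of that control flow, assuming hypothetical `review` and `revise` callables that stand in for the code-reviewer and the revising agent; DAAF's real orchestration is LLM-driven and will not look like this. The limit of 2 attempts is the only detail taken from the text.

```python
from typing import Callable

MAX_FIX_ATTEMPTS = 2  # from the text: up to 2 fix attempts before escalating


def run_revision_cycle(
    script: str,
    review: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
) -> tuple[str, str]:
    """Return (outcome, final_script); outcome is "APPROVED" or "ESCALATED".

    `review` returns the list of BLOCKER findings for a script (empty if none).
    `revise` returns a new script given the old one and its blockers.
    """
    blockers = review(script)
    attempts = 0
    while blockers and attempts < MAX_FIX_ATTEMPTS:
        script = revise(script, blockers)
        attempts += 1
        blockers = review(script)  # every revision is re-submitted for review
    return ("ESCALATED" if blockers else "APPROVED"), script
```

The point of the cap is that the loop always terminates: either the reviewer finds no remaining blockers, or the problem reaches you after the second failed fix.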

Reviewing Notebooks, Reports, and Script Logs

When a Full Pipeline analysis completes, you receive several artifacts. Here is how to actually read and evaluate each one -- and just as importantly, what to look at first.

Where to Start

Report -- Start here for the big picture. Does the narrative make sense? Do the findings answer your research question?
Figures -- Look at the visualizations referenced in the report. Do they show what the report claims they show?
Plan.md and STATE.md -- Skim Plan.md for methodology and key decisions. Check STATE.md for the Final Review Log and QA Findings Summary to see if DAAF flagged any deviations or concerns.
Notebook -- Dive into specific stages if you want to verify how a particular result was derived.
Script logs -- Go here for the deepest level of detail on any specific step.

You do not need to read everything in detail every time. The report is the synthesis; the notebook is the evidence; the scripts are the primary source. Go as deep as you need to based on how much you trust the results and how high-stakes the analysis is.

Tip: Before diving into individual artifacts, consider browsing the session visually using the DAAF Log Explorer. Open it from the DAAF Control Panel (bash daaf.sh / .\daaf.ps1 from your daaf-docker folder → 3) View Session Logs), or run bash view_logs.sh (.\view_logs.ps1 on Windows) directly, to see an interactive timeline of every orchestrator action, subagent dispatch, and tool call. This gives you a high-level map of the entire session, making it easier to identify which stages or scripts deserve closer inspection.

Reading the Report

The report follows a standard structure (Executive Summary, Key Findings, Data & Methodology, Limitations, etc.). Focus on:

Key Findings -- Are findings genuinely supported by the data? Look for specificity -- "Enrollment declined by 12% between 2019 and 2022" is verifiable. "Enrollment showed interesting trends" is not.
Limitations section -- Often the most important section. DAAF is instructed to be candid about limitations, suppression rates, data gaps, and caveats. If the limitations section is suspiciously short or generic, that is a red flag -- not because DAAF is hiding something, but because the system may not have adequately identified the limitations.
Figure references -- The report should reference specific figures by filename. Verify that the referenced figures exist and actually show what the report says they show.
References -- The report includes a References section with up to four subsections: data sources, methodological references, software & tools, and reporting standards. DAAF tracks these automatically as each script executes, but citations can be wrong, incomplete, or missing -- verify that the right methods and tools are credited and that the citations are accurate.

Reading DAAF's Claims: Observed Facts vs. Inference

One habit worth building as you read any DAAF output is to notice how it grades its own claims by evidence. When DAAF reports that something ran -- a fetch returned so many rows, a validation passed -- the actual command and its output are on record, quoted in the script's execution log for you to check; statements without that kind of record are inferences, and DAAF is instructed to phrase them so they read that way rather than as established fact. Pay especially close attention when DAAF says something is impossible or unavailable -- that a source doesn't offer a variable, or that an operation can't be done -- because a wrong "no" fails silently and can quietly harden into accepted fact, so those negative claims deserve the same "show me the check" scrutiny you'd give any surprising result. And when DAAF tells you how much it did -- files changed, outcomes addressed -- that accounting should trace back to actual tool output, not to memory. Reading with this lens tells you which parts of a report you can take at face value and which warrant a second look.

Reading the Notebook

The marimo notebook is a walkthrough tool -- it assembles the actual scripts that were executed (verbatim, not rewritten) alongside their execution logs. (For R projects the notebook is a Quarto .qmd instead -- see Understanding DAAF -- and the same review guidance applies.) What to look for:

Execution logs that show warnings or unexpected values. Expand the accordions and scan for anything that looks off.
Row counts at each stage. You should be able to trace the data from raw (usually large) to processed (usually smaller after filtering) to analysis (potentially larger or smaller depending on joins). Dramatic unexpected changes in row count deserve investigation.
Validation results. Each script includes embedded validation. Look for CP status: PASSED, WARNING, or FAILED.

The notebook compiles scripts -- it does not create new analysis code. Every transformation you see in it comes with its embedded execution log.

Reading Script Execution Logs

Every script file in the scripts/ directory has its execution log appended to the end of the file as comments. The execution log includes start/end timestamps, exit code (0 = success, non-zero = failure), stdout (everything the script printed during execution), and stderr (any warnings or errors).

If a script failed, you will also find versioned revisions: 01_fetch-ccd.py (original with its failed log), 01_fetch-ccd_a.py (first revision), 01_fetch-ccd_b.py (second revision if the first fix did not work). R projects follow the same pattern with .R files (01_fetch-ccd.R, 01_fetch-ccd_a.R, and so on). The notebook only includes the final successful version, but all versions are preserved in the scripts/ directory for audit trail purposes.
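If a project has many scripts, a short helper can group the revisions so you can see at a glance which steps needed fixes. The sketch below infers the naming pattern from the example filenames above (an optional single lowercase letter suffix after an underscore); that pattern, and the function name `group_revisions`, are assumptions rather than a documented DAAF contract. A base name that itself ends in an underscore plus one letter would be misread as a revision.

```python
import re
from collections import defaultdict

# Assumed pattern: "<base>.py" or "<base>_<single lowercase letter>.py"
# (also .R), inferred from the example names in the text.
_SCRIPT = re.compile(r"^(?P<base>.+?)(?:_(?P<rev>[a-z]))?\.(?:py|R)$")


def group_revisions(filenames: list[str]) -> dict[str, list[str]]:
    """Map each base script name to its versions, original first."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name in filenames:
        match = _SCRIPT.match(name)
        if match:
            # The original has no suffix, so "" sorts before "a", "b", ...
            groups[match["base"]].append((match["rev"] or "", name))
    return {base: [n for _, n in sorted(vs)] for base, vs in groups.items()}
```

Any base name with more than one entry had at least one failed run, and the last entry in each list is the version that should match what the notebook shows.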

Reading QA Review Scripts

The scripts/cr/ directory contains the code-reviewer's inspection scripts for each stage, named by convention (e.g., stage5_01_cr1.py, stage7_02_cr2.py). These scripts contain the adversarial checks that the code-reviewer ran, along with their results. You generally do not need to read these unless you are investigating a specific concern -- but they are there for full transparency.

Human Oversight Responsibilities

DAAF is not an oracle. It is not an autonomous research system that you can walk away from and trust to get things right. It is not "fire-and-forget." Yes, it is a very powerful -- and sometimes surprisingly thorough -- assistant that operates under strict guardrails. But it is still an LLM-based system, which means it is fundamentally susceptible to the same limitations as all LLM systems: hallucination, sycophancy, over-confidence, and subtle logical errors that look plausible on the surface.

What makes DAAF different from using Claude (or any LLM) ad-hoc is the sheer volume of structured verification layered into the process. But those layers of verification do not eliminate the need for human judgment. They reduce the surface area of what you need to verify, and they make verification easier by giving you organized, traceable artifacts. That is the exoskeleton metaphor: DAAF amplifies your expertise, but your expertise is still the thing doing the real work.

What DAAF Validates Automatically

These safeguards run without your involvement throughout the pipeline:

Safeguard	What It Does	Where It Happens
Primary Checkpoints (CP1-CP4)	Validates data at fetch, clean, transform, and output stages -- catches empty data, type errors, data loss, missing outputs	Embedded in every script
Secondary QA (QA1-QA4b)	Independent adversarial review of every script by a separate code-reviewer agent	After every script execution
Iteration Protocol	Forces every transformation into small steps: DESCRIBE, CODE, EXECUTE, VALIDATE, DECIDE	During all data operations
Batch Size Limits	Maximum 1-2 transformations per execution cycle to prevent error accumulation	During data stages
STOP Conditions	Automatic pause when data quality thresholds are breached	Throughout execution
Version Control	Every file revision is saved separately -- nothing is ever overwritten	All stages
Plan-Checker Validation	Automated 6-dimension validation of the Plan before execution begins	Before execution
Citation Tracking	Tracks and attributes citations for data sources, methods, software, and reporting standards as each script executes -- best-effort, not guaranteed	Throughout execution and report generation

That is a substantial amount of automated quality control. It means that the majority of operational errors -- wrong data types, broken joins, corrupted files, missing columns, data loss during transformation -- will be caught before you ever see the results.

What Requires Your Judgment

Automated validation cannot assess everything. Here is what still requires a human researcher with domain expertise:

Formulating the right question -- Is this a good question? Is it rooted in reasonable assumptions? DAAF is thoughtful and will likely push back on strange assumptions, but it will also back down if you ask it to. You need to have the final say on what is worth investigating.
Methodological appropriateness -- Is the statistical method right for this research question and data structure? DAAF will choose a method and justify its choice, but the justification might be plausible-sounding without being correct. If you have strong priors about methodology, bring them to the Plan review.
Substantive interpretation -- DAAF will report that "enrollment declined by 12%," but it cannot tell you whether that decline is policy-relevant, expected, or alarming. It cannot contextualize findings within the broader policy landscape or institutional realities you may know about. That is your job.
Causal claims -- DAAF is designed to be careful about causal language, but LLMs can drift into causal framing even with guardrails. Scrutinize any finding that implies causation -- especially in observational data, which is all that DAAF currently works with.
Data source appropriateness -- DAAF knows a lot about the technical properties of each dataset, but it may not know that a particular data source has known quality issues in a specific year for a specific state. Your contextual knowledge matters.
Sufficiency for your use case -- DAAF can tell you the suppression rate is 28% and that this is within its acceptable bounds. Whether 28% suppression is acceptable for your specific use case -- exploratory analysis vs. a finding that will inform a policy decision -- is a judgment call only you can make.
Ethical considerations -- DAAF does not assess the ethical dimensions of your analysis. If you are working with data that involves vulnerable populations, politically sensitive topics, or potential for misuse of findings, those considerations are entirely your responsibility.

Monitoring DAAF's Internal Reference Loading

There is one oversight responsibility that is easy to overlook because it is about DAAF's own internal mechanics: making sure DAAF actually loaded the reference files and skills it was supposed to load.

LLMs are non-deterministic, and DAAF's reference loading is orchestrated by an LLM. This means that occasionally -- not often, but not never -- an agent will proceed without loading a skill it was instructed to load, or the orchestrator will skip a reference file it was supposed to read. When this happens, the agent falls back on its general training, which produces output that looks correct but is built on plausible inference rather than curated knowledge.

Verbose output is your primary monitoring tool. When you set Verbose output to True in /config (which you should -- it is a required configuration setting), you can see the file reads that DAAF's agents make, along with the reasoning behind them. Here is what to watch for:

Signs that something may not have loaded:

An agent explicitly mentions wanting to load something but then never does
An agent makes confident claims about variable names, API endpoints, or coded values that you cannot find in the actual data or documentation
An agent writes code that uses variable names or data structures that don't match what the data source actually provides
You see an agent proceed directly to writing code without any visible skill or reference loading in the verbose output
Error messages about unexpected columns, missing variables, or failed API calls -- these often indicate the agent was working from hallucinated rather than loaded specifications

What to do:

Ask DAAF to verify ("Can you check whether the agent actually loaded the CCD skill before writing that script?")
Request a re-run of that specific step with explicit instructions to load the relevant skill
Check the script execution logs -- failed scripts with KeyError or unexpected empty results often point back to a loading failure
Check the session logs to verify reference loading sequences -- DAAF's built-in session log viewer was designed in part to help users monitor exactly whether and when DAAF loads proper reference files

DAAF's dual-layer validation (CP checkpoints + QA code review) will catch many loading failures downstream, but catching them early saves time and prevents cascading revisions.
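Because execution logs are appended to the script files themselves, one quick way to triage for possible loading failures is to search the scripts directory for the text "KeyError". The sketch below is a rough, hypothetical helper, not a DAAF tool: it does a plain substring search and makes no assumption about the log format, so it will also match scripts that merely handle `KeyError` in their own code.

```python
from pathlib import Path


def scripts_mentioning(scripts_dir: Path, needle: str = "KeyError") -> list[str]:
    """List .py/.R files under scripts_dir whose text contains `needle`."""
    hits = []
    for path in sorted(scripts_dir.rglob("*")):
        if path.is_file() and path.suffix in {".py", ".R"}:
            text = path.read_text(encoding="utf-8", errors="replace")
            if needle in text:
                hits.append(str(path.relative_to(scripts_dir)))
    return hits
```

Treat the result as a list of places to look, not a diagnosis. A hit tells you which script's log to open and read.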

When and How to Request Revisions

One of DAAF's most practically useful features is the ability to revise and extend completed analyses without starting from scratch. The version control system means every revision creates new files alongside the originals -- nothing gets lost.

Types of Revisions

Quick adjustments (usually straightforward) -- Changing a filter value ("exclude schools with enrollment < 50 instead of < 100"), updating year ranges, changing visualization details ("use a bar chart instead of a line chart"), adjusting the report framing.
Moderate changes (may require re-running some stages) -- Adding a new breakdown dimension ("also break down by urbanicity"), adding a control variable to a regression, switching from one poverty measure to another, adding a robustness check.
Major changes (close to starting over -- consider a new project) -- Changing the unit of analysis (schools to districts), switching the primary data source entirely, fundamentally changing the research question, changing the statistical methodology (from descriptive to causal inference).

Framing Revision Requests

When requesting a revision, include:

Which project -- by title, date, or both. "The Texas poverty analysis from 2026-02-10" or "the CRDC discipline study."
What specifically to change -- the more precise, the better. "Change the enrollment threshold from 50 to 100" is better than "adjust the enrollment filter."
Why (if it is not obvious) -- "I realized virtual schools are skewing the enrollment trends" helps DAAF understand the intent, not just the mechanics.
Downstream expectations -- if you know the change should affect later stages, say so. "Re-run the regression after updating the filter" tells DAAF that you want the downstream analysis updated, not just the data cleaning step.

New Version vs. New Project

New version (same project folder, new suffixed files): the core research question stays the same, you are refining or extending the existing analysis, and the revision builds on the existing Plan's logic.

New project (new project folder, fresh start): the research question is fundamentally different, you are switching to entirely different data sources, or the unit of analysis has changed.

A useful rule of thumb: if the existing Plan would need more than 50% of its transformation sequence rewritten, you are probably better off with a new project. When you are on the fence, DAAF will offer its assessment.

Appropriate vs. Inappropriate Use Cases

DAAF is still in active development, and there is only so much that can be done to check guardrails and test robustness at this stage. It is important to be transparent about what that means in practice.

Appropriate Uses

Exploratory analysis with expert oversight -- You have a research question, you want to see what the data shows, you have a good sense of what to expect, and you are prepared to critically evaluate the results. This is the sweet spot.
Learning and skill-building -- DAAF is excellent for learning how datasets work, what variables are available, and how data pipelines are constructed. Even if you never use DAAF's outputs directly, working with the system teaches you things about the data.
Rapid prototyping -- You need to quickly test whether an analysis direction is viable before investing significant manual effort.
Scaling established methodologies -- You have already done this kind of analysis manually and know what correct output looks like. DAAF lets you run the same analysis across more states, more years, or more breakdowns than you could do alone.
Demonstrating AI-assisted research patterns -- Useful for showing colleagues, students, or stakeholders what rigorous AI-assisted research can look like -- and what guardrails it requires.
Replication-style exercises -- Running DAAF against questions where published answers already exist is an excellent way to evaluate both DAAF's capabilities and its limitations.

Uses Requiring Extensive Additional Validation

Policy-informing analysis -- DAAF's output should be treated as a starting point that requires thorough independent verification. Every finding should be checked against known benchmarks, and the methodology should be reviewed by someone with deep domain expertise.
Publication-adjacent work -- DAAF can accelerate data preparation and exploratory analysis, but the analytical decisions, robustness checks, and interpretation must be held to the standard of your target venue.
Cross-dataset analyses involving complex joins -- DAAF handles joins reasonably well for well-documented datasets, but joins between datasets with different geographic units, different year definitions, or ambiguous key relationships require careful human scrutiny.
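
The join risks in the last item can be checked before any merge is run. The following is a minimal sketch in plain Python, not DAAF's own validation logic; the function name, the district_id key, and the sample rows are all invented for illustration.

```python
from collections import Counter


def audit_join_keys(left_rows, right_rows, key):
    """Summarize key problems between two lists of dict rows before joining.

    Hypothetical helper for illustration; not a DAAF function.
    """
    left_counts = Counter(row[key] for row in left_rows)
    right_counts = Counter(row[key] for row in right_rows)
    return {
        # Keys appearing more than once can silently multiply rows in a join.
        "duplicated_in_left": sorted(k for k, n in left_counts.items() if n > 1),
        "duplicated_in_right": sorted(k for k, n in right_counts.items() if n > 1),
        # Keys present on one side only are dropped by an inner join.
        "only_in_left": sorted(set(left_counts) - set(right_counts)),
        "only_in_right": sorted(set(right_counts) - set(left_counts)),
    }


# Invented sample data: district-level enrollment and finance tables.
enrollment = [
    {"district_id": "A1", "year": 2020, "students": 500},
    {"district_id": "B2", "year": 2020, "students": 320},
    {"district_id": "C3", "year": 2020, "students": 910},
]
finance = [
    {"district_id": "A1", "fiscal_year": 2020, "spending": 6_000_000},
    {"district_id": "A1", "fiscal_year": 2021, "spending": 6_200_000},
    {"district_id": "D4", "fiscal_year": 2020, "spending": 2_500_000},
]

report = audit_join_keys(enrollment, finance, key="district_id")
for check, keys in report.items():
    print(f"{check}: {keys}")
```

Any non-empty list is a cue for human review. Here the finance table has two rows for A1 because it is keyed on district and fiscal year, so joining on district_id alone would duplicate that district's enrollment row.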

Never Appropriate

High-stakes decisions based solely on AI outputs -- Never use DAAF's results as the sole basis for decisions that significantly affect people -- resource allocation, program elimination, individual assessments, legal proceedings. Always have qualified humans independently verify any findings that will drive consequential decisions.
Analysis presented without AI disclosure -- If you use DAAF to produce analysis, you should disclose the role of AI assistance in your work. Transparency is non-negotiable. DAAF is designed to make this easy by documenting exactly what it did, but the responsibility to disclose is yours.
Generating results to confirm a predetermined conclusion -- DAAF is designed to follow the data. Using it to manufacture support for a conclusion you've already reached undermines the entire framework.

The following sections cover technical tools that DAAF uses behind the scenes. You may not need these day-to-day, but they're here when you do.

Using Git Version Control

DAAF produces a lot of files and does a lot of things at once. Getting comfortable with Git for version control is strongly recommended -- this type of work with LLMs benefits immensely from having a full audit log of file edits and changes at all times, with the ability to roll back changes and identify issues quickly.

Making a private "fork" (your own copy) of the DAAF repository is good practice: it gives you a place to work and to back up your research files. Note that by default DAAF will NOT back up Parquet data files, to avoid accidentally sharing data to the cloud. If you want a GitHub backup of your work, ask Claude how to create your own repository and save to it.

If Git is new to you, try asking Claude to explain the basics. Good starting questions:

What does it mean to make a fork of a GitHub repo?
What does Git actually do, and why is it useful?
What's a commit? What does it do?
How can I track changes in DAAF using Git?
What tools can make this whole process easier?

Useful Git Commands

git diff HEAD~1               # Compare your current files against the commit before the latest one
git log --oneline -10         # See the 10 most recent commits in a compact format
git stash                     # Temporarily set aside uncommitted changes
git stash pop                 # Bring stashed changes back

git diff HEAD~1 is useful for reviewing what DAAF produced overnight or after a long run: it compares your current files against the commit before the most recent one and shows line-by-line changes for every tracked file that was added, modified, or deleted. Files Git has never tracked do not appear in the diff; run git status to list them. git stash / git stash pop is useful for experimenting with something (like testing a different analysis approach) without committing to it.

By default, DAAF's agents do not make Git commits on their own -- your working tree, with every preserved script version, is already the complete audit trail. git log --oneline -10 becomes useful once you commit your work manually, or once you turn on the optional "Git commit management" preference, which has DAAF suggest commits at natural milestones and ask before making them.

Using the Browser-Based Code Editor

Having a good file editor is essential for working with DAAF. DAAF ships with a built-in browser-based code editor (code-server -- VS Code in the browser) that handles file browsing, Markdown preview, Git tracking, and cross-file search with zero installation on your host machine. To launch it:

bash run_vscode.sh              # macOS / Linux
.\run_vscode.ps1                # Windows

Then open http://localhost:2720 in your browser. The password is displayed in the terminal output (default: daaf). The editor comes pre-loaded with extensions for Python and R syntax highlighting, Markdown preview, Git history visualization, CSV viewing, and folder compression for easy downloads.

Markdown preview -- Right-click any .md file and select "Open Preview" (or press Ctrl+Shift+V) to see rendered reports and plans with proper formatting.
File management -- Drag and drop files -- or whole folders -- from your computer into the file explorer sidebar to import them into the Docker volume (e.g., a dataset you want to profile). To get files back out, right-click a file and choose Download; for a whole folder, right-click it, choose Compress → zip, then download the resulting .zip.
Git integration -- The Source Control panel shows uncommitted changes and lets you view diffs -- the most direct way to review exactly what DAAF produced during a session (and, if you've enabled the optional "Git commit management" preference, browse commit history too).
Search across files -- Ctrl+Shift+F (Cmd+Shift+F on Mac) searches across all files -- great for finding specific variables, scripts, or content.

Alternative: Desktop VS Code with Dev Containers

If you already have VS Code installed and prefer a native desktop experience, install the Dev Containers extension and use "Attach to Running Container" to open the DAAF container's filesystem directly.

There are also similar editors that come with coding agents built in (e.g., Cursor). Your mileage may vary -- find an interface that works for you and your workflow.

Safety with Claude Code

Before we get into specifics: DAAF ships with a layer of guardrails -- built into its permission rules and safety hooks -- that block outright destructive commands, protect your credentials, and keep every change Claude makes visible and auditable. You can't easily wreck your own work or your system by accident, and the rest of this section explains what that protection covers and where your own judgment still matters.

Claude Code is extremely powerful and capable -- which is useful when it is doing what you want, but also means expanded risk when it operates erratically or is manipulated by bad actors. DAAF uses Docker in part to protect users directly, and packages guardrails into its hooks and permission files as well.

The main recommendation is to not let Claude Code run fully unattended. Check back on it periodically, even if only to spot-check what it is outputting and reporting. Letting it go completely unsupervised for long periods of time is asking for trouble, even if problems are rare in practice.

Three primary attack surfaces to be aware of:

Malicious content hidden in data -- Unvetted data files or documentation could contain instructions that cause Claude to act erratically. DAAF mitigates this through structured data handling, but always be cautious with data from untrusted sources.
Prompt injection via online research -- If you ask DAAF to conduct deep research online, the websites it searches could contain malicious prompt-injection instructions designed to manipulate the AI's behavior.
Compromised project files -- Hidden, malicious code or instructions sneaked into the project documentation. All edits and changes to the DAAF project are thoroughly vetted and reviewed for the benefit of all users.

The first two are the user's responsibility: be thorough and thoughtful about what you have Claude read, do, and search on your behalf. DAAF's permission rules and safety hooks are designed to block manipulation at the system level, regardless of what the prompt says, and the structured workflow and validation checkpoints help catch outputs that don't match the data.

A note on data privacy: All computation happens locally on your machine, and DAAF prevents Claude from bulk-uploading your data files. However, analytical output (sample rows, summary statistics, diagnostic results) does transit through Anthropic's servers as part of the conversation -- that's how Claude Code works. If you're working with private, proprietary, or regulated data (FERPA, HIPAA, etc.), the implications depend on your specific Anthropic license and access method. It's your responsibility to understand these nuances before using DAAF with non-public data.

If your data truly can't leave your environment, you're not out of options: DAAF includes a built-in synthetic-data workflow for exactly this case. You profile your sensitive data locally with a disclosure-controlled script, share only a summary report, and DAAF builds a realistically-shaped synthetic stand-in you can develop all your analysis code against -- then you run that finished, vetted code against the real data yourself, where it lives. The real data never enters the container. It's a code-development scaffold, not an analytic substitute or a formal privacy guarantee: your data-governance and disclosure obligations remain your own to meet.

If You Write or Paste Your Own Code

DAAF's safety layer is tuned around how its own agents work, so a few conventions are worth knowing if you drop in your own scripts or paste code for Claude to run:

Keep temporary files inside the project. Write scratch and intermediate files to scripts/scratch/ in your project folder, never to /tmp -- only the project tree is inside DAAF's backup boundary and audit trail, so anything written to /tmp is invisible to backups and can vanish silently.
Run one shell command at a time. DAAF's guardrails evaluate each command on its own, so chaining several together with &&, ;, or || is blocked -- split the steps into separate commands instead.
Add new packages through the Dockerfile, not at runtime. A pip install or install.packages() run mid-session disappears on the next rebuild and quietly breaks reproducibility, so the durable path is to add the package to the Dockerfile and rebuild. See Extending DAAF for the full walkthrough.
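
For the first convention, one way to keep your own scripts honest is to build every scratch path from the project root. This is a minimal sketch under that assumption; scratch_path is a hypothetical helper written for this example, not something DAAF provides.

```python
from pathlib import Path


def scratch_path(project_root, filename):
    """Return a path under <project_root>/scripts/scratch/, creating the folder.

    Hypothetical helper for illustration; not part of DAAF.
    """
    # Reject names like "../x" or "/tmp/x" so the file cannot land elsewhere.
    if filename in ("", ".", "..") or Path(filename).name != filename:
        raise ValueError("filename must be a bare file name, not a path")
    scratch_dir = Path(project_root) / "scripts" / "scratch"
    scratch_dir.mkdir(parents=True, exist_ok=True)
    return scratch_dir / filename
```

Called from the project folder, scratch_path(".", "filtered_rows.csv") returns scripts/scratch/filtered_rows.csv, so the intermediate file stays inside the tree that DAAF backs up and audits.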

Tips for Data Onboarding

Before You Start

Have your data file ready in a common tabular format (CSV, TSV, Parquet, or Excel). Parquet (a compressed columnar data format) is preferred for speed and type preservation, but any of these work.
Gather any documentation -- codebooks, data dictionaries, README files, methodology papers. Providing documentation lets DAAF cross-check what the documentation says against what the data actually shows, catching discrepancies that could trip you up later.
Know your data's provenance -- where it came from, when you downloaded it, and what it covers. DAAF records this in the skill for future reference.
If your data comes from an API, have the API key set up in your environment before starting. DAAF will write a reproducible fetch script, but it needs the key to test the download.
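
For API-sourced data, a quick pre-flight check can confirm the key is visible before you start. The sketch below assumes the key is exposed as an environment variable; the helper name and the EXAMPLE_API_KEY variable are placeholders, since the variable DAAF's fetch script actually reads depends on your data source.

```python
import os


def require_api_key(var_name):
    """Return the API key stored in an environment variable, or fail early.

    Hypothetical helper for illustration; not part of DAAF.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "set it before starting the onboarding session."
        )
    return key


# Usage (placeholder variable name; substitute the one your API expects):
# api_key = require_api_key("EXAMPLE_API_KEY")
```

Failing early with a clear message is cheaper than discovering a missing key halfway through a test download.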

During the Process

The interpretation review is the most important checkpoint. When DAAF presents its preliminary interpretations, take the time to carefully confirm, reject, or modify each one. These interpretations become the foundation of the skill that all future analyses will rely on.
Don't worry about getting everything perfect. The skill is a living artifact -- you can refine it later using Framework Development mode as you discover more about the data through actual use.
Flag priority columns if you know which ones matter most for your research. DAAF will give them extra attention during profiling.

For a detailed step-by-step guide to adding your own data, see Extending DAAF: Data Onboarding.
