- Write clear, scoped prompts -- describe what you want to learn, not exactly how to do it
- Review the research plan carefully -- it's your last chance to shape the analysis before execution begins
- Never skip human review -- DAAF is a powerful assistant, not an autonomous system you can walk away from
- Check the validation results -- they tell you whether each step passed, flagged warnings, or hit a critical problem
- Frame revision requests around your research goal -- not just the technical fix
- Know the boundaries -- DAAF is best for exploratory research with expert oversight, not for high-stakes decisions without independent verification
Writing Effective Prompts
When you start a DAAF analysis, you type a research request -- called a "prompt" -- describing what you want to investigate. This is the single most impactful thing you can do to improve the quality of what DAAF produces. I realize that "write better prompts" has become almost cliche advice at this point, but I want to be very concrete here about what that actually means in practice -- because the specifics really matter for a structured research system like DAAF. (For background on how DAAF manages context and why prompt quality matters technically, see Understanding DAAF.)
A good request helps steer the analysis in the right direction and enhances the likelihood of DAAF doing what you really want. When framing a research request, be specific along these dimensions:
- Geography: Are you interested in a particular state, region, or just nationwide? Or all of the above, separately? Be specific.
- Time period: Which years? Something specific like "2018-2022" is ideal, with the understanding that the range may need to shift based on data availability. "The past few years" works but is vague and encourages DAAF to make assumptions you might not agree with. Explicit is better.
- Data granularity: Are you interested in individual schools, school districts, or colleges/universities? This determines which datasets DAAF reaches for.
- Analysis focus: What relationship, trend, or comparison are you trying to understand? "The relationship between poverty and enrollment" is much more actionable than "general socioeconomics."
- Methodologies: What types of analytical methodologies do you think will be most relevant and useful for this analysis? Geospatial? Supervised machine learning? Basic descriptive analyses? Being clear about this will help direct DAAF to the right resources internally for better consultation results.
- Priorities: What matters most to you about this analysis? If it has to make trade-offs, what should go first? Every analysis involves complicated decision-making, so giving DAAF more insight here helps it align with what you want.
- Desired insights: What are you really trying to say, or learn, or do with the data analysis? A sense of your goals will also help DAAF make better decisions.
You do not need to know the exact dataset names, variable codes, or statistical methods. If you know them, great, but if not, that's fine -- that is genuinely part of what DAAF is designed to handle rigorously. What you do need to provide is a clear enough picture that DAAF can make intelligent decisions about those things -- decisions you'll then review and approve before anything gets executed.
Reviewing the Plan Before Execution
The research plan (Plan.md) is arguably the most important document DAAF produces. It is your last chance to shape the entire analysis before any data is fetched, any code is written, or any computation is spent. I cannot overstate this: time spent carefully reviewing the research plan is the single highest-leverage activity in the entire DAAF workflow.
After DAAF creates the Plan and validates it internally (via the plan-checker agent), it will present a Phase Status Update (PSU2) that summarizes the Plan and gives you the exact file path to read it. Read the actual file. The PSU2 summary is helpful but it is a summary -- the full Plan.md contains critical details about methodology, risk, and scope that the summary necessarily condenses.
A companion file, Plan_Tasks.md, contains the detailed machine-readable task definitions that DAAF uses to execute each step. It is available for auditing specific task definitions if you want to inspect the exact transformation sequence, dependencies, and file paths.
Five Key Sections to Review
- Research Question -- Does the stated research question match what you actually asked? Misinterpretation happens, especially when the original request was somewhat open-ended. If the research question has been narrowed or reframed in a way that does not match your intent, flag it now.
- Research Outcomes (Must-Haves) -- These define what the analysis must rigorously investigate and report on -- not what the answer should be. Good outcomes specify the measurement required without pre-determining the direction of the finding. There should be at least 3 research outcomes. If any read like hypotheses (predicting a specific result), push back -- those belong in the optional Hypotheses section.
  - Good outcome: "Year-over-year enrollment change from 2018-2022 for Texas charter vs. traditional public schools is measured and characterized (direction, magnitude, significance)"
  - Bad (this is a hypothesis, not an outcome): "Charter schools show higher enrollment growth than traditional public schools"
  - Also bad (too vague to verify): "Analysis is comprehensive and covers enrollment trends"
- Transformation Sequence (in Plan_Tasks.md) -- The step-by-step execution plan. Look for: Does the sequence make logical sense? Are join keys specified (which columns, what kind of join)? Are file paths explicit, not placeholders? Do the verification criteria have concrete thresholds rather than "data looks correct"?
- Risk Register -- The Plan should identify at least one risk with a mitigation strategy. Common risks include data suppression reducing sample size, COVID-era data gaps, coded value changes across years, and join key mismatches. A Plan with zero identified risks is a red flag, not a sign of confidence.
- Data Sources and Year Ranges -- Are the right datasets being used? Are the years appropriate? Pay particular attention to known data gaps (e.g., COVID disruptions in 2020-2021) and whether the geographic scope matches your intent.
Red Flags to Watch For
| Red Flag | What It Might Mean | What to Do |
|---|---|---|
| Research Outcomes are vague, subjective, or confirmatory | Final verification will not be rigorous | Ask for more specific, measurable outcomes |
| No risks identified | Plan may be overconfident | Ask about suppression rates (where data values are hidden to protect individual privacy -- common in education data), data gaps, and how datasets will be combined |
| Placeholder file paths | Plan may not be fully specified | Ask DAAF to complete the paths before proceeding |
| Very large scope | Analysis may run very long and incur high API costs | Consider narrowing scope first |
| No description of how datasets will be combined | When DAAF merges two datasets, rows can accidentally be duplicated or dropped if the merge logic is wrong | Ask DAAF how many rows to expect after combining the datasets, and whether any records will be lost |
| No mention of suppressed or missing data | Plan may not account for data quality realities | Ask about expected rates of hidden or missing values and how they will be handled |
| Statistical method seems inappropriate | May not match data structure or research question | Ask DAAF to justify its methodological choice |
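The row-count question in the table above can be answered with simple arithmetic before any merge runs. Below is a minimal sketch, not DAAF's actual implementation, assuming each dataset is a list of Python dicts and using a hypothetical district ID column named `leaid`. A left join yields one output row per matching right-hand row, so any duplicated key on the right side inflates the result.

```python
from collections import Counter


def expected_left_join_rows(left, right, key):
    """Predict the row count of a left join of `left` onto `right` on `key`.

    Each left row produces one output row per matching right row,
    or exactly one row if nothing matches.
    """
    right_counts = Counter(row[key] for row in right)
    return sum(max(right_counts.get(row[key], 0), 1) for row in left)


def duplicated_keys(rows, key):
    """Return keys that appear more than once; these break a many-to-one join."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical school-level and district-level records keyed by district ID.
schools = [{"school_id": 1, "leaid": "A"}, {"school_id": 2, "leaid": "A"},
           {"school_id": 3, "leaid": "B"}]
districts = [{"leaid": "A", "poverty": 0.21}, {"leaid": "B", "poverty": 0.35},
             {"leaid": "B", "poverty": 0.36}]

print(expected_left_join_rows(schools, districts, "leaid"))  # 4, not 3
print(duplicated_keys(districts, "leaid"))                   # ['B']
```

If the predicted count exceeds the left table's row count when you expected a many-to-one join, the right-hand keys are not unique, which is exactly the duplication problem this red flag describes.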
How to Request Changes
When you want to change something in the Plan, be specific about what and why, while leaving room for discussion:
"Can we change the year range to 2019-2022 instead of 2016-2022? I want to avoid pre-ESSA data."
"I think we should add urbanicity as a control variable in the regression. The poverty-enrollment relationship likely differs significantly between urban and rural schools, right?"
"The research outcome about suppression rates should specify a threshold -- I'd say that suppression rates below 30% are acceptable for proceeding."
"I do not think OLS regression is the right approach here given the panel structure of the data. Can you consider a fixed-effects model instead?"
What is easy to change at this stage: year ranges, geographic scope, control variables, output format, research outcome language, risk register additions, file naming.
What requires more thought: statistical methodology changes, adding or removing data sources, changing the unit of analysis (e.g., from schools to districts), fundamentally restructuring the transformation sequence.
When in doubt, just tell DAAF what you are thinking. It will let you know if the change is straightforward or if it requires a more significant Plan revision.
Interpreting Validation Checkpoints and STOP Conditions
DAAF runs a lot of validation -- the core philosophy is "every transformation has a validation, no exceptions." But as a user, you do not need to understand every internal check. What you need to understand is: what the results mean and when you need to act.
Understanding Checkpoint Results (CP1-CP4)
DAAF has four primary validation checkpoints embedded directly in its code scripts. These run automatically during execution and check for operational problems -- things like empty data, corrupted values, suppression (where data values are hidden to protect individual privacy -- common in education data), or data loss.
CP1: Post-Fetch Validation -- "Did we get the data we expected?"
Runs right after DAAF downloads data from a source. Checks whether data actually came back, whether expected columns are present, whether data types are correct, and what the missingness rate is for critical fields.
- PASSED -- The data arrived, has the expected structure, and critical fields are mostly populated.
- FAILED -- Something fundamental is wrong -- the data source returned nothing, critical columns are missing, or more than 90% of a critical field is null. DAAF will stop and explain the problem. Options typically include trying a different data source, adjusting the scope, or acknowledging a limitation and proceeding with caution.
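To make the CP1 questions concrete, here is a minimal sketch of a post-fetch check. It is illustrative only: the function name, the list-of-dicts input, and the column names are hypothetical, and DAAF's embedded checks may be structured differently. The 90% null threshold is the one described above.

```python
def post_fetch_check(rows, expected_types, critical_fields, max_null_share=0.90):
    """Return CP1-style failure messages; an empty list means PASSED.

    rows: fetched records as a list of dicts (hypothetical format).
    expected_types: column name -> Python type that non-null values should have.
    critical_fields: columns whose null share must not exceed max_null_share.
    """
    # Did any data come back at all?
    if not rows:
        return ["no rows returned"]
    failures = []
    present = set().union(*(row.keys() for row in rows))
    # Are the expected columns present, with the expected types?
    for column, expected in expected_types.items():
        if column not in present:
            failures.append(f"missing column: {column}")
            continue
        wrong = [r[column] for r in rows
                 if r.get(column) is not None and not isinstance(r[column], expected)]
        if wrong:
            failures.append(f"{column}: {len(wrong)} value(s) not of type {expected.__name__}")
    # Is any critical field mostly null? (A field absent entirely counts as 100% null.)
    for field in critical_fields:
        null_share = sum(r.get(field) is None for r in rows) / len(rows)
        if null_share > max_null_share:
            failures.append(f"{field} is {null_share:.0%} null")
    return failures


# Hypothetical fetched records: enrollment is missing for one of two schools.
rows = [{"ncessch": "480000100001", "enrollment": 412},
        {"ncessch": "480000100002", "enrollment": None}]
print(post_fetch_check(rows, {"ncessch": str, "enrollment": int}, ["enrollment"]))  # []
```

Here 50% of `enrollment` is null, which is well below the 90% threshold, so the check passes even though the gap would still be worth documenting.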
CP2: Post-Cleaning Validation -- "Is the cleaned data usable?"
Runs after DAAF has processed the raw data -- filtering out coded values (like -1 for "missing," -2 for "not applicable"), handling suppression, and applying data quality rules.
- PASSED -- Cleaning worked as expected; remaining data is sufficient for analysis.
- WARNING -- Suppression rates are elevated (typically 30-50%). Enough data remains for analysis, but results may be less precise, particularly for subgroup breakdowns. DAAF will document this but proceed.
- FAILED -- Suppression exceeds 50%, meaning more than half the data is missing or suppressed. DAAF will stop -- analysis on data with >50% suppression is generally unreliable. You will need to narrow scope, change data source, or acknowledge this as a fundamental limitation.
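A minimal sketch of the CP2 logic described above, assuming values arrive as plain Python lists. The -1 and -2 codes come from the description above; treating -3 as the suppression code is an assumption for illustration, so check the documentation of the dataset you are using for its actual codes. The thresholds mirror the ones above: below 30% passes, 30-50% warns, above 50% fails.

```python
MISSING, NOT_APPLICABLE = -1, -2
SUPPRESSED = -3  # assumed suppression code; verify against the dataset's codebook


def clean_values(values, coded=(MISSING, NOT_APPLICABLE, SUPPRESSED)):
    """Replace coded sentinel values with None so they are never treated as real numbers."""
    return [None if v in coded else v for v in values]


def cp2_status(raw_values, unusable=(MISSING, SUPPRESSED), warn_at=0.30, fail_above=0.50):
    """Classify the share of missing-or-suppressed cells as PASSED, WARNING, or FAILED."""
    share = sum(v in unusable for v in raw_values) / len(raw_values)
    if share > fail_above:
        return "FAILED", share
    if share >= warn_at:
        return "WARNING", share
    return "PASSED", share


# Hypothetical enrollment counts for ten schools, before cleaning.
raw = [120, -3, 85, -1, 240, -3, 60, -2, 310, 95]
print(clean_values(raw))  # [120, None, 85, None, 240, None, 60, None, 310, 95]
print(cp2_status(raw))    # ('WARNING', 0.3)
```

In this sketch, not-applicable cells (-2) are removed during cleaning but not counted as unusable. Whether they should count toward the threshold is a judgment call that depends on the measure.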
CP3: Post-Transformation Validation -- "Did the data transformation do what we intended?"
Runs after every join, aggregation, or derived-variable calculation. Checks whether row counts changed as expected, whether there are new unexpected null values, and whether derived variables have reasonable distributions.
- PASSED -- The transformation produced expected results. Row counts, null patterns, and distributions look reasonable.
- FAILED -- Something went wrong -- row counts dropped by more than 90%, a join produced unexpected nulls, or derived values are clearly incorrect. DAAF will stop and investigate.
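A minimal sketch of the CP3 row-count and null checks, assuming you already have row counts and per-column null counts from before and after a transformation. The function and argument names are hypothetical; the 90% drop threshold is the one described above.

```python
def cp3_check(rows_before, rows_after, nulls_before, nulls_after,
              expected_after=None, max_drop=0.90):
    """Return CP3-style problems for one transformation; an empty list means PASSED.

    nulls_before / nulls_after map column name -> count of null values.
    expected_after is the row count the plan predicted, if it gave one.
    """
    problems = []
    if rows_before > 0:
        drop = (rows_before - rows_after) / rows_before
        if drop > max_drop:
            problems.append(f"row count dropped {drop:.0%} ({rows_before} -> {rows_after})")
    if expected_after is not None and rows_after != expected_after:
        problems.append(f"expected {expected_after} rows, got {rows_after}")
    # A column gaining nulls (e.g. unmatched rows after a left join) deserves a look.
    for column, after in nulls_after.items():
        added = after - nulls_before.get(column, 0)
        if added > 0:
            problems.append(f"{column}: {added} new null(s)")
    return problems


print(cp3_check(10000, 500, {}, {}))                        # ['row count dropped 95% (10000 -> 500)']
print(cp3_check(1000, 1400, {}, {}, expected_after=1000))   # ['expected 1000 rows, got 1400']
```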
CP4: Pre-Output Validation -- "Does the final output meet our commitments?"
Runs during the synthesis phase, checking the complete output against what the Plan promised. Validates that all required columns are present, all promised output files exist (figures, analysis results, report), all Research Outcomes are rigorously addressed, and the report has all required sections.
- PASSED -- Everything the Plan committed to investigate has been rigorously addressed.
- FAILED -- Something is missing -- a figure was not generated, a report section is incomplete, or a Research Outcome was not addressed. DAAF will identify the gap and attempt to resolve it.
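A minimal sketch of a CP4-style completeness check. It assumes the report is a Markdown file whose section titles appear as `#` headings. The file names are hypothetical; the section names are standard report sections (Executive Summary, Key Findings, Data & Methodology, Limitations).

```python
def cp4_gaps(promised_files, existing_files, required_sections, report_text):
    """Return (missing_files, missing_sections); two empty lists mean PASSED."""
    missing_files = sorted(set(promised_files) - set(existing_files))
    # Collect Markdown heading titles such as "## Key Findings" -> "Key Findings".
    headings = {line.lstrip("#").strip()
                for line in report_text.splitlines() if line.startswith("#")}
    missing_sections = [s for s in required_sections if s not in headings]
    return missing_files, missing_sections


promised = ["report.md", "figures/fig01_enrollment_trend.png"]
existing = ["report.md"]
sections = ["Executive Summary", "Key Findings", "Data & Methodology", "Limitations"]
report = "# Executive Summary\n...\n## Key Findings\n...\n## Limitations\n..."
print(cp4_gaps(promised, existing, sections, report))
# (['figures/fig01_enrollment_trend.png'], ['Data & Methodology'])
```

In practice, `existing_files` could be built with `pathlib`, for example `[p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]`, so paths compare the same way on every operating system.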
When DAAF Stops and Asks for Guidance
STOP conditions are moments when DAAF pauses execution and escalates to you. This is a good thing -- it means the system is working as intended. DAAF does not power through problems silently. When it stops, it will present the issue in a structured format: what happened, what it tried, your options (with pros and cons), and its recommendation.
| STOP Condition | What Happened | Your Options |
|---|---|---|
| Empty data returned | The data source had no data for your query | Adjust scope, try different source, or acknowledge limitation |
| Suppression >50% | More than half the data is suppressed or missing | Narrow geography, reduce subgroups, use different measure |
| Row loss >90% | A transformation (join, filter) dropped most rows | Check join keys, verify filter logic, adjust criteria |
| Cross-state assessment comparison | You asked to compare test scores across states | Reframe question (within-state trends are valid) |
| QA BLOCKER after 2 revisions | Code review found a problem that could not be resolved in 2 attempts | Guide DAAF's approach, simplify the task, or accept limitation |
| Data unavailable | The dataset does not exist for your scope | Choose a different data source or adjust scope |
You do not need to have a solution -- you just need to tell DAAF which direction to go. "Try option 2" or "Let's narrow to just California and see if that helps" are perfectly fine responses.
QA Findings: BLOCKER vs. WARNING vs. INFO
In addition to the CP checkpoints, every script DAAF writes gets independently reviewed by a separate code-reviewer agent -- an adversarial reviewer whose job is to find problems the original code might have missed. The reviewer classifies findings by severity:
- INFO -- An observation that does not indicate a problem but is worth noting. Example: "The dataset has 47 states represented instead of 50, which is expected given the query filters." You will generally not see these unless you dig into the QA scripts.
- WARNING -- A potential issue that does not block progress but should be documented. Example: "Suppression rate for rural schools is 38%, which may limit subgroup analysis precision." Warnings are accumulated and presented to you at the Phase Status Update after analysis. They do not stop execution, but they flag things you should consider when interpreting results.
- BLOCKER -- A genuine problem that must be fixed before proceeding. Example: "The join produced 40% more rows than expected, indicating a many-to-many join where a many-to-one was specified." Blockers trigger a revision cycle -- DAAF will attempt to fix the script (up to 2 attempts) and re-submit it for review. If the blocker persists after 2 fix attempts, DAAF escalates to you.
The key thing to understand is that DAAF catches and resolves most issues automatically. The vast majority of QA findings are INFO or WARNING. You only hear about BLOCKERs that could not be resolved, and those are rare.
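The BLOCKER revision cycle is a bounded retry loop. Here is a minimal sketch of that control flow, not DAAF's actual orchestration code. `review` and `fix` are hypothetical stand-ins for the code-reviewer agent and the fixing step, and the two-attempt limit matches the one described above.

```python
def review_with_revisions(script, review, fix, max_fixes=2):
    """Review a script, attempt up to max_fixes revisions on BLOCKERs, then escalate.

    review(script) -> list of (severity, message) tuples ("INFO"/"WARNING"/"BLOCKER").
    fix(script, blockers) -> a revised script.
    Returns (final_script, status, findings) where status is "accepted" or "escalate".
    """
    for attempt in range(max_fixes + 1):
        findings = review(script)
        blockers = [msg for severity, msg in findings if severity == "BLOCKER"]
        if not blockers:
            # WARNINGs and INFOs do not stop progress; they travel with the result.
            return script, "accepted", findings
        if attempt == max_fixes:
            # Still blocked after the allowed fixes: hand the decision to the user.
            return script, "escalate", findings
        script = fix(script, blockers)


# Hypothetical stubs: the first revision resolves the blocker but leaves a warning.
def stub_review(script):
    if "+fix" in script:
        return [("WARNING", "rural suppression is 38%")]
    return [("BLOCKER", "join is many-to-many")]


def stub_fix(script, blockers):
    return script + "+fix"


print(review_with_revisions("01_fetch-ccd", stub_review, stub_fix))
# ('01_fetch-ccd+fix', 'accepted', [('WARNING', 'rural suppression is 38%')])
```

A script is therefore reviewed at most three times (the original plus two revisions) before the problem reaches you.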
Reviewing Notebooks, Reports, and Script Logs
When a Full Pipeline analysis completes, you receive several artifacts. Here is how to actually read and evaluate each one -- and just as importantly, what to look at first.
Where to Start
- Report -- Start here for the big picture. Does the narrative make sense? Do the findings answer your research question?
- Figures -- Look at the visualizations referenced in the report. Do they show what the report claims they show?
- Plan.md and STATE.md -- Skim Plan.md for methodology and key decisions. Check STATE.md for the Final Review Log and QA Findings Summary to see if DAAF flagged any deviations or concerns.
- Notebook -- Dive into specific stages if you want to verify how a particular result was derived.
- Script logs -- Go here for the deepest level of detail on any specific step.
You do not need to read everything in detail every time. The report is the synthesis; the notebook is the evidence; the scripts are the primary source. Go as deep as you need to based on how much you trust the results and how high-stakes the analysis is.
Tip: Before diving into individual artifacts, consider browsing the session visually using the DAAF Log Explorer. Run bash view_logs.sh (or .\view_logs.ps1 on Windows) from your daaf-docker folder to see an interactive timeline of every orchestrator action, subagent dispatch, and tool call. This gives you a high-level map of the entire session, making it easier to identify which stages or scripts deserve closer inspection.
Reading the Report
The report follows a standard structure (Executive Summary, Key Findings, Data & Methodology, Limitations, etc.). Focus on:
- Key Findings -- Are findings genuinely supported by the data? Look for specificity -- "Enrollment declined by 12% between 2019 and 2022" is verifiable. "Enrollment showed interesting trends" is not.
- Limitations section -- Often the most important section. DAAF is instructed to be candid about limitations, suppression rates, data gaps, and caveats. If the limitations section is suspiciously short or generic, that is a red flag -- not because DAAF is hiding something, but because the system may not have adequately identified the limitations.
- Figure references -- The report should reference specific figures by filename. Verify that the referenced figures exist and actually show what the report says they show.
- References -- The report includes a References section with up to four subsections: data sources, methodological references, software & tools, and reporting standards. DAAF tracks these automatically as each script executes, but citations can be wrong, incomplete, or missing -- verify that the right methods and tools are credited and that the citations are accurate.
Reading the Notebook
The marimo notebook is a walkthrough tool -- it assembles the actual scripts that were executed (verbatim, not rewritten) alongside their execution logs. What to look for:
- Execution logs that show warnings or unexpected values. Expand the accordions and scan for anything that looks off.
- Row counts at each stage. You should be able to trace the data from raw (usually large) to processed (usually smaller after filtering) to analysis (potentially larger or smaller depending on joins). Dramatic unexpected changes in row count deserve investigation.
- Validation results. Each script includes embedded validation. Look for CP status: PASSED, WARNING, or FAILED.
The notebook compiles scripts -- it does not create new analysis code. You will not see any transformations without embedded execution logs.
Reading Script Execution Logs
Every script file in the scripts/ directory has its execution log appended to the end of the file as comments. The execution log includes start/end timestamps, exit code (0 = success, non-zero = failure), stdout (everything the script printed during execution), and stderr (any warnings or errors).
If a script failed, you will also find versioned revisions: 01_fetch-ccd.py (original with its failed log), 01_fetch-ccd_a.py (first revision), 01_fetch-ccd_b.py (second revision if the first fix did not work). The notebook only includes the final successful version, but all versions are preserved in the scripts/ directory for audit trail purposes.
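If you want to confirm which revision of each script is the final one, the suffix convention above can be resolved mechanically. Below is a minimal sketch. It assumes revisions always use a single lowercase letter suffix (`_a`, `_b`, ...) and that no original script name itself ends in an underscore plus a single letter. The helper name is hypothetical.

```python
import re

# "01_fetch-ccd.py" -> base "01_fetch-ccd", no revision; "01_fetch-ccd_b.py" -> revision "b".
REVISION = re.compile(r"^(?P<base>.+?)(?:_(?P<rev>[a-z]))?\.py$")


def latest_revisions(filenames):
    """Map each script's base name to the filename of its most recent revision."""
    latest = {}
    for name in filenames:
        m = REVISION.match(name)
        if m is None:
            continue  # not a Python script
        base, rev = m.group("base"), m.group("rev") or ""
        # "" (original) sorts before "a", which sorts before "b".
        if base not in latest or rev > latest[base][0]:
            latest[base] = (rev, name)
    return {base: name for base, (rev, name) in latest.items()}


files = ["01_fetch-ccd.py", "01_fetch-ccd_a.py", "01_fetch-ccd_b.py",
         "02_clean-ccd.py", "README.md"]
print(latest_revisions(files))
# {'01_fetch-ccd': '01_fetch-ccd_b.py', '02_clean-ccd': '02_clean-ccd.py'}
```

Calling `latest_revisions(p.name for p in Path("scripts").glob("*.py"))` (with `from pathlib import Path`) would apply it to the top level of a project's scripts/ directory.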
Reading QA Review Scripts
The scripts/cr/ directory contains the code-reviewer's inspection scripts for each stage, named by convention (e.g., stage5_01_cr1.py, stage7_02_cr2.py). These scripts contain the adversarial checks that the code-reviewer ran, along with their results. You generally do not need to read these unless you are investigating a specific concern -- but they are there for full transparency.
Human Oversight Responsibilities
DAAF is not an oracle. It is not an autonomous research system that you can walk away from and trust to get things right. It is not "fire-and-forget." Yes, it is a very powerful -- and sometimes surprisingly thorough -- assistant that operates under strict guardrails. But it is still an LLM-based system, which means it is fundamentally susceptible to the same limitations as all LLM systems: hallucination, sycophancy, over-confidence, and subtle logical errors that look plausible on the surface.
What makes DAAF different from using Claude (or any LLM) ad-hoc is the sheer volume of structured verification layered into the process. But those layers of verification do not eliminate the need for human judgment. They reduce the surface area of what you need to verify, and they make verification easier by giving you organized, traceable artifacts. That is the exoskeleton metaphor: DAAF amplifies your expertise, but your expertise is still the thing doing the real work.
What DAAF Validates Automatically
These safeguards run without your involvement throughout the pipeline:
| Safeguard | What It Does | Where It Happens |
|---|---|---|
| Primary Checkpoints (CP1-CP4) | Validates data at fetch, clean, transform, and output stages -- catches empty data, type errors, data loss, missing outputs | Embedded in every script |
| Secondary QA (QA1-QA4b) | Independent adversarial review of every script by a separate code-reviewer agent | After every script execution |
| Iteration Protocol | Forces every transformation into small steps: DESCRIBE, CODE, EXECUTE, VALIDATE, DECIDE | During all data operations |
| Batch Size Limits | Maximum 1-2 transformations per execution cycle to prevent error accumulation | During data stages |
| STOP Conditions | Automatic pause when data quality thresholds are breached | Throughout execution |
| Version Control | Every file revision is saved separately -- nothing is ever overwritten | All stages |
| Plan-Checker Validation | Automated 6-dimension validation of the Plan before execution begins | Before execution |
| Citation Tracking | Tracks and attributes citations for data sources, methods, software, and reporting standards as each script executes -- best-effort, not guaranteed | Throughout execution and report generation |
That is a substantial amount of automated quality control. It means that the majority of operational errors -- wrong data types, broken joins, corrupted files, missing columns, data loss during transformation -- will be caught before you ever see the results.
What Requires Your Judgment
Automated validation cannot assess everything. Here is what still requires a human researcher with domain expertise:
- Formulating the right question -- Is this a good question? Is it rooted in reasonable assumptions? DAAF is thoughtful and will likely push back on strange assumptions, but it will also back down if you ask it to. You need to have the final say on what is worth investigating.
- Methodological appropriateness -- Is the statistical method right for this research question and data structure? DAAF will choose a method and justify its choice, but the justification might be plausible-sounding without being correct. If you have strong priors about methodology, bring them to the Plan review.
- Substantive interpretation -- DAAF will report that "enrollment declined by 12%," but it cannot tell you whether that decline is policy-relevant, expected, or alarming. It cannot contextualize findings within the broader policy landscape or institutional realities you may know about. That is your job.
- Causal claims -- DAAF is designed to be careful about causal language, but LLMs can drift into causal framing even with guardrails. Scrutinize any finding that implies causation -- especially in observational data, which is all that DAAF currently works with.
- Data source appropriateness -- DAAF knows a lot about the technical properties of each dataset, but it may not know that a particular data source has known quality issues in a specific year for a specific state. Your contextual knowledge matters.
- Sufficiency for your use case -- DAAF can tell you the suppression rate is 28% and that this is within its acceptable bounds. Whether 28% suppression is acceptable for your specific use case -- exploratory analysis vs. a finding that will inform a policy decision -- is a judgment call only you can make.
- Ethical considerations -- DAAF does not assess the ethical dimensions of your analysis. If you are working with data that involves vulnerable populations, politically sensitive topics, or potential for misuse of findings, those considerations are entirely your responsibility.
Monitoring DAAF's Internal Reference Loading
There is one oversight responsibility that is easy to overlook because it is about DAAF's own internal mechanics: making sure DAAF actually loaded the reference files and skills it was supposed to load.
LLMs are non-deterministic, and DAAF's reference loading is orchestrated by an LLM. This means that occasionally -- not often, but not never -- an agent will proceed without loading a skill it was instructed to load, or the orchestrator will skip a reference file it was supposed to read. When this happens, the agent falls back on its general training, which produces output that looks correct but is built on plausible inference rather than curated knowledge.
Verbose output is your primary monitoring tool. When you set Verbose output to True in /config (which you should -- it is a required configuration setting), you can see the internal thought process informing the file reads that DAAF's agents make. Here is what to watch for:
Signs that something may not have loaded:
- An agent explicitly mentions wanting to load something but then never does
- An agent makes confident claims about variable names, API endpoints, or coded values that you cannot find in the actual data or documentation
- An agent writes code that uses variable names or data structures that don't match what the data source actually provides
- You see an agent proceed directly to writing code without any visible skill or reference loading in the verbose output
- Error messages about unexpected columns, missing variables, or failed API calls -- these often indicate the agent was working from hallucinated rather than loaded specifications
What to do:
- Ask DAAF to verify ("Can you check whether the agent actually loaded the CCD skill before writing that script?")
- Request a re-run of that specific step with explicit instructions to load the relevant skill
- Check the script execution logs -- failed scripts with KeyError or unexpected empty results often point back to a loading failure
- Check the session logs to verify reference loading sequences -- DAAF's built-in session log viewer was designed in part to help users monitor exactly whether and when DAAF loads the proper reference files

DAAF's dual-layer validation (CP checkpoints + QA code review) will catch many loading failures downstream, but catching them early saves time and prevents cascading revisions.
When and How to Request Revisions
One of DAAF's most practically useful features is the ability to revise and extend completed analyses without starting from scratch. The version control system means every revision creates new files alongside the originals -- nothing gets lost.
Types of Revisions
- Quick adjustments (usually straightforward) -- Changing a filter value ("exclude schools with enrollment < 50 instead of < 100"), updating year ranges, changing visualization details ("use a bar chart instead of a line chart"), adjusting the report framing.
- Moderate changes (may require re-running some stages) -- Adding a new breakdown dimension ("also break down by urbanicity"), adding a control variable to a regression, switching from one poverty measure to another, adding a robustness check.
- Major changes (close to starting over -- consider a new project) -- Changing the unit of analysis (schools to districts), switching the primary data source entirely, fundamentally changing the research question, changing the statistical methodology (from descriptive to causal inference).
Framing Revision Requests
When requesting a revision, include:
- Which project -- by title, date, or both. "The Texas poverty analysis from 2026-02-10" or "the CRDC discipline study."
- What specifically to change -- the more precise, the better. "Change the enrollment threshold from 50 to 100" is better than "adjust the enrollment filter."
- Why (if it is not obvious) -- "I realized virtual schools are skewing the enrollment trends" helps DAAF understand the intent, not just the mechanics.
- Downstream expectations -- if you know the change should affect later stages, say so. "Re-run the regression after updating the filter" tells DAAF that you want the downstream analysis updated, not just the data cleaning step.
New Version vs. New Project
New version (same project folder, new suffixed files): the core research question stays the same, you are refining or extending the existing analysis, and the revision builds on the existing Plan's logic.
New project (new project folder, fresh start): the research question is fundamentally different, you are switching to entirely different data sources, or the unit of analysis has changed.
A useful rule of thumb: if the existing Plan would need more than 50% of its transformation sequence rewritten, you are probably better off with a new project. When you are on the fence, DAAF will offer its assessment.
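The 50% rule of thumb is simple arithmetic over the transformation sequence. Below is a minimal sketch. It assumes each step can be summarized as a short string (the step descriptions are hypothetical) and treats any step that does not survive unchanged as rewritten.

```python
def share_rewritten(old_steps, new_steps):
    """Fraction of the existing Plan's steps that do not survive unchanged in the revision."""
    kept = set(old_steps) & set(new_steps)
    return 1 - len(kept) / len(old_steps)


old = ["fetch CCD 2018-2022", "filter coded values", "join school poverty",
       "aggregate schools by year", "regress enrollment on poverty"]
new = ["fetch CCD 2018-2022", "filter coded values", "aggregate to districts",
       "join district poverty", "regress district enrollment on poverty"]
share = share_rewritten(old, new)
print(f"{share:.0%} rewritten -> {'new project' if share > 0.5 else 'new version'}")
# 60% rewritten -> new project
```

Because the sketch compares steps as a set, reordering steps does not count as rewriting them. A real judgment would also weigh how consequential each changed step is: here the unit of analysis moved from schools to districts, which on its own points toward a new project.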
Appropriate vs. Inappropriate Use Cases
DAAF is still in active development, and there is only so much that can be done to check guardrails and test robustness at this stage. It is important to be transparent about what that means in practice.
Appropriate Uses
- Exploratory analysis with expert oversight -- You have a research question, you want to see what the data shows, you have a good sense of what to expect, and you are prepared to critically evaluate the results. This is the sweet spot.
- Learning and skill-building -- DAAF is excellent for learning how datasets work, what variables are available, and how data pipelines are constructed. Even if you never use DAAF's outputs directly, working with the system teaches you things about the data.
- Rapid prototyping -- You need to quickly test whether an analysis direction is viable before investing significant manual effort.
- Scaling established methodologies -- You have already done this kind of analysis manually and know what correct output looks like. DAAF lets you run the same analysis across more states, more years, or more breakdowns than you could do alone.
- Demonstrating AI-assisted research patterns -- Useful for showing colleagues, students, or stakeholders what rigorous AI-assisted research can look like -- and what guardrails it requires.
- Replication-style exercises -- Running DAAF against questions where published answers already exist is an excellent way to evaluate both DAAF's capabilities and its limitations.
Uses Requiring Extensive Additional Validation
- Policy-informing analysis -- DAAF's output should be treated as a starting point that requires thorough independent verification. Every finding should be checked against known benchmarks, and the methodology should be reviewed by someone with deep domain expertise.
- Publication-adjacent work -- DAAF can accelerate data preparation and exploratory analysis, but the analytical decisions, robustness checks, and interpretation must be held to the standard of your target venue.
- Cross-dataset analyses involving complex joins -- DAAF handles joins reasonably well for well-documented datasets, but joins between datasets with different geographic units, different year definitions, or ambiguous key relationships require careful human scrutiny.
Never Appropriate
- High-stakes decisions based solely on AI outputs -- Never use DAAF's results as the sole basis for decisions that significantly affect people -- resource allocation, program elimination, individual assessments, legal proceedings. Always have qualified humans independently verify any findings that will drive consequential decisions.
- Analysis presented without AI disclosure -- If you use DAAF to produce analysis, you should disclose the role of AI assistance in your work. Transparency is non-negotiable. DAAF is designed to make this easy by documenting exactly what it did, but the responsibility to disclose is yours.
- Generating results to confirm a predetermined conclusion -- DAAF is designed to follow the data. Using it to manufacture support for a conclusion you've already reached undermines the entire framework.
Using Git Version Control
DAAF produces a lot of files and does a lot of things at once. Getting comfortable with Git for version control is strongly recommended -- this type of work with LLMs benefits immensely from having a full audit log of file edits and changes at all times, with the ability to roll back changes and identify issues quickly.
Keeping your own copy of the DAAF repository to work in and back up research files to is good practice. Note that on GitHub a "fork" of a public repository is itself public, so a private backup usually means creating a separate private repository. (By default, DAAF will NOT back up Parquet data files, to avoid accidentally sharing data to the cloud.) If you want a GitHub backup for your work, ask Claude how to make your own repository and save to it accordingly.
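Before you push anything, you may want to confirm that your data files really are excluded. Git can report which ignore rule, if any, matches a given path. The sketch below runs in a throwaway repository. The *.parquet pattern and the file names are assumptions for illustration and do not come from DAAF's actual .gitignore. To see what your own repository does, run the git check-ignore line against your real files.

```shell
#!/usr/bin/env bash
# Sketch: check that Parquet data files are ignored by Git before pushing.
# Runs in a throwaway repo; the ignore pattern and file names are
# hypothetical, not DAAF's actual configuration.
set -eu

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Hypothetical ignore rule; inspect your own .gitignore for the real one.
printf '*.parquet\n' > .gitignore
mkdir -p data
touch data/enrollment.parquet data/notes.md

# With -v, check-ignore prints "<source>:<line>:<pattern>\t<path>" for an
# ignored path (and exits non-zero if the path is NOT ignored).
ignored_rule="$(git check-ignore -v data/enrollment.parquet)"
echo "$ignored_rule"

# Untracked files that `git add .` would pick up; the Parquet file should be absent.
would_add="$(git status --porcelain --untracked-files=all)"
echo "$would_add"
```

If git check-ignore prints nothing for one of your data files, that file is not ignored and will be uploaded on your next push.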
If Git is new to you, try asking Claude to explain the basics. Good starting questions:
- What does it mean to make a fork of a GitHub repo?
- What does Git actually do, and why is it useful?
- What's a commit? What does it do?
- How can I track changes in DAAF using Git?
- What tools can make this whole process easier?
Useful Git Commands
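git diff HEAD~1 is great for reviewing what DAAF produced overnight or after a long run. It shows every file that was added, modified, or deleted relative to the commit before the latest one, with specific changes highlighted. (HEAD~1 means "one commit before the latest". If a run created several commits, compare against an earlier one, such as HEAD~3 or a specific commit hash.) git stash / git stash pop is useful for experimenting with something (like testing a different analysis approach) without committing to it: git stash sets your uncommitted changes aside, and git stash pop brings them back.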
Because DAAF automatically saves snapshots at key pipeline milestones, your work is preserved even across sessions.
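Here is a minimal sketch of that review-and-experiment loop. It also runs in a throwaway repository, so it is safe to try. Two commits stand in for "before" and "after" a long run. The file names and commit messages are made up. The sketch also assumes that DAAF's milestone snapshots are ordinary commits that appear in git log.

```shell
#!/usr/bin/env bash
# Sketch of reviewing a run and stashing an experiment, in a throwaway repo.
# File names and commit messages are hypothetical.
set -eu

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Two commits stand in for "before" and "after" a long run.
echo "draft plan" > plan.md
git add plan.md && git commit -q -m "Plan drafted"
echo "revised plan" > plan.md
echo "result" > results.md
git add plan.md results.md && git commit -q -m "Analysis run"

# Which files differ from the commit before the latest one?
changed_files="$(git diff --name-only HEAD~1)"
# Per-file summary; drop --stat to see line-level changes.
git --no-pager diff --stat HEAD~1

# Try an uncommitted experiment, shelve it, then bring it back.
echo "alternative approach" >> plan.md
git stash push -q         # working tree returns to the last commit
after_stash="$(cat plan.md)"
git stash pop -q          # the experiment is restored
after_pop="$(cat plan.md)"

# One line per commit, newest first.
history="$(git log --oneline)"
echo "$history"
```

If you decide to abandon the experiment instead, git stash drop discards the shelved changes rather than restoring them.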
Using the Browser-Based Code Editor
Having a good file editor is essential for working with DAAF. DAAF ships with a built-in browser-based code editor (code-server -- VS Code in the browser) that handles file browsing, Markdown preview, Git tracking, and cross-file search with zero installation on your host machine. To launch it:
Then open http://localhost:2720 in your browser. The password is displayed in the terminal output (default: daaf). The editor comes pre-loaded with extensions for Python syntax highlighting, Markdown preview, Git history, and CSV viewing.
- Markdown preview -- Right-click any .md file and select "Open Preview" (or press Shift+Ctrl+V) to see rendered reports and plans with proper formatting.
- File management -- Drag and drop files from your computer into the file explorer sidebar to import them into the Docker volume (e.g., a dataset you want to profile).
- Git integration -- The Source Control panel shows uncommitted changes, lets you view diffs, and browse commit history.
- Search across files -- Ctrl+Shift+F (Cmd+Shift+F on Mac) searches across all files -- great for finding specific variables, scripts, or content.
Alternative: Desktop VS Code with Dev Containers
If you already have VS Code installed and prefer a native desktop experience, install the Dev Containers extension and use "Attach to Running Container" to open the DAAF container's filesystem directly.
There are also similar editors with AI coding agents built in (e.g., Cursor). Your mileage may vary -- find an interface that works for you and your workflow.
Safety with Claude Code
Claude Code is extremely powerful and capable -- which is useful when it is doing what you want, but also means expanded risk when it operates erratically or is manipulated by bad actors. DAAF uses Docker in part to protect users directly, and packages guardrails into its hooks and permission files as well.
The main recommendation is to not let Claude Code run fully unattended. Check back on it periodically, even if only to spot-check what it is outputting and reporting. Letting it go completely unsupervised for long periods of time is asking for trouble, even if problems are rare in practice.
Three primary attack surfaces to be aware of:
- Malicious content hidden in data -- Unvetted data files or documentation could contain instructions that cause Claude to act erratically. DAAF mitigates this through structured data handling, but always be cautious with data from untrusted sources.
- Prompt injection via online research -- If you ask DAAF to conduct deep research online, the websites it searches could contain malicious prompt-injection instructions designed to manipulate the AI's behavior.
- Compromised project files -- Hidden, malicious code or instructions sneaked into the project documentation. All edits and changes to the DAAF project are thoroughly vetted and reviewed for the benefit of all users.
The first two are the user's responsibility: be thorough and thoughtful about what you have Claude read, do, and search on your behalf. As a backstop, DAAF's permission rules and safety hooks are designed to block manipulation at the system level, regardless of what the prompt says, and the structured workflow and validation checkpoints help catch outputs that don't match the data.
Tips for Data Onboarding
Before You Start
- Have your data file ready in a common tabular format (CSV, TSV, Parquet, or Excel). Parquet (a compressed columnar data format) is preferred for speed and type preservation, but any of these work.
- Gather any documentation -- codebooks, data dictionaries, README files, methodology papers. Providing documentation lets DAAF cross-check what the documentation says against what the data actually shows, catching discrepancies that could trip you up later.
- Know your data's provenance -- where it came from, when you downloaded it, and what it covers. DAAF records this in the skill for future reference.
- If your data comes from an API, have the API key set up in your environment before starting. DAAF will write a reproducible fetch script, but it needs the key to test the download.
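As a quick sanity check before onboarding, you can set the key in your shell and confirm it is visible without printing the secret itself. This is a minimal sketch. EXAMPLE_API_KEY is a hypothetical variable name, so use whatever name your data provider or the generated fetch script expects. DAAF runs in a Docker container, so the variable has to be set wherever the fetch script actually runs. That may be inside the container rather than on your host machine.

```shell
#!/usr/bin/env bash
# Sketch: export an API key for scripts started from this shell, then check
# it is present without echoing its value.
# EXAMPLE_API_KEY is a hypothetical name; use the one your provider expects.
set -eu

export EXAMPLE_API_KEY="replace-with-your-key"

# ${VAR:-} expands to an empty string if VAR is unset, so the check is safe under set -u.
if [ -n "${EXAMPLE_API_KEY:-}" ]; then
  key_status="set (${#EXAMPLE_API_KEY} characters)"
else
  key_status="missing"
fi
echo "EXAMPLE_API_KEY is $key_status"
```

Reporting only the length lets you confirm the key loaded without exposing it in terminal logs. Avoid pasting real keys into any file that Git tracks.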
During the Process
- The interpretation review is the most important checkpoint. When DAAF presents its preliminary interpretations, take the time to carefully confirm, reject, or modify each one. These interpretations become the foundation of the skill that all future analyses will rely on.
- Don't worry about getting everything perfect. The skill is a living artifact -- you can refine it later using Framework Development mode as you discover more about the data through actual use.
- Flag priority columns if you know which ones matter most for your research. DAAF will give them extra attention during profiling.
For a detailed step-by-step guide to adding your own data, see Extending DAAF: Data Onboarding.