Your expertise, empowered by Claude

Scale Responsible, Rigorous, and Reproducible Research with AI

AI coding assistants can now write analytic code, manipulate complex datasets, and produce entire reports. They also hallucinate, cut corners, and constantly project confidence.

The Data Analyst Augmentation Framework is a free, open-source toolkit that turns Claude Code into a rigorous quantitative research engine -- in Python or R, on the AI provider of your choice -- with you at the helm: every step auditable, every output verifiable, every decision yours to make. Built by researchers, for researchers.


User: Hey DAAF, I'm thinking of running a regression discontinuity analysis with my own dataset. Before I do anything else, can you help me think through conceptually what assumptions/conditions are required for a good RD?

You interact with DAAF and Claude Code just like any chat window: describe what you want to do in plain, natural language. There is no special syntax and there are no commands to juggle or remember.

DAAF: Thinking…

DAAF: Welcome to DAAF -- the Data Analyst Augmentation Framework. I'm your research orchestrator, here to help you work through rigorous, reproducible data analysis with human oversight at every key decision point.

Great question -- thinking through RD assumptions before touching data is exactly the right instinct. This is a conceptual/advisory conversation, so I'd classify this as Ad Hoc Collaboration mode. I'll work with you as a thought partner to think through the assumptions and conditions for a valid regression discontinuity design. No formal deliverables -- just a focused discussion. You can steer the conversation wherever it's most useful.

Sound good, or would you rather approach this differently?

DAAF intelligently determines which bespoke workflow best fits your request. Like a good collaborator, it communicates its plan and asks for your confirmation before proceeding. Ad Hoc Collaboration mode is a lightweight workflow for informal discussion and one-off tasks -- one of nine workflow styles DAAF supports.

User: That makes sense and sounds good, let's proceed

DAAF: Let me load the mode reference, then we'll dive in.

DAAF: Read(skills/daaf-orchestrator/.../ad-hoc-collaboration-mode.md)❲ Read 420 lines

DAAF: Now let me load the data-scientist skill as the mode requires, then I'll walk through the RD assumptions with you.

DAAF: Skill(data-scientist)❲ Successfully loaded skill

DAAF: Now let me pull in the causal inference reference for the detailed RD methodology.

DAAF: Read(skills/data-scientist/.../causal-inference.md)❲ Read 897 lines

DAAF: Thinking…

DAAF dynamically selects relevant references from a large library of carefully vetted guides ("Agent Skills") and embeds them into the conversation ("Context Engineering"). From that point on, its reasoning is grounded in known-good material, which helps fight hallucinations.

DAAF: Great question to start with -- getting the conceptual foundations right before touching data is exactly how credible causal work begins. Here's a thorough walkthrough of what makes an RD design valid.

------
Regression Discontinuity: The Core Assumptions

RD exploits a known cutoff in a running variable (also called a “score” or “forcing variable”) that determines treatment assignment. The fundamental idea is that units just above and just below the cutoff are essentially identical except for their treatment status -- creating quasi-random variation near the threshold.

1. A Known, Rule-Based Cutoff

There must be a clear, predetermined threshold where treatment status changes. Examples: a test score cutoff for scholarship eligibility, an income threshold for program access, an age cutoff for policy eligibility.

The cutoff must be rule-based, not discretionary. If a committee decides who gets treated using the score as one input among many, you don't have an RD -- you have selection on observables (at best).

2. Continuity of Potential Outcomes at the Cutoff

This is the core identifying assumption: absent the treatment, outcomes would have evolved smoothly through the cutoff. In potential-outcomes notation, E[Y(0)|X=x] and E[Y(1)|X=x] must be continuous in x at the cutoff c.

3. No Manipulation of the Running Variable (No Sorting)

Units cannot precisely control their score to land on a preferred side of the cutoff. The McCrary (2008) density test checks whether the distribution of the running variable is smooth at the cutoff.

4. Sharp vs. Fuzzy -- Know Which You Have

Even in short interactions, DAAF grounds itself in real citations and references to avoid relying on its fuzzy general knowledge or inconsistent results from web searches. Have more follow-ups? Want to write some actual code? DAAF is your flexible and responsive collaborator every step of the way.
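To make the RD assumptions above concrete, here is a minimal Python sketch (written for illustration, not DAAF output). It simulates a sharp RD with a known jump, estimates that jump with a separate linear fit on each side of the cutoff, and compares how many observations fall just below and just above the threshold. The function names, the bandwidth of 0.25, and the simulated data are all assumptions made for this example. The count comparison is only a rough stand-in for a formal density test such as McCrary's, not an implementation of it.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Fit one line per side of the cutoff (within the bandwidth) and
    return the difference between the two fitted values at the cutoff."""
    pairs = [(s - cutoff, y) for s, y in zip(scores, outcomes)]
    left = [(x, y) for x, y in pairs if -bandwidth <= x < 0]
    right = [(x, y) for x, y in pairs if 0 <= x <= bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


def side_counts(scores, cutoff, bandwidth):
    """Count observations just below and just above the cutoff.
    A large imbalance can hint at sorting around the threshold."""
    n_left = sum(1 for s in scores if -bandwidth <= s - cutoff < 0)
    n_right = sum(1 for s in scores if 0 <= s - cutoff <= bandwidth)
    return n_left, n_right


# Simulated data (hypothetical): the outcome is linear in the score,
# with a jump of TRUE_EFFECT for units at or above the cutoff of 0.
rng = random.Random(0)
TRUE_EFFECT = 2.0
scores = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
outcomes = [
    1.0 + 0.5 * s + (TRUE_EFFECT if s >= 0 else 0.0) + rng.gauss(0.0, 0.5)
    for s in scores
]

estimate = sharp_rd_estimate(scores, outcomes, cutoff=0.0, bandwidth=0.25)
n_left, n_right = side_counts(scores, cutoff=0.0, bandwidth=0.25)
print(f"estimated jump at cutoff: {estimate:.2f} (true effect: {TRUE_EFFECT})")
print(f"observations within bandwidth: {n_left} below, {n_right} above")
```

Because the simulated scores are drawn uniformly, the two counts should be roughly balanced. In real data, a pile-up on one side of the cutoff would be a reason to investigate manipulation before trusting the estimate.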

User: Can you remind me, what measures of socioeconomic status do we have available in the College Scorecard dataset? And for what years?

Have you ever wanted to be able to just talk with your data documentation in natural language? With DAAF, you can. Anytime you want to work with a specific dataset, DAAF helps you embed your codebooks, data diagnostics, and technical papers into a persistent knowledge file the AI assistants can reference to become experts on your data.

DAAF: Thinking…

DAAF: Welcome to DAAF! Great question -- this looks like a Data Lookup: you're asking about specific variable availability in the College Scorecard dataset.

I'll check the Scorecard skill for SES-related measures and their year coverage. Sound good?

User: Yeah, that sounds right to me. Please dig in.

DAAF: Read(skills/daaf-orchestrator/.../data-lookup-mode.md)❲ Read 163 lines

Just like the Ad Hoc example, DAAF classifies your request and confirms before proceeding. In Data Lookup mode, DAAF draws on carefully crafted dataset documentation to bring you grounded answers about your datasets -- far more reliable than general web-searching or relying on its fuzzy general knowledge.

DAAF: Dispatching a lookup specialist to pull SES measures from the Scorecard skill.

DAAF: Agent(search-agent, “Lookup Scorecard SES measures”)

↳ Skill(education-data-source-scorecard)❲ Loaded successfully

↳ Read(skills/.../scorecard/references/variable-definitions.md)❲ Read 604 lines

↳ Read(skills/.../scorecard/references/population-coverage.md)❲ Read 271 lines

DAAF: Thinking…

To better handle complex workflows, DAAF delegates work to specialist assistants whenever possible ("Agent Orchestration"). Here, DAAF launches a separate data scout to load the College Scorecard dataset reference it has access to. The data scout reads variable definitions and coverage caveats to compose an authoritative, grounded answer rooted in genuine documentation.

DAAF: Here's a quick summary of what's available for SES in College Scorecard via the Urban Institute Education Data Portal:

Student Body NSLDS (1997-2016)
From FAFSA/financial aid records -- individual-level SES

• faminc_mean / faminc_med -- Mean/median family income
• lowincome_pct / midincome_pct / highincome_pct -- Income tercile shares ($0-30K / $30-75K / $75K+)
• dependent_pct / independent_pct -- Dependency status shares
• first_gen_student_pct -- First-generation share
• parents_highest_ed_* -- Parent education distribution

Student Body Treasury (1997-2016)
From IRS/Treasury -- area-level proxies based on home ZIP

• hhinc_home_zip_med -- Median household income in student's home ZIP
• poverty_rate_home_zip -- Poverty rate in home ZIP
• unemp_rate_home_zip -- Unemployment rate in home ZIP

Earnings (2003-2014, 2018) and Repayment NSLDS (2007-2016) also include SES-disaggregated outcomes.

------
Key Caveats

1. Title IV recipients only. All measures reflect FAFSA filers -- full-pay students (often higher-income) are excluded, so SES distributions skew lower than reality, especially at selective privates.
2. Coverage varies by institution type -- ~30-50% at selective privates vs. 80-95% at for-profits.
3. Home ZIP variables are ecological proxies, not individual/household measures.
4. Income tercile thresholds are fixed ($0-30K / $30-75K / $75K+), not inflation-adjusted across years.
5. Portal data ends at 2016 for student body datasets.

Confidence: HIGH -- sourced from the curated Scorecard skill. Year boundaries should be verified against actual data if precise coverage matters for your analysis.

DAAF reports back with some high-level findings that the user can dig into at will in more detail. Notice the Key Caveats section: DAAF is explicitly designed to proactively flag these sorts of data limitations, coverage gaps, and places where you'd want to verify further. This attention to detail and nuance is embedded in every DAAF workflow and interaction by design.

DAAF: … Okay, the next task in the pipeline is joining the resource datasets (student-faculty ratios, retention rates, and finance data) onto the core demographics table. I’ll write a detailed dispatch for the research-executor to load the prior join scripts for context on the current table structure, then write and validate a new script for this step.

DAAF drafts all analytic code in a rigorous multi-step sequence so that quality is checked before the code ever reaches your review. As in the Data Lookup example, it uses specialist assistants (“subagents”) to split tasks: one specialist drafts code with preliminary validations, and another adversarially reviews it and inspects its assumptions from multiple angles.

DAAF: Agent(research-executor, “Join resource datasets onto core demographics”)

↳ Skill(data-scientist)❲ Successfully loaded skill

↳ Read(scripts/stage7_transform/01_join-core.py)❲ Read 316 lines

↳ Read(scripts/stage7_transform/02_join-demographics.py)❲ Read 318 lines

↳ Thinking…

↳ Write(scripts/stage7_transform/03_join-resources.py)❲ 288 lines

↳ Bash(bash run_with_capture.sh scripts/.../03_join-resources.py)
INITIAL VALIDATIONS: PASSED (8/8 checks)

Before writing any code, the coding agent first loads references on data science best practices and reads the prior scripts in the sequence to understand the current table structure. Then, it writes a complete script according to spec (logging its intentions, reasoning, and assumptions via inline comments so everything is easy to track and review) and runs the script with automatic logging of all validation/diagnostic tests. Because everything passed these initial validations, it reports the successful script location and logs back to the main DAAF assistant for the next step: adversarial code review.
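As a rough illustration of what a join step with built-in validations can look like, here is a minimal plain-Python sketch. It is not the actual 03_join-resources.py script, whose contents are not shown here. The function name, the column names (unitid, enrollment, retention_rate), and the example rows are all hypothetical, and a real pipeline would likely use a dataframe library rather than lists of dicts.

```python
def left_join_validated(core_rows, resource_rows, key):
    """Left-join resource_rows onto core_rows by key.

    Returns (joined_rows, unmatched_keys). Raises ValueError if the key is
    duplicated in the resource table, since that would silently multiply
    core rows in a naive join.
    """
    lookup = {}
    for row in resource_rows:
        if row[key] in lookup:
            raise ValueError(f"duplicate key in resource table: {row[key]!r}")
        lookup[row[key]] = row

    # Every resource column other than the join key.
    resource_columns = {name for row in resource_rows for name in row} - {key}

    joined, unmatched = [], []
    for row in core_rows:
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row[key])  # record the gap instead of dropping the row
            match = {}
        extra = {name: match.get(name) for name in resource_columns}
        joined.append({**extra, **row})  # core values win on any name clash
    return joined, unmatched


# Hypothetical example tables.
core = [
    {"unitid": 1, "enrollment": 1200},
    {"unitid": 2, "enrollment": 560},
    {"unitid": 3, "enrollment": 8900},
]
resources = [
    {"unitid": 1, "retention_rate": 0.81},
    {"unitid": 3, "retention_rate": 0.92},
]

joined, unmatched = left_join_validated(core, resources, key="unitid")

# Simple post-join validations, logged alongside the result.
checks = {
    "row count preserved": len(joined) == len(core),
    "retention_rate column present on every row": all(
        "retention_rate" in row for row in joined
    ),
}
passed = sum(checks.values())
print(f"validations passed: {passed}/{len(checks)}; unmatched keys: {unmatched}")
```

Unmatched keys are reported rather than treated as failures, so the decision about what to do with institutions missing resource data stays with the reviewer.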

DAAF: Initial script draft written with all self-tests passing from the research-executor. Now I'll dispatch the code-reviewer with the same context to take a closer look for QA.

DAAF: Agent(code-reviewer, “QA inspection: stage7_03_join-resources.py”)

↳ Skill(data-scientist)❲ Successfully loaded skill

↳ Read(scripts/stage7_transform/01_join-core.py)❲ Read 316 lines

↳ Read(scripts/stage7_transform/02_join-demographics.py)❲ Read 318 lines

↳ Read(scripts/stage7_transform/03_join-resources.py)❲ Read 288 lines

↳ Thinking…

↳ Write(scripts/cr/stage7_03_cr1.py)❲ 387 lines

↳ Bash(bash run_with_capture.sh scripts/cr/stage7_03_cr1.py)
[PASS] Schema: All 22 expected columns present
[PASS] Row count: 2,893 (expected 2,893)
[PASS] Coded values: None remain
[PASS] [Semantic] Model 3 regression vars present
[WARN] [Boundary] retention_rate: 26 zero values
[WARN] Finance plausibility: max=$14.1M per FTE
QA RESULT: ISSUES_FOUND -- investigating further

A completely separate code review specialist independently loads and audits the script, running its own battery of checks across schema validation, distribution analysis, boundary conditions, spec adherence, and more. It is specifically instructed to be adversarial, skeptical, and cautious. Here, its initial inspection flagged some suspicious data values and outliers, which it decided to investigate further with additional checks.
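A check battery of this kind can be sketched in a few lines of Python. This is an illustrative assumption about the general pattern, not the reviewer's actual script: the function name, its parameters, and the example rows are hypothetical. The idea it shows is the split between hard checks (schema, row count) and softer warnings for values that are suspicious but possibly legitimate, such as exact zeros in a rate column.

```python
def run_checks(rows, expected_columns, expected_row_count, zero_check_columns):
    """Return a list of (status, message) tuples describing the table."""
    results = []

    # Schema: every expected column must appear somewhere in the table.
    present = set().union(*(row.keys() for row in rows))
    missing = sorted(set(expected_columns) - present)
    if missing:
        results.append(("FAIL", f"Schema: missing columns {missing}"))
    else:
        results.append(
            ("PASS", f"Schema: all {len(expected_columns)} expected columns present")
        )

    # Row count: the table should have exactly as many rows as the spec says.
    status = "PASS" if len(rows) == expected_row_count else "FAIL"
    results.append(
        (status, f"Row count: {len(rows)} (expected {expected_row_count})")
    )

    # Boundary: exact zeros may be real values or coded placeholders,
    # so they are warnings to investigate, not failures.
    for column in zero_check_columns:
        zeros = sum(1 for row in rows if row.get(column) == 0)
        if zeros:
            results.append(("WARN", f"Boundary: {column} has {zeros} zero value(s)"))
        else:
            results.append(("PASS", f"Boundary: {column} has no zero values"))
    return results


# Hypothetical example rows.
rows = [
    {"unitid": 1, "retention_rate": 0.81},
    {"unitid": 2, "retention_rate": 0.0},
    {"unitid": 3, "retention_rate": 0.92},
]

results = run_checks(
    rows,
    expected_columns=["unitid", "retention_rate"],
    expected_row_count=3,
    zero_check_columns=["retention_rate"],
)
for status, message in results:
    print(f"[{status}] {message}")

verdict = "ISSUES_FOUND" if any(s != "PASS" for s, _ in results) else "PASSED"
print(f"QA RESULT: {verdict}")
```

A warning does not mean the data is wrong. It marks a value that needs an explanation, which is what the follow-up investigation in the transcript provides.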

↳ Thinking…

↳ Write(scripts/cr/stage7_03_cr2.py)❲ 198 lines

↳ Bash(bash run_with_capture.sh scripts/cr/stage7_03_cr2.py)
Hypothesis 1 (finance outliers): Extreme values from
law schools and medical schools -- source data, not join error
Hypothesis 2 (zero retention): 26 institutions,
all small or specialized -- legitimate, not coded-zero artifacts
Further investigation needed: NO

DAAF: Great inspections and QA report from the code-reviewer. Sounds like all concerns were resolved and we can proceed to the next steps, but I'll log this for the user in our running session notes and mention it at the next checkpoint for their awareness.

After thoroughly explaining and testing its concerns in a new script, the code-reviewer agent drafts an in-depth summary QA report to pass back to the main DAAF assistant. While this specific analysis proceeds without issues, the above drafting-reviewing loop can iterate as many times as needed if any problems need resolution before proceeding. All issues get flagged for human review at regular checkpoints, ensuring visibility into the problems and solutions along the way. Most importantly: Every script version and QA check is fully traceable and reproducible by default.
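The control flow of a drafting-reviewing loop like the one described above can be sketched as follows. This is a simplified assumption about the general pattern, not DAAF's implementation: the draft and review callables, the toy stand-ins, and the max_rounds cap are hypothetical. The cap is added here so that unresolved issues are handed to a human rather than looping indefinitely.

```python
def draft_review_loop(draft, review, max_rounds=3):
    """Alternate drafting and reviewing until the reviewer has no open issues.

    draft(feedback) returns a new script version given the reviewer's issues;
    review(script) returns a list of open issues (empty when satisfied).
    Returns (script, resolved, audit_log) so every round stays traceable.
    """
    audit_log = []
    feedback = []
    script = None
    for round_number in range(1, max_rounds + 1):
        script = draft(feedback)
        feedback = review(script)
        audit_log.append(
            {"round": round_number, "script": script, "open_issues": list(feedback)}
        )
        if not feedback:
            return script, True, audit_log
    # Still unresolved after max_rounds: report it for human review.
    return script, False, audit_log


# Toy stand-ins for the two specialists (hypothetical behavior).
def toy_draft(feedback):
    return "script_v2" if feedback else "script_v1"


def toy_review(script):
    if script == "script_v1":
        return ["zero retention values unexplained"]
    return []


final_script, resolved, audit_log = draft_review_loop(toy_draft, toy_review)
for entry in audit_log:
    print(entry)
print(f"final: {final_script}, resolved: {resolved}")
```

Keeping the audit log as a return value, rather than only printing it, is what lets every round be surfaced later at a human checkpoint.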

User: I'm aware that graduation rates are often thought of as a key outcome for assessing a university/college's quality by the general public, but many researchers argue that there's a very strong question of chicken-or-the-egg in interpreting it that way: Are graduation rates high because the college actually did a good job in serving its students, or are graduation rates high because the college selectively admits students who are already highly competitive and academically prepared and likely to graduate/succeed anyway? I'd like to more critically explore this dynamic with data to better understand how correlated these things are, especially when thinking about additional complicating institutional factors like share of students on financial aid, other underserved or historically disadvantaged student population rates, etc. I'd like an analysis that helps provide an intuitive and holistic view on how these factors all relate to one another, and what implications that might have for broadly thinking about college 'quality' in general.

You can bring questions or tasks of any complexity to DAAF. Just describe what you want to understand, and DAAF will kick off a careful and deliberate workflow that scales to your needs. Regardless of complexity, DAAF always works closely with you to identify viable data sources, decide on the best analytical approaches, track all artifacts and outputs, and structure the work -- keeping your expertise and insight at the center of all key decision-making.

DAAF: Thinking…

DAAF: Welcome to DAAF! This is a really compelling research question -- you're essentially asking: how much of the variation in graduation rates is attributable to what colleges do versus who they admit?

I'm classifying this as Full Pipeline Mode -- you're asking for a substantive, multi-dimensional analysis that will require pulling data from multiple federal sources (IPEDS admissions, graduation rates, student financial aid, enrollment demographics), joining them at the institution level, and producing visualizations and statistical models.

I'll walk you through 5 phases with human checkpoints at every stage. Shall I proceed?

From here, DAAF launches into a multi-stage research workflow that goes all the way from initial data scoping to a completed, fully reproducible data analysis pipeline and summary analytic report with key findings, methodology/process, data visualizations, limitations, and opportunities for extension. DAAF does this by orchestrating scalable and complex workflows that intelligently stitch together everything you've seen in the earlier examples: dynamically loading curated reference skills for grounded expertise, dispatching specialist agents for targeted tasks, and passing every core task (planning, coding, and writing) through adversarial review along the way.

DAAF pauses and engages with you at every key decision point for weigh-in, review, and redirection, but in the end, a completed Full Pipeline analysis can take as little as 30 minutes of direct human involvement. Want to see how that works?

See our dedicated interactive deep-dive into this real analysis →


Your AI-powered lab manager

A force-multiplying exoskeleton for your expertise

Enhance your analysis

Enhance your scope

Enhance your impact

All of this, up and running in just one line of code

Set Up Prerequisites

Install DAAF

Begin

Public good infrastructure for the future of research

Ready to see what's possible?