Installation Troubleshooting
Getting DAAF installed and running for the first time -- Docker setup, container building, and initial configuration.
"docker: command not found" or "docker is not recognized"
This message means your computer doesn't recognize Docker (a tool that creates an isolated environment for running software) as an installed program yet. Here's how to fix it:
- Make sure Docker Desktop is installed. If you haven't installed it yet, download it from docker.com/products/docker-desktop ↗ and run the installer. (See Get Started -- Prerequisites for a walkthrough.)
- Restart your computer. After installing Docker Desktop, a full restart is often required -- especially on Windows -- so your system recognizes the new `docker` command.
- Verify it worked. Open a terminal (the program where you type commands -- on Mac, search for "Terminal" in Spotlight; on Windows, search for "PowerShell" in the Start menu) and type:
  docker --version
  What you should see: Docker version 27.x.x, build xxxxxxx. If you see this, Docker is installed and ready.
- If you still see the error after restarting, make sure Docker Desktop is actually running -- look for the whale icon in your system tray (Windows, bottom-right) or menu bar (Mac, top-right). Docker Desktop must be running for the `docker` command to work.
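The "is Docker recognized?" check above can also be scripted. This is a minimal sketch, assuming Python 3 happens to be installed on your computer (DAAF does not require it outside the container). The helper name `find_docker` is hypothetical and not part of DAAF.

```python
import shutil


def find_docker(which=shutil.which):
    """Return the full path of the docker executable, or None if it is not on PATH."""
    # shutil.which performs the same lookup your terminal does when you type `docker`
    return which("docker")


if __name__ == "__main__":
    path = find_docker()
    if path is None:
        print("docker is not on PATH: install Docker Desktop, then restart your computer")
    else:
        print(f"docker found at {path}")
```

Note that finding the executable only shows Docker is installed; Docker Desktop still has to be running for commands to succeed.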
"unable to get image 'daaf-daaf-docker'"
This means DAAF's pre-built environment (called an "image" in Docker -- think of it as a blueprint for setting everything up) hasn't been created on your computer yet.
- Make sure Docker Desktop is running. Look for the whale icon in your system tray (Windows) or menu bar (Mac). If it's not there, open Docker Desktop from your Applications folder or Start menu.
- Check if the image exists. Open Docker Desktop and click Images in the left sidebar. Look for an entry named `daaf-daaf-docker`. If it's listed there, the image exists and the issue is likely that Docker Desktop wasn't running when you tried to start DAAF.
- If the image is missing, run the installer again:
  - macOS / Linux:
    curl -fsSL https://raw.githubusercontent.com/DAAF-Contribution-Community/daaf/main/scripts/host/install.sh | bash
  - Windows:
    irm https://raw.githubusercontent.com/DAAF-Contribution-Community/daaf/main/scripts/host/install.ps1 | iex
The installer will rebuild the image. This takes a few minutes the first time -- you'll see progress messages in your terminal as it downloads and configures everything.
"service 'daaf-docker' is not running"
This means DAAF's environment (its "container" -- the isolated workspace where everything runs) isn't active. Think of Docker Desktop as the engine and the container as the car -- the engine needs to be running before the car can go.
- Make sure Docker Desktop is running. Look for the whale icon in your system tray (Windows) or menu bar (Mac).
- Start DAAF using the convenience script -- this handles everything automatically:
  - macOS / Linux: Open a terminal, navigate to your `daaf-docker` folder, and run `bash run_daaf.sh`
  - Windows: Open PowerShell, navigate to your `daaf-docker` folder, and run `.\run_daaf.ps1`
- To verify the container is running, open Docker Desktop and click Containers in the left sidebar. You should see an entry with "daaf" in the name showing a green "Running" status.
- If the container isn't listed at all, the initial installation may not have completed. Try running the installer again (see Get Started -- Installation).
Port conflicts (2718, 2719, or 2720 already in use)
A "port" is like a numbered doorway that programs use to communicate (see Key Concepts above). DAAF uses three ports to let you view things in your web browser:
- Port 2718 -- Marimo notebooks (your research results)
- Port 2719 -- DAAF Log Explorer (session history)
- Port 2720 -- Browser-based code editor
If another program on your computer is already using one of these doorways, you'll see a "port in use" error. The most common cause is a previous DAAF session that didn't shut down cleanly, or another application (like a local web server) using the same port number.
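To see which of the three ports already has something listening, a small probe like the one below can help. It is a sketch under assumptions: Python 3 on your computer, and the hypothetical helper name `port_in_use`. It only tells you that some program is listening, not which one, so a running DAAF container will also show up as "in use".

```python
import socket

# Port numbers and purposes as listed above
DAAF_PORTS = {2718: "Marimo notebooks", 2719: "DAAF Log Explorer", 2720: "code editor"}


def port_in_use(port, host="127.0.0.1"):
    """Return True if a program is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. a listener exists
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, purpose in DAAF_PORTS.items():
        state = "in use" if port_in_use(port) else "free"
        print(f"{port} ({purpose}): {state}")
```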
Quickest fix -- restart the DAAF container:
- Open Docker Desktop
- Click Containers in the left sidebar
- If you see a DAAF container listed, click the stop button (square icon), wait a moment, then click the start button (play icon)
If the conflict persists, you can change which port numbers DAAF uses. Open the file docker-compose.yml in the daaf-docker folder (you can use any text editor -- Notepad, TextEdit, or VS Code). Find the ports: section and change the first number in each pair. For example, change "127.0.0.1:2718:2718" to "127.0.0.1:3000:2718". The first number is the port on your computer; the second is the port inside the container -- only change the first one. After saving, restart the container.
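The "only change the first number" rule can be illustrated with a short sketch. The function name `remap_host_port` is hypothetical; it assumes the three-part `host_ip:host_port:container_port` form shown in the example above and does not edit docker-compose.yml for you.

```python
def remap_host_port(mapping, new_host_port):
    """Change only the host-side port of a 'host_ip:host_port:container_port' mapping."""
    host_ip, _old_host_port, container_port = mapping.split(":")
    # The container-side port stays fixed because the service inside DAAF listens on it
    return f"{host_ip}:{new_host_port}:{container_port}"


print(remap_host_port("127.0.0.1:2718:2718", 3000))  # 127.0.0.1:3000:2718
```

With this mapping you would open localhost:3000 in your browser instead of localhost:2718.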
Tip: The view_notebooks convenience script automatically detects port conflicts and will tell you what's happening.
Related: Why are ports bound to localhost?
Permission denied errors inside the container (especially on macOS)
If Claude shows a "Permission denied" error when trying to read or create files, it's a file ownership issue -- the files inside DAAF's workspace were created with different permissions than what the container expects. This is a known quirk of Docker on macOS (and occasionally Windows).
The fix is usually simple -- just restart DAAF:
- Open a terminal and navigate to your `daaf-docker` folder
- Run: `docker compose down` (this stops everything)
- Run: `docker compose up -d` (this starts it back up)
What you should see: Messages about services starting up. DAAF has a built-in repair step that automatically fixes file permissions every time it starts, so a simple restart usually resolves this.
If the problem persists after restarting, you can run a manual repair command. This is safe -- it just updates file ownership without changing any of your data:
docker run --rm -v "daaf_daaf-data:/daaf" busybox chown -R 1000:1000 /daaf
What this does in plain language: it starts a tiny temporary helper program, points it at your DAAF data, tells it to update all file ownership to match what the container expects, then cleans itself up. Your research files are untouched -- only the ownership metadata changes.
Claude Code asks for an API key every time I launch
Normally, DAAF remembers your login between sessions. If it keeps asking, the simplest permanent fix is to save your credentials in a settings file that DAAF reads every time it starts:
- Find the file: In the `daaf-docker` folder on your computer (not inside the container), look for a file called `environment_settings.txt`. If it doesn't exist yet, create one by copying the example file:
  - macOS / Linux:
    cp environment_settings_example.txt environment_settings.txt
  - Windows:
    Copy-Item environment_settings_example.txt environment_settings.txt
- Edit the file: Open `environment_settings.txt` in any text editor (Notepad, TextEdit, VS Code -- anything works). You'll see lines like `ANTHROPIC_API_KEY=`. Add your API key after the `=` sign, with no spaces. For example: `ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxx`
- Restart DAAF: Stop and restart the container so it picks up the new settings. Run `docker compose down` then `bash run_daaf.sh` (or `.\run_daaf.ps1` on Windows) from the `daaf-docker` folder.
Why this happens: Your login information is stored inside DAAF's workspace. Occasionally, container rebuilds or updates can reset this storage. The environment_settings.txt approach is more durable because it lives on your computer (outside the container) and is applied fresh every time DAAF starts.
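Stray spaces around the `=` sign are easy to miss by eye. The sketch below flags them, assuming the file is plain `KEY=VALUE` lines; treating lines that start with `#` as comments is also an assumption, and `find_spacing_problems` is a hypothetical helper, not a DAAF tool.

```python
def find_spacing_problems(text):
    """Return 1-based line numbers of KEY=VALUE lines that have spaces around '='."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines, '#' comments (assumed convention), and lines without '='
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        if key != key.rstrip() or value != value.lstrip():
            problems.append(number)
    return problems


sample = "ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxx\nANTHROPIC_API_KEY = sk-ant-api03-xxxxxxx\n"
print(find_spacing_problems(sample))  # [2]
```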
Malformed authentication URL when trying to log in to Claude Code
When Claude Code prompts you to authenticate by opening a URL in your browser, the link can sometimes wrap across multiple lines in your terminal window. If you copy a URL that contains an accidental line break, the browser will fail to open it or show an error page.
How to fix it:
- Paste the URL into a plain text editor (Notepad on Windows, TextEdit on Mac, or any simple editor -- not a word processor like Word, which can add hidden formatting).
- Look for any line breaks in the middle of the URL and delete them. The entire address should be one continuous line with no spaces or breaks.
- Copy the cleaned-up URL and paste it into your browser's address bar.
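If you would rather not hunt for line breaks by hand, the cleanup can be sketched in a few lines of Python. The function name `clean_wrapped_url` and the example address are hypothetical placeholders, not a real Claude Code login URL.

```python
def clean_wrapped_url(pasted):
    """Remove line breaks, spaces, and tabs that terminal wrapping inserted into a URL."""
    # str.split() with no arguments splits on any run of whitespace, including newlines
    return "".join(pasted.split())


wrapped = "https://example.com/auth?code=abc\n  def&state=xyz\n"
print(clean_wrapped_url(wrapped))  # https://example.com/auth?code=abcdef&state=xyz
```

This is safe for URLs because a valid address never contains literal whitespace.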
Related: Claude Code asks for an API key every time I launch
OpenRouter: "model not found" or authentication errors
OpenRouter is an alternative way to access Claude models without a direct Anthropic subscription (see Get Started -- Billing Options). If you're getting errors, check these three settings in your environment_settings.txt file (in the daaf-docker folder on your computer):
- The API address must be exact. Make sure the line reads exactly: `ANTHROPIC_BASE_URL=https://openrouter.ai/api` -- no trailing `/v1` at the end. Even a small difference will cause errors.
- The Anthropic key line must be present but empty. You need this line in the file: `ANTHROPIC_API_KEY=` (with nothing after the equals sign). This tells DAAF to use OpenRouter instead of Anthropic directly. If this line is missing entirely, DAAF will try to connect to Anthropic's servers instead.
- Clear any previous Anthropic login. If you previously logged in directly with Anthropic, that cached login can override the OpenRouter settings. Inside Claude Code, type `/logout` to clear it.
After making changes, restart DAAF (docker compose down then bash run_daaf.sh or .\run_daaf.ps1).
To verify it's working: Type /status inside Claude Code. It should show your connection details. You can also log into openrouter.ai ↗ and check the Activity Dashboard to see if requests are arriving.
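The first two file checks above can be expressed as a small validator. This is a sketch only: it assumes `environment_settings.txt` is plain `KEY=VALUE` text, the helper names are hypothetical, and it covers only the two settings named in this answer (it cannot check the cached login).

```python
EXPECTED_BASE_URL = "https://openrouter.ai/api"


def parse_settings(text):
    """Parse KEY=VALUE lines into a dict, ignoring blanks and '#' comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def openrouter_problems(text):
    """Return a list of human-readable problems with the OpenRouter settings."""
    settings = parse_settings(text)
    problems = []
    if settings.get("ANTHROPIC_BASE_URL") != EXPECTED_BASE_URL:
        problems.append("ANTHROPIC_BASE_URL must be exactly " + EXPECTED_BASE_URL)
    if "ANTHROPIC_API_KEY" not in settings:
        problems.append("ANTHROPIC_API_KEY line is missing (it must be present but empty)")
    elif settings["ANTHROPIC_API_KEY"] != "":
        problems.append("ANTHROPIC_API_KEY must be empty when using OpenRouter")
    return problems
```

An empty list means both lines look right; any other result names the line to fix before restarting DAAF.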
Container seems really slow to build the first time
Completely normal -- the first build typically takes 5-15 minutes depending on your internet speed and computer. Here's what's happening behind the scenes:
- Downloading the base operating system for DAAF's environment (~200 MB)
- Installing Python 3.12 and 50+ data science packages
- Installing geospatial libraries (GDAL, GEOS, PROJ) for mapping capabilities
- Installing Claude Code
What you should see: A stream of progress messages in your terminal. Lines like Step 4/12 : RUN apt-get install... or Downloading polars-1.x.x... are normal. Let it run to completion.
Good news: This only happens once. Docker saves everything it downloads, so future starts take just a few seconds. If you ever need to rebuild (for example, after updating DAAF), only the parts that changed need to be re-downloaded.
Related: How much disk space does DAAF use?
I can't find my research files on my computer
By default, your research files live inside DAAF's isolated workspace, not in a regular folder on your computer. This is by design -- it keeps your research environment separate and protected. But it means you can't find those files by browsing your normal folders.
Three ways to access your files:
- Use the browser-based file manager (easiest). Run `bash run_vscode.sh` (or `.\run_vscode.ps1` on Windows) from the `daaf-docker` folder. This opens a full file browser at localhost:2720 where you can browse, download, and even edit any file in the DAAF workspace using your web browser. You can right-click files to download them to your computer.
- Use the backup script. Run `bash backup_daaf.sh` (or `.\backup_daaf.ps1` on Windows) from the `daaf-docker` folder. This creates a timestamped backup of your entire DAAF workspace as a regular folder on your computer inside `daaf-docker/`.
- Copy specific files. To copy a single file or folder from DAAF to your computer, run this from the `daaf-docker` folder:
  docker compose cp daaf-docker:/daaf/research/your-project/report.md ./
  This copies `report.md` from inside DAAF to your current folder. Replace the path with the file you want.
Tip: DAAF stores all research projects in /daaf/research/ inside the container. Each project gets its own folder with all scripts, data, notebooks, and reports.
Related: How do I back up my research files?
How do I get help understanding or using DAAF itself?
The best part: you don't need to leave DAAF to get help with DAAF. Just ask! DAAF has a dedicated User Support mode for questions about the framework itself and the tools it runs on (Docker, Git (a version-tracking system for files), Claude Code). Simply type a question like "What is DAAF?", "How do engagement modes work?", "Something's not working right," "How do I give Docker more memory?", or "Help me understand the pipeline" and DAAF will recognize it as a User Support request. It can also look up official documentation for Docker, Git, and Claude Code online when needed.
In User Support mode, DAAF loads its own documentation and responds conversationally -- no subagents, no formal outputs, no workspace creation. When your questions naturally evolve into wanting to do something, DAAF will suggest switching to the appropriate mode.
For self-guided reading, the full user documentation suite is in user_reference/:
- Understanding and Working with DAAF ↗ -- how DAAF thinks, decides, and collaborates
- Best Practices ↗ -- tips for getting the most out of your sessions
How do I update DAAF to the latest version?
Run the update convenience script from the daaf-docker folder on your computer:
- macOS / Linux: `bash update_daaf.sh`
- Windows: `.\update_daaf.ps1`
This pulls the latest version of the DAAF framework into your container. Your research files and project data are not affected.
If you don't have an update_daaf script (older installations), you can update manually: enter the container shell with bash run_daaf.sh bash, then run cd /daaf && git pull origin main.
How do I back up my research files?
DAAF includes a backup convenience script that creates a timestamped copy of your entire workspace on your computer:
- macOS / Linux: `bash backup_daaf.sh`
- Windows: `.\backup_daaf.ps1`
Run this from the daaf-docker folder. It creates a backup folder with today's date inside daaf-docker/. We recommend backing up before major updates or at the end of important analysis sessions.
You can also use the browser-based file manager (bash run_vscode.sh) to manually download specific files to your computer.
Setup and Settings
Billing options, model choices, privacy, engagement modes, and other configuration decisions.
Can I run DAAF without Docker?
Technically yes, but it's not recommended and we can't provide support for it. Docker is what makes DAAF safe and reliable, for three important reasons:
- Security. DAAF lets an AI write and run code on your computer. Docker creates a walled-off space where that code can't accidentally access your personal files, install unwanted software, or make system changes. Without Docker, Claude would run with your full computer permissions.
- Reproducibility. Docker ensures that Python, all 50+ data science libraries, and Claude Code are installed in exactly the same configuration every time. This means your analyses will produce the same results on any computer.
- Easy recovery. If something goes wrong, you can tear down the DAAF environment and rebuild it from scratch in minutes. Your research data is kept separate and stays safe.
If you're experienced enough to want to run DAAF without Docker, you likely have the skills to adapt the setup yourself -- but the project can't offer troubleshooting help for non-Docker installations.
Should I use an API key or a Max subscription?
To use DAAF, you need a way to pay for Claude's AI processing. (DAAF itself is free and open-source -- the cost is for the AI service it runs on.) Think of it like a phone: the phone is free, but you need a phone plan. There are three main "plans":
Max subscription strongly recommended ($100 or $200/mo). DAAF is extremely usage-intensive by design -- based on real-world testing, API billing costs roughly 10x more than a Max subscription. A single full-pipeline analysis can easily cost $50-100+ via the API; the Max plan covers that at a flat monthly rate.
| Factor | API Key | Max Subscription |
|---|---|---|
| What is it? | A secret code (like a password) that bills your Anthropic account per use. You get one from console.anthropic.com ↗ | A monthly subscription to Anthropic's Claude service. Sign up at claude.ai ↗ |
| Cost model | Pay per token (uncapped) | Flat monthly ($100-200/mo) |
| Cost predictability | Variable, can spike | Fixed |
| Usage limits | Unlimited (if paying) | Subject to plan tier limits |
| Rate limiting | Minimal | May hit limits during heavy sessions |
| Best for | Light/occasional use, organizational budgets | Regular DAAF usage (recommended) |
Third option: OpenRouter -- pay-per-token access to Claude models with no monthly commitment (5.5% fee on credit purchases). Good for testing before committing to a subscription.
One thing to note: the Max plan does have usage limits per time window. If you're running several analyses in parallel (which you absolutely can!), you may occasionally hit a rate limit and need to wait a bit. The API key doesn't have that issue, but the cost adds up fast.
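To make the cost comparison concrete, here is a rough break-even calculation using only the figures quoted above ($100 or $200 per month for Max, $50-100+ per full-pipeline analysis via the API). It is a back-of-the-envelope sketch, not a billing tool, and `breakeven_analyses` is a hypothetical helper name.

```python
def breakeven_analyses(monthly_fee, cost_per_analysis):
    """Number of API-billed analyses per month whose total cost equals the flat fee."""
    return monthly_fee / cost_per_analysis


# Figures from the text: Max plan tiers and the per-analysis API cost range
for fee in (100, 200):
    for per_analysis in (50, 100):
        count = breakeven_analyses(fee, per_analysis)
        print(f"${fee}/mo plan vs ${per_analysis}/analysis: break-even at {count:g} analyses")
```

Under these figures, the subscription pays for itself after somewhere between one and four full-pipeline analyses a month, depending on the tier and how expensive each analysis turns out to be.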
Which Claude model should I use?
Opus 4.5 or Opus 4.6 -- required. All development and testing was done on these models. Sonnet and Haiku produce erratic, inconsistent results with DAAF's workflow complexity.
Why Opus? DAAF's architecture demands Opus-class reasoning for several reasons:
- Multi-agent orchestration -- DAAF uses multiple AI "agents" (specialized versions of Claude, each with a different job) that work together. The main coordinator must manage all of them, which requires advanced reasoning.
- Following detailed protocols -- agent and skill files contain complex, branching instructions that require precise adherence
- Judgment calls about data quality -- identifying suppression patterns, questionable distributions, and methodological concerns
- Independent code review -- a separate AI reviewer checks every piece of code for mistakes, acting as a skeptical second pair of eyes. This requires the reviewer to be smart enough to catch subtle errors, not just rubber-stamp the work.
Opus 4.6 supports configurable thinking levels -- use the "High" setting (toggle in the /model selector with arrow keys). Higher thinking consumes more usage allocation, so there's a legitimate tradeoff to explore. Tested and recommended at "High" -- if you experiment with different settings, sharing your results would help improve this guidance for everyone.
How do I change the Claude model during a session?
Type /model in the Claude Code chat window. You'll see a list of available models -- use the up/down arrow keys to highlight the one you want and press Enter. The change takes effect immediately for everything after that point in your session.
Adjusting thinking level (Opus 4.6 only): While Opus 4.6 is highlighted in the model selector, press the left and right arrow keys to change the thinking level (Low, Medium, High). Higher thinking means Claude takes more time to reason through complex problems before responding -- it produces better results but uses more of your subscription allowance. High is recommended for DAAF.
Can I use DAAF with a different AI provider (OpenAI, Google, etc.)?
Partially, via OpenRouter. OpenRouter is a model gateway that provides pay-per-token access to Claude models through a single API key -- already configured as Option C in DAAF's setup. It works well for accessing Anthropic models without a Max subscription.
Using non-Anthropic AI models (like GPT-4 from OpenAI or Gemini from Google) is possible in theory but will produce poor results in practice. Claude Code is optimized for Anthropic models, and DAAF's complex multi-agent workflow requires Opus-class reasoning. OpenRouter's own documentation notes that Claude Code "is optimized for Anthropic models and may not work correctly with other providers." GPT-4o handles basic operations but struggles with the tool-calling patterns DAAF depends on. Extended thinking is also unavailable for non-Anthropic models.
Portability note: DAAF's skills and agent files are all Markdown -- the knowledge base transfers to other AI harnesses. What would need adaptation: the hooks system (.claude/hooks/), permission configuration (.claude/settings.json), and Claude Code-specific invocation patterns. In other words, DAAF's knowledge about education data, statistical methods, and best practices could be used with other AI tools. But the system that coordinates multiple AI agents, enforces safety rules, and manages quality checks is specific to Claude Code and would need significant rework.
Community contributions to port DAAF to other providers or test with open-source models would be enormously valuable -- the more researchers who have access to rigorous AI-assisted analysis tooling, the better. If you have the capacity to explore that, please reach out ↗.
Is my data sent to Anthropic? What about privacy?
This requires a nuanced answer. Here's how data actually flows:
- All data analysis and computation happens directly on your machine. Your datasets live inside the Docker container on your local hardware. Scripts run locally, outputs are written locally, and there is no mechanism by which Claude Code sends entire datasets outside of your machine.
- However, analytical outputs are inevitably sent to Anthropic. In the process of conducting data analysis, DAAF runs diagnostics (like examining individual table rows), statistical tests, data visualizations, report summaries, and so on. Because of the way chats with Claude in Claude Code work, these analytical outputs -- small "chunks" of your data in the form of results, sample rows, and summaries -- are sent to Anthropic's servers as part of the conversation. This is how AI-assisted analysis works at a fundamental level, because Claude needs to see information about the data itself to make good decisions about the analytical code.
- DAAF enforces additional safety guardrails. Hooks and permission rules prevent Claude from uploading or exfiltrating data files, and the framework blocks reading, writing, or committing credential-like files (`.env`, `*.pem`, `*.key`, `environment_settings*`). The Docker container runs in a locked-down environment with restricted permissions, containing any unexpected behavior to the DAAF workspace. You can verify all of this yourself by reading the hook scripts in `.claude/hooks/`.
- Your data privacy and data security exposure with Anthropic depends on your specific license and access method. Certain Enterprise agreements with Anthropic provide stronger data handling assurances, including FERPA and HIPAA compliance. Access through cloud platforms like AWS Bedrock or Google Vertex AI may offer additional data governance controls through your organization's existing agreements. Under Anthropic's standard Enterprise policy, API inputs and outputs are not used for model training -- but the specifics depend on your exact plan, agreement, and access method.
Bottom line: You need to be fully aware of and take ownership of exploring and understanding these nuances for your use case before using DAAF with any private, proprietary, or otherwise protected non-public data. If you work with education records (FERPA), health data (HIPAA), or any regulated data, consult your IT team and legal counsel about the appropriate Anthropic access method for your compliance requirements. DAAF provides strong local safety guarantees, but the analytical conversation with Claude inevitably involves data exposure to Anthropic's infrastructure -- and the terms of that exposure are between you and Anthropic.
Is there a free way to use DAAF?
Not practically for full research analyses. DAAF is very usage-intensive -- a single complete analysis involves hundreds of AI interactions across multiple agents. The free tier and the $20/mo Pro tier of Claude don't provide nearly enough usage for this. You could use them for quick questions in User Support mode or simple Data Lookup requests, but a full analysis pipeline would exhaust the allowance very quickly.
OpenRouter offers a lower-commitment alternative: pay-per-token access to Claude Opus models with no monthly subscription (5.5% fee on credit purchases). This is more accessible than $100-200/mo if you're doing occasional analyses rather than heavy daily use.
This is genuinely the biggest barrier to entry. The hope is that as model costs continue to decrease and open-source models become more capable, a more accessible option will emerge. If you have the capacity to test DAAF with open-source models or alternative providers, please reach out -- community contributions here would be enormously valuable.
How much disk space does DAAF use?
The initial DAAF installation uses roughly 3-5 GB of disk space. This includes the operating system, Python, 50+ data science libraries (for statistics, mapping, visualization, and machine learning), and Claude Code.
Beyond the initial installation, your workspace grows as you create projects. Each research project accumulates scripts, data files, notebooks, and reports -- typically 50-500 MB per project depending on dataset sizes.
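For a rough sense of how these numbers add up, the ranges above can be combined into a simple estimate. This sketch uses only the figures quoted here (3-5 GB for the installation, 50-500 MB per project) and treats 1 GB as 1000 MB; `estimate_disk_gb` is a hypothetical helper and real usage will vary with your datasets.

```python
def estimate_disk_gb(projects):
    """Return a (low, high) estimate of DAAF disk use in GB for a number of projects."""
    # Work in whole megabytes to keep the arithmetic exact, then convert to GB
    low_mb = 3000 + projects * 50     # 3 GB install + 50 MB per project
    high_mb = 5000 + projects * 500   # 5 GB install + 500 MB per project
    return (low_mb / 1000, high_mb / 1000)


print(estimate_disk_gb(10))  # (3.5, 10.0)
```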
To check how much space Docker is using: Open Docker Desktop and look at the Images and Volumes sections, or run docker system df in your terminal for a summary.
To reclaim unused space: Run docker system prune in your terminal. This cleans up old, unused data that Docker has accumulated. It will ask for confirmation before deleting anything.
⚠ Important: Do not delete the item named daaf_daaf-data in Docker Desktop's Volumes section -- that contains all your research files and project work. Everything else can be safely cleaned up.
Can I use DAAF offline?
No -- an internet connection is required for two reasons:
- AI processing happens online. When Claude analyzes your data, writes code, or answers questions, it communicates with Anthropic's servers over the internet. Without a connection, no AI interactions are possible.
- Data fetching needs internet. DAAF downloads public datasets (like education data from the Urban Institute) directly from their online portals.
If you lose your connection mid-session, don't worry -- your work-in-progress files and session state are preserved in the Docker volume. Nothing is lost. Once you reconnect, you can resume right where you left off. If DAAF was in the middle of a complex pipeline, it may ask you to restart from a recent checkpoint.
Why are the notebook and log viewer ports bound to localhost only?
When you view notebooks, session logs, or the code editor through your browser, DAAF is running a small web server inside its container. By default, this server is configured so that only your own computer can access it -- other devices on your WiFi or office network cannot.
Why this matters: The notebook viewer and code editor are powerful tools that can execute code. If they were accessible to anyone on your network, someone else could potentially run commands inside your DAAF environment. Restricting access to your computer only is a security precaution.
In practice, this means you access these tools by typing localhost:2718 (notebooks), localhost:2719 (logs), or localhost:2720 (editor) in your web browser. This is the normal, expected behavior.
If you need to change this (for example, to access DAAF from a tablet on the same network), you can edit the docker-compose.yml file in the daaf-docker folder. Find the ports: section and remove 127.0.0.1: from the beginning of each port line. Then restart the container. Only do this on trusted networks.
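The difference between the two forms of a port line can be checked with a short sketch. It relies on standard Docker Compose behavior: a mapping with no host address is published on all network interfaces. The function name `is_localhost_only` is hypothetical.

```python
def is_localhost_only(mapping):
    """Return True if a compose port mapping is bound to 127.0.0.1 on the host."""
    parts = mapping.split(":")
    # Three parts means 'host_ip:host_port:container_port'; two parts carries no host IP
    return len(parts) == 3 and parts[0] == "127.0.0.1"


print(is_localhost_only("127.0.0.1:2718:2718"))  # True: only your computer can connect
print(is_localhost_only("2718:2718"))            # False: reachable from other devices
```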
What are engagement modes and how do I choose one?
DAAF has 9 different "modes" -- think of them as different types of conversations you can have. You don't need to memorize them because you don't choose modes manually. Just describe what you want to do in plain language, and DAAF automatically selects the right mode. For example:
- "I want to analyze education spending trends in Virginia" → DAAF starts Full Pipeline
- "What data is available on school discipline?" → DAAF starts Data Discovery
- "Can you help me debug this script?" → DAAF starts Ad Hoc Collaboration
- "What does DAAF stand for?" → DAAF starts User Support
- "I have a CSV file I'd like DAAF to learn about" → DAAF starts Data Onboarding
For a complete list of all 9 modes and what they do, see the Get Started -- 9 Ways to Work section.
What are the /config and /model commands I keep seeing referenced?
These are commands you type directly into the Claude Code chat window (not your regular terminal). They start with a forward slash (/) and configure how Claude Code behaves:
- `/config` -- Opens Claude Code's settings menu. The two important settings for DAAF are "Auto-compact" (set to False) and "Verbose output" (set to True). You only need to do this once.
- `/model` -- Opens the model selector. Use arrow keys to pick your model (Opus 4.5 or 4.6 recommended) and press Enter.
- `/clear` -- Resets the conversation, giving Claude fresh memory. Your files and data are not affected.
- `/exit` -- Ends the Claude Code session.
- `/status` -- Shows your current connection and model information.
These slash commands only work inside Claude Code's chat interface. They won't work in your regular terminal or PowerShell window.
Common Errors
Error messages you might encounter during DAAF sessions and what they mean.
"STOP: Suppression rate >50%"
This means more than half the data values in an important column have been hidden (or "suppressed") by the data publisher to protect people's privacy. This is common in education data -- when a group is very small (for example, fewer than 10 students of a certain race in a school district), the exact number is withheld so individual students can't be identified.
When more than 50% of the data is suppressed, the remaining values aren't reliable enough to draw meaningful conclusions. DAAF stops rather than producing a misleading analysis.
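The rule itself is simple arithmetic, illustrated below. This is not DAAF's actual implementation: how DAAF represents suppressed values is not described here, so the sketch assumes they appear as `None`, and both function names are hypothetical.

```python
def suppression_rate(values):
    """Fraction of values that are suppressed (represented here as None, an assumption)."""
    if not values:
        raise ValueError("no values to check")
    return sum(v is None for v in values) / len(values)


def should_stop(values, threshold=0.5):
    """Mirror the 'Suppression rate >50%' rule: stop only when strictly above the threshold."""
    return suppression_rate(values) > threshold


enrollment = [120, None, None, 45, None]  # 3 of 5 values suppressed
print(suppression_rate(enrollment), should_stop(enrollment))  # 0.6 True
```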
What you can try:
- Broader geography: Instead of individual school districts, try state-level or regional data (larger populations mean less suppression)
- Less demographic detail: Instead of breaking down by specific race/ethnicity subgroups, try using broader categories or totals
- Different years: Some years may have better coverage than others
- Report the limitation: In some cases, the suppression itself is a meaningful finding -- it tells you something about the data landscape
Related: Data unavailable or empty results
The notebook won't render in my browser.
The easiest way to view notebooks is with the convenience script -- run bash view_notebooks.sh (or .\view_notebooks.ps1 on Windows) from your daaf-docker folder. This handles container startup, port binding, and flag configuration automatically, and includes built-in port conflict detection.
This is the recommended approach. The troubleshooting steps below are only needed if you're running Marimo manually (which most users don't need to do).
If you're using the manual marimo run command and can't see anything at http://localhost:2718, check these things in order:
- Is the container running? Check Docker Desktop's Containers panel. The `daaf` container should show as running.
- Did you include the right flags? The command needs `--host 0.0.0.0 --port 2718 --headless` for Docker.
- Is the port mapped correctly? Check your `docker-compose.yml` -- the line `"127.0.0.1:2718:2718"` under `ports:` maps the container's port to your host machine.
- Is something else using port 2718? The `view_notebooks` convenience script detects this automatically.
- Try a different browser or incognito/private window. Occasionally, browser extensions or cached state can interfere.
- Check for errors in the terminal. If marimo itself hit an error (e.g., a missing dependency or a syntax error in the notebook), the error will appear in the terminal where you ran the `marimo run` command.
"Context utilization CRITICAL" and the session seems to stop
Not an error -- this is DAAF being responsible about Claude's working memory. Here's the concept: every message you and DAAF exchange, every file it reads, and every piece of code it runs takes up space in Claude's "memory" for the current session (called the "context window"). Think of it like a desk -- as you pile on more papers, it gets harder to find what you need. Even though the desk is very large (Claude can handle up to 1 million "tokens" -- roughly 750,000 words), work quality starts declining well before it's completely full.
DAAF enforces percentage-based and absolute token thresholds -- whichever fires first:
| Utilization | Status | What happens |
|---|---|---|
| < 40% and < 150k tokens | NOMINAL | Normal operations |
| ≥ 40% or ≥ 150k tokens | ELEVATED | Works normally but delegates more to subagents |
| ≥ 60% or ≥ 200k tokens | HIGH | Finishes current work, prepares session restart |
| ≥ 75% or ≥ 250k tokens | CRITICAL | Stops new work, asks to restart session |
Seeing CRITICAL means Claude's context is nearly full -- continuing would degrade work quality. DAAF would rather stop and restart cleanly than produce increasingly unreliable output.
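To make the "whichever fires first" rule concrete, here is a minimal sketch of the table's logic. It is an illustration only: DAAF's actual hook code isn't shown in this FAQ, and the names THRESHOLDS and context_status are made up for this example.

```python
# (status, percentage threshold, absolute token threshold), most severe first.
THRESHOLDS = [
    ("CRITICAL", 0.75, 250_000),
    ("HIGH", 0.60, 200_000),
    ("ELEVATED", 0.40, 150_000),
]


def context_status(tokens_used: int, window_tokens: int = 1_000_000) -> str:
    """Return the status implied by the table for a given token count."""
    utilization = tokens_used / window_tokens
    for status, pct_limit, token_limit in THRESHOLDS:
        # Either condition is enough to trigger the status.
        if utilization >= pct_limit or tokens_used >= token_limit:
            return status
    return "NOMINAL"


print(context_status(160_000))           # absolute threshold fires first
print(context_status(130_000, 200_000))  # percentage threshold fires first
```

With a 1-million-token window the absolute thresholds are always reached first; the percentage thresholds matter on smaller windows.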
Recovery steps:
- Claude should have updated STATE.md with current progress and provided a restart prompt
- Copy the restart prompt
- Type /clear in the Claude Code chat window to reset the session. This is like clearing off the desk -- it gives Claude a fresh start with empty memory. All your files, scripts, and data are completely untouched -- only the conversation memory is reset.
- Paste the restart prompt into the fresh session
- Claude reads STATE.md and resumes exactly where it left off with a full fresh context window
Like saving your game before the battery dies -- the session state system was designed specifically for this.
Claude seems to have forgotten earlier instructions or decisions.
This is a known limitation of how AI language models work -- and it's the same reason you might lose track of details during a very long meeting. As a session gets longer, earlier information can get "pushed out" of Claude's active attention.
DAAF has several built-in mechanisms to handle this:
- Context monitoring catches this proactively via the context-reporter hook, which tracks utilization and warns when quality may degrade.
- STATE.md records all key decisions, checkpoint outcomes, and QA findings so they survive context pressure.
- Plan.md serves as the methodology specification; STATE.md tracks execution progress, QA findings, and runtime state. Together they provide a recoverable record of the full analysis.
- Session restart via /clear and the restart prompt in STATE.md gives Claude a fresh context window with all prior decisions preserved.
If you notice degradation, prompt Claude to re-read STATE.md and Plan.md, or restart the session.
Claude seems to be making things up about data variables or endpoints.
This phenomenon (sometimes called "hallucination" in AI contexts) is the most common -- and most important -- symptom to recognize. It happens when DAAF confidently states incorrect information about specific data details -- variable names, website addresses, or data coding schemes that sound right but don't match reality.
DAAF has extensive curated knowledge about supported data sources stored in skill files. When skills load correctly, agents access exact variable names, precise endpoint paths, correct coded values, and known caveats. When a skill fails to load (this happens only occasionally and somewhat unpredictably), the agent falls back on general training data and fills gaps with plausible-sounding but potentially incorrect details.
What to do:
- Make sure Verbose output is set to True in /config -- this is your primary tool for monitoring how agents decide which reference files to load
- Ask DAAF to verify: "Double-check that variable name against the actual skill documentation" or "Did the agent load the CCD data source skill before writing that script?"
- If it persists, try restarting with /clear -- a fresh context often resolves loading issues
- Report persistent loading failures by opening an issue ↗ -- patterns help improve DAAF's loading reliability
For more detail, see Best Practices -- Monitoring DAAF's Internal Reference Loading ↗.
Related: Which Claude model should I use?
DAAF seems to be doing something I didn't ask for. How do I stop or redirect it?
This can happen when DAAF misclassifies your request into the wrong engagement mode, or interprets your question differently than you intended. You have full control at all times:
- To interrupt immediately: Press Ctrl + C (or Cmd + C on Mac) in the terminal. This stops whatever Claude is currently doing. Your files and progress are safe -- nothing is lost.
- To redirect: Just say what you actually wanted: "Actually, I just wanted a quick data lookup, not a full analysis" or "Hold on -- I want to change the approach." DAAF will adjust.
- To start over: Type /clear to reset the session and start fresh. All your files remain intact.
DAAF is designed to check in with you at multiple points during longer workflows. At Full Pipeline checkpoints, you can review and adjust the direction before work continues.
Performance
How long things take, resource allocation, and running parallel analyses.
The analysis is taking a very long time. Is that normal?
Probably yes. A full-pipeline DAAF analysis is not a quick process -- by design. DAAF breaks every analysis into 12 stages across 5 phases. In data-heavy stages, every single script goes through an execute-then-review cycle (Claude writes the code, runs it, and then a completely separate Claude instance reads through the code line by line looking for mistakes -- like having a colleague review your work before you submit it).
| Phase | What's happening | Typical duration |
|---|---|---|
| Phase 1 (Discovery) | Exploring data sources, deep documentation dives | 5-15 minutes |
| Phase 2 (Planning) | Creating Plan.md and Plan_Tasks.md, validating | 20-30 minutes |
| Phase 3 (Data Acquisition) | Fetching data, cleaning, QA per script | 30-45 minutes |
| Phase 4 (Analysis) | Transformations, statistical analysis, visualizations, QA | 60-90 minutes |
| Phase 5 (Synthesis) | Assembling notebook, writing report, final review | 20-30 minutes |
Adding up the ranges above, a typical full run takes roughly 2 to 3.5 hours of Claude's active processing time, plus any time you spend reviewing at phase boundaries.
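If you want to check the arithmetic behind the overall estimate, the per-phase ranges from the table can be summed directly. This is just a worked calculation using the table's numbers; the variable names are illustrative.

```python
# (minimum, maximum) minutes per phase, copied from the table above.
PHASE_MINUTES = {
    "Phase 1 (Discovery)": (5, 15),
    "Phase 2 (Planning)": (20, 30),
    "Phase 3 (Data Acquisition)": (30, 45),
    "Phase 4 (Analysis)": (60, 90),
    "Phase 5 (Synthesis)": (20, 30),
}

low = sum(minimum for minimum, _ in PHASE_MINUTES.values())
high = sum(maximum for _, maximum in PHASE_MINUTES.values())

print(f"{low}-{high} minutes ({low / 60:.2f}-{high / 60:.2f} hours)")
# prints: 135-210 minutes (2.25-3.50 hours)
```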
What makes things slower: more data sources (each needs a fetch/clean/QA cycle), complex joins across multiple datasets, QA revisions when the code-reviewer catches issues, rate limiting on Max subscriptions (Anthropic may temporarily slow down your requests during heavy use to manage server load), and network latency fetching data from the Urban Institute portal.
When to worry: if a single stage seems stuck for 20-30+ minutes with no progress, check whether Claude is waiting for your input at a checkpoint. If it's genuinely stuck, interrupt with Ctrl+C and ask Claude to check STATE.md and resume.
Can I allocate more resources to the Docker container?
Usually unnecessary. The AI processing (Claude thinking and responding) happens on Anthropic's servers, not on your computer -- so your computer's speed mainly affects data processing, not AI interactions.
If you're working with very large datasets and notice slowness during data processing (not during Claude's responses), you can give Docker more memory: open Docker Desktop, go to Settings (gear icon) → Resources, and increase the memory allocation. For most analyses, the defaults work fine.
Can I run DAAF analyses in parallel?
Yes! You can work on multiple research projects at the same time. Open a new terminal window (or tab) and start another DAAF session -- each session works independently with its own project folder.
Cost note: Each parallel session draws on your Anthropic subscription allowance independently, so running two analyses simultaneously uses up your allocation roughly twice as fast as running them one after the other. On a Max subscription, this means you may hit the usage limit sooner during heavy parallel work.
Data Access
Working with built-in education data sources and bringing your own data into DAAF.
The assistant says data is unavailable or returns empty results.
There are several possible reasons, and it's usually not a DAAF problem -- it's a data availability issue:
- The data may not exist for what you asked. Not every dataset covers every year, state, or variable. For example, some education metrics aren't available before 2010, or certain demographic breakdowns aren't collected in every state. DAAF will usually tell you what's available during the Discovery phase.
- The data source might be temporarily down. DAAF fetches data from external portals (like the Urban Institute Education Data Portal ↗). If the portal is experiencing issues, data requests will fail. Try again later.
- The request might be too specific. Sometimes narrowing your search too much (e.g., a very specific school district + a specific race/ethnicity subgroup + a specific year) results in no matching data. Try broadening your geographic area, removing one filter, or using a range of years instead of a single year.
- The endpoint or filters are wrong. Occasionally, the assistant may construct a query that doesn't quite match the API's expected parameters. If you suspect this, check the session logs to see the exact query that was attempted, and compare it against the Education Data Portal documentation ↗.
Best first step: Use Data Discovery Mode before starting a full analysis. Tell DAAF something like: "I want to explore what data is available on [your topic] for [your geography/years]." This lets you see what's available before committing to a full pipeline run.
I'm getting a "KeyError: HARVARD_DATAVERSE_API_KEY" error when fetching election data.
Unlike most of the education data DAAF uses (which is freely available), election data is hosted on Harvard Dataverse, a platform that requires a free account and a personal access key to download files. This is their policy, not a DAAF limitation.
To fix this:
- Create a free account at dataverse.harvard.edu ↗
- Log in, click your account name in the top-right corner, then select API Token from the dropdown menu, then click Create Token. Copy the token that appears -- you'll need it in the next step.
- Add the key to the environment_settings.txt file in your daaf-docker/ folder on the host: HARVARD_DATAVERSE_API_KEY=your_token_here
- If you don't have an environment_settings.txt file yet, copy the template first:
  - macOS/Linux: cp environment_settings_example.txt environment_settings.txt
  - Windows: Copy-Item environment_settings_example.txt environment_settings.txt
- Recreate the container: docker compose down, then bash run_daaf.sh (or .\run_daaf.ps1 on Windows)
Alternatively, you can set it manually inside the container before launching Claude Code: export HARVARD_DATAVERSE_API_KEY="your_token_here"
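For context, a KeyError like this is what Python raises when code looks up a missing environment variable with os.environ["..."]. If you write your own scripts that need the token, a friendlier pattern is sketched below. The helper name get_dataverse_token is hypothetical and is not how DAAF's own fetch code is necessarily written.

```python
import os

ENV_VAR = "HARVARD_DATAVERSE_API_KEY"


def get_dataverse_token() -> str:
    """Return the Dataverse token, or raise an error that explains the fix."""
    # .get() returns the fallback instead of raising KeyError when unset.
    token = os.environ.get(ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{ENV_VAR} is not set. Add it to environment_settings.txt and "
            "recreate the container, or export it in the current shell."
        )
    return token
```

Calling get_dataverse_token() at the top of a script surfaces a missing key immediately, before any download is attempted.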
How current is the education data?
Education data always has a publication lag -- it takes time for data to be collected, cleaned, and published. Here are rough timelines for the major sources DAAF uses:
- CCD (Common Core of Data -- public school data): 1-2 years behind
- IPEDS (Integrated Postsecondary Education Data -- college/university data): 1-2 years behind
- CRDC (Civil Rights Data Collection -- school discipline, access data): 2-3 years behind
- College Scorecard (graduation rates, earnings data): 1-2 years behind, though earnings data can lag further
- EdFacts (achievement and assessment data): 1-2 years behind
You don't need to memorize these -- DAAF automatically checks what years are available during the Discovery phase and will tell you if the data you're looking for hasn't been published yet.
Can I use my own data files instead of the built-in sources?
Yes! DAAF has a dedicated Data Onboarding Mode specifically for this. Here's how it works:
- Get your file into DAAF. The easiest way is to use the browser-based file manager: run bash run_vscode.sh (or .\run_vscode.ps1 on Windows), then drag and drop your file into the DAAF workspace. You can also use docker compose cp ./yourfile.csv daaf-docker:/daaf/ from the terminal.
- Tell DAAF about it. Start a session and say something like: "I have a CSV file called enrollment_data.csv that I'd like to onboard. Can you help me profile it?" DAAF will switch to Data Onboarding Mode automatically.
- DAAF analyzes your data. It will examine the structure, statistics, relationships, and quality of your dataset, then create a reference document that future analyses can use -- so DAAF "remembers" what your data contains and how to work with it.
⚠ Privacy reminder: While your dataset files stay on your local machine, analytical outputs (sample rows, statistics, summaries) will be sent to Anthropic's servers as part of the Claude Code conversation. For sensitive, personally identifiable, or regulated data (like student records protected by FERPA, or HIPAA-related health data), consult your organization's data policies and review your Anthropic license terms before proceeding. See "Is my data sent to Anthropic?" in the Setup and Settings section above for the full picture.
Session Logs and Diagnostics
Where logs live, how to view them visually, and how to use them for debugging.
Where are session logs stored?
Every time you use DAAF, it automatically saves a complete record of the session -- what Claude did, what files it created or modified, what code it ran, and what the results were. Think of these as detailed receipts for every session. They're saved in .claude/logs/sessions/ in multiple formats:
| Format | File Pattern | Purpose |
|---|---|---|
| Markdown | YYYY-MM-DD_HH-MM-SS_<session-id>_orchestrator.md | A readable summary you can open in any text editor -- shows everything Claude did, step by step |
| JSONL | YYYY-MM-DD_HH-MM-SS_<session-id>_orchestrator.jsonl | A detailed data file for advanced debugging -- you generally won't need to open this directly |
| Subagent JSONL | YYYY-MM-DD_HH-MM-SS_<session-id>_subagent_<agent-id>.jsonl | Separate logs for each specialized AI agent that was called during the session |
The orchestrator Markdown archive includes a Subagent Activity summary table listing each subagent's type, duration, tool uses, and final-message excerpt. Additionally, .claude/logs/activity.log records a timestamped entry at every session start for a quick usage history overview. All logs are gitignored by default -- they stay local and are never pushed to a repository.
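If you ever want to sort or filter log files yourself, the file patterns in the table can be parsed with a regular expression. The sketch below is an assumption-laden illustration: the function name parse_log_name and the example IDs are made up, and it assumes session IDs don't themselves contain "_orchestrator" or "_subagent_".

```python
import re
from datetime import datetime

# Mirrors the patterns in the table:
#   <timestamp>_<session-id>_orchestrator.(md|jsonl)
#   <timestamp>_<session-id>_subagent_<agent-id>.jsonl
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_"
    r"(?P<session>.+?)_"
    r"(?P<kind>orchestrator|subagent_(?P<agent>.+))"
    r"\.(?P<ext>md|jsonl)$"
)


def parse_log_name(name: str):
    """Split a session log file name into its parts, or return None."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return {
        "started": datetime.strptime(match["stamp"], "%Y-%m-%d_%H-%M-%S"),
        "session_id": match["session"],
        "role": "subagent" if match["agent"] else "orchestrator",
        "agent_id": match["agent"],  # None for orchestrator logs
        "format": match["ext"],
    }


# Hypothetical file name, for illustration only.
print(parse_log_name("2025-01-15_09-30-00_abc123_orchestrator.md"))
```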
Easiest way to view logs: Run bash view_logs.sh (or .\view_logs.ps1 on Windows) from the daaf-docker folder. This opens a visual timeline in your web browser that's much easier to navigate than reading the raw files.
Your logs stay completely private -- they're stored only on your computer and are never uploaded or shared automatically.
What happens to session logs if Claude Code crashes or I close the terminal unexpectedly?
Logs are preserved. On the next session start, a background recovery scan archives any un-archived transcripts, so you don't need to do anything.
How can I use session logs for debugging?
If something went wrong during a DAAF session -- unexpected results, an error you didn't understand, or behavior that seemed off -- session logs are your detective tools. The Markdown logs show exactly what the assistant did, in order -- every tool call, file read/write, subagent invocation, and output at each step.
DAAF includes the interactive DAAF Log Explorer, which renders session transcripts as a visual timeline in your web browser. The orchestrator's actions appear as a horizontal timeline bar, with subagent dispatches waterfalling downward. Click any block to see exactly what files were read, written, or executed -- with plain-language descriptions and clickable file references.
Quickest access from your host machine (no container shell needed):
- bash view_logs.sh -- macOS / Linux
- .\view_logs.ps1 -- Windows
This starts the container if needed, generates an activity manifest from all sessions, and starts a server. Open the printed URL in your browser. You can also run per-project log collection from inside the container using bash /daaf/scripts/collect_session_logs.sh and bash /daaf/scripts/generate_log_viewer.sh with your project path.
Note: The server requires port 2719 to be mapped in your docker-compose.yml. If you set up DAAF after this feature was added, it's already there. If not, add "127.0.0.1:2719:2719" under the ports: section and restart your container with docker compose down && docker compose up -d.
Alternatively, DAAF converts every log transcript into a more readable Markdown file showing the flow of the conversation alongside the tool calls. You can find the relevant .md session log in .claude/logs/sessions/ (sorted by timestamp). The .jsonl file contains the complete raw transcript if deeper inspection is needed.
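For deeper inspection of a .jsonl file, remember that JSONL simply means one JSON object per line. The sketch below reads such a file with Python's standard library. Because the record schema isn't documented in this FAQ, it makes no assumptions about field names; the helper name read_jsonl is illustrative.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Skip malformed lines instead of aborting the whole read.
                print(f"Skipping line {line_number}: {exc}")
```

For example, sum(1 for _ in read_jsonl(some_path)) counts the records in a transcript, where some_path is whichever .jsonl file you picked from .claude/logs/sessions/.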
Are session logs shared or uploaded anywhere?
No -- your logs stay private. They are stored locally, gitignored, and never uploaded. They are shared only if you manually include excerpts in a bug report.
What's the difference between STATE.md and session logs?
Very different purposes:
Session logs are complete, raw transcripts of everything that happens in a Claude Code session. Automatically generated, stored in .claude/logs/, primarily useful for post-hoc debugging. Think of them as a security camera recording -- comprehensive but unfiltered. Browse visually using the DAAF Log Explorer rather than reading raw files.
STATE.md is a structured progress tracker that DAAF creates during full-pipeline analyses. It lives inside your project folder (research/[project]/STATE.md) and tracks the current analysis stage, passed checkpoints, decisions made, and next steps. It accumulates QA Findings Summaries, Final Review Logs, and Runtime Risks encountered during execution. (These are quality-check results, final-pass review notes, and any problems or limitations discovered along the way.)
STATE.md's primary purpose is enabling session recovery -- if a session runs out of context, you start a fresh session and STATE.md tells Claude exactly where to pick up. Like a bookmark with detailed notes.
Packages and Environment
Installing additional Python packages and managing the software environment inside DAAF's container.
How do I install additional Python packages?
There are two approaches, depending on whether you need the package temporarily or permanently:
Temporary (for quick testing): Inside a DAAF session, ask Claude to install it: "Can you install the networkx package?" Claude will run the installation command for you. However, these packages will disappear the next time DAAF is rebuilt or updated.
Permanent (recommended): Add the package to DAAF's blueprint file so it's always available:
- Open the file called Dockerfile in your daaf-docker folder using any text editor
- Find the section that lists Python packages (look for lines with package names like polars, statsmodels, etc.)
- Add your package name to the list
- Save the file
- Rebuild DAAF by running these two commands from the daaf-docker folder:
  docker compose down
  docker compose up -d --build
The rebuild will take a few minutes as it installs the new package. Your research files are unaffected -- only the environment is rebuilt.
Alternatively, you can ask DAAF directly: "I need the networkx package permanently -- can you help me add it to the Dockerfile?" DAAF can guide you through this process step by step.
Can I use apt-get or sudo inside the container?
No, and this is by design. For security, DAAF intentionally runs with limited permissions -- the AI cannot install system-level software, gain administrator access, or make changes outside the DAAF workspace. This protects your computer.
If you need a system-level library (something beyond a Python package), add it to the Dockerfile before building. Look for the section with apt-get install and add your library name there. Then rebuild the container with docker compose up -d --build.
This is an advanced operation -- if you're unsure, ask DAAF: "I need [library name] installed at the system level. Can you help me modify the Dockerfile?"
Will packages installed at runtime persist across restarts?
No. When you install a package during a session (for example, by asking Claude to run uv pip install networkx), it's installed temporarily. Think of it like writing on a whiteboard -- it works right now, but gets erased when the room is reset.
To make a package permanent (so it's always available), add it to the Dockerfile and rebuild. See "How do I install additional Python packages?" above for step-by-step instructions.
Important distinction: Your research files (data, scripts, notebooks, reports) are stored separately and do persist. Only the software environment resets on rebuild -- your work is safe.
What package manager does DAAF use?
uv -- a modern Python package installer that works the same way as the standard pip tool but runs much faster. You generally don't need to interact with it directly -- when you ask DAAF to install a package, it uses uv automatically behind the scenes.
If you're a Python developer and want to install packages manually inside the container, use uv pip install --user packagename. Note that manually installed packages don't persist across container rebuilds (see "Will packages installed at runtime persist?" above).
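After a rebuild, you may want to confirm whether a package is still available before relying on it in a script. A minimal sketch using only the standard library follows; the helper name is_importable is illustrative, and networkx is just the example package used elsewhere in this FAQ.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    # find_spec returns None when no top-level module by that name is found.
    return find_spec(module_name) is not None


for name in ("json", "networkx"):
    state = "available" if is_importable(name) else "missing"
    print(f"{name}: {state}")
```

If a package you installed at runtime shows as missing after a rebuild, that is the expected behavior described above, and adding it to the Dockerfile is the fix.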
Technology Choices
Why DAAF uses the specific tools it does -- the reasoning behind each major technology decision.
Why Polars instead of Pandas?
Polars outperforms Pandas on the dimensions that matter most for DAAF's use case: performance on large datasets, lazy evaluation that lets you build a query plan before executing it, strong type safety that catches errors early, and an expression-based API that's more readable in code review.
For the kinds of datasets DAAF users typically work with -- often hundreds of thousands to millions of rows with complex joins and aggregations -- Polars' Rust-based engine provides meaningfully faster execution without requiring the user to think about optimization. The lazy evaluation model also makes it easier for DAAF to construct efficient query pipelines, since operations can be planned and optimized as a batch rather than executed line-by-line.
That said, Pandas is also installed in the container. If you have existing Pandas code or a strong Pandas preference, DAAF can work with it -- you'll just miss some of the performance and type-safety benefits that come with Polars by default.
Why Marimo instead of Jupyter?
Marimo notebooks are reactive -- when you change a cell, all dependent cells update automatically. This eliminates the hidden-state problems that plague Jupyter notebooks, where cells can hold stale values depending on execution order. For reproducibility, this is critical: a Marimo notebook always reflects the current state of the code.
Marimo notebooks are also stored as plain .py files rather than JSON, which makes them Git-friendly -- diffs are readable, merges are manageable, and version control works naturally. In Jupyter, notebook diffs are nearly impossible to review because the JSON format mixes code, outputs, and metadata.
The combination of reactive execution, Git-friendly format, and no hidden state makes Marimo a much better fit for DAAF's emphasis on reproducibility and auditability.
Why Docker instead of a virtual environment?
Docker provides three things that virtual environments cannot: true reproducibility (the exact same OS, libraries, and system dependencies on every machine), security isolation (Claude Code runs inside a container with dropped privileges and no access to your personal files), and consistent system dependency management (libraries that need C compilers, GDAL, or other system packages just work without per-platform debugging).
A virtual environment handles Python packages but not system-level dependencies, and it offers no security isolation at all -- a coding agent with a virtual environment has full access to your filesystem, credentials, and network. Docker's container boundary is what makes it safe to let an AI agent write and execute code on your machine.
The tradeoff is complexity: Docker Desktop is an additional install step, and the container model is unfamiliar to many researchers. DAAF's installer and helper scripts are designed to minimize this friction -- but if you're curious about what's happening behind the scenes, the Dockerfile in the repository is fully readable and you can ask your favorite LLM to help you interpret it.
Why Parquet for all data files?
Parquet preserves column types exactly -- integers stay integers, dates stay dates, and categories stay categories. CSV files lose all type information, which means every time you reload a CSV, your analysis tool has to guess what each column is. These guesses are frequently wrong (ZIP codes become numbers, date columns become strings), and silent type coercion is one of the most common sources of data analysis errors.
Parquet files are also compressed by default (typically 3-10x smaller than equivalent CSVs) and support columnar access, meaning you can read just the columns you need without loading the entire file into memory. For large education datasets, this translates to meaningfully faster load times and lower memory usage.
There's no CSV encoding ambiguity either -- no debates about delimiters, quote characters, or text encoding. A Parquet file is a Parquet file, and it reads the same way everywhere.
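The type-loss problem with CSV is easy to reproduce with Python's standard library alone. In the sketch below, guess_type is a deliberately naive stand-in for the type inference that analysis tools perform on load; it is not how any particular library works, and the sample row is made up.

```python
import csv
import io

# One made-up row: the ZIP code is text, the enrollment is an integer.
rows = [{"zip_code": "02134", "enrollment": 512}]

# Write the row to an in-memory CSV, then read it back.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["zip_code", "enrollment"])
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
reloaded = next(csv.DictReader(buffer))

# CSV stores only text, so the integer column comes back as a string.
print(type(reloaded["enrollment"]).__name__)  # str


def guess_type(value: str):
    """Naive inference: treat anything that parses as an integer as a number."""
    try:
        return int(value)
    except ValueError:
        return value


guessed = {key: guess_type(value) for key, value in reloaded.items()}
print(guessed)  # {'zip_code': 2134, 'enrollment': 512}
```

The enrollment column is recovered correctly, but the ZIP code silently loses its leading zero. A format that stores column types alongside the data avoids this guessing step entirely.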
Why are scripts the primary artifact instead of notebooks?
DAAF produces Python scripts (.py files) as the primary analysis artifact because scripts provide a complete, sequential audit trail. Every line executes in order, top to bottom -- there's no ambiguity about execution sequence, no hidden state from out-of-order cell execution, and no risk of notebook cells holding stale values.
Scripts also version-control cleanly: git diff shows exactly what changed between versions, making it straightforward to review DAAF's revisions. And because scripts are immutable execution records (each version is saved separately, never overwritten), you always have a complete history of how the analysis evolved.
DAAF does generate Marimo notebooks as well -- but these serve as interactive exploration tools for reviewing results, not as the canonical record of the analysis. The script is the source of truth; the notebook is a lens for examining it.