Support - DAAF


Installation Troubleshooting

Getting DAAF installed and running for the first time -- Docker setup, container building, and initial configuration.

"docker: command not found" or "docker is not recognized"

This message means your computer doesn't recognize Docker (a tool that creates an isolated environment for running software) as an installed program yet. Here's how to fix it:

Make sure Docker Desktop is installed. If you haven't installed it yet, download it from docker.com/products/docker-desktop ↗ and run the installer. (See Get Started -- Prerequisites for a walkthrough.)
Restart your computer. After installing Docker Desktop, a full restart is often required -- especially on Windows -- so your system recognizes the new docker command.
Verify it worked. Open a terminal (the program where you type commands -- on Mac, search for "Terminal" in Spotlight; on Windows, search for "PowerShell" in the Start menu) and type:
- docker --version
What you should see: Something like Docker version 27.x.x, build xxxxxxx. If you see this, Docker is installed and ready.
If you still see the error after restarting, make sure Docker Desktop is actually running -- look for the whale icon in your system tray (Windows, bottom-right) or menu bar (Mac, top-right). Docker Desktop must be running for the docker command to work.
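The steps above separate two failure cases: Docker isn't installed (or isn't on your PATH yet), or it is installed but Docker Desktop isn't running. If you'd like one terminal check that tells these apart, here is a minimal bash sketch for macOS/Linux (or WSL/Git Bash on Windows). It is not a DAAF script. The function names `check_installed` and `check_docker` are hypothetical, and the sketch relies only on the standard `command -v` builtin and the `docker info` command, which fails when the Docker engine isn't reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of DAAF).

# Succeeds if the named command is on PATH, and prints where it was found.
check_installed() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
    return 0
  fi
  echo "$1: not found on PATH (install it, or restart so PATH is refreshed)"
  return 1
}

# Distinguishes "not installed" (1) from "installed but engine not running" (2).
check_docker() {
  check_installed docker || return 1
  # docker info needs a running engine; it errors out if Docker Desktop is stopped
  if docker info >/dev/null 2>&1; then
    echo "docker: engine is running"
  else
    echo "docker: installed, but Docker Desktop does not seem to be running"
    return 2
  fi
}

# Usage: check_docker
```

A return code of 2 points you straight at the whale-icon advice in the last step.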

"unable to get image 'daaf-daaf-docker'"

This means DAAF's pre-built environment (called an "image" in Docker -- think of it as a blueprint for setting everything up) hasn't been created on your computer yet.

Make sure Docker Desktop is running. Look for the whale icon in your system tray (Windows) or menu bar (Mac). If it's not there, open Docker Desktop from your Applications folder or Start menu.
Check if the image exists. Open Docker Desktop and click Images in the left sidebar. Look for an entry named daaf-daaf-docker. If it's listed there, the image exists and the issue is likely that Docker Desktop wasn't running when you tried to start DAAF.
If the image is missing, run the installer again:
- macOS / Linux: curl -fsSL https://raw.githubusercontent.com/DAAF-Contribution-Community/daaf/main/scripts/host/install.sh | bash
- Windows: irm https://raw.githubusercontent.com/DAAF-Contribution-Community/daaf/main/scripts/host/install.ps1 | iex

The installer will rebuild the image. This takes a few minutes the first time -- you'll see progress messages in your terminal as it downloads and configures everything.
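If you prefer the terminal to Docker Desktop's Images list, you can ask Docker for its image names and look for daaf-daaf-docker. The sketch below assumes bash; `image_listed` is a hypothetical helper that reads one name per line from standard input, and `docker images --format '{{.Repository}}'` is the standard way to print just the repository names.

```shell
# Hypothetical helper: succeed if the exact image name appears in a list
# read from standard input (one name per line).
image_listed() {
  # -F: literal text, -x: whole-line match, -q: no output, just the exit status
  grep -Fxq -- "$1"
}

# Usage with real Docker output:
#   docker images --format '{{.Repository}}' | image_listed daaf-daaf-docker \
#     && echo "image present" || echo "image missing: re-run the installer"
```

The whole-line match matters: a differently named image such as daaf-daaf-docker-old won't count as a hit.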

"service 'daaf-docker' is not running"

This means DAAF's environment (its "container" -- the isolated workspace where everything runs) isn't active. Think of Docker Desktop as the engine and the container as the car -- the engine needs to be running before the car can go.

Make sure Docker Desktop is running. Look for the whale icon in your system tray (Windows) or menu bar (Mac).
Start DAAF from the DAAF Control Panel -- the easiest way, which brings the container up for you. Open a terminal, navigate to your daaf-docker folder, and run bash daaf.sh (macOS/Linux) or .\daaf.ps1 (Windows), then choose 1) Start Claude Code. (If you'd rather do it by hand, run docker compose up -d from the daaf-docker folder first, then start Claude Code -- the older run_daaf.sh / run_daaf.ps1 scripts still work too.)
To verify the container is running, open Docker Desktop and click Containers in the left sidebar. You should see an entry with "daaf" in the name showing a green "Running" status.
If the container isn't listed at all, the initial installation may not have completed. Try running the installer again (see Get Started -- Installation).
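If you start the container by hand with docker compose up -d, it can take a moment before the service reports as running. The bash sketch below polls until it does. It assumes Docker Compose v2 (the `docker compose` command) and the service name daaf-docker from the error message above; `wait_until` and `daaf_running` are hypothetical helper names, not DAAF scripts.

```shell
# Hypothetical helper: run a check command up to N times, pausing between
# attempts, and succeed as soon as the check succeeds.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds if Compose lists daaf-docker among its running services.
daaf_running() {
  docker compose ps --status running --services 2>/dev/null | grep -Fxq daaf-docker
}

# Usage, from the daaf-docker folder:
#   docker compose up -d && wait_until 10 2 daaf_running && echo "DAAF container is running"
```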

Port conflicts (2718, 2719, or 2720 already in use)

A "port" is like a numbered doorway that programs use to communicate (see Key Concepts above). DAAF uses three ports to let you view things in your web browser:

Port 2718 -- Marimo notebooks (your research results)
Port 2719 -- DAAF Log Explorer (session history)
Port 2720 -- Browser-based code editor

If another program on your computer is already using one of these doorways, you'll see a "port in use" error. The most common cause is a previous DAAF session that didn't shut down cleanly, or another application (like a local web server) using the same port number.

The quickest fix is to restart the DAAF container:

Open Docker Desktop
Click Containers in the left sidebar
If you see a DAAF container listed, click the stop button (square icon), wait a moment, then click the start button (play icon)

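If the conflict persists, you can change which port numbers DAAF uses. Open the file docker-compose.yml in the daaf-docker folder (you can use any text editor: Notepad, TextEdit, or VS Code). Find the ports: section and change the first number in the mapping for the conflicting port. For example, change "127.0.0.1:2718:2718" to "127.0.0.1:3000:2718". The first number is the port on your computer; the second is the port inside the container, so only change the first one. After saving, run docker compose down and then docker compose up -d from the daaf-docker folder so the container is recreated with the new mapping (stopping and starting it in Docker Desktop keeps the old ports). You would then open that tool at the new address, for example localhost:3000 instead of localhost:2718.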
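To make the "only change the first number" rule concrete, here is a small bash sketch that rewrites one mapping string. `remap_host_port` is a hypothetical helper, not part of DAAF, and it assumes the IP:HOST:CONTAINER form shown above.

```shell
# Hypothetical helper: replace only the host (computer-side) port in an
# "IP:HOST:CONTAINER" mapping; the container-side port is kept as-is.
remap_host_port() {
  mapping=$1
  new_host_port=$2
  ip=${mapping%%:*}               # text before the first colon, e.g. 127.0.0.1
  container_port=${mapping##*:}   # text after the last colon, e.g. 2718
  printf '%s:%s:%s\n' "$ip" "$new_host_port" "$container_port"
}

# Usage: remap_host_port 127.0.0.1:2718:2718 3000   # prints 127.0.0.1:3000:2718
```

To see which program is actually holding a port before you move DAAF, `lsof -nP -iTCP:2718 -sTCP:LISTEN` works on macOS and most Linux systems, and `Get-NetTCPConnection -LocalPort 2718` does the same in PowerShell on Windows.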

Tip: The view_notebooks convenience script automatically detects port conflicts and will tell you what's happening.

Permission denied errors inside the container (especially on macOS)

If Claude shows a "Permission denied" error when trying to read or create files, it's a file ownership issue: the files inside DAAF's workspace ended up owned by a different user than the one the container expects. This is a known quirk of Docker on macOS (and occasionally Windows).

The fix is usually simple -- just restart DAAF:

Open a terminal and navigate to your daaf-docker folder
Run: docker compose down (this stops everything)
Run: docker compose up -d (this starts it back up)

What you should see: Messages about services starting up. DAAF has a built-in repair step that automatically fixes file permissions every time it starts, so a simple restart usually resolves this.

If the problem persists after restarting, you can run a manual repair command. This is safe -- it just updates file ownership without changing any of your data:

docker run --rm -v "daaf_daaf-data:/daaf" busybox chown -R 1000:1000 /daaf

What this does in plain language: it starts a tiny temporary helper program, points it at your DAAF data, tells it to update all file ownership to match what the container expects, then cleans itself up. Your research files are untouched -- only the ownership metadata changes.

Claude Code asks for an API key every time I launch

Normally, DAAF remembers your login between sessions. If it keeps asking, the simplest permanent fix is to save your credentials in a settings file that DAAF reads every time it starts:

Find the file: In the daaf-docker folder on your computer (not inside the container), look for a file called environment_settings.txt. If it doesn't exist yet, create one by copying the example file:
- macOS / Linux: cp environment_settings_example.txt environment_settings.txt
- Windows: Copy-Item environment_settings_example.txt environment_settings.txt
Edit the file: Open environment_settings.txt in any text editor (Notepad, TextEdit, VS Code -- anything works). You'll see lines like ANTHROPIC_API_KEY=. Add your API key after the = sign, with no spaces. For example: ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxx
Restart DAAF: Stop and restart the container so it picks up the new settings. Run docker compose down then bash run_daaf.sh (or .\run_daaf.ps1 on Windows) from the daaf-docker folder.

Why this happens: Your login information is stored inside DAAF's workspace. Occasionally, container rebuilds or updates can reset this storage. The environment_settings.txt approach is more durable because it lives on your computer (outside the container) and is applied fresh every time DAAF starts.
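If you'd rather set the value from a terminal than a text editor, a minimal bash sketch follows. It assumes the KEY=VALUE layout described above; `set_setting` is a hypothetical helper, not a DAAF script. It replaces an existing line for the key (or appends one) and writes no spaces around the = sign. Note that awk's -v option interprets backslashes, which API keys don't normally contain.

```shell
# Hypothetical helper: set KEY=VALUE in a settings file, replacing any
# existing "KEY=" line or appending a new one if none exists.
set_setting() {
  file=$1 key=$2 value=$3
  tmp="$file.tmp.$$"
  touch "$file"
  awk -v k="$key" -v v="$value" '
    BEGIN { done = 0 }
    index($0, k "=") == 1 { print k "=" v; done = 1; next }  # line starts with KEY=
    { print }                                                  # keep all other lines
    END { if (!done) print k "=" v }                           # append if missing
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage, from the daaf-docker folder:
#   set_setting environment_settings.txt ANTHROPIC_API_KEY sk-ant-api03-xxxxxxx
```

One caution: typing a real key on the command line can leave it in your shell history, so editing the file in a text editor remains the simplest option.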

Malformed authentication URL when trying to log in to Claude Code

When Claude Code prompts you to authenticate by opening a URL in your browser, the link can sometimes wrap across multiple lines in your terminal window. If you copy a URL that contains an accidental line break, the browser will fail to open it or show an error page.

How to fix it:

Paste the URL into a plain text editor (Notepad on Windows, TextEdit on Mac, or any simple editor -- not a word processor like Word, which can add hidden formatting).
Look for any line breaks in the middle of the URL and delete them. The entire address should be one continuous line with no spaces or breaks.
Copy the cleaned-up URL and paste it into your browser's address bar.
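If you're comfortable in a terminal, the same cleanup can be done without an editor. This bash sketch assumes the URL contains no legitimate spaces (URLs encode spaces as %20); `join_url` is a hypothetical helper name.

```shell
# Hypothetical helper: delete every space, tab, carriage return, and line break
# from standard input, so a URL that wrapped in the terminal becomes one line.
join_url() {
  tr -d ' \t\r\n'
  echo   # finish with a single newline so the result prints cleanly
}

# Usage on macOS, after copying the wrapped link:
#   pbpaste | join_url
```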

OpenRouter: "model not found" or authentication errors

OpenRouter is an alternative way to access Claude models without a direct Anthropic subscription (see Get Started -- Billing Options). If you're getting errors, check these three settings in your environment_settings.txt file (in the daaf-docker folder on your computer):

The API address must be exact. Make sure the line reads exactly: ANTHROPIC_BASE_URL=https://openrouter.ai/api -- no trailing /v1 at the end. Even a small difference will cause errors.
The Anthropic key line must be present but empty. You need this line in the file: ANTHROPIC_API_KEY= (with nothing after the equals sign). This tells DAAF to use OpenRouter instead of Anthropic directly. If this line is missing entirely, DAAF will try to connect to Anthropic's servers instead.
Clear any previous Anthropic login. If you previously logged in directly with Anthropic, that cached login can override the OpenRouter settings. Inside Claude Code, type /logout to clear it.

After making changes, restart DAAF (docker compose down then bash run_daaf.sh or .\run_daaf.ps1).

To verify it's working: Type /status inside Claude Code. It should show your connection details. You can also log into openrouter.ai ↗ and check the Activity Dashboard to see if requests are arriving.
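The first two checks above are exact-text rules, so they can be verified mechanically. Below is a minimal bash sketch, assuming environment_settings.txt uses the KEY=VALUE lines described above; `check_openrouter_settings` is a hypothetical helper, not a DAAF script. It strips Windows line endings first, since a file saved in Notepad can otherwise look wrong to an exact match. The third check (clearing a cached login with /logout) can't be seen from the file and still has to be done inside Claude Code.

```shell
# Hypothetical checker for the OpenRouter settings described above.
# Prints one line per problem; returns 1 if any problem is found, 2 if unreadable.
check_openrouter_settings() {
  file=${1:-environment_settings.txt}
  status=0
  # Remove CRLF line endings so files edited on Windows compare correctly
  settings=$(tr -d '\r' < "$file") || return 2
  if ! printf '%s\n' "$settings" | grep -Fqx 'ANTHROPIC_BASE_URL=https://openrouter.ai/api'; then
    echo "ANTHROPIC_BASE_URL must read exactly https://openrouter.ai/api (no trailing /v1)"
    status=1
  fi
  if ! printf '%s\n' "$settings" | grep -Fqx 'ANTHROPIC_API_KEY='; then
    echo "ANTHROPIC_API_KEY= must be present with nothing after the equals sign"
    status=1
  fi
  return "$status"
}

# Usage, from the daaf-docker folder:
#   check_openrouter_settings && echo "OpenRouter settings look right"
```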

Container seems really slow to build the first time

Completely normal -- the first build typically takes 5-15 minutes depending on your internet speed and computer. Here's what's happening behind the scenes:

Downloading the base operating system for DAAF's environment (~200 MB)
Installing Python 3.12 and 50+ data science packages
Installing R 4.5.3 with 60+ packages and the Quarto notebook tool
Installing geospatial libraries (GDAL, GEOS, PROJ) for mapping capabilities
Installing Claude Code

What you should see: A stream of progress messages in your terminal. Lines like Step 4/12 : RUN apt-get install... or Downloading polars-1.x.x... are normal. Let it run to completion.

Good news: This only happens once. Docker saves everything it downloads, so future starts take just a few seconds. If you ever need to rebuild (for example, after updating DAAF), only the parts that changed need to be re-downloaded.

I can't find my research files on my computer

By default, your research files live inside DAAF's isolated workspace, not in a regular folder on your computer. This is by design -- it keeps your research environment separate and protected. But it means you can't find those files by browsing your normal folders.

The simplest way in is the DAAF Control Panel -- run bash daaf.sh (or .\daaf.ps1 on Windows) from the daaf-docker folder, then choose 2) Browse Files (VS Code) to open the browser file manager, or 7) Create Backup to copy everything onto your computer. Here are the three ways in detail:

Use the browser-based file manager (easiest). Control Panel option 2 -- or run bash run_vscode.sh (.\run_vscode.ps1 on Windows) directly. This opens a full file browser at localhost:2720 where you can browse, download, and even edit any file in the DAAF workspace using your web browser. You can right-click files to download them to your computer.
Use the backup option. Control Panel option 7 -- or run bash backup_daaf.sh (.\backup_daaf.ps1 on Windows) directly. This creates a timestamped backup of your entire DAAF workspace as a regular folder on your computer inside daaf-docker/.
Copy specific files. To copy a single file or folder from DAAF to your computer, run this from the daaf-docker folder:
- docker compose cp daaf-docker:/daaf/research/your-project/report.md ./
This copies the file report.md from inside DAAF to your current folder. Replace the path with the file you want.

Tip: DAAF stores all research projects in /daaf/research/ inside the container. Each project gets its own folder with all scripts, data, notebooks, and reports.

How do I get help understanding or using DAAF itself?

The best part: you don't need to leave DAAF to get help with DAAF. Just ask! DAAF has a dedicated User Support mode for questions about the framework itself and the tools it runs on: Docker, Git (a version-tracking system for files), and Claude Code. Simply type a question like "What is DAAF?", "How do engagement modes work?", "Something's not working right", "How do I give Docker more memory?", or "Help me understand the pipeline", and DAAF will recognize it as a User Support request. It can also look up official documentation for Docker, Git, and Claude Code online when needed.

In User Support mode, DAAF loads its own documentation and responds conversationally -- no subagents, no formal outputs, no workspace creation. When your questions naturally evolve into wanting to do something, DAAF will suggest switching to the appropriate mode.

For self-guided reading, the full user documentation suite is in user_reference/:

Understanding and Working with DAAF ↗ -- how DAAF thinks, decides, and collaborates
Best Practices ↗ -- tips for getting the most out of your sessions

How do I update DAAF to the latest version?

The easiest way is the DAAF Control Panel: from your daaf-docker folder, run bash daaf.sh (macOS/Linux) or .\daaf.ps1 (Windows) and choose 9) Check for Updates. It checks for available updates, shows you what's new, and handles the update safely -- including detecting and helping resolve any local edits you've made to DAAF's files. (If you'd rather run it directly, the same thing happens with bash update_daaf.sh or .\update_daaf.ps1.)

Your research files and project data are never affected by an update. When an update changes the underlying environment (for example, a new Python or R package), the updater detects it and offers to rebuild the container for you. Your research data is kept separate and stays safe through a rebuild.

Note for people upgrading from much older versions: if you installed a 2.1.x release, run the updater twice; if your install predates 2.1, run the one-time migrate_daaf script first. The updater will tell you if either case applies to you.

How do I back up my research files?

The easiest way is the DAAF Control Panel: from your daaf-docker folder, run bash daaf.sh (macOS/Linux) or .\daaf.ps1 (Windows) and choose 7) Create Backup. (To run it directly instead, use bash backup_daaf.sh or .\backup_daaf.ps1.) This copies your entire research directory to a timestamped folder on your computer inside daaf-docker/. We recommend backing up before major updates or at the end of important analysis sessions. The backup is self-verifying -- it checks free disk space first and compares file counts and sizes so you know the copy completed cleanly.
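DAAF's backup script already runs its own verification, so you don't need to do anything extra. The bash sketch below only illustrates the idea of comparing file counts and sizes, which can be handy if you copy a folder by hand (for example with docker compose cp). It is not DAAF's actual implementation; `tree_summary` and `copies_match` are hypothetical names, and the sketch assumes file names without embedded line breaks.

```shell
# Hypothetical sketch: summarize a folder as "<file count> <total bytes>".
# Hidden files and folders are included, since find does not skip them.
tree_summary() {
  count=$(find "$1" -type f | wc -l | tr -d ' ')
  # Concatenate every file's contents and count the bytes
  bytes=$(find "$1" -type f -exec cat {} + | wc -c | tr -d ' ')
  echo "$count $bytes"
}

# Succeeds if the original and the copy have the same count and total size.
copies_match() {
  [ "$(tree_summary "$1")" = "$(tree_summary "$2")" ]
}

# Usage: copies_match ./original-folder ./copied-folder && echo "copy looks complete"
```

Matching counts and sizes is a quick sanity check rather than proof that every byte is identical; a checksum comparison would be stricter but slower.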

⚠ A backup folder is sensitive -- store it privately. The backup also includes your Claude Code login and session history (kept in a hidden .daaf-claude-config/ subfolder), along with a couple of small manifest files that let a restore put file permissions and symbolic links back correctly (this is also what lets backups complete cleanly on Windows). Treat the whole backup folder as private, just as you would your login credentials.

You can also use the browser file manager (Control Panel option 2, or bash run_vscode.sh) to browse and download individual files.

Setup and Settings

Billing options, model choices, privacy, engagement modes, and other configuration decisions.

Can I run DAAF without Docker?

Technically yes, but it's not recommended and we can't provide support for it. Docker is what makes DAAF safe and reliable, for three important reasons:

Security. DAAF lets an AI write and run code on your computer. Docker creates a walled-off space where that code can't accidentally access your personal files, install unwanted software, or make system changes. Without Docker, Claude would run with your full computer permissions.
Reproducibility. Docker ensures that Python, R, all their data science libraries, and Claude Code are installed in exactly the same configuration every time. This means your analyses will produce the same results on any computer.
Easy recovery. If something goes wrong, you can tear down the DAAF environment and rebuild it from scratch in minutes. Your research data is kept separate and stays safe.

If you're experienced enough to want to run DAAF without Docker, you likely have the skills to adapt the setup yourself -- but the project can't offer troubleshooting help for non-Docker installations.

Should I use an API key or a Max subscription?

To use DAAF, you need a way to pay for Claude's AI processing. (DAAF itself is free and open-source -- the cost is for the AI service it runs on.) Think of it like a phone: the phone is free, but you need a phone plan. There are three main "plans":

A Max subscription is strongly recommended ($100 or $200/mo). DAAF is extremely usage-intensive by design; in real-world testing, API billing has cost roughly 10x more than a Max subscription. A single full-pipeline analysis can easily cost $50-100+ via the API, while the Max plan covers that at a flat monthly rate.

Factor	API Key	Max Subscription
What is it?	A secret code (like a password) that bills your Anthropic account per use. You get one from console.anthropic.com ↗	A monthly subscription to Anthropic's Claude service. Sign up at claude.ai ↗
Cost model	Pay per token (uncapped)	Flat monthly ($100-200/mo)
Cost predictability	Variable, can spike	Fixed
Usage limits	Unlimited (if paying)	Subject to plan tier limits
Rate limiting	Minimal	May hit limits during heavy sessions
Best for	Light/occasional use, organizational budgets	Regular DAAF usage (recommended)

Third option: OpenRouter -- pay-per-token access to Claude models with no monthly commitment (5.5% fee on credit purchases). Good for testing before committing to a subscription.

One thing to note: the Max plan does have usage limits per time window. If you're running several analyses in parallel (which you absolutely can!), you may occasionally hit a rate limit and need to wait a bit. The API key doesn't have that issue, but the cost adds up fast.

Which Claude model should I use?

DAAF ships with Opus 4.8 (with its 1-million-token context window) as the default, and staying on the default is a strong choice -- but it's far from the only good one. Empirical benchmarking across 20 models (DAAFBench) has produced clear, data-backed guidance:

Deepest reasoning -- Opus 4.8: the deepest analytical reasoning of the benchmarked set, best for complex methodology and nuanced judgment calls.
Best value -- Sonnet 4.6 or Sonnet 5: these match the Opus line on DAAF's orchestration benchmarks at a fraction of the cost, and are an excellent choice for most DAAF work.
Best without an Anthropic subscription -- GLM 5.2 (via OpenRouter): an open-weight model that lands roughly on par with the Opus line on orchestration at about 33% of the cost.
Budget-friendly -- DeepSeek V4 Flash (via OpenRouter): solid mid-tier performance at about 3% of flagship cost, worth exploring for less complex tasks.
Not recommended -- Haiku: adequate on basic questions but struggles with DAAF's multi-step protocols and skill routing.

One important caveat about these benchmarks: they measure how well a model follows DAAF's protocols -- routing requests, dispatching agents, loading the right skills -- not analytical reasoning depth or code quality directly. Opus may still have an edge on the hardest analytical work, but the gap between the top models is much smaller than previously assumed. DAAFBench: Choosing Your Model has the full per-model breakdowns; check the current results there before experimenting with cheaper or open-weight alternatives. Whichever model you pick, the High thinking level is recommended for research work.

How do I change the Claude model during a session?

Type /model in the Claude Code chat window. You'll see a list of available models -- use the up/down arrow keys to highlight the one you want and press Enter. The change takes effect immediately for everything after that point in your session.

Adjusting the thinking level: While a model is highlighted in the model selector, press the left and right arrow keys to change its thinking level (Low, Medium, High). Higher thinking means Claude takes more time to reason through complex problems before responding -- it produces better results but uses more of your subscription allowance. High is recommended for DAAF's research work.

Can I use DAAF with a different AI provider (OpenAI, Google, etc.)?

Yes -- DAAF gives you real choice here, and the alternative routes are supported pathways under active, extensive testing. There are a few things this question can mean, so let's take them in turn.

Different models through OpenRouter. OpenRouter is a model gateway that lets you reach models from many providers through a single pay-per-token API key, and it's already configured as Option C in DAAF's setup. Through it you can reach Anthropic's Claude models without a Max subscription, and also non-Anthropic models that perform well with DAAF. Our benchmarking (DAAFBench, across 2,799 runs on non-Anthropic models) found the standout is GLM 5.2, an open-weight model that lands roughly on par with the Opus line on orchestration at about 33% of the cost; DeepSeek V4 Flash performs appreciably worse but at a much lower price (~3% of Opus). One caveat: extended thinking, which DAAF uses heavily with Anthropic models, isn't available for non-Anthropic models through OpenRouter; those models rely on their own native reasoning instead.

OpenAI GPT models and ChatGPT. Running DAAF on OpenAI's GPT models works and has been validated -- see the two dedicated entries below (Can I run DAAF on OpenAI GPT models? and Can I use my ChatGPT subscription instead of an OpenAI API key?) for how each route works and its honest caveats.

Porting DAAF to a different tool entirely. This is also possible but takes more effort. Most of what DAAF is (the agent protocols, skill documents, workflow definitions, and validation checkpoints) is just structured Markdown, and none of it is Anthropic-specific, so it would transfer to another agent harness immediately. What would need adaptation are the parts tied to Claude Code: the hooks system (.claude/hooks/, the safety guardrails), the permission configuration (.claude/settings.json), and some tool-invocation patterns. We'd genuinely be thrilled if someone forked DAAF for another harness; the more researchers with access to rigorous AI-assisted analysis tooling, the better. If you're running DAAF with non-default models, please share your experience ↗ so we can keep refining this guidance.

Is my data sent to Anthropic? What about privacy?

This requires a nuanced answer. Here's how data actually flows:

All data analysis and computation happens directly on your machine. Your datasets live inside the Docker container on your local hardware. Scripts run locally, outputs are written locally, and there is no mechanism by which Claude Code sends entire datasets outside of your machine.
However, analytical outputs are inevitably sent to Anthropic. In the process of conducting data analysis, DAAF runs diagnostics (like examining individual table rows), statistical tests, data visualizations, report summaries, and so on. Because of the way chats with Claude in Claude Code work, these analytical outputs -- small "chunks" of your data in the form of results, sample rows, and summaries -- are sent to Anthropic's servers as part of the conversation. This is how AI-assisted analysis works at a fundamental level, because Claude needs to see information about the data itself to make good decisions about the analytical code.
DAAF enforces additional safety guardrails. Hooks and permission rules prevent Claude from uploading or exfiltrating data files, and the framework blocks reading, writing, or committing credential-like files (.env, *.pem, *.key, environment_settings*). The Docker container runs in a locked-down environment with restricted permissions, containing any unexpected behavior to the DAAF workspace. You can verify all of this yourself by reading the hook scripts in .claude/hooks/.
Your data privacy and security exposure with Anthropic depends on your specific license and access method. Certain Enterprise agreements with Anthropic provide stronger data handling assurances, including FERPA and HIPAA compliance. Access through cloud platforms like AWS Bedrock or Google Vertex AI can offer additional data governance controls that keep data within your organization's cloud infrastructure. An important honesty note: DAAF ships configuration templates for the Bedrock and Vertex routes (in its environment_settings_example.txt file), but its maintainers haven't been able to validate those two routes end-to-end themselves. So if your organization already runs on one of them, treat DAAF's support as a starting template you'll need to stand up and test in your own environment (and reports back are welcome). Under Anthropic's standard policy, API inputs and outputs are not used for model training, but the specifics depend on your exact plan, agreement, and access method; verify Anthropic's current policies ↗ yourself.
OpenRouter adds an additional hop. If you use OpenRouter instead of a direct Anthropic connection, your analytical output transits through OpenRouter's servers in addition to the underlying model provider. Review OpenRouter's privacy policy ↗ alongside Anthropic's if you choose that route.

Bottom line: You need to be fully aware of and take ownership of exploring and understanding these nuances for your use case before using DAAF with any private, proprietary, or otherwise protected non-public data. If you work with education records (FERPA), health data (HIPAA), or any regulated data, consult your IT team and legal counsel about the appropriate access method for your compliance requirements. DAAF provides strong local safety guarantees, but the analytical conversation with Claude inevitably involves data exposure to the model provider's infrastructure -- and the terms of that exposure are between you and that provider. If your data genuinely can't leave your environment at all, see Can I use DAAF with data that can't leave my secure environment? in the Data Access section for DAAF's built-in synthetic-data workflow.

Is there a free way to use DAAF?

Not practically for full research analyses. DAAF is very usage-intensive -- a single complete analysis involves hundreds of AI interactions across multiple agents. The free tier and the $20/mo Pro tier of Claude don't provide nearly enough usage for this. You could use them for quick questions in User Support mode or simple Data Lookup requests, but a full analysis pipeline would exhaust the allowance very quickly.

More flexible and affordable billing via OpenRouter. OpenRouter offers pay-per-token access with no monthly commitment (5.5% fee on credit purchases), and -- critically -- access to high-performing open-weight models at a fraction of Anthropic's pricing. GLM 5.2 benchmarks competitively with the Opus line at roughly 33% of the cost, and DeepSeek V4 Flash offers passable mid-tier performance at roughly 3% of flagship cost. In concrete terms: a full pipeline analysis that might cost $50+ with Opus via the API could cost under $15 with GLM 5.2 through OpenRouter, or less than $1 with DeepSeek V4 Flash. That makes DAAF substantially more accessible than it was even a few months ago.

Cost remains a meaningful barrier to entry, but it's shrinking as open-weight models improve and inference costs fall. If you're running DAAF with non-default models, please share your experience -- community feedback on the quality-cost frontier directly informs the guidance here and in the DAAFBench results.

How much disk space does DAAF use?

The DAAF image is roughly 8.6 GB after building (the exact size varies with your Docker version and platform). It includes an Ubuntu 24.04 base, Python 3.12 with 46 pinned data science packages (statistics, geospatial, econometrics, visualization, machine learning), the geospatial system libraries (GDAL/GEOS/PROJ), Claude Code, R with 60+ pinned packages (tidyverse, fixest, survey, sf, and more), and the Quarto notebook tool. The R runtime, its packages, and Quarto account for roughly 2 GB of that total. Docker also keeps build-cache layers, so total Docker disk usage may be somewhat higher.

Beyond the image, your workspace grows as you create projects. Each research project accumulates scripts, data files, notebooks, and reports -- typically 50-500 MB per project depending on dataset sizes.

To check how much space Docker is using: Open Docker Desktop and look at the Images and Volumes sections, or run docker system df in your terminal for a summary.

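To reclaim unused space: Run docker system prune in your terminal. This cleans up old, unused data that Docker has accumulated (stopped containers, unused networks, dangling images, and build cache). It will ask for confirmation before deleting anything. By default it does not remove volumes, so don't add the --volumes option.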

⚠ Important: Do not delete the item named daaf_daaf-data in Docker Desktop's Volumes section -- that contains all your research files and project work. Everything else can be safely cleaned up.

Can I use DAAF offline?

No -- an internet connection is required for two reasons:

AI processing happens online. When Claude analyzes your data, writes code, or answers questions, it communicates with Anthropic's servers over the internet. Without a connection, no AI interactions are possible.
Data fetching needs internet. DAAF downloads public datasets (like education data from the Urban Institute) directly from their online portals.

If you lose your connection mid-session, don't worry -- your work-in-progress files and session state are preserved in the Docker volume. Nothing is lost. Once you reconnect, you can resume right where you left off. If DAAF was in the middle of a complex pipeline, it may ask you to restart from a recent checkpoint.

Why are the notebook and log viewer ports bound to localhost only?

When you view notebooks, session logs, or the code editor through your browser, DAAF is running a small web server inside its container. By default, this server is configured so that only your own computer can access it -- other devices on your WiFi or office network cannot.

Why this matters: The notebook viewer and code editor are powerful tools that can execute code. If they were accessible to anyone on your network, someone else could potentially run commands inside your DAAF environment. Restricting access to your computer only is a security precaution.

In practice, this means you access these tools by typing localhost:2718 (notebooks), localhost:2719 (logs), or localhost:2720 (editor) in your web browser. This is the normal, expected behavior.

If you need to change this (for example, to access DAAF from a tablet on the same network), you can edit the docker-compose.yml file in the daaf-docker folder. Find the ports: section and remove 127.0.0.1: from the beginning of each port line. Then run docker compose down and docker compose up -d from the daaf-docker folder so the container is recreated with the new setting. Only do this on trusted networks.
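If you want to preview that edit before saving it, the bash sketch below prints docker-compose.yml with the 127.0.0.1: prefix removed from quoted port mappings. It assumes the mappings are written in the quoted "127.0.0.1:2718:2718" form shown in the port-conflict answer; `open_ports_to_network` is a hypothetical helper, and it only prints, leaving the original file untouched.

```shell
# Hypothetical helper: print the compose file with "127.0.0.1:" stripped from
# quoted port mappings such as "127.0.0.1:2718:2718". Nothing is written to disk.
open_ports_to_network() {
  sed 's/"127\.0\.0\.1:\([0-9][0-9]*:[0-9][0-9]*\)"/"\1"/g' "${1:-docker-compose.yml}"
}

# Usage, from the daaf-docker folder:
#   open_ports_to_network > docker-compose.preview.yml   # review, then replace by hand
```

Keeping this as a preview is deliberate: removing the localhost binding widens who can reach the notebook viewer and editor, so it's worth reading the result before you apply it.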

What are engagement modes and how do I choose one?

DAAF has 9 different "modes" -- think of them as different types of conversations you can have. You don't need to memorize them because you don't choose modes manually. Just describe what you want to do in plain language, and DAAF automatically selects the right mode. For example:

"I want to analyze education spending trends in Virginia" → DAAF starts Full Pipeline
"What data is available on school discipline?" → DAAF starts Data Discovery
"Can you help me debug this script?" → DAAF starts Ad Hoc Collaboration
"What does DAAF stand for?" → DAAF starts User Support
"I have a CSV file I'd like DAAF to learn about" → DAAF starts Data Onboarding

For a complete list of all 9 modes and what they do, see Understanding DAAF -- The Nine Engagement Modes.

What are the /config and /model commands I keep seeing referenced?

These are commands you type directly into the Claude Code chat window (not your regular terminal). They start with a forward slash (/) and configure how Claude Code behaves:

/config -- Opens Claude Code's settings menu. The two important settings for DAAF are "Auto-compact" (set to False) and "Verbose output" (set to True). You only need to do this once.
/model -- Opens the model selector. Use arrow keys to pick your model (Opus 4.8, the default, is recommended) and press Enter.
/clear -- Resets the conversation, giving Claude fresh memory. Your files and data are not affected.
/exit -- Ends the Claude Code session.
/status -- Shows your current connection and model information.

These slash commands only work inside Claude Code's chat interface. They won't work in your regular terminal or PowerShell window.

Can I run DAAF on OpenAI GPT models?

Yes -- running DAAF on OpenAI's GPT models is a supported pathway under active, extensive testing, and it has been validated live. The goal is accessibility: you shouldn't need an Anthropic subscription to use DAAF. GPT runs the full DAAF stack -- multi-step tool loops, subagent dispatch, and two-tier model routing. There are two ways in:

Via OpenRouter (no rebuild): point your existing OpenRouter setup (Option C) at GPT model names like openai/gpt-5.6-sol (the strong tier) or openai/gpt-5.6-terra (the faster tier). This is config-only -- just environment variables in your environment_settings.txt file.
Via the DAAF provider shim (direct OpenAI API): DAAF bundles a small local "provider shim" that translates between Claude Code and OpenAI's API. Set DAAF_PROVIDER_SHIM=openai and your OPENAI_API_KEY, then point Claude Code at the shim. This route requires one image rebuild, because the shim starts automatically inside the container.

Two honest caveats. First, because DAAF ships with Claude as the default, a GPT session opens on a Claude model until you switch it -- just run /model after launch and pick your GPT model (if you forget, the first message fails with a loud error, so there's no silent wrong-model risk). Second, Anthropic doesn't officially support routing Claude Code to non-Claude models, and OpenRouter's Anthropic-compatible endpoint is officially scoped to Claude models -- GPT works through it in practice, but that's territory a vendor could change. Full step-by-step setup lives in the installation guide.

Can I use my ChatGPT subscription instead of an OpenAI API key?

Yes -- through a supported pathway under active, extensive testing. In plain terms: instead of paying per token for OpenAI API access, you reuse the ChatGPT subscription you already have. DAAF's provider shim has an alternate mode (SHIM_BACKEND_MODE=chatgpt) that routes Claude Code through your ChatGPT subscription's Codex backend using a one-time device login (the Codex tool ships in every DAAF image). Because this is the newest route, it especially benefits from your reports if you hit anything rough.

Two things to know up front:

Terms of service. This lane works through a backend interface that OpenAI doesn't officially offer for third-party tools -- OpenAI scopes subscription usage to its own official apps -- so OpenAI could change it, and you are responsible for compliance with OpenAI's terms of service. If you'd rather stay on an interface OpenAI officially offers, use the API-key route (the GPT-models question above).
A lower context ceiling. The ChatGPT/Codex backend enforces a much lower effective input limit than the model's full window -- measured at about 370,000 tokens for gpt-5.6-sol (compared with the model's 1-million-token window on the API route). This is a backend limit you can't raise from your side; on this lane you set CLAUDE_CODE_MAX_CONTEXT_TOKENS=370000 so DAAF's context tracking matches the real ceiling.

Full step-by-step setup (including enabling device-code login in your ChatGPT security settings first, which is off by default and the most common thing to miss) is in the installation guide.

How do I download a whole folder from the container?

Downloading a single file from the browser editor is easy -- right-click it in the explorer sidebar and choose Download, in any browser. A whole folder takes one extra step, because browsers don't have a built-in "download this folder" button the way they do for single files.

The reliable, works-everywhere method is to zip the folder first: right-click the folder, choose Compress → zip, then right-click the new .zip file that appears next to the folder and choose Download. You end up with one archive on your computer that you can unzip normally. When compressing, use the zip, tar, or tgz options only -- the bz2 and 7z choices won't work in the container. The Compress menu is built in, so there's nothing to install.
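If you're comfortable with a terminal inside the container, you can also build the archive yourself with Python's standard library. You would then download the single .zip from the editor as usual. This is a minimal sketch and an alternative route. It is not what the Compress menu does internally, and the folder path in the comment is a hypothetical placeholder.

```python
import shutil
from pathlib import Path


def zip_folder(folder: str) -> Path:
    """Create <folder>.zip next to the folder and return the archive's path."""
    src = Path(folder).resolve()
    # make_archive appends ".zip" to base_name on its own
    archive = shutil.make_archive(
        base_name=str(src),   # output path without the extension
        format="zip",
        root_dir=src.parent,  # paths inside the zip are relative to the parent...
        base_dir=src.name,    # ...so the archive contains the folder itself
    )
    return Path(archive)


# Hypothetical path: replace with the folder you actually want to download.
# print(zip_folder("/daaf/research/my_project/output"))
```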

If you use Chrome or Edge, there's also a shortcut: right-click a folder and choose Download directly -- your browser copies the files into a location you pick (you get the files, not a single zip). This shortcut only works in Chrome and Edge; in Firefox or Safari, use the Compress → zip → Download method above.

If you don't see a Compress option, your container predates the feature -- update DAAF and rebuild once to pick it up. For a full backup of everything, the Control Panel's Create Backup option is still the simplest route.

Common Errors

Error messages you might encounter during DAAF sessions and what they mean.

"STOP: Suppression rate >50%"

This means more than half the data values in an important column have been hidden (or "suppressed") by the data publisher to protect people's privacy. This is common in education data -- when a group is very small (for example, fewer than 10 students of a certain race in a school district), the exact number is withheld so individual students can't be identified.

When more than 50% of the data is suppressed, the remaining values aren't reliable enough to draw meaningful conclusions. DAAF stops rather than producing a misleading analysis.
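The stopping rule boils down to a ratio: withheld values divided by all values in the column. Here is a minimal Python sketch. It assumes suppressed cells have already been converted to `None`; real publishers use a variety of codes for withheld values. The function names are illustrative, not DAAF's own code.

```python
SUPPRESSED = None  # assumption: withheld cells were converted to None beforehand


def suppression_rate(values: list) -> float:
    """Fraction of a column's entries that the publisher withheld."""
    if not values:
        return 0.0
    withheld = sum(1 for v in values if v is SUPPRESSED)
    return withheld / len(values)


def check_column(values: list, limit: float = 0.5) -> str:
    """Illustrative stop rule: halt when more than `limit` of the column is withheld."""
    rate = suppression_rate(values)
    if rate > limit:
        return f"STOP: Suppression rate >{limit:.0%} ({rate:.0%} suppressed)"
    return f"OK ({rate:.0%} suppressed)"


print(check_column([12, None, None, 40, None]))  # → STOP: Suppression rate >50% (60% suppressed)
```

A column that is exactly half suppressed passes under this sketch, matching the ">50%" wording of the message.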

What you can try:

Broader geography: Instead of individual school districts, try state-level or regional data (larger populations mean less suppression)
Less demographic detail: Instead of breaking down by specific race/ethnicity subgroups, try using broader categories or totals
Different years: Some years may have better coverage than others
Report the limitation: In some cases, the suppression itself is a meaningful finding -- it tells you something about the data landscape

The notebook won't render in my browser.

The easiest way to view notebooks is the DAAF Control Panel: run bash daaf.sh (or .\daaf.ps1 on Windows) from your daaf-docker folder and choose 4) View Marimo Notebooks (Python). (The same convenience script runs directly as bash view_notebooks.sh / .\view_notebooks.ps1.) This handles container startup, port binding, and flag configuration automatically, and includes built-in port conflict detection.

Using R instead of Python? R projects produce Quarto notebooks (.qmd files) rather than Marimo, and these render to a static HTML page -- view one with Control Panel option 5) View Quarto Notebooks (R) (or bash view_quarto.sh / .\view_quarto.ps1 directly).

The Control Panel is the recommended approach. The troubleshooting steps below apply only if you're running Marimo manually, which most users never need to do:

If you're using the manual marimo run command and can't see anything at http://localhost:2718, check these things in order:

Is the container running? Check Docker Desktop's Containers panel. The daaf container should show as running.
Did you include the right flags? The command needs --host 0.0.0.0 --port 2718 --headless for Docker.
Is the port mapped correctly? Check your docker-compose.yml -- the line "127.0.0.1:2718:2718" under ports: maps the container's port to your host machine.
Is something else using port 2718? The view_notebooks convenience script detects this automatically.
Try a different browser or incognito/private window. Occasionally, browser extensions or cached state can interfere.
Check for errors in the terminal. If marimo itself hit an error (e.g., a missing dependency or a syntax error in the notebook), the error will appear in the terminal where you ran the marimo run command.
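For the "Is something else using port 2718?" check, here is a minimal sketch you can run with Python on your host machine, not inside the container. It only reports whether something is listening on each port, not what that something is. If DAAF's own container is up and serving notebooks (2718), logs (2719), or the editor (2720), those ports will show as in use, which is expected.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if some program is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. something is listening
        return sock.connect_ex((host, port)) == 0


for port in (2718, 2719, 2720):
    status = "in use" if port_in_use(port) else "free"
    print(f"localhost:{port} is {status}")
```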

"Context utilization CRITICAL" and the session seems to stop

Not an error -- this is DAAF being responsible about Claude's working memory. Here's the concept: every message you and DAAF exchange, every file it reads, and every piece of code it runs takes up space in Claude's "memory" for the current session (called the "context window"). Think of it like a desk -- as you pile on more papers, it gets harder to find what you need. Even though the desk is very large (Claude can handle up to 1 million "tokens" -- roughly 750,000 words), work quality starts declining well before it's completely full.

DAAF enforces both percentage-based and absolute token thresholds. Whichever is crossed first sets the status:

Utilization	Status	What happens
< 40% and < 150k tokens	NOMINAL	Normal operations
≥ 40% or ≥ 150k tokens	ELEVATED	Works normally but delegates more to subagents
≥ 60% or ≥ 200k tokens	HIGH	Finishes current work, prepares session restart
≥ 75% or ≥ 250k tokens	CRITICAL	Stops new work, asks to restart session
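The "whichever fires first" rule can be expressed as a short function. This Python sketch mirrors the table above. It is not the code in DAAF's context-reporter hook. The 1,000,000-token default simply reflects the window size mentioned earlier.

```python
def context_status(tokens_used: int, max_tokens: int = 1_000_000) -> str:
    """Map context usage to the status levels in the table.

    Each level triggers on a percentage OR an absolute token count,
    whichever is crossed first.
    """
    pct = tokens_used / max_tokens
    # Checked from most to least severe: (status, percent limit, token limit)
    levels = [
        ("CRITICAL", 0.75, 250_000),
        ("HIGH", 0.60, 200_000),
        ("ELEVATED", 0.40, 150_000),
    ]
    for name, pct_limit, token_limit in levels:
        if pct >= pct_limit or tokens_used >= token_limit:
            return name
    return "NOMINAL"


print(context_status(120_000))  # → NOMINAL
print(context_status(210_000))  # → HIGH
```

With a 1-million-token window, the absolute limits always fire first, since 150,000 tokens is only 15% utilization. The percentage limits matter on smaller ceilings. Take the 370,000-token ceiling of the ChatGPT/Codex lane and assume utilization is measured against that ceiling. There, 40% works out to 148,000 tokens, so ELEVATED arrives slightly before the 150k absolute limit.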

Seeing CRITICAL means Claude's context is nearly full -- continuing would degrade work quality. DAAF would rather stop and restart cleanly than produce increasingly unreliable output.

Recovery steps:

Claude should have updated STATE.md with current progress and provided a restart prompt
Copy the restart prompt
Type /clear in the Claude Code chat window to reset the session. This is like clearing off the desk -- it gives Claude a fresh start with empty memory. All your files, scripts, and data are completely untouched -- only the conversation memory is reset.
Paste the restart prompt into the fresh session
Claude reads STATE.md and resumes exactly where it left off with a full fresh context window

Like saving your game before the battery dies -- the session state system was designed specifically for this.

Claude seems to have forgotten earlier instructions or decisions.

This is a known limitation of how AI language models work -- and it's the same reason you might lose track of details during a very long meeting. As a session gets longer, earlier information can get "pushed out" of Claude's active attention.

DAAF has several built-in mechanisms to handle this:

Context monitoring catches this proactively via the context-reporter hook, which tracks utilization and warns when quality may degrade.
STATE.md records all key decisions, checkpoint outcomes, and QA findings so they survive context pressure.
Plan.md serves as the methodology specification; STATE.md tracks execution progress, QA findings, and runtime state. Together they provide a recoverable record of the full analysis.
Session restart via /clear and the restart prompt in STATE.md gives Claude a fresh context window with all prior decisions preserved.

If you notice degradation, prompt Claude to re-read STATE.md and Plan.md, or restart the session.

Claude seems to be making things up about data variables or endpoints.

This phenomenon (sometimes called "hallucination" in AI contexts) is the most common -- and most important -- symptom to recognize. It happens when DAAF confidently states incorrect information about specific data details -- variable names, website addresses, or data coding schemes that sound right but don't match reality.

DAAF has extensive curated knowledge about supported data sources stored in skill files. When skills load correctly, agents access exact variable names, precise endpoint paths, correct coded values, and known caveats. When a skill doesn't load (which doesn't happen every time -- it's somewhat unpredictable), the agent falls back on general training data and fills gaps with plausible-sounding but potentially incorrect details.

What to do:

Make sure Verbose output is set to True in /config -- this is your primary tool for monitoring how agents decide which reference files to load
Ask DAAF to verify: "Double-check that variable name against the actual skill documentation" or "Did the agent load the CCD data source skill before writing that script?"
If it persists, try restarting with /clear -- a fresh context often resolves loading issues
Report persistent loading failures by opening an issue ↗ -- patterns help improve DAAF's loading reliability

For more detail, see Best Practices -- Monitoring DAAF's Internal Reference Loading ↗.

DAAF seems to be doing something I didn't ask for. How do I stop or redirect it?

This can happen when DAAF misclassifies your request into the wrong engagement mode, or interprets your question differently than you intended. You have full control at all times:

To interrupt immediately: Press Ctrl + C in the terminal. This applies on Mac too, where Cmd + C copies text instead of interrupting. This stops whatever Claude is currently doing. Your files and progress are safe -- nothing is lost.
To redirect: Just say what you actually wanted: "Actually, I just wanted a quick data lookup, not a full analysis" or "Hold on -- I want to change the approach." DAAF will adjust.
To start over: Type /clear to reset the session and start fresh. All your files remain intact.

DAAF is designed to check in with you at multiple points during longer workflows. At Full Pipeline checkpoints, you can review and adjust the direction before work continues.

Performance

How long things take, resource allocation, and running parallel analyses.

The analysis is taking a very long time. Is that normal?

Probably yes. A full-pipeline DAAF analysis is not a quick process -- by design. DAAF breaks every analysis into 12 stages across 5 phases. In data-heavy stages, every single script goes through an execute-then-review cycle (Claude writes the code, runs it, and then a completely separate Claude instance reads through the code line by line looking for mistakes -- like having a colleague review your work before you submit it).

Phase	What's happening	Typical duration
Phase 1 (Discovery)	Exploring data sources, deep documentation dives	5-15 minutes
Phase 2 (Planning)	Creating Plan.md and Plan_Tasks.md, validating	20-30 minutes
Phase 3 (Data Acquisition)	Fetching data, cleaning, QA per script	30-45 minutes
Phase 4 (Analysis)	Transformations, statistical analysis, visualizations, QA	60-90 minutes
Phase 5 (Synthesis)	Assembling notebook, writing report, final review	20-30 minutes

A typical full run takes roughly 2-3.5 hours of Claude's active processing time (the sum of the phase estimates above). Add to that any time you spend reviewing at phase boundaries.
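Adding up the phase estimates gives the overall range. Here is a trivial Python sketch. The numbers are the table's rough estimates, not measurements of any particular run.

```python
# Phase duration ranges in minutes, copied from the table above
phases = {
    "Discovery": (5, 15),
    "Planning": (20, 30),
    "Data Acquisition": (30, 45),
    "Analysis": (60, 90),
    "Synthesis": (20, 30),
}

low = sum(lo for lo, _ in phases.values())
high = sum(hi for _, hi in phases.values())
print(f"{low}-{high} minutes = {low / 60:.2f}-{high / 60:.2f} hours")
# → 135-210 minutes = 2.25-3.50 hours
```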

What makes things slower: more data sources (each needs a fetch/clean/QA cycle), complex joins across multiple datasets, QA revisions when the code-reviewer catches issues, rate limiting on Max subscriptions (Anthropic may temporarily slow down your requests during heavy use to manage server load), and network latency fetching data from the Urban Institute portal.

When to worry: if a single stage seems stuck for 20-30+ minutes with no progress, check whether Claude is waiting for your input at a checkpoint. If it's genuinely stuck, interrupt with Ctrl+C and ask Claude to check STATE.md and resume.

Can I allocate more resources to the Docker container?

Usually unnecessary. The AI processing (Claude thinking and responding) happens on Anthropic's servers, not on your computer -- so your computer's speed mainly affects data processing, not AI interactions.

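If you're working with very large datasets and notice slowness during data processing (not during Claude's responses), you can give Docker more memory. Open Docker Desktop, go to Settings (gear icon) → Resources, and increase the memory allocation. On Windows with the WSL 2 backend, Docker Desktop doesn't offer this memory setting; the limit is set in a .wslconfig file in your Windows user folder instead. For most analyses, the defaults work fine.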

Can I run DAAF analyses in parallel?

Yes! You can work on multiple research projects at the same time. Open a new terminal window (or tab) and start another DAAF session -- each session works independently with its own project folder.

Cost note: Each parallel session draws on your Anthropic subscription allowance independently. Running two analyses at the same time therefore uses up your allowance roughly twice as fast as running one at a time. On a Max subscription, this means you may hit the usage limit sooner during heavy parallel work.

Data Access

Working with built-in education data sources and bringing your own data into DAAF.

The assistant says data is unavailable or returns empty results.

There are several possible reasons, and it's usually not a DAAF problem -- it's a data availability issue:

The data may not exist for what you asked. Not every dataset covers every year, state, or variable. For example, some education metrics aren't available before 2010, or certain demographic breakdowns aren't collected in every state. DAAF will usually tell you what's available during the Discovery phase.
The data source might be temporarily down. DAAF fetches data from external portals (like the Urban Institute Education Data Portal ↗). If the portal is experiencing issues, data requests will fail. Try again later.
The request might be too specific. Sometimes narrowing your search too much (e.g., a very specific school district + a specific race/ethnicity subgroup + a specific year) results in no matching data. Try broadening your geographic area, removing one filter, or using a range of years instead of a single year.
The endpoint or filters are wrong. Occasionally, the assistant may construct a query that doesn't quite match the API's expected parameters. If you suspect this, check the session logs to see the exact query that was attempted, and compare it against the Education Data Portal documentation ↗.

Best first step: Use Data Discovery Mode before starting a full analysis. Tell DAAF something like: "I want to explore what data is available on [your topic] for [your geography/years]." This lets you see what's available before committing to a full pipeline run.

I'm getting a "KeyError: HARVARD_DATAVERSE_API_KEY" error when fetching election data.

Unlike most of the education data DAAF uses (which is freely available), election data is hosted on Harvard Dataverse, a platform that requires a free account and a personal access key to download files. This is their policy, not a DAAF limitation.

To fix this:

Create a free account at dataverse.harvard.edu ↗
Log in, click your account name in the top-right corner, then select API Token from the dropdown menu, then click Create Token. Copy the token that appears -- you'll need it in the next step.
Add the key to the environment_settings.txt file in your daaf-docker/ folder on the host:
- HARVARD_DATAVERSE_API_KEY=your_token_here
If you don't have an environment_settings.txt file yet, copy the template first:
- macOS/Linux: cp environment_settings_example.txt environment_settings.txt
- Windows: Copy-Item environment_settings_example.txt environment_settings.txt
Recreate the container: docker compose down then bash run_daaf.sh (or .\run_daaf.ps1 on Windows)

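Alternatively, you can set it manually inside the container before launching Claude Code: export HARVARD_DATAVERSE_API_KEY="your_token_here". This lasts only for that shell session, so the environment_settings.txt route above is the permanent fix.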
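The error text most likely comes from Python itself. Looking up a missing name with `os.environ["NAME"]` raises `KeyError: 'NAME'`. A check like the one below confirms, from inside the container, whether the token actually reached the environment. The helper name is hypothetical and this is not DAAF's fetch code. It deliberately reports only whether the key is set, never the key itself.

```python
import os


def get_dataverse_key() -> str:
    """Return the Harvard Dataverse token, or fail with a message that says what to do."""
    key = os.environ.get("HARVARD_DATAVERSE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "HARVARD_DATAVERSE_API_KEY is not set: add it to environment_settings.txt "
            "and recreate the container, or export it in this shell."
        )
    return key


# Report only whether the key is present; never print the token itself.
print("Dataverse key is", "set" if os.environ.get("HARVARD_DATAVERSE_API_KEY") else "missing")
```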

How current is the education data?

Education data always has a publication lag -- it takes time for data to be collected, cleaned, and published. Here are rough timelines for the major sources DAAF uses:

CCD (Common Core of Data -- public school data): 1-2 years behind
IPEDS (Integrated Postsecondary Education Data -- college/university data): 1-2 years behind
CRDC (Civil Rights Data Collection -- school discipline, access data): 2-3 years behind
College Scorecard (graduation rates, earnings data): 1-2 years behind, though earnings data can lag further
EdFacts (achievement and assessment data): 1-2 years behind

You don't need to memorize these -- DAAF automatically checks what years are available during the Discovery phase and will tell you if the data you're looking for hasn't been published yet.

Can I use my own data files instead of the built-in sources?

Yes! DAAF has a dedicated Data Onboarding Mode specifically for this. Here's how it works:

Get your file into DAAF. The easiest way is the browser-based file manager. Open the DAAF Control Panel (bash daaf.sh, or .\daaf.ps1 on Windows) and choose 2) Browse Files (VS Code), or run bash run_vscode.sh / .\run_vscode.ps1 directly. Then drag and drop your file into the DAAF workspace. You can also copy it from the terminal with docker compose cp ./yourfile.csv daaf-docker:/daaf/.
Tell DAAF about it. Start a session and say something like: "I have a CSV file called enrollment_data.csv that I'd like to onboard. Can you help me profile it?" DAAF will switch to Data Onboarding Mode automatically.
DAAF analyzes your data. It will examine the structure, statistics, relationships, and quality of your dataset, then create a reference document that future analyses can use -- so DAAF "remembers" what your data contains and how to work with it.

⚠ Privacy reminder: While your dataset files stay on your local machine, analytical outputs (sample rows, statistics, summaries) will be sent to Anthropic's servers as part of the Claude Code conversation. For sensitive, personally identifiable, or regulated data (like student records protected by FERPA, or HIPAA-related health data), consult your organization's data policies and review your Anthropic license terms before proceeding. See "Is my data sent to Anthropic?" in the Setup and Settings section above for the full picture -- and if the data genuinely can't leave your environment, see the next question.

Can I use DAAF with data that can't leave my secure environment?

Yes -- this is exactly the scenario DAAF's built-in synthetic-data workflow was designed for. It's the default approach for data that is sensitive, proprietary, personally identifiable, HIPAA/FERPA-governed, or locked in a secure enclave. The core idea is simple: your real data never enters the DAAF container at all. Here's the flow:

You profile the data locally. DAAF hands you a small, self-contained profiling script (it has no DAAF or container dependencies) that you run yourself, wherever the sensitive data actually lives -- your laptop, your enclave, your locked-down VM.
You review everything before sharing. The script produces a plain, human-readable summary. You read it and confirm you're comfortable with every number in it before anything leaves your environment. Nothing crosses the boundary until you say so.
DAAF builds a synthetic stand-in. You bring only that summary into the container. From the summary alone -- never the real data -- DAAF generates a synthetic dataset shaped like your real one (right columns, right types, plausible distributions) and writes a reusable reference for it.
You develop all your analysis code against the synthetic data. Write it, debug it, dry-run it -- the synthetic data behaves enough like the real thing to build a complete, working pipeline.
You finalize results against the real data. When the code is finished and vetted, you run it yourself against the real data, in its own secure environment, to get the actual numbers.

You choose how much the profiling step is allowed to measure, using a four-tier disclosure ladder -- pick the lowest tier that still lets you build your code:

Tier 1 (schema): only column names, data types, and the row count -- no values, no statistics.
Tier 2 (marginals, the default): per-column summaries such as category levels (small groups suppressed), numeric percentiles, and missingness rates -- but never raw minimums/maximums or example values.
Tier 3 (relationships): everything in Tier 2 plus how columns relate to each other (correlations, cross-tabs with small cells suppressed), so the synthetic data reproduces those relationships too.
Tier 4 (local high-fidelity synthesis): for the highest fidelity, you run a synthesizer locally that learns from the real data inside your environment, and only the resulting synthetic rows cross the boundary -- the real data and the fitted model both stay put.
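To make Tier 2 concrete, here is a minimal Python sketch of the kind of per-column summary it describes: missingness rates, category levels with small groups suppressed, and percentiles, but no minimum, maximum, or example values. This is not DAAF's profiling script. The `MIN_CELL` threshold of 10 and the function names are assumptions for illustration, and your own disclosure rules may demand stricter handling.

```python
import statistics
from collections import Counter

MIN_CELL = 10  # assumed small-group threshold; your disclosure rules may differ


def profile_categorical(values: list) -> dict:
    """Tier-2-style summary of a categorical column (illustrative only)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "missing_rate": 1 - len(present) / len(values),
        # Only levels with at least MIN_CELL rows are named; small groups are
        # dropped and merely counted, so their labels never leave.
        "levels": {k: n for k, n in counts.items() if n >= MIN_CELL},
        "suppressed_levels": sum(1 for n in counts.values() if n < MIN_CELL),
    }


def profile_numeric(values: list) -> dict:
    """Tier-2-style summary of a numeric column: quartiles, never min/max."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)  # needs at least 2 values
    return {
        "missing_rate": 1 - len(present) / len(values),
        "p25": q1,
        "p50": median,
        "p75": q3,
    }


print(profile_categorical(["A"] * 12 + ["B"] * 3 + [None] * 5))
```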

⚠ One important caveat about what this protects. The synthetic-data workflow is a code-development scaffold, not an analytic substitute and not a formal privacy guarantee. The disclosure tiers are careful engineering that meaningfully limits how much leaves your machine, and lower tiers leave very little -- but they can't make the judgment call about whether a given summary is safe for your data under your rules. That judgment stays with you, which is why your review of the summary in step 2 is the real safeguard. Your data-governance, disclosure-review, and legal obligations remain your own to meet; the workflow reduces exposure, it doesn't adjudicate it.

What if you have enterprise-grade protections? If your organization already has stronger guarantees in place -- an Anthropic Enterprise agreement, AWS Bedrock or Google Vertex AI governance, or an institutional secure enclave with an approved model-access path -- you may be able to point DAAF at those environments and work with the data directly, no synthetic stand-in required. DAAF provides configuration templates for the AWS Bedrock and Google Vertex AI routes, but its maintainers haven't validated those routes end-to-end themselves, so treat them as a starting point you'll need to stand up and test in your own environment (and it's typically an involved setup that needs your IT department). The synthetic workflow is the recommended starting point because it works without any of that.

Session Logs and Diagnostics

Where logs live, how to view them visually, and how to use them for debugging.

Where are session logs stored?

Every time you use DAAF, it automatically saves a complete record of the session -- what Claude did, what files it created or modified, what code it ran, and what the results were. Think of these as detailed receipts for every session. They're saved in .claude/logs/sessions/ in multiple formats:

Format	File Pattern	Purpose
Markdown	`YYYY-MM-DD_HH-MM-SS_<session-id>_orchestrator.md`	A readable summary you can open in any text editor -- shows everything Claude did, step by step
JSONL	`YYYY-MM-DD_HH-MM-SS_<session-id>_orchestrator.jsonl`	A detailed data file for advanced debugging -- you generally won't need to open this directly
Subagent JSONL	`YYYY-MM-DD_HH-MM-SS_<session-id>_subagent_<agent-id>.jsonl`	Separate logs for each specialized AI agent that was called during the session

The orchestrator Markdown archive includes a Subagent Activity summary table listing each subagent's type, duration, tool uses, and final-message excerpt. Additionally, .claude/logs/activity.log records a timestamped entry every session start for a quick usage history overview. All logs are gitignored by default -- they stay local, never pushed to a repository.
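Every filename starts with a sortable timestamp followed by the session ID. That makes it easy to group or order logs yourself, for example to match a subagent log to its orchestrator. Here is a minimal Python sketch built only from the patterns in the table. The IDs in the example are invented, and DAAF's real session and agent IDs may look different.

```python
import re
from datetime import datetime

# Built from the file patterns in the table above, e.g. (made-up IDs):
#   2025-01-31_14-05-09_abc123_orchestrator.md
#   2025-01-31_14-05-09_abc123_subagent_agent7.jsonl
LOG_NAME = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_(?P<session>.+?)_"
    r"(?:orchestrator|subagent_(?P<agent>.+))\.(?P<ext>md|jsonl)$"
)


def parse_log_name(name: str):
    """Split a session log filename into its parts, or return None if it doesn't match."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    return {
        "started": datetime.strptime(m["ts"], "%Y-%m-%d_%H-%M-%S"),
        "session": m["session"],
        "agent": m["agent"],  # None for orchestrator logs
        "ext": m["ext"],
    }


names = [
    "2025-01-31_14-05-09_abc123_subagent_agent7.jsonl",
    "2025-01-31_14-05-09_abc123_orchestrator.md",
]
for info in sorted(filter(None, map(parse_log_name, names)), key=lambda d: d["started"]):
    print(info["started"], info["session"], info["agent"] or "orchestrator", info["ext"])
```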

Easiest way to view logs: Use the DAAF Control Panel -- run bash daaf.sh (or .\daaf.ps1 on Windows) from the daaf-docker folder and choose 3) View Session Logs (this runs view_logs.sh / .\view_logs.ps1 for you). It opens a visual timeline in your web browser that's much easier to navigate than reading the raw files.

Your logs stay completely private -- they're stored only on your computer and are never uploaded or shared automatically.

What happens to session logs if Claude Code crashes or I close the terminal unexpectedly?

Logs are preserved automatically. On the next session start, a background recovery scan archives any un-archived transcripts. You don't need to do anything -- DAAF handles this automatically.

How can I use session logs for debugging?

If something went wrong during a DAAF session -- unexpected results, an error you didn't understand, or behavior that seemed off -- session logs are your detective tools. The Markdown logs show exactly what the assistant did, in order -- every tool call, file read/write, subagent invocation, and output at each step.

DAAF includes the interactive DAAF Log Explorer, which renders session transcripts as a visual timeline in your web browser. The orchestrator's actions appear as a horizontal timeline bar, with subagent dispatches waterfalling downward. Click any block to see exactly what files were read, written, or executed -- with plain-language descriptions and clickable file references.

Quickest access: the DAAF Control Panel -- run bash daaf.sh (or .\daaf.ps1 on Windows) from your daaf-docker folder and choose 3) View Session Logs. To run the viewer directly instead (from your host machine, no container shell needed), use bash view_logs.sh (macOS/Linux) or .\view_logs.ps1 (Windows). Either way, this starts the container if needed, generates an activity manifest from all sessions, and starts a server. Open the printed URL in your browser. You can also run per-project log collection from inside the container using bash /daaf/scripts/collect_session_logs.sh and bash /daaf/scripts/generate_log_viewer.sh with your project path.

Note: The server requires port 2719 to be mapped in your docker-compose.yml. If you set up DAAF after this feature was added, it's already there. If not, add "127.0.0.1:2719:2719" under the ports: section and restart your container with docker compose down && docker compose up -d.

Alternatively, DAAF converts every log transcript into a more readable Markdown file that shows the flow of the conversation alongside the tool calls. You can find the relevant .md session log in .claude/logs/sessions/ (sorted by timestamp). The raw .jsonl file contains the complete transcript if deeper inspection is needed.

Are session logs shared or uploaded anywhere?

No -- your logs stay completely private. They're stored only on your computer, gitignored, and never uploaded. They're shared only if you manually include excerpts in a bug report.

What's the difference between STATE.md and session logs?

They serve very different purposes:

Session logs are complete, raw transcripts of everything that happens in a Claude Code session. They're generated automatically, stored in .claude/logs/, and mainly useful for post-hoc debugging. Think of them as a security camera recording: comprehensive but unfiltered. Browse them visually with the DAAF Log Explorer rather than reading the raw files.

STATE.md is a structured progress tracker that DAAF creates during full-pipeline analyses. It lives inside your project folder (research/[project]/STATE.md) and tracks the current analysis stage, passed checkpoints, decisions made, and next steps. It accumulates QA Findings Summaries, Final Review Logs, and Runtime Risks encountered during execution. (These are quality-check results, final-pass review notes, and any problems or limitations discovered along the way.)

STATE.md's primary purpose is enabling session recovery -- if a session runs out of context, you start a fresh session and STATE.md tells Claude exactly where to pick up. Like a bookmark with detailed notes.

What are the status bar and the agent panel showing me?

These are two at-a-glance displays Claude Code renders for you, both customized by DAAF.

The status bar runs along the bottom of your session. Reading across, it shows the active model (with its reasoning-effort level), the current working directory, the Git branch you're on, a context-usage meter for the session, and -- if you're on a Claude subscription -- your rate-limit windows (how much of your rolling 5-hour and 7-day usage allowance you've used up). That rate-limit readout is specific to Claude subscription plans for now; on API-key or other-provider sessions those segments simply don't appear.

The agent panel appears whenever DAAF has dispatched specialists to work in the background. It gives you one row per running specialist, and each row reports that specialist's type, its model, its current status, its token count, and its own context meter -- so you can watch several agents make progress at once.

The one thing worth glancing at during a long session is the context meter -- it tells you how full Claude's working memory is getting. You don't have to police it yourself, though: DAAF watches the same number and will pause at a safe stopping point (and hand you a restart prompt) before it ever runs high enough to hurt quality. See "Context utilization CRITICAL" in the Common Errors section for what happens when it does fill up.

Packages and Environment

Installing additional Python and R packages and managing the software environment inside DAAF's container.

How do I install additional Python or R packages?

The durable answer for either language is the same: add the package to DAAF's blueprint file (the Dockerfile) and rebuild. That keeps your environment reproducible, which is the whole point. A package installed on the fly would vanish on the next rebuild and quietly break a later re-run. To keep your research reproducible, DAAF blocks its agents from installing packages at runtime in both Python and R, so you can't ask Claude to pip install or install.packages() for you (see Will packages installed at runtime persist across restarts? below for who can run an install and why it won't stick).

Permanent (recommended), for Python or R:

Open the file called Dockerfile in your daaf-docker folder using any text editor.
Find the user additions block near the end of the file (it's clearly labeled). Adding your package here is the fast path -- because nothing else depends on it, the rebuild only has to install that one package, usually seconds to a couple of minutes. Add Python packages with a uv pip install line and R packages with an install.packages() line, following the examples already in that block.
Save the file.
Rebuild DAAF. The easiest way is the rebuild option in the DAAF Control Panel (bash daaf.sh, or .\daaf.ps1 on Windows); alternatively, run bash rebuild_daaf.sh (.\rebuild_daaf.ps1 on Windows) from the daaf-docker folder.

Your research files are unaffected -- only the environment is rebuilt. The easiest route of all is to ask DAAF to make the edit for you: "I need the networkx package permanently -- can you add it to the Dockerfile?" DAAF can edit the Dockerfile (that's a reviewable file change, which is allowed) and walk you through the rebuild; it just can't run the install itself.

R note: for a quick, session-only try, you can run install.packages("pkgname") yourself via a !-prefixed command in the Claude Code prompt or a host terminal (both bypass the agent guardrails) -- but it's a throwaway that disappears on the next rebuild. For anything you want to keep, use the Dockerfile. For full step-by-step guidance for both languages, see Extending DAAF ↗.
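After a rebuild, a quick way to confirm that a newly added Python package actually made it into the environment is to ask Python whether it can find it. The sketch below is a hypothetical helper, not part of DAAF; it uses only the standard library's importlib, and the names it checks (networkx, polars) are just examples. It assumes the module you import has the same name as the installed distribution, which is not always the case.

```python
import importlib.metadata
import importlib.util


def check_packages(module_names):
    """Report whether each module can be imported, plus its installed version.

    Hypothetical post-rebuild sanity check; it never installs anything.
    Version lookup assumes the distribution name matches the module name,
    which is not always true (e.g. module 'yaml' comes from 'PyYAML').
    """
    report = {}
    for name in module_names:
        if importlib.util.find_spec(name) is None:
            report[name] = "MISSING"
            continue
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name
            # (standard-library modules, or renamed distributions).
            report[name] = "installed (version unknown)"
    return report


if __name__ == "__main__":
    for module, status in check_packages(["networkx", "polars"]).items():
        print(f"{module}: {status}")
```

Anything reported as MISSING was not picked up by the rebuild; double-check the spelling in the Dockerfile's user additions block.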

Can I use apt-get or sudo inside the container?

No, and this is by design. For security, DAAF intentionally runs with limited permissions -- the AI cannot install system-level software, gain administrator access, or make changes outside the DAAF workspace. This protects your computer.

If you need a system-level library (something beyond a Python or R package), add it to the Dockerfile instead. Look for the section with apt-get install, add your library name there, and then rebuild the container. The easiest way is the DAAF Control Panel's rebuild option (or bash rebuild_daaf.sh / .\rebuild_daaf.ps1 from the daaf-docker folder).

This is an advanced operation -- if you're unsure, ask DAAF: "I need [library name] installed at the system level. Can you help me modify the Dockerfile?"
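Once a rebuild with a new apt-get line finishes, you can check from Python that the system library (or a command-line tool it ships) is now visible. This is a minimal sketch using the standard library's ctypes.util.find_library and shutil.which; the helper name and the GDAL examples are assumptions for illustration, and find_library's search rules vary by platform.

```python
import ctypes.util
import shutil


def find_system_dependencies(libraries=(), commands=()):
    """Look up shared libraries and command-line tools on the system.

    Hypothetical post-rebuild check. Library names are given without the
    'lib' prefix or file extension (e.g. "gdal" for libgdal.so).
    A value of None means nothing was found.
    """
    found = {}
    for lib in libraries:
        found[f"lib:{lib}"] = ctypes.util.find_library(lib)
    for cmd in commands:
        found[f"cmd:{cmd}"] = shutil.which(cmd)
    return found


if __name__ == "__main__":
    for name, location in find_system_dependencies(["gdal"], ["gdalinfo"]).items():
        print(f"{name}: {location or 'NOT FOUND'}")
```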

Will packages installed at runtime persist across restarts?

No -- runtime installs are always temporary. First, a note on who can even run one: DAAF's agents are blocked from installing packages at runtime, so you can't ask Claude to do it. You can still run an install yourself, either as a !-prefixed command in the Claude Code prompt or from a host terminal (both bypass the agent guardrails). But even then the install is ephemeral. Think of it like writing on a whiteboard: it works right now, but it gets erased the next time the container is rebuilt or recreated, because runtime-installed packages live in the container's own filesystem, separate from where your research data is stored.

To make a package permanent (so it's always available and your analysis stays reproducible), add it to the Dockerfile and rebuild. See How do I install additional Python or R packages? above for step-by-step instructions.

Important distinction: Your research files (data, scripts, notebooks, reports) live in a separate storage volume and do persist across restarts and rebuilds. Only the software environment resets on rebuild -- your work is safe.

What package manager does DAAF use?

For Python, DAAF uses uv, a modern package installer that works much like the standard pip tool but is considerably faster. In the Dockerfile, packages are installed with uv pip install; that's the reproducible, rebuild-durable path where packages should go. For R, DAAF installs from Posit Public Package Manager (P3M) using date-pinned snapshots, so R packages are also installed from a consistent, versioned source.

Remember that DAAF's agents are blocked from runtime installs in both languages, so you can't ask DAAF to run uv pip install or install.packages() for you. If you want an ad-hoc, throwaway install for quick testing, run it yourself via a !-prefixed command or a host terminal (uv pip install --user packagename for Python, install.packages("pkgname") for R). Such installs are ephemeral and vanish on the next rebuild (see Will packages installed at runtime persist across restarts? above). Anything you want to keep belongs in the Dockerfile.
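If you are not sure whether a Python package came from the Dockerfile or from an ad-hoc install, its location on disk is a useful clue. The sketch below is a hypothetical diagnostic, not a DAAF tool. It uses only importlib.metadata and site from the standard library, and it assumes, as a rough heuristic, that anything under the per-user site-packages directory (where --user-style installs typically land) is a likely ad-hoc install rather than part of the image.

```python
import importlib.metadata
import site
from pathlib import Path


def install_location(dist_name):
    """Return where a distribution is installed and whether it sits in user site-packages.

    Hypothetical diagnostic. Returns None if the distribution is not installed.
    'user_install' is a heuristic: True means the package lives under the
    per-user site-packages directory rather than the environment's own.
    """
    try:
        dist = importlib.metadata.distribution(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None
    # locate_file("") resolves to the site-packages directory holding the package.
    location = Path(dist.locate_file("")).resolve()
    user_site = Path(site.getusersitepackages()).resolve()
    is_user_install = location == user_site or user_site in location.parents
    return {"location": str(location), "user_install": is_user_install}


if __name__ == "__main__":
    print(install_location("polars"))
```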

Technology Choices

Why DAAF uses the specific tools it does -- the reasoning behind each major technology decision.

Why Polars instead of Pandas?

Polars outperforms Pandas on the dimensions that matter most for DAAF's use case: performance on large datasets, lazy evaluation that lets you build a query plan before executing it, strong type safety that catches errors early, and an expression-based API that's more readable in code review.

For the kinds of datasets DAAF users typically work with -- often hundreds of thousands to millions of rows with complex joins and aggregations -- Polars' Rust-based engine provides meaningfully faster execution without requiring the user to think about optimization. The lazy evaluation model also makes it easier for DAAF to construct efficient query pipelines, since operations can be planned and optimized as a batch rather than executed line-by-line.

That said, Pandas is also installed in the container. If you have existing Pandas code or a strong Pandas preference, DAAF can work with it -- you'll just miss some of the performance and type-safety benefits that come with Polars by default.

Why Marimo instead of Jupyter?

Marimo notebooks are reactive -- when you change a cell, all dependent cells update automatically. This eliminates the hidden-state problems that plague Jupyter notebooks, where cells can hold stale values depending on execution order. For reproducibility, this is critical: a Marimo notebook always reflects the current state of the code.

Marimo notebooks are also stored as plain .py files rather than JSON, which makes them Git-friendly -- diffs are readable, merges are manageable, and version control works naturally. In Jupyter, notebook diffs are nearly impossible to review because the JSON format mixes code, outputs, and metadata.

The combination of reactive execution, Git-friendly format, and no hidden state makes Marimo a much better fit for DAAF's emphasis on reproducibility and auditability.
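Marimo works out cell dependencies by analyzing which variables each cell defines and which it reads. The toy below is not Marimo's implementation or API; it is a standard-library sketch of the same idea, assuming each cell declares its inputs and its single output by hand and that cells are defined in dependency order. It shows how editing one cell re-runs every downstream cell, so none is left holding a stale value.

```python
from graphlib import TopologicalSorter


class ReactiveNotebook:
    """Toy reactive notebook: each cell computes one named value from others."""

    def __init__(self):
        self.cells = {}   # output name -> (input names, function)
        self.values = {}  # output name -> current value

    def define(self, name, inputs, func):
        """Add or replace a cell, then re-run it and everything downstream."""
        self.cells[name] = (tuple(inputs), func)
        self._rerun_from(name)

    def _dependents(self, name):
        """The cell itself plus every cell that reads it, directly or transitively."""
        affected, frontier = {name}, [name]
        while frontier:
            current = frontier.pop()
            for other, (inputs, _) in self.cells.items():
                if current in inputs and other not in affected:
                    affected.add(other)
                    frontier.append(other)
        return affected

    def _rerun_from(self, name):
        affected = self._dependents(name)
        # Map each affected cell to the affected cells it depends on,
        # then run them in dependency order.
        graph = {cell: set(self.cells[cell][0]) & affected for cell in affected}
        for cell in TopologicalSorter(graph).static_order():
            inputs, func = self.cells[cell]
            self.values[cell] = func(*(self.values[i] for i in inputs))


nb = ReactiveNotebook()
nb.define("x", [], lambda: 2)
nb.define("y", ["x"], lambda x: x * 10)
nb.define("z", ["y"], lambda y: y + 1)
print(nb.values)  # {'x': 2, 'y': 20, 'z': 21}

nb.define("x", [], lambda: 5)  # "edit" the first cell
print(nb.values)  # {'x': 5, 'y': 50, 'z': 51}
```

In Jupyter, the equivalent edit would leave y and z at their old values until someone remembered to re-run those cells by hand.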

Why Docker instead of a virtual environment?

Docker provides three things that virtual environments cannot: true reproducibility (the exact same OS, libraries, and system dependencies on every machine), security isolation (Claude Code runs inside a container with dropped privileges and no access to your personal files), and consistent system dependency management (libraries that need C compilers, GDAL, or other system packages just work without per-platform debugging).

A virtual environment handles Python packages but not system-level dependencies, and it offers no security isolation at all -- a coding agent with a virtual environment has full access to your filesystem, credentials, and network. Docker's container boundary is what makes it safe to let an AI agent write and execute code on your machine.

The tradeoff is complexity: Docker Desktop is an additional install step, and the container model is unfamiliar to many researchers. DAAF's installer and helper scripts are designed to minimize this friction -- but if you're curious about what's happening behind the scenes, the Dockerfile in the repository is fully readable and you can ask your favorite LLM to help you interpret it.
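The difference is visible from inside Python itself. A virtual environment only swaps the interpreter's prefix (sys.prefix differs from sys.base_prefix); the operating system, system libraries, and file access are all shared with whatever runs it. The helper below is a hypothetical sketch, and its /.dockerenv check relies on a common Docker convention rather than a guaranteed signal.

```python
import os
import sys
from pathlib import Path


def environment_report():
    """Describe the isolation the current Python process is running under."""
    return {
        # A venv only changes sys.prefix; base_prefix still points at the
        # system Python, whose OS and system libraries the venv shares.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Heuristic: Docker usually creates /.dockerenv inside containers,
        # but this is a convention, not a guarantee.
        "in_docker_container": Path("/.dockerenv").exists(),
        # A venv does not restrict file access. This reports whether the
        # process can read the home directory it sees (inside a container,
        # that is the container's home, not yours on the host).
        "home_dir_readable": os.access(Path.home(), os.R_OK),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```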

Why Parquet for all data files?

Parquet preserves column types exactly -- integers stay integers, dates stay dates, and categories stay categories. CSV files lose all type information, which means every time you reload a CSV, your analysis tool has to guess what each column is. These guesses are frequently wrong (ZIP codes become numbers, date columns become strings), and silent type coercion is one of the most common sources of data analysis errors.

Parquet files are also compressed by default (typically 3-10x smaller than equivalent CSVs) and support columnar access, meaning you can read just the columns you need without loading the entire file into memory. For large education datasets, this translates to meaningfully faster load times and lower memory usage.

There's no CSV encoding ambiguity either -- no debates about delimiters, quote characters, or text encoding. A Parquet file is a Parquet file, and it reads the same way everywhere.

Why are scripts the primary artifact instead of notebooks?

DAAF produces Python scripts (.py files) as the primary analysis artifact because scripts provide a complete, sequential audit trail. Every line executes in order, top to bottom -- there's no ambiguity about execution sequence, no hidden state from out-of-order cell execution, and no risk of notebook cells holding stale values.

Scripts also version-control cleanly: git diff shows exactly what changed between versions, making it straightforward to review DAAF's revisions. And because scripts are immutable execution records (each version is saved separately, never overwritten), you always have a complete history of how the analysis evolved.

DAAF does generate Marimo notebooks as well -- but these serve as interactive exploration tools for reviewing results, not as the canonical record of the analysis. The script is the source of truth; the notebook is a lens for examining it.
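DAAF's rule that each script version is saved separately and never overwritten can be approximated in a few lines of Python. This is a minimal sketch under assumptions: the function name and the stem_vN.py naming scheme are hypothetical, not DAAF's actual convention. Opening the file in exclusive-create mode ("x") makes the write raise an error instead of replacing an existing file, so an earlier version can never be clobbered by accident.

```python
import tempfile
from pathlib import Path


def save_new_version(directory, stem, source):
    """Write `source` to the next free stem_vN.py file, never overwriting an old one.

    Hypothetical naming scheme for illustration only.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    version = 1
    while True:
        path = directory / f"{stem}_v{version}.py"
        try:
            # "x" mode fails with FileExistsError if the file already exists.
            with open(path, "x", encoding="utf-8") as handle:
                handle.write(source)
            return path
        except FileExistsError:
            version += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = save_new_version(tmp, "enrollment_analysis", "print('draft 1')\n")
        second = save_new_version(tmp, "enrollment_analysis", "print('draft 2')\n")
        print(first.name, second.name)  # enrollment_analysis_v1.py enrollment_analysis_v2.py
```

Because every version stays on disk, comparing two of them (or their Git history) shows exactly how the analysis changed between runs.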
