Best Practices for Apache Airflow® at Scale*

Most data workflows don’t fail in analysis. They fail in production. This free guide looks at how teams design, scale, and monitor Airflow pipelines so models, reports, and features keep running reliably instead of quietly breaking downstream.
*Message from this week's sponsor, Astronomer

New, opt-in editions from Data Elixir

In addition to Data Elixir’s regular weekly edition, we’re experimenting with community-curated, focused editions built from Bluesky posts. These surface high-signal links, with direct access to the surrounding context so you can read, follow, and join in when you want.

We’re starting with two editions:

Data Practice — Everyday data work, including SQL, statistics, visualization, R/Python, and tools. Published Wednesdays.
Learning Machines — Machine learning and AI, including tutorials, research, tools, and applications. Published Thursdays.

Opt in for free.

Posts & Tutorials

Positron: My Extensions and Settings

Here’s a practical post for optimizing your Positron (or VS Code) setup. Real settings, real extensions, and small UI tweaks that genuinely improve focus, readability, and everyday data science work, all easy to copy and paste into your own settings.
Emil Hvitfeldt

Turning Wearable Data into a Personal ML Early-Warning System

What happens when you give an LLM nearly a decade of your own health data? This thoughtful case study walks through labeling messy real-world data and engineering personal-baseline features. It also explains why a small gradient-boosted model beat more complex approaches for early detection.
Ian Rowan

Regression

This is less about regression theory and more about doing regression well in R. It's a step-by-step example that emphasizes interpretation, model comparison, and modern tooling, making it ideal for instructors, analysts, and anyone refining their everyday modeling workflow.
Andrew Heiss

➡️ Reach 50,000+ Data Elixir readers.

Tools & Code

What's new in pandas 3

pandas 3.0 just landed, and it finally fixes some long-standing pain points. This post explains what's new, what improved, and what didn't, including safer dataframe mutation, better pipelines, and where pandas still trails Polars.
Marc Garcia

Large Language Model tools for R

This site highlights the R packages that turn LLMs into real tools: structured prompts, model routing, IDE assistants, local models for privacy, plus agent and MCP plumbing so models can actually use your R session.
Luis D. Verde Arregoitia

Resources

Deep Learning with Python, Third Edition

Deep learning isn’t “new” anymore, but it’s been changing fast. This newly updated, free-to-read edition walks from first principles to transformers and diffusion, with runnable code and minimal math. A great reset for people who work with data and want a clear, hands-on mental model of modern deep learning.
François Chollet and Matthew Watson

Causal Inference - Uncovering cause and effect from data

Ever wonder whether your model is answering a causal question or just correlating stuff? This graduate-level course walks through counterfactuals, identification, and classic causal designs, with problem sets in R. Lectures and problem sets will be posted on this site throughout the semester.
University of Wisconsin-Madison | Anton Strezhnev

Outlier

A Diary of a Data Engineer

Why does every “small change” in data turn into a week-long project? This post argues that we’ve been solving the same data problems for decades, just with better marketing. This is a sharp, experience-driven take on fundamentals, modeling, and why the loop never really changes.
Simon Späti

Last Issue’s Top Links

Introducing beginners to the mechanics of machine learning - Miriam Posner
Claude Code and What Comes Next - Ethan Mollick
Using Trapezoids to Visualize Matrix Multiplication - Max Watson

This week's issue was sponsored by Astronomer.

Data Elixir - Issue 558