Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines.
Summary
A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task.
About the book
Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs.
What’s inside
Build, test, and deploy Airflow pipelines as DAGs
Automate moving and transforming data
Analyze historical datasets using backfilling
Develop custom components
Set up Airflow in production environments
About the reader
For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills.
About the authors
Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer.
Table of Contents
PART 1 – GETTING STARTED
1 Meet Apache Airflow
2 Anatomy of an Airflow DAG
3 Scheduling in Airflow
4 Templating tasks using the Airflow context
5 Defining dependencies between tasks
PART 2 – BEYOND THE BASICS
6 Triggering workflows
7 Communicating with external systems
8 Building custom components
9 Testing
10 Running tasks in containers
PART 3 – AIRFLOW IN PRACTICE
11 Best practices
12 Operating Airflow in production
13 Securing Airflow
14 Project: Finding the fastest way to get around NYC
PART 4 – IN THE CLOUDS
15 Airflow in the clouds
16 Airflow on AWS
17 Airflow on Azure
18 Airflow in GCP
Publisher: Manning Publications
Publication date: April 27, 2021
Language: English
Print length: 480 pages
ISBN-10: 1617296902
ISBN-13: 978-1617296901
Item Weight: 1.8 pounds
Dimensions: 7.38 x 1.2 x 9.25 inches
Best Sellers Rank: #1,287,970 in Books; #367 in Cloud Computing (Books); #520 in Python Programming; #2,536 in Software Design, Testing & Engineering (Books)
Customer Reviews: 4.5 out of 5 stars (72)
11 reviews for Data Pipelines with Apache Airflow
Original price: $49.99. Current price: $42.64.


Evan Volgas –
An excellent resource for learning and using Airflow
This book is great. It builds up piece by piece and explains what is going on every step of the way. It shows you best practices and goes into great detail on relatively advanced topics, in addition to covering all the basics. The code examples can easily be adapted for your use case and are very well documented and explained. I wish I had this book when I started using Airflow. I had used it for 2 years in production prior to reading this and only the first five chapters were already known to me. There is a lot of great material here for both newcomers and knowledgeable practitioners alike. I can’t recommend it highly enough.
Gino –
To the Point
This is the type of book where you can read the first two chapters and be good to go on the fundamentals. The rest of the book basically builds on what you learned. Such great instruction packed into a few pages. Probably one of the better-written manuals for a framework/workflow tool I’ve read so far… and I’ve read many this past year alone. Go on and get it.
Ronald –
A really useful book that goes beyond simple Airflow use
I like the fact that it infuses best practices for pipeline management apart from just using Apache Airflow as a tool for implementing pipelines. I actually plan on re-reading portions of it, apart from wanting to reference it for Airflow-specific questions.
Daniel V. –
A well written and thorough book on Airflow
A great book on Airflow: how to operate it, configure it, and interface with third-party systems (particularly cloud or database related). I particularly liked the emphasis on some counter-intuitive features, which saves beginners from wasting time figuring out a couple of tweaks for themselves.
Chris Novitsky –
Great book
I’ve read a lot of CS books, and this is in the top 5. It’s well written and full of domain knowledge.
CWC_NY –
A great guide to Airflow
This is a great guide to Airflow, covering the basics and advanced topics such as how to test DAGs and run tasks in containers. Highly recommended!
James L. Warfield –
Where is security addressed? Oh, yeah, page 322…
From a practitioner: There are many great things here, but… Security is not addressed until page 322. This is indicative of our data engineering culture, not just this book. Security should be the third thing covered after Extract and Load (we can wait on Transform until we’ve secured the data).
Chelsea Tower –
Great book
Absolutely great book. Airflow documentation on the internet can be fragmented and often overly abstracted. This book covers everything an aspiring dev needs to know using realistic (or at least representative) examples. Thumbs up!
AQ –
Just finished reading chapter 7 and I am very impressed with the amount of detail and the explanations provided. Even though this book was meant to explain Airflow, the authors went above and beyond to explain key concepts on what a data pipeline might look like beyond the orchestration power of Airflow. I feel very grateful to have come across this book and will always use it as a reference.
bLEDuj –
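This book covers basic Airflow operation and how to write code for it. It may be fine as reference material, but it does not contain tips that can be applied to real work right away. If you can read the English edition, you would be better off consulting the reference documentation.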
Shanwow –
The book was an easy read and provided better explanations and examples compared to the documentation.