Sale!

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Name: Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
SKU: 62727
Availability: InStock

Technology & Coding Books

Original price was: $59.99. Current price is: $37.00.

Tags: Applications, Big, DataIntensive, Designing, Ideas, Maintainable, Reliable, Scalable, Systems

Note: Prices may fluctuate as sellers adjust them regularly. You'll see the latest price at final checkout.

Description

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
Make informed decisions by identifying the strengths and weaknesses of different tools
Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
Understand the distributed systems research upon which modern databases are built
Peek behind the scenes of major online services, and learn from their architectures

From the brand

Sharing the knowledge of experts

O’Reilly’s mission is to change the world by sharing the knowledge of innovators. For over 40 years, we’ve inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.

Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.

Publisher: O’Reilly Media
Publication date: May 2, 2017
Edition: 1st
Language: English
Print length: 614 pages
ISBN-10: 1449373321
ISBN-13: 978-1449373320
Item Weight: 2.1 pounds
Dimensions: 6.9 x 1.2 x 9.1 inches
Best Sellers Rank: #5,693 in Books; #1 in Data Modeling & Design (Books); #1 in MySQL Guides; #2 in Computer Software (Books)
Customer Reviews: 4.8 out of 5 stars (5,403)

8 reviews for Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Roshan Patel – December 8, 2025

A practical introduction to distributed systems
In today’s world, many of us have been tasked with building reliable, scalable services. Yet, more often than not, we rely on existing abstractions without fully understanding how the underlying systems work. Need a scalable database? Use MongoDB. Need a streaming service? Kafka is your go-to. While these tools get the job done, they often serve as crutches that prevent us from delving into the complexities of distributed systems.

This book, Designing Data-Intensive Applications, is an eye-opener for anyone who has ever wondered about the internals of these services. It takes a deep dive into key concepts like consistency, exploring the critical differences between strong and weak consistency, and the trade-offs that come with each approach. For example, when a master node fails, how does a new master get elected? The book explains this process in depth, shedding light on the mechanics of fault tolerance.

The book also provides clarity on how databases store and retrieve data efficiently. If you’ve ever come across PostgreSQL’s documentation and wondered, “What exactly is a B-tree?”, this book will make it crystal clear.

It also goes into the common gotchas when working with transactions. You might think that using transactions makes you safe from concurrency issues, but that’s not always the case. The book explains why this happens and offers practical advice on how to avoid race conditions.

What this book isn’t: If you’re a practitioner building distributed systems from scratch and looking for in-depth explanations of algorithms like Raft, Paxos, or other low-level details, this book might not be what you’re looking for. It serves more as a high-level introduction to distributed systems than as a deep dive into the specifics of consensus algorithms. For those looking for more detailed, foundational material on distributed systems, I’d recommend checking out Tanenbaum’s Distributed Systems.

Joey – December 8, 2025

Essential reading for anyone working on distributed systems in any capacity
Designing Data-Intensive Applications really exceeded my expectations. Even if you are experienced in this area, this book will reinforce things you know (or sort of know) and bring to light new ways of thinking about solving distributed systems and data problems. It will give you a solid understanding of how to choose the right tech for different use cases.

The book really pulls you in with an intro that is more high level, but mentions problems and solutions that really anyone who has worked on these types of applications has either encountered or heard mention of. The promise it makes is to take issues such as scalability, maintainability and durability and explain how to decide on the right solutions to them for the problems you are solving. It does an amazing job of that throughout the book.

This book covers a lot, but at the same time it knows exactly when to go deep on a subject. Right when it seems like it may be going too deep on things like how different types of databases are implemented (SSTables, B-trees, etc.) or on comparing different consensus algorithms, it is quick to point out how and why those things are important to practical real-world problems and how understanding those things is actually vital to the success of a system.

Along those same lines, it is excellent at circling back to concepts introduced at prior points in the book. For example, the book goes into how log-based storage is used by some databases as their core way of storing data and for durability in other cases. Later in the book, when getting into different message/eventing systems such as Kafka and ActiveMQ, things swing back to how these systems utilize log-based storage in similar ways. Even if you have prior knowledge of or have even worked with these technologies, how and why they work and the pros and cons of each become crystal clear and really solidified. The same can be said of its great explanations of things like ZooKeeper and why specific solutions like Kafka make use of it.

This book is also amazing at shedding light on the fact that so little of what is out there is totally new. It attempts to go back as far as it can at times on where a certain technology’s ideas originated (back to the 1800s at some points!). Bringing in this history really gives a lot of context around the original problems that were being solved, which in turn helps in understanding pros and cons. One example is the way it goes through the history of batch processing systems and HDFS. The author starts with MapReduce and relates it to tech that was developed decades before. This really clarifies how we got from batch processing systems on proprietary hardware to things like MapReduce on commodity hardware, thanks in part to HDFS, and eventually to stream-based processing. It also does great at explaining the pros and cons of each and when one might choose one technology over the other.

That’s really the theme of this book: teaching the reader how to compare and contrast different technologies for solving distributed systems and data problems. It teaches you to read between the lines on how certain technologies work so that you can identify the pros and cons early and without needing them to be spelled out by the authors of those technologies. When thinking about databases, it teaches you to really consider the durability/scalability model and how things are nowhere near black and white between “consistent” and “eventually consistent”. There is a ton of nuance there, and it goes deep on things like single-leader vs multi-leader vs leaderless replication, linearizability, total order broadcast, and different consensus algorithms.

I could go on forever about this book. To name a few other things it touches on, to give a good idea of the breadth here: networking (and networking faults), OLAP, OLTP, two-phase locking, graph databases, two-phase commit, data encoding, general fault tolerance, compatibility, message passing, everything I mentioned above, and the list goes on and on and on. I recommend that anyone who does any kind of work with these systems take the time to read this book. All 600ish pages are worth reading, and it’s presented in an excellent, engaging way with real-world practical examples for everything.

Erez – December 8, 2025

Insightful and Well-Structured – Best for Readers with Some Background
This book dives deep into its subject with clear structure and thoughtful explanations. The concepts are well-articulated and build on each other logically. However, to truly appreciate the depth and get the most out of it, I recommend reading it with some prior experience or familiarity with the topic. Overall, a highly valuable and rewarding read.

Nikola Zifra – December 8, 2025

This book provides a high-level overview but unfortunately lacks quite a bit of detail.

Joachim O. – December 8, 2025

This book covers pretty much all topics which are relevant to managing databases or designing data models in more than 800 pages. It also provides detailed information about the inner workings of databases to the degree that you might be able to implement your own simple database.

The book is didactically very well structured, which is no surprise given that the author is a professor at Cambridge. For example, it explains batch processing algorithms (e.g. MapReduce) and uses this as a basis to delve into data streaming. Strong emphasis is laid on the problems of distributed computing (replication, partitioning, node failures, etc.) and the discussion of the compromises one must make.

Overall, an easy recommendation for anyone who is interested in data architectures and the inner workings of databases, which are the backbone of pretty much any application in today’s world.

Andrea – December 8, 2025

If you are an IT professional who is passionate about your work and want to understand what lies beneath the things you use every day, this is a book not to be missed. It is not a manual, a guide, or a tutorial, but it lives up to its subtitle: it is a “journey” through what is known about computerized data management, one that helps you understand the tools at our disposal beyond the marketing.

The book is extremely dense (as shown by a fine 30-page index out of a total of nearly 600 pages), rich in references (as shown by the extensive bibliographies at the end of each chapter, mostly online resources), and the author’s academic background is evident. It is a book that takes time to read and understand, provided you do not skip the details, that is... but if you do, don’t bother.

A good half of the book deals with concurrent modification of data and distributed systems, the most terrifying and fascinating part, where the problems they pose and the algorithms that solve them are explained in minute detail (with the exception of “Byzantine” problems). I found the analysis of the ACID acronym... “comforting” :)

It closes with an analysis of what the author expects for the future; the concept of “unbundling” databases is very interesting.

Mishan Janitha – December 8, 2025

Recommended book for software engineers

Amine – December 8, 2025

An excellent book that treats several problems critically. There is no miracle solution; there will always be trade-offs. An easy recommendation for anyone interested in distributed systems.
