Site Reliability Engineering (SRE) Books

Douglas Hawthorne

Douglas Hawthorne @dfhawthorne1

4 books  

This collection covers books in the field of Site Reliability Engineering (SRE), the theory and practice of increasing the reliability and resiliency of deployed computer systems. SRE originated at Google.

Database Reliability Engineering [Book] Google Books
author: Laine Campbell / Charity Majors "O'Reilly Media, Inc." 2017 - 10
The infrastructure-as-code revolution in IT is also affecting database administration. With this practical book, developers, system administrators, and junior to mid-level DBAs will learn how the modern practice of site reliability engineering applies to the craft of database architecture and operations. Authors Laine Campbell and Charity Majors provide a framework for professionals looking to join the ranks of today’s database reliability engineers (DBRE).
You’ll begin by exploring core operational concepts that DBREs need to master. Then you’ll examine a wide range of database persistence options, including how to implement key technologies to provide resilient, scalable, and performant data storage and retrieval. With a firm foundation in database reliability engineering, you’ll be ready to dive into the architecture and operations of any modern database.
This book covers:
- Service-level requirements and risk management
- Building and evolving an architecture for operational visibility
- Infrastructure engineering and infrastructure management
- How to facilitate the release management process
- Data storage, indexing, and replication
- Identifying datastore characteristics and best use cases
- Datastore architectural components and data-driven architectures
Seeking SRE [Book] Google Books
author: David N. Blank-Edelman "O'Reilly Media, Inc." 2018 - 08
Organizations big and small have started to realize just how crucial system and application reliability is to their business. They’ve also learned just how difficult it is to maintain that reliability while iterating at the speed demanded by the marketplace. Site Reliability Engineering (SRE) is a proven approach to this challenge.
SRE is a large and rich topic to discuss. Google led the way with Site Reliability Engineering, the wildly successful O’Reilly book that described Google’s creation of the discipline and the implementation that’s allowed them to operate at a planetary scale. Inspired by that earlier work, this book explores a very different part of the SRE space. The more than two dozen chapters in Seeking SRE bring you into some of the important conversations going on in the SRE world right now.
Listen as engineers and other leaders in the field discuss:
- Different ways of implementing SRE and SRE principles in a wide variety of settings
- How SRE relates to other approaches such as DevOps
- Specialties on the cutting edge that will soon be commonplace in SRE
- Best practices and technologies that make practicing SRE easier
- The important but rarely explored human side of SRE
David N. Blank-Edelman is the book’s curator and editor.
Site Reliability Engineering [Book] Google Books
author: Niall Richard Murphy / Betsy Beyer "O'Reilly Media, Inc." 2016 - 03
The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?
In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient, lessons directly applicable to your organization.
This book is divided into four sections:
- Introduction: Learn what site reliability engineering is and why it differs from conventional IT industry practices
- Principles: Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
- Practices: Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems
- Management: Explore Google’s best practices for training, communication, and meetings that your organization can use
The Site Reliability Workbook [Book] Google Books
author: Betsy Beyer / Niall Richard Murphy "O'Reilly Media, Inc." 2018 - 07
In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today, and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment.
This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.
Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.
You’ll learn:
- How to run reliable services in environments you don’t completely control, like cloud
- Practical applications of how to create, monitor, and run your services via Service Level Objectives
- How to convert existing ops teams to SRE, including how to dig out of operational overload
- Methods for starting SRE from either greenfield or brownfield
Created date: July 31, 2024