Site Reliability Engineering (SRE)
SRE
- wikipedia: Site Reliability Engineering
- sre.google: What is Site Reliability Engineering (SRE)? 🌟
- cloud.google.com: SRE vs. DevOps: competing standards or close friends?
- overops.com: DevOps vs. SRE: What’s the Difference Between Them, and Which One Are You?
- victorops.com: SRE vs. DevOps
- devops.com: SRE vs. DevOps — a False Distinction?
- devops.com: SRE vs. DevOps vs. Cloud Native: The Server Cage Match
- devops.com: Site Reliability Engineering 101: DevOps Versus SRE
- bmc.com: SRE vs DevOps: What’s The Difference?
- dzone: SRE vs. DevOps: SRE Is to DevOps What Scrum Is to Agile
- linkedin: DevOps vs Site Reliability Engineering
- Google: What is Site Reliability Engineering (SRE)? SRE is what you get when you treat operations as if it’s a software problem. Our mission is to protect, provide for, and progress the software and systems behind all of Google’s public services — Google Search, Ads, Gmail, Android, YouTube, and App Engine, to name just a few — with an ever-watchful eye on their availability, latency, performance, and capacity.
- opensource.com: What is an SRE and how does it relate to DevOps? The SRE role is common in large enterprises, but smaller businesses need it, too.
- thenewstack.io: Where Site Reliability Engineering Overlaps with DevOps
- openshift.com: From Ops to SRE - Evolution of the OpenShift Dedicated Team
- cncf.io: DevOps vs. SRE
- kelda.io: Why SREs Should be Responsible for Development Environments
- youtube: Platform9’s Madhura Maskasky says observability is also essential for diagnosing and debugging in order for SREs to “get to the root cause quickly enough so that you can feed that back to the development teams.” 🌟 Debugging remains complex. Debugging in “a world of microservices is a very difficult task,” requiring the identification of which specific part in a microservices deployment must be fixed, says Platform9’s Madhura Maskasky.
- “What’s happening to the system administrator or is the system administrator becoming an SRE? Are they going into different roles? Are they taking multiple roles? How do they play a part in ensuring that reliability in these new roles?”
- “The way Google defines SRE is that an SRE by nature needs to be someone who develops or writes code 50% of the time and only remaining 50% of the time they do the traditional ops/operations and this is because they want to do more through automation as part of the role of the requirements of the SRE himself, so that you can run apps that can serve billions of requests but that are still handled by a few dozens of SREs.”
- “Suddenly the role for SRE gets democratized and distributed among different roles (developers included)”.
- linkedin.com: SRE: Key Insights-“Done the right way”
- hernan-david-hd.medium.com: 5 pillars of SRE/DevOps (in Spanish)
- hernan-david-hd.medium.com: Breaking down SRE/DevOps into 5 key areas
- devops.com: How the SRE Role Is Evolving
- itprotoday.com: Why Site Reliability Engineering Is Key to Modern DevOps. Among the hottest areas of growth in DevOps is the emerging field of site reliability engineering as organizations look to bake reliability into the earliest stages of the software development cycle.
- stackpulse.com: Managing Reliability for Monoliths vs. Microservices: The Challenges for SREs
- stackpulse.com: Managing Reliability for Monoliths vs. Microservices: Best Practices for SREs
- cloud.google.com: SRE at Google: Our complete list of CRE life lessons 🌟
- circonus.com: Monitoring for Success: What All SREs Need to Know
- infracloud.io: Site Reliability Engineering (SRE) Best Practices
- stackpulse.com: No, SRE Is Not the New DevOps – Unless It Is
- youtube: Viktor Farcic - What is the difference between SRE and DevOps?
- dzone: Remote server management - Common architectural elements
- dzone: Upcoming Trends in DevOps and SRE in 2021 🌟 DevOps and SRE are domains with rapid growth and frequent innovations. With this blog you can explore the latest trends in DevOps and SRE and stay ahead of the curve. The following trends are most likely to have a lasting impact in the field of DevOps and SRE:
- AIOps and Self-Healing Platforms
- Service Meshes
- Low-code DevOps
- GitOps
- DevSecOps
- dzone: SRE vs. DevOps: What are the Differences? SRE and DevOps are closely related concepts with some important distinctions between both, and many businesses can benefit from embracing both of them.
- dev.to: DevOps vs SRE: What’s The Difference?
- thenewstack.io: How the SRE Experience Is Changing with Cloud Native 🌟 From Firefighting to Prevention for SREs. Empower Developers with Self-Service. Facilitate Developer Autonomy.
- Site Reliability Engineer (SRE) team: Provide and teach effective use of platform tooling to empower developers to be self-sufficient. Document clear escalation paths for developers struggling in production.
- Developers: Treat SREs as application operation partners, not only as first responders to incidents. Turn to ops teams for the “paved path” or centralized developer control plane.
- Operations team: Provide self-service platform deployment and observability, and enable visibility into ramifications of actions. Provide opinionated “paved path” platform or developer control plane (DCP), but allow developers to swap platform components if they also want to be accountable.
- dev.to: What You Need to Break into DevOps and SRE
- infoq.com: Observing and Understanding Failures: SRE Apprentices
- medium: Agile vs. DevOps vs. SRE… it’s not OR, it’s AND !
- thenewstack.io: Google SRE: Site Reliability Engineering at a Global Scale
- sre.google: sre-book - The Evolving SRE Engagement Model
- blogs.letusdevops.com: How much programming should I know for DevOps/SRE domain. And YES, you need to learn programming.
- devops.com: Day in the Life of a Site Reliability Engineer (SRE)
- devops.com: Top Nine Skills for SREs to Master 🌟
- devops.com: How SREs Benefit From Feature Flags
- toolbox.com: Site Reliability Engineering: What Is It and How Can It Help Scale Operations? 🌟 Site Reliability Engineering (SRE) is an essential task that bridges the gap between developers and operations. Here’s how organizations can refine it further by leveraging automation.
- devops.com: SRE Vs. Platform Engineering: What’s the Difference?
- cncf.io: DevOps vs. SRE vs. Platform Engineering? The gaps might be smaller than you think
- dzone.com: DevOps vs. SRE vs. Platform Engineer vs. Cloud Engineer; Substance or Semantics?
- phoenixnap.com: SRE Vs. DevOps: Differences Explained 🌟 Take an in-depth look at the similarities & differences between SRE & DevOps, their benefits, usual tasks, and go-to tools to explain their distinct roles in the software development lifecycle (SDLC)
- thenewstack.io: SRE vs. DevOps? Successful Platform Engineering Needs Both. A look at the differences, what they do, how they benefit the business and why organizations need all three to succeed.
SRE Tools
- thenewstack.io: The Site Reliability Engineering Tool Stack
- getcortexapp.com: A guide to the best SRE tools
- thenewstack.io: The Best Site Reliability Engineering Tools in 2021
Service Level Objectives (SLO)
- SLOconf: The first SLO Conference for Site Reliability Engineers
- thenewstack.io: Automate User Satisfaction with This GitOps-Friendly Spec for Service Level Objectives. Organizations looking to tighten up their ops with some site reliability engineering (SRE) should take a look at the recently released OpenSLO specification, a GitOps-friendly template for establishing Service Level Objectives (SLOs) to specify and even enforce the range of reliability required (and afforded) for a system.
- sre.google: The Art of SLOs
- blog.acethecloud.com: A Step-by-Step Guide to Calculate SLAs, SLIs, and SLOs for Your IT Services
- medium.com/picsart-engineering: Prioritizing Development Efforts with SLOs in Microservices
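The OpenSLO write-up above describes an SLO as specifying the range of reliability "required (and afforded)" for a system. The "afforded" part is commonly called the error budget: one minus the SLO target, applied to a time window or to a request count. The following is a minimal Python sketch under that assumption (an availability SLO expressed as a fraction); the function names are hypothetical and are not taken from OpenSLO or from any of the linked guides.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Downtime allowed by an availability SLO over a time window.

    slo_target is a fraction in [0, 1), e.g. 0.999 for "three nines".
    """
    if not 0.0 <= slo_target < 1.0:
        raise ValueError("slo_target must be in [0, 1)")
    # Error budget = (1 - target) applied to the whole window.
    return window * (1.0 - slo_target)


def remaining_budget(slo_target: float, good: int, total: int) -> float:
    """Fraction of a request-based error budget still unspent.

    Returns 1.0 when no requests failed, 0.0 when the budget is exactly
    used up, and a negative number once the SLO has been breached.
    """
    if not 0.0 <= slo_target < 1.0:
        raise ValueError("slo_target must be in [0, 1)")
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("need 0 <= good <= total and total > 0")
    allowed_bad = (1.0 - slo_target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # 99.9% over 30 days leaves roughly 43 minutes of downtime.
    print(error_budget(0.999, timedelta(days=30)))  # prints 0:43:12
    # 500 failures out of 1,000,000 requests spends half of a 99.9% budget.
    print(round(remaining_budget(0.999, 999_500, 1_000_000), 3))  # prints 0.5
```

The time-based and request-based views are the same idea measured differently; which one applies depends on whether the SLI counts good minutes or good requests.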
OpenSLO
- OpenSLO specification 🌟 The goal of this project is to provide an open specification for defining and interfacing with SLOs to allow for a common approach, giving a vendor-agnostic solution to defining and tracking SLOs. Platform-specific implementation details are purposefully excluded from the scope of this specification.
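To make the OpenSLO description more concrete, here is a minimal Python sketch that assembles one SLO document as a plain dictionary and prints it as JSON. The field names (`apiVersion: openslo/v1`, `kind: SLO`, `indicatorRef`, `timeWindow`, `budgetingMethod`, `objectives`) are assumed to follow the v1 `SLO` kind and should be checked against the published specification; the service and SLI names are hypothetical. The spec's own examples are written in YAML, and YAML 1.2 parsers generally accept JSON as well.

```python
import json


def openslo_slo(name: str, service: str, indicator_ref: str,
                target: float, window: str = "28d") -> dict:
    """Build a minimal SLO document shaped after the OpenSLO v1 SLO kind.

    Assumption: field names mirror the v1 spec; verify before relying on them.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    return {
        "apiVersion": "openslo/v1",
        "kind": "SLO",
        "metadata": {"name": name},
        "spec": {
            "service": service,
            # Points at a separately defined SLI document (hypothetical name).
            "indicatorRef": indicator_ref,
            # A single rolling window, e.g. "28d".
            "timeWindow": [{"duration": window, "isRolling": True}],
            # Count good vs. total events rather than good time slices.
            "budgetingMethod": "Occurrences",
            "objectives": [{"target": target}],
        },
    }


if __name__ == "__main__":
    doc = openslo_slo("checkout-availability", "checkout",
                      "checkout-success-ratio", 0.995)
    print(json.dumps(doc, indent=2))
```

Keeping the document as data (rather than hand-written YAML) is one way to generate many similar SLOs in a GitOps repository; the spec itself stays agnostic about how the files are produced.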
Validate Service-Level Objectives of REST APIs Using Iter8
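Validating an SLO for a REST API boils down to comparing observed request metrics against limits. The Python sketch below shows that idea with standard-library code only; it is not Iter8's API or configuration. The metric names, the rule that only 5xx responses count as errors, and the nearest-rank percentile are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_ms: float
    status: int  # HTTP status code returned by the API


def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100], of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_slos(samples, max_mean_ms, max_p95_ms, max_error_rate):
    """Map each SLO metric to (observed value, whether it meets its limit)."""
    if not samples:
        raise ValueError("need at least one sample")
    latencies = [s.latency_ms for s in samples]
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for s in samples if s.status >= 500)
    observed = {
        "mean_latency_ms": sum(latencies) / len(latencies),
        "p95_latency_ms": percentile(latencies, 95),
        "error_rate": errors / len(samples),
    }
    limits = {
        "mean_latency_ms": max_mean_ms,
        "p95_latency_ms": max_p95_ms,
        "error_rate": max_error_rate,
    }
    return {name: (value, value <= limits[name]) for name, value in observed.items()}


if __name__ == "__main__":
    # Hypothetical load-test results: 20 requests, one of them a 503.
    samples = [RequestSample(10.0 * (i + 1), 503 if i == 0 else 200) for i in range(20)]
    for name, (value, ok) in check_slos(samples, 150, 180, 0.01).items():
        print(f"{name}: {value} {'PASS' if ok else 'FAIL'}")
```

With these sample numbers the mean latency passes while the p95 latency and the error rate fail, which illustrates why SLOs usually constrain tail latency rather than only the average.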
Google SRE Prodcast
- sre.google/prodcast: The SRE Prodcast is Google’s podcast about Site Reliability Engineering and production software. In Season 1, we discuss concepts from the SRE Book with experts at Google.
Tweets
Is it hard to find SREs? Dell: Developers do a good job as SREs because they know what exactly is happening. At the same time, we are also thinking about how we can have a developer rotation model too; essentially a rotation policy which is a learning process for us.
— The New Stack (@thenewstack) May 7, 2021
"Platform Engineering" is rapidly becoming the new DevOps or SRE. Almost every day we hear about another org building an internal developer platform or control plane.
— Daniel Bryant (@danielbryantuk) February 18, 2022
Want to know what platform engineering is, where the trends are going, and why you should care?
Read on 🧵👇
We're delighted to introduce Prodcast, Google SRE's podcast about Site Reliability Engineering and production software. In Season 1, we discuss concepts from the #SRE Book with experts at Google. #SREBook #reliability https://t.co/sOytXhXFyz
— Google Site Reliability Engineering (@googlesre) April 14, 2022