Site Reliability Engineering (SRE)

Nubenetes V2 Elite Portal

Architectural Context

Detailed reference for Site Reliability Engineering (SRE) in the context of Platform & Site Reliability.

Continuous Delivery

Feature Management

Reliability Engineering

  • (2021) devops.com: How SREs Benefit From Feature Flags [COMMUNITY-TOOL] - Investigates the role of feature flags in risk mitigation. Shows how decoupling application deployment from runtime feature activation lets SREs disable a buggy code path immediately, without a full application rollback.
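
The kill-switch idea described in this entry can be sketched in a few lines of Python. This is a minimal illustration, not code from the linked article: `FlagStore`, `checkout`, and the flag name `new_pricing_engine` are all hypothetical, and a real setup would read flags from a flag service rather than an in-memory dict.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store (stand-in for a real flag service)."""
    flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Unknown flags fall back to the default, so new code paths stay off.
        return self.flags.get(name, default)

    def disable(self, name: str) -> None:
        # Runtime kill switch: no redeploy or rollback is involved.
        self.flags[name] = False


def checkout(store: FlagStore, cart_total: float) -> str:
    """Route to the new or legacy code path based on the flag (both are deployed)."""
    if store.is_enabled("new_pricing_engine"):
        return f"new:{cart_total * 0.9:.2f}"
    return f"legacy:{cart_total:.2f}"


if __name__ == "__main__":
    store = FlagStore({"new_pricing_engine": True})
    print(checkout(store, 100.0))   # new path is active
    store.disable("new_pricing_engine")
    print(checkout(store, 100.0))   # legacy path, same deployed binary
```

Both code paths ship in the same deployment; only the flag value changes at runtime, which is what lets the buggy path be turned off without a rollback.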

SLO Validation

REST APIs

Observability

Monitoring

SRE Fundamentals

  • (2021) circonus.com: Monitoring for Success: What All SREs Need to Know [COMMUNITY-TOOL] - Examines monitoring frameworks focused on business outcomes rather than raw resource metrics. Guides SREs in setting up context-rich telemetry pipelines that prioritize application-level user experience over physical host utilization.

Service Level Objectives

Community Events

  • (2021) SLOconf [COMMUNITY-TOOL] - Official site of SLOconf, a conference and community space focused entirely on Service Level Objectives, error budgets, and reliability-driven service platforms.

Declarative Standards

  • (2021) OpenSLO specification 🌟 ⭐ 1496 [YAML CONTENT] [ADVANCED LEVEL] 🌟🌟🌟🌟🌟 [DE FACTO STANDARD] - The vendor-agnostic OpenSLO specification defines standard YAML schemas for declaring SLOs, SLIs, and error budgets. It is listed here as the standard for declaring system health targets as code within GitOps automation.

GitOps

Google Best Practices

  • (2020) sre.google: The Art of SLOs [ADVANCED LEVEL] [DOCUMENTATION] [COMMUNITY-TOOL] - Google's structured guide to defining user-centric Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Covers processes for avoiding metric fatigue and mapping indicators directly to what users actually value.
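
The relationship between an SLI, an SLO, and the resulting error budget comes down to simple arithmetic. The sketch below assumes a request-based SLI (good events divided by total events); the function name `error_budget` and the returned field names are illustrative, not taken from the linked guide.

```python
def error_budget(slo_target: float, total_events: int, bad_events: int) -> dict:
    """Compute SLI and error-budget figures for a request-based SLO.

    slo_target is a fraction such as 0.999 (99.9% of events must be good).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0 or not 0 <= bad_events <= total_events:
        raise ValueError("need total_events > 0 and 0 <= bad_events <= total_events")

    sli = (total_events - bad_events) / total_events
    allowed_bad = (1.0 - slo_target) * total_events   # the whole budget, in events
    return {
        "sli": sli,
        "allowed_bad": allowed_bad,
        "remaining": allowed_bad - bad_events,         # negative means budget is blown
        "consumed_fraction": bad_events / allowed_bad,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    report = error_budget(slo_target=0.999, total_events=1_000_000, bad_events=250)
    for key, value in report.items():
        print(f"{key}: {value}")
```

With a 99.9% target over one million requests, the budget is roughly 1,000 bad requests, so 250 failures consume about a quarter of it.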

Site Reliability Engineering

Root Cause Analysis

Operations

Platform Engineering

Organizational Design

Strategic Alignment

SRE vs DevOps

Tooling

  • (2021) youtube: Viktor Farcic - What is the difference between SRE and DevOps? [COMMUNITY-TOOL] - Video analysis contrasting the responsibilities of DevOps and SRE. Explores modern architectural pillars including service meshes, declarative GitOps workflows, DevSecOps integrations, and platform-driven self-healing mechanisms.

Site Reliability Engineering

Best Practices

  • (2022) infracloud.io: Site Reliability Engineering (SRE) Best Practices [COMMUNITY-TOOL] - Compiles architectural and cultural patterns for establishing cloud-native SRE. Key topics include automated toil elimination, disaster recovery automation, chaos engineering practices, and objective-based release gates.

Cloud Native Ecosystem

Enterprise Architecture

Google Best Practices

  • (2020) cloud.google.com: SRE at Google: Our complete list of CRE life lessons 🌟 [ADVANCED LEVEL] [DOCUMENTATION] [COMMUNITY-TOOL] - A comprehensive collection of operational lessons compiled by Google's Customer Reliability Engineering (CRE) team. Serves as a blueprint for implementing user-focused SLOs, identifying operational anti-patterns, and structuring incident communication paths.

Google SRE Book

  • (2016) sre.google: sre-book - The Evolving SRE Engagement Model [ADVANCED LEVEL] [DOCUMENTATION] [COMMUNITY-TOOL] - Core chapter from Google's SRE Book defining the Service Engagement Model. Outlines operational boundaries, launch-readiness reviews, service onboarding criteria, and the handback of unstable systems to development teams.

Incident Management

Podcasts

  • (2021) sre.google/prodcast [COMMUNITY-TOOL] - Google's 'Prodcast' SRE series. Features in-depth architectural and organizational conversations with Google operations engineers about multi-region failovers, disaster planning, and fleet-wide maintenance.

Tooling

Software Engineering

Professional Development

Core Architectures

  • (2025) Skills for Real Engineers ⭐ 128202 [MARKDOWN CONTENT] [ADVANCED LEVEL] 🌟🌟🌟🌟🌟 [DE FACTO STANDARD] - A very popular repository covering the foundational principles, design philosophies, and architectural practices behind high-quality software delivery. The curator frames it around career advancement, but the fundamentals it covers also help engineers adapt to rapid shifts in AI-driven development. A strong reference for engineering standards.
