Google Site Reliability Engineering

How Google runs production systems.

Google Site Reliability Engineering Reviews and Details

This page is designed to help you find out whether Google Site Reliability Engineering is good and if it is the right choice for you.

Screenshots and images

  • Google Site Reliability Engineering landing page (screenshot, 2023-09-14)

Social recommendations and mentions

We have tracked the following product recommendations or mentions on various public social media platforms and blogs. They can help you see what people think about Google Site Reliability Engineering and what they use it for.
  • AI Incident Management: Detect, Triage, and Resolve Issues Faster
    For teams already using AI ticket routing for customer support, the same principles apply to incident triage: pattern matching, historical classification, and intelligent routing. Google's SRE handbook formalized many of these triage principles long before AI tooling existed. - Source: dev.to / 3 months ago
  • Understanding System Reliability: The Foundation of Modern Infrastructure
    Reliability is a property of your entire system, not just isolated parts. Your application might have rock-solid code, but if your database crashes and there's no failover, your system isn't reliable. This holistic view is essential: site reliability engineering (SRE) practices emphasize that reliability must be considered across all layers of your infrastructure. - Source: dev.to / 6 months ago
  • DevOps team structures
    Site Reliability Engineering (SRE) solves operations as if it's a software problem. The SRE team strongly focuses on performance, capacity, availability, and latency for products operating at massive scale. Google pioneered this approach to manage continental-level service capacity. - Source: dev.to / 7 months ago
  • Passwords and Power Drills
    Sorry for the offtopic comment, but it's bizarre to me that Google is hosting their book on Github with a github.io domain. Their previous two SRE books are hosted at https://sre.google on Google-owned IPs.[0] What was that decision process? "We're Google, and we're literally writing a book about how good we are at hosting services. But hosting some static HTML files that are almost entirely text? That's a tough... - Source: Hacker News / 8 months ago
  • Monitoring & Observability: New Tools to Watch in 2025
    In 2025, observability is no longer just for SREs or DevOps; it's a cross-functional necessity. Whether you're debugging a production outage, tracking performance regressions, or optimizing user experience, your observability tools should provide clarity, not clutter. - Source: dev.to / about 1 year ago
  • Ask HN: Why do websites have scheduled downtime if AWS/GCP prove its not needed?
    Same difference... Read the book https://sre.google/. - Source: Hacker News / almost 2 years ago
  • Ask HN: What makes SRE great compared to "plain" DevOps?
    In my view it is having a dedicated team focusing their full mental bandwidth on pro-actively understanding and managing robustness of the system. In Pure DevOps, it seems to me developers often don't have the full picture of the system, and not enough bandwidth to foresee complex interactions from their changes. These are from my experiences spending one year as a developer in somewhat large a greenfield... - Source: Hacker News / over 2 years ago
  • How Site Reliability Engineering Is Different From DevOps
    Site Reliability Engineering, introduced by Google, extends the principles of software engineering to operations. Unlike DevOps, SRE places a stronger emphasis on reliability, availability, and scalability. SRE teams are tasked with maintaining the health and performance of systems by applying engineering practices to operations. The ultimate objective is to achieve a balance between service reliability and... Source: almost 3 years ago
  • API Product Managers, what's your workflow when designing and maintaining an API?
    Define SLOs for availability and latency. Google's SRE book is good reading for this. Source: about 3 years ago
  • Starting an SRE position soon. No prior experience (except IT). Any suggestions? Sorry if it's too general.
    Have you gone through the SRE Books? Source: about 3 years ago
  • Starting up with sre
    Google's SRE books are always a good read. Source: about 3 years ago
  • Unsolicited perspective from a SRE interviewer, Part 2
    The inflection point for me was when I read a book on Site Reliability Engineering someone left on my desk (IDK why); I hated toil and wanted to design systems that just ran. When I finished the book, I knew this was the job that I wanted for my career. I wanted a career that was fulfilling, engaging, and high-paying so this fit the bill (I'll talk about comp in the next post). I started to upskill in that... Source: about 3 years ago
  • Employer adding 24/7 on-call rotation. What do I do?
    Read these books: https://sre.google/books/. Source: about 3 years ago
  • Going into an SRE Internship, what should I expect?
    Reading Google's SRE books helped me the most during internship. Source: about 3 years ago
  • What are your processes around performance monitoring?
    That brings me to the last point: you're not doing the right thing when you want to remind people to look at the dashboards just for the sake of looking at dashboards. There needs to be a reason for that. You should define SLOs that indicate the error rate and response times your service should meet. And then you must take them seriously in your process. If you are tracking worse than the SLO, you must... Source: about 3 years ago
  • Career Advice
    It's important to be well rounded. If operations problems are what you're into then books like Google's series on SRE are a good place to look. Become knowledgeable about cloud computing and building distributed systems in general. Kleppmann's Designing Data-Intensive Applications is a good one for being good at designing systems. Source: over 3 years ago
  • Hacker News Discussion: Clojure Turns 15
    Perhaps you don't work on a large enough clojure codebase where this is an issue, but the common symptom on large codebases is that you cannot understand a piece of clojure code in isolation, you must have the entire module or even sometimes, the entire system in your mental context in order to understand the shape of the data some function you care about will receive and what properties it will have. Hmm, that... Source: over 3 years ago
  • Moving from Oracle dba to SRE role
    Google published a few free SRE books https://sre.google/books/. Source: over 3 years ago
  • Ask HN: Best practices for self-healing apps?
    First step is redundancy: having backups, failover, overprovisioning. Essentially prepared "plan Bs". Next step is introspection: aggregate monitoring and enough detail to figure out if there are issues. Next step is being notified when things break. I.e. Anomaly detection and alerting. Then, debuggability. Enough detail to solve issues. Disaster recovery testing is part of ensuring you actually have this, and not... - Source: Hacker News / over 3 years ago
  • Software developers: What are your tasks at work?
    https://dl.acm.org/doi/fullHtml/10.1145/2854146 https://sre.google/books/ https://cloud.google.com/blog/topics/developers-practitioners/how-google-got-to-rolling-linux-releases-for-desktops?hl=en https://en.m.wikipedia.org/wiki/Borg_(cluster_manager) https://research.google/pubs/pub43438/. Source: over 3 years ago
  • looking for a mentor
    Since you're asking for mentorship I'm assuming you've already read the books on https://sre.google/books/ and related like https://www.amazon.com/Real-World-SRE-Survival-Responding-Maximizing-ebook/dp/B07BJKZQ7Y ? What skills did you think you needed more help with, or concepts were fuzzy? Source: over 3 years ago
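
Several of the mentions above recommend defining SLOs for availability, error rate, and latency, and acting when a service tracks worse than its target. The following is a minimal Python sketch of that idea. It is not code from Google's SRE books: the `SLO` class, the function names, and the 99.9% target are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical service level objective; name and target are illustrative."""
    name: str
    target: float  # required fraction of good events, e.g. 0.999 for 99.9%

    def __post_init__(self) -> None:
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def compliance(good: int, total: int) -> float:
    """Fraction of events that were good; a window with no traffic counts as compliant."""
    return 1.0 if total == 0 else good / total


def meets_slo(slo: SLO, good: int, total: int) -> bool:
    """True when the measured fraction of good events reaches the target."""
    return compliance(good, total) >= slo.target


def error_budget_remaining(slo: SLO, good: int, total: int) -> float:
    """Unspent share of the error budget; a negative value means the SLO is missed."""
    if total == 0:
        return 1.0
    allowed_bad = (1.0 - slo.target) * total
    return 1.0 - (total - good) / allowed_bad
```

For example, with a 99.9% availability target and 100,000 requests in the measurement window, 50 failed requests would leave roughly half of the error budget, while 200 failed requests would put the service out of its SLO.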

Is Google Site Reliability Engineering good? This is an informative page that will help you find out. Moreover, you can review and discuss Google Site Reliability Engineering here. The primary details have not been verified within the last quarter, and they might be outdated. If you think we are missing something, please use the means on this page to comment or suggest changes. All reviews and comments are highly encouraged and appreciated, as they help everyone in the community make an informed choice. Please always be kind and objective when evaluating a product and sharing your opinion.