What a Site Reliability Engineer Really Does…in DevOps

We really, really build ourselves into a corner with the internet and mobile and cloud and Agile “at scale”. Good news is, we’re engineers that can invent ourselves out of anything, or at least that’s what’s made all this money so far.

What Is a Site Reliability Engineer?

Srsly. Wikipedia. Too lazy? Fine, from Wikipedia (please donate):

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies that to IT operations problems. The main goals are to create ultra-scalable and highly reliable software systems. According to Ben Treynor, founder of Google’s Site Reliability Team, SRE is “what happens when a software engineer is tasked with what used to be called operations.”[1]

What kind of this ninja trickery is this? Using common sense to make learn how to hire the best people in technology? Why would Google spill the beans on this hiring secret? Maybe they’re sick of dealing with our broken shit.

Our digital systems are ALL distributed and complex now. How can we still expect that having some ignorant code-jockey in a cubicle who never uses what they make control the entire business with the stroke of a keyboard? Because: we are cost-accounting brainwashed and forget that the job to do needs the right experience and skill to do it well. Meanwhile, we keep under-hiring operations and over-hire developers such that there’s a 1-to-who-knows ratio between the people that press one button and the people that press another.

If You’re Offended By What I’ve Described, Congratulations!

I am too. Things that are so complex no one person can understand them, those things are dangerous. Banking apps that aren’t secure, mapping apps that get us lost and late, social media apps that show our kids their first porn, CGM devices that cost more in maintenance fees than their worth…it offends me when these things don’t work. Technology that works is how I make sure I have money for a family, sponsored, biological, or otherwise.

Our tech industry should be hiring people that can comprehend the things they deliver. People pay for things that work. If you don’t care about others, at least you’ll care about making money, and “right” software in a customer-obsessed market makes the most money.

It’s particularly offensive when the hybrid phoenix of a job title that ‘Site Reliability Engineer’ embodies goes largely unnoticed in high tech corporate mindsets. “What the hell is that, your latest professional title advancement scheme? Just because you mashed these words together doesn’t mean you deserve a raise!!!” If you know the following things, you deserve a salary that rivals an enterprise VP of marketing:

  • What your software should do
  • How your software does what it does
  • How to communicate the value of the things you’re working on
  • Don’t mind being woken up when it’s broken for someone
  • Ignore those around you that don’t think the above is relevant to do their jobs

Go forth and make your first salary million in a few years, y’all who can. Do this well and grow.

Why Do We See Site Reliability Engineering on the Rise?

The tech industry is now at the point where we completely forgot that the persons who build software should know how to operate that software when it other people depend on it. Big money, consumer insatiability, customer centricity, and digital transformation has skyrocketed the imperative to make the modern enterprise business engine their engineering teams. We build shiny, complicated, and highly profitable things. What did we expect?

We, the nerds, lured jocks in with our shiny things such as the Altair, BBS, and the entire mobile revolution…and they brought their friends. CFOs, ‘professional CEOs’, and other people that look at a hoodie like its pajamas that violate the corporate dress code. We allowed things to get this way #waterfall #agile #WomenInTech by being egotistical, lazy, impatient, and unkind. These are our chickens coming home to roost.

And now we have to reinvent a way out of the ‘shallow engineering’ tech culture that looks skeptically at #DevOps as a management problem. I don’t mean that everyone on your engineering team has to code, but the people who do code should understand the impact of what they do. This is ethical and this is practical. This is how you make your next billions.

This is the new horizon for impactful, profitable, and scalable tech culture:

On #InternationalWomensDay, I guess that is all for now.


FUEL for Your Brain: On Focus, Usefulness, Execution, Learning

I hate acronyms. My dad used to use them far too much, the kind of guy that was more smart in retrospect than the kind of boy I was understood at the time. Kind, thoughtful, quiet, and invested in people around him.

My current thought product is an acronym, “FUEL”, based on a few key practices that I find are valuable to my current line of work as a Change Agent. These practices take time to develop and are only truly useful when used in parallel.

  • Focus: ability to right-size activities to close the task(s) currently underway
  • Usefulness: ability to gauge effectiveness of work and reprioritize based on new ideas/objectives/activities
  • Execution: ability to match skill to task, collaborate the plan, and resolve blockers as they arise
  • Learning: ability to observe outcomes and refactor them into useful ways to improve all of the above

Real-time Example of FUEL

Last night after a meetup, I had a beer with someone I’d met before at a local conference but hadn’t dived into. The opportunity presented itself, so I stayed a little later than I normally would. They are a CTO for a 50-person startup in town. Net-net:

Paul: “What’s weighing on you right now man, work related?” (L)
Them: “Kind of glad someone asked…we have people issues.” (E)
Paul: “You’re not alone…what kind?” (E/F)
Them: “There are a few ‘senior’ engineers that don’t produce like others.” (U)
Paul: “What’s your plan for them and the rest of your team?” (U)
Them: “We just laid off one after giving him a path, but the other two, I don’t know…maybe add metrics, visibility…they’re kind of SPOFs.” (U)
Paul: “…so you can quantify what you already know? How did we arrive here?” (U/L)
Them: “They were here at the beginning, hence ‘senior’, but one guy hasn’t committed code since Sept (5 months)!” (E)
Paul: “Got it, they don’t ‘git’ it. [laughs] How are you and the leadership team  helping to coach other junior engineers?” (L)
Them: “Well that’s the problem maybe. We don’t exactly have a culture yet, but our C-level relationships with each other are solid.” (U/E)
Paul: “I once heard that great leaders define their success by fostering other leaders. Do you know who your real ‘senior’ engineers are?” (F/L)
Them: “Well I kind of already know who deserves the chance to step up.” (E)
Paul: “That’s good, but not enough. People often hesitate on new things simply because they haven’t experienced how it works yet. Your coaching needs to help those people get over any blockers to proving one way or the other if they can do the job well/right/better.” (FUEL)
Them: “I’m going to talk to our CEO about this. Can I get your card?”
Paul: “Only if you intend to use it.”

Good enough exercise and learning for me for one night.