What a Site Reliability Engineer Really Does…in DevOps
We really, really built ourselves into a corner with the internet and mobile and cloud and Agile "at scale". Good news is, we're engineers who can invent our way out of anything, or at least that's what's made all this money so far.
What Is a Site Reliability Engineer?
Srsly. Wikipedia. Too lazy? Fine, from Wikipedia (please donate):
Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies that to IT operations problems. The main goals are to create ultra-scalable and highly reliable software systems. According to Ben Treynor, founder of Google's Site Reliability Team, SRE is "what happens when a software engineer is tasked with what used to be called operations."[1]
What kind of ninja trickery is this? Using common sense to learn how to hire the best people in technology? Why would Google spill the beans on this hiring secret? Maybe they're sick of dealing with our broken shit.
Our digital systems are ALL distributed and complex now. How can we still expect some ignorant code-jockey in a cubicle, who never uses what they make, to control the entire business with a few keystrokes? Because we are brainwashed by cost accounting, and we forget that the job needs the right experience and skill to be done well. Meanwhile, we keep under-hiring for operations and over-hiring developers, so there's a 1-to-who-knows ratio between the people who press one button and the people who press another.
If You're Offended By What I've Described, Congratulations!
I am too. Things so complex that no one person can understand them are dangerous. Banking apps that aren't secure, mapping apps that get us lost and late, social media apps that show our kids their first porn, CGM devices that cost more in maintenance fees than they're worth...it offends me when these things don't work. Technology that works is how I make sure I have money for a family, sponsored, biological, or otherwise.
Our tech industry should be hiring people that can comprehend the things they deliver. People pay for things that work. If you don't care about others, at least you'll care about making money, and "right" software in a customer-obsessed market makes the most money.
It's particularly offensive when the hybrid phoenix of a job title that 'Site Reliability Engineer' embodies goes largely unnoticed in high-tech corporate mindsets. "What the hell is that, your latest professional title advancement scheme? Just because you mashed these words together doesn't mean you deserve a raise!!!" If the following describes you, you deserve a salary that rivals an enterprise VP of marketing:
- You know what your software should do
- You know how your software does what it does
- You know how to communicate the value of the things you're working on
- You don't mind being woken up when it's broken for someone
- You ignore those around you who don't think the above is relevant to their jobs
Go forth and make your first salary million in a few years, y'all who can. Do this well and grow.
Why Do We See Site Reliability Engineering on the Rise?
The tech industry is now at the point where we've completely forgotten that the people who build software should know how to operate that software when other people depend on it. Big money, consumer insatiability, customer centricity, and digital transformation have skyrocketed the imperative to make engineering teams the engine of the modern enterprise business. We build shiny, complicated, and highly profitable things. What did we expect?
We, the nerds, lured jocks in with our shiny things such as the Altair, BBS, and the entire mobile revolution...and they brought their friends: CFOs, 'professional CEOs', and other people who look at a hoodie like it's pajamas that violate the corporate dress code. We allowed things to get this way #waterfall #agile #WomenInTech by being egotistical, lazy, impatient, and unkind. These are our chickens coming home to roost.
And now we have to invent a way out of the 'shallow engineering' tech culture that looks skeptically at #DevOps as a management problem. I don't mean that everyone on your engineering team has to code, but the people who do code should understand the impact of what they do. This is ethical and this is practical. This is how you make your next billions.
This is the new horizon for impactful, profitable, and scalable tech culture:
- Lisa Phillips, SRE @google
- Taewha Lee, SRE @samsung
- Jennifer Petoff, PM of SRE @google
- Sally Lehman, SRE @Oracle for Govt.
- Jennifer Kwentoh, SRE @aedcelectricity
- Karen O'Connell, SRE @google
- Mary Gardiner, SRE @google
- Joan Smith, SRE at @twitter and @google (and Doctor Who fan, sup!)
On #InternationalWomensDay, I guess that is all for now.