Personal Log 2020-09-02

Started later than I’d like, trying to spend more good morning time with the kiddos. Nothing huge on the schedule, some customer and internal meetings, some time for deep work.

Morning Popcorn

Started to spin up revision work on an integration between qTest and NeoLoad Web. Good lord, I need to document my pre-documentation work better for myself. Critical commands and obscure UI workflows are killers in products you don’t know, or care to know.

Had to remind someone how to get to competitive intel docs. LMGTFY also applies to internal docs for organizations that use G-Suite.

Redirected someone from using the old product requests system (Trello) to the new one. I only saw this because I ‘watch’ everything in Trello and it comes via email summaries. Trello and email are gross, but hey, it worked today.

Responded to a net-new person interested in collaborating on the book. Good sign that this channel isn’t just a point-in-time blast, but an ongoing consideration from the channel admin now.

Combined two email threads about the same topic from two different people groups in our org. Always nice when you can help people realize they’re asking for or working on the same things as each other.

Helping to Prioritize Tactical Feature Requests

Hopped on with the head of PM to discuss how to better prioritize tactical requests in the new product feedback and request system. Using a methodology similar to RICE (Reach-Impact-Confidence-Effort), my main suggestion was around how to represent urgency from the pre-sales or CSM side, which isn’t covered explicitly enough to successfully operationalize a prioritization model that works for the whole business. It was well received, and there are other practical things, such as customer names, Salesforce links, and revenue size and risk info, that we would also have to provide as metadata regardless of how we sum the urgency factor up into the high-level view.
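To make that concrete, here’s a minimal sketch of what a RICE-style score with an explicit urgency factor could look like; the field names, ranges, and weighting are my own illustrative assumptions, not the model we settled on:

    # Hypothetical RICE + urgency scoring sketch (illustrative values only)
    def score(reach, impact, confidence, effort, urgency):
        # reach: people affected per quarter; impact: 0.25-3; confidence: 0-1;
        # effort: person-months; urgency: 0-1, sourced from pre-sales/CSM input
        rice = (reach * impact * confidence) / effort
        return rice * (1 + urgency)  # urgency boosts the base score, never replaces it

    # Metadata (customer, Salesforce link, revenue size/risk) rides along with
    # each request regardless of how the urgency factor is rolled up.
    print(score(reach=500, impact=2, confidence=0.8, effort=3, urgency=0.6))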

Lunchtime Community Stuff

In BDO Slack, got DMed politely about helping a local startup get some product feedback from our community. Points for asking what the best way was first, points for having the CEO (techy) do it, saw some community members positively engaging already. Also suggested that if they really want community love and help, sponsoring the upcoming DevOpsDays Boston event would be good, and sent along the prospectus. That’s the nice thing about being an organizer in multiple groups, constructive forces.

While I was on the topic, and having pivoted from lunch, I knocked out a few other event-organizer emails, sponsor asks, and an internal huddle about something that can’t be ignored anymore.

Deep Work, Fast CLI Fix, Short-Circuiting Flies

Back to work-work: deep dive into the qTest integration. Looks like I have to create a Dockerfile completely from scratch (hello versioning hell) that includes their agent, not just the NeoLoad CLI and Python dependencies. Since their Docker example (stale, btw) was based on Ubuntu 16.04, I struggled for almost an hour with getting Python 3.8/3.6 set as the default and getting the pip requirements right. Gave up and based it off Ubuntu 18.04 to simplify the Python install process, and everything works much more easily now. Also, their agentctl doesn’t seem to have its subcommands properly documented (wait, here it is, halfway down a sea of blah blahs), so there’s no way to know if there’s an automated approach to configuring an agent on a host (will try again tomorrow). A sketch of the Dockerfile I landed on follows the list below. So far, what I’ve learned is:

  • Documentation is shit unless it’s optimized for Google via keywords
  • No matter how much vendor documentation sites push their search bar on you, it’s never as good as Google, and usually just downright awful
  • Cyclical documentation articles that bounce you back to high-level categories are bullshit
  • Documentation that is long enough to have sections but doesn’t provide permalinks to those sections is candy ass
  • Documentation should be written for all supported platforms, not just old versions of Winblows
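As promised above, here’s roughly the shape of the Dockerfile I landed on, sketched from memory; the pip package name is my assumption, and the qTest agent steps are elided because that’s exactly the part their docs bury:

    # Ubuntu 18.04 makes python3/pip3 installs painless compared to 16.04
    FROM ubuntu:18.04
    RUN apt-get update \
        && apt-get install -y python3 python3-pip \
        && rm -rf /var/lib/apt/lists/*
    # NeoLoad CLI plus its Python dependencies
    RUN pip3 install neoload
    # qTest agent install/config would go here; until agentctl's subcommands
    # are documented somewhere findable, that part stays manual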

Saw a product idea about the CLI come in from one of my best customers; it made sense, so I implemented it, created a PR, and pushed a pre-release version to address the issue. Customer and support comms, then feedback that it’s better. Turnaround time: less than 30 minutes.

Figured out why I was getting a roles-and-permissions error deep in the qTest agent execution logs. Turns out my trial on their platform had expired, so nothing about the error messages seemed at all related to the actual problem. Nice. Emailed our contacts to ask about acquiring a proper technical partner license, hoping to hear from them soon. Reported the blocker to stakeholders.

Short-circuited, for the third time, an ask from the CEO of a fledgling startup who’s been stalking Neotys and me about combining their tech with ours. They have one, maybe two customers, and the technology paradigms between them and us are by definition just so far from a match. They’re just looking for legitimacy and customer lists. A big waste of our time, and I won’t have it; I certainly won’t waste my boss’s time on it, though they have been discussed and dismissed in the past. Some people just don’t understand when there’s nothing there, neither from a tech/architecture perspective nor from a business/use-case one. We already have our priorities; this is a fly buzzing around the ointment at this point.

Dinner and Event Organizer Meetings

Back home for dinner and family time. My partner is listening to upcoming virtual classroom stuff for the new school year. Glad that we and 20% of our community’s families, that’s one whole elementary school out of the five in our city, emphatically said we wouldn’t be sending our kids back. At the beginning of the pandemic, we flattened the curve of demand on hospitals; why the hell would we all be expected to forget that the same applies to our kids, throwing them back into the schools all at the same time, even if that’s for half-days (which, btw, doesn’t help working parents much more than staying home does)?

Back to DevOps community organizing: the event sponsors working group meets 30 minutes weekly until the event is done. There are very few ‘other fruits to squeeze’ for sponsorship dollars this year. I guess that makes the companies and organizations that have already committed all the more meaningful. The fund for next year will be okay; we won’t be adding to it with this year’s revenue, for sure, but at least we set a milestone precedent by committing to donate all of the net ticket revenue that comes in to good causes! Boy, I worked for months getting alignment between groups on how to make that happen, and with another organizer putting the right words of pressure on the right people at the right time, we now have a list of causes and agreement across groups that it will happen. We also have a process and clear criteria that can scale and be reused again next year. We will also have data on how many people ‘feel generous’ when offered an option to give more than the recommended ticket price, and on how many people pick a free ticket when faced with the option that 100% of their contribution will go to something altruistic. We will need to write a retro, drafted beforehand so it can publish the day after, about how much money came in and who it went to; this is not something that can wait for weeks or months like the A/V post-production in prior years. People will want to know.

More Like Water, Less Like Waterfall

It’s been too long since I published something here. The more time I commit to professional, volunteer, and personal projects, the less time I feel I have to write. What a bullshit excuse, too, because I book time on my calendar for other operational things, so why not this? All it takes is diligence and sticking to a scheduled time.

In tech, people use the word waterfall like a curse word for historical reasons, a derogatory label…but an actual waterfall is a continuous stream that doesn’t repeat itself. I want to be more like water, as Bruce Lee said, but for more than the reasons he had.

As Master Lee indicated, we should be ready to change; like water in a glass, we conform to the situation around us. In martial arts, being rigid and stiff leads to being slow and overly anticipatory. Pangai-noon, translated as “half hard, half soft”, also indicates that we need to keep strength and application in balance with speed and adaptability. Mindfulness plays a huge part in this too: living in “the now”, being present in each moment, like water that completely fills every dip and cleft from the riverbed to the edge, yet constantly seeks balance at the surface.

Anyway, I hypothesize that exhaling my field thoughts and experiences (minus personally identifiable stuff, of course) to this blog on a consistent basis will increase my mindfulness. I will diligently try this for about 3 months, writing at least twice a week, and not give up because I missed one or two of these personal appointments. I will simply share out what learnings I can from the main areas of my daily work, since it is so broad. If these pseudo-minutes strike on a topic meaty enough to write up as its own post, fine. If not, fine.

It will, at the very least, force me to share something on a frequent basis. Less like ‘waterfall’ deployment, as they say in software; more [continuous], like water.

Thoughts on DevOps vs. Enterprise Culture Clash

Probably not unlike you, every day I work with folks caught in a clash between organizational processes and technology imperatives. “We have to get this new software up and running, but the #DevOps group won’t give me the time of day.”

Large organizations don’t have the luxury of ‘move fast, break stuff’; if they did, their infrastructure, security, financial, and software release processes would be a chaotic mess…far more than usual. But how does one ‘move fast’ without breaking enterprise processes, particularly ones that they don’t understand?

Enterprise, Know Thyself

The answer is simple: encourage engineers to always be curious to know more about their environment, constraints, and organizational culture. The more you know, the more nimble you’ll be when planning and responding to unanticipated situations.

Today I had a call with a health care company, working to get Docker installed on a RHEL server provisioned by an infra team. What was missing: the operator didn’t know that the security team, which uses Centrify to manage permissions on that box, required tickets to be created to grant ‘dzdo su’ access for a very narrow window of time. Additionally, the usual ‘person to connect with’ was off on holiday break, so we were at the mercy of a semi-automated process for handling these tickets, and because they had already put in a similar request in the past 7 days, all new tickets would have to go through a manual verification process. This frustrated our friend.

The frustration manifested in the form of the following statement:

Why can’t they just let me have admin access to this non-production machine for more like 72 hours? Why only 2 measly hours at a time?

– Engineer at an F100 health care organization

My empathy and encouragement to them was to “expect delays at first, don’t expect everyone to know exactly how processes work until they’ve gone through them a few times, but don’t accept things like this as discouragements to your primary objective.”

If everything were easy and no problems existed, kind words might be useless. When things are not working that way, knowing how to fix or overcome them goes a long way, just like a kind word at the right time. We crafted an email to the security team together explaining exactly what was needed AND WHY, as well as an indication of the authority and best/worst case timelines that we were operating under, and a sincere thank you.

Enterprise “DevOps” Patterns that Feel Like Anti-Patterns

In my current work, I experience a lot of different enterprise dynamics at many organizations around the world. The same themes, of course, come up often. A few dynamics I’ve seen in play when enterprises try to put new technology work in a pretty box (i.e. consolidate “DevOps engineers” into a centralized team) are:

  1. Enterprise DevOps/CloudOps/infra teams adopt the pattern of “planned work”, just like development teams, using sprints and work tracking to provide manageable throughput and consistency of support to other organizational ‘consumers’. This inherits other patterns like prioritization of work items, delivery dates, estimable progress, etc.
  2. Low/no context requests into these teams get rejected because it’s slow/impossible to prioritize and plan based on ambiguous work requirements
  3. The amount of control and responsibility these teams have over the organization’s security and infrastructure systems is often considered “high risk”, so they’re subject to additional scrutiny come audit time

That last point about auditing, particularly its psychological impact on ‘move fast’ engineers, cannot be overstated. When someone asks you to break protocol ‘just this one time’, it’s you that’s on the hook for explaining why you took that action, rarely the product owner or director who pressured you to do it.

Technical auditors worth anything more than spit will focus on processes instead of narrow activities, because combing through individual log entries is not scalable…but verifying that critical risk-mitigating processes are in place, and checking for examples of when the process is AND isn’t being followed…that’s far more doable in the few precious weeks that auditing firms are contracted to complete their work.

The More You Know, The Faster You Can Go (Safely)

An example of how understanding your enterprise organization’s culture improves the speed of your work comes from an email today between two colleagues at an F100+ company:

Can you confirm tentative dates when you are planning to conduct this test? Also will it take time to open firewall, post freeze incident tickets can be fast tracked?

– Performance Engineering at Major Retailer

This is a simple example of proper planning. Notice that the first ask is for concrete dates, an inference that others also need to have their shit together (in this particular case because they’re conducting a 100k synthetic user test against some system, not a trivial thing in the slightest). The knowledge that firewall rules have to be requested ahead of time, and that incident response should be notified that reported issues may be due to the simulation rather than real production traffic, comes from having experienced these things before. Understanding takes time.

Another software engineer friend of mine in the open-source space and I were discussing the Centrify thing today, and he asked: “why can’t they just set up and configure this server with temporary admin rights off to the side, then route appropriate ports and stuff to it once it’s working?” Many practitioners in the bowels of enterprises will recognize a few wild assumptions there, and in no way is this a slight against my friend, but rather an example of how different the thinking is between two very different engineering cultures. More specifically, those who are used to being constrained and those who aren’t often have a harder time collaborating with each other, because their reasoning is predicated on very different past experiences. I see this one a lot.

DevOps Is an Approach to Engineering Culture, not a Team

This is my perspective after only 5yrs of working out what “DevOps” means. I encourage everyone to find their own by having their own journey of curiosity, keyboard work, and many conversations.

There is no DevOps ‘manifesto’, and there never should be. As Andrew Clay Shafer (@littleidea) once said, DevOps is about ‘optimizing for people’, not process or policy or one type of team only. Instead of manifesto bullet points, there are some clear and common principles that have stood the test of time since 2008:

  • A flow of work, as one-way as possible
  • Observability and Transparency
  • Effective communication and collaboration
  • A high degree of automation
  • Feedback and experimentation for learning and mastery

Some of the principles above come from early work like The Phoenix Project, The Goal, and Continuous Delivery; others come from more formalized research such as ISO and IEEE working groups on DevOps that I’ve been a part of over the past 3 years.

I don’t tend to bring the “DevOps is not a team” bit up when talking with F100s primarily because:

  • it’s not terribly relevant to our immediate work and deliverables
  • enterprises that think in terms of cost centers always make up departments, because “we have to know whose budget to pay them from and who manages them”
  • now that DevOps is in vogue with various IT leaders, and just like the manifestation of Agile everywhere now, it’s perceived as ‘yet another demand from management to do things differently’; after being restructured, engineers often have enough open wounds that I don’t need to throw salt on them
  • if this is how people grok DevOps in their organization, there’s little I as an ‘outside’ actor can do to change it…except maybe a little side-conversation over beers here and there, which I try to do as much as appropriately possible with receptive folks

However, as an approach to engineering culture, DevOps expects people to work together, to “row in the same direction”, and to learn at every opportunity. As I stated at the beginning of this post, learning more about the people and processes around you, the constraints and interactions behind the behaviors we see, being curious, and having empathy…these things all still work in an enterprise context.

As the Buddha taught, the Middle Path gives vision, gives knowledge, and leads to calm, to insight, to enlightenment. There is always a ‘middle way’, and IMO it is often the easiest path between extremes to get where you want to be.

Put That in Your Pipeline and Smoke Test It!

I rarely bother to open my mouth as a speaker and step into a spotlight anymore. I’ve been mostly focused on observing, listening, and organizing tech communities in my local Boston area for the past few years. I just find that others’ voices are usually more worth amplifying than my own.

A friend of mine asked if I would present at the local Ministry of Testing meetup, and since she did me a huge last-minute favor last month, I was more than happy to oblige.

“Testing Is Always Interesting Enough to Blog About”

– James Goin, DevOps Engineer, Boston DevOps community (quoted with permission, Dec 12th 2019)

The state and craft of quality (not to mention performance) engineering has changed dramatically in the 5 years since I purposely committed to it. After wasting most of my early tech career as a developer not writing testable software, the latter part of my career has been what some might consider penance to that effect.

I now work in the reliability engineering space. More specifically, I’m a Director of Customer Engineering at a company focusing on the F500. As a performance nerd, everything inherits a statistical perspective, not excluding how I view people, process, and technology. In this demographic, “maturity” models are a complex curve across dozens of teams and a history of IT decisions, not something you can pull out of an Agilista’s sardine can or teach like the CMMI once thought it could.

A Presentation as Aperitif to Hive Minding

This presentation is a distillation of those experiences to date, as research, mostly inspired by wanting to learn what other practitioners like me think when faced with the challenge of translating the importance of holistic thinking around software quality to business leaders.

Slides: bit.ly/put-that-in-your-pipeline-2019

Like I say at the beginning of this presentation, the goal is to incite collaboration about concepts, sharing the puzzle pieces I am actively working to clarify so that the whole group can get involved with each other in a constructive manner.

Hive Minding on What Can/Must/Shouldn’t Be Tested

The phrase ‘Hive Minding‘ is (to my knowledge and Google results) a turn-of-phrase invention of my own. It’s one incremental iteration past my work and research in open spaces, emphasizing the notions of:

  • Collective, aggregated collaboration
  • Striking a balance between personal and real-time thinking
  • Mindful, structured interactions to optimize outcomes

At this meetup, I beta-launched the 1-2-4-All method from Liberating Structures, which had worked so well at a product strategy session in France last month. It balanced the opposing divergent and convergent modes of thinking, as discussed in The Creative Thinker’s Toolkit, so well that I was compelled to continue my active research into improving group facilitation.
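Since the method is strict on timing, the cadence is trivial to script; below is a toy sketch of my own for a 1-2-4-All timer (stage lengths per the Liberating Structures write-up, with the terminal bell standing in for a proper chime):

    # Toy 1-2-4-All timer: 1 min solo, 2 min pairs, 4 min foursomes, then all
    import time

    STAGES = [("solo reflection", 60), ("pairs", 120),
              ("foursomes", 240), ("all together", 300)]

    for name, seconds in STAGES:
        print(f"Starting: {name} ({seconds // 60} min)")
        time.sleep(seconds)
        print("\a")  # bell; kinder than cutting off a good conversation mid-sentence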

Even after a few people had to leave the meetup early, there were still six groups of four. In France there were eight contributors, so I felt that this time I had a manageable but still scaled (4x) experiment of how hive minding works with larger groups.

My personal key learnings

Before I share some of the community feedback (below), I should mention what I, as the organizer, saw during the meetup and in its outcomes:

  • I need to use a bell or chime sound on my phone rather than having to interrupt people once the timers elapse for each of the 1-2-4 sessions; I hate stopping good conversation just because there’s a pre-agreed-to meeting structure.
  • We were able to expose non-quality-engineer people (like SysOps and managers) to concepts new to them, such as negative testing and service virtualization; hopefully next time they’re hiring a QA manager, they’ll have new things to chat about
  • Many people confirmed some of the hypotheses in the presentation with real-world examples; you can’t test all the things, sometimes you can’t even test the thing because of non-technical limitations such as unavailability of systems, budget, or failure of management to understand the impact on organizational risk
  • I was able to give shout-outs to great work I’ve run across in my journeys, such as Resilient Coders of Boston and technical projects like Mockiato and OpenTelemetry
  • Quite a few people hung out afterward to express appreciation and interest in the sushi menu of ideas in the presentation. They are why I work so hard on my research areas.
  • I have to stop saying “you guys”. It slipped out twice and I was internally embarrassed that this is still a latent habit. At least one-third of the attendees were women in technology, and as important as being an accomplice to underrepresented communities (including non-binary individuals) is, my words need work.

A Few Pieces of Community Feedback, Anonymized

Consolidated outcomes of “Hive Minding” on the topics “What must be tested?” and “What can’t we test?”
  • What must we test?
    • Regressions, integrations, negative testing
    • Deliver what you promised
    • Requirements & customer use cases
    • Underlying dependency changes
    • Access to our systems
    • Monitoring mechanisms
    • Pipelines
    • Things that lots of devs use (security libraries)
    • Things with lots of dependencies
  • What can’t we test?
    • Processes that never finish (non-deterministic, infinite streams)
    • Brute-force enterprise cracking
    • Production systems
    • Production data (privacy concerns)
    • “All” versions of something, some equipment, types of data
    • Exhaustive testing
    • Randomness
    • High-fidelity combinations where dimensions exponentially multiply cases
    • Full system tests (takes too long for CI/CD)

A few thoughts from folks in Slack (scrubbed for privacy)…

Anonymized community member:

Writing up my personal answers to @paulsbruce’s hivemind questions yesterday evening: What can/should you test?

  • well specified properties of your system, of the form if A then B. Only test those when your gut tells you they are complex enough to warrant a test, or as a preliminary step to fixing a bug, and making sure it won’t get hit again (see my answer to the next question).
  • your monitoring and alerting pipeline. You can never test up front for everything, things will break. The least you can do is test for end failure, and polish your observability to make debugging/fixing easier.

What can’t/shouldn’t you test?

  • my answer here is a bit controversial, and a bit tongue in cheek (I’m the person writing more than 80% of the tests at my current job). You should test the least amount possible. In software, writing tests is very expensive. Tests add code, sometimes very complex code that is hard to read and hard to test in itself. This means it will quickly rot, or worse, it will prevent/keep people from modifying the software architecture or make bold moves because tests will break/become obsolete. For example, assume you tested every single detail of your current DB schema and DB behaviour. If changing the DB schema or moving to a new storage backend is “the right move” from a product standpoint, all your tests become obsolete.
  • tests will often add a lot of complexity to your codebase, only for the purpose of testing. You will have to add mocking at every level. You will have to set up CICD jobs. The cost of this depends on what kind of software you write, the problem is well solved for webby/microservicy/cloudy things, much less so for custom software / desktop software / web frontends / software with complex concurrency. For example, in my current job (highly concurrent embedded firmware, everything is mocked: every state machine, every hardware component, every communication bus is mocked so that individual state machines can be tested against. This means that if you add a new hardware sensor, you end up writing over 200 lines of boilerplate just to satisfy the mocking requirements. This can be alleviated with scaffolding tools, some clever programming language features, but there is no denying the added complexity)

To add to this, I think this is especially a problem for junior developers / developers who don’t have enough experience with large scale codebases. They are either starry-eyed about TDD and “best practices” and “functional programming will save the world”, and so don’t exercise the right judgment on where to test and where not to test. So you end up with huge test suites that basically test that calling database.get_customer('john smith') == customer('john smith') which is pretty useless. much more useful would be logging that result.name != requested_name in the function get_customer

the first is going to be run in a mocked environment either on the dev machine, on the builder, or in a staging environment, and might not catch a race condition between writers and readers that happens under load every blue moon. the logging will, and you can alert on it. furthermore, if the bug is caught as a user bug “i tried to update the customer’s name, but i got the wrong result”, a developer can get the trace, and immediately figure out which function failed

Then someone else chimed in:

It sounds like you’re pitting your anecdotal experience against the entire history of the industry and all the data showing that bugs are cheaper and faster to fix when found “to the left” i.e. before production. The idea that a developer can get a trace and immediately figure out which function failed is a starry-eyed fantasy when it comes to most software and systems in production in the world today.

The original contributor then continues with:

yeah, this is personal experience, and we don’t just yeet stuff into production. as far data-driven software engineering, I find mostly scientific studies to be of dubious value, meaning we’re all back to personal experience. as for trace driven debugging, it’s working quite well at my workplace, I can go much more into details about how these things work (I had a webinar with qt up online but I think they took it down)

as said, it’s a bit tongue in cheek, but if there’s no strong incentive to test something, I would say, don’t. the one thing i do is keep tabs on which bugs we did fix later on, which parts of the sourcecode were affected, who fixed them, and draw conclusions from that

Sailboat Retrospective

Using the concept of a sailboat retrospective, here are the things that propelled us, the things that slowed us (and how I’d like to improve on them), and the things to watch out for:

Things that propel us:

  • Many people said they really liked the collaborative nature of hive minding and would love to do this again because it got people to share learnings and ideas
  • Reading the crowd in real-time, I could see that people were connecting with the ideas and message; there were no “bad actors” or trolls in the crowd
  • Space, food, invites and social media logistics were handled well (not on me)

Things that slowed us:

  • My presentation was 50+ mins, way too long for a meetup IMO.

    To improve this, I need to:
    • Break my content and narratives up into smaller chunks, ones that I can actually stick to a 20-minute timeframe on. If people want to hear more, I can chain on topics.
    • Recruit a timekeeper from the audience, someone who provides accountability
    • Don’t get into minutiae and examples that bulk out my message, unless asked
  • Audio/video recording and last-minute mic difficulties kind of throw speakers off

    To fix this? Maybe bring my own recording and A/V gear next time.
  • Having to verbally interrupt people at the agreed-upon time-breaks in 1-2-4-All seems counter to the collaborative spirit.

    To improve this, possibly use a Pavlovian sound on my phone (ding, chime, etc.)

Things to watch out for:

  • I used the all-too-common gender-binary phrase “you guys” twice. Imagine rooms where that would somehow be fine to say, yet where saying “hey ladies” to a mixed crowd would be considered pejorative by many cisgender men. Everything can be improved, and this is certainly one thing I plan to be very conscious of.
  • Though it’s important to have people write things down themselves, not everyone’s handwriting can be read back by others afterward, certainly not without high-fidelity photos of the post-its.

    To improve this, maybe stand with the final group representatives and if needed re-write the key concepts they verbalize to the “all” group on the whiteboard next to their post-it.


Afterthoughts on Hive Minding

It’s a powerful thing to understand how your brain works, what motivates you, and what you don’t care about. There are so many things that can distract, but at the end of the day, there are very few things worth having done whose impact is immediately measurable. Shipping myself to Europe until next week, for example, has already had measurable personal and professional impact.

One thing I experienced this week, after injecting a little disruption to conformity yesterday, was what I now call “hive minding”, or otherwise assisting independent contributors in rowing in the same direction. The classical stereotype of “herding cats” implies that actors only care about themselves, but unlike cats, a bee colony shares an intuitive survival imperative to build and improve the structure that ensures its survival. Each bee might not consciously think about “lasting value”, but it’s built into their nature.

Be Kind, Rewind

I’m always restless, every success followed by a new challenge, and I wouldn’t have it any other way, but it does lead to a growing consideration about plateauing. Plateauing is a million times worse than burning out. There are plenty of people and companies that have burned out already but are still doing something “functional” in a dysfunctional industry, and if the decision is to flip that investment, it’s an easy one to make: fire them, trade, or cut funding. But what do you do with a resource when they plateau?

I think you’ll know you’ve plateaued when you find yourself without restlessness. If necessity is the mother of invention, restlessness is the chambermaid of a clean mind. At least for me, like a hungry tiger in a cave, I must feed my restlessness with purposeful and aligned professional work. The only problematic moment with me…I like to get ahead of the problem of someone telling me what to do by figuring out what we (everyone, me and them) should be doing before someone dictates it with less context.

The sweet spot of this motion is to do it together, not in isolation and not dictatorially, but coalescing around the importance of arriving at the “right” goals, in alignment, at the same time. The only surprise when you’re riding the wave together is what comes next, and when you engineer this into the process, surprises are mostly good.

It took a while to arrive at this position. I had to roll up sleeves, work with many different teams in multiple organizations, listen to those whose shoes I don’t have the time or aptitude to fill, figure out how to synthesize their inputs into cogent and agreeable outcomes, and do so with a level of continuity that distinguishes this approach from traditional forms of management and group facilitation.

Don’t Try This On Your Own

The cost of adaptability is very high. If I didn’t have an equally dedicated partner to run the homefront, none of this would work. She’s sought out the same kind of commitment and focus on raising the kids as I have with what pays the bills. There are very few character traits and creature comforts we share, but in our obsession over the things that make the absolute best come out of what we have, she more than completes the situation.

In this lifestyle, I have to determine day by day and week by week which net-new motions/motivations I need to pick up and which I need to put down, either temporarily or permanently. This can feel like thrash to some, but for me, every day is a chance to re-assess based on all the days before now; I can either take that opportunity or not, but it is there whether or not I take it. If my decisions are only made in big batches, similar to code/product releases, I inherit the complexities and inefficiencies of “big measurement”…namely, lost granularity in iterative improvement.

Feedback Loops, Everywhere

As I explore the dynamics of continuous feedback loops beyond software and into human systems, a model emerges of frequency in feedback and software delivery not as separate mechanisms, but as symbiotic. The more frequently you release, the more chances there are for feedback. The more feedback you can synthesize into value, the more frequently you want to release. One does not ‘predict’ the other; their rates bound each other, like a non-binary statistical model.

What I mean is that a slow release cycle predicts slow feedback, and slow feedback predicts low value from releasing frequently; a fast feedback mechanism addicts people to faster release cycles. They share the relationship, and depending on how extreme the dynamics feeding into one side of the relationship are, the other side suffers. Maybe at some point, it’s a lost cause.
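A toy illustration of that mutual bounding (entirely my own sketch, with invented coefficients and caps): each cadence can only grow in proportion to the other, so starving either side stalls both:

    # Toy model: release cadence and feedback cadence bound each other
    release_rate, feedback_rate = 1.0, 1.0   # e.g. events per month
    for month in range(12):
        feedback_rate = min(1.2 * release_rate, 20)  # feedback grows only if releases do
        release_rate = min(1.2 * feedback_rate, 20)  # releasing pays off only with feedback
        print(month, round(release_rate, 1), round(feedback_rate, 1))
    # Drop either cap to 2 (slow feedback, say) and the other side flatlines too.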

An example from the performance and reliability wheelhouse is low/slow performance observability. When you can’t see what’s causing a severe production incident, the live investigation and post-mortem activity is slow and takes time away from engineering a more reliable solution. Firefighting takes dev, SRE, ops, and product management time…it’s just a fact. Teams that understand the underlying relationship and synthesize that back into their work tend to use SEV1 incidents as teachable moments to improve visibility on underlying systems AND behavioral predictors (critical system queue lengths, what levels of capacity use constitute “before critical”, architectural bottlenecks that inform priorities on reducing “tech debt”, etc.).

The point is that feedback loops take time and iterative learning to properly inject in a way that has a positive, measurable impact on product delivery and team dynamics.

Going from Feedback Loops to Iterations…Together

All effective feedback loops have one thing in common: they measure achievement levels framed by a shared goal. So you really have to work to uncover shared goals in a team. If they suit you, and/or if you can accept the awesome responsibility of challenging and changing them over time, it’s a wild ride of learning and transforming. If not, find another team, company, or tribe. Everyone needs a mountain they can traverse, and shouldn’t put themselves up against a trail that will destroy them. This is why occasionally stepping back, collaborating, and reporting out what works and what doesn’t is so important. Re-enter the concept of “team meetings”.

Increasingly, most engineers I talk to abhor the notion of more meetings, usually because they’ve experienced their fair share of meetings that don’t respect their time, or where their inputs have not been respectfully synthesized in a way they can see. So what, are meetings a bad thing?

Well, no, not if your meetings are very well run. This is not one person’s job, though scrumbags and mid-level managers with confirmation bias abound, especially since meetings don’t have a built-in NPS (net promoter score). A solution I’ve seen to the anti-pattern of ineffective meetings is to establish common knowledge of what an “effective” meeting looks like, how it runs, and why, and to expect these behaviors from everyone on the team and in the org.

How to Encourage Effective Collaboration in Meetings

Learn to listen, synthesize, and articulate back in real-time. When too much time goes by, context evaporates like winter breath. Capture as much of this context as you can while respecting the flow of the conversation. This will help you and others remember and respect the “why”, and will allow people to see what was missing (perspectives, thinking, constructs) afterward. Examples of capture include meeting minutes, pictures of post-its, non-private notes from everyone, and even recordings.

But in just about every team and organization there’s a rampant misconception that ALL meetings must produce outcomes that look like decisions or action items. Those are very beneficial, but I’ve seen people become anti-productive when treating themselves and others as slaves to these outcomes. Making decisions too early drives convergent attitudes that are often uninformed, under-aligned, and destructive.

Some of the most effective meetings I’ve had share the following patterns:

  • know why you’re meeting, provide context before, and set realistic expectations
  • have the “right” people in the room
    • who benefit from the anticipated outcomes and therefore are invested in them
    • who bring absolutely critical perspective, whose absence would invalidate outcomes or cause significant toil to refactor back in afterward; not too few
    • who contribute to functional outcomes (as opposed to those known to bring dysfunction, disrespect others’ time, or argue rather than align); not too many
  • agree on what positive and negative outcomes look like before starting in
  • use communication constructs to keep people on track with producing outcomes
  • have someone to ensure (not necessarily do all the) capture; note and picture taker
  • outcomes are categorized as:
    • clear, aligned decisions (what will happen, what worked, what didn’t, what next)
    • concrete concerns and missing inputs that represent blockers to the above
    • themes and sense of directional changes (i.e. we think we need to change X)
    • all info captured and provided as additional context for others

Trust AND Verify

One thing I keep finding useful is to challenge the “but” in “trust, but verify”. In English, the word “but” carries a negating connotation. It invalidates all that was said before it. “Your input was super important, BUT it’s hard to understand how it’s useful”…basically means “Your input was not important because it was not usable.”

My alternative is to “trust and verify”, but with a twist. If you’re doing it right, trust is easy if you preemptively provided an easy means to verify it. If you provide evidence along with your opinion, reasonable people are likely to trust your judgment. For me, rolling up the sleeves is a very important tool in my toolbelt to produce evidence for or against a particular position. I know there are other methods, both legitimate and nefarious, but I find that practical experience is far more defensible than constructing decisions based on shaky foundations.

All this said, even if you’re delivering self-evident verification with your work, people relationships take time and certainly take more than one or two demonstrative examples of trustability to attain a momentum of their own. Trust takes time, is all.

Takeaways and Action Items from This Week

Democratic decision processes are “thrashy”. Laws and sausages: no one wants to know how they’re made. In small teams going fast, we don’t have the luxury of being ignorant of outcomes and the context behind them. For some people, “democracy” feels better than dictatorial decisions being handed down without context; but for those who still find a way to complain about the outcomes, they need to ask themselves, “did I really care enough to engage in a functional and useful way, and did I even bother to educate myself on the context behind the decision I don’t like?”

Just like missing a critical perspective in a software team, in a global organization, when one region or office dominates an area of business (U.S. on sales, EU on security, for instance), this will inevitably bias outcomes and decisions affecting everyone. As the individual that I report to puts it, “scalability matters to every idea, not just when we’re ready to deploy that idea”. Make sure you have the right “everyone” in the room, depending on the context of your work and organizational culture.

Someone I met and deeply respect once told me, “it’s not enough to be an ally, you need to be an accomplice”. In context, she was referring to improving the epic dysfunction of modern technology culture by purposefully including underrepresented persons. Even if we make a 10% improvement to women’s salaries, hire more African-American engineers, and create a safer place for LGBTQ folks, I still agree with the premise that doing these things isn’t good enough. Put another way, receiving critical medical treatment for a gushing head wound isn’t an “over-compensation”; it’s a measured response to the situation. The technology gushing head wound, in this case, is an almost complete denial from WGLM (white guys like me) that there is a problem, that doing nothing continuously enables the causes of the problem, that leadership on this doesn’t necessarily look or think like us, and that this is needed now.

Bringing it back to the wheelhouse of this article, a true improvement culture takes more than saying “sure, let me wave at you as an ally while you go improve the team”. It takes being an accomplice (think: a getaway driver); we should ALL be complicit in decisions and improvement. Put some skin in the game, figure out how something truly worth improving maps to your current WiP (work in progress) limits, and you may find that you need to put something less worth your time down before you can effectively contribute to improvement work. Surrounding yourself with folks who get this will also increase the chances that you’ll all succeed. This is not over-compensation; it is what everyone needs to do now to thrive, not just survive.

Crossing Cross-functional Chasms

Initialization Phase

My first evening in La Ciotat: I picked up a rental car in town thanks to the good graces of Giulia, the front desk assistant who was coming off of her shift. She called ahead, then insisted she drive me to the pick-up office, where the attendant was on her way out to deliver a car. Thirty not-so-awkward minutes later, after discussing dog grooming and training techniques in great depth, the attendant came back, and shortly I had a car. It was just starting to rain, and it had been years since my last stick shift. Crossed fingers and no stalls to get back to La Rose, but by that time, water was pouring down in sheets. The sprint from car to lobby was in place of the shower I had hoped to take earlier.

The hotel restaurant was leaking everywhere and occasionally losing power. The great thing about room charges is that a full bar downstairs doesn’t require the internet to hand over all sorts of drinks. People outside the lower forty-eight seem to intuit what to do when the credit card terminal is out of service. The lightning was faster and closer than I’ve ever seen from my fishing town, so it was a good time to revert to battery power and write this.

My recent recipe is bourbon (or tequila) with a splash each of lemon juice, crème de cacao, and absinthe, shaken and strained into a highball with a thick peel of orange. A few months ago it was half high-quality sake and half Prosecco with a flake of rosemary. In a pinch, anything works, and when your bartender has flooding issues to deal with, you can ponder life under a canopy and try to stay dry. The following are my thoughts from underneath all of this.

Planning Phase

The thing about my work: it isn’t scalable, because it serves different goals than other kinds of work. As Kent Beck describes in his “3X” model, there are modes of work that optimize for different localized outcomes but all serve the same higher-order goal. What is that goal? As Eliyahu Goldratt states, “the goal of an organization is to make money”. Certainly commercial ones, but even non-profits need to do this in order to exist. I exercise aspects of each of Kent’s three modes: explore, expand, and extract. I dig holes to find gold, and when I hit it I dig hard, and then try to scale that out to optimize efforts to extract that gold.

In his epic distillations, The Innovator’s Dilemma and The Innovator’s Solution, Clayton Christensen puts a fine point on how, if a company is not thinking of its next horizon at all points in the current extract motion, it has no lasting future. Despite the dilemma of where to divest funds and how to prioritize “next” work, I am looking to do that for whomever I work with. I want to help optimize what’s currently being extracted, translate learnings into gaps and undiscovered opportunities, and continuously listen and learn “what’s next” (ref: Tim O’Reilly in “WTF?: What’s the Future and Why It’s Up to Us”). If we’re not doing that, either homogeneously as all actors in an organization or as a unique sub-function, then we’re dooming our employees and product to obsolescence.

Implementation Phase

I do…a lot…of things in my current organization. Pre-sales guidance, analyst relations, strategic product planning, blog writing, speaking, webinars, on-site customer planning sessions, technical prototyping, automated pipeline and testing examples, collaboration with support on key customers, building examples, positioning and messaging assistance, customer engineering, amongst others. “Cross-functional” is an easy way of putting it. When friends ask about what I do, I just say “I help technology companies make better decisions.”

But when you’re cross-functional, you get to see how diverse people groups are and how differently they structure their goals. For some, it’s money; for others it’s lead acquisition; and for yet others, it’s sprint deliverables and low-defect release cycles. For leadership it’s all of these things plus happy employees, happy customers, and happy selves. I want all of these things and more…happy me and happy mine (family), which requires balance. Balancing multiple objectives takes a lot of practice, similar to my experiences with Uechi Karate-do. Balance isn’t a static state; it’s the ability to re-balance and prioritize based on shifting needs and constraints.

In planning one of four strategy sessions with one of the founders, I found myself thinking, “our goals are not the same; he wants to prioritize an existing backlog around reporting, but I want to define the new state of the art for our industry”. After realizing that he had a different goal, we played better together, but I am not distracted. Maintaining the status quo has never been my strong suit, and I’m more useful when focused on what’s next, not just what already was.

This is my current approach to balance: understand what drives people (myself included), listen to everyone, provide value that’s aligned to these motives, and circulate what makes sense to the organization. Catching the moment when a founder’s goals and my own differ happens in real-time, but only if I’m exercising balance along these guidelines.

Deployment Phase

This week, the plan is to listen, a lot. Especially because of the language gap, but also thanks to my eclectic manner of verbal communication; as evidenced last time, less seems to be more here. I am working to lock down the details of a new position, focused on bringing the customer perspective to every area of the business, a translation of my own invention. The activities performed today have a tendency to be…predictable, easily replicable, and therefore boring to someone like me.

Though billed as “strategy sessions”, my feeling is that current leadership understands the need for all elements of the business to be engaged deeply…”lean forward” as I often call it. The real strategy happens next week, in decision meetings amongst founders and key business owners separate from the rest of the employees. This is an interesting model, right-fit I think for humans who need time to digest and consider various perspectives and potential directions.

Though many ideas and directions will be discussed over the next 7 days, we’ll need to prioritize, and I don’t own the company. All I can do is help those who do own it to have a clear understanding of internal and external dynamics, provide requisite evidence for my positions, and improve relationships with my counterparts here in France.

Monitoring Phase

Feedback loops are important whether you automate them or not (but automating them is the smart way to do it). How do you automate human interactions, though? The closest I’ve gotten is to “pull forward”…in my upcoming role, building in the demand and supply of effective internal and external collaboration. The partner channel is a significant dynamic in my current organization; much of it is channel sales, but there is also a contingent technical element, as all good partnerships between tech companies should have. A colleague of mine is fantastic at tactical and technical delivery in this scope, but scaling these efforts out to the whole organization takes project/program management that he’s not particularly keen to deliver himself.

A key element to “monitoring” my effect is to A) have traceable inclusion in conversations (via Salesforce currently) and B) measure, through volunteered backchannel context, how many times my involvement improves what we’re doing in the partner, sales, marketing, and development work. This week would be an example of execution; after next week, I would explicitly ask my leadership what value they heard as a result of my presence. Pull forward isn’t a zero-effort enterprise, but it is absolutely necessary if you take your individual impact lifecycle seriously.

Reintegration Phase

The bartender tonight quickly called for backup and started taking care of the flooding issues. I suspect this was because he knew that if he had just sat there and let the hotel restaurant get flooded, someone would ream the shit out of him the next day. He got ahead of the problem and solved it. This is what DevOps and SRE are all about: seeing that no one else is solving for the lasting value and putting patterns in place to help others do exactly that.

In its current state, this organization takes its time to synthesize and integrate learnings. Faster than most other teams I see, but not as fast as some; as with everything, pace can be improved. More importantly, alignment and transparency must always be improved, and that is not zero-effort in the slightest either. In a prior position, an SVP once stated that “alignment is about 50% of my time spend”. With marginal variance, I wish that applied across all roles and responsibilities in every team and every organization I work with. Imagine the impact you’d have if 100% of your other 50% of heads-down work was wildly effective. That is what rowing in the same direction looks like.

Reprise

For this post, there is no “call to action” other than if you want to leave a comment or engage on social channels. I can always find something worthwhile for you and I to do together. Drop a line and let’s chat about that.

On Volunteering for Tech Community Work

DevOpsDays Boston volunteers

This is my attempt to distill what I’ve learned over the past two years of contributing to maintaining and improving the local DevOps community in Boston. As in all cases, what we think we “know” at a point in time can/may/should change as our journey progresses. Rather than prescriptive, this article is descriptive of my current approach.

TL;DR

  • do it for ‘right’ reasons that align with community values
  • really know what you’re getting into and commit proportionally
  • be prepared to spend time and energy, particularly on personal adaptation
  • manage your life work-in-process limit carefully
  • expect something unexpected out of it

A True Community is Tangible

There are many definitions of “community”, but here’s mine at this point:

A group of people dedicated to sharing and improving along with each other.

– me, provisionally as of 2019

If helping to do this sounds easy, it is not. Dedication means graciously and humbly collaborating with other organizers, diligently following up with a million little details, listening to and understanding where people in the community are coming from, actively seeking personal improvement and synthesizing it into volunteer work, responding quickly to important issues in the community, and of course, making/taking time from your personal and professional life to do these things reliably.

For an organizer, these things are very tangible because they cost us what we can never get back: time. For other community members, the value comes from elements like knowledge sharing, conversations, job opportunities, or even a sense of simply being part of a community. Showing up at a meetup event or a casual coffee chat is the first step, and understanding why you enjoyed it brings you back.

One of the most concrete, tangible outcomes of being part of the community I work in is that there’s always an opportunity to learn something new and useful, which by definition you can’t predict. The more you give, the more you get. The more you’re open-minded, the more mindful and receptive to good ideas you become. The more you listen, the more you hear.

Fostering Unity and Diversity at the Same Time

We all come to the table with a variety of backgrounds, experiences, and perspectives. This is the true strength of a healthy community. Balancing time constraints and the distribution curve of appetites for themes is tricky, but doable when facilitators are dedicated and have the interests of each other (including the whole community) in mind.

There’s usually a unifying set of core topics and themes related to some common interests (such as learning new tech or how to deal with office culture or professional improvements). Straying too far outside this focus often contributes to our “variety” getting in the way of sharing and improving, though sometimes including topics seemingly unrelated to the tech side of the conversation can provide opportunities for divergent thinking and new knowledge.

An Organizer’s Roles and Responsibilities in the Community

At a small scale, this may happen organically, but as the group scales, this doesn’t happen without organizers. A community organizer is someone who:

  • looks out for the best interests of all members of the community
  • is dedicated to creating more value than they capture [1]
  • ensures that underrepresented perspectives have a voice
  • deals with the logistics of shared spaces (events, chats)
  • ensures that the values and principles in the Code of Conduct are respected
  • develops a plan of succession and redundancy in organizing responsibilities
  • insulates the community from negative or unwanted solicitations

I’ve had the great privilege to work with a bunch of other organizers and volunteers, and one of the key things I see make up for the aggregation of individual intents (and sometimes dysfunctions) is the willingness to come back and do/be better.

One thing that’s always a challenge is to maintain a pool of organizers greater than one or two active individuals. Think of it as resiliency engineering, a topic close to my personal wheelhouse. The more trustworthy and reliable individuals are involved, the more trustworthy and reliable the outcome.

Thrash vs. Progress

Everyone brings their own personal interests into a group equation. When the guiding principles include alignment and responsibility, it works really well for everyone. When someone puts their personal interests above the common good, all is not lost, but it causes a lot of “thrash”. I define thrash as work which ultimately does not progress and improve the common good.

Some folks need to thrash around with new ideas and values, and this is okay too. Healthy synthesis of new ideas and values is important. Provided that there’s room for people to bring these ideas to the table and that progress on existing values is in play, we all win.

Getting Ahead of Toxicity

I’ve worked in a number of places and communities where behavior is sub-par. When something is introduced which causes a body (or community) to move backward relative to its shared values and the common good, particularly on a repeated basis, I would call this “toxic”. In a DevOps mindset, this can happen too, where anti-patterns such as burnout and distrust crop up because little details aren’t being addressed.

In DevOps communities, many are familiar with the Westrum model of organizational culture: pathological, bureaucratic, and generative. Though often misused to label groups and people at an aggregate level, the most useful employment of this lens IMO is around behaviors and activities. Those can be changed, and those can be guided at the moments where people have specific decisions to make.

Two examples from our local meetup come to mind: use of alcohol and meritocratic rhetoric. Though innocuous in small amounts, both quickly detract from community spirit and sense of safety in our culture.

In one thread, someone suggested that we bring our own bottles of scotch. I love my happy hours the same as the next warm-blooded person, but immediately before that, I had witnessed and heard from a number of other community members that an unannounced and unplanned open bar at a meetup detracted from fostering a safe space. I quickly hopped on this, making sure that we all considered how this idea contributes to the values in our Code of Conduct. Call me the culture cop, but after collaborating with other organizers on the approach, it was clear that we wanted to get ahead of where it was going and pivot the conversation back to the purpose of the community.

In another backchannel thread with community members, we discussed the rising trend of conversations that assume everyone has had the same opportunities, leading to a notion that all people will be rewarded based on the merit of their work. In current tech culture, meritocracy is rampant, most notably visible in open-source mainstays, from the Linux Code of Conduct debates to the revulsion toward GNU founder R.M.S. after a career of misogynistic and belligerent behavior. Not everyone can just walk up to a table and be valued by those there. As a common good of our community, we do our best to steer away from the ‘meritocracy lens’ and bring conversations back to a place that is useful for all members of the community.

Improvement Means Earning and Granting Trust

One of my biggest goals is to find someone better than me at these things, someone who not only fulfills the organizer responsibilities bulleted above but whom I can trust to improve and grow the community in ways I can’t. Knowing that I probably won’t be able to keep up the same level of commitment to our community indefinitely, succeeding myself with other trustworthy individuals is not a zero-effort endeavor in the slightest. Nothing happens overnight, and the best time to do something truly good is always ‘now’.

It’s definitely been interesting to co-organize the meetup, facilitating events and threads, and hopefully earning an amount of trust in our community about why I’m spending my time and energy on it. What I’ve learned is that granting someone trust is a very different proposition than earning it. One thing suggested to me in a backchannel was that “ally is a term someone uses on you, not something you call yourself”. After bouncing this around in the back of my mind for months, I’ve realized that I don’t need to wait for people to ascribe trust to community members (especially organizers); we just need to do the best we can and demonstrate trustworthy behavior.

As one of 6 organizers of the yearly DevOpsDays Boston event, I was in a place to understand that forging an initial relationship with the Inclusive Tech Lab and Resilient Coders communities was a great direction (most props go to Laura Stone, another co-organizer of the meetup and event). The heart and spirit of these other groups align with the values and demonstrated levels of commitment that I know are required to drive the community forward.

Listening to and Integrating Feedback

Imagine all the times that someone spoke words that you didn’t understand…things you weren’t in a place to receive. What a shame. What do you do about that?

I’m mostly always looking to get better, but one thing I’ve realized is that the time my mouth is open should be marginal compared to the time I spend hearing what others have to share. If I’m doing it right, the times I speak should be guided by three questions:

  • does this have to be said (and why)?
  • does this have to be said by me?
  • does this have to be said right now?

Though everyone suffers from subjectivity (implicit in the above bullets), listening and empathizing with what you know about someone’s perspective increases the “you AND them” funnel for new and important ideas. Reading or listening to lots of books and blogs also widens perspective.

Get Involved, Be Good

Though not a requirement for attending, I’m surprised at the number of regular attendees who haven’t actually read the Code of Conduct but still naturally adhere to the values and principles within. I take this as a sign of common interest in the community, but I don’t take for granted the responsibility to make sure behaviors are in line.

If you’re thinking of helping, definitely reach out. If not, still leave comments. I always appreciate feedback and engagement on my approach to facilitating in the community.

This is Why #DevOps

For over a decade, DevOps as a keyword has been growing steadily in mindshare. This post is my version of a landing page for conversations I have with folks who either know little about it, know a lot about it, or simply want to learn why DevOps is not a hashtag buzzword.

TL;DR: if you’re not interested in reading a few paragraphs of text, then nevermind. If you really must jump to my own articulation of DevOps, click here.

DevOps Is a Question, Not an Answer

The word “DevOps” is a starting point in my conversations with folks along the journey. Whether it’s over obligatory beers with local community members after a monthly meetup, in a high-pressure F100 conference room, in my weekly meetings with the IEEE, or internally in organizations I work with and accept benefits from, “DevOps” is often the start of a really important dialog. It’s a question more than a statement, at least the way I use it day to day, spiked with a small challenge.

It’s a question of “what are you doing now, and where do you want to be?” It’s a question of “how are the things you’re doing now incrementally improving, maybe compounding, your efforts?” It’s a question of how you both remove your backlog of toil and resolve the inbound toil that daily changes to software systems represent. It’s a question of “how do you choose to think about your work as you’re doing it?” DevOps is an improvement mindset, not just about code and configuration, but about organizational cohesion, communication, collaboration, and engineering culture.

DevOps Is More than “Developers and Operations”

Unlike those whose staunch opinions limit the scope of DevOps to the activities of developers and operations engineers, I see the agency and effect of DevOps as clearly intertwined with an organization’s capability to also foster a set of core values and principles, though managing scope creep is important. For over a decade, we’ve seen some teams going so much faster than others that it doesn’t matter who’s slower; the result out to production is buggy, insecure, and unscalable systems. This very common and dysfunctional clock-speed mismatch also extends to testing, risk and compliance, budgeting (i.e. part of planning), architecture, and proper monitoring/telemetry/measurement practices.

Similarly, an ‘agile’ team finds it very hard to be ‘agile’ without the buy-in of leadership, support from finance and legal, alignment with marketing and sales, and a deep connection to its customers/stakeholders, as well as a whole slew of other examples that underline how agile can’t be just a dev-centric worldview.

I hesitate to invent more terms such as ‘DevBizOps’, ‘DevTestOps’, and ‘DevSecOps’ simply to segment off these augmented versions of DevOps when these areas of concern should be integrated into systems delivery lifecycles to begin with. I don’t want to conflate concepts, yet so many elements overlap that it’s hard to know how to have one conversation without having three others too. At the end of the day, though, it’s as arbitrary to me that people invent new terminology as it is that some others choose to dig their fundamentalist heels in about exclusively keeping the scope of DevOps to two traditionally distinct groups, particularly when modern interpretations of DevOps blend skills and responsibilities to the point that we highly value (pay) SREs and “10x developers” because they defy narrowly scoped job descriptions in favor of cross-functional roles.

What are *My* Core Values and Principles of DevOps?

Again, this is a question you should seek to answer in your own way, but after only five years of immersion and purposefully seeking out other perspectives daily, a few key elements seem to hold:

  • Transparency
    • Observability
    • Traceability
    • Open Systems
    • Great Documentation
  • Inclusivity
    • Collaboration
    • Swarming / Blameless Fault Handling
    • Customer/User-focus
    • Prioritization to Value
  • Measurement
    • Building Quality In
    • High-value Feedback Loops
    • Useful Monitoring and Telemetry
    • Clear SLA/SLO/SLIs
  • Improvement
    • Automate AMAP and by default
    • Continuous Learning
    • Coaching and Mentoring
  • Alignment
    • Small Batches, Short-lived Change Cycles
    • Clear Processes (Onboarding, Approvals, Operationalizing, Patching, Disposal)
    • Work Contextualized to Biz Value/Risk
    • All Stakeholders Represented in Decisions

DevOps Is Inclusive by Nature

When dialed in, the values and principles of DevOps foster an environment of high trust and safety, synthesize perspectives without losing focus, and balance personal and team capacity to compound individual contributions rather than burn out talented professionals. DevOps is not, as Audre Lorde (via Kelsey Merkley) puts it, the “master’s tools”, so long as we don’t make it such; rather, if we decide that DevOps must not inherit the meritocracy and exclusivity of corporate management and Agilista methodology, it can be a truly better alternative to that which came before.

This is how DevOps can change things, this is why it must. New thinking and new voices provide a way out of tech monoculture. In order to improve a system, you need more than the one option you’ve already found faulty. This is why I personally support local non-profits like Resilient Coders of Boston. DevOps values and principles offer a far better chance for new voices to be at the table, at least the version I’m advocating.

DevOps Improves Organizations In-Place

Imagine if your teams, your colleagues, your leadership really internalized and implemented the core values and principles defined above. Doing so in finance, human resources, marketing, sales, and even [dare I say] board-of-trustees groups would create a natural affinity for supporting systems engineering teams. Conversely, developers about to change a line of code would be more likely to ask “how does this impact the customer and my organization’s goals?”, “what documentation do customers and sales engineers need to effectively amplify the new feature I’m developing?”, and “how could this process be improved for everyone?”.

There is an old stereotype of programmers being bad at “soft skills”, resulting in miscommunication, disconnection of work from business value, a “not my problem” mentality, and a throw-it-over-the-fence mindset. My perspective on DevOps is that none of these things would have room to materialize in organizations that put processes in place to ensure the above values and principles are the norm.

Everyone can do these things; there’s nothing stopping us, unicorn to F100. Importantly, transitioning from what is currently in place to these values takes time. Since time is money, it takes money too. DevOps advocacy alone isn’t good enough to make the case for these changes and the spend that comes attached. No one developer or even team can change an organization; it takes Gestalt thinking and demonstrated value to get people ‘bought in’ to change.

Facilitating DevOps Buy-In

The first thing to get straight about DevOps is that it takes conversation and active engagement. This is not and never should be something you go to school or get a certificate for; it is a journey more akin to learning a martial art like Uechi-ryu than a course you pay thousands of dollars for at some tech industry Carnivale.

Collect those in your organization who are interested in having a conversation about the cultural implications of DevOps values and principles, using something lightweight like a lunchtime guild, a book club, or even a Slack channel. Listen to people, to what they think and already know, and don’t assume your perspective is more accurate than theirs. Be consistent about when/what you do together to further the dialog, hold a local open-spaces meetup (or find someone who does this, like me), invite people from outside the engineering team scope, such as a VP or member of product management at your organization, and ASK THEM afterwards what they thought.

Once you have people from different levels engaging on similar wavelengths about DevOps, ask them to help each other understand what the first tangible, tactical steps are to improve the current situation, based on the core values and principles either defined above or further crafted to fit your circumstances. Get a headline name from one or more regional DevOpsDays organizing committees to come visit as an outside perspective on what you’ve got going. And importantly, make time for this improvement. SEV1 incidents aside, there’s always some weekly space on everyone’s calendar that’s an optimal time to get together.

Or you can just ping me [at] paulsbruce [dot] io or on Twitter and we can figure out a good way for you to get started on your own DevOps journey. I’m always happy to help.

On Lack of Transparency in SaaS Providers

As many organizations transition their technical systems to SaaS offerings they don’t own or operate, I find it surprising that when a company acquires a 3rd-party offering deployed on such platforms, they are often told to “just trust us” about security, performance, and scalability. I’m a performance nerd; that and the DevOps mindset are my most active areas of work and research, so this perspective is scoped to that topic.

In my experience amongst large organizations and DevOps teams, the “hope is not a strategy” principle seems to go missing in the transition from internal team-speak to external service agreements. Inside a 3rd-party vendor, say Salesforce Commerce Cloud, I’m sure they’re very skilled at what they do (I’m not guessing here; I know folks who work on technical teams in Burlington MA). But even if you espouse a trust-but-verify culture internally, telling customers who are concerned about the performance of your offering at scale to “just trust us” seems misaligned.

TL;DR: SaaS Providers, Improve Your Transparency

If you provide a shared tenancy service that’s based on cloud and I can’t acquire service-level performance, security audits, and error logs that are isolated to my account, it’s a transparent view into how little your internal processes (if they even exist around these concerns) actually improve service for me, your customer.

If you do provide these metrics to internal [product] teams, ask “why do we do that in the first place?” Consider that the same answers you come up with almost always apply equally to the external consumers who pay for your services: they are also technologists, have revenue on the line, and care about delivering value successfully with minimal issues across a continuous delivery model.

If you don’t do a good job internally of continuously measuring and synthesizing the importance of performance, security, and error/issue data, please for the love of whatever get on that right now. It helps you, the teams you serve, and ultimately customers to have products and services that are accurate, verifiable, and reliable.

How Do You Move from “Trust Us” to Tangible Outcomes?

Like any good engineer, when a problem is big or ambiguous, start breaking that monolith up. If someone says “trust us”, be specific about what you’re looking to achieve and what you need to do that, which puts the onus on them to map what they have to your terms. Sometimes this is easy, other times it’s not. Both outcomes yield useful information: what you do know and what you don’t. Then you can double-click into how to unpack the unknowns (and unknowables) in the new landscape.

For SaaS performance, at a high level we look for:

  • Uptime and availability reports (general) and the frequency of publication
  • Data on latency, the more granular to service or resource the better
  • Throughput (typically in Mbps etc.) for the domains hosted or serviced
  • Error # and/or rate, and if error detail is also provided in the form of logs
  • Queueing or otherwise service ingress congestion
  • Some gauge or measure of usage vs. [account] limits and capacity
  • Failover and balancing events (such as circuit breaks or load balancing changes)

You may be hard-pressed to get some of these pieces of telemetry in real time from your SaaS provider, but they serve as concrete talking points about what typical performance engineering practices need to verify about systems under load.
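To keep this tangible, here’s a minimal sketch of turning the wishlist above into a recurring, account-scoped check, assuming your provider exposes any metrics API at all (a big assumption); the endpoint, response shape, and field names are hypothetical placeholders, not any real vendor’s API.

```python
# Hypothetical SaaS telemetry probe; substitute your provider's real
# status/usage API, if one exists.
import statistics

import requests

METRICS_URL = "https://status.example-saas.com/api/v1/metrics"  # hypothetical

def fetch_service_telemetry(account_id: str) -> dict:
    """Pull account-scoped telemetry from the (assumed) provider API."""
    resp = requests.get(METRICS_URL, params={"account": account_id}, timeout=10)
    resp.raise_for_status()
    # Assumed response shape:
    # {"latency_ms": [...], "errors": int, "requests": int, "usage": float, "limit": float}
    return resp.json()

def summarize(t: dict) -> dict:
    """Reduce raw telemetry to the talking points listed above."""
    return {
        "p95_latency_ms": statistics.quantiles(t["latency_ms"], n=20)[18],  # 95th percentile
        "error_rate": t["errors"] / max(t["requests"], 1),
        "usage_vs_limit": t["usage"] / t["limit"],
    }

if __name__ == "__main__":
    print(summarize(fetch_service_telemetry("my-account")))
```

Even when a provider can’t give you this, writing the probe you wish you could run makes the gap (and the ask) concrete.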

Real-world Example: Coaching a National Retailer

A message I sent today to a customer, names omitted:

[Dir of Performance Operations],

As I’m on a call with the IEEE on supplier/acquirer semantics in the context of DevOps, it occurs to me that the key element missing in [Retailer’s] transition from last year’s legacy web solution to that which is now deployed via Commerce Cloud is transparency over service underpinnings (or simply our not asking for it), and that this is a significant risk, both in terms of system readiness and unanticipated costs. My work with the standard brought up two ideas in terms of what [Retailer] should expect from Salesforce:

A) what their process is for verifying the readiness of the services and service-level rendered to [Retailer], and

B) demonstrated evidence of what occurs (service levels and failover mechanisms) under significant pressure to their services

In the past, [Retailer’s] performance engineering practice had the agency both to put pressure on your site/services AND, importantly, to measure the impact on your infrastructure. The latter is missing in their service offering, which means that if you run tests and the results don’t meet your satisfaction, the dialog to resolve them with Salesforce lacks minimum-viable technical discussion points on what is specifically going wrong and how to fix it. This will mean sluggish MTTR and potentially synthesizing the expectation of longer feedback cycles into project/test planning.

Because of shared tenancy, you can’t expect them to hand over server logs, service-level measurements, or real-time entry points to their own internal monitoring solutions. Similarly, no engineering-competent service provider can reasonably expect consumers to “just trust” that an aggregate product-plus-configuration-plus-customizations solution will perform at large scale, particularly when mission-critical verification was in place before fork-lifting your digital front door to Salesforce. We [vendor] see this need for independent verification of COTS all the time across many industries, despite a lack of proof of failure in the past.

My recommendation is that, as a goal of what you started by creating a ticket with them on this topic, we should progressively seek to receive thorough information on points A and B above from a product-level authority (i.e. product team). If that’s via a support or account rep, that’s fine, but it should be adequate for you to be able to ask more informed questions about architectural service limits, balancing, and failover.

//Paul

What Do You Think?

I’m always seeking perspectives other than my own. If you have a story to tell, a question, or another augmentation to this post, please do leave a comment. You can also reach out to me on Twitter, LinkedIn, or email [“me” -at– “paulsbruce” –dot- “io”]. My typical SLA for latency is less than 48hrs unless requests are malformed or malicious.

Performance Engineer vs. Tester

A performance engineer’s job is to get things to work really, really well.

Some might say that the difference between being a performance tester and a performance engineer boils down to scope. The scope of a tester is testing: to construct, execute, and verify test results. An engineer seeks to understand, validate, and improve the operational context of a system.

Sure, let’s go with that for now, but really the difference is an appetite for curiosity. Some people treat monoliths as something to fear or control. Others explore them, learn how to move beyond them, and how to bring others along in the journey.

Testing Is Just a Necessary Tactic of an Engineer

Imagine being an advisor to a professional musician, their performance engineer. What would that involve? You wouldn’t just administer tests, you would carefully coach, craft instruction, listen and observe, seek counsel from other musicians and advisors, ultimately to provide the best possible path forward to your client. You would need to know their domain, their processes, their talents and weaknesses, their struggle.

With software teams and complex distributed systems, a lot can go wrong very quickly. Everyone tends to assume their best intentions manifest into their code, that what they build is today’s best. Then time goes by, and everything more than 6 months old is already brownfield. What if the design of a thing is already so riddled with false assumptions and unknowns that everything is brownfield before it even begins?

Pretend with me for a moment, that if you were to embody the software you write, become your code, and look at your operational lifecycle as if it was your binary career, your future would be a bleak landscape of retirement options. Your code has a half-life.

Everything Is Flawed from the Moment of Inception

Most software is like this…not complete shit but more like well-intentioned gift baskets full of fruits, candies, pretty things, easter eggs, and bunny droppings. Spoils the whole fucking lot when you find them in there. A session management microservice that only starts to lose sessions once a few hundred people are active. An obese 3mb CSS file accidentally included in the final deployment. A reindexing process that tanks your order fulfillment process to 45 seconds, giving customers just enough time to rethink.

Performance engineers don’t simply polish turds. We help people not to build broken systems to begin with. In planning meetings, we coach people to ask critical performance questions by asking those questions in a way that appeals to their ego and curiosity, at a time when it’s cost-effective to do so. We write in BIG BOLD RED SHARPIE in a corner of the sprint board what percentage slow-down to the login process the nightly build has now caused. We develop an easy way to assess the performance of changes and new code, so that task templates in JIRA can include the “performance checkbox” in a meaningful way, with simple steps on a wiki page.
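As a minimal sketch of what that “performance checkbox” might look like in practice, here’s a CI-style probe that times one critical transaction and fails the build on regression. The URL, baseline, and tolerance are hypothetical placeholders, not from any real project.

```python
# Hypothetical login-timing gate for CI; tune baseline/tolerance per team.
import sys
import time

import requests

LOGIN_URL = "https://staging.example.com/login"  # hypothetical
BASELINE_SECONDS = 0.80  # agreed baseline for this transaction
TOLERANCE = 0.10         # fail if more than 10% slower than baseline

def timed_login() -> float:
    """Time a single login round-trip against the staging environment."""
    start = time.perf_counter()
    resp = requests.post(LOGIN_URL, data={"user": "ci-probe", "pass": "not-a-real-secret"}, timeout=30)
    resp.raise_for_status()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = timed_login()
    slowdown = (elapsed - BASELINE_SECONDS) / BASELINE_SECONDS
    print(f"login took {elapsed:.2f}s ({slowdown:+.0%} vs baseline)")
    if slowdown > TOLERANCE:
        sys.exit(1)  # the BIG BOLD RED SHARPIE, in exit-code form
```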

Engineers Ask Questions Because Curiosity Is Their Skill

We ask how a young SRE’s good intentions of wrapping up statistical R models from a data-sciences product team in Docker containers to speed deployment to production will affect resources, and how they intend to measure the change impact so that the CFO isn’t knocking down their door the next day.
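One hedged way to start answering the “how will you measure it?” question is to snapshot per-container resource usage before and after the change. The sketch below shells out to the standard `docker stats` command; the container name in the example output is made up.

```python
# One-shot snapshot of per-container CPU and memory via the docker CLI.
import subprocess

def container_usage() -> list[str]:
    """Return one line of CPU/memory usage per running container."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format",
         "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip().splitlines()

if __name__ == "__main__":
    for line in container_usage():
        print(line)  # e.g. "r-model-scorer  142.50%  1.9GiB / 4GiB" (hypothetical)
```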

We ask why the architects didn’t impose requirements on their GraphQL queries to deliver only the fields necessary within JSON responses to mobile app clients, so that developers aren’t even allowed to reinvent the ‘SELECT * FROM’ mistake so rampant in legacy relational and OLAP systems.
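To illustrate, here are two hypothetical GraphQL queries, embedded as Python strings: the first over-fetches in the spirit of ‘SELECT * FROM’, while the second requests only what a mobile view actually renders. The schema and field names are invented for the example.

```python
# Invented schema, purely to contrast over-fetching with lean selection.
OVERFETCH_QUERY = """
query {
  orders(last: 50) {
    id
    customer { id name email address phone }               # far more than the view needs
    items { sku name description price weight warehouse }
  }
}
"""

LEAN_QUERY = """
query {
  orders(last: 50) {
    id
    items { sku price }   # only the fields the mobile view actually shows
  }
}
"""
```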

We ask what the appropriate limits should be to auto-scaling and load balancing strategies and when we’d like to be alerted that our instance limits and contractual bandwidth limits are approaching cutoff levels. We provide cross-domain expertise from Ops, Dev, and Test to continuously integrate the evidence of false assumptions back into the earliest cycle possible. There should be processes in place to expose and capture things which can’t always be known at the time of planning.
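As a minimal sketch of that “alert before the cutoff” idea, here’s a headroom check against hypothetical contractual limits, with a made-up notify() stub standing in for a real pager or chat integration.

```python
# Hypothetical limits and thresholds; wire notify() to your real alerting.
LIMITS = {"instances": 40, "bandwidth_mbps": 10_000}  # contractual limits (made up)
ALERT_AT = 0.80  # warn when 80% of a limit is consumed

def notify(message: str) -> None:
    print(f"ALERT: {message}")  # stand-in for PagerDuty/Slack/etc.

def check_headroom(current: dict) -> None:
    """Compare current usage to each limit and alert past the threshold."""
    for resource, limit in LIMITS.items():
        used = current.get(resource, 0)
        if used / limit >= ALERT_AT:
            notify(f"{resource} at {used}/{limit} ({used / limit:.0%}), approaching cutoff")

if __name__ == "__main__":
    check_headroom({"instances": 34, "bandwidth_mbps": 6_200})
```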

Testers ask questions (or should) before they start testing. Entry/exit criteria, requirements gathering, test data, branch coverage expectations, results format, sure. Testing is important but is only a tactic.

Engineers Improve Process, Systems, and Teams

In contrast, engineering has the curiosity and the expertise to get ahead of testing, so that when it comes time, the only surprises are the ones that are actually surprising, those problems that no one could have anticipated, and to advise on how to solve them based on evidence and team feedback collected throughout planning, implementation, and operation cycles.

An engineer’s greatest hope is to make things work really, really well. That hope extends beyond the software, the hardware, and the environment. It includes the teams, the processes, the business risks, and the end-user expectations.

Additional Resources: