Jekyll2022-06-21T22:50:08+00:00https://paulsbruce.io/feed.xmlpaulsbruce.io (input) => outputThoughts on tech, culture, and humanity. My own opinions.OpenTelemetry Community Day Austin 20222022-06-21T21:13:41+00:002022-06-21T21:13:41+00:00https://paulsbruce.io/blog/2022/06/opentelemetry-community-day-austin<p>Preface: this blog post is just my travel log, personal reflections, and thoughts
from my time conversing with other community members at <a href="https://events.linuxfoundation.org/open-telemetry-community-day/">OpenTelemetry Community Day
in Austin on June 20th, 2022</a>. If any corrections or retractions need be made, <a href="http://localhost:4000/contact/">let me know</a> and I’ll be happy to do so!</p>
<p><img src="/assets/images/2022/06/20220619_105324.jpg" alt="Before conferencing, debauchery of the best kind" /></p>
<ul>
<li><a href="#tldr---one-community-day-is-not-enough">TL;DR - One Community Day Is Not Enough!</a></li>
<li><a href="#what-is-opentelemetry-in-three-sentences">What is OpenTelemetry in Three Sentences</a></li>
<li><a href="#why-opentelemetry-community-rocks">Why OpenTelemetry Community Rocks</a></li>
<li><a href="#session-takeaways">Session Takeaways</a></li>
<li><a href="#after-thoughts-from-austin-back-to-boston">After-thoughts from Austin back to Boston</a></li>
<li><a href="#summary">Summary</a></li>
</ul>
<h1 id="tldr---one-community-day-is-not-enough">TL;DR - One Community Day Is Not Enough!</h1>
<ul>
<li>OTel adoption, use cases, and its community continue to grow rapidly</li>
<li>Those that could show up, did, and it was worth the trip</li>
<li>Need more consumer uses/lessons presentations, but…</li>
<li>…the un-conference afternoon sessions were fantastic!</li>
<li>The project NEEDS more contributors, to everything really</li>
<li>Regional and regular meetups on OTel will carry the community forward</li>
</ul>
<h1 id="what-is-opentelemetry-in-three-sentences">What is OpenTelemetry in Three Sentences</h1>
<p>“OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior. OpenTelemetry is generally available across several languages and is suitable for use.” - <a href="https://opentelemetry.io">opentelemetry.io</a></p>
<p>The above three sentences are a great start, but of course there’s more to it than that:</p>
<ul>
<li>It provides a <strong><em>vendor-agnostic model for emitting traces and metrics</em></strong> (soon logs and profiling data) from systems</li>
<li>Once implemented, it provides a staggering and necessary amount of backward and forward compatibility to not only its own componentry, but to a plethora of:
<ul>
<li>back-ends (a.k.a. DIY or hosted storage and insights solutions)</li>
<li>processing filters (pipelines)</li>
<li>languages / runtimes / frameworks to encourage visibility into new and legacy systems</li>
<li>telemetry transformations</li>
</ul>
</li>
<li>It is a project in the CNCF that as of <a href="https://www.cncf.io/blog/2021/08/26/opentelemetry-becomes-a-cncf-incubating-project/">August 26, 2021</a> is Incubating (next is graduated)</li>
<li>It includes auto-instrumentation packages for top popular runtimes including (but not limited to):
<ul>
<li><a href="https://opentelemetry.io/docs/instrumentation/java/automatic/">Java</a></li>
<li><a href="https://opentelemetry.io/docs/instrumentation/python/automatic/">Python</a></li>
<li><a href="https://opentelemetry.io/docs/instrumentation/ruby/automatic/">Ruby</a></li>
<li><a href="https://opentelemetry.io/docs/instrumentation/net/automatic/">.NET</a></li>
<li><a href="https://opentelemetry.io/docs/instrumentation/js/getting-started/nodejs/#instrumentation-modules">Node.js</a></li>
</ul>
</li>
<li>Its many parts are maintained by many really fantastic individuals, some working for/with vendors, others working as consumers and community contributors</li>
<li>As its key componentries intersect many languages and runtimes, various SDKs are a changing tapestry of readiness <a href="https://opentelemetry.io/status/">statuses</a></li>
</ul>
<p><img src="/assets/images/2022/06/20220620_091204.jpg" alt="The least-complicated view of the complex landscape of OTel componentry statuses" /></p>
<h1 id="why-opentelemetry-community-rocks">Why OpenTelemetry Community Rocks</h1>
<p>In 2018-2019 I had a number of conversations (<a href="https://www.youtube.com/watch?v=1O6hO8YLDwA">re</a>) with contributors to the OpenCensus and OpenTracing projects as they were realizing they needed to combine efforts into one project, OpenTelemetry. Unlike other online and project-driven “communities” I’ve experienced in the past, there was a sense of vendor-neutrality (though there’s plenty of vendors in the observability space) and that this was an idea whose time had come.</p>
<p>Since then, my contribution has been to cultivate an inclusive and non-insular community of people that want to learn, share, and (maybe even) contribute back to the OpenTelemetry project. For three consecutive years, I’ve been organizing <a href="https://o11yfest.org">o11yfest</a>, an observability-focused and community-driven event. Each year, <a href="/blog/2021/12/accounting-for-privilege/">we donate equal or greater than the operating budget</a> of the conference to some <a href="https://o11yfest.org/sponsor#contributor-sponsorship-details">really good causes</a>. It is purposely NOT part of the CNCF, as we want it to remain independent from that financial machine.</p>
<p>We also encourage contribution from everyone, not just voices from the CNCF and OpenTelemetry project, and put a fair bit of effort into <a href="https://o11yfest.org/sponsor#call-for-proposals">screening out vendor/sales pitches</a> and blah blahs. This year (2022) in May, we saw more attendee engagement than the past two years, and even had a number of people submit <a href="https://o11yfest.org/2022/preaction">“pre-action” videos</a> since all of our pre-recorded talks were available before the conference.</p>
<p>Community is what you make it. So I’m doing my part, what I can, and looking forward to encouraging others to do so as well. In the coming months, we’ll be partnering with regional OpenTelemetry and observability meetup groups to double down on contributing (hopefully and specifically about helping OpenTelemetry with their documentation needs).</p>
<h1 id="session-takeaways">Session Takeaways</h1>
<h2 id="keynote-state-of-the-opentelemetry-community">Keynote: State of the OpenTelemetry Community</h2>
<p>Alolita started out with some project milestones and history.</p>
<p><img src="https://pbs.twimg.com/media/FVs7MxgVIAAdTVN?format=jpg" alt="OpenTelemetry Milestones" /></p>
<ul>
<li>Metrics delayed b/c … priorities and proper core</li>
<li>A variable landscape of readiness, with logs on the horizon</li>
<li>OTel is growing! - how “community” is measured, only contributions?</li>
</ul>
<p><img src="https://pbs.twimg.com/media/FVs7WtfUsAAkmzO?format=jpg" alt="Contribution Sources" /></p>
<ul>
<li>“Independent” contribution group adds up</li>
<li>“Compliance tests” (re: prometheus collaborations)</li>
<li>Semantic Conventions, EU Feedback, Client Telemetry (front and back end signals)</li>
<li>Client Instrumentation, Agent Management, and Profiling (re: <a href="https://pyroscope.io">Ryan Perry, Pyroscope.io</a>!)</li>
<li>So [maybe too] many ways to get involved; priorities and batch-fit to bandwidth
<ul>
<li>Running OpenTelemetry meetups and publishing to the otel site</li>
</ul>
</li>
</ul>
<h2 id="community-updates">Community Updates</h2>
<p>OTel Comms SIG lead Austin Parker shared additional/complementary info and thoughts
about the community at large:</p>
<ul>
<li>It’s been two years since the last (virtual) OTel Community Day, almost all new faces</li>
<li>Growth! 2x contributors, 3x contributions</li>
<li>“Public Adopters”; and info shared by “vendor-partners”</li>
<li>“Documentation needs help!” - CNCF Slack #otel-comms</li>
<li>CNCF now accepting <a href="https://www.cncf.io/people/ambassadors/">“OpenTelemetry Ambassadors”</a></li>
</ul>
<p><img src="/assets/images/2022/06/20220620_093444.jpg" alt="Community Update, Austin Parker, 2x growth in contributors, 3x growth in contributions. Impressive." /></p>
<h2 id="unconference-session-topics">Unconference Session Topics</h2>
<h2 id="maintainers-panel">Maintainers Panel</h2>
<p><img src="/assets/images/2022/06/20220620_110008.jpg" alt="Panel: Austin, Daniel Dyla, Jack Berg, Aaron Clawson, and Amir Blum" /></p>
<ul>
<li>challenges
<ul>
<li>??? (something I missed, need to coordinate with others)</li>
<li><a href="https://github.com/open-telemetry/opentelemetry-collector-contrib">contrib repo</a>: how is it maintained, patches applied, etc.
<ul>
<li>single owners for specific instrumentation libraries</li>
</ul>
</li>
<li>not enough engineering hours
<ul>
<li>note: how would we incentivize contributions (not just usage and issues)</li>
</ul>
</li>
</ul>
</li>
<li>Q: how do you relate <a href="https://cloud-native.slack.com/archives/C03JFUAJXT4/p1655738403456849">slack</a>
<ul>
<li>Dan: bubble of “enlightened”…??? (will reach out to Dan in CNCF for actual point made)</li>
<li>Jack: if your biz has dependencies on an OSS thing, it’s strategic to have eng knowledge via contrib</li>
<li>Aaron: start small, even 10% of time is a big ask; easier to “made a fix, can I release?”</li>
<li>Amir: I don’t do it for the vendor…infers trust over “planned work vs. bandwidth”</li>
</ul>
</li>
<li>Q: Biggest challenge getting library authors to instrument their libraries natively…? <a href="https://cloud-native.slack.com/archives/C03JFUAJXT4/p1655738729865949">slack</a>
<ul>
<li>Amir: implementing natively right now may be…complicated…as some things aren’t yet GA’d</li>
<li>Ted (Young): not quite yet, need things to be stable</li>
<li>Dan: wouldn’t want to ask someone to make effort only to break it for them later</li>
</ul>
</li>
<li>NOTE (Austin): need help with OTel Collector and output being ‘certified’ in supply-chain</li>
<li>Q: What have been the biggest challenges with implementing the API and SDK across the supported languages in a standardized way? How much does the SDK configuration feel ‘native’ to the language?
<ul>
<li>Dan: forcing things like naming conventions</li>
<li>Jack: the spec is more ergonomic guidance</li>
</ul>
</li>
<li>Q (Audience): [observability at Adobe] For other SIGs, planning to add auto-instrumentation as standard across
<ul>
<li>Jack: (Trask? should have been here, Java auto-instrumentation)</li>
<li>Dan: When APIs aren’t stable yet, any change means changing lots of dependencies</li>
<li>Dan: JS backwards-compat esp. around Node packages has been tricky; we’re working on it</li>
<li>Aaron: it’s tough for Go, can’t patch just wrap things; it’s a monumental task, maybe wrapping is the best</li>
<li>Dan: the Operator DOES do some automatic injection…Node, Python, Java</li>
</ul>
</li>
<li>Q: What’s the recommendation on PULL/PUSH based methods, and how does OpenMetrics fit in?
<ul>
<li>Aaron: if you have something that’s working, use that. OTel collector is a bridge between app and backends.</li>
<li>Dan: challenge the assumption that Pull-based “won”, common deployment is to chain collectors</li>
<li>Jack: The collector is your friend, it’s a translator and an enricher of telemetry</li>
<li>Dan: about interoperability, we’ve been working a lot with Prometheus team closely</li>
</ul>
</li>
<li>Q: If you <em>did</em> have all the contributor bandwidth you could desire, how would it be put to best use?
<ul>
<li>Dan: seeking and placing diversity of skills and expertise that isn’t already in the project</li>
<li>Amir on ease-of-adoption, which I think is already starting to be changed via the End-user Feedback group.</li>
</ul>
</li>
</ul>
<h2 id="opentelemetry-and-service-meshes">OpenTelemetry and Service Meshes</h2>
<p><img src="/assets/images/2022/06/20220620_110634.jpg" alt="Michael Haberman being awesome in a very short period of time" /></p>
<p>After beers with the Aspecto team on Sunday at <a href="https://www.easytigerusa.com/">Easy Tiger</a>, it was clear that these guys are on a mission to put a [much needed] dent in our universe. On the walk back, Eran talked about how being a founder in this space requires an extreme amount of focus, which IMO they have and are demonstrating on multiple fronts.</p>
<p>Without betraying any particular details, the conversation revolved around themes like “how contributing to OTel isn’t at all about what your employer pays you to do”, “how much ‘magic’ people expect without doing anything”, regulated industry constraints with adopting recent/moving/early OTel components, and how much interest there is in the thriving observability community in Tel Aviv.</p>
<p>The short is, for a Boston boy like me, these guys know their shit and work hard as fuck.</p>
<p>Michael, who coincidentally <a href="https://o11yfest.org/speakers/michael-haberman">spoke about Malabi at o11yfest in May</a>, made the following great points:</p>
<ul>
<li>just because you turn on traces doesn’t mean that you are doing tracing</li>
<li>traces without shared context just add noise and lower the signal-to-noise ratio</li>
<li>the context added via mesh about service paths is critical to understanding versioned routing</li>
<li>deployed in a service mesh, the mesh IS part of the app, not just wrapping</li>
<li>trace “brokenness” includes no root span; gaps in traces indicate a critical lack of context propagation</li>
<li>bad news:
<ul>
<li>increase cost (more spans)
<ul>
<li>what about selective/adaptive propagation?</li>
</ul>
</li>
<li>head sampling (only percentage, thus some waste)
<ul>
<li>based on root from client? - lots of work</li>
</ul>
</li>
<li>Configuration (Envoy changes needed to export OTLP)
<ul>
<li>Maybe next year, we’ll have it all solved</li>
</ul>
</li>
</ul>
</li>
<li>what’s interesting about meshes
<ul>
<li>many companies he talks to who are using meshes, are also using an ecosystem of additional functionality</li>
<li>just like plugins, they can cause issues when combined together</li>
<li>which means we NEED better telemetry and tracing</li>
</ul>
</li>
</ul>
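<p>To make the points about context propagation and head sampling a bit more concrete, here is a minimal sketch in Python. It assumes the W3C Trace Context <code>traceparent</code> header format (<code>version-traceid-parentid-flags</code>); the function names and the ratio comparison are my own illustration, not the OpenTelemetry SDK's implementation and not anything shown in Michael's talk.</p>

```python
import secrets


def new_traceparent(sample_ratio: float) -> str:
    """Start a trace at the root and make the head-sampling decision once.

    Hypothetical helper: the ratio check below is an illustrative
    assumption, not the algorithm of any particular SDK.
    """
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    # Head sampling: keep roughly `sample_ratio` of traces, decided up front
    # from the low 8 bytes of the trace id.
    threshold = int(sample_ratio * (1 << 64))
    sampled = int(trace_id[16:], 16) < threshold
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Continue a trace in a downstream hop (service or mesh proxy).

    The trace id and the sampling flag are carried forward unchanged;
    only the span id is new. Dropping the incoming header here is what
    produces traces with gaps or no root span.
    """
    version, trace_id, _parent_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    root = new_traceparent(sample_ratio=0.25)
    print("root :", root)
    print("child:", child_traceparent(root))
```

<p>Because the decision is made at the root and only carried along afterwards, a hop cannot later decide that a trace turned out to be interesting, which is the "some waste" trade-off noted above.</p>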
<p><img src="/assets/images/2022/06/20220620_134449.jpg" alt="Aspecto founders, Eran Grabiner and Michael Haberman, and the very excellent OTel maintainer, Amir Blum" /></p>
<h2 id="personal-hiring-break">Personal: Hiring Break</h2>
<p>I then took time to meet with a late-stage candidate for Product Manager of our Incubation Engineering group.
<a href="https://www.tricentis.com/company/careers/">My group</a> got to an offer acceptance, so that was a really nice lift going into the afternoon sessions.</p>
<p><img src="/assets/images/2022/06/2022-06-21-closed-on-inceng-pm.png" alt="This happened because we volitioned it into the universe" /></p>
<h2 id="lunch-and-networking">Lunch and Networking</h2>
<p>Since the event was somewhere between 50 and 75 attendees, I found it very easy to be cycling back through the clusters of folks who I had already chatted up, all within about 45 mins. This was actually nice as a forcing function to really strike up or contribute to meaningful dialog, since there was very little room to wander.</p>
<p><img src="/assets/images/2022/06/20220620_085532.jpg" alt="Sharr Creeden, Henrik Rexed and me!" /></p>
<h2 id="debugging-opentelemetry">Debugging OpenTelemetry</h2>
<p>Sadly, having had to take a call, I missed the first part of this lightning talk from Ted Young. However, subsequent comments from other attendees confirmed what everyone already knows, which is that Ted is awesome and 80% of the time he speaks it is useful…all the time. <a href="https://o11yfest.org/speakers/ted-young">His 2021 o11yfest presy about The Value of Design in OpenTelemetry</a> stands as true amidst a year of major efforts to GA as it did back then.</p>
<p>However, I spent the better part of the afternoon in un-conference discussions
with Ted and other attendees about Tracing and Testing topics, so while I missed
his most recent incarnation of coolness, his body of work speaks for itself.</p>
<h2 id="breakout-sessions">Breakout Sessions</h2>
<h3 id="ebpf-and-auto-instrumentation">eBPF and auto-instrumentation</h3>
<p>The best part of the day, seriously, but as open and engaged conversations go, I
was more invested in the people in the room than taking notes on my laptop.</p>
<p>The short is that most people in the room needed a simple description of what <a href="https://ebpf.io/">eBPF</a> is
and very few people were versed in the topic. Key talking points were:</p>
<ul>
<li>lots of buzzword interest in “eBPF” as an emerging topic</li>
<li>many people looking for a magic solve to “observability” via auto-instrumentation</li>
<li>how does eBPF improve the auto-instrumentation motion</li>
<li>auto-instrumentation…does it help teams adopt better telemetry or impede intentional thought</li>
<li>how auto-instrumentation is a stepping stone to make teams want more…and do more to get it</li>
</ul>
<p>Fortunately, we had an amazing person (Libby) who volunteered to capture topics on
the sketch board.</p>
<p><img src="/assets/images/2022/06/20220620_143354.jpg" alt="Topics and key points from the discussion on eBPF and auto-instrumentation" /></p>
<h3 id="sessions-i-didnt-attend-but-would-want-to">Sessions I didn’t attend (but would want to)</h3>
<p>Sadly, I cannot physically be in two rooms at once. Hopefully there was a recording
of this discussion, but if not and someone has notes, I would LOVE to link to them
from this post here.</p>
<ul>
<li>Intermediate OpenTelemetry and Signal Correlation</li>
<li>Continuous Profiling</li>
<li>OpenTelemetry Collector</li>
</ul>
<h3 id="tracing--open-telemetry-for-testing">Tracing / Open Telemetry for Testing</h3>
<p>Again, being invested in the conversation and capturing meaningful topics was more
important than taking copious notes. As such, I did take the cue from Libby and
make sure that at least some of the key points of the discussion were documented:</p>
<p><img src="/assets/images/2022/06/20220620_160056.jpg" alt="Key discussion points about OpenTelemetry and Testing" /></p>
<h1 id="summary">Summary</h1>
<p>Worth the trip. Lots going on in this space. Needs contributors, not simply adopters.
Get involved. Reach out to me or any of the visible core maintainers. Expect growth.
Come be part of the community as it thrives.</p>
<!--
# After-thoughts from Austin back to Boston
- OTel is a way for tools to truly be "part of the chain"
- you must respect inbound, not just outbound context
- Get involved in the End-User Feedback and Profiling SIGs
- Profiling: Pixie is the big player, but Pyroscope is fully OSS -->{"login"=>"pbruce", "display_name"=>"Paul Bruce", "first_name"=>"Paul", "last_name"=>"Bruce"}Preface: this blog post is just my travel log, personal reflections, and thoughts from my time conversing with other community members at OpenTelemetry Community Day in Austin on June 20th, 2022. If any corrections or retractions need be made, let me know and I’ll be happy to do so!Telemetry to Transform Testing2022-03-25T19:13:41+00:002022-03-25T19:13:41+00:00https://paulsbruce.io/blog/2022/03/telemetry-to-transform-testing<p><a href="https://www.skilupdays.io/cicd-22/agenda/session/873478">Sign up to access the full broadcast</a></p>
<p><a href="https://youtu.be/Ju168RTf8dc">https://youtu.be/Ju168RTf8dc</a></p>
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRdp3lLf2LYbYdmD34ALYWoXpb6w2FPbGeaE_N7YvvC0h07hZTbRgz9hl5woQ6qEFCdXVy5sKaNJFg3/embed?start=true&loop=true&delayms=3000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
<h1 id="session-title">Session Title</h1>
<p>Transform Your Continuous Testing with (Open)Telemetry</p>
<h1 id="session-description">Session Description</h1>
<p>Except for instrumented unit testing, it’s often really hard to know exactly
what’s going wrong when your tests fail, especially when our systems
are now highly distributed and involve multiple APIs, micro-frontends, and 3rd-party
services. Versioning across these dependencies and complex rollout processes also
further obfuscate what the heck is really going wrong when your tests fail.</p>
<p>Enter “telemetry”, and specifically OpenTelemetry. Technology that emits contextual
and timeseries-ready data about what’s going on dramatically improves everyone’s
ability to isolate, diagnose, and resolve issues quickly.</p>
<p>BOTH systems AND tests that share modern, distributed context such as OpenTelemetry
span and baggage details transform testing into more precise and actionable feedback.
Come learn about how to inject context back into your work in this session.</p>
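<p>As a rough sketch of what "tests that share context" can look like, the Python below builds the two HTTP headers a test could attach to the requests it sends to the system under test. It assumes the W3C Trace Context (<code>traceparent</code>) and W3C Baggage (<code>baggage</code>) header formats; the helper name and the baggage keys <code>test.name</code> and <code>build.id</code> are hypothetical and picked only for illustration.</p>

```python
import secrets
from urllib.parse import quote


def context_headers_for_test(test_name: str, build_id: str) -> dict:
    """Build headers that tie a test's requests to one trace.

    Hypothetical helper: the baggage keys are illustrative, not a convention.
    """
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    baggage = {"test.name": test_name, "build.id": build_id}
    return {
        # "01" marks the trace as sampled so downstream services record it.
        "traceparent": f"00-{trace_id}-{span_id}-01",
        # Baggage is a comma-separated list of key=value pairs; values are
        # percent-encoded so spaces and punctuation survive transport.
        "baggage": ",".join(
            f"{key}={quote(value, safe='')}" for key, value in baggage.items()
        ),
    }


if __name__ == "__main__":
    headers = context_headers_for_test("checkout flow", "1234")
    for name, value in headers.items():
        print(f"{name}: {value}")
```

<p>With headers like these on every request, a failing test can be looked up by its trace id, and any service that reads the baggage knows which test and build produced the traffic.</p>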
<h1 id="three-key-takeaways">Three Key Takeaways</h1>
<ul>
<li>How additional context dramatically improves actionable outcomes of testing</li>
<li>How OpenTelemetry applies to both software systems AND testing processes</li>
<li>How to get started using OpenTelemetry in your code bases</li>
</ul>
<h1 id="speaker-bio">Speaker Bio</h1>
<p>Paul Bruce is a passionate technologist, helping to transform enterprise software teams and delivery practices. He chairs o11yfest (May 9-12), volunteers locally, skateboards, and co-organizes DevOpsDays Boston and the Boston DevOps community. His technical research wheelhouse includes cloud management, high availability service architecture, API design and experience, continuous testing at scale, and organizational learning frameworks. He writes, listens, and teaches about software delivery patterns in enterprises and key industries around the world. Oh, and he’s hiring devs for his incubation engineering team! You can read more at: https://paulsbruce.io</p>{"login"=>"pbruce", "display_name"=>"Paul Bruce", "first_name"=>"Paul", "last_name"=>"Bruce"}Sign up to access the full broadcast3Vs to Transform Testing - Verification, Validation, Volition2022-03-09T09:13:41+00:002022-03-09T09:13:41+00:00https://paulsbruce.io/blog/2022/03/verification-validation-volition<p>NOTE: This is a placeholder article to house slides, comments, and notes related to
my presentation for <a href="https://tsqa.org/schedule-of-events-tsqa-2022-conference">TSQA 2022 Wild, Wild Test</a> on 2022-03-09.</p>
<ul>
<li><a href="https://docs.google.com/presentation/d/1ikKk7i9iuvvYV0Leb0k1xi9WNWzVdaZYkJJLxp71CYo/edit?usp=sharing">Slides on Google</a></li>
</ul>
<p>Once the video is made available, I will link to it here.</p>{"login"=>"pbruce", "display_name"=>"Paul Bruce", "first_name"=>"Paul", "last_name"=>"Bruce"}NOTE: This is a placeholder article to house slides, comments, and notes related to my presentation for TSQA 2022 Wild, Wild Test on 2021-03-09.Defining ‘Developer’2021-12-29T11:43:21+00:002021-12-29T11:43:21+00:00https://paulsbruce.io/blog/2021/12/defining-developer<p>It may sound like an unnecessary errand in 2021 to have to define what “developer” (<a href="https://en.wikipedia.org/wiki/Programmer">“programmer”</a>) means, but once again I find myself in the complimentary and contradictory position of having to disambiguate what people in software mean when they casually say “dev” or “developer” in rooms of like-minded individuals.</p>
<h1 id="tldr">TL;DR</h1>
<p>A ‘developer’ is someone who contributes functionality, typically in the form of code, to a product or service that runs in production to drive the material goal of an organization. This usually means:</p>
<ul>
<li>Translating user stories into running software</li>
<li>Further detailing requirements of above stories</li>
<li>Estimating and/or preparing to implement changes</li>
<li>Primarily adding new, then secondarily updating or fixing existing, functionality</li>
<li>Writing unit tests which verify core code components of the above changes</li>
<li>Consulting with other stakeholders (architects, users, operations engineers)</li>
<li>Describing the material operating conditions of the functionality</li>
</ul>
<p>It does not primarily mean, though sometimes involves:</p>
<ul>
<li>writing pipeline or infrastructure-as-code</li>
<li>whiling hours away trying to get Kubernetes to do what you want it to do</li>
<li>writing functional and <del>non-functional</del> operational tests</li>
<li>clicking around through cloud consoles like AWS/Azure/GCP to get software to work</li>
</ul>
<p>In short, the ‘magic’ is consuming whatever it takes to produce software. I dare ‘test-ers’, ‘DevOps’, or product managers/owners to do this. I fucking dare them, and if they do, it will still be something a professional programmer has to rewrite so that it A) makes sense to anyone else, B) is maintainable once shipped, and C) makes everyone money.</p>
<p><img src="/assets/images/2021/12/code-coffee-magic.png" alt=""How Coffee Works: Coffee goes in mouth, magic inside the body turns it into code"" /></p>
<h1 id="a-developers-best-local-goal">A Developer’s Best (Local) Goal</h1>
<p>The best goal of a developer is to safely and efficiently translate user needs into working software that achieves their organization’s goal, preferably better than other attempts to do the same in other industry examples. This is a <a href="https://medium.com/prodopsio/devops-theory-of-constraints-cf1477f9bd1a#:~:text=A%20decision%20made%20based%20on%20local%20optima%20will%20in%20general%20not%20be%20as%20bad%20as%20a%20random%20one">“local optima”</a> or local goal and contributes to their team’s less local one, which is the same as above, but with additional liberties and constraints that change the dynamics of how best to accomplish the goal.</p>
<h2 id="a-developers-disoptimal-approach">A Developer’s Dis/Optimal Approach</h2>
<p>Regarding the latter, though every individual in the team is subject to personal ambitions and inhibitions, it is <a href="https://www.stellarperformance.com/articles/advice/stellar-leaders/politics-the-dirty-word.html">shitty politics</a> to live by them. In the past, I’ve experienced colleagues who do this, and not only do they contribute to <a href="https://cloud.google.com/architecture/devops/devops-culture-westrum-organizational-culture">pathological</a> outcomes, they undermine their own effectiveness and sense of satisfaction from their work.</p>
<p>Most developers, well, we work in teams. Even if we are “ICs” (individual contributors), we are surrounded by others that need what we produce and need us to be at our best, not just from an individual perspective, but for others around us. If you doubt this perspective, consider trying to get anything done without having working relationships with other team members that help you do the above and below bulleted activities.</p>
<p>So, from a tactical perspective, yes, the above bullet points about what a developer does represent the 80% under the bell curve of what IMO a developer does. As a life-long programmer (skill and curiosity-driven, not vocational ‘developer’), I can say that if you are in fact a developer, you may find yourself doing many other things like:</p>
<ul>
<li>screwing around with issue tracking systems</li>
<li>contributing your thoughts to peer reviews</li>
<li>playing video or board games with some colleagues</li>
<li>learning about new technologies, usually on your own time/dime</li>
<li>defending your position against product managers/owners who skip <a href="https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disaster#O-ring_concerns">O-ring</a> warnings</li>
</ul>
<p>This doesn’t change the fact that if your primary goal is aligned with the above <a href="#tldr">TL;DR</a>, you are what I consider a ‘developer’. You are not a ‘tester’ (though sometimes you write tests) or a ‘product manager’ (though sometimes you have to fill knowledge gaps) or an accountant or a physicist or porpoises or even a manager (though you may have direct reports or lead a team of engineers)…the direct output of your work is <em>‘working software that achieves the organization’s goal’</em>.</p>
<h1 id="the-organizations-global-goal">The Organization’s (Global) Goal</h1>
<p>For most organizations, even non-profit ones, what looks like a developer doing their job well <a href="https://dansilvestre.com/the-goal-eliyahu-goldratt/#:~:text=The%20goal%20of%20any%20organization,making%20money%20are%20non%2Dproductive.">is to produce software that makes or receives money</a>, or in some indirect but immediate way contribute to that end. Code can be many things, compilable or interpreted, imperative or declarative (or functional, etc.), sometimes even supportive of others doing those things (such as in the case of infrastructure-as-code for the release process). But never forget that:</p>
<blockquote>
<p>The Goal of every organization is to make money, therefore so is yours</p>
</blockquote>
<p>For any given profitable startup in <a href="https://www.cbr.com/cowboy-bebop-ein-origin-netflix/">“zoos next to the dragon sanctuary and unicorn exhibit”</a>, or even run-of-the-mill enterprise Fortune 100 that can retain long-term talent (equally rare), revenue-as-code isn’t hard to argue about. But for NPOs, let me walk you through it.</p>
<p><img src="/assets/images/2021/12/cowboy-bebop-faye-ein.jpeg" alt="Netflix is truly chicken-shit for cancelling Cowboy Bebop after the first season...or playing a long game" /></p>
<p>Two sentient beings that are moving towards a common goal need to work together. To register as a non-profit, at least in most countries that allow for it, you need a ‘board’…a group of individuals that can agree on a charter…and a charter, which includes goals. Non-profits either burn out on an all-volunteer model or hire/retain qualified [enough] individuals to drive and achieve the charter mission, or iterate on what should-be/is achievable at any given phase of the NPO. If an NPO is truly worth existing for any matter of time, money comes into play, either in management thereof or in order to compensate for said expertise and efforts. To generate revenue in any venture related to the software industry, at some point, software engineers are required. Even if they are volunteer, the non-profit organization needs to generate streams of revenue, for philanthropic targets, and sometimes to support the organization’s core function to do so. NPOs need to ‘make money’ to survive, either to funnel to and justify their altruisms or/and to do that via skilled and focused professionals.</p>
<h1 id="developer-adjacent-likenesses">Developer-adjacent Likenesses</h1>
<p>If you’re still not convinced that I’ve laid out a Goal, Approach, Outcomes and Tactics that sufficiently clarify what a ‘developer’ is uniquely, let me try a visual I’ve been working on called “A Dev Does as a Dev Is”:</p>
<p><img src="/assets/images/2021/12/20211229_134113.jpg" alt="'A developer does as a developer is': 80% of time is spent coding and 20% on everything else less comfortable" /></p>
<p>In essence, think honestly about where you most feel comfortable. If that’s not writing code with a clear mission and autonomy to implement well-understood requirements in some type of code, then you’re not a programmer (disambiguation of ‘developer’). If you are more on the problem-solving side, which is perfectly fair, you’re still a dev, and code is a material part of your day-to-day.</p>
<p>If you spend most of your time trying to avoid situations where you have to “resort to code”, then you’re not what I’d call a ‘developer’. Not that a ‘good developer’ will rush to code any chance they get (this would also be an anti-pattern), but that the outputs of a ‘developer’ are primarily measured by <em>‘working software that achieves their organization’s goal’</em>.</p>
<p>NOTE: if you are a non-hashtag ‘DevOps’ person, you may be a half dev (like SRE time-spend) and half everything else that your organization needs. In this you are a special snowflake and worth every penny your company is (or should be) well-paying for you, since you think holistically about two sides of your work and of others’ work.</p>
<p>This is my blog and I can say whatever I want. If you don’t like it, feel free to hit me up on <a href="https://twitter.com/paulsbruce">Twitter</a> or <a href="https://www.linkedin.com/in/paulsbruce">LinkedIn</a>, quote and hate me, or better, leave useful comments that further this dialog.</p>
It may sound like an unnecessary errand in 2021 to have to define what “developer” (“programmer”) means, but once again I find myself in the complimentary and contradictory position of having to disambiguate what people in software mean when they casually say “dev” or “developer” in rooms of like-minded individuals.
Accounting for Privilege2021-12-07T21:43:47+00:002021-12-07T21:43:47+00:00https://paulsbruce.io/blog/2021/12/accounting-for-privilege<p>After a few years of volunteer organizing <a href="https://devopsdaysbos.org/">DevOpsDays Boston</a> and other local tech events, I found that there
were some things I wanted to work out personally in other sandboxes, but still drive myself to “provide
more value than I consume”. Specifically, I want to put my money where my mouth is regarding under-representation and white privilege in the tech industry. So I did, with a lot of help from others.</p>
<p>This year I can report that <strong><em>$21,000 from <a href="https://devopsdaysbos.org/">DevOpsDays Boston</a></em></strong> and around <strong><em>$17,000 from <a href="https://o11yfest.org">o11yfest</a></em></strong> have been donated to a combination of <a href="https://www.resilientcoders.org/">Resilient Coders</a>, <a href="https://kodeconnect.org/">KodeConnect</a>, <a href="https://www.yearup.org/">YearUp</a>, <a href="https://www.blackgirlscode.com/">Black Girls CODE</a>, <a href="https://stopaapihate.org/">Stop AAPI Hate</a>, <a href="https://aidindia.org/donate/covid-relief-fund/">COVID Relief Fund for India</a>, and <a href="https://translifeline.org/">Trans Lifeline</a>. Some in combined effort, some with heavy personal effort, but every dollar here bears the organizing groups’ willingness and effort to go above and beyond.</p>
<ul>
<li><a href="#the-final-distribution-and-tallies">Jump to the Final Distribution and Tallies</a></li>
</ul>
<h1 id="how-did-we-arrive-here">How Did We Arrive Here?</h1>
<p>If I were to retroactively reconstruct my general thinking, it has been:</p>
<ol>
<li>I didn’t set out to help drive white privilege, but now that I realize how bad it is, I can’t ignore my responsibility to do something meaningful about it</li>
<li>I want to bring other voices and perspectives to the table, so the less I talk/speak/present and the more I can help under-represented folks do that, the better off we all are</li>
<li>Moving into primarily organizing role(s), another layer of privilege, this helps with the above but should also inherit the same motion to <a href="/blog/2016/02/7-practical-tips-for-inclusion/">“step up by stepping aside”</a> in the long-term</li>
<li>The ideas and hypotheses about how to do this stuff must be informed by those who are under-represented, not just my empathy or inclusion assumptions</li>
<li>Some motions and approaches need trial-and-learning cycles before expecting them to work in the broader context of the <a href="https://devopsdaysbos.org/">DevOpsDays Boston</a> event organizer’s group</li>
</ol>
<p>Ultimately, after countless organizing and board meetings, policy discussions, delayed emails, and other legitimate but frustratingly complex dialogs, I realized that I could run outside experiments myself, find what works and what doesn’t there, and then synthesize that back in to the bigger groups I work with.</p>
<h1 id="why-money-isnt-that-just-another-tech-bro-handout">Why Money, Isn’t That Just Another Tech Bro Handout?</h1>
<p>I don’t think of myself as a ‘tech bro’. I don’t:</p>
<ul>
<li>like to hang out with people who just look, sound, think, or act like me or agree with me</li>
<li>have every day of the week available to just stay ‘in the city’ for a late meetup</li>
<li>constantly try to advance my career by stepping on others</li>
<li>go around trying to fix people simply because I can code</li>
<li>talk or shout people down, like, ever</li>
<li>assume that I will by default own things (or matter) in the future</li>
</ul>
<p>I do:</p>
<ul>
<li>err on the side of self-less</li>
<li>make mistakes and look to correct them when identified</li>
<li>care about the well-being of others, usually more than my own :(</li>
<li>everything I can think of to encourage non-white-non-bros to elevate and be treated equitably, both in my full-time job and in my volunteer groups</li>
<li>read, listen, and absorb as much as I can possibly take about:
<ul>
<li>historical inequities</li>
<li>unconscious bias</li>
<li>gender equality</li>
<li>non-binary and gender-neutral identity</li>
</ul>
</li>
<li>bear a HUGE debt of guilt for the racism that is the entire history of the United States of America</li>
</ul>
<p>In short, I’m learning and taking responsibility for what I can and should do to enable and accelerate others to help change the equations. It’s more than I can say for some, but still never enough for the size of what I feel.</p>
<p>In 2019 (remember the days of old), some of the ‘best thinking’ in the group was to offer free ‘under-represented’ tickets. This never sat well with me, though as one of the group willing to try things, I did my part to visit various meetups and asked the organizers if/how it would be possible to encourage people to take them. We had tracking by code so we know that even after many of the 50 tickets were given out locally, this didn’t really compel people to come to some local tech event. And not to mention, this was putting organizers in a place of power over ‘who to give them to’, how ‘under-represented’ do you have to be to deserve one, etc. Blech. That model was firmly in ‘handout’ territory and disappeared quickly.</p>
<p>Then COVID crashed the world and 2020’s event went virtual, which actually freed us up to offer tickets at a sliding scale, including a free option. The fear from others was that people who would have otherwise paid would take free tickets instead. This was exacerbated by using Hopin, a ticketing + virtual event platform, which at that time (and still when I last checked in May 2021) had no way to set defaults and sort order on how different ticket types appeared for people. Either way, the tech world was only starting to realize at that time that all events would be virtual indefinitely. So though it wasn’t perfect, 2020 ticketing erred on the side of inclusive (defaulting to ‘free’), and no one had to get in the middle of ‘who deserves a free ticket’, which itself was a huge improvement.</p>
<p>So fast forward from ‘free tickets’ to directly generating $17,000 for important causes, no, this money isn’t a handout. I consider it the first repeatable model for what reparations and preparations are due other folks and communities, more than I can personally afford any other way. And in some small way, it’s my own way of dealing with guilt and the need for rightly-human recompense about things like this:</p>
<p><a href="/assets/images/2021/12/20211208_121234.jpg"><img src="/assets/images/2021/12/20211208_121234.jpg" alt="From "They Called Us Enemy" p.24-25 - George Takei, Justin Eisinger, Steven Scott, Harmony Becker" /></a></p>
<p><strong><em><a href="https://www.amazon.com/They-Called-Enemy-George-Takei/dp/1603094504">From “They Called Us Enemy” p.24-25 - George Takei, Justin Eisinger, Steven Scott, Harmony Becker</a></em></strong></p>
<p>Also, two of the three good causes have already reached back out to us to personally thank us AND to figure out how we can collaborate in 2022! This is already part of my charter as part of the <a href="https://bostondevopsnetwork.org">Boston DevOps Network</a> board, which underwrites the <a href="https://devopsdaysbos.org/">DevOpsDays Boston</a> event, and I’m excited to develop this interest into action and tangible positive impact moving forward.</p>
<h1 id="the-breakdown-how-did-this-work">The Breakdown: How Did This Work?</h1>
<p>Conferences, virtual or otherwise, require capital to start and complete. Vendors in a virtual conference are online platforms (like Restream, Ti.to Ticketing, Vito.Community, Live Captioning persons, Otter.ai for alternative Transcripts, Zoom Pro for breakouts, etc.) and logistics (like A/V, Graphic Recording, creatives artistry, swag/distribution, MoneyOps/AR/AP). This stuff costs real money. I was able to do it with about $15k all told. At times I had to shift personal money into my LLC account as CapEx to cover one or two things before receiving sponsor money. I only ever wanted to break even, and mostly focus on how to turn interest (attendee tickets and sponsorship overflow) directly into donations to good causes.</p>
<p>So that’s why (at least for virtual <a href="https://o11yfest.org">o11yfest</a>), I only needed three ‘premiere sponsors’ (total $21k). Then I could encourage attendees as much as possible to donate from their hearts, with a default of $30, but also provide a free ticket option to include those that really couldn’t afford it otherwise. The rest of the tech companies who inquired about sponsorship after the premiere spots were gone, I was able to encourage to ‘go directly donate a minimum of $2k to one of these good causes, just forward me the email proof of donation, and you get gold level perks’. Five startups took me up on this, an easy way to attach their names and logos to a community-driven event, generating a total of $10k in donations (which never passed through my accounts receivable, so there was less for me to do and no fees taken out). Finally, more than two-thirds of the attendees opted to donate something, many the default of $30 or more per ticket, totaling about $5,400 in net ticket sales (minus the platform percentage), all of which was by definition earmarked for donations.</p>
<h1 id="what-worked-well">What Worked Well</h1>
<p>A lot, surprisingly, and many because of the group perspective, group effort, and virtuous individuals.</p>
<ul>
<li>heavy emphasis on inclusivity, from the organizing groups to the expectations to premiere sponsors, the graphics and the content/presentations; I’m grateful and proud of what we accomplished here</li>
<li>having an ‘entourage’ of organizers; even if I did a lot of the General Manager stuff, it always comes down to a team effort at the actual end of the day</li>
<li>paying for one of my favorite independent musical artists to do a special COVID-remote show for us; aside from supporting them financially, it put FUN on the menu and drew lots of great feedback afterward</li>
<li>commissioning a young local graphical artist to come up with ‘mascot’ art; lots of little and different robots that could be used in everything from digital platform themes to cards to hoodies and gift boxes to fridge magnet sets. They were everywhere and clearly identified how unique <a href="https://o11yfest.org">o11yfest</a> would be and was.</li>
</ul>
<h1 id="what-didnt-work-as-well">What Didn’t Work As Well</h1>
<p>Oh there were so many things, but here are the ones worth sharing:</p>
<ul>
<li>Don’t bother with virtual swag bags/boxes mailed to people; the costs and logistics are not worth the ‘cutesy’ effect (I only did this because it helped people not feel so disconnected during COVID)</li>
<li>Hoodies…rather than commissioning, storing, distributing centrally through a vendor, there are plenty of online self-service options where attendees who want to buy their own can pay and keep their address and PII to themselves. The only hoodies I did were 30, for speakers and volunteers, and that was exhausting to deal with, but then I got to hand-write unique thank you notes to each of them :)</li>
<li>Paying lots of money for a dedicated server for ‘the afterparty’ platform (gather.town); I needed to make sure people weren’t denied access in a free account, but like 30 people showed up and I budgeted for 300.</li>
</ul>
<h1 id="the-final-distribution-and-tallies">The Final Distribution and Tallies</h1>
<table>
<tbody>
<tr>
<td>From</td>
<td>To</td>
<td>How Much</td>
<td>Why</td>
</tr>
<tr>
<td><a href="https://devopsdaysbos.org/">DevOpsDays Boston</a></td>
<td><a href="https://www.yearup.org/">YearUp</a></td>
<td>$7,000</td>
<td>organizers disbursal</td>
</tr>
<tr>
<td><a href="https://devopsdaysbos.org/">DevOpsDays Boston</a></td>
<td><a href="https://kodeconnect.org/">KodeConnect</a></td>
<td>$7,000</td>
<td>organizers disbursal</td>
</tr>
<tr>
<td><a href="https://devopsdaysbos.org/">DevOpsDays Boston</a></td>
<td><a href="https://www.resilientcoders.org/">Resilient Coders</a></td>
<td>$7,000</td>
<td>organizers disbursal</td>
</tr>
<tr>
<td><a href="https://devsecopsdaysboston.org">DevSecOps Days Boston</a> - synopsys</td>
<td><a href="https://www.resilientcoders.org/">Resilient Coders</a></td>
<td>$2,000</td>
<td>contributor sponsorship</td>
</tr>
<tr>
<td><a href="https://o11yfest.org">o11yfest</a> - Harness.io</td>
<td><a href="https://stopaapihate.org/">Stop AAPI Hate</a></td>
<td>$2,000</td>
<td>contributor sponsorship</td>
</tr>
<tr>
<td><a href="https://o11yfest.org">o11yfest</a> - Chronosphere</td>
<td><a href="https://stopaapihate.org/">Stop AAPI Hate</a></td>
<td>$2,000</td>
<td>contributor sponsorship</td>
</tr>
<tr>
<td><a href="https://o11yfest.org">o11yfest</a> - attendees</td>
<td><a href="https://stopaapihate.org/">Stop AAPI Hate</a></td>
<td>$500</td>
<td>balance of net ticket sales</td>
</tr>
<tr>
<td><a href="https://o11yfest.org">o11yfest</a> - attendees</td>
<td><a href="https://translifeline.org/">Trans Lifeline</a></td>
<td>$4,500</td>
<td>balance of net ticket sales</td>
</tr>
<tr>
<td><a href="https://o11yfest.org">o11yfest</a> - StackPulse</td>
<td><a href="https://www.blackgirlscode.com/">Black Girls CODE</a></td>
<td>$2,000</td>
<td>contributor sponsorship</td>
</tr>
<tr>
<td><a href="https://o11yfest.org">o11yfest</a> - FireHydrant</td>
<td><a href="https://www.blackgirlscode.com/">Black Girls CODE</a></td>
<td>$2,000</td>
<td>contributor sponsorship</td>
</tr>
<tr>
<td><a href="https://o11yfest.org">o11yfest</a> - SLOConf/Nobl9</td>
<td><a href="https://www.blackgirlscode.com/">Black Girls CODE</a></td>
<td>$2,000</td>
<td>contributor sponsorship</td>
</tr>
<tr>
<td><a href="https://devsecopsdaysboston.org">DevSecOps Days Boston</a></td>
<td><a href="https://aidindia.org/donate/covid-relief-fund/">COVID Relief Fund for India</a></td>
<td>$672.35</td>
<td>100% of net ticket sales from <strong><em><a href="https://devsecopsdaysboston.org">DevSecOps Days Boston</a></em></strong></td>
</tr>
</tbody>
</table>
<h1 id="shout-outs-and-contributors">Shout-outs and Contributors</h1>
<p>A special thanks to <a href="https://twitter.com/lizthegrey">Liz Fong-Jones</a> who was my co-chair of <a href="https://o11yfest.org">o11yfest</a>, notwithstanding all the other things she had to do this summer. Gracious, patient, concise, and hella-connected. Thank you for your meaningful guidance and support!</p>
<p><a href="https://www.linkedin.com/in/michaeltclark/">Michael Thomas Clark</a>, another volunteer core organizer from <a href="https://devopsdaysbos.org/">DevOpsDays Boston</a>, was my financial email “plus ones” person, and a regular at our organizing weekly check-ins. Kind of my conspirator on the donations thing, we are of the same spirit that serious donation strategy should be part of the ethics of every tech event, at least the ones we’re volunteering with.</p>
<p><a href="https://www.linkedin.com/in/kate-ruh/">Kate Ruh</a>, who helped with DevSecOps Days on a whim, and then totally came through and is now one of the <a href="https://devopsdaysbos.org/">DevOpsDays Boston</a> organizers! So insightful, effective, efficient, and generative.</p>
<p><a href="https://twitter.com/ameliamango">Amelia Mango</a>, for the second year in a row, prompted me to get off my ass and do something! Not only a proven good actor in the OTel and API spaces around intelligent marketing, she also helped a lot with ideation, logistics and just keeping the ball rolling on tactical work as bigger general management things started to take all my time up. Always interested in working with Amelia about, well, anything.</p>
<p><a href="https://www.linkedin.com/in/ruth-g-lennon/">Ruth Lennon</a>, a wise and fearless leader in the standards and higher ed technology space. Ruth was co-chair of <a href="https://devsecopsdaysboston.org">DevSecOps Days Boston</a>, which generated ~$2,672 in donations as a first-year virtual conference. Her insight and guidance on topics and presentations for this event was instrumental to putting on a top-quality event.</p>
<h1 id="it-really-takes-a-village">It Really Takes a Village</h1>
<p>Again, none of this would have happened without a lot of peoples’ concerted efforts, off-hours and completely volunteer. If you’re interested in helping with these events or even just becoming part of the Boston DevOps or o11yfest communities, please reach out to me via <a href="https://www.linkedin.com/in/paulsbruce/">LinkedIn</a> and let’s chat!</p>
5 Trends Shaping the Future of Performance Engineering2021-12-05T15:47:35+00:002021-12-05T15:47:35+00:00https://paulsbruce.io/blog/2021/12/5-trends-shaping-the-future-of-performance-engineering<p>As part of my usual duties, Tricentis asked me to name a few key themes and trends
that I’ve seen during customer engineering work with forward-thinking customers,
ones that will likely be a big part of performance engineering in 2022.</p>
<p>There is no predicting the future, there’s only keeping your eyes and ears open
and developing intuition, then collecting evidence for AND against hypotheses.</p>
<p>You can sign up for and view the recording of this webinar <a href="https://www.bigmarker.com/techwell-corporation/5-Trends-Shaping-the-Future-of-Performance-Engineering">here</a>.</p>
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vQCtC7zalzcJkXfSYgFXcKyFoek4pTX494cZglTU303devTx1csCx-CBb-PrCroklIaCjunraJ19SI2/embed?start=true&loop=true&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
<ul>
<li><a href="#1-resiliency-and-controlled-chaos-engineering-by-default">1. Resiliency and controlled chaos engineering by default</a></li>
<li><a href="#2-deep-and-wide-telemetry-for-all-cloud-resources">2. Deep and wide telemetry for all cloud resources</a></li>
<li><a href="#3-industry-standardization-of-metrics-and-instrumentation">3. Industry standardization of metrics and instrumentation</a></li>
<li><a href="#4-intelligent-readiness-pre-and-post-deployment">4. Intelligent readiness pre and post deployment</a></li>
<li><a href="#5-a-common-approach-across-custom-and-packaged-apps">5. A common approach across custom and packaged apps</a></li>
</ul>
<h1 id="1-resiliency-and-controlled-chaos-engineering-by-default">1. Resiliency and controlled chaos engineering by default</h1>
<p>Chaos engineering has been around for a while, but it often takes quite a while for
good ideas and practices to mature in the crucible of large at-scale organizations.</p>
<p>Many of my largest customers have some form of failure mode injection in play during
large load tests, often using tools like Gremlin or the Chaos tools from Netflix
to simulate infrastructure flakiness and unavailability. This is mostly in lower
environments, though some have small pockets of already-rugged services in production
that are regularly rotated and decommissioned as part of a plan.</p>
<p>They also utilize load testing because there’s very little you can defensibly glean
from injecting chaos/faults but then manually testing things out as only one user.
Load testing, even at non-massive volumes, provides statistical relevance to errors
observed and potentially correlated to chaos exercises.</p>
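<p>To make the ‘statistical relevance’ point concrete, here is a minimal sketch (my own illustration, not something from the webinar) that puts a confidence interval around an observed error rate using the standard Wilson score interval. The request counts are invented for the example; the point is only that the same 5% error rate is far more defensible at load-test volume than from one person clicking around.</p>

```python
import math
from typing import Tuple


def wilson_interval(errors: int, requests: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the (low, high) Wilson score interval for an observed error rate.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if requests <= 0:
        raise ValueError("requests must be positive")
    p = errors / requests
    denom = 1 + z * z / requests
    center = (p + z * z / (2 * requests)) / denom
    margin = (z / denom) * math.sqrt(
        p * (1 - p) / requests + z * z / (4 * requests * requests)
    )
    return max(0.0, center - margin), min(1.0, center + margin)


# Hypothetical numbers: one manual tester sees 1 failure in 20 attempts
# while a fault is being injected.
manual_low, manual_high = wilson_interval(errors=1, requests=20)

# Hypothetical numbers: a modest load test sees the same 5% error rate
# over 2,000 requests during the same kind of fault.
load_low, load_high = wilson_interval(errors=100, requests=2000)

print(f"manual tester: {manual_low:.3f} to {manual_high:.3f}")
print(f"load test:     {load_low:.3f} to {load_high:.3f}")
```

<p>The manual interval is so wide that it can’t distinguish ‘basically fine’ from ‘badly broken’, while the load-test interval is narrow enough to compare against a baseline run without the chaos exercise.</p>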
<p>After participating in a small back-and-forth on a DevOps Unbound episode with the very
awesome <a href="https://www.linkedin.com/in/tammybutow/">Tammy Bryant Butow</a>, it’s clear that
performance and reliability teams must
work together to advocate and grow both performance testing and chaos engineering
practices together; there’s little point in doing either without seriously considering
the other. Yes this means I’m saying that load testing alone is not enough; it’s
a good first step to exercising your systems pre-release, but the next step should
use chaos/chaotic fault injection to really test your system’s true failure states.</p>
<p>A few resources if you’d like to hear more about chaos/resiliency and performance testing:</p>
<ul>
<li><a href="https://bit.ly/chaosdeck">bit.ly/chaosdeck : A slide deck I use for tech intros and workshops</a></li>
<li><a href="https://bit.ly/chaosvideos">bit.ly/chaosvideos : A playlist of really good videos on chaos testing</a></li>
<li>NeoLoad + Gremlin: <a href="https://youtu.be/K-MxfuzOdU8">Webinar</a> and <a href="https://d28h099uturm62.cloudfront.net/wp-content/uploads/2021/03/Datasheet-NeoLoad-Gremlin-EN.pdf">White Paper</a></li>
</ul>
<h1 id="2-deep-and-wide-telemetry-for-all-cloud-resources">2. Deep and wide telemetry for all cloud resources</h1>
<p>In the past, we engineers have always needed more information to diagnose issues.
This often took classic forms of monitoring servers directly: <a href="https://youtu.be/YlAqs8cWP2I?t=272">CPU, memory, disk, and network</a> utilization. Though sometimes this kind of direct access is still necessary, it’s problematic from
a security and cloud-based infrastructure perspective to allow this. It’s far more
common now to see application performance monitoring (APM) agents take the responsibility
of being on-server, in-process, actors that safely ship both classic and deeper telemetry about
the state of our servers <strong><em>AND</em></strong> service health back to a central system (e.g. Dynatrace, Prometheus, DataDog, etc.)</p>
<p>But going deep isn’t always enough. In distributed systems and deployments using
shared infrastructures (think blade servers, cloud racks, Kubernetes clusters),
it’s just as critical as ‘going deep’ to be able to ‘go wide’. In other words,
measuring as many of these shared components as feasible, specifically because
the parts affect each other. One pod’s spike is another pod’s noisy neighbor. No
cloud VM’s spec is <strong><em>guaranteed</em></strong>, it’s all best-effort at the end of the day.</p>
<p>There are also ‘wide metrics’, such as number of pod or VM restarts, that only a
management layer captures since the individual component in this case dies unexpectedly.
I also include networking infrastructure such as security firewalls, software-defined
routers, and load balancers into ‘wide metrics’ because they often cause issues with
performance yet serve many services. Not having access to firewall telemetry that
correlates to socket timeouts in a load test can be maddening because you can’t
prove or disprove if that particular device is part of the issue or not.</p>
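<p>As a rough sketch of what that proof could look like once you do have the telemetry: line up per-minute firewall counters with per-minute socket timeouts from the load test and check how strongly they move together. The series below are invented for illustration and the names are hypothetical; a high correlation doesn’t prove the firewall is the cause, but it tells you where to look next.</p>

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally sized series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-minute counts, invented purely for illustration.
firewall_drops = [0, 1, 0, 12, 30, 28, 2, 0]    # from the firewall's telemetry
socket_timeouts = [1, 0, 2, 15, 41, 37, 3, 1]   # from the load test results

r = pearson(firewall_drops, socket_timeouts)
print(f"correlation between firewall drops and socket timeouts: {r:.2f}")
```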
<p>Even (and sometimes especially) with SaaS providers whom you trust to fulfill their SLAs,
their Ops teams have implemented rules for detecting and rate-limiting, which don’t
show up until you start hitting them. From <a href="https://paulsbruce.io/blog/2019/08/on-lack-of-transparency-in-saas-providers/">my work</a> with clients using various Salesforce
SaaS platforms, ‘governors’ (their name for these rules) are usually undocumented publicly,
and sometimes not documented well or even at all internally, and diagnosing why
you’re hitting performance limits on their SaaS platform(s) takes weeks of back-and-forth
with their technical teams.</p>
<p>The point is, the more telemetry we have, both deep into the components and wide
across runtime management layers, the more likely we can diagnose what the biggest
contributing factors are to a performance or reliability issue.</p>
<p>The future of telemetry is looking better and better year after year though. Every
cloud provider typically has (low cardinality, low granularity) metrics about resources
you use there. More organizations are expanding or shifting their APM platform investments
to support cloud-based, not just on-premises, resources. And as Kubernetes adoption
continues to grow exponentially, APMs and CNCF tooling (such as eBPF) are getting
deeper and wider about providing centralized reporting for telemetry.</p>
<p>Some resources to learn more about deep and wide telemetry:</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=49BGvC1coG4">PromCon EU 2019: Containing Your Cardinality</a></li>
<li><a href="https://www.youtube.com/watch?v=Za7vmKr11cQ">PromCon 2018: OpenMetrics</a></li>
<li><a href="https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/">Metrics for Kubernetes System Components</a></li>
<li><a href="https://paulsbruce.io/blog/2019/08/on-lack-of-transparency-in-saas-providers/">Blog: On Lack of Transparency in SaaS Providers</a></li>
<li><a href="https://prometheus.io/docs/prometheus/latest/querying/examples/">Prometheus Query Examples</a></li>
</ul>
<p>This leads us nicely into the next topic…</p>
<h1 id="3-industry-standardization-of-metrics-and-instrumentation">3. Industry standardization of metrics and instrumentation</h1>
<p>Now that there are so many ways to collect telemetry, inject tracing, and correlate logs,
the choice for which technologies and platforms your organization should support
is increasingly complex. Each team has their own preferences for which tools and
technologies to implement, even if there are prescribed ‘best practices’ and already
paid for platforms.</p>
<p>And if you were to propose that one of your important services change how it captures and
ships telemetry (often a massive rewrite for legacy codebases), you’d likely face a huge
battle with product teams who already have an overbooked backlog of both new features and
tech debt items to take up their time. So how can we make this better moving forward?</p>
<p>The short answer is for the <strong><em>industry to standardize</em></strong> at least some of the core
structures, formats, and SDKs for instrumenting and embedding telemetry code into
your apps and services. A <em>huge</em> amount of effort has been going on for years
in the CNCF OpenTelemetry (OTel) project for this exact reason. Utilizing OTel
for all greenfield/new development is a great start, and it’s not that terribly hard
to retrofit for legacy systems instrumentation as well.</p>
<p>And ‘observability’ is not the only buzzword getting standardization love these days.
Remember when APIs were the new cool kid on the block? Well beyond the API definition
wars from 2015, we now see people realizing the importance of standards around
measuring non-functional aspects of APIs as well, with the recent formulation of
<a href="https://thenewstack.io/api-rating-agency-brings-consistency-to-api-measurements/">‘The API Rating Agency’</a>.</p>
<p>There’s even a high-level standard for enterprise DevOps principles and practices
(that I and others worked on) called IEEE 2675-2021. I specifically made sure that
the vision and outcomes statements of the Measurement, Risk Management, QM, QA, verification,
and validation process sections had <strong><em>HEAVY</em></strong> helpings of how to accelerate telemetry
and synthesis back in to decision processes.</p>
<p>But staying with performance and reliability, OTel is definitely worth getting up to speed on.
To learn more about the CNCF OpenTelemetry project, here are a few resources:</p>
<ul>
<li><a href="https://o11yfest.org">o11yfest.org</a> (free signup for two days of presentations)</li>
<li><a href="https://www.youtube.com/watch?v=C1nvcLapcyA&list=PLFXQmSmq7uXQU3IrbypXKetf5Tos_XWOU">Observability Primer, LYIT Guest Lecture, Paul Bruce</a></li>
<li><a href="https://thenewstack.io/api-rating-agency-brings-consistency-to-api-measurements/">The New Stack: API Ratings agency…</a></li>
<li><a href="https://www.youtube.com/watch?v=8Og8rhjqgQo&list=PLFXQmSmq7uXQU3IrbypXKetf5Tos_XWOU&t=63s">Announcing IEEE 2675-2021: DevOps Standard to Build Secure and Reliable Systems</a></li>
</ul>
<h1 id="4-intelligent-readiness-pre-and-post-deployment">4. Intelligent readiness pre and post deployment</h1>
<p>When we talk about being ‘ready’, it matters what we precisely mean. What does it
mean for you to be ready to release to production users? How is this different
than being ready to deploy to a production-like environment for load testing?
Who defines what ‘ready’ means in these and other contexts?</p>
<p>What we’re really talking about in most of these cases is <strong><em>confidence</em></strong> people
have that what they’re about to do will have the anticipated effect on
users or the business under an appropriate level of risk (never zero).</p>
<p>The simplest answer I can provide is that ‘ready’ should be:</p>
<ul>
<li><strong><em>clearly defined</em></strong>, in terms of
<ul>
<li>process to complete correctly</li>
<li>evidence of satisfactory success</li>
<li>risk mitigation plan, should issues arise</li>
</ul>
</li>
<li><strong><em>measurable</em></strong> in concrete terms, such as
<ul>
<li>SLAs, SLOs, SLIs agreed to by all stakeholders</li>
<li>both before and after release to production users</li>
</ul>
</li>
<li><strong><em>communicated</em></strong> to all stakeholders
<ul>
<li>what the rollout plan is</li>
<li>what success signals to look for</li>
<li>what failure signals to look for</li>
<li>how to raise your hand if something doesn’t look right</li>
</ul>
</li>
<li><strong><em>actively managed</em></strong> by a product team lead
<ul>
<li>but also include on-call plan for devs and infra/ops</li>
<li>and include appropriate status updates to stakeholders</li>
<li>and who never assumes that things are working as planned</li>
</ul>
</li>
</ul>
<p>Okay, so for performance engineering, what does ‘intelligent’ readiness look like?
Well there are a lot of the points above that sound like they have to be done
by humans (who are intelligent), but many of them are patternable and repeatable
once some of the basic thinking is laid down, and so can and should be automated.</p>
<p>Performance tests shouldn’t all have to be special-snowflake, big-bang, complex
operations. In fact, most performance testing should be frequent and small, focusing
on API, sanity, and low-volume validations early on in the development lifecycle.</p>
<p>As such, it is easy to ‘clearly define’ what the performance testing process is,
both on your organization’s wiki or knowledge management platform, and most certainly
in the automated pipeline scripts that inscribe the repeatable process into your
continuous delivery motions.</p>
<p>All testing, performance or otherwise, should produce clear and trustworthy <strong><em>exit signals</em></strong>,
both in the case of successful completion and unsuccessful. Usually my clients
use load test SLAs/thresholds around error rate and response times under sustained
load to determine if the test should keep going or ‘fast fail’. If the load test
infrastructure or project sources have issues, this would be what I call a ‘catastrophically unsuccessful’ test, as opposed to a test that completed but didn’t meet the final
performance acceptance criteria that the team agreed to. Except for short, passing sanity
tests, all test runs and results should be retained for an appropriate amount of time.</p>
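<p>As a rough illustration of those exit signals, here is a minimal Python sketch. It is not taken from any particular load testing tool; the names <code>Thresholds</code>, <code>should_fast_fail</code>, and <code>exit_signal</code>, and the choice of error rate plus 95th percentile response time, are assumptions made for the example.</p>

```python
from dataclasses import dataclass
from enum import Enum


class Exit(Enum):
    PASSED = 0          # completed and met the acceptance criteria
    FAILED = 1          # ran, but missed the agreed performance criteria
    CATASTROPHIC = 2    # load infrastructure or project sources broke; results untrustworthy


@dataclass
class Thresholds:
    # Hypothetical SLA values; real ones come from what the team agreed to.
    max_error_rate: float   # fraction of failed requests, e.g. 0.02 for 2%
    max_p95_ms: float       # 95th percentile response time under sustained load


def should_fast_fail(error_rate: float, p95_ms: float, limits: Thresholds) -> bool:
    """Checked periodically during a run to decide whether to stop early."""
    return error_rate > limits.max_error_rate or p95_ms > limits.max_p95_ms


def exit_signal(infra_ok: bool, error_rate: float, p95_ms: float, limits: Thresholds) -> Exit:
    """Map the end state of a run to exactly one exit signal."""
    if not infra_ok:
        return Exit.CATASTROPHIC
    if should_fast_fail(error_rate, p95_ms, limits):
        return Exit.FAILED
    return Exit.PASSED
```

<p>A pipeline step could end with something like <code>sys.exit(exit_signal(...).value)</code>, so that a catastrophic run and a run that merely missed its criteria show up as different exit codes.</p>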
<p>Getting ‘measurable’ readiness means that you can demonstrate evidence not only
before deployment and release to production, but afterwards as well. Some people and
systems lend themselves to testing in production more than others, and most clients
I see ‘load testing in prod’ are usually doing it against production resources that
have not been rotated in to use by production users. This kind of capability takes
rollout and deployment maturity many organizations still lack, so many just opt
for pre-release load testing in either production-like staging environments or
something close to that. The point is that the same testing you do in lower environments
should be easy to switch over to other higher environments that need a pressure
check before <em>releasing</em> to production users.</p>
<p><strong><em>Communicating</em></strong> these efforts is important because when a load test runs somewhere,
someone will definitely feel it, whether that’s other test engineers using that environment
or the infra/ops team getting alerts. This is usually why you don’t trigger big
performance tests on every single code check-in: beyond the time budget, the test consumes
resources on the system under test, and if that environment is shared or permanent, it impacts others.</p>
<p><strong><em>Active management</em></strong> of performance engineering tasks such as big load tests
isn’t hard; this is usually the product team asking for someone to run the test.
What’s harder is to build continuous performance testing practices that communicate
success and failures back to the product teams A) at the right times, and B) with
the right level of actionable detail.</p>
<p>There is rarely a ‘silver bullet’ in performance
engineering, no single ‘root cause’…until you step through things to find a specific
issue causing lots of symptoms. Sure, 90% of the time, it’s the database; but then
there are times it’s not. What people often mistake for ‘single root cause’ situations
are actually the biggest contributing factor, one that’s loud enough and that you happened upon
first. There is always more than one contributing factor, and once we find something
we’re often too busy to look around for the others.</p>
<p>The point here is that there’s no one optimal performance report format or easy way
to identify ‘root cause’ in all cases, not without expertise and experience guiding it.
There are however common and practical points of information in load tests, such as:</p>
<ul>
<li>metrics perceived by the load platform
<ul>
<li>RED signals: rates, errors, durations</li>
</ul>
</li>
<li>metrics observed via monitoring
<ul>
<li>USE signals: server/service side Utilization, Saturation, and Errors</li>
<li>Wide metrics: monitoring storage, management layers, and network devices</li>
</ul>
</li>
<li>notable moments and in-cycle events:
<ul>
<li>chaos testing activities</li>
<li>re-balancing or optimization rules being triggered</li>
<li>real-time manual configuration changes</li>
</ul>
</li>
</ul>
<p>By visualizing these together, calling out ‘interesting moments’, and constantly
taking action to automate known faults and anti-patterns into actionable outcomes,
product teams can effectively move performance testing into continuous delivery.</p>
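<p>To make ‘visualizing these together’ a bit more concrete, here is a minimal Python sketch that rolls raw request samples up into RED signals per time window and attaches notable events to the window they occurred in. The <code>Sample</code> shape, the 10-second window, and the function names are assumptions for illustration, not the output format of any specific load platform.</p>

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    t: float            # seconds since test start
    duration_ms: float  # response time as seen by the load platform
    ok: bool            # False when the request counted as an error


def red_by_window(samples, window_s=10.0):
    """Group samples into fixed windows and compute Rate, Errors, Duration for each."""
    buckets = {}
    for s in samples:
        buckets.setdefault(int(s.t // window_s), []).append(s)
    rows = []
    for idx in sorted(buckets):
        group = buckets[idx]
        rows.append({
            "start_s": idx * window_s,
            "rate_per_s": len(group) / window_s,
            "error_rate": sum(not s.ok for s in group) / len(group),
            "median_ms": median(s.duration_ms for s in group),
        })
    return rows


def annotate(rows, events, window_s=10.0):
    """Attach (time, name) events such as chaos actions or config changes to their window."""
    for row in rows:
        row["events"] = [name for t, name in events
                         if row["start_s"] <= t < row["start_s"] + window_s]
    return rows
```

<p>Server-side USE metrics from monitoring could be joined on the same window start times, which is what lets an ‘interesting moment’ line up with its likely cause.</p>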
<h1 id="5-a-common-approach-across-custom-and-packaged-apps">5. A common approach across custom and packaged apps</h1>
<p>I often hear that “the API teams are doing their own thing” or “testing the
apps/services we build is very different from ‘commercial off the shelf’ (COTS) products.”
I know what my clients are talking about; I get it. Testing Workday and JIRA
is very different from testing your organization’s e-commerce site or claims processing app.</p>
<p>Mostly it boils down to:</p>
<ul>
<li>varying levels of ‘testability’ in the Systems Under Test (SUT)</li>
<li>low expertise or experience with the intricacies of the SUT</li>
<li>lack of visibility into the back-ends of the COTS apps (see prior comments about Salesforce)</li>
<li>low/no direct interaction or communication with SUT dev team(s)</li>
</ul>
<p>Especially with packaged apps, it’s usually not the app itself that’s causing
performance issues, but rather the customizations and the operational environment
configuration. These open a gap between what a COTS vendor does
to verify the performance and reliability of their core deliverables and how the product
actually works for your organization as deployed. If you’ve ever gone through an
SAP enterprise implementation, you might be familiar with their ‘configuration testing’
which includes load testing once customizations and environment are loaded up.</p>
<p>With custom apps, you usually have more of an ability to reach out to teams internally,
even if they’re contracting, to ask questions. There are often architectural diagrams
and systems (of record, such as APM platforms) already in use that you simply have
to ask for access to and explain why it’s important to the work you’ve been asked to do.
If you find untestable situations, call them out, make sure you add a testing user
story to the product team’s backlog and get the PO and Product Management to assess
the risk it represents. If they can’t do that, call your security/compliance office.</p>
<p>In both custom app and packaged app cases, you can always make progress on at least
two of the four bullet points above, immediately. It might not be easy…certainly getting
interactions with a JIRA dev team at Atlassian is far harder than say sending
a virtual 30min meeting invite to an internal Product Owner (PO) and their lead
developer. Sometimes building the required level of expertise with a complex web app takes
longer than the time provided, and that’s not an easy discussion to have with some
engineering managers. And of course it’s easy to blame tools and other people.</p>
<p>A bad engineer blames their tools. It is not the tools we use that make us good,
but rather how we employ them.</p>
<p>How we employ testing tools should be guided by what approach to testing we have
decided to take. If something proves to be inordinately hard to test, that’s
an indication to go investigate how others have solved the problem, or ask the
persons responsible for building it how they would approach validating that it
works well.</p>
<p>In a nutshell, performance engineers have needed, and now need even more, to consolidate
their expertise, their practices, and their tooling to scale it to others in their
organization (don’t forget that continuous pipelines are like another person too).</p>
<h1 id="summing-it-all-up">Summing It All Up</h1>
<p>Among my customers and clients, I see these trends growing, some better and faster than others.
It is encouraging to see, and it also compels me to encourage others to grow
and improve their performance and reliability practices.</p>{"login"=>"pbruce", "display_name"=>"Paul Bruce", "first_name"=>"Paul", "last_name"=>"Bruce"}As part of my usual duties, Tricentis asked me to name a few key themes and trends that I’ve seen during customer engineering work with forward-thinking customers, ones that will likely be a big part of performance engineering in 2022.Personal Log 2020-09-022020-09-03T02:54:44+00:002020-09-03T02:54:44+00:00https://paulsbruce.io/blog/2020/09/personal-log-2020-09-02<p><!-- wp:paragraph --></p>
<p>Started later than I'd like, trying to spend more good morning time with the kiddos. Nothing huge on the schedule, some customer and internal meetings, some time for deep work.</p>
<h3>Morning Popcorn</h3>
<p>Started to spin up revisionary work on an integration between qTest and NeoLoad Web. Good lord I need to document my pre-documentation work better for myself. Critical commands and obscure UI workflows are killer in products you don't know or care to know.</p>
<p>Had to remind someone how to get to competitive intel docs. LMGTFY also applies to internal docs for organizations that use G-Suite.</p>
<p>Redirected someone from using the old product requests system (Trello) to new one. Only saw this because I 'watch' everything in Trello and it comes via email summaries. Trello and email are gross, but hey, it worked today.</p>
<p>Responded to a net-new person interested in collaborating on the book. Good sign that this channel isn't just a point-in-time blast, but an ongoing consideration from the channel admin now.</p>
<p>Combined two email threads about the same topic from two different people groups in our org. Always nice when you can help people realize they're asking for or working on the same things as each other.</p>
<h3>Helping to Prioritize Tactical Feature Requests</h3>
<p>Hopped on with head of PM to discuss how to better prioritize tactical requests in the new product feedback and request system. Using a methodology similar to <a rel="noreferrer noopener" href="https://medium.com/swlh/rice-scoring-model-for-prioritisation-88d879bfbac0" data-type="URL" data-id="https://medium.com/swlh/rice-scoring-model-for-prioritisation-88d879bfbac0" target="_blank">RICE</a> (Reach-Impact-Confidence-Effort), my main suggestion was around how to represent urgency from the pre-sales or CSM side related to the topic, which isn't really covered explicitly enough to successfully operationalize a prioritization model that works for the whole business. Well received, and there are other practical things such as customer, Salesforce links, revenue size and risk info that we would also have to provide as metadata regardless of how we sum the urgency factor up into the high level view.</p>
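<p>For the arithmetic-minded, a small Python sketch of the idea. The base formula is standard RICE; the <code>urgency</code> multiplier is purely a hypothetical way to fold in the pre-sales/CSM signal, not what we actually settled on.</p>

```python
def rice_score(reach, impact, confidence, effort):
    """Standard RICE: (Reach x Impact x Confidence) / Effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


def field_adjusted_score(reach, impact, confidence, effort, urgency=1.0):
    """Hypothetical variant: scale RICE by an urgency multiplier from the field."""
    return rice_score(reach, impact, confidence, effort) * urgency
```

<p>With an urgency of 1.0 the score is plain RICE, so existing rankings stay put unless someone in the field explicitly raises the flag.</p>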
<h3>Lunchtime Community Stuff</h3>
<p>In BDO Slack, got DMed politely about helping a local startup get some product feedback from our community. Points for asking what the best way was first, points for having the CEO (techy) do it, saw some community members positively engaging already. Also suggested that if they really want community love and help, sponsoring the upcoming DevOpsDays Boston event would be good, and sent along the prospectus. That's the nice thing about being an organizer in multiple groups, constructive forces.</p>
<p>While I was on the topic and pivoted from lunch, a few other event organizer-y emails, sponsor asks, and an internal huddle about something that can't be ignored anymore.</p>
<h3>Deep Work, Fast CLI Fix, Short-Circuiting Flys</h3>
<p>Back to work-work: deep dive into the qTest integration. Looks like I have to completely create a <a rel="noreferrer noopener" href="https://github.com/Neotys-Labs/Tricentis-qTest/blob/master/docker/universal-agent/Dockerfile" data-type="URL" data-id="https://github.com/Neotys-Labs/Tricentis-qTest/blob/master/docker/universal-agent/Dockerfile" target="_blank">Dockerfile from scratch</a> (hello versioning hell) that includes their agent, not just the NeoLoad CLI and Python dependencies. Since <a rel="noreferrer noopener" href="https://documentation.tricentis.com/qtest/9910/en/content/qtest_launch/qtest_automation_host_2_install_upgrade_guides/automation_host_docker_setup_instructions.htm" data-type="URL" data-id="https://documentation.tricentis.com/qtest/9910/en/content/qtest_launch/qtest_automation_host_2_install_upgrade_guides/automation_host_docker_setup_instructions.htm" target="_blank">their Docker example</a> (stale, btw) was Ubuntu 16.04 based, struggled with getting Python 3.8/3.6 as default and proper pip requirements for almost an hour. Gave up and based off of Ubuntu 18.04 to simplify Python install process, and everything works much easier now. Also, their agentctl doesn't seem to have subcommands properly documented (wait, <a rel="noreferrer noopener" href="https://support.tricentis.com/community/manuals_detail.do?lang=en&version=9.9.1&module=Tricentis%20qTest%20On-Premise&url=qtest_launch/qtest_automation_host_2_install_upgrade_guides/qtest_automation_host_2.x_installation_guide_on_windows.htm" data-type="URL" data-id="https://support.tricentis.com/community/manuals_detail.do?lang=en&version=9.9.1&module=Tricentis%20qTest%20On-Premise&url=qtest_launch/qtest_automation_host_2_install_upgrade_guides/qtest_automation_host_2.x_installation_guide_on_windows.htm" target="_blank">here</a> it is, halfway down a sea of blah blahs), so there's no way to know if there's an automated approach to configuring an agent on a host (will try again tomorrow). 
So far, what I've learned is:</p>
<ul>
<li>Documentation is shit unless it's optimized for Google via keywords</li>
<li>No matter how much vendor documentation sites push their search bar on you, it's never as good as Google, and usually just downright awful</li>
<li>Cyclical documentation articles that bounce you back to high-level categories are bullshit</li>
<li>Documentation that is long enough to have sections and doesn't provide permalinks to those sections is candy ass</li>
<li>Documentation should be written for all supported platforms, not just old versions of Winblows</li>
</ul>
<p>Saw a <a rel="noreferrer noopener" href="http://ideas.neotys.com/feedbacks/181051-publish-reporting-url-nlweb/comments/402135" data-type="URL" data-id="http://ideas.neotys.com/feedbacks/181051-publish-reporting-url-nlweb/comments/402135" target="_blank">product idea about the CLI</a> come in from one of my best customers, made sense, so I implemented it, created a PR and pushed a pre-release version to address the issue. Customer and support comms, then feedback that it's better. Turnaround time: less than 30mins.</p>
<p>Figured out why I was getting a roles and permissions error deep in the qTest agent execution logs. Turns out, my trial to their platform had expired, so, nothing that seemed at all related between the error messages and the actual problem. Nice. Emailed our contacts to ask for proper technical partner license acquisition, hoping to hear from them soon. Reported the blocker to stakeholders.</p>
<p>Short-circuited for the third time an ask from the CEO of a fledgling startup who's been stalking Neotys and me about combining their tech with ours. They have one, maybe two customers, and the technology paradigms between them and us by definition are just so far from a match. They're just looking for legitimacy and customer lists. A big waste of our time, and I won't have it, I certainly won't waste my bosses' time on it, though they have discussed and been dismissed in the past. Some people just don't understand when there's not something there, neither from a tech/architecture perspective nor a business/user case. We already have our priorities, this is a fly buzzing around the ointment at this point.</p>
<h3>Dinner and Event Organizer Meetings</h3>
<p>Back home for dinner and family time. Partner is listening to upcoming virtual classroom stuff for the new school year. Glad we and 20% of our community families, that's one whole elementary school out of the five in our city, emphatically said that they wouldn't be sending their kids back. At the beginning of the pandemic, we flattened the curve about demand on hospitals, why the hell would we all be expected to forget that the same applies with our kids and throwing them back into the schools all at the same time, even if that's for half-days (which btw doesn't help working parents more than stay-at-home by much).</p>
<p>Back to DevOps community organizing, event sponsors working group 30mins weekly until the event is done. There are very few 'other fruits to squeeze' for sponsorship dollars this year. I guess that makes it all the more meaningful, the companies and organizations that have committed already. The fund for next year will be okay, we won't be adding to it with this year's revenue for sure, but at least we reached a milestone precedent by committing to donate all the net ticket revenue that comes in to good causes! Boy, I worked for months getting alignment between groups on how to make that happen, and along with another organizer to put the right pressure words on the right people at the right time, we now have a list of causes and agreement across groups that it will happen. We also have a process and clear criteria that can scale and be reused again next year. We will also have data about how many people 'feel generous' when offered an option to give more than the recommended ticket price, and data on how many people pick a free ticket when faced with the option that 100% of their contribution will go to something altruistic. We will need to write a retro, draft it up beforehand to publish the day after, about how much money and who it went to; this is not something that can wait for weeks or months like the AV post-production in prior years. People will want to know.</p>
More Like Water, Less Like Waterfall2020-09-03T01:23:08+00:002020-09-03T01:23:08+00:00https://paulsbruce.io/blog/2020/09/more-like-water-less-like-waterfall
<p>It's been too long a time since I published something here. The more time I commit to professional and volunteer and personal projects, the less time I feel I have to write. What a bullshit excuse too, because I book time on my calendar for other operational things, why not this? All it takes is diligence and sticking to a scheduled time.</p>
<p>In tech, people use the word waterfall like a curse word due to historical reasons, a derogatory label...but an actual waterfall is a continuous stream that doesn't repeat itself. I want to be more like water, as Bruce Lee said, but for more than the reasons he had.</p>
<p>As Master Lee indicated, we should be ready to change, like water in a glass conforms to the situation around it. In martial arts, being rigid and stiff leads to being slow and overly anticipatory. Pangai-noon, translated as "half hard, half soft", also indicates that we need to keep strength and application in balance with speed and adaptability. Mindfulness also plays a huge part in this, living in "the now", being present in each moment, like water that completely fills every dip and cleft from the riverbed to the edge, but also constantly seeking balance at the surface.</p>
<p>Anyway, I hypothesize that it will increase my mindfulness to exhale my field thoughts and experiences (minus personally identifiable stuff of course) to this blog on a consistent basis. I will dedicatedly try this for about 3 months, writing at least twice a week, and not give up because I missed one or two of these personal appointments. I will simply relay what learnings I can from the main areas of my daily work, since it is so broad. If these pseudo-minutes strike on a topic meaty enough to write as its own post, fine. If not, fine.</p>
<p>It will, at the very least, force me to share something on a frequent basis. Less like 'waterfall' deployment as they say in software, more [continuous] like water.</p>
In Search of Behavioral Indications2020-05-24T15:47:35+00:002020-05-24T15:47:35+00:00https://paulsbruce.io/blog/2020/05/in-search-of-behavioral-indications
<p>Recent changes in an organizing group I'm involved in have given rise to questions in my mind about how well we're doing, not just in terms of outputs, but in terms of cultural characteristics.</p>
<p>In the past, I've used the model that <a rel="noreferrer noopener" href="https://cloud.google.com/solutions/devops/devops-culture-westrum-organizational-culture" target="_blank">Westrum created for assessing organizational culture</a> to help put puzzle pieces on the table, not only to "see a bigger picture" but also to see what gaps exist (i.e. "missing pieces"). Even when you do this at some point, time goes on, and the picture changes, new boundaries are defined, and new gaps form, so it's important to do this on a regular basis.</p>
<figure class="wp-block-image size-large"><a href="https://cloud.google.com/solutions/devops/devops-culture-westrum-organizational-culture" target="_blank" rel="noopener noreferrer"><img src="/assets/images/2020/05/image-1024x394.png" alt="" class="wp-image-1334" /></a><br />
<figcaption>Example of Westrum cultural characteristics</figcaption>
</figure>
<p>Personally, I have experienced more of the pathological characteristics, such as "messengers shot", "bridging discouraged", "...scapegoating", and "novelty crushed". Others probably have experienced a swath of these and others.</p>
<p>However, there is also a solid basis of improved and positive characteristics in this group, namely "modest cooperation" and "failure leads to inquiry". We regularly encourage verbally and async in chat, and have some of the groundwork laid for peer ownership of responsibilities and risks.</p>
<p>What I like about the Westrum grid above is that each cell value is a pretty good label, categorically speaking, but it can be difficult to understand what to do or avoid concretely. Everything is new to someone at some point, and though I have some experiences with how to move and improve some of these aspects (e.g. from bridging tolerated to encouraged, or the levels of novelty management), there is power in group-think.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>For those and for other reasons, I'm in search of a list of "attractor" and "detractor" behaviors that the group can use to A) sample the aggregate feelings of the group, and B) use as concrete "hotspots" to either improve upon (if deemed negative) or maintain/safeguard (if deemed positive). So far I have:</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>Detractors:</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:list --></p>
<ul>
<li>Too much outspokenness, not enough chance to speak</li>
<li>No voice, no real representation</li>
<li>Exclusive grip on wheelhouses</li>
<li>No healthy peer ownership</li>
<li>Silos, low collaboration, low emotional bucket fill</li>
<li>Shutdowns, cutoffs</li>
<li>Rephrasings</li>
<li>"Splaining" and assumptive attitudes</li>
<li>Lack of recognition, lack of constructive feedback</li>
<li>Public admonishment, not encouragements or recognition</li>
<li>Assignments and effort that is toil or low impact</li>
<li>No/low mindful group facilitation</li>
<li>Unsafe/alienating behavior</li>
<li>Ego/hubristic reasoning</li>
</ul>
<p><!-- /wp:list --></p>
<p><!-- wp:paragraph --></p>
<p>Attractors:</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:list --></p>
<ul>
<li>Encouraging good ideas</li>
<li>Hearing from everyone, not "it's obvious"</li>
<li>Always assign peer (and/or assists)</li>
<li>Prioritize "preventative maintenance" characteristics over double-time out</li>
<li>Purposeful attempt to understand; ask open questions</li>
<li>Rotate wheelhouses; opportunity to change</li>
<li>Pre-agreement on time spend / limits / timeboxes</li>
<li>Purposeful recognition / appreciations / thanks</li>
<li>Proactive communication of bandwidth; ownership of comms</li>
<li>When work is done/blocked, swarm help w/ permission</li>
<li>Obtain permission before asking for labor, time/physical or emotional</li>
</ul>
<p><!-- /wp:list --></p>
<p><!-- wp:paragraph --></p>
<p>I'm a big fan of introspection, starting with the person in the mirror, so I'm approaching this as a group ask to help me personally, as a friend, understand the group and myself more. However, I try not to ask people to do free work, certainly not work that doesn't benefit them in some way. As an exercise, it has its own virtues not limited to me anyway, but I also think that the outcomes can lead us to a place of constructive discussion in the upcoming organizers' 'summit' (a 4hr quarterly Zoom) where we can all get a better picture and agree on how to improve the culture of the group.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>For a holiday weekend, I've already spent about 4hrs (not to mention tossing and turning) on this, so in many ways this has been a "long weekend" for me.</p>
<p><!-- /wp:paragraph --></p>
Paul Bruce
Thoughts on DevOps vs. Enterprise Culture Clash
2019-12-25T18:58:22+00:00
https://paulsbruce.io/blog/2019/12/thoughts-on-devops-vs-enterprise-culture-clash
<p><!-- wp:paragraph --></p>
<p>Probably not unlike you, every day I work with folks caught in a clash between organizational processes and technology imperatives. "We have to get this new software up and running, but the #DevOps group won't give me the time of day."</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>Large organizations don't have the luxury of 'move fast, break stuff'; if they did, their infrastructure, security, financial, and software release processes would be a chaotic mess...far more than usual. But how does one 'move fast' without breaking enterprise processes, particularly ones that they don't understand?</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:heading {"level":3} --></p>
<h3>Enterprise, Know Thyself</h3>
<p><!-- /wp:heading --></p>
<p><!-- wp:paragraph --></p>
<p>The answer is simple: encourage engineers to always be curious to know more about their environment, constraints, and organizational culture. The more you know, the more nimble you'll be when planning and responding to unanticipated situations. </p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>Today I had a call with a health care company, working to get Docker installed on a RHEL server provisioned by an infra team. What the operator didn't know was that the security team, which uses Centrify to manage permissions on that box, requires a ticket to grant 'dzdo su' access, and only for a very narrow window of time. Additionally, the usual 'person to connect with' was off on holiday break, so we were at the mercy of a semi-automated process for handling these tickets, and because they had already put in a similar request in the past 7 days, all new tickets would have to go through a manual verification process. This frustrated our friend.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>The frustration manifested in the form of the following statement:</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:quote --></p>
<blockquote class="wp-block-quote"><p>Why can't they just let me have admin access to this non-production machine for more like 72 hours? Why only 2 measly hours at a time?</p>
<p><cite>- Engineer at an F100 health care organization</cite></p></blockquote>
<p><!-- /wp:quote --></p>
<p><!-- wp:paragraph --></p>
<p>My empathy and encouragement to them was to "expect delays at first, don't expect everyone to know exactly how processes work until they've gone through them a few times, but don't accept things like this as discouragements to your primary objective." </p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>If everything were easy and no problems existed, kind words might be useless. When things are not working that way, knowing how to fix or overcome them goes a long way, just like a kind word at the right time. We crafted an email to the security team together explaining exactly what was needed AND WHY, as well as an indication of the authority and best/worst case timelines that we were operating under, and a sincere thank you.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:heading {"level":3} --></p>
<h3>Enterprise "DevOps" Patterns that Feel Like Anti-Patterns</h3>
<p><!-- /wp:heading --></p>
<p><!-- wp:paragraph --></p>
<p>In my current work, I experience a lot of different enterprise dynamics at many organizations around the world. The same themes, of course, come up often. A few dynamics I've seen in play when enterprises try to put new technology work in a pretty box (i.e. consolidate "DevOps engineers" into a centralized team) are:</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:list {"ordered":true} --></p>
<ol>
<li>Enterprise DevOps/CloudOps/infra teams adopt the pattern of <strong><em>"planned work"</em></strong>, just like development teams, using sprints and work tracking to provide manageable throughput and consistency of support to other organizational 'consumers'. This inherits other patterns like prioritization of work items, delivery dates, estimable progress, etc.</li>
<li>Low/no context requests into these teams get rejected because it's slow/impossible to prioritize and plan based on ambiguous work requirements</li>
<li>The amount of control and responsibility these teams have over security and infrastructure systems in the organization is often considered "high risk", so they're subject to additional scrutiny come audit time</li>
</ol>
<p><!-- /wp:list --></p>
<p><!-- wp:paragraph --></p>
<p>That last point about auditing, particularly the psychological impacts on 'move fast' engineers, cannot be overstated. When someone asks you to break protocol 'just this one time', it's you that's on the hook for explaining why you took action to do so, rarely the product owner or director who pressured you to do it.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>Technical auditors that are worth anything more than spit will focus on processes instead of narrow activities, because combing through individual log entries does not scale...but verifying that critical risk-mitigating processes are in place and checking for examples of when the process is AND isn't being followed...that's far more doable in the few precious weeks that auditing firms are contracted to complete their work.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:heading {"level":3} --></p>
<h3>The More You Know, The Faster You Can Go (Safely)</h3>
<p><!-- /wp:heading --></p>
<p><!-- wp:paragraph --></p>
<p>An example of how understanding your enterprise organization's culture improves the speed of your work comes from an email today between two colleagues at F100+:</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:quote --></p>
<blockquote class="wp-block-quote"><p>Can you confirm tentative dates when you are planning to conduct this test? Also will it take time to open firewall, post freeze incident tickets can be fast tracked?</p>
<p><cite>- Performance Engineering at Major Retailer</cite></p></blockquote>
<p><!-- /wp:quote --></p>
<p><!-- wp:paragraph --></p>
<p>This is a simple example of proper planning. Notice that the first ask is for concrete dates, an implication that others also need to have their shit together (in this particular case because they're conducting a 100k synthetic user test against some system, not a trivial thing in the slightest). The knowledge that firewall rules have to be requested ahead of time, and that incident response should be notified that reported issues may be due to the simulation rather than real production traffic, comes from having experienced these things before. Understanding takes time.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>Another software engineer friend of mine in the open-source space and I were discussing the Centrify thing today, and he asked: "why can't they just set up and configure this server with temporary admin rights off to the side, then route appropriate ports and stuff to it once it's working?" Many practitioners in the bowels of enterprises will recognize a few wild assumptions there. In no way is this a slight against my friend; rather, it's an example of how different the thinking is between two very different engineering cultures. More specifically, those who are used to being constrained and those who aren't often have a harder time collaborating with each other because their reasoning is predicated on very different past experiences. I see this one a lot.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:heading {"level":3} --></p>
<h3 id="devops-is-an-approach">DevOps Is an Approach to Engineering Culture, not a Team</h3>
<p><!-- /wp:heading --></p>
<p><!-- wp:paragraph --></p>
<p>This is my perspective after only 5yrs of working out what "DevOps" means. I encourage everyone to find their own by having their own journey of curiosity, keyboard work, and many conversations.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>There is not, and never should be, a DevOps 'manifesto'. As Andrew Clay Shafer (<a href="https://twitter.com/littleidea" target="_blank" rel="noreferrer noopener" aria-label="@littleidea (opens in a new tab)">@littleidea</a>) once said, DevOps is about <em>'optimizing for people'</em>, not process or policy or one type of team only. Instead of manifesto bullet points, there are some clear and common principles that have stood the test of time since 2008:</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:list --></p>
<ul>
<li>A flow of work, as one way as possible</li>
<li>Observability and Transparency</li>
<li>Effective communication and collaboration</li>
<li>A high degree of automation</li>
<li>Feedback and experimentation for learning and mastery</li>
</ul>
<p><!-- /wp:list --></p>
<p><!-- wp:paragraph --></p>
<p>Some of the principles above come from <a rel="noreferrer noopener" aria-label=" (opens in a new tab)" href="https://itrevolution.com/the-three-ways-principles-underpinning-devops/" target="_blank">early work</a> like The Phoenix Project, The Goal, and Continuous Delivery; <a rel="noreferrer noopener" aria-label="others (opens in a new tab)" href="http://agilealmdevops.com/2016/10/26/devops-principles-and-practices/" target="_blank">others</a> come from more formalized research such as ISO and IEEE working groups on DevOps that I've been a part of over the past 3 years.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>I don't tend to bring the "DevOps is not a team" bit up when talking with F100s primarily because:</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:list --></p>
<ul>
<li>it's not terribly relevant to our immediate work and deliverables</li>
<li>enterprises that think in terms of cost centers always make up departments, because "we have to know whose budget to pay them from and who manages them"</li>
<li>now that DevOps is in vogue with various IT leaders, and just like the manifestation of Agile everywhere, DevOps is perceived as 'yet another demand to do things differently from management', so after being restructured, engineers often have enough open wounds that I don't need to throw salt on</li>
<li>if this is how people grok DevOps in their organization, there's little I as an 'outside' actor can do to change it...except maybe a little side-conversation over beers here and there, which I try to do as much as appropriately possible with receptive folks</li>
</ul>
<p><!-- /wp:list --></p>
<p><!-- wp:paragraph --></p>
<p>However, as an approach to engineering culture, DevOps expects people to work together, to "row in the same direction", and to learn at every opportunity. As I stated at the beginning of this post, learning more about the people and processes around you, the constraints and interactions behind the behaviors we see, being curious, and having empathy...these things all still work in an enterprise context.</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:paragraph --></p>
<p>As the Buddha taught, the Middle Path gives vision, gives knowledge, and leads to calm, to insight, to enlightenment. There is always a 'middle way', and IMO it is often the easiest path between extremes to get to the place where you want to be.</p>
<p><!-- /wp:paragraph --></p>
Paul Bruce