Performance Engineer vs. Tester

A performance engineer’s job is to get things to work really, really well.

Some might say that the difference between being a performance tester and a performance engineer boils down to scope. The scope of a tester is testing, to construct, execute and verify test results. An engineer seeks to understand, validate, and improve the operational context of a system.

Sure, let’s go with that for now, but really the difference is an appetite for curiosity. Some people treat monoliths as something to fear or control. Others explore them, learn how to move beyond them, and how to bring others along in the journey.

Testing Is Just a Necessary Tactic of an Engineer

Imagine being an advisor to a professional musician, their performance engineer. What would that involve? You wouldn’t just administer tests, you would carefully coach, craft instruction, listen and observe, seek counsel from other musicians and advisors, ultimately to provide the best possible path forward to your client. You would need to know their domain, their processes, their talents and weaknesses, their struggle.

With software teams and complex distributed systems, a lot can go wrong very quickly. Everyone tends to assume their best intentions manifest into their code, that what they build is today’s best. Then time goes by and everything more than 6 months old is already brownfield. What if the design of a thing is already so riddled with false assumptions and unknowns that everything is brownfield before it even begins.

Pretend with me for a moment, that if you were to embody the software you write, become your code, and look at your operational lifecycle as if it was your binary career, your future would be a bleak landscape of retirement options. Your code has a half-life.

Everything Is Flawed from the Moment of Inception

Most software is like this…not complete shit but more like well-intentioned gift baskets full of fruits, candies, pretty things, easter eggs, and bunny droppings. Spoils the whole fucking lot when you find them in there. A session management microservice that only starts to lose sessions once a few hundred people are active. An obese 3mb CSS file accidentally included in the final deployment. A reindexing process that tanks your order fulfillment process to 45 seconds, giving customers just enough time to rethink.

Performance engineer doesn’t simply polish turds. We help people not to build broken systems to begin with. In planning meetings, we coach people to ask critical performance questions by asking those questions in a way that appeals to their ego and curiosity at a time that’s cost effective to do so. We write in BIG BOLD RED SHARPIE in a corner of the sprint board what the percentage slow-down to the login process the nightly build as now caused. We develop an easy way to assess the performance of changes and new code, so that task templates in JIRA can include the “performance checkbox” in a meaningful way with simple steps on a wiki page.

Engineers Ask Questions Because Curiosity Is Their Skill

We ask how a young SRE’s good intentions of wrapping u statistical R models from a data sciences product team in Docker containers to speed deployment to production will affect resources, how they intend on measuring the change impact so that the CFO isn’t going to be knocking down their door the next day.

We ask why the architects didn’t impose requirements on their GraphQL queries to deliver only the fields necessary within JSON responses to mobile app clients, so that developers aren’t even allowed to reinvent the ‘SELECT * FROM’ mistake so rampant in legacy relational and OLAP systems.

We ask what the appropriate limits should be to auto-scaling and load balancing strategies and when we’d like to be alerted that our instance limits and contractual bandwidth limits are approaching cutoff levels. We provide cross-domain expertise from Ops, Dev, and Test to continuously integrate the evidence of false assumptions back into the earliest cycle possible. There should be processes in place to expose and capture things which can’t always be known at the time of planning.

Testers ask questions (or should) before they start testing. Entry/exit criteria, requirements gathering, test data, branch coverage expectations, results format, sure. Testing is important but is only a tactic.

Engineers Improve Process, Systems, and Teams

In contrast, engineering has the curiosity and the expertise to get ahead of testing so that when it comes time, the only surprises are the ones that are actually surprising, those problems that no one could have anticipated, and to advise on how to solve them based on evidence and team feedbacks collected throughout planning, implementation, and operation cycles.

An engineer’s greatest hope is to make things work really, really well. That hope extends beyond the software, the hardware, and the environment. It includes the teams, the processes, the business risks, and the end-user expectations.

Holiday IoT and the Performance Imperative

A few words to manufacturers and vendors of tech toys: to really be ready for the holiday, if your product requires software updates in order to work or is in any way internet connected, make sure your site stays up. Otherwise, you just shipped coal.

  • Provide more than one distribution point for downloadable updates/binaries
  • Rely on CDNs for static assets (like installers and documentation)
  • If your update process must rely on live services, make sure they’re scalable
    • Load test subcomponents/microservices AND the end-to-end process
  • Be prepared for damage control by:
    • Monitoring site uptime and availability to know when things are broken
    • Proactively establish a communication channel with customers during issues
    • Properly staff IT and support for during AND post-season issues
  • Make sure the cost of downtime is factored into your next sales cycle

Santa Brought Us a Brick?

Let me start by admitting how 1st-world this example is. Robots as play-things are still not exactly ‘so easy, a child could do it’, and Roomba’s have been around for almost two decades, but we still have yet to see a really down-to-earth home robotics project that really works for children under 10. I don’t just mean toys that are not pre-assembled, even the right-out-of-the-box kind often require firmware updates or online services to really work as expected.

Case in point, the Meccano MAX. Though it only took about 3hrs total to put together, this morning when we finally turned it on and went to connect for a firmware update, the vendor’s website was down…hard. The instructions said, before anything else, update the ‘MeccaMind’ and voice commands weren’t working without, so, blocker.

As an ops nerd, I slapped an uptime monitor on it to know when (if ever) it was back up:

That didn’t stop the whole multi-day experience from deflating to a dud. We all worked on this thing together and then before it can do anything, we are stuck guessing about when we can actually enjoy it. Don’t blame Santa, blame the geniuses at Meccano.

Performance, of your product, of your service, of your site, is imperative to delivering what you sold people. Availability, uptime, scalability, and reliability matter by default now. Everyone has downtime, but 4hr recovery time on your corporate domain isn’t just irresponsible and costly, it’s plain embarrassing…and transparent.

Why Is This Even Important?

My 7yr old is crazy into coding right now. Granted, we use a visual Code Block Editor mostly for lights and tones, but it’s a great way to introduce concepts like flow control and formal logic. As soon as she saw an example I built that used function blocks to encapsulate and reuse logic, she instantly understood and started refactoring her programs.

But finding the right project for varying stages of aptitude, appetite, and enjoyment is a real challenge. No thanks to marketing, but also unanticipated road-blocks like service and subscription dependencies are hard for consumers to factor in when purchasing. Even when you do find a right-fit project, if bone-headed problems like website downtime occur, it can become a negative experience for the child (or student).

It’s important for STEM product manufacturers and software vendors to really think about the impact of what they’re selling, how they’re delivering it, and how to support people who paid them money for something to accomplish a goal. If you don’t have the optimal consumer’s experience in mind, it will eventually cost you.

Need the Robot Software Updater?

I can’t archive everything on their site, and it’s their job to provide reliable content distribution, but in case you find yourself stuck like I was, here are links to at least the firmware updater tool:

Also, never ask a consumer if they want to choose (null).

And if the updater gives you flashbacks to DirectX drivers from 1997, don’t worry. It only looked like it bricked my robot for about 4mins before providing UI feedback:

Once you do get back to the modern era, a 98.6MB mobile app to control it shouldn’t be too hard on your data plan. They also need to know your GPS location, phone contacts, and file storage for some reason.

AllDayDevOps 2018: Progressive Testing to Meet the Performance Imperative

Mostly as an appendix for references and readings, but whenever I can I like to have a self-hosted post to link everything back to about a particular presentation.

My slides for the presy:

Video stream:

A few thoughts from my journal today (spelling and grammar checks off):

Will update as the conversation unfolds.


Performance Is (Still) a Feature, Not a Test!

Since I presented the following perspective at APIStrat Chicago 2014, I’ve had many opportunities to clarify and deepen it within the context of Agile and DevOps development:

It’s more productive to view system performance as a feature than to view it as a set of tests you run occasionally.

The more teams I work with, the more I see how performance as a critical aspect of their products. But why is performance so important?

‘Fast’ Is a Subconscious User Expectation

Whether you’re building an API, an app, or whatever, its consumers (people, processes) don’t want to wait around. If your software is slow, it becomes a bottleneck to whatever real-world process it facilitates.

Your Facebook feed is a perfect example. If it is even marginally slower to scroll through it today than it was yesterday, if it is glitchy, halty, or jenky in any way, your experience turns from dopamine-inducing self-gratification to epinephrine fueled thoughts of tossing your phone into the nearest body of water. Facebook engineers know this, which is why they build data centers to test and monitor mobile performance on a per-commit basis. For them, this isn’t a luxury; it’s a hard requirement, as it is for all of us whether we choose to address it or not. Performance is everyone’s problem.

Performance is as critical to delighting people as delivering them features they like. This is why session abandonment rates are a key metric on Cyber Monday.

‘Slow’ Compounds Quickly

Performance is a measurement of availability over time, and time always marches forward. Performance is an aggregate of many dependent systems, and even just one slow link can cause an otherwise blazingly fast process to grind to a halt long enough for people to turn around and walk the other way.

Consider a mobile app; performance is everything. The development team slaves over which list component scrolls faster and more smoothly, spends hours getting asynchronous calls and spinners to provide the user critical feedback so that they don’t think the app has crashed. Then a single misbehaving REST call to some external web API suddenly slows by 50% and the whole user experience is untenable.

The performance of a system is only as strong as it’s weakest link. In technical terms, this is about risk. You at least need to know the risk introduced by each component of a system; only then can you chose how to mitigate the risk accordingly. ‘Risk’ is a huge theme in ISO 29119 and the upcoming IEEE 2675 draft I’m working on, and any seasoned architect would know why it matters.

Fitting Performance into Feature Work

Working on ‘performance’ and working on a feature shouldn’t be two separate things. Automotive designers don’t do this when they build car engines and performance is paramount throughout even the assembly process as well. Neither should it be separate in software development.

However, in practice if you’ve never run a load test, tracked power consumption of a subroutine or analyzed aggregate results, it will be different than building stuff for sure. Comfortability and efficiency come with experience. A lack of experience or familiarity doesn’t remove the need for something critical to occur; it accelerates the need to ask how to get it done.

A reliable code pipeline and testing schedule make all the difference here. Many performance issues take time or dramatic conditions to expose, such as battery degradation, load balancing, and memory leaks. In these cases, it isn’t feasible to execute long-running performance tests for every code check-in.

What does this mean for code contributors? Since they are still responsible for meeting performance criteria, it means that they can’t always press the ‘done’ button today. It means we need reliable delivery pipelines to push code through that checks its performance pragmatically. As pressure to deliver value incrementally mounts, developers are taking responsibility for the build and deployment process through technologies like Docker, Jenkins Pipeline, and Puppet.

It also means that we need to adopt a testing schedule that meets the desired development cadence and real-world constrains on time or infrastructure:

  • Run small performance checks on all new work (new screens, endpoints, etc.)
  • Run local baselines and compare before individual contributors check in code
  • Schedule long-running (anything slower than 2mins) performance tests into pipeline stage after build verification in parallel
  • Schedule nightly performance regression checks on all critical risk workflows (i.e. login, checkout, submit claim, etc.)

How Do You Bake Performance Into Development?

While it’s perfectly fine to adopt patterns like ‘spike and stabilize’ on feature development, stabilization is a required payback of the technical debt you incur when your development spikes. To ‘stabilize’ isn’t just to make the code work, it’s to make it work well. This includes performance (not just acceptance) criteria to be considered complete.

A great place to start making measurable performance improvements is to measure performance objectively. Every user story should contain solid performance criteria, just as it should with acceptance criteria. In recent joint research, I found that higher performing development teams include performance criteria on 50% more of their user stories.

In other words, embedding tangible performance expectations in your user stories bakes performance in to the resulting system.

There are a lot of sub-topics under the umbrella term “performance”. When we get down to brass tacks, measuring performance characteristics often boils down to three aspects: throughput, reliability, and scalability. I’m a huge fan of load testing because it helps to verify all three measurable aspects of performance.

Throughput: from a good load test, you can objectively track throughput metrics like transactions/sec, time-to-first-byte (and last byte), and distribution of resource usage (i.e. are all CPUs being used efficiently). These give you a raw and necessarily granular level of detail that can be monitored and visualized in stand-ups and deep-dives equally.

Reliability: load tests also exercise your code far more than you can independently. It takes exercise to expose if a process is unreliable; concurrency in a load test is like exercise on steroids. Load tests can act as your robot army, especially when infrastructure or configuration changes push you into unknown risk territory.

Scalability: often, scalability mechanisms like load balancing, dynamic provisioning, and network shaping throw unexpected curveballs into your user’s experience. Unless you are practicing a near-religious level of control over deployment of code, infrastructure, and configuration changes into production, you run the risk of affecting real users (i.e. your paycheck). Load tests are a great way to see what happens ahead of time.


Short, Iterative Load Testing Fits Development Cycles

I am currently working with a client to load test their APIs, to simulate mobile client bursts of traffic that represent real-world scenarios. After a few rounds of testing, we’ve resolve many obvious issues, such as:

  • Overly verbose logs that write to SQL and/or disk
  • Parameter formats that cause server-side parsing errors
  • Throughput restrictions against other 3rd-party APIs (Google, Apple)
  • Static data that doesn’t exercise the system sufficiently
  • Large images stored as SQL blobs with no caching

We’ve been able to work through most of these issues quickly in test/fail/fix/re-test cycles, where we conduct short all-hands sessions with a developer, test engineer, and myself. After a quick review of significant changes since the last session (i.e. code, test, infrastructure, configuration), we use BlazeMeter to kick of a new API load test written in jMeter and monitor the server in real-time. We’ve been able to rapidly resolve a few anticipated, backlogged issues as well as learn about new problems that are likely to arise at future usage tiers.

The key here is to ‘anticipate iterative re-testing‘. Again I say: “performance is a feature, not a test”. It WILL require re-design and re-shaping as the code changes and system behaviors are better understood. It’s not a one-time thing to verify how a dynamic system behaves given a particular usage pattern.

The outcome from a business perspective of this load testing is that this new system is perceived to be far less of a risky venture, and more the innovation investment needed to improve sales and the future of their digital strategy.

Performance really does matter to everyone. That’s why I’m available to chat with you about it any time. Ping me on Twitter and we’ll take it from there.