Performance Is (Still) a Feature, Not a Test!

Since I presented the following perspective at APIStrat Chicago 2014, I’ve had many opportunities to clarify and deepen it within the context of Agile and DevOps development:

It’s more productive to view system performance as a feature than to view it as a set of tests you run occasionally.

The more teams I work with, the more I see performance treated as a critical aspect of their products. But why is performance so important?

‘Fast’ Is a Subconscious User Expectation

Whether you’re building an API, an app, or whatever, its consumers (people, processes) don’t want to wait around. If your software is slow, it becomes a bottleneck to whatever real-world process it facilitates.

Your Facebook feed is a perfect example. If it is even marginally slower to scroll through it today than it was yesterday, if it is glitchy, halting, or janky in any way, your experience turns from dopamine-inducing self-gratification to epinephrine-fueled thoughts of tossing your phone into the nearest body of water. Facebook engineers know this, which is why they build data centers to test and monitor mobile performance on a per-commit basis. For them, this isn’t a luxury; it’s a hard requirement, as it is for all of us whether we choose to address it or not. Performance is everyone’s problem.

Performance is as critical to delighting people as delivering them features they like. This is why session abandonment rates are a key metric on Cyber Monday.

‘Slow’ Compounds Quickly

Performance is a measurement of availability over time, and time always marches forward. Performance is an aggregate of many dependent systems, and even just one slow link can cause an otherwise blazingly fast process to grind to a halt long enough for people to turn around and walk the other way.

Consider a mobile app: performance is everything. The development team slaves over which list component scrolls faster and more smoothly, and spends hours getting asynchronous calls and spinners to provide the user critical feedback so that they don’t think the app has crashed. Then a single misbehaving REST call to some external web API suddenly slows by 50% and the whole user experience becomes untenable.

The performance of a system is only as strong as its weakest link. In technical terms, this is about risk. You at least need to know the risk introduced by each component of a system; only then can you choose how to mitigate that risk accordingly. ‘Risk’ is a huge theme in ISO 29119 and the upcoming IEEE 2675 draft I’m working on, and any seasoned architect knows why it matters.

Fitting Performance into Feature Work

Working on ‘performance’ and working on a feature shouldn’t be two separate things. Automotive designers don’t treat them separately when they build car engines; performance is paramount even throughout the assembly process. Neither should they be separate in software development.

In practice, though, if you’ve never run a load test, tracked the power consumption of a subroutine, or analyzed aggregate results, it will certainly feel different from building features. Comfort and efficiency come with experience. A lack of experience or familiarity doesn’t remove the need for something critical to occur; it accelerates the need to ask how to get it done.

A reliable code pipeline and testing schedule make all the difference here. Many performance issues take time or dramatic conditions to expose, such as battery degradation, load balancing, and memory leaks. In these cases, it isn’t feasible to execute long-running performance tests for every code check-in.

What does this mean for code contributors? Since they are still responsible for meeting performance criteria, it means that they can’t always press the ‘done’ button today. It means we need reliable delivery pipelines to push code through that check its performance pragmatically. As pressure to deliver value incrementally mounts, developers are taking responsibility for the build and deployment process through technologies like Docker, Jenkins Pipeline, and Puppet.

It also means that we need to adopt a testing schedule that meets the desired development cadence and real-world constraints on time and infrastructure (a pipeline sketch follows the list below):

  • Run small performance checks on all new work (new screens, endpoints, etc.)
  • Run local baselines and compare before individual contributors check in code
  • Schedule long-running performance tests (anything slower than ~2 minutes) into a pipeline stage that runs in parallel after build verification
  • Schedule nightly performance regression checks on all critical risk workflows (i.e. login, checkout, submit claim, etc.)
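
As a rough illustration of that schedule (not a prescription), here is a minimal Jenkins declarative pipeline sketch; the stage names, shell scripts, and nightly trigger are assumptions for the example rather than a recommended setup:

```groovy
// Minimal sketch only: small checks run on every build, long-running
// performance tests run in parallel after build verification, and a
// nightly cron trigger covers critical-workflow regression.
pipeline {
    agent any
    triggers {
        cron('H 2 * * *')   // nightly performance regression window
    }
    stages {
        stage('Build Verification') {
            steps {
                sh './gradlew clean build'          // unit tests and packaging
                sh './scripts/small-perf-checks.sh' // quick checks on new screens/endpoints
            }
        }
        stage('Long-Running Performance') {
            // anything slower than ~2 minutes lands here, after verification
            parallel {
                stage('API Load Test') { steps { sh './scripts/api-load-test.sh' } }
                stage('Soak / Memory') { steps { sh './scripts/soak-test.sh' } }
            }
        }
    }
}
```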

How Do You Bake Performance Into Development?

While it’s perfectly fine to adopt patterns like ‘spike and stabilize’ on feature development, stabilization is a required payback of the technical debt you incur when your development spikes. To ‘stabilize’ isn’t just to make the code work, it’s to make it work well. That means performance criteria, not just acceptance criteria, must be satisfied before the work is considered complete.

A great place to start making measurable performance improvements is to measure performance objectively. Every user story should contain solid performance criteria, just as it should contain acceptance criteria. In recent joint research, I found that higher-performing development teams include performance criteria on 50% more of their user stories.

In other words, embedding tangible performance expectations in your user stories bakes performance into the resulting system.

There are a lot of sub-topics under the umbrella term “performance”. When we get down to brass tacks, measuring performance characteristics often boils down to three aspects: throughput, reliability, and scalability. I’m a huge fan of load testing because it helps to verify all three measurable aspects of performance.

Throughput: from a good load test, you can objectively track throughput metrics like transactions/sec, time-to-first-byte (and last byte), and the distribution of resource usage (e.g. are all CPUs being used efficiently?). These give you a raw, granular level of detail that can be monitored and visualized in stand-ups and deep-dives alike.

Reliability: load tests also exercise your code far more than you can on your own. It takes exercise to expose whether a process is unreliable; concurrency in a load test is like exercise on steroids. Load tests can act as your robot army, especially when infrastructure or configuration changes push you into unknown risk territory.

Scalability: often, scalability mechanisms like load balancing, dynamic provisioning, and network shaping throw unexpected curveballs into your users’ experience. Unless you are practicing a near-religious level of control over deployment of code, infrastructure, and configuration changes into production, you run the risk of affecting real users (i.e. your paycheck). Load tests are a great way to see what happens ahead of time.
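
As a concrete (if simplified) example of pulling those numbers out of a run, this Groovy sketch assumes a JMeter-style CSV results file with timeStamp, Latency, and success columns; adjust for whatever your tool actually emits:

```groovy
// Rough sketch: derive throughput, error rate, and average time-to-first-byte
// from a JMeter-style CSV results file (column names are assumptions).
def lines   = new File('results.jtl').readLines()
def header  = lines.head().split(',').toList()
def samples = lines.tail().collect { line ->
    def cols = line.split(',')
    [time: cols[header.indexOf('timeStamp')] as long,
     ttfb: cols[header.indexOf('Latency')] as long,
     ok  : cols[header.indexOf('success')] == 'true']
}

def durationSec = (samples*.time.max() - samples*.time.min()) / 1000.0
printf('Throughput: %.1f tx/sec%n', samples.size() / durationSec)
printf('Error rate: %.2f%%%n', 100.0 * samples.count { !it.ok } / samples.size())
printf('Avg TTFB:   %.0f ms%n', samples*.ttfb.sum() / samples.size())
```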

 

Short, Iterative Load Testing Fits Development Cycles

I am currently working with a client to load test their APIs, simulating mobile-client bursts of traffic that represent real-world scenarios. After a few rounds of testing, we’ve resolved many obvious issues, such as:

  • Overly verbose logs that write to SQL and/or disk
  • Parameter formats that cause server-side parsing errors
  • Throughput restrictions against other 3rd-party APIs (Google, Apple)
  • Static data that doesn’t exercise the system sufficiently
  • Large images stored as SQL blobs with no caching

We’ve been able to work through most of these issues quickly in test/fail/fix/re-test cycles, where we conduct short all-hands sessions with a developer, a test engineer, and myself. After a quick review of significant changes since the last session (i.e. code, test, infrastructure, configuration), we use BlazeMeter to kick off a new API load test written in jMeter and monitor the server in real time. We’ve been able to rapidly resolve a few anticipated, backlogged issues as well as learn about new problems that are likely to arise at future usage tiers.
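
For a sense of how little ceremony each round needs, the kickoff can be as small as a pipeline job that hands the JMeter plan to Taurus (bzt), the open-source runner that integrates with BlazeMeter; the plan name and agent label here are made up, and the -report flag (which streams results to a BlazeMeter report) is optional:

```groovy
// Hypothetical sketch: a scripted-pipeline job that runs the JMeter plan
// through Taurus and streams live results to a BlazeMeter report.
node('load-generator') {
    checkout scm
    stage('API Burst Load Test') {
        sh 'bzt api-burst-scenario.jmx -report'   // plan name is illustrative
    }
}
```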

The key here is to ‘anticipate iterative re-testing’. Again I say: “performance is a feature, not a test”. It WILL require re-design and re-shaping as the code changes and system behaviors are better understood. Verifying how a dynamic system behaves under a particular usage pattern is not a one-time thing.

From a business perspective, the outcome of this load testing is that the new system is perceived as far less of a risky venture and more as the innovation investment needed to improve sales and advance their digital strategy.

Performance really does matter to everyone. That’s why I’m available to chat with you about it any time. Ping me on Twitter and we’ll take it from there.

A Jenkins Pipeline for Mobile UI Testing with Appium and Docker

In theory, a completely Docker-ized version of an Appium mobile UI test stack sounds great. In practice, however, it’s not that simple. This article explains how to structure a mobile app pipeline using Jenkins, Docker, and Appium.

TL;DR: The Goal Is Fast Feedback on Code Changes

When we make changes, even small ones, to our codebase, we want to prove that they had no negative impact on the user experience. How do we do this? We test…but manual testing takes time and is error-prone, so we write automated unit and functional tests that run quickly and consistently. Duh.

As Uncle Bob Martin puts it, responsible developers not only write code that works, they provide proof that their code works. Automated tests FTW, right?

Not quite. There are a number of challenges with test automation that raise the bar on complexity to successfully getting tests to provide us this feedback. For example:

  • How much of the code and its branches actually gets covered by our tests?
  • How often do tests fail for reasons other than the code not working?
  • How accurate was our implementation of the test case and criteria as code?
  • Which tests do we absolutely need to run, and which can we skip?
  • How fast can and must these tests run to meet our development cadence?

Jenkins Pipeline to the Rescue…Not So Fast!

Once we identify what kind of feedback we need and match that to our development cadence, it’s time to start writing tests, yes? Well, that’s only part of the process. We still need a reliable way to build/test/package our apps. The more automated this can be, the faster we can get the feedback. A pipeline view of the process begins with code changes, includes building, testing, and packaging the app so we always have a ‘green’ version of our app.

Many teams choose a code-over-configuration approach. The app is code, the tests are code, server setup (via Puppet/Chef and Docker) is code, and not surprisingly, our delivery process is now code too. Everything is code, which lets us extend SCM virtues (versioning, auditing, safe merging, rollback, etc.) to our entire software lifecycle.

Below is an example of ‘process-as-code’ in a Jenkins Pipeline script. When a build project is triggered, say when someone pushes code to the repo, Jenkins will execute this script, usually on a build agent. The code gets pulled, the project dependencies get refreshed, a debug version of the app and its tests are built, then the unit and UI tests run.
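
A minimal sketch of such a Jenkinsfile, assuming a Gradle-based Android project (the task names and emulator handling are simplified here):

```groovy
// Simplified sketch of the build/test flow described above.
pipeline {
    agent { label 'android-build' }
    stages {
        stage('Checkout')     { steps { checkout scm } }
        stage('Dependencies') { steps { sh './gradlew --refresh-dependencies dependencies' } }
        stage('Build Debug')  { steps { sh './gradlew assembleDebug assembleDebugAndroidTest' } }
        stage('Unit Tests')   { steps { sh './gradlew testDebugUnitTest' } }
        stage('Instrumented Tests') {
            // assumes an emulator or device is already attached to this agent;
            // in practice, this is where the hacks and tweaks accumulate
            steps { sh './gradlew connectedDebugAndroidTest' }
        }
    }
}
```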

Notice that last step? The ‘Instrumented Tests’ stage is where we run our UI tests, in this case our Espresso test suite using an Android emulator. In real scripts, that stage is rarely this tidy: the sharp spike in code complexity, notwithstanding my own capabilities, reflects reality. I’ve seen a lot of real-world build/test scripts that likewise reflect the hacks and tweaks that gather around the technologically significant boundary of real sessions and device hardware.

A great walkthrough on how to set up a Jenkinsfile to do some of the nasty business of managing emulator lifecycles can be found on Philosophical Hacker…you know, for light reading on the weekend.

Building a Homegrown UI Test Stack: Virtual Insanity

We have lots of great technologies at our disposal. In theory, we could use Docker, the Android SDK, Espresso, and Appium to build reusable, dynamic nodes that can build, test, and package our app dynamically.

Unfortunately, in practice, the user interface portion of our app requires hardware resources that simply can’t be executed in a timely manner in this stack. Interactive user sessions are a lot of overhead, even virtualized, and virtualization is never perfect.

Docker runs under either HyperKit (a lightweight virtualization layer on macOS) or within a VirtualBox host, but neither of these solutions supports nested virtualization, and neither can pass raw access to the host machine’s VT-x instruction set through to containers.

What’s left for containers is a virtualized CPU that doesn’t support the basic capabilities the Android emulator needs to use the host GPU, requiring us to run QEMU with ARM images instead of native x86/x64 AVD-based images. This makes timely spin-up and execution of Appium tests so slow that it renders the solution infeasible.

Alternative #1: Containerized Appium w/ Connection to ADB Device Host

Since we can’t feasibly keep emulation in the same container as the Jenkins build node, we need to split out the emulators to host-level hardware assisted virtualization. This approach also has the added benefit of reducing the dependencies and compound issues that can occur in a single container running the whole stack, making process issues easier to pinpoint if/when they arise.

So what we’ve done is decoupled our “test lab” components from our Jenkins build node into a hardware+software stack that can be “easily” replicated:

Unfortunately, we can no longer keep our Appium server in a Docker container (which would make the process reliable, consistent across the team, and minimize cowboy configuration issues). Even after you:

  • Run the Appium container in privileged mode
  • Mount volumes to pass build artifacts around
  • Establish an SSH tunnel from container to host to use host ADB devices
  • Establish a reverse SSH tunnel from host to container to connect to Appium
  • Manage and exchange keys for SSH and Appium credentials

…you still end up dealing with flaky container-to-host connectivity and bizarre Appium errors that don’t occur if you simply run Appium server on bare metal. Reliable infrastructure is a hard requirement, and the more complexity we add to the stack, the more (often) things go sideways. Sad but true.

Alternative #2: Cloud-based Lab as a Service

Another alternative is to simply use a cloud-based testing service. This typically involves adding credentials and API keys to your scripts, and paying for reserved devices up front, which can get costly. What you get is hassle-free, somewhat constrained real devices that can be easily scaled as your development process evolves. Just keep in mind, aside from credentials, you want to carefully manage how much of your test code integrates custom commands and service calls that can’t easily be ported over to another provider later.

Alternative #3: Keep UI Testing on a Development Workstation

Finally, we could technically run all our tests on our development machine, or get someone else to run them, right? But this wouldn’t really translate to a CI environment and doesn’t take full advantage of the speed benefits of automation, neither of which helps us parallelize coding and testing activities. Testing on local workstations is important before checking in new tests to prove that they work reliably, but it doesn’t make sense time-wise for running full test suites in continuous delivery/deployment.

Alternative #4: A Micro-lab for Every Developer

Now that we have a repeatable model for running Appium tests, we can scale that out to our team. Since running emulators on commodity hardware and open source software is relatively cheap, we can afford a “micro-lab” for each developer making code changes on our mobile app. The “lab” now looks something like this:

Having worked in the testing and “lab as a service” industries, I can say there are definitely situations where some teams and organizations outgrow the “local lab” approach. Your IT/ops team might just not want to deal with per-developer hardware sprawl. You may not want to dedicate team members to be the maintainers of container/process configuration. And, while Appium is a fantastic technology, like any OSS project it often falls behind in supporting the latest devices and hardware-specific capabilities. Fingerprint support is a good example of this.

The Real Solution: Right { People, Process, Technology }

My opinion is that you should hire smart people (not one person) with a bit of grit and courage who “own” the process. When life (I mean Apple and Google) throws you curveballs, you need people who can quickly recover. If you’re paying for a service to help with some part of your process as a purely economic trade-off, do the math. If it works out, great! But that is also an example of “owning” your process.

Final thought: as more and more of your process becomes code, remember that code is a liability, not an asset. The less of it you have and the leaner your approach, generally the better.


Automating the Quality of Your Digital Front Door

Mobile is the front door to your business for most, if not all, of your users. But how often do you use your own front door, a few times a day? How often do your users use your app? How often would you like them to? It’s really a high-traffic front door between people and you.

This is how you welcome people into what you’re doing. If it’s broken, people don’t feel welcome.

[7/27/2017: For my presentation at Mobile Tea Boston, my slides and code samples are below]

 

Slides with notes: http://bit.ly/2tgGiGr
Git example: https://github.com/paulsbruce/FingerprintDemo

The Dangers of Changing Your Digital Front Door

In his book “On Intelligence”, Hawkins describes how quickly our human brains pick up on minute changes with the analogy of someone replacing the handle on your front door with a knob while you’re out. When you get back, things will seem very weird. You feel disoriented, alienated. Not emotions we want to invoke in our users.

Now consider what it’s like for your users to have you changing things on their high-traffic door to you. Change is good, but only good changes. And when changes introduce problems, forget sympathy, forget forgiveness, people revolt.

What Could Possibly Go Wrong?

A lot. Even for teams that are great at what they do, delivering a mobile app is fraught with challenges that lead to:

  • Lack of strategy around branching, merging, and pushing to production
  • Lack of understanding about dependencies, impacts of changes
  • Lack of automated testing, integration woes, no performance/scalability baselines, security holes
  • Lack of communication between teams (Front-end, API, business)
  • Lack of planning at the business level (marketing blasts, promotions, advertising)

Users don’t care about our excuses. A survey by Perfecto found that more than 44% of defects in mobile apps are found by users. User frustrations aren’t just about what you designed; they are about how the app behaves in the real world too. Apps that are too slow will be treated as broken apps and uninstalled just the same.

What do we do about it?

We test, but testing is a practice unto itself. There are many test types and methodologies like TDD, ATDD, and BDD that drive us to test. Not everyone is cut out to be a great tester, especially when developers are driven to write only things that work, and not to test for when they shouldn’t (i.e. a lack of negative testing).

Alister Scott – Test ‘Ice Cream Cone’

In many cases, automation gaps and issues make it easier for development teams to fall back to manual testing. This is what Alister Scott (of WatirMelon) calls the ‘ice cream cone’ anti-pattern, an inversion of the ideal test pyramid, and Mike Cohn has good thoughts on this paradigm too.

To avoid this downward spiral, we need to prioritize automation AND which tests we choose to automate. Testing along architecturally significant boundaries, as Kevlin Henney puts it, is good; but in a world full of both software and hardware, we need to broaden that idea to ‘technologically significant boundaries’. The camera, GPS, biometric, and other peripheral interfaces on your phone are a significant boundary…fault lines of the user experience.

Many development teams have learned the hard way that not including real devices in automated testing leaves these UX fault lines at risk of escaping defects. People in the real world use real devices on real networks under real usage conditions, and our testing strategy should reflect this reality too.

The whole point of all this testing is to maintain confidence in our release readiness. We want to be in an ‘always green’ state, and there’s no way to do this without automated, continuous testing.

Your Code Delivery Pipeline to the Rescue!

Confidence comes in two flavors: quality and agility. Specifically, does the code we write do what we intend, and can we iterate and measure quickly?

Each team comes with its own definition of done, its own acceptable levels of coverage, and its own level of confidence over what it takes to ship, but answering both of these questions definitively requires adequate testing and a reliable pipeline for our code.

Therein lies the dynamic tension between agility (nimbleness) and the messy world of reality. What’s the point of pushing out something that doesn’t match the needs of reality? So we try to pull reality in little bits at a time, but reality can be slow. Executing UI tests takes time. So we need to code and test in parallel, automate as much as possible, and be aware of the impact of changes on release confidence.

The way we manage this tension is to push smaller batches more frequently through the pipeline and bring the pain forward; in other words, continuous delivery and deployment. Far from working monolithically, we shrink the whole process down to the individual contributor level. Always green at the developer level…merge only code that has been tested automatically and thoroughly.

Even in a Perfect World, Your Front Door Still Jams

So automation is crucial to this whole thing working. But what happens when we can’t automate something? This is often why the “ice cream cone” exists.

Let’s walk through it together. Google I/O or WWDC drops new hardware or platform capabilities on us. There’s a rush to integrate, but a delay in tooling and support gums up development all the way through production troubleshooting. We mock what we have to, but fall back to manual testing.

This not only takes our time, it robs us of velocity and any chance to reach that “always green” aspiration.

The worst part is that we don’t even have to introduce new functionality to fall prey to this problem. Appium was stuck behind lack of iOS 10 support for months, which means most companies had no automated way to validate on a platform that was out already.

And if anything, history teaches us that technology advances whether the last thing is well-enough baked or not. We are still dealing with camera (i.e. driver stack) flakiness! Fingerprint isn’t as unreliable, but still part of the UI/UX. And many of us now face an IoT landscape with very few standards that developers follow.

So when faced with architectural boundaries that have unpolished surfaces, what do we do? Mocks…good enough for early integration, but who will stand up and say testing against mocks is good enough to go to production?

IoT Testing Provides Clues to How We Can Proceed

In many cases, introducing IoT devices into the user experience means adding architecturally significant boundaries. Standards like BLE, MQTT, CoAP and HTTP provide flexibility to virtualize much of the interactions across these boundaries.

In the case of Continuous Glucose Monitoring (CGM) vendors, their hardware and mobile app development run on very different cycles. But to integrate often, they virtualize BLE signals to real devices in the cloud as part of their mobile app test scripts. They also add “IoT ninjas” to the experience team, hardware/firmware engineers who are in charge of prototyping changes on the device side, to make sure that development and testing on the mobile app side is as enabled as possible.

Adding IoT to the mix will change your pyramid structure, adding pressure to rely on standards/interfaces as well as manual testing time for E2E scenarios.

[For more on IoT Testing, see my deck from Mobile/IoT Dev+Test 2017 here]

Automated Testing Requires Standard Interfaces

There are plenty of smart people looking to solve the busy-work problem with writing tests. Facebook Infer, Appdiff, Functionize, and Mabl are just a few of the new technologies that integrate machine learning and AI to reduce time spent on testing busy-work.

But any and all programmatic approach, even AI, requires standard interfaces; in our case, universally accepted development AND testing frameworks and technologies.

Tool ecosystems don’t get built without foundational standards, like HTML/CSS/JS, Android, Java, and Swift. And when vendors innovate on hardware or platforms, there will always be some gaps, usually in automation around the new stuff.

Example Automation Gap: Fingerprint Security

Unfortunately for those of us who see the advantages of integrating with innovative platform capabilities like biometric fingerprint authentication, automated testing support is scarce.

What this means is that we either don’t test certain critical workflows in our app, or we manually test them. What a bummer to velocity.

The solution is to have people who know how to implement multiple test frameworks and tools in a way that matches the velocity requirements of development.

For more information on this, see my deep-dive on how to use Appium in Android development to simulate fingerprint activities in automated tests. It’s entirely possible, but it requires experience and planning for how to integrate a mobile lab into your continuous integration pipeline.
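
To give a flavor of what’s possible today, here is a hedged sketch of a helper that simulates an accepted fingerprint on an Android emulator via adb while a UI test waits on the biometric prompt; the emulator serial and finger ID are assumptions, and real devices need a different approach (which is exactly where framework support gets thin):

```groovy
// Sketch only: simulate an accepted fingerprint on an Android *emulator*
// while a UI test is waiting on the biometric prompt. Real devices need
// different handling, which is where framework support gets thin.
class FingerprintHelper {
    static void touchEnrolledFinger(String emulatorSerial = 'emulator-5554', int fingerId = 1) {
        def cmd  = ['adb', '-s', emulatorSerial, 'emu', 'finger', 'touch', fingerId.toString()]
        def proc = cmd.execute()
        proc.waitFor()
        assert proc.exitValue() == 0 : "adb emu finger touch failed: ${proc.err.text}"
    }
}

// Usage inside a test, after the app shows its fingerprint dialog:
// FingerprintHelper.touchEnrolledFinger()
```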

 

Tailoring Fast Feedback to Resources (and vice versa)

As you incrementally introduce reality into every build, you’ll run into two problems: execution speed and device pool limits.

To solve the execution speed, most development teams parallelize their testing against multiple devices at once, and split up their testing strategy to different schedules. This is just an example of a schedule against various testing types.
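
As a rough sketch of the parallel half of that idea (the agent labels and Gradle task are illustrative, not a recommendation), a pipeline can fan the same instrumented suite out across a small device pool:

```groovy
// Illustrative only: run the same instrumented suite against several
// devices/emulators at once to keep wall-clock feedback time short.
pipeline {
    agent none
    stages {
        stage('UI Tests on Device Pool') {
            parallel {
                stage('Pixel emulator') {
                    agent { label 'emulator-pixel' }
                    steps { sh './gradlew connectedDebugAndroidTest' }
                }
                stage('Galaxy S7 (real device)') {
                    agent { label 'device-galaxy-s7' }
                    steps { sh './gradlew connectedDebugAndroidTest' }
                }
            }
        }
    }
}
```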

For more on this, I published a series of whitepapers.

TL;DR recap

Automating the quality of our web and mobile apps keeps us accurate, safe, and confident, but it isn’t easy. Fortunately we have many tools and a lot of thought already put into how to do this. Notwithstanding the ignorance of some individuals, automation continues to change the job landscape over and over again.

Testing always takes tailoring to the needs of the development process to provide fast feedback. The same is true in reverse: developers need to understand where support gaps exist in test frameworks and tooling, otherwise they risk running the “ship” aground.

This is why, and my mantra remains, it is imperative to velocity to have the right people in the planning room when designing new features and integrating capabilities across significant technological boundaries.

Similarly, in my research on developer efficiency, we see a correlation between increased coverage of non-functional criteria on features and test coverage. Greater completeness in upfront planning saves time and effort; it’s just that simple.

Just as Conway’s “law” suggests, your team, its structure, communication patterns, functions, and dysfunctions all show up in the final product. Have the right people in the room when planning new features, running retros, and determining your own definition of done. Otherwise you end up with more gaps than just those in automation.

Meta / cliff notes:

  • “Everyone owns quality” means that the whole team needs to be involved in testing strategy
    • To what degree are various levels of testing included in Definition of Done?
    • Which test sets (i.e. feedback loops) provide the most value?
    • How are various tests triggered, considering their execution speed?
    • Who’s responsible for creating which types of tests?
    • How are team members enabled to interpret and use test result data?
    • When defects do escape certain stages, how is RCA used to close the gap?
    • Who manages/fixes the test execution framework and infrastructure?
    • Do the benefits of the current approach to testing outweigh the cost?
  • Multiple testing frameworks / tools / platforms is 200 OK
    • We already use separate frameworks for separate test types
      • jUnit/TestNG (Java) for unit (and some integration) testing
      • Chakram/Citrus/Postman/RestAssured for API testing
      • Selenium, Appium, Espresso, XCTest for UI testing
      • jMeter, Dredd, Gatling, Siege for performance testing
    • Tool sprawl can be a challenge, but proper coverage requires plurality
    • Don’t overtax one framework or tool to do a job it can’t, just find a better fit
  • Incremental doses of reality across architecturally significant boundaries
    • We need reality (real devices, browsers, environments) to spot fragility in our code and our architecture
    • Issues tend to clump around architecturally significant boundaries, like API calls, hardware interfaces, and integrations to monolithic components
    • We stub/mock/virtualize to speed development; signs of “significant” boundaries, but it only tells us what happens in isolation
    • A reliable code pipeline can do the automated testing for you, but you still need to tell it what and when to test; have a test execution strategy that considers:
      • testing types (unit, component, API, integration, functional, performance, installation, security, acceptance/E2E, …)
      • execution speed (<2m, <20m, <2h, etc) vs. demand for fast feedback
      • portions of code that are known-fragile
      • various critical-paths: login, checkout, administrative tasks, etc.
    • Annotations can denote tests that relate across frameworks and tools (see the sketch after this list)
      • @Signup, @Login, @SearchForProduct, @V2Deploy
      • Tag project-based work (like bug fixes) like: JIRA-4522
  • Have the right people in the room when planning features
    • Future blockers like test framework support for new hardware capabilities will limit velocity, so have test engineers in the planning phases
    • Close the gap between what was designed vs. what is feasible to implement by having designers and developers prototype together
    • Including infrastructure/operations engineers in planning reduces later scalability issues; just like testers, this can be a blocker to release readiness
    • Someone, if not all the people above, should represent the user’s voice
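
To make the annotation idea above concrete, here is a small hypothetical example, assuming JUnit 4 and invented annotation/category names, of tagging a test so that different jobs and frameworks can select it:

```groovy
// Hypothetical tagging scheme: a marker annotation shared across suites plus
// a JUnit category, so a pipeline job can run "all @Login tests" regardless
// of which framework owns them.
import org.junit.Test
import org.junit.experimental.categories.Category

import java.lang.annotation.Retention
import java.lang.annotation.RetentionPolicy

@Retention(RetentionPolicy.RUNTIME)
@interface Login {}              // cross-framework marker

interface CriticalPath {}        // JUnit category marker interface

class LoginFlowTest {
    @Test
    @Login
    @Category(CriticalPath.class)  // nightly critical-path job filters on this
    void validUserCanLogIn() {     // relates to bug-fix work tagged JIRA-4522
        // ...drive the login screen or API here...
    }
}
```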


What Can You Do Without an Office?

Meme == fun! Make one yourself! Tag #SummerOfSelenium

I rarely go to the beach. When I do, I like to do what I like to do: surprise, tech stuff. We all relax in different ways, no judgement here. We also all need to work flexibly, so I’m also busy with a new co-working space for my locals.

In back and forth with a colleague of mine, we started to talk about strange places where we find ourselves writing Selenium scripts. All sorts of weird things have been happening in mobile this summer, Pokemon Go (which I call pokenomics) for example, and Eran was busy creating a cross-platform test script for his upcoming webinar that tests their download and installation process.

At work, we’re doing this #SummerOfSelenium thing, and I thought that it would be cool to start a meme competition themed around summer-time automated testing from strange places. Fun times, or at least a distraction from the 7th circle of XPath hell that we’re still regularly subjected to via test frameworks.

If you want to build your own meme, use the following links…

[beach-themed meme image templates]

Reply to my tweet with your image and I’ll figure out a way to get our marketing team to send you some schwag. ;D

Mine was:

[my meme]

Then Eran responded with:

[Eran’s meme]

We think we’re funny at least.


Side-note: co-working spaces are really important. As higher-paid urban jobs overtake local employment in technical careers, we need to respond to the demand for work-life balance and encourage businesses to do the same. Co-working spaces create economic stickiness and foster creativity through social engagement. My thoughts + a local survey are here, in case you want to learn more. A research area of mine and one I’ll be speaking on in the next year.

Gluecon 2016: Melody Meckfessel on Serverless Technology

I had the opportunity to attend Melody Meckfessel’s presentation at Gluecon 2016 last week. She has years of experience at Google, and when she speaks, smart people listen.

[paraphrased from notes]

We should all be optimistic about open source and serverless technologies. Kubernetes is an example of how to evolve your service: a means to an end, housing processes internally and making it easier for developers to deploy, scale horizontally, and update gracefully without causing downtime. One of the themes in the future of cloud-based computing is that it is increasingly open source, which makes it easier for developers to influence and contribute to the software stack as it evolves. Our focus is switching to managing applications and not machines.

“…the future of cloud-based computing…is increasingly open source which makes it easier for developers to influence and contribute the software stack as it’s evolving.”


(photo credit: @rockchick322004)

Serverless means caring about the code, which is embedded in all cloud platforms. In App Engine, you run code snippets and only get charged for your app or service as it scales up or down. With Container Engine, you run containers but don’t specify the machines they run on. In the future, we’re not going to be thinking or talking about VMs. What does it mean for us developers? It will enable us to compose and create our software dynamically. As one of the most creative and competitive markets, software puts us in the same room and presents us with the same challenge: how to make things better, faster.

Melody tells a story about a project she was working on where they were using a new piece of hardware. They were having trouble scaling during peak load, and holiday traffic was really important to their particular customer. They were constantly scrambling, and for the first 6-9 months she spent her time primarily on DevOps-related work. This was frustrating because she wanted to work on features; she could see the potential of what she could do, and she could never quite get to it.

As developers, we are constantly raising the bar on the tools we use to release, and correspondingly our users’ expectations increase. Also as developers, we have the ability to create magic for users, predicting what they want, putting things in context, and launching the next great engaging thing. In a serverless future, developers will be focused on code, which means that PaaS (platform as a service) will have to evolve to integrate the components it’s not doing today. To do that, developers will need to shift Ops work from themselves more to cloud providers, being able to pick and choose different providers. Developers come to a platform with a specific language that they’re productive in, so this is a place where PaaS providers will have to support a variety of runtimes and interpreters. App Engine is the hint and glimmer of creating an entirely new developer experience.

“In a serverless future, developers will be focused on code…will need to shift Ops work from themselves more to cloud providers.”

What does a serverless future mean for the developer experience?

We will spend more of our time coding and have more and more choice. Right now, we’re in a world where we still have to deal with hardware, but as that changes, we’ll spend more of our time building value in our products and services and leave the infrastructure costs to providers and the business. If we want to do that, developers need to make it very easy to integrate with machine learning frameworks and provide analytics. Again, to do this, we need to free up our time: hence NoOps. If we’re spending all our time on ops, we’re not going to get to a world where we can customize and tailor our applications for users in the ways they want and expect. This is why NoOps is going to continue to go mainstream.

If we’re spending all our time on ops, we’re not going to get to a world where we can customize and tailor our applications for users in the ways they want and expect.

Think about it if you’re a startup. If you’re writing code snippets, you may not need to think about storage. You don’t need to worry about security because the cloud providers will take care of that for you. Some WiFi and laptops, and that’s your infrastructure. You’ll have machine-oriented frameworks to allow that app to be really amazing. The idea is that you can go from prototype to production-ready and then scale to the whole planet. There are already examples of really disruptive technologies that exist because cloud resources enabled them.

To get there, we’re going to have to automate away the configuration and management and the overhead. We still have to do this automation. We have work to do there.

Multi-cloud in NoOps is a reality.

People are using mixed and multiple levels of on-prem IT and hybrid cloud; the problem is that the operational tools are siloed in this model though. This makes it very hard to operate these services. As developers, we should expect unifying management…and we should get it. Kubernetes is an example, just another way for teams to deploy faster. Interestingly, we’re seeing enterprises use Docker and Kubernetes on-prem; the effect of this is that it will make their apps and services cloud-ready. When they are faced with having to migrate to the cloud, it will be easy to do it.

“…we’re seeing enterprises use Docker and Kubernetes on-prem…it will make their apps and services cloud-ready”

Once you’re deployed like this across multiple clouds and service providers, though, you’ll need an easy way to collect logs and perform analysis; Stackdriver is an example of this. It’s this single-pane-of-glass view that enables developers to find and fix issues fast, to see what’s happening with workloads, and to monitor more effectively. Debugging and diagnosing aren’t going to go away, but with better services like Spinnaker and App Engine, we’ll be able to manage them more effectively.

Spinnaker provides continuous delivery functionality and traceability throughout our deployment process, and it’s open source. It was a collaboration with Netflix, and Melody’s team worked hard on it. It was a case of being more open about the software stack we’re using and multiple companies coming together. We don’t all need to go build our own thing, especially when we share so many common problems.

Debugging and diagnosis

The problem is that there’s all this information across all these sources and it’s hard to make sense of it. It’s usually a time-critical situation where we have to push out a quick fix as soon as we can. Part of streamlining the developer and debugging experience in the future cloud is having the right tools at our disposal: being able to trace through a complex application for a performance issue, going back through the last three releases to see where the slow-down started, or using production debuggers that let you inspect your service running in the cloud and go to any point in the code to look through variables and see what’s happening. Error reporting also needs to be easier; errors can be spread across multiple logs, which makes it hard to see their impact. Errors aren’t going away, so we need to be able to handle them effectively. We want to reduce the effort it takes to resolve these errors, speed up the iteration cycles for finding and fixing them, and provide better system transparency.

“The faster we can bring the insight from these motions back in to our source code and hence optimize, those are all benefits that we are passing through to users, making us all happier.”

In the NoOps, serverless future, not everything will be open source, but we will expect that third-party providers work well with all the cloud platforms. Melody’s team is currently working on an open source version of their build system, called Bazel. Following their work on Spinnaker in collaboration with Netflix, they’re looking for other opportunities to work with others on open source tools.

What does the world look like 5-10 years from now?

We’re not talking about VMs anymore. We’re benefiting from accelerated hardware development. We’re seeing integration of data into apps and compute so that we can have better insight into how to make applications better. It’s easier for machine learning to be embedded in apps so that developers can create that magical software users expect. We don’t have to talk about DevOps anymore. We will have more time to innovate. In an open cloud, you have more opportunity to influence and contribute to how things evolve. Like thinking back on how there used to be pay phones and there aren’t anymore, the future of development is wide open; no one knows what it will look like. But contributions toward the future cloud are a big part of that future, and everyone’s invited.


How do you test the Internet of Things?

If we think traditional software is hard, just wait until all the ugly details of the physical world start to pollute our perfect digital platforms.

What is the IoT?

The Internet of Things (IoT) is a global network of digital devices that exchange data with each other and cloud systems. I’m not Wikipedia, and I’m not a history book, so I’ll just skip past some things in this definitions section.

Where is the IoT?

It’s everywhere, not just in high-tech houses. Internet providers handing out new cable modems that act as their own WiFi hotspots have created a new “backbone” for these devices to connect over, in almost every urban neighborhood now.

Enter the Mind of an IoT Tester

How far back should we go? How long do you have? I’ll keep it short: the simpler the system, the less there is to test. Now ponder the staggering complexity of the low-cost Raspberry Pi. Multiply that by the number of humans on Earth who like to tinker, educated or not, throw in some APIs and ubiquitous internet access for fun, and now we have a landscape, a view of the magnitude of possibility that the IoT represents. It’s a huge amount of worry for me personally.

Compositionality as a Design Constraint

Good designers will often put constraints in their own way purposely to act as a sort of scaffolding for their traversal of a problem space. Only three colors, no synthetic materials, exactly 12 kilos, can I use it without tutorials, less materials. Sometimes the unyielding makes you yield in places you wouldn’t otherwise, flex muscles you normally don’t, reach farther.

IoT puts compositionality right up in our faces, just like APIs, but with hardware and in ways that are both very physical and personal. It forces us to consider how things will be combined in the wild. For testing, this is the nightmare scenario.

Dr. Strangetest, or How I Learned to Stop Worrying and Accept the IoT

The only way out of this conundrum is in the design. You need to design things to very discrete specifications and target very focused scenarios. It moves the matter of quality up a bit into the space of orchestration testing, which by definition is scenario-based. Lots of little things are easy to prove working independently of each other, but once you do that, the next challenges lie in the realm of how you intend to use them. Therein lie both the known and the unknown, the business cases and the business risks.

If you code or build, find someone else to test it too

As a developer, I can always pick up a device I just flashed with my new code, try it out, and prove that it works. Sort of. It sounds quick, but rarely is. There’s lots of plugging and unplugging, uploading, waiting, debugging, and fiddling with things to get them to just work. I get sick of it all; I just want things to work. And when they finally *do* work, I move on quickly.

If I’m the one building something to work a certain way, I have a sort of programming myopia, where I only always want it to work. Confirmation bias.

What do experts say?

I’m re-reading Code Complete by Steve McConnell, written more than 20 years ago now, eons in the digital age. Section 22.1:

“Testing requires you to assume that you’ll find errors in your code. If you assume you won’t, you probably won’t.”

“You must hope to find errors in your code. Such hope might feel like an unnatural act, but you should hope that it’s you who find the errors and not someone else.”

True that, for code, for IoT devices, and for life.

[Talk] API Strategy: The Next Generation

I took the mic at APIStrat Austin 2015 last week.

A few weeks back, Kin Lane (sup) emailed and asked if I could fill in a spot, talk about something that was not all corporate slides. After being declined two weeks before that and practically interrogating Mark Boyd when he graciously called me to tell me that my talk wasn’t accepted, I was like “haal no!” (in my head) as I wrote back “haal yes” because duh.

I don’t really know if it was apparent during, but I didn’t practice. Last year at APIStrat Chicago, I practiced my 15 minute talk for about three weeks before. At APIdays Mediterranea in May I used a fallback notebook and someone tweeted that using notes is bullshit. Touché, though some of us keep our instincts in check with self-deprecation and self-doubt. Point taken: don’t open your mouth unless you know something deep enough where you absolutely must share it.

I don’t use notes anymore. I live what I talk about. I talk about what I live. APIs.

I live with two crazy people and a superhuman. It’s kind of weird. My children are young and creative, and my wife and I do whatever we can to feed them. So when some asshole single developer tries to tell me that they know more about how to build something amazing with their bare hands, I’m like “psh, please, do you have kids?” (again, in my head).

Children are literally the only way our race carries on. You want to tell me how to carry on about APIs, let me see how much brain-power for API design nuance you have left after a toddler carries on in your left ear for over an hour.

My life is basically APIs + Kids + Philanthropy + Sleep.

That’s where my talk at APIstrat came from. Me. For those who don’t follow, imagine that you’ve committed to a long-term project for how to make everyone’s life a little easier by contributing good people to the world, people with hearts and minds at least slightly better than your own. Hi.

It was a testing and monitoring track, so for people coming to see bullet lists of the latest ways to ignore important characteristics and system behaviors that only come from working closely with a distributed system, it may have been disappointing. But based on the number of conversations afterwards, I don’t think that’s what happened for most of the audience. My message was:

Metrics <= implementation <= design <= team <= people

If you don’t get people right, you’re doomed to deal with overly complicated metrics from dysfunctional systems born of hasty design by scattered teams of ineffective people.

My one piece of advice: consider that each person you work with when designing things was also once a child, and like you, has developed their own form of learning. Learn from them, and they will learn from you.

 

Automating the Self: Social Media

I’m taking on the task of building an automation system for some of my online social engagement. Since I am not such a Very Important Person (yet :), the absolute worst that can happen is that one of my friends/followers shares something racist or sexist or *-ist that I wouldn’t otherwise agree with. Bad, but I can at least un-share or reply with an “I’m sorry folks, my robot and I need to talk” statement. But this leads to an interesting question:

What does it mean to imbue responsibility over my online persona to a digital system?

It’s not really that bizarre of a question to ask. We already grant immense amounts of control over our online profiles to the social primaries (i.e. Facebook, Twitter, Google+). For most people, any trending app that wants access to “post to your timeline” is enough of a reason to grant full access to activities on behalf of your profile, though it shouldn’t. Every time you want to play Candy Crush or Farmville, you are telling King and Zynga that it’s okay for them to say whatever they want as if they were you to people in your network.

The more of a public figure you are, the more your risk goes up. Consider that Zynga is not at all incentivized to post bad or politically incorrect content to your network on your behalf. That’s not the problem. The problem is when (not if) the company behind a game gets hacked, as did Zynga in 2011. It happens all the time. It’s probably happened to you, and you stand to lose more than just face.

So what is the first thing to get right about automating social media?

Trust and security are the first priorities, even before defining how the system works. Automation rules are great except for when the activities they’re automating do not follow the rules of trust and responsibility that a human would catch in a heartbeat. There is no point to automation if it’s not working properly. And there’s no point in automation of social media if it’s not trustworthy.

For me, at least in the initial phases of planning out what this system would look like, trust (not just “security”) will be a theme in all areas of design. It will be a question I ask early and often with every algorithm I design and every line of code I write. Speaking of algorithms, an early example of these rules goes something like this (pseudo-code):
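
Something like this, purely illustrative; every rule name and check below is a placeholder, not a finished design:

```groovy
// Purely illustrative pseudo-code of a trust rule gating automated re-shares;
// every check here is a placeholder, not a finished design.
def shouldAutoShare(post, author, config) {
    if (!config.trustedHandles.contains(author.handle)) return false            // explicit allow-list only
    if (config.flaggedTerms.any { post.text.toLowerCase().contains(it) }) return false  // humans review these
    if (post.links.any { link -> !config.knownDomains.contains(hostOf(link)) }) return false  // unknown links need a human
    return true   // everything else can go out automatically
}

def hostOf(String link) { new URI(link).host ?: '' }
```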