A Jenkins Pipeline for Mobile UI Testing with Appium and Docker

In theory, a completely Docker-ized version of an Appium mobile UI test stack sounds great. In practice, however, it’s not that simple. This article explains how to structure a mobile app pipeline using Jenkins, Docker, and Appium.

TL;DR: The Goal Is Fast Feedback on Code Changes

When we make changes, even small ones, to our codebase, we want to prove that they had no negative impact on the user experience. How do we do this? We test…but manual testing takes time and is error-prone, so we write automated unit and functional tests that run quickly and consistently. Duh.

As Uncle Bob Martin puts it, responsible developers not only write code that works, they provide proof that their code works. Automated tests FTW, right?

Not quite. There are a number of challenges with test automation that raise the bar on complexity before tests can reliably provide us this feedback. For example:

  • How much of the code and its branches actually gets covered by our tests?
  • How often do tests fail for reasons other than the code not working?
  • How accurate was our implementation of the test case and criteria as code?
  • Which tests do we absolutely need to run, and which can we skip?
  • How fast can and must these tests run to meet our development cadence?

Jenkins Pipeline to the Rescue…Not So Fast!

Once we identify what kind of feedback we need and match that to our development cadence, it’s time to start writing tests, yes? Well, that’s only part of the process. We still need a reliable way to build/test/package our apps. The more automated this can be, the faster we can get the feedback. A pipeline view of the process begins with code changes, includes building, testing, and packaging the app so we always have a ‘green’ version of our app.

Many teams choose a code-over-configuration approach. The app is code, the tests are code, server setup (via Puppet/Chef and Docker) is code, and not surprisingly, our delivery process is now code too. Everything is code, which lets us extend SCM virtues (versioning, auditing, safe merging, rollback, etc.) to our entire software lifecycle.

Below is an example of ‘process-as-code’ as a Jenkins Pipeline script. When a build project is triggered, say when someone pushes code to the repo, Jenkins will execute this script, usually on a build agent. The code gets pulled, the project dependencies get refreshed, a debug version of the app and tests are built, then the unit and UI tests run.
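A minimal sketch of such a Jenkinsfile might look like the following; the stage names and Gradle tasks here are illustrative, not the exact script from any particular project:

```groovy
// Illustrative Jenkinsfile sketch -- stage names and Gradle tasks are
// assumptions, not a copy of a real project's script.
node {
    stage('Checkout') {
        checkout scm                                    // pull the code that triggered the build
    }
    stage('Dependencies') {
        sh './gradlew --refresh-dependencies'           // refresh project dependencies
    }
    stage('Build') {
        sh './gradlew assembleDebug assembleDebugAndroidTest'  // debug app + test APKs
    }
    stage('Unit Tests') {
        sh './gradlew testDebugUnitTest'
    }
    stage('Instrumented Tests') {
        // the hairy part: an emulator has to be alive for the Espresso suite
        sh './gradlew connectedDebugAndroidTest'
    }
}
```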

Notice that last step? The ‘Instrumented Tests’ stage is where we run our UI tests, in this case our Espresso test suite using an Android emulator. The sharp spike in code complexity, notwithstanding my own capabilities, reflects reality. I’ve seen a lot of real-world build/test scripts that reflect the hacks and tweaks that gather around the technologically significant boundary of real sessions and device hardware.

A great walkthrough on how to set up a Jenkinsfile to do some of the nasty business of managing emulator lifecycles can be found on Philosophical Hacker…you know, for light reading on the weekend.

Building a Homegrown UI Test Stack: Virtual Insanity

We have lots of great technologies at our disposal. In theory, we could use Docker, the Android SDK, Espresso, and Appium to build reusable, dynamic nodes that can build, test, and package our app dynamically.

Unfortunately, in practice, the user interface portion of our app requires hardware resources that simply can’t be executed in a timely manner in this stack. Interactive user sessions are a lot of overhead, even virtualized, and virtualization is never perfect.

Docker runs under either HyperKit (a lightweight virtualization layer on Mac) or within a VirtualBox host, but neither of these solutions supports nested virtualization, and neither can pass raw access to the host machine’s VT-x instruction set through to containers.

What’s left for containers is a virtualized CPU that doesn’t support the basic specs that the Android emulator needs to use host GPU, requiring us to run ‘qemu’ and ARM images instead of native x86/64 AVD-based images. This makes timely spin-up and execution of Appium tests so slow that it renders the solution infeasible.
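You can see this for yourself on any candidate build node; the commands below are a sketch (AVD name and API level are arbitrary), and inside a Docker container on a Mac the acceleration check comes back negative:

```shell
# Check whether the emulator can use hardware acceleration (KVM/HAXM).
# Inside a container without VT-x passthrough, this reports failure.
emulator -accel-check

# The fallback is an ARM system image running under QEMU emulation...
avdmanager create avd -n ci-arm -k "system-images;android-24;default;armeabi-v7a"

# ...which boots and runs painfully slowly without acceleration.
emulator -avd ci-arm -no-window -no-audio
```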

Alternative #1: Containerized Appium w/ Connection to ADB Device Host

Since we can’t feasibly keep emulation in the same container as the Jenkins build node, we need to split out the emulators to host-level hardware assisted virtualization. This approach also has the added benefit of reducing the dependencies and compound issues that can occur in a single container running the whole stack, making process issues easier to pinpoint if/when they arise.

So what we’ve done is decoupled our “test lab” components from our Jenkins build node into a hardware+software stack that can be “easily” replicated:

Unfortunately, we can no longer keep our Appium server in a Docker container (which would make the process reliable, consistent across the team, and minimize cowboy configuration issues). Even after you:

  • Run the Appium container in privileged mode
  • Mount volumes to pass build artifacts around
  • Establish an SSH tunnel from container to host to use host ADB devices
  • Establish a reverse SSH tunnel from host to container to connect to Appium
  • Manage and exchange keys for SSH and Appium credentials

…you still end up dealing with flaky container-to-host connectivity and bizarre Appium errors that don’t occur if you simply run Appium server on bare metal. Reliable infrastructure is a hard requirement, and the more complexity we add to the stack, the more (often) things go sideways. Sad but true.
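For concreteness, the wiring in that bullet list looks something like the following sketch; the image name, ports, paths, and hostnames are illustrative assumptions, not a recommendation:

```shell
# Run the Appium server container with elevated privileges and shared artifacts.
docker run -d --privileged --name appium \
  -v "$PWD/app/build/outputs:/artifacts" \
  -p 4723:4723 appium/appium

# Tunnel from the container to the host so the container can reach host ADB devices.
docker exec appium ssh -f -N -L 5037:localhost:5037 jenkins@host.docker.internal

# Reverse tunnel from host to container so tests can reach the Appium server.
ssh -f -N -R 4723:localhost:4723 root@localhost -p 32768
```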

Alternative #2: Cloud-based Lab as a Service

Another alternative is to simply use a cloud-based testing service. This typically involves adding credentials and API keys to your scripts, and paying for reserved devices up-front, which can get costly. What you get is hassle-free, somewhat constrained real devices that can be easily scaled as your development process evolves. Just keep in mind, aside from credentials, you want to carefully manage how much of your test code integrates custom commands and service calls that can’t easily be ported over to another provider later.

Alternative #3: Keep UI Testing on a Development Workstation

Finally, we could technically run all our tests on our development machine, or get someone else to run them, right? But this wouldn’t really translate to a CI environment and doesn’t take full advantage of the speed benefits of automation, neither of which helps us parallelize coding and testing activities. Testing on local workstations is important before checking in new tests to prove that they work reliably, but doesn’t make sense time-wise for running full test suites in continuous delivery/deployment.

Alternative #4: A Micro-lab for Every Developer

Now that we have a repeatable model for running Appium tests, we can scale that out to our team. Since running emulators on commodity hardware and open source software is relatively cheap, we can afford a “micro-lab” for each developer making code changes on our mobile app. The “lab” now looks something like this:

As someone who has worked in the testing and “lab as a service” industries, I can say there are definitely situations where some teams and organizations outgrow the “local lab” approach. Your IT/ops team might just not want to deal with per-developer hardware sprawl. You may not want to dedicate team members to be the maintainers of container/process configuration. And, while Appium is a fantastic technology, like any OSS project it often falls behind in supporting the latest devices and hardware-specific capabilities. Fingerprint support is a good example of this.

The Real Solution: Right { People, Process, Technology }

My opinion is that you should hire smart people (not one person) with a bit of grit and courage who “own” the process. When life (I mean Apple and Google) throws you curveballs, you need people who can quickly recover. If you’re paying for a service to help with some part of your process as a purely economic trade-off, do the math. If it works out, great! But this is also an example of “owning” your process.

Final thought: as more and more of your process becomes code, remember that code is a liability, not an asset. The less of it, the leaner your approach, generally the better.

More reading:

Using Appium to Test Fingerprint Authentication on Android Devices

In this article, I’ll show how you can use Appium to automate fingerprint authentication on Android mobile devices. The general process also applies to iOS, though specific implementation is not discussed here.

This is based on work I did in preparation for presenting at Mobile Tea Boston in June 2017. This example is just a small part of a broader conversation on automating quality across the delivery pipeline.

Git example: https://github.com/paulsbruce/FingerprintDemo

Fingerprint Security: Great for UX

The first question I asked was “why would we integrate fingerprint login functionality into our apps?” The short answer is “high security, low friction”. There are compelling use cases for fingerprint authentication.

Passwordless systems usually require people to use SMS or email to confirm login: high friction, IMO, for the user experience, and who wants to purposely push users out of their UX? That’s better security at the cost of a poor workflow.

Multi-factor authentication is another good use case. Using biometrics ensures that the unique identity of the individual is presented along with additional credentials.

Step-up authentication is another popular method of keeping the run-rate user experience frictionless, yet increasing protection over sensitive information and operations on a user’s account.

Fingerprint Security: Bad for Development Velocity

So for teams who want to implement fingerprint authentication into their mobile apps, this also means they need to automate tests that integrate fingerprint security. What does the test automation process look like?

In short, it’s a mess. Android libraries and the default UI test framework Espresso contain zero support for fingerprint automation. Since October 2015 with the release of Android 6.0 M, Google provides a standard API for integrating these features into mobile app code, but no way of automating it.

The same is true for Touch ID on iOS: though there are interactive ways to simulate fingerprint events when running XCTest suites in Xcode, there is no easy way to write an automated test that provides coverage over these workflows.

Without some other automation alternative, these portions of functionality fall prey to the ice-cream cone anti-pattern. What a pity.

Solution: Find the Right Framework

Espresso is fast because it runs directly alongside the main app code on the device. However, since the only way Google provided us to simulate fingerprint events is through ADB (i.e. ‘adb -e emu finger touch …’), this has to be run on the machine where Android tools are installed and where the device is connected.

Appium, an open source outgrowth of Selenium for mobile apps, is architected differently from Espresso and XCTest. Though often slower for this reason, it has some advantages too:

Instead of running directly on the device as a sibling process, Appium tests are executed from a server to which the devices are connected. This provides a context whereby we can inject device-specific commands against the device, in combination with the calls through the testing framework itself, to simulate the entire workflow on the device in one script.

An example of this can be found in my Github FingerprintDemo repo.

Because I want to write all my code and tests in the same IDE, I keep unit tests and Espresso tests as part of the normal conventions in the ‘app’ module, but I create a separate module called ‘appium’ that can be compiled as a separate jar artifact from the main APK. This keeps my Gradle dependencies for testing separate from my app and my build.gradle scripts clean and clear.

In short, it boils down to test code that looks like this:
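The sketch below is simplified from what the repo does, and the capability values, element IDs, and emulator image name are illustrative placeholders, not the repo’s actual code. The key move is mixing ordinary Appium calls with a fingerprint simulation (on emulators, the Appium Java client exposes a fingerPrint() helper that is equivalent to the ‘adb -e emu finger touch’ console command):

```java
import io.appium.java_client.android.AndroidDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import java.net.URL;

// Simplified E2E sketch -- values are illustrative; see the FingerprintDemo repo.
public class FingerprintLoginTest {

    public static void main(String[] args) throws Exception {
        DesiredCapabilities caps = new DesiredCapabilities();
        caps.setCapability("platformName", "Android");
        caps.setCapability("avd", "Nexus_6_API_24");          // illustrative emulator image
        caps.setCapability("app", "/path/to/app-debug.apk");  // illustrative artifact path

        AndroidDriver driver = new AndroidDriver(
                new URL("http://127.0.0.1:4723/wd/hub"), caps);
        try {
            // Drive the app to the fingerprint prompt with normal Appium calls...
            driver.findElementById("com.example.fingerprintdemo:id/login").click();

            // ...then simulate a registered finger touching the sensor.
            // Equivalent to the emulator console command: adb -e emu finger touch 1
            driver.fingerPrint(1);

            // Finally, assert the app reacted the way a real user would see it.
            assert driver.findElementById("com.example.fingerprintdemo:id/welcome")
                         .isDisplayed();
        } finally {
            driver.quit();
        }
    }
}
```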

Appium + fingerprint = depends on your lab

If you manage a very small local lab, you have the flexibility and control to execute whatever custom commands you need on your devices.

If you’ve graduated to using devices (emulators/simulators/real) in the cloud via some service like Firebase, Perfecto, or TestObject, then your ability to simulate fingerprint events reliably really depends on which one you’re using.

For instance, both Perfecto and TestObject provide SSH direct connections to devices, so in theory you could run custom ADB commands against them; Firebase and AWS Device Farm aren’t even close to having this capability.

In practice, these cloud services also provide automation endpoints and SDKs to execute these tasks reliably. Perfecto, for instance, has both DevTunnel direct access and scripted fingerprint simulation support in Appium.

Treat Code and Tests as Equal Citizens

Everyone should have access to app code AND test code. Period. Some large organizations often fear that this will leak proprietary secrets to offshore and out-of-cycle testing teams. That’s what contracts and proper repository permissions are for.

The benefit for modern teams is that test engineers have better visibility into the app, making test creation faster and initial root cause analysis of defects found faster. In my example, this is what the simplified IDE experience looks like:

Now that we can press the play button on A) our app, B) our unit and Espresso tests, and C) our E2E fingerprint Appium tests, everyone on the team has the option to make sure their changes don’t introduce negative impacts on all aspects of the user experience.

‘Works on My Machine’ Isn’t Good Enough

Test code applies first and foremost to the development experience, but also to the build system later on. In the case of including Appium tests in an Android project, this means we need to be keenly aware of the test infrastructure used to simulate fingerprint actions locally against emulators.

Expect that you will need to “productionize” this process to fit into the build process. By introducing a number of new moving parts (emulators, Appium, custom adb commands) we’ll also need to perpetuate that as a build stack.

I’m a Jenkins nerd, so what this means in terms of build infrastructure is that we need to create build nodes that contain the components necessary to run Appium tests in isolation of other processes as well. Emulators keep the solution device-independent and can simplify the test execution logistics, but only provide a very narrow slice of reality.

To integrate real devices into this mix, you either have to manage a local Appium grid (which again, is challenging) or write your tests to use a cloud lab solution. In the end, you’ll have to parameterize your tests along the following environment variables:

  • Appium server address
    • localhost for development workstations and Appium emulator stack in CI
    • Shared/cloud host for real devices
  • (if emulators)
    • emulator image (e.g. Nexus_6_API_24, etc.)
  • Device capabilities
    • Platform (Android/iOS)
    • Platform version
    • App (binaries) under test
    • (if shared/cloud) credentials or API keys
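As a sketch of what that parameterization might look like in the test harness, the helper below reads those values from environment variables; the variable names (APPIUM_SERVER, EMULATOR_IMAGE, LAB_API_KEY, etc.) and defaults are my own illustrations, not a standard:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: collect Appium connection settings from environment
// variables so the same test jar runs on a workstation, in CI, or against a
// cloud lab. Variable names and defaults here are assumptions.
class TestEnvironment {

    // Fall back to a local Appium server (dev workstation / CI emulator stack).
    static String serverUrl(Map<String, String> env) {
        return env.getOrDefault("APPIUM_SERVER", "http://127.0.0.1:4723/wd/hub");
    }

    // Build the capability map Appium needs to pick and connect to a device.
    static Map<String, String> capabilities(Map<String, String> env) {
        Map<String, String> caps = new HashMap<>();
        caps.put("platformName", env.getOrDefault("PLATFORM_NAME", "Android"));
        caps.put("platformVersion", env.getOrDefault("PLATFORM_VERSION", "7.0"));
        caps.put("app", env.getOrDefault("APP_PATH", "app-debug.apk"));
        // Only emulator runs need a specific image name (e.g. Nexus_6_API_24).
        if (env.containsKey("EMULATOR_IMAGE")) {
            caps.put("avd", env.get("EMULATOR_IMAGE"));
        }
        // Shared/cloud labs usually want credentials or an API key as well.
        if (env.containsKey("LAB_API_KEY")) {
            caps.put("securityToken", env.get("LAB_API_KEY"));
        }
        return caps;
    }
}
```

In development, none of the variables are set and everything defaults to the local emulator stack; CI or a cloud-lab job overrides them per run.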

Recap:

Since there’s no support for fingerprint simulation directly from Espresso, we have to rely on other test frameworks like Appium to cover these use cases. Really, the test architecture needs to fit the use case, and Appium provides us a way to mix test framework calls with native commands to other mobile tools. This requires us to introduce complexity carefully and plan for how that impacts our build-verification testing stack when triggered by continuous integration.

More reading:

Don’t Panic! (or how to prepare for IoT with a mature testing strategy)

Thanks everyone for coming to my talk today! Slides and more links are below.

As all my presentations are, this is meant to extend a dialog to you, so please tweet to me any thoughts and questions you have to become part of the conversation. Looking forward to hearing from you!

More links are in my slides, and the presenter notes are the narrative in case you had to miss part of the presentation.

AnDevCon: Espresso Is Awesome, But Where Are Its Edges?

For my presentation at AnDevCon SF 2016, I focused on how Espresso represents a fundamental change in how we approach the process of shipping software that provably works in a mobile ecosystem that is constantly changing.

The feedback was overwhelmingly good, many people who stopped by the Perfecto booth before or after our talk came to me to discuss topics I raised. In other words, it did what I wanted, which was to provide value and strike up conversation about how to improve the Android UI testing process.

If you’re pressed for time, my slides can be found below or at:
bit.ly/espresso-edges-andevcon

Why responsive web developers must learn Selenium

Developers need fluency in the technologies that the team uses for UI testing. Similarly, testing has become more a programmer’s job than a point-and-click task. As the world’s most widely-adopted web UI testing framework, Selenium is a must-know technology for responsive web development teams in order to maintain the pace of continuous delivery.

Mostly Working, Somewhat Broken ‘Builds’

I use the term ‘build’ loosely here when it comes to web apps, since it’s often more like packaging (particularly with Docker containers), but the idea is the same. You have code in a repo, which gets sucked in by your CI system, built/packaged, tested/validated, then deployed somewhere (temp stack or whatnot).

If the build succeeds but tests fail, you need to quickly assess if it’s the test’s fault or the new build. The only way to do that quickly is to maintain proficiency in technologies and practices used to validate the build.

Code in production is where money is made; broken code doesn’t make any money, it just makes people move on. Everyone is responsible for the result.

We have a QA person. That’s their job.

Not so fast. In continuous delivery teams, developers don’t have the luxury of leaving it to someone else. When build verification tests (BVT) fail, a developer needs to track down the source of the problem which typically means parsing the test failure logs, referring to the code and the context/data used for the test, then making some adjustment and re-running the tests.

Though your test engineer can fix the test, you still have to wait for a developer to fix problems in code. Why not make them the same person?

Of course I’m not suggesting that every developer write ALL their own UI/UX tests; there’s a division-of-labor and skills-match decision here which each team needs to make for themselves. I also think in larger teams it’s important to have separation of concerns between those who create and those who validate.


How can web developers find time for UI testing?

The answer is simple: do less. That’s right, prioritize work. Include a simple test or two in your estimation for a feature. Put another way, do more of the right things.

You may get push-back for this. Explain that you’re purposely improving your ability to turn things around faster and become more transparent about how long something really takes. Product owners and QA will get it.

Now that you have a sliver of breathing room, learn how to write a basic Selenium script over what you just created. Now you have a deliverable that can be included in CI, act as the basis for additional validation like load testing and production monitoring, and you look very good doing it.

You can do it!

Obviously, you can go to the main site, but a quick way for those with Node already installed is through WebDriver:
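A first script with the selenium-webdriver package might look like the following sketch; the target URL and the element being waited on are placeholders:

```javascript
// npm install selenium-webdriver
// Minimal first Selenium script -- site URL and selector are placeholders.
const { Builder, By, until } = require('selenium-webdriver');

(async function firstTest() {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://example.com/');
    // Wait for something meaningful to render, then inspect it.
    await driver.wait(until.elementLocated(By.css('h1')), 5000);
    console.log('Page title is:', await driver.getTitle());
  } finally {
    await driver.quit();
  }
})();
```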

[full tutorial here]

For those who have never worked with Selenium before, start with this tutorial. Then go through the above.

Even if you aren’t a Node.js fan, it’s easy to grasp the concepts. If you want to go the Java or C# route, there are plenty of tutorials for that too.

More reading:

 

What Can You Do Without an Office?

Meme == fun! Make one yourself! Tag #SummerOfSelenium

I rarely go to the beach. When I do, I like to do what I like to do: surprise, tech stuff. We all relax in different ways, no judgement here. We also all need to work flexibly, so I’m also busy with a new co-working space for my locals.

In a back-and-forth with a colleague of mine, we started to talk about strange places where we find ourselves writing Selenium scripts. All sorts of weird things have been happening in mobile this summer, Pokemon Go (which I call pokenomics) for example, and Eran was busy creating a cross-platform test script for his upcoming webinar that tests their download and installation process.

At work, we’re doing this #SummerOfSelenium thing, and I thought that it would be cool to start a meme competition themed around summer-time automated testing from strange places. Fun times, or at least a distraction from the 7th circle of XPath hell that we’re still regularly subjected to via test frameworks.

If you want to build your own meme, use the following links…


Reply to my tweet with your image and I’ll figure out a way to get our marketing team to send you some schwag. ;D


We think we’re funny at least.


Side-note: co-working spaces are really important. As higher-paid urban jobs overtake local employment in technical careers, we need to respond to the demand for work-life balance and encourage businesses to do the same. Co-working spaces create economic stickiness and foster creativity through social engagement. My thoughts + a local survey are here, in case you want to learn more. A research area of mine and one I’ll be speaking on in the next year.

How do you test the Internet of Things?

If we think traditional software is hard, just wait until all the ugly details of the physical world start to pollute our perfect digital platforms.

What is the IoT?

The Internet of Things (IoT) is a global network of digital devices that exchange data with each other and cloud systems. I’m not Wikipedia, and I’m not a history book, so I’ll just skip past some things in this definitions section.

Where is the IoT?

It’s everywhere, not just in high-tech houses. Internet providers handing out new cable modems that act as their own WiFi hotspots have created a new “backbone” for these devices to connect over, in almost every urban neighborhood now.

Enter the Mind of an IoT Tester

How far back should we go? How long do you have? I’ll keep it short: the simpler the system, the less there is to test. Now ponder the staggering complexity of the low-cost Raspberry Pi. Multiply that by the number of humans on Earth who like to tinker, educated or not, throw in some APIs and ubiquitous internet access for fun, and now we have a landscape, a view of the magnitude of possibility that the IoT represents. It’s a huge amount of worry for me personally.

Compositionality as a Design Constraint

Good designers will often put constraints in their own way purposely to act as a sort of scaffolding for their traversal of a problem space. Only three colors, no synthetic materials, exactly 12 kilos, can I use it without tutorials, fewer materials. Sometimes the unyielding makes you yield in places you wouldn’t otherwise, flex muscles you normally don’t, reach farther.

IoT puts compositionality right up in our faces, just like APIs, but with hardware and in ways that are both very physical and personal. It forces us to consider how things will be combined in the wild. For testing, this is the nightmare scenario.

Dr. Strangetest, or How I Learned to Stop Worrying and Accept the IoT

The only way out of this conundrum is in the design. You need to design things to very discrete specifications and target very focused scenarios. It moves the matter of quality up a bit into the space of orchestration testing, which by definition is scenario based. Lots of little things are easy to prove working independent of each other, but once you do that, the next challenges lie in the realm of how you intend to use it. Therein lies both the known and unknown, the business cases and the business risks.

If you code or build, find someone else to test it too

As a developer, I can always pick up a device I just flashed with my new code, try it out, and prove that it works. Sort of. It sounds quick, but rarely is. There’s lots of plugging and unplugging, uploading, waiting, debugging, and fiddling with things to get them to just work. I get sick of it all; I just want things to work. And when they finally *do* work, I move on quickly.

If I’m the one building something to work a certain way, I have a sort of programming myopia, where I only ever want it to work. Confirmation bias.

What do experts say?

I’m re-reading Code Complete by Steve McConnell, written more than 20 years ago now, eons in the digital age. Section 22.1:

“Testing requires you to assume that you’ll find errors in your code. If you assume you won’t, you probably won’t.”

“You must hope to find errors in your code. Such hope might feel like an unnatural act, but you should hope that it’s you who find the errors and not someone else.”

True that, for code, for IoT devices, and for life.

[Talk] API Strategy: The Next Generation

I took the mic at APIStrat Austin 2015 last week.

A few weeks back, Kin Lane (sup) emailed and asked if I could fill in a spot, talk about something that was not all corporate slides. After being declined two weeks before that and practically interrogating Mark Boyd when he graciously called me to tell me that my talk wasn’t accepted, I was like “haal no!” (in my head) as I wrote back “haal yes” because duh.

I don’t really know if it was apparent during, but I didn’t practice. Last year at APIStrat Chicago, I practiced my 15 minute talk for about three weeks before. At APIdays Mediterranea in May I used a fallback notebook and someone tweeted that using notes is bullshit. Touché, though some of us keep our instincts in check with self-deprecation and self-doubt. Point taken: don’t open your mouth unless you know something deep enough where you absolutely must share it.

I don’t use notes anymore. I live what I talk about. I talk about what I live. APIs.

I live with two crazy people and a superhuman. It’s kind of weird. My children are young and creative, my wife and I do whatever we can to feed them. So when some asshole single developer tries to tell me that they know more about how to build something amazing with their bare hands, I’m like “psh, please, do you have kids?” (again, in my head).

Children are literally the only way our race carries on. You want to tell me how to carry on about APIs, let me see how much brain-power for API design nuance you have left after a toddler carries on in your left ear for over an hour.

My life is basically APIs + Kids + Philanthropy + Sleep.

That’s where my talk at APIstrat came from. Me. For those who don’t follow, imagine that you’ve committed to a long-term project for how to make everyone’s life a little easier by contributing good people to the world, people with hearts and minds at least slightly better than your own. Hi.

It was a testing and monitoring track, so for people coming to see bullet lists of the latest ways to ignore important characteristics and system behaviors that only come from working closely with a distributed system, it may have been disappointing. But based on the number of conversations afterwards, I don’t think that’s what happened for most of the audience. My message was:

Metrics <= implementation <= design <= team <= people

If you don’t get people right, you’re doomed to deal with overly complicated metrics from dysfunctional systems born of hasty design by scattered teams of ineffective people.

My one piece of advice: consider that each person you work with when designing things was also once a child, and like you, has developed their own form of learning. Learn from them, and they will learn from you.

 

Don’t Insult Technical Professionals

Some vendors look at analyst reports on API testing and all they see is dollar signs. Yes, API testing and virtualization have blown up over the past 5 years, and that’s why some companies who were first to the game have the lead. Lead position comes from sweat and tears; that’s how leaders catch analysts’ attention in the first place: those who created the API testing industry, earned the community’s and analysts’ attention, and have the most comprehensive products win. Every time.

There are snakes in the grass no matter what field you’re in

I recently had the opportunity to informally socialize with a number of “competitors”, who as people are great people to eat tacos and burn airport wait time with. Unfortunately, their scrappy position in the market pushes them to do things that you can only expect from lawyers and loan sharks. They say they’re about one thing in person, but their press releases and website copy betray their willingness to lie, cheat, and deceive actual people trying to get real things done.

In other words, some vendors proselytize about “API testing” without solid product to back up their claims.

I don’t like lying, and neither do you

One of my current job responsibilities is to make sure that the story my employer tells around its products accurately portray the capabilities of those products, because if they don’t, real people (i.e. developers, testers, engineers, “implementers”) will find out quickly and not only not become customers, but in the worst cases tell others that the story is not true. Real people doing real things is my litmus test, not analysts, not some theoretical BS meter.

Speaking of BS meter, a somewhat recent report lumped API “testing” with “virtualization” to produce a pie chart that disproportionately compares vendors’ market share, both by combining these two semi-related topics and by measuring share by revenue reported by the vendors. When analysts ask for things like revenue in a particular field, they generally don’t just leave the answer solely up to the vendor; they do some basic research on their own to prove that the revenue reported is an accurate reflection of the product(s) directly relating to the nature of the report. After pondering this report for months, I’m not entirely sure that the combination of the “testing” and “virtualization” markets is anything but a blatant buy-off by one or two of the vendors involved to fake dominance in both areas where there is none. Money, meet influence.

I can’t prove it, but I can easily prove when you’ve left a rotting fish in the back seat of my car simply by smelling it.

What this means for API testing

It means watch out for BS. Watch really closely. The way that some companies use “API testing” (especially in Google Ads) is unfounded in their actual product capabilities. What they mean by “testing” is not what you know is necessary to ship great software. Every time I see those kinds of vendors say “we do API testing”, which is an insult to actual API testing, I seriously worry that they’re selling developers the illusion of having sufficient testing over their APIs when in reality it’s not even close.

Why your API matters to me

On the off-chance that I actually use it, I want your API to have been tested more than what a developer using a half-ass “testing” tool from a fledgling vendor can cover. I want you to write solid code, prove that it’s solid, and present me with a solid solution to my problem. I also want you to have fun doing that.

The API vendor ecosystem is not what it seems from the outside. If you have questions, I have honesty. You can’t say that about too many other players. Let’s talk if you need an accurate read on an analyst report or vendor statement.