A Jenkins Pipeline for Mobile UI Testing with Appium and Docker

In theory, a completely Docker-ized version of an Appium mobile UI test stack sounds great. In practice, however, it’s not that simple. This article explains how to structure a mobile app pipeline using Jenkins, Docker, and Appium.

TL;DR: The Goal Is Fast Feedback on Code Changes

When we make changes, even small ones, to our codebase, we want to prove that they had no negative impact on the user experience. How do we do this? We test…but manual testing takes time and is error prone, so we write automated unit and functional tests that run quickly and consistently. Duh.

As Uncle Bob Martin puts it, responsible developers not only write code that works, they provide proof that their code works. Automated tests FTW, right?

Not quite. There are a number of challenges with test automation that raise the bar of complexity before tests can successfully provide us this feedback. For example:

  • How much of the code and its branches actually gets covered by our tests?
  • How often do tests fail for reasons other than the code not working?
  • How accurate was our implementation of the test case and criteria as code?
  • Which tests do we absolutely need to run, and which can we skip?
  • How fast can and must these tests run to meet our development cadence?

Jenkins Pipeline to the Rescue…Not So Fast!

Once we identify what kind of feedback we need and match that to our development cadence, it’s time to start writing tests, yes? Well, that’s only part of the process. We still need a reliable way to build/test/package our apps. The more automated this is, the faster we get the feedback. A pipeline view of the process begins with code changes and includes building, testing, and packaging the app so that we always have a ‘green’ version of our app.

Many teams choose a code-over-configuration approach. The app is code, the tests are code, server setup (via Puppet/Chef and Docker) is code, and, not surprisingly, our delivery process is now code too. Everything is code, which lets us extend SCM virtues (versioning, auditing, safe merging, rollback, etc.) to our entire software lifecycle.

Below is an example of ‘process-as-code’ in a Jenkins Pipeline script. When a build project is triggered, say when someone pushes code to the repo, Jenkins will execute this script, usually on a build agent. The code gets pulled, the project dependencies get refreshed, a debug version of the app and tests are built, then the unit and UI tests run.
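It looks something like this simplified sketch (Gradle task names and the emulator boot-wait helper are illustrative):

node {
    stage('Checkout') {
        checkout scm
    }
    stage('Dependencies') {
        sh './gradlew androidDependencies'
    }
    stage('Build Debug') {
        sh './gradlew assembleDebug assembleDebugAndroidTest'
    }
    stage('Unit Tests') {
        sh './gradlew testDebugUnitTest'
    }
    stage('Instrumented Tests') {
        // The messy part: boot an emulator, wait for it, test, tear down
        sh 'emulator -avd Nexus_6_API_24 -no-audio -no-window &'
        sh './scripts/wait-for-emulator.sh' // hypothetical boot-wait helper
        sh './gradlew connectedDebugAndroidTest'
        sh 'adb emu kill'
    }
}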

Notice that last step? The ‘Instrumented Tests’ stage is where we run our UI tests, in this case our Espresso test suite on an Android emulator. The sharp spike in code complexity there, my own scripting abilities notwithstanding, reflects reality. I’ve seen a lot of real-world build/test scripts, and they reflect the number of hacks and tweaks that gather around the technologically significant boundary between test sessions and real device hardware.

A great walkthrough on how to set up a Jenkinsfile to do some of the nasty business of managing emulator lifecycles can be found on Philosophical Hacker…you know, for light reading on the weekend.

Building a Homegrown UI Test Stack: Virtual Insanity

We have lots of great technologies at our disposal. In theory, we could use Docker, the Android SDK, Espresso, and Appium to build reusable, dynamic nodes that can build, test, and package our app dynamically.

Unfortunately, in practice, the user interface portion of our app requires hardware resources that this stack simply can’t deliver in a timely manner. Interactive user sessions are a lot of overhead, even virtualized, and virtualization is never perfect.

Docker runs under either HyperKit (a lightweight virtualization layer on Mac) or within a VirtualBox host, but neither of these solutions supports nested virtualization, and neither can pass raw access to the host machine’s VT-x instruction set through to containers.

What’s left for containers is a virtualized CPU that doesn’t support the basic features the Android emulator needs to use the host GPU, requiring us to run QEMU with ARM images instead of native x86/x64 AVD-based images. This makes timely spin-up and execution of Appium tests so slow that it renders the solution infeasible.

Alternative #1: Containerized Appium w/ Connection to ADB Device Host

Since we can’t feasibly keep emulation in the same container as the Jenkins build node, we need to split out the emulators to host-level hardware assisted virtualization. This approach also has the added benefit of reducing the dependencies and compound issues that can occur in a single container running the whole stack, making process issues easier to pinpoint if/when they arise.

So what we’ve done is decoupled our “test lab” components from our Jenkins build node into a hardware+software stack that can be “easily” replicated:

Unfortunately, we can no longer keep our Appium server in a Docker container (which would make the process reliable, consistent across the team, and minimize cowboy configuration issues). Even after you:

  • Run the Appium container in privileged mode
  • Mount volumes to pass build artifacts around
  • Establish an SSH tunnel from container to host to use host ADB devices
  • Establish a reverse SSH tunnel from host to container to connect to Appium
  • Manage and exchange keys for SSH and Appium credentials

…you still end up dealing with flaky container-to-host connectivity and bizarre Appium errors that don’t occur if you simply run Appium server on bare metal. Reliable infrastructure is a hard requirement, and the more complexity we add to the stack, the more (often) things go sideways. Sad but true.

Alternative #2: Cloud-based Lab as a Service

Another alternative is to simply use a cloud-based testing service. This typically involves adding credentials and API keys to your scripts and paying for reserved devices up-front, which can get costly. What you get is hassle-free, somewhat constrained real devices that can be easily scaled as your development process evolves. Just keep in mind, aside from credentials, you want to carefully manage how much of your test code integrates custom commands and service calls that can’t easily be ported over to another provider later.

Alternative #3: Keep UI Testing on a Development Workstation

Finally, we could technically run all our tests on our development machine, or get someone else to run them, right? But this wouldn’t really translate to a CI environment and doesn’t take full advantage of the speed benefits of automation, neither of which helps us parallelize coding and testing activities. Testing on local workstations is important before checking in new tests to prove that they work reliably, but doesn’t make sense time-wise for running full test suites in continuous delivery/deployment.

Alternative #4: A Micro-lab for Every Developer

Now that we have a repeatable model for running Appium tests, we can scale that out to our team. Since running emulators on commodity hardware and open source software is relatively cheap, we can afford a “micro-lab” for each developer making code changes on our mobile app. The “lab” now looks something like this:

As someone who has worked in the testing and “lab as a service” industries, I can say there are definitely situations where teams and organizations outgrow the “local lab” approach. Your IT/ops team might just not want to deal with per-developer hardware sprawl. You may not want to dedicate team members to be the maintainers of container/process configuration. And, while Appium is a fantastic technology, like any OSS project it often falls behind in supporting the latest devices and hardware-specific capabilities. Fingerprint support is a good example of this.

The Real Solution: Right { People, Process, Technology }

My opinion is that you should hire smart people (not one person) with a bit of grit and courage who “own” the process. When life (I mean Apple and Google) throws you curveballs, you need people who can quickly recover. If you’re paying for a service to help with some part of your process as a purely economic trade-off, do the math. If it works out, great! But this is also an example of “owning” your process.

Final thought: as more and more of your process becomes code, remember that code is a liability, not an asset. The less of it, and the leaner your approach, generally the better.


Using Appium to Test Fingerprint Authentication on Android Devices

In this article, I’ll show how you can use Appium to automate fingerprint authentication on Android mobile devices. The general process also applies to iOS, though specific implementation is not discussed here.

This is based on work I did in preparation for presenting at Mobile Tea Boston in June 2017. This example is just a small part of a broader conversation on automating quality across the delivery pipeline.

Git example: https://github.com/paulsbruce/FingerprintDemo

Fingerprint Security: Great for UX

The first question I asked was “why would we integrate fingerprint login functionality into our apps?” The short answer is “high security, low friction”. There are compelling use cases for fingerprint authentication.

Passwordless systems usually require people to use SMS or email to confirm login…high friction, IMO, since it purposely forces users out of their experience. This is better security at the cost of a poor workflow.

Multi-factor authentication is another good use case. Using biometrics ensures that the unique identity of the individual is presented along with additional credentials.

Step-up authentication is another popular method of keeping the run-rate user experience frictionless, yet increasing protection over sensitive information and operations on a user’s account.

Fingerprint Security: Bad for Development Velocity

So for teams who want to build fingerprint authentication into their mobile apps, this also means they need to automate tests that exercise fingerprint security. What does the test automation process look like?

In short, it’s a mess. Android libraries and the default UI test framework, Espresso, contain zero support for fingerprint automation. Since the release of Android 6.0 (M) in October 2015, Google has provided a standard API for integrating these features into mobile app code, but no way of automating them.

The same is true for Touch ID on iOS: though there are interactive ways to simulate fingerprint events when running XCTest suites in Xcode, there is no easy way of writing an automated test that can provide coverage over these workflows.

Without some other automation alternative, these portions of functionality fall prey to the ice-cream cone anti-pattern. What a pity.

Solution: Find the Right Framework

Espresso is fast because it runs directly alongside the main app code on the device. However, since the only way Google gives us to simulate fingerprint events is through ADB (i.e. ‘adb -e emu finger touch …’), this has to be run on the machine where the Android tools are installed and where the device is connected.
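For example, with a single emulator attached (‘-e’ targets it, and ‘1’ is the ID of an enrolled finger):

adb -e emu finger touch 1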

Appium, an open source outgrowth of Selenium for mobile apps, is architected differently from Espresso and XCTest. Though often slower for this reason, it has some advantages too:

Instead of running directly on the device as a sibling process, Appium tests are executed from a server to which the devices are connected. This provides a context whereby we can inject device-specific commands against the device, in combination with the calls through the testing framework itself, to simulate the entire workflow on the device in one script.

An example of this can be found in my Github FingerprintDemo repo.

Because I want to write all my code and tests in the same IDE, I keep unit tests and Espresso tests as part of the normal conventions in the ‘app’ module, but I create a separate module called ‘appium’ that can be compiled as a separate jar artifact from the main APK. This keeps my Gradle dependencies for testing separate from my app and my build.gradle scripts clean and clear.

In short, it boils down to test code that looks like this:
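Here’s a condensed sketch of the idea (element IDs are illustrative; see the repo above for the full version):

// Drive the app to the fingerprint prompt through Appium,
// then fire the fingerprint event through ADB on the same machine.
@Test
public void authenticatesWithFingerprint() throws Exception {
    driver.findElementById("fingerprint_login_button").click();

    // Simulate a touch from enrolled finger 1 on the emulator
    Runtime.getRuntime().exec(new String[]{"adb", "-e", "emu", "finger", "touch", "1"}).waitFor();

    // Verify the app responded to the authentication event
    Assert.assertTrue(driver.findElementById("welcome_message").isDisplayed());
}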

Appium + fingerprint = depends on your lab

If you manage a very small local lab, you have the control to execute whatever custom commands you need on your devices.

If you’ve graduated to using devices (emulators/simulators/real) in the cloud via some service like Firebase, Perfecto, or TestObject, then your ability to simulate fingerprint events reliably really depends on which one you’re using.

For instance, both Perfecto and TestObject provide SSH direct connections to devices, so in theory you could run custom ADB commands against them; Firebase and AWS Device Farm aren’t even close to having this capability.

In practice, these cloud services also provide automation endpoints and SDKs to execute these tasks reliably. Perfecto, for instance, has both DevTunnel direct access and scripted fingerprint simulation support in Appium.

Treat Code and Tests as Equal Citizens

Everyone should have access to app code AND test code. Period. Some large organizations often fear that this will leak proprietary secrets to offshore and out-of-cycle testing teams. That’s what contracts and proper repository permissions are for.

The benefit for modern teams is that test engineers have better visibility into the app, making test creation faster and initial root cause analysis of defects found faster. In my example, this is what the simplified IDE experience looks like:

Now that we can press the play button on A) our app, B) our unit and Espresso tests, and C) our E2E fingerprint Appium tests, everyone on the team can make sure their changes don’t negatively impact any aspect of the user experience.

‘Works on My Machine’ Isn’t Good Enough

Test code applies first and foremost to the development experience, but also to the build system later on. In the case of including Appium tests in an Android project, this means we need to be keenly aware of the test infrastructure used to simulate fingerprint actions locally against emulators.

Expect that you will need to “productionize” this process to fit it into the build process. By introducing a number of new moving parts (emulators, Appium, custom ADB commands), we’ll also need to replicate all of that in our build stack.

I’m a Jenkins nerd, so what this means in terms of build infrastructure is that we need to create build nodes that contain the components necessary to run Appium tests in isolation from other processes as well. Emulators keep the solution device-independent and can simplify the test execution logistics, but they only provide a very narrow slice of reality.

To integrate real devices into this mix, you either have to manage a local Appium grid (which, again, is challenging) or write your tests to use a cloud lab solution. In the end, you’ll have to parameterize your tests along the following environment variables (a setup sketch follows the list):

  • Appium server address
    • localhost for development workstations and Appium emulator stack in CI
    • Shared/cloud host for real devices
  • (if emulators)
    • emulator image (e.g. Nexus_6_API_24, etc.)
  • Device capabilities
    • Platform (Android/iOS)
    • Platform version
    • App (binaries) under test
    • (if shared/cloud) credentials or API keys
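
A sketch of what that setup might look like in code (environment variable names here are illustrative):

import io.appium.java_client.android.AndroidDriver;
import java.net.URL;
import org.junit.Before;
import org.openqa.selenium.remote.DesiredCapabilities;

public class AppiumTestBase {
    protected AndroidDriver driver;

    @Before
    public void setUp() throws Exception {
        // Fall back to a local Appium server unless the environment says otherwise
        String server = System.getenv().getOrDefault("APPIUM_SERVER", "http://localhost:4723/wd/hub");
        DesiredCapabilities caps = new DesiredCapabilities();
        caps.setCapability("platformName", System.getenv().getOrDefault("PLATFORM_NAME", "Android"));
        caps.setCapability("platformVersion", System.getenv().getOrDefault("PLATFORM_VERSION", "7.0"));
        caps.setCapability("app", System.getenv("APP_PATH")); // binaries under test
        // Shared/cloud labs typically also require credentials or an API key here
        driver = new AndroidDriver(new URL(server), caps);
    }
}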

Recap:

Since there’s no support for fingerprint simulation directly from Espresso, we have to rely on other test frameworks like Appium to cover these use cases. Really, the test architecture needs to fit the use case, and Appium provides us a way to mix test framework calls with native commands to other mobile tools. This requires us to introduce complexity carefully and to plan for how it impacts our build-verification testing stack when triggered by continuous integration.


Android Debug Features and Tools chat with Sam Edwards

Developing Android apps can be hard…without the right tools and patterns in your back pocket.

I had a chance to sit with Sam Edwards (@handstandsam) to talk about some of the work that went into his presentation at Droidcon Boston last week. There are a bunch of tools outside the out-of-the-box Android stack that I wasn’t aware of (see below), but Sam quickly educated us. His full talk will be available in a few weeks once the conference dust settles.

Improving the Espresso Testing Landscape

Afterwards, Sam and I went for a stroll down Tremont and discussed some patterns people were applying to simplify writing Espresso tests.

Shauvik Roy Choudhary, whom I met at last year’s Capital One Android Summit, has produced some great work around Testing Robots (slides and code). This is not to be confused with Jake Wharton’s work on (also named) Testing Robots, equally awesome and worth your time reviewing if you haven’t already. Shauvik also produced Barista, which you can find in the app store.

Related note: in recent StackOverflow work, I found out about another cool project called Barista, a library of wicked helpful functions for Espresso testing by Rafa Vázquez, Roc Boronat, and Sergi Martínez.

Great to see people sharing their experiences and lessons learned so we don’t have to toil on them ourselves.

Android debug and design tools that Sam covered:


Diving into React Native on Cloud Devices

I’ve heard that React Native is really cool. I’ve heard it can help change your delivery, team, and hiring strategy. I’ve also heard its toolchain is immature and that it’s not ‘enterprise friendly’.

Extraordinary claims require extraordinary proof, so I decided to put it through the paces I see enterprises require every day. Namely, how do you use a cloud lab in situations where you’re debugging a critical incident found in production?

Since I have a variety of real mobile devices at my disposal in the Perfecto cloud, let’s see how quick it is to connect React Native to one of them!

Side question: what was the last physical mobile device you needed to debug an issue on a specific platform, carrier, or form factor?
Click here to tweet me your answer!

Running React Native Code on a Real Device

Sitting in the back row of a local meetup, I quickly installed the requisites on my MacBook, launched a Perfecto device, and was up and running. Like all bootstrap activities, this was flawless. Then I ran the usual ‘react-native run-android’:

First snag: unlike Android Studio, the magic dust that ships with React Native to automate the Gradle build and deploy process was lacking the ‘-s’ argument, which of course failed the build process. The maturity of React tooling is a side topic, but all we need is to amend that parameter with a device serial number.

Listing the devices, we see that my cloud device correctly registers in ADB:
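adb devices
List of devices attached
localhost:7100	device

(The serial shown here is illustrative; the cloud device is exposed to local ADB over a tunnel.)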

After copying the ADB command and rerunning with the -s argument added, the app ran, but with some debugging connectivity issues.

Debugging React Native on a Different Network

What this message is telling us is that stunnel is configured to allow our computer to see the device, but not the other way around.

Since React Native debugging needs to load JavaScript hosted by your workstation, we’ll need to point the debugger on the device to an address that resolves to your workstation. I use ngrok for this.

./ngrok http 8081

This produces a dynamic hostname that will forward all incoming traffic on port 8081 down to my local workstation where the React server is running.

To point the React Native developer tools on the device at the right debugging server address, I simulate a device shake by sending a keypress 82 command, then navigate to ‘Dev settings’ and ‘Debug server host & port…’.
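That keypress can be sent over ADB (key event 82 is the Android menu key, which opens the developer menu):

adb shell input keyevent 82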


And voila! A React Native app on a real Samsung Galaxy S7 device hosted in the Perfecto cloud running in debug mode!

Could this be easier? Sure, if React Native debugging used ADB for all debugging interactions, but that would mean a lot of re-architecting on their part. The simple fact is that React Native debugging requires HTTP access to the developer workstation, so ngrok is pretty much necessary given their tooling.

Next steps:

You could automate the IP configuration in AppDelegate.m like this walkthrough does if you want to.

You could probably even grep the dynamic host name in a shell command and write it dynamically to that file before React Native deployment to the device. But that would be another article.


Locale bugs & currency formatting in Android Studio

I grew up in the United States, and there, like in most countries I’ve traveled to, I’m used to seeing money that looks like this:   $56.33   €12,50   £281.71

A modern IDE tells you when you’re ignorant

Today, Android Studio was kind enough to tell me that some sample code had a problem: “Implicitly using the default locale is a common source of bugs”. Nice.

Looking at this sample code, I asked myself: “Why should the UI format this number, isn’t that going to hide what the underlying application logic is saying?”

So it’s a great thing that Android Studio shows little messages like this, because it got me thinking about how to properly handle currency in the view layer.

Simply changing your app code won’t save you!

This is a case where view-layer formatting isn’t the appropriate place to deal with rounding. The business logic of this application, the algorithm that calculates tips, should ultimately be in charge of rounding to the locale-appropriate currency decimal place; in Java this means using the BigDecimal class for manipulating values. So I forked my own copy and updated the sample to remove the view formatting code.
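A minimal sketch of the idea (method name and amounts are illustrative):

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;
import java.util.Locale;

// Round a computed total to the current locale's currency precision
static BigDecimal roundForCurrency(BigDecimal total) {
    int digits = Currency.getInstance(Locale.getDefault()).getDefaultFractionDigits(); // 2 for USD, 3 for IQD
    return total.setScale(digits, RoundingMode.HALF_UP);
}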

Then I changed the locale on my device, and my Espresso tests started breaking.

See, when you write your format codes and test logic under the narrow mindset of one locale/language/currency (en_US), the test data you use can break your tests if the app is run under a different locale (ar-IQ) which formats things like currencies (USD, IQD) to a different decimal precision (the dollar uses 2 decimal places, the dinar uses 3). [locale/currency lookup table]

An example of an Espresso test written assuming western currencies can be found below. The dinar has three decimal places, so this particular sample fails because values are handled as doubles (floating point numbers). Without controller logic to deal with rounding, the total comes out as 35.912, which is not the same as the 2-decimal value “35.91” in my test code.
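Roughly like this (view IDs and amounts are illustrative):

@Test
public void totalRoundsToTwoDecimals() {
    onView(withId(R.id.billAmount)).perform(typeText("31.23"), closeSoftKeyboard());
    onView(withId(R.id.calculateButton)).perform(click());
    // Passes under en_US; under ar-IQ the view renders three decimals
    // (e.g. 35.912), so this exact-text match fails.
    onView(withId(R.id.totalAmount)).check(matches(withText("35.91")));
}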

To simplify things, much of the sample code passes currency around as doubles, even though this isn’t best practice. Still, so long as we use BigDecimal to handle the higher-order calculations, we can downgrade the decimal precision in the double values sent out to the view layer. Then we have the option of using locale-accurate test data, managing the precision in our tests as well.

Check it out for yourself…

If you want to spin these examples up yourself, you can clone my repo. Also, if you want to see this work in continuous integration, check out my article on running a Jenkins / Android build server on Docker.


Jenkins on Docker to build Android apps

For a reference example, I had to set up Jenkins to build my Android app. Though I’m using a Mac, once Docker is involved, I can use the exact same steps on my Windows machine too, and so can you.

  1. Install Docker
  2. Use my existing Docker setup files
    1. Grab the contents of this GitHub folder to your machine
    2. Change directory to where you saved the above contents
    3. Build the Docker files to create images. Run something like this (the image tag is illustrative):

       docker build -t jenkins-android .

    4. Run the freshly minted Docker image as a new container:

       docker run -p 8080:8080 -p 50000:50000 jenkins-android

  3. In a browser, log in to your Jenkins instance at http://127.0.0.1:8080
  4. Complete the initial Jenkins setup by walking through the on-screen prompts

Many thanks to Sha, who wrote this article that quickly highlights the steps for getting Jenkins 2.0 running on Docker. All I added to my Dockerfile were steps to install the Android SDK so that Jenkins can build my app.


AnDevCon: Espresso Is Awesome, But Where Are Its Edges?

For my presentation at AnDevCon SF 2016, I focused on how Espresso represents a fundamental change in how we approach the process of shipping software that provably works in a mobile ecosystem that is constantly changing.

The feedback was overwhelmingly good; many people who stopped by the Perfecto booth before or after our talk came to me to discuss topics I raised. In other words, it did what I wanted, which was to provide value and strike up conversations about how to improve the Android UI testing process.

If you’re pressed for time, my slides can be found below or at:
bit.ly/espresso-edges-andevcon

Android Studio: How to find the Package Name from APK

If you’ve been given an APK file but don’t know the ApplicationId/package name, you can use ‘aapt’ (the Android Asset Packaging Tool) to obtain this value. What you’ll need:

  • Your Android SDK install directory path
    • Mac: /Users/yourusername/Library/Android/sdk
    • Win: C:\Program Files\Android\SDK
  • Your APK file (example here)

You’ll have to navigate down from the SDK folder into ‘build-tools’ and then into your version number folder to find the ‘aapt’ tool. From that directory, you can run:

aapt dump badging <path to your apk>

This will dump the details of your app manifest, the first line of which is your package name (a.k.a. the ApplicationId in a build.gradle file).
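For example, the first line of the badging output looks something like this (values illustrative):

package: name='com.example.myapp' versionCode='1' versionName='1.0'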

Additionally, you can get a complete listing of the manifest by using:

aapt list -a <path to your apk>

Why Do I Need the Application ID / Package Name?

For various reasons, this identifier is important. For instance, debugging with the ADB command tools or launching an existing app requires this ID.

In my case, I needed to know what to add to a Perfecto test for the Espresso Execute test step:

I have to add the “.test” suffix to the package name coming from aapt because I need to tell the Perfecto Espresso executor to run tests.


Why Espresso: Unit vs. UI testing

This article differentiates unit tests, such as those written for jUnit, from UI tests in Espresso through both purpose and technical value.

What is Espresso?

Espresso is an automated UI testing framework for Android apps. Espresso tests are scripts written in Java that simulate interactions with the app while it is running, either in an emulated environment or on a physical device.

Espresso tests are “instrumented”, which means that internal workings (context) of the app, such as object names, runtime variables, and other symbolic information, are made available to the tests. The Android Debug Bridge (ADB) provides runtime feedback between tools like Android Studio and the app as test activities are executed on the target device.
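To make that concrete, a minimal Espresso test looks something like this (identifiers are illustrative):

import static android.support.test.espresso.Espresso.onView;
import static android.support.test.espresso.action.ViewActions.click;
import static android.support.test.espresso.action.ViewActions.closeSoftKeyboard;
import static android.support.test.espresso.action.ViewActions.typeText;
import static android.support.test.espresso.assertion.ViewAssertions.matches;
import static android.support.test.espresso.matcher.ViewMatchers.isDisplayed;
import static android.support.test.espresso.matcher.ViewMatchers.withId;
import static android.support.test.espresso.matcher.ViewMatchers.withText;

@RunWith(AndroidJUnit4.class)
public class GreetingTest {
    @Rule
    public ActivityTestRule<MainActivity> activity = new ActivityTestRule<>(MainActivity.class);

    @Test
    public void greetsUserByName() {
        // Simulate real interactions against the running app
        onView(withId(R.id.name_field)).perform(typeText("Paul"), closeSoftKeyboard());
        onView(withId(R.id.greet_button)).perform(click());
        onView(withText("Hello Paul!")).check(matches(isDisplayed()));
    }
}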

How is Espresso different than Unit Testing?

Unit tests focus on small portions of code (i.e. class-level methods) and provide basic validation that code is working as expected. Espresso tests provide basic validation that the UI is working as expected.

Early feedback from lots of tiny unit tests on each build helps developers know when they just “broke” something by changing some other portion of code. Early means often, and often means that the speed and reliability of these tests are crucial.

To that end, unit tests are typically hermetic and rely on stubs/mocks to stand in for dependencies. By contrast, Espresso UI tests work through platform APIs, which require a runtime and device capabilities that are not faked. This provides more realistic feedback on code that might work at the unit level but fail when chained together or during basic usability validations.
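For contrast, a hermetic unit test exercises just the class under test, with no device or runtime involved (class name and values are illustrative):

import static org.junit.Assert.assertEquals;
import java.math.BigDecimal;
import org.junit.Test;

public class TipCalculatorTest {
    @Test
    public void addsFifteenPercentTip() {
        // No Android APIs, no I/O — fast and deterministic
        TipCalculator calc = new TipCalculator();
        assertEquals(new BigDecimal("35.91"), calc.totalWithTip(new BigDecimal("31.23"), 15));
    }
}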

When should we write Unit Tests?

Always. You can figure out for yourself what total percentage of lines of code should be covered by unit tests, but this is a battle against low-level technical debt. How often do you want to be surprised when a seemingly unrelated change breaks a piece of code because expectations about what that code does weren’t spelled out in a way that could be exercised regularly?

That’s really what validation testing boils down to: are we communicating basic expectations about the things we’re about to ship? Unit testing is just at a very low level, but the same applies at the UI workflow level too.

When should we write UI tests?

Whenever you have a UI…that’s pretty obvious, right? People use an app; they don’t call your class methods in isolation with static data on emulators. Eventually you have to get real: simulate clicks, drags, gestures, network conditions, and platform upgrades, because that’s how real people are using your user interface (a.k.a. your app).

How much UI testing you do is up to you, but it boils down to time cost. UI tests are often more complicated to write, though as we see with Espresso, a developer-focused syntax and fast execution speed go a long way toward reducing cultural friction to writing UI tests as part of “development complete”.

Sideline: API teams, you don’t get off so easily either. Your developer experience is your UI; the patterns by which your users call your endpoints, designed or otherwise, are the equivalent of workflows. Just as UI testing does for app developers, holistic API tests that simulate known usage trends and expectations on your API will help you isolate breaking changes earlier and faster in your build cycles. It’s that simple.