Using Appium to Test Fingerprint Authentication on Android Devices

In this article, I’ll show how you can use Appium to automate fingerprint authentication on Android mobile devices. The general process also applies to iOS, though specific implementation is not discussed here.

This is based on work I did in preparation for presenting at Mobile Tea Boston in June 2017. This example is just a small part of a broader conversation on automating quality across the delivery pipeline.

Git example: https://github.com/paulsbruce/FingerprintDemo

Fingerprint Security: Great for UX

The first question I asked was “why would we integrate fingerprint login functionality into our apps?” The short answer is “high security, low friction”. There are compelling use cases for fingerprint authentication.

Passwordless systems usually require people to use SMS or email to confirm login, which IMO adds a lot of friction to the user experience; who wants to push their users out of the app on purpose? That’s better security at the cost of a poor workflow.

Multi-factor authentication is another good use case. Using biometrics ensures that the unique identity of the individual is presented along with additional credentials.

Step-up authentication is another popular method of keeping the run-rate user experience frictionless, yet increasing protection over sensitive information and operations on a user’s account.

Fingerprint Security: Bad for Development Velocity

So for teams who want to implement fingerprint authentication in their mobile apps, this also means automating tests that exercise fingerprint security. What does that test automation process look like?

In short, it’s a mess. Android libraries and the default UI test framework Espresso contain zero support for fingerprint automation. Since the release of Android 6.0 Marshmallow in October 2015, Google has provided a standard API (FingerprintManager) for integrating these features into mobile app code, but no way of automating it in tests.
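To ground that, here is a minimal sketch of the app-side API in question (the class name is my own, and a production app would also check permissions and supply a keystore-backed CryptoObject):

```java
// Minimal app-side sketch: listening for a fingerprint with the
// FingerprintManager API introduced in Android 6.0 (API 23).
import android.content.Context;
import android.hardware.fingerprint.FingerprintManager;
import android.os.CancellationSignal;

public class FingerprintAuthenticator extends FingerprintManager.AuthenticationCallback {

    public void listen(Context context) {
        FingerprintManager manager =
                (FingerprintManager) context.getSystemService(Context.FINGERPRINT_SERVICE);
        // A real app passes a CryptoObject backed by a keystore key; null keeps the sketch short.
        manager.authenticate(null, new CancellationSignal(), 0 /* flags */, this, null /* handler */);
    }

    @Override
    public void onAuthenticationSucceeded(FingerprintManager.AuthenticationResult result) {
        // Unlock the sensitive workflow; this is the branch our automated tests need to reach.
    }

    @Override
    public void onAuthenticationFailed() {
        // Prompt the user to try again.
    }
}
```

Nothing in Espresso can drive that success callback on demand; that’s the gap the rest of this article works around.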

The same is true for Touch ID on iOS: though there are interactive ways to simulate fingerprint events when running XCTest suites in Xcode, there is no easy way to write an automated test that provides coverage over these workflows.

Without some other automation alternative, these portions of functionality fall prey to the ice-cream cone anti-pattern. What a pity.

Solution: Find the Right Framework

Espresso is fast because it runs directly alongside the main app code on the device. However, since the only way Google gives us to simulate fingerprint events is through ADB (i.e. ‘adb -e emu finger touch …’), that command has to be run on the machine where the Android tools are installed and where the device is connected.

Appium, an open source outgrowth of Selenium for mobile apps, is architected differently from Espresso and XCTest. Though often slower for this reason, it has some advantages too:

Instead of running directly on the device as a sibling process, Appium tests are executed from a server to which the devices are connected. This provides a context where we can inject device-specific commands, in combination with calls through the testing framework itself, to simulate the entire workflow on the device in one script.

An example of this can be found in my GitHub FingerprintDemo repo.

Because I want to write all my code and tests in the same IDE, I keep unit tests and Espresso tests as part of the normal conventions in the ‘app’ module, but I create a separate module called ‘appium’ that can be compiled as a separate jar artifact from the main APK. This keeps my testing dependencies separate from my app, and my build.gradle scripts clean and clear.

In short, it boils down to test code that looks like this:
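A simplified sketch of what that test code can look like (assuming a local Appium server on its default port, an emulator, and placeholder element IDs; see the repo for the full version):

```java
import io.appium.java_client.android.AndroidDriver;
import org.openqa.selenium.By;
import org.openqa.selenium.remote.DesiredCapabilities;
import java.net.URL;

public class FingerprintLoginTest {

    public static void main(String[] args) throws Exception {
        // Capabilities for a local emulator; the values here are placeholders.
        DesiredCapabilities caps = new DesiredCapabilities();
        caps.setCapability("platformName", "Android");
        caps.setCapability("deviceName", "emulator-5554");
        caps.setCapability("app", "/path/to/app-debug.apk");

        AndroidDriver driver =
                new AndroidDriver(new URL("http://127.0.0.1:4723/wd/hub"), caps);
        try {
            // 1) Drive the UI to the point where the app waits for a fingerprint.
            driver.findElement(By.id("com.example.fingerprintdemo:id/loginButton")).click();

            // 2) Inject the emulator-only fingerprint event from the machine that hosts
            //    the emulator and the Android tools ('adb -e emu finger touch 1').
            new ProcessBuilder("adb", "-e", "emu", "finger", "touch", "1")
                    .inheritIO().start().waitFor();

            // 3) Verify the app reacted to the simulated fingerprint.
            driver.findElement(By.id("com.example.fingerprintdemo:id/secureContent"));
        } finally {
            driver.quit();
        }
    }
}
```

The point is the mix: WebDriver-style calls for the UI plus a raw adb command for the sensor, all in one script running next to the Appium server.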

Appium + fingerprint = depends on your lab

If you manage a very small local lab, you have the flexibility and control to execute whatever custom commands you need on your devices.

If you’ve graduated to using devices (emulators/simulators/real) in the cloud via some service like Firebase, Perfecto, or TestObject, then your ability to simulate fingerprint events reliably really depends on which one you’re using.

For instance, both Perfecto and TestObject provide SSH direct connections to devices, so in theory you could run custom ADB commands against them; Firebase and AWS Device Farm aren’t even close to having this capability.

In practice, these cloud services also provide automation endpoints and SDKs to execute these tasks reliably. Perfecto, for instance, has both DevTunnel direct access and scripted fingerprint simulation support in Appium.

Treat Code and Tests as Equal Citizens

Everyone should have access to app code AND test code. Period. Large organizations often fear that this will leak proprietary secrets to offshore or out-of-cycle testing teams; that’s what contracts and proper repository permissions are for.

The benefit for modern teams is that test engineers have better visibility into the app, which makes test creation faster and speeds up initial root-cause analysis of the defects they find. In my example, this is what the simplified IDE experience looks like:

Now that we can press the play button on A) our app, B) our unit and Espresso tests, and C) our E2E fingerprint Appium tests, everyone on the team can make sure their changes don’t negatively impact any aspect of the user experience.

‘Works on My Machine’ Isn’t Good Enough

Test code applies first and foremost to the development experience, but also to the build system later on. In the case of including Appium tests in an Android project, this means we need to be keenly aware of the test infrastructure used to simulate fingerprint actions locally against emulators.

Expect that you will need to “productionize” this process to fit into the build pipeline. By introducing a number of new moving parts (emulators, Appium, custom adb commands), we also need to replicate that stack on our build infrastructure.

I’m a Jenkins nerd, so what this means in terms of build infrastructure is that we need to create build nodes that contain the components necessary to run Appium tests in isolation from other processes. Emulators keep the solution device-independent and simplify test execution logistics, but they only provide a very narrow slice of reality.

To integrate real devices into this mix, you either have to manage a local Appium grid (which, again, is challenging) or write your tests to use a cloud lab solution. In the end, you’ll have to parameterize your tests along the following environment variables (a sketch of this follows the list):

  • Appium server address
    • localhost for development workstations and Appium emulator stack in CI
    • Shared/cloud host for real devices
  • (if emulators)
    • emulator image (i.e. Nexus_6_API_24, etc.)
  • Device capabilities
    • Platform (Android/iOS)
    • Platform version
    • App (binaries) under test
    • (if shared/cloud) credentials or API keys
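As a rough sketch of how those parameters can feed test setup (the environment-variable names below are placeholders of my own, not something prescribed by Appium or the demo repo):

```java
import org.openqa.selenium.remote.DesiredCapabilities;
import java.net.URL;

public final class TestEnvironment {

    // localhost for development workstations and the CI emulator stack,
    // a shared/cloud host for real devices.
    static URL appiumServer() throws Exception {
        return new URL(System.getenv().getOrDefault(
                "APPIUM_SERVER", "http://127.0.0.1:4723/wd/hub"));
    }

    static DesiredCapabilities capabilities() {
        DesiredCapabilities caps = new DesiredCapabilities();
        caps.setCapability("platformName",
                System.getenv().getOrDefault("PLATFORM_NAME", "Android"));
        caps.setCapability("platformVersion",
                System.getenv().getOrDefault("PLATFORM_VERSION", "7.0"));
        caps.setCapability("avd",
                System.getenv().getOrDefault("EMULATOR_IMAGE", "Nexus_6_API_24"));
        caps.setCapability("app",
                System.getenv().getOrDefault("APP_UNDER_TEST",
                        "app/build/outputs/apk/app-debug.apk"));

        // Only needed for shared/cloud labs; the exact capability name varies by vendor.
        String apiKey = System.getenv("CLOUD_API_KEY");
        if (apiKey != null) {
            caps.setCapability("securityToken", apiKey);
        }
        return caps;
    }
}
```

With the connection details externalized like this, the same test classes run unchanged against a developer workstation, the CI emulator stack, or a cloud lab.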

Recap:

Since there’s no support for fingerprint simulation directly in Espresso, we have to rely on other test frameworks like Appium to cover these use cases. Really, the test architecture needs to fit the use case, and Appium provides us a way to mix test framework calls with native commands to other mobile tools. This requires us to introduce complexity carefully and plan for how that impacts our build-verification testing stack when triggered by continuous integration.

More reading:

Automating the Quality of Your Digital Front Door

Mobile is the front door to your business for most, if not all, of your users. But how often do you use your actual front door? A few times a day? How often do your users use your app? How often would you like them to? It’s really a high-traffic front door between people and you.

This is how you welcome people into what you’re doing. If it’s broken, people don’t feel welcome.

[7/27/2017: For my presentation at Mobile Tea Boston, my slides and code samples are below]

 

Slides with notes: http://bit.ly/2tgGiGr
Git example: https://github.com/paulsbruce/FingerprintDemo

The Dangers of Changing Your Digital Front Door

In his book “On Intelligence”, Jeff Hawkins describes how quickly our human brains pick up on minute changes with the analogy of someone replacing the handle on your front door with a knob while you’re out. When you get back, things seem very weird; you feel disoriented, alienated. Not emotions we want to invoke in our users.

Now consider what it’s like for your users when you change things on that high-traffic door. Change is good, but only good changes. And when changes introduce problems, forget sympathy, forget forgiveness; people revolt.

What Could Possibly Go Wrong?

A lot. Even for teams that are great at what they do, delivering a mobile app is fraught with challenges that lead to:

  • Lack of strategy around branching, merging, and pushing to production
  • Lack of understanding about dependencies, impacts of changes
  • Lack of automated testing, integration woes, no performance/scalability baselines, security holes
  • Lack of communication between teams (Front-end, API, business)
  • Lack of planning at the business level (marketing blasts, promotions, advertising)

Users don’t care about our excuses. A survey by Perfecto found that more than 44% of defects in mobile apps are found by users. User frustrations aren’t just about what you designed; they’re also about how the app behaves in the real world. Apps that are too slow will be treated as broken apps and uninstalled just the same.

What do we do about it?

We test, but testing is a practice unto itself. There are many test types and methodologies like TDD, ATDD, and BDD that drive us to test. Not everyone is cut out to be a great tester, especially when developers are driven to write only things that work, and not to test for when they shouldn’t (i.e. a lack of negative testing).

Alister Scott – Test ‘Ice Cream Cone’

In many cases, automation gaps and issues make it easier for development teams to fall back to manual testing. This is what Alister Scott (of WatirMelon) calls the ‘ice cream cone’ anti-pattern, an inversion of the ideal test pyramid, and Mike Cohn has good thoughts on this paradigm too.

To avoid this downward spiral, we need to prioritize automation AND which tests we choose to automate. Testing along architecturally significant boundaries, as Kevlin Henney puts it, is good; but in a world full of both software and hardware, we need to broaden that idea to ‘technologically significant boundaries’. The camera, GPS, biometric sensors, and other peripheral interfaces on your phone are significant boundaries…fault lines of the user experience.

Many development teams have learned the hard way that not including real devices in automated testing leaves these UX fault lines exposed, letting defects escape. People in the real world use real devices on real networks under real usage conditions, and our testing strategy should reflect this reality too.

The whole point of all this testing is to maintain confidence in our release readiness. We want to be in an ‘always green’ state, and there’s no way to do this without automated, continuous testing.

Your Code Delivery Pipeline to the Rescue!

Confidence comes in two flavors: quality and agility. Specifically, does the code we write do what we intend, and can we iterate and measure quickly?

Each team comes with their own definition of done, their own acceptable levels of coverage, and their own level of confidence over what it takes to ship, but answering both of these questions definitively requires adequate testing and a reliable pipeline for our code.

Therein lies the dynamic tension between agility (nimbleness) and the messy world of reality. What’s the point of pushing out something that doesn’t match the needs of reality? So we try to pull reality in, a little bit at a time, but reality can be slow: executing UI tests takes time. So we need to code and test in parallel, automate as much as possible, and be aware of the impact of changes on release confidence.

The way we manage this tension is to push smaller batches more frequently through the pipeline and bring the pain forward; in other words, continuous delivery and deployment. Far from a monolithic process, we shrink the whole cycle down to the individual contributor level. Always green at the developer level…merge only code that has been tested automatically and thoroughly.

Even in a Perfect World, Your Front Door Still Jams

So automation is crucial to this whole thing working. But what happens when we can’t automate something? This is often why the “ice cream cone” exists.

Let’s walk through it together. Google I/O or WWDC drops new hardware or platform capabilities on us. There’s a rush to integrate, but a delay in tooling and support gums up development all the way through production troubleshooting. We mock what we have to, but fall back to manual testing.

This not only takes our time, it robs us of velocity and any chance to reach that “always green” aspiration.

The worst part is that we don’t even have to introduce new functionality to fall prey to this problem. Appium was stuck behind a lack of iOS 10 support for months, which meant most companies had no automated way to validate on a platform that was already out.

And if anything, history teaches us that technology advances whether the last thing is well-enough baked or not. We are still dealing with camera (i.e. driver stack) flakiness! Fingerprint isn’t as unreliable, but it’s still part of the UI/UX. And many of us now face an IoT landscape with very few standards that developers follow.

So when faced with architectural boundaries that have unpolished surfaces, what do we do? Mocks…good enough for early integration, but who will stand up and say testing against mocks is good enough to go to production?

IoT Testing Provides Clues to How We Can Proceed

In many cases, introducing IoT devices into the user experience means adding architecturally significant boundaries. Standards like BLE, MQTT, CoAP and HTTP provide flexibility to virtualize much of the interactions across these boundaries.

In the case of Continuous Glucose Monitoring (CGM) vendors, their hardware and mobile app teams are on very different development cycles. But to integrate often, they virtualize BLE signals to real devices in the cloud as part of their mobile app test scripts. They also add “IoT ninjas” to the experience team: hardware/firmware engineers who are in charge of prototyping changes on the device side, to make sure that development and testing on the mobile app side is as enabled as possible.

Adding IoT to the mix will change your pyramid’s structure, adding pressure to rely on standards/interfaces as well as increasing manual testing time for E2E scenarios.

[For more on IoT Testing, see my deck from Mobile/IoT Dev+Test 2017 here]

Automated Testing Requires Standard Interfaces

There are plenty of smart people looking to solve the busy-work problem of writing tests. Facebook Infer, Appdiff, Functionize, and Mabl are just a few of the new technologies that integrate machine learning and AI to reduce the time spent on testing busy-work.

But any programmatic approach, even AI, requires standard interfaces; in our case, universally accepted development AND testing frameworks and technologies.

Tool ecosystems don’t get built without foundational standards like HTML/CSS/JS, Android, Java, and Swift. And when vendors innovate on hardware or platform, there will always be some gaps, usually in automation around the new stuff.

Example Automation Gap: Fingerprint Security

Unfortunately for those of us who see the advantages of integrating with innovative platform capabilities like biometric fingerprint authentication, automated testing support is scarce.

What this means is that we either don’t test certain critical workflows in our app, or we manually test them. What a bummer to velocity.

The solution is to have people who know how to implement multiple test frameworks and tools in a way that matches the velocity requirements of development.

For more information on this, see my deep-dive on how to use Appium in Android development to simulate fingerprint activities in automated tests. It’s entirely possible, but it requires experience and planning around how to integrate a mobile lab into your continuous integration pipeline.

 

Tailoring Fast Feedback to Resources (and vice versa)

As you incrementally introduce reality into every build, you’ll run into two problems: execution speed and device pool limits.

To solve for execution speed, most development teams parallelize their testing across multiple devices at once and split their testing strategy into different schedules. This is just an example of a schedule mapped to various testing types.

For more on how to do this, I published a series of whitepapers.

TL;DR recap

Automating the quality of our web and mobile apps keeps us accurate, safe, and confident, but it isn’t easy. Fortunately, we have many tools and a lot of prior thinking on how to do this. Notwithstanding the ignorance of some individuals, automation continues to change the job landscape over and over again.

Testing always takes tailoring to the needs of the development process in order to provide fast feedback. The same is true in reverse: developers need to understand where support gaps exist in test frameworks and tooling, otherwise they risk running the “ship” aground.

This is why my mantra remains: it is imperative to velocity to have the right people in the planning room when designing new features and integrating capabilities across significant technological boundaries.

Similarly, in my research on developer efficiency, we see a correlation between how thoroughly non-functional criteria are specified for a feature and the test coverage it ends up with. Greater completeness in upfront planning saves time and effort; it’s just that simple.

Just as Conway’s “law” suggests, your team’s structure, communication patterns, functions, and dysfunctions all show up in the final product. Have the right people in the room when planning new features, running retros, and determining your own definition of done; otherwise you end up with gaps that go well beyond automation.

Meta / cliff notes:

  • “Everyone owns quality” means that the whole team needs to be involved in testing strategy
    • To what degree are various levels of testing included in Definition of Done?
    • Which test sets (i.e. feedback loops) provide the most value?
    • How are various tests triggered, considering their execution speed?
    • Who’s responsible for creating which types of tests?
    • How are team members enabled to interpret and use test result data?
    • When defects do escape certain stages, how is RCA used to close the gap?
    • Who manages/fixes the test execution framework and infrastructure?
    • Do the benefits of the current approach to testing outweigh the cost?
  • Multiple testing frameworks / tools / platforms is 200 OK
    • We already use separate frameworks for separate test types
      • JUnit/TestNG (Java) for unit (and some integration) testing
      • Chakram/Citrus/Postman/RestAssured for API testing
      • Selenium, Appium, Espresso, XCTest for UI testing
      • JMeter, Dredd, Gatling, Siege for performance testing
    • Tool sprawl can be a challenge, but proper coverage requires plurality
    • Don’t overtax one framework or tool to do a job it can’t, just find a better fit
  • Incremental doses of reality across architecturally significant boundaries
    • We need reality (real devices, browsers, environments) to spot fragility in our code and our architecture
    • Issues tend to clump around architecturally significant boundaries, like API calls, hardware interfaces, and integrations to monolithic components
    • We stub/mock/virtualize to speed up development; these are signs of “significant” boundaries, but they only tell us what happens in isolation
    • A reliable code pipeline can do the automated testing for you, but you still need to tell it what and when to test; have a test execution strategy that considers:
      • testing types (unit, component, API, integration, functional, performance, installation, security, acceptance/E2E, …)
      • execution speed (<2m, <20m, <2h, etc) vs. demand for fast feedback
      • portions of code that are known-fragile
      • various critical-paths: login, checkout, administrative tasks, etc.
    • Annotations denote tests that relate across frameworks and tools (see the sketch after this list)
      • @Signup, @Login, @SearchForProduct, @V2Deploy
      • Tag project-based work (like bug fixes) with identifiers like JIRA-4522
  • Have the right people in the room when planning features
    • Future blockers like test framework support for new hardware capabilities will limit velocity, so have test engineers in the planning phases
    • Close the gap between what was designed vs. what is feasible to implement by having designers and developers prototype together
    • Including infrastructure/operations engineers in planning reduces later scalability issues; just like testing, this can otherwise become a blocker to release readiness
    • Someone, if not all the people above, should represent the user’s voice
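On the annotation point above: one way (of several) to share workflow tags across suites is a composed JUnit 5 annotation; TestNG groups or JUnit 4 categories work similarly. The names here mirror the examples in the list rather than any real project:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// A reusable workflow tag: any test carrying @Login lands in the "Login" feedback loop.
@Target({ElementType.TYPE, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Tag("Login")
@interface Login {}

class LoginFlowTests {

    @Test
    @Login
    @Tag("JIRA-4522") // project-based work (a bug fix) tagged by ticket
    void fingerprintLoginUnlocksAccount() {
        // unit-, API-, or UI-level assertions for the Login workflow go here
    }
}
```

A build can then select feedback loops by tag (for example, Gradle’s useJUnitPlatform { includeTags("Login") }), so the fast loops run on every commit and the slower ones on a schedule.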

More reading: