A few words to manufacturers and vendors of tech toys: to really be ready for the holiday, if your product requires software updates in order to work or is in any way internet connected, make sure your site stays up. Otherwise, you just shipped coal.
Provide more than one distribution point for downloadable updates/binaries
Rely on CDNs for static assets (like installers and documentation)
If your update process must rely on live services, make sure they’re scalable
Load test subcomponents/microservices AND the end-to-end process
Be prepared for damage control by:
Monitoring site uptime and availability to know when things are broken
Proactively establishing a communication channel with customers during issues
Properly staffing IT and support for issues during AND after the season
Make sure the cost of downtime is factored into your next sales cycle
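To make the first couple of points concrete, here is a minimal sketch of what "more than one distribution point" could look like from the updater's side: try each mirror in order and take the first one that answers. The function names and the example URLs are my own placeholders, not anything a particular vendor ships.

```python
import urllib.request
from typing import Callable, Iterable


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download one URL with the standard library; raises OSError subclasses on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_from_mirrors(
    mirrors: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
) -> bytes:
    """Try each distribution point in order and return the first successful payload."""
    errors = []
    for url in mirrors:
        try:
            return fetch(url)
        except OSError as exc:  # URLError, HTTPError, and timeouts are all OSError subclasses
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

Called with something like download_from_mirrors(["https://downloads.example.com/fw.bin", "https://cdn.example.net/fw.bin"]), a dead primary site becomes a slower download instead of a ruined morning.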
Santa Brought Us a Brick?
Let me start by admitting how first-world this example is. Robots as play-things are still not exactly ‘so easy, a child could do it’. Roombas have been around for almost two decades, but we have yet to see a down-to-earth home robotics project that really works for children under 10. I don’t just mean toys that are not pre-assembled; even the right-out-of-the-box kind often require firmware updates or online services to work as expected.
Case in point, the Meccano MAX. Though it only took about 3hrs total to put together, this morning when we finally turned it on and went to connect for a firmware update, the vendor’s website was down…hard. The instructions said to update the ‘MeccaMind’ before anything else, and voice commands weren’t working without it, so, blocker.
As an ops nerd, I slapped an uptime monitor on it to know when (if ever) it was back up:
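If you don't have a monitoring service handy, a rough sketch of the same idea fits in a few lines of standard-library Python. The names, polling interval, and check limit below are my assumptions for illustration, not what I actually ran that morning.

```python
import time
import urllib.request
from typing import Callable


def site_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # DNS failures, refused connections, timeouts, HTTP 4xx/5xx
        return False


def wait_until_up(
    url: str,
    probe: Callable[[str], bool] = site_is_up,
    interval_seconds: float = 60.0,
    max_checks: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the probe succeeds; return how many checks it took, or -1 if it never recovers."""
    for attempt in range(1, max_checks + 1):
        if probe(url):
            return attempt
        if attempt < max_checks:
            sleep(interval_seconds)
    return -1
```

The probe and sleep parameters are injectable mostly so the loop can be tested without a network or a real clock.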
That didn’t stop the whole multi-day experience from deflating to a dud. We all worked on this thing together and then before it can do anything, we are stuck guessing about when we can actually enjoy it. Don’t blame Santa, blame the geniuses at Meccano.
Performance, of your product, of your service, of your site, is imperative to delivering what you sold people. Availability, uptime, scalability, and reliability matter by default now. Everyone has downtime, but 4hr recovery time on your corporate domain isn’t just irresponsible and costly, it’s plain embarrassing…and transparent.
Why Is This Even Important?
My 7yr old is crazy into coding right now. Granted, we use a visual Code Block Editor mostly for lights and tones, but it’s a great way to introduce concepts like flow control and formal logic. As soon as she saw an example I built that used function blocks to encapsulate and reuse logic, she instantly understood and started refactoring her programs.
But finding the right project for varying stages of aptitude, appetite, and enjoyment is a real challenge. Marketing doesn’t help, and unanticipated road-blocks like service and subscription dependencies are hard for consumers to factor in when purchasing. Even when you do find a right-fit project, if bone-headed problems like website downtime occur, it can become a negative experience for the child (or student).
It’s important for STEM product manufacturers and software vendors to really think about the impact of what they’re selling, how they’re delivering it, and how to support people who paid them money for something to accomplish a goal. If you don’t have the optimal consumer’s experience in mind, it will eventually cost you.
Need the Robot Software Updater?
I can’t archive everything on their site, and it’s their job to provide reliable content distribution, but in case you find yourself stuck like I was, here are links to at least the firmware updater tool:
Also, never ask a consumer if they want to choose (null).
And if the updater gives you flashbacks to DirectX drivers from 1997, don’t worry. It only looked like it bricked my robot for about 4mins before providing UI feedback:
Once you do get back to the modern era, a 98.6MB mobile app to control it shouldn’t be too hard on your data plan. They also need to know your GPS location, phone contacts, and file storage for some reason.
I hate acronyms. My dad used to use them far too much. He was the kind of guy who was smarter in retrospect than the kind of boy I was understood at the time: kind, thoughtful, quiet, and invested in the people around him.
My current thought product is an acronym, “FUEL”, based on a few key practices that I find are valuable to my current line of work as a Change Agent. These practices take time to develop and are only truly useful when used in parallel.
Focus: ability to right-size activities to close the task(s) currently underway
Usefulness: ability to gauge effectiveness of work and reprioritize based on new ideas/objectives/activities
Execution: ability to match skill to task, collaborate the plan, and resolve blockers as they arise
Learning: ability to observe outcomes and refactor them into useful ways to improve all of the above
Real-time Example of FUEL
Last night after a meetup, I had a beer with someone I’d met before at a local conference but hadn’t really gotten to know. The opportunity presented itself, so I stayed a little later than I normally would. They are a CTO for a 50-person startup in town. Net-net:
Paul: “What’s weighing on you right now man, work related?” (L)
Them: “Kind of glad someone asked…we have people issues.” (E)
Paul: “You’re not alone…what kind?” (E/F)
Them: “There are a few ‘senior’ engineers that don’t produce like others.” (U)
Paul: “What’s your plan for them and the rest of your team?” (U)
Them: “We just laid off one after giving him a path, but the other two, I don’t know…maybe add metrics, visibility…they’re kind of SPOFs.” (U)
Paul: “…so you can quantify what you already know? How did we arrive here?” (U/L)
Them: “They were here at the beginning, hence ‘senior’, but one guy hasn’t committed code since Sept (5 months)!” (E)
Paul: “Got it, they don’t ‘git’ it. [laughs] How are you and the leadership team helping to coach other junior engineers?” (L)
Them: “Well that’s the problem maybe. We don’t exactly have a culture yet, but our C-level relationships with each other are solid.” (U/E)
Paul: “I once heard that great leaders define their success by fostering other leaders. Do you know who your real ‘senior’ engineers are?” (F/L)
Them: “Well I kind of already know who deserves the chance to step up.” (E)
Paul: “That’s good, but not enough. People often hesitate on new things simply because they haven’t experienced how it works yet. Your coaching needs to help those people get over any blockers to proving one way or the other if they can do the job well/right/better.” (FUEL)
Them: “I’m going to talk to our CEO about this. Can I get your card?”
Paul: “Only if you intend to use it.”
Good enough exercise and learning for me for one night.
Mobile is the front door to your business for most / all of your users. But how often do you use your front door, a few times a day? How often do your users use your app? How often would you like them to? It’s really a high-traffic front door between people and you.
This is how you welcome people into what you’re doing. If it’s broken, people don’t feel welcome.
[7/27/2017: For my presentation at Mobile Tea Boston, my slides and code samples are below]
In his book “On Intelligence”, Hawkins describes how quickly our human brains pick up on minute changes with the analogy of someone replacing the handle on your front door with a knob while you’re out. When you get back, things will seem very weird. You feel disoriented, alienated. Not emotions we want to invoke in our users.
Now consider what it’s like for your users to have you changing things on their high-traffic door to you. Change is good, but only good changes. And when changes introduce problems, forget sympathy, forget forgiveness, people revolt.
What Could Possibly Go Wrong?
A lot. Even for teams that are great at what they do, delivering a mobile app is fraught with challenges that lead to:
Lack of strategy around branching, merging, and pushing to production
Lack of understanding about dependencies, impacts of changes
Lack of automated testing, integration woes, no performance/scalability baselines, security holes
Lack of communication between teams (Front-end, API, business)
Lack of planning at the business level (marketing blasts, promotions, advertising)
Users don’t care about our excuses. A survey by Perfecto found that more than 44% of defects in mobile apps are found by users. User frustrations aren’t just about what you designed; they are about how your app behaves in the real world too. Apps that are too slow will be treated as broken apps and uninstalled just the same.
What do we do about it?
We test, but testing is a practice unto itself. There are many test types and methodologies like TDD, ATDD, and BDD that drive us to test. Not everyone is cut out to be a great tester, especially when developers are driven to write only things that work, and not to test for when they shouldn’t (i.e. lack of negative testing).
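To show what I mean by negative testing, here is a small sketch using Python's built-in unittest. The parse_percent function is a made-up example, not code from any real app; the point is the second test, which checks that bad input fails the way it should.

```python
import unittest


def parse_percent(text: str) -> int:
    """Parse a string like '42%' into an int from 0 to 100; raise ValueError otherwise."""
    if not text.endswith("%"):
        raise ValueError(f"missing percent sign: {text!r}")
    value = int(text[:-1])  # int() itself raises ValueError for non-numeric input
    if not 0 <= value <= 100:
        raise ValueError(f"out of range: {value}")
    return value


class ParsePercentTests(unittest.TestCase):
    def test_accepts_valid_input(self):
        # Positive test: the path developers naturally write first.
        self.assertEqual(parse_percent("42%"), 42)

    def test_rejects_bad_input(self):
        # Negative tests: confirm the function fails when it should.
        for bad in ["42", "abc%", "101%", "-1%", ""]:
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    parse_percent(bad)
```

Saved to a file, this runs with python -m unittest like any other test module.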
In many cases, automation gaps and issues make it easier for development teams to fall back to manual testing. This is what Alister Scott (of Ruby Watir) calls the ‘ice cream cone’ anti-pattern, an inversion of the ideal test pyramid, and Mike Cohn has good thoughts on this paradigm too.
To avoid this downward spiral, we need to prioritize automation AND which tests we choose to automate. Testing along architecturally significant boundaries, as Kevlin Henney puts it, is good; but in a world full of both software and hardware, we need to broaden that idea to ‘technologically significant boundaries’. The camera, GPS, biometric, and other peripheral interfaces on your phone are a significant boundary…fault lines of the user experience.
Many development teams have learned the hard way that not including real devices in automated testing leaves these UX fault lines at risk of escaping defects. People in the real world use real devices on real networks under real usage conditions, and our testing strategy should reflect this reality too.
The whole point of all this testing is to maintain confidence in our release readiness. We want to be in an ‘always green’ state, and there’s no way to do this without automated, continuous testing.
Your Code Delivery Pipeline to the Rescue!
Confidence comes in two flavors: quality and agility. Specifically, does the code we write do what we intend, and can we iterate and measure quickly?
Each team comes with their own definition of done, their own acceptable levels of coverage, and their own level of confidence over what it takes to ship, but answering both of these questions definitively requires adequate testing and a reliable pipeline for our code.
Therein lies the dynamic tension between agility (nimbleness) and the messy world of reality. What’s the point of pushing out something that doesn’t match the needs of reality? So we try to pull reality in little bits at a time, but reality can be slow. Executing UI tests takes time. So we need to code and test in parallel, automate as much as possible, and be aware of the impact of changes on release confidence.
The way we manage this tension is to push smaller batches more frequently through the pipeline and bring the pain forward; in other words, continuous delivery and deployment. Far from working monolithically, we shrink the whole process down to the individual contributor level. Always green at the developer level…merge only code that has been tested automatically, thoroughly.
Even in a Perfect World, Your Front Door Still Jams
So automation is crucial to this whole thing working. But what happens when we can’t automate something? This is often why the “ice cream cone” exists.
Let’s walk through it together. Google I/O or WWDC drops new hardware or platform capabilities on us. There’s a rush to integrate, but a delay in tooling and support gums up development all the way through production troubleshooting. We mock what we have to, but fall back to manual testing.
This not only takes our time, it robs us of velocity and any chance to reach that “always green” aspiration.
The worst part is that we don’t even have to introduce new functionality to fall prey to this problem. Appium was stuck behind lack of iOS 10 support for months, which means most companies had no automated way to validate on a platform that was out already.
And if anything, history teaches us that technology advances whether the last thing is well-enough baked or not. We are still dealing with camera (i.e. driver stack) flakiness! Fingerprint isn’t as unreliable, but still part of the UI/UX. And many of us now face an IoT landscape with very few standards that developers follow.
So when faced with architectural boundaries that have unpolished surfaces, what do we do? Mocks…good enough for early integration, but who will stand up and say testing against mocks is good enough to go to production?
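For a sense of what a mock buys you and what it doesn't, here is a minimal sketch with Python's unittest.mock. LoginFlow and its scan() sensor interface are hypothetical names I made up for illustration; no real biometric API is implied.

```python
from unittest.mock import Mock


class LoginFlow:
    """App-side logic that sits just inside a hardware boundary (hypothetical)."""

    def __init__(self, sensor):
        self.sensor = sensor  # anything exposing scan() -> bool

    def unlock(self, max_attempts: int = 3) -> str:
        for _ in range(max_attempts):
            if self.sensor.scan():
                return "unlocked"
        return "fallback_to_passcode"


# The mock stands in for the real sensor driver, so this runs anywhere,
# but it only proves the app logic in isolation, not real device behavior.
flaky_sensor = Mock()
flaky_sensor.scan.side_effect = [False, False, True]
assert LoginFlow(flaky_sensor).unlock() == "unlocked"

dead_sensor = Mock()
dead_sensor.scan.return_value = False
assert LoginFlow(dead_sensor).unlock() == "fallback_to_passcode"
assert dead_sensor.scan.call_count == 3
```

Both checks pass without a phone in sight, which is exactly the limitation: nothing here exercises the driver stack, the sensor, or a real finger.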
IoT Testing Provides Clues to How We Can Proceed
In many cases, introducing IoT devices into the user experience means adding architecturally significant boundaries. Standards like BLE, MQTT, CoAP and HTTP provide flexibility to virtualize much of the interactions across these boundaries.
In the case of Continuous Glucose Monitoring (CGM) vendors, hardware and mobile app development run on very different cycles. But to integrate often, they virtualize BLE signals to real devices in the cloud as part of their mobile app test scripts. They also adopt “IoT ninjas” as part of the experience team, hardware/firmware engineers that are in charge of prototyping changes on the device side, to make sure that development and testing on the mobile app side is as enabled as possible.
Adding IoT to the mix will change your pyramid structure, adding pressure to rely on standards/interfaces as well as manual testing time for E2E scenarios.
There are plenty of smart people looking to solve the busy-work problem with writing tests. Facebook Infer, Appdiff, Functionize, and MABL are just a few of the new technologies that integrate machine learning and AI to reduce time-spend on testing busy-work.
But any and all programmatic approaches, even AI, require standard interfaces; in our case, universally accepted development AND testing frameworks and technologies.
Tool ecosystems don’t get built without foundational standards, like HTML/CSS/JS, Android, Java, and Swift. And when platform vendors want to innovate on hardware or platform, there will always be some gaps, usually in automation around the new stuff.
Example Automation Gap: Fingerprint Security
Unfortunately for those of us who see the advantages of integrating with innovative platform capabilities like biometric fingerprint authentication, automated testing support is scarce.
What this means is that we either don’t test certain critical workflows in our app, or we manually test them. What a bummer to velocity.
The solution is to have people who know how to implement multiple test frameworks and tools in a way that matches the velocity requirements of development.
Tailoring Fast Feedback to Resources (and vice versa)
As you incrementally introduce reality into every build, you’ll run into two problems: execution speed and device pool limits.
To solve the execution speed problem, most development teams parallelize their testing against multiple devices at once, and split up their testing strategy into different schedules. This is just an example of a schedule against various testing types.
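As a rough sketch of the parallelization half of that, the snippet below deals tests out round-robin across a device pool and runs each device's share on its own thread. The function names and the run_on_device callback are assumptions of mine; a real setup would hand each shard to something like an Appium session or a device cloud.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def shard_tests(tests: List[str], devices: List[str]) -> Dict[str, List[str]]:
    """Deal tests out round-robin so each device gets a similar share."""
    shards: Dict[str, List[str]] = {device: [] for device in devices}
    for index, test in enumerate(tests):
        shards[devices[index % len(devices)]].append(test)
    return shards


def run_in_parallel(
    shards: Dict[str, List[str]],
    run_on_device: Callable[[str, List[str]], bool],
) -> Dict[str, bool]:
    """Run each device's shard on its own thread; map device name to pass/fail."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = {
            device: pool.submit(run_on_device, device, tests)
            for device, tests in shards.items()
        }
        return {device: future.result() for device, future in futures.items()}
```

Round-robin is the simplest possible split; if some tests are much slower than others, sharding by historical run time would balance the pool better.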
Automating the quality of our web and mobile apps keeps us accurate, safe, and confident; but it isn’t easy. Fortunately we have many tools, and a lot of thought has already been put into how to do this. Notwithstanding ignorance of some individuals, automation continues to change the job landscape over and over again.
Testing always takes tailoring to the needs of the development process to provide fast feedback. The same is true in reverse: developers need to understand where support gaps exist in test frameworks and tooling, otherwise they risk running the “ship” aground.
This is why, and my mantra remains, it is imperative to velocity to have the right people in the planning room when designing new features and integrating capabilities across significant technological boundaries.
Similarly, in my research on developer efficiency, we see that there is a correlation between increased coverage over non-functional criteria on features and test coverage. Greater completeness in upfront planning saves time and effort, it’s just that simple.
Just as Conway’s “law” predicts, your team, its structure, communication patterns, functions and dysfunctions all show up in the final product. Have the right people in the room when planning new features, retros, and determining your own definition of done. Otherwise you end up with more gaps than simply in automation.
Meta / cliff notes:
“Everyone owns quality” means that the whole team needs to be involved in testing strategy
To what degree are various levels of testing included in Definition of Done?
Which test sets (i.e. feedback loops) provide the most value?
How are various tests triggered, considering their execution speed?
Who’s responsible for creating which types of tests?
How are team members enabled to interpret and use test result data?
When defects do escape certain stages, how is RCA used to close the gap?
Who manages/fixes the test execution framework and infrastructure?
Do the benefits of the current approach to testing outweigh the cost?
Multiple testing frameworks / tools / platforms is 200 OK
We already use separate frameworks for separate test types
JUnit/TestNG (Java) for unit (and some integration) testing
Chakram/Citrus/Postman/RestAssured for API testing
Selenium, Appium, Espresso, XCTest for UI testing
JMeter, Dredd, Gatling, Siege for performance testing
Tool sprawl can be a challenge, but proper coverage requires plurality
Don’t overtax one framework or tool to do a job it can’t, just find a better fit
Incremental doses of reality across architecturally significant boundaries
We need reality (real devices, browsers, environments) to spot fragility in our code and our architecture
Issues tend to clump around architecturally significant boundaries, like API calls, hardware interfaces, and integrations to monolithic components
We stub/mock/virtualize to speed development, which is itself a sign of a “significant” boundary, but that only tells us what happens in isolation
A reliable code pipeline can do the automated testing for you, but you still need to tell it what and when to test; have a test execution strategy that considers:
If we think traditional software is hard, just wait until all the ugly details of the physical world start to pollute our perfect digital platforms.
What is the IoT?
The Internet of Things (IoT) is a global network of digital devices that exchange data with each other and cloud systems. I’m not Wikipedia, and I’m not a history book, so I’ll just skip past some things in this definitions section.
Where is the IoT?
It’s everywhere, not just in high-tech houses. The new cable modems that internet providers hand out, which act as their own WiFi, are just a new “backbone” for these devices to connect over, in almost every urban neighborhood now.
Enter the Mind of an IoT Tester
How far back should we go? How long do you have? I’ll keep it short: the simpler the system, the less there is to test. Now ponder the staggering complexity of the low-cost Raspberry Pi. Multiply that by the number of humans on Earth that like to tinker, educated or no, throw in some APIs and ubiquitous internet access for fun, and now we have a landscape, a view of the magnitude of possibility that the IoT represents. It’s a huge amount of worry for me personally.
Compositionality as a Design Constraint
Good designers will often put constraints in their own way purposely to act as a sort of scaffolding for their traversal of a problem space. Only three colors, no synthetic materials, exactly 12 kilos, can I use it without tutorials, fewer materials. Sometimes the unyielding makes you yield in places you wouldn’t otherwise, flex muscles you normally don’t, reach farther.
IoT puts compositionality right up in our faces, just like APIs, but with hardware and in ways that are both very physical and personal. It forces us to consider how things will be combined in the wild. For testing, this is the nightmare scenario.
Dr. Strangetest, or How I Learned to Stop Worrying and Accept the IoT
The only way out of this conundrum is in the design. You need to design things to very discrete specifications and target very focused scenarios. It moves the matter of quality up a bit into the space of orchestration testing, which by definition is scenario based. Lots of little things are easy to prove working independent of each other, but once you do that, the next challenges lie in the realm of how you intend to use it. Therein lies both the known and unknown, the business cases and the business risks.
If you code or build, find someone else to test it too
As a developer, I can always pick up a device I just flashed with my new code, try it out, and prove that it works. Sort of. It sounds quick, but rarely is. There’s lots of plugging and unplugging, uploading, waiting, debugging, and fiddling with things to get them to just work. I get sick of it all; I just want things to work. And when they finally *do* work, I move on quickly.
If I’m the one building something to work a certain way, I have a sort of programming myopia, where I only ever want it to work. Confirmation bias.
What do experts say?
I’m re-reading Code Complete by Steve McConnell, written more than 20 years ago now, eons in the digital age. Section 22.1:
“Testing requires you to assume that you’ll find errors in your code. If you assume you won’t, you probably won’t.”
“You must hope to find errors in your code. Such hope might feel like an unnatural act, but you should hope that it’s you who find the errors and not someone else.”
True that, for code, for IoT devices, and for life.