Performance Testing with XCTest

XCTest has some incredible new performance testing capabilities, but they are rough.

Testing is one of those things that people tend to have strong feelings about. There is quite a spectrum of philosophies, from full-on TDD to folks who avoid automated testing completely. In Xcode 11, Apple added some new performance testing capabilities to XCTest that really seem to have slipped under the radar. Regardless of your feelings about automated testing, this new stuff is very powerful and worth a look.

I’ve been pairing these new performance testing tools with the XCTest UI Automation system, and it has been incredibly impressive. However, there are definitely some challenges with its current implementation. Also, while the APIs are the same, do keep in mind that I do all my work with a macOS app.

Getting your Bearings

Confusingly, XCTest now has two seemingly distinct performance testing systems. This can make it hard to get started.

The original performance testing system is built around XCTPerformanceMetric, and includes a suite of methods to perform tests. This post is not about that system. What we’re looking at here are some new, largely overlapping methods based on XCTMetric.

Measuring with XCTMetric

The first fascinating difference with this new system is that XCTMetric is a protocol. It’s possible to define custom measurements and have XCTest drive the data collection and presentation. I did fool around with this, and it does seem to work. But, I suspect most people will just use the stock metrics included in XCTest.
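To give a sense of what a custom metric involves, here is a minimal sketch. The class name, identifier, and workload are all made up for illustration. It assumes the XCTMetric requirement reportMeasurements(from:to:) and the XCTPerformanceMeasurement initializer shown below, and it deliberately keeps no state so that copying is trivial.

```swift
import XCTest

// Hypothetical custom metric: reports the elapsed time of each measured
// iteration in milliseconds. The name and identifier are illustrative only.
final class ElapsedMillisecondsMetric: NSObject, XCTMetric {
    // Called by XCTest with the time range of one measured iteration.
    func reportMeasurements(from startTime: XCTPerformanceMeasurementTimestamp,
                            to endTime: XCTPerformanceMeasurementTimestamp) throws -> [XCTPerformanceMeasurement] {
        let nanoseconds = endTime.absoluteTimeNanoSeconds - startTime.absoluteTimeNanoSeconds
        let milliseconds = Double(nanoseconds) / 1_000_000

        return [
            XCTPerformanceMeasurement(identifier: "com.example.elapsed-ms",
                                      displayName: "Elapsed Time",
                                      doubleValue: milliseconds,
                                      unitSymbol: "ms")
        ]
    }

    // XCTMetric refines NSCopying. This metric holds no state,
    // so a fresh instance is a valid copy.
    func copy(with zone: NSZone? = nil) -> Any {
        ElapsedMillisecondsMetric()
    }
}

final class CustomMetricTests: XCTestCase {
    func testWorkWithCustomMetric() {
        measure(metrics: [ElapsedMillisecondsMetric()]) {
            // Placeholder workload standing in for real code under test.
            _ = (0..<100_000).reduce(0, &+)
        }
    }
}
```

A metric that accumulates its own state between the start and end of an iteration would need more care around copying than this stateless one.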

XCTest includes five types that conform to XCTMetric.

XCTClockMetric
XCTCPUMetric
XCTMemoryMetric
XCTOSSignpostMetric
XCTStorageMetric

I believe that XCTClockMetric is a direct replacement for the pre-existing measure API, which is built around the similarly named XCTPerformanceMetric.wallClockTime. So, if you are using that original API, you should be able to start using that metric as a drop-in replacement. But, doing just that on its own won’t get you much. The power lies in the other metrics these new measure APIs allow you to use.

Even better, you can measure multiple metrics simultaneously. This is a very handy technique for reducing testing time while also getting finer-grained results. My typical technique is to use one coarse signpost to cover a large, expensive operation. Then, as I dig in, I’ll add additional finer-grained signposts to break down the work. Tracking all of them together within one test is a big time saver when actively optimizing, or investigating a regression.
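As a rough sketch of that technique, the test below collects one coarse signpost, one finer-grained signpost, and two stock metrics in a single run. The subsystem, category, signpost names, and menu item titles are hypothetical; they assume the app under test emits matching signposts and exposes those menu items. The click() and typeKey calls are the macOS flavor of UI automation.

```swift
import XCTest

final class DocumentPerformanceTests: XCTestCase {
    func testOpenLargeDocument() {
        let app = XCUIApplication()
        app.launch()

        // Hypothetical signposts, assumed to be emitted by the app itself.
        let subsystem = "com.example.MyApp"
        let openDocument = XCTOSSignpostMetric(subsystem: subsystem,
                                               category: "Performance",
                                               name: "OpenDocument")
        let parseDocument = XCTOSSignpostMetric(subsystem: subsystem,
                                                category: "Performance",
                                                name: "ParseDocument")

        let metrics: [XCTMetric] = [
            openDocument,                       // coarse: the whole operation
            parseDocument,                      // fine: one step inside it
            XCTCPUMetric(application: app),
            XCTStorageMetric(application: app)
        ]

        measure(metrics: metrics) {
            // Perform the action (hypothetical menu item).
            app.menuItems["Open Sample Document"].click()

            // Get back to the initial state by closing the window.
            app.typeKey("w", modifierFlags: .command)
        }
    }
}
```

Each metric gets its own set of results in the test report, so adding another signpost later does not mean adding another slow UI-driven test.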

Your First Test

I first stumbled across these new measurement APIs when creating a new UI-based test file in Xcode 11. Inside was a fascinating new placeholder test that looked something like this:

func testLaunchPerformance() throws {
    // This measures how long it takes to launch your application.
    measure(metrics: [XCTOSSignpostMetric.applicationLaunch]) {
        XCUIApplication().launch()
    }
}

If you have ever attempted to do application launch performance testing, this should be completely blowing your mind. In the past, this required carefully constructed manual instrumentation, along with laborious testing procedures. By comparison, this snippet of code is just incredible.

There are two things working together here to make this so awesome.

Signposts + UI Automation

First, the thing being measured is defined by a built-in signpost. Signposts are a super flexible and powerful performance measurement tool. You may already be using them with Instruments. The key aspect is you can precisely control when your measurement starts and stops, without being limited to what the XCTest API provides. You can do this in a limited fashion with the existing API, but XCTOSSignpostMetric makes this much easier and much more flexible.
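On the app side, a custom signpost interval might look like the sketch below. The subsystem, category, signpost name, and the openDocument function are assumptions for illustration; only the os_signpost calls themselves are standard API.

```swift
import Foundation
import os.signpost

// Hypothetical log handle; the test must use the same subsystem and category.
let performanceLog = OSLog(subsystem: "com.example.MyApp", category: "Performance")

// Hypothetical function standing in for an expensive operation in the app.
func openDocument(at url: URL) {
    let signpostID = OSSignpostID(log: performanceLog)

    // The measured interval starts here...
    os_signpost(.begin, log: performanceLog, name: "OpenDocument", signpostID: signpostID)
    // ...and ends when the function returns.
    defer {
        os_signpost(.end, log: performanceLog, name: "OpenDocument", signpostID: signpostID)
    }

    // Placeholder for the real work: reading, parsing, building UI state.
    _ = try? Data(contentsOf: url)
}
```

A test can then pick up this interval with XCTOSSignpostMetric(subsystem: "com.example.MyApp", category: "Performance", name: "OpenDocument"), and the same signposts show up in Instruments.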

Second, you can drive your test using UI automation. To my knowledge, this is only possible with these new APIs. Using UI automation is not a requirement, but it is an awesome capability. And, the stock performance measurement metrics were all built to instrument XCUIApplication instances right off the bat.

This combination makes for an incredibly flexible and powerful testing system. But, I have also run into a lot of problems with it. It’s possible some of these issues are macOS-specific. But, as it stands, my experience with this system has been rough.

XCTMemoryMetric Issues

One of the features I like best is gathering data on multiple metrics within the same test run. This saves a ton of time, particularly when using UI Automation, which can be extremely slow. Since it’s so easy, I thought it would be really cool to gather memory usage during my tests as well. But, as far as I can tell, XCTMemoryMetric doesn’t work correctly when targeting an XCUIApplication instance. The memory usage reported is always zero.

At first, I suspected that perhaps this was due to how I was managing app state, so I started looking into manually starting and stopping the test. Unfortunately, this led to more problems.

Starting/Stopping measurements

A big limitation of the block-based measure API is how you set up your test. The block passed into measure is run multiple times. So, it is essential that the state of the system being tested be the same at the beginning of the block. The XCTest template test above uses XCUIApplication.launch. But, that method kind of cheats, because it will terminate and relaunch the app if it is already running. This kind of state reset is exactly what all testing code needs to do; I’ve just found it is rarely so simple.

Pretty much all of my tests look like this:

// get into initial state
measure(metrics: [...]) {
    // perform an action

    // get back to that initial state...
}

This initial state restoration code must also be inside the measurement block. That means your measurements cover not just your action being tested, but also this state restoration code. This actually isn’t a problem for the XCTOSSignpostMetric, because the start and stop points are defined externally by your signposts. That’s one of the beauties of signpost-based testing. But, that’s not true for any of the other metrics.

It seems like there is a solution to this, in the form of XCTMeasureOptions.InvocationOptions. This API claims to be able to control the start and stop points of your test within the block. However, I’m not sure how to actually use it. XCTest has some preexisting APIs for controlling the old measure method. But, these don’t appear to work with this new stuff. When trying, I get an exception thrown within the XCTest machinery. This certainly seems like a bug, but could also be my misuse of the API - this area is not documented at all. However, I tried a variety of approaches, and was not able to get any to work.
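For reference, the shape this appears intended to take is sketched below. Treat it as unverified: it pairs the invocation options with the older startMeasuring() and stopMeasuring() calls, which is the kind of combination that raised an exception in my testing. The menu item title is hypothetical.

```swift
import XCTest

final class ManualMeasurementTests: XCTestCase {
    func testActionOnly() {
        let app = XCUIApplication()
        app.launch()

        // Ask XCTest to let the test decide where measurement begins and ends.
        let options = XCTMeasureOptions()
        options.invocationOptions = [.manuallyStart, .manuallyStop]

        measure(metrics: [XCTClockMetric()], options: options) {
            startMeasuring()
            // Perform the action (hypothetical menu item).
            app.menuItems["Open Sample Document"].click()
            stopMeasuring()

            // State restoration, intended to fall outside the measurement.
            app.typeKey("w", modifierFlags: .command)
        }
    }
}
```

If this does work in your environment, it would let the non-signpost metrics exclude the state restoration code.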

Run Variation

The measure method includes the ability to control the number of iterations of your test - 5 by default. But, it has an interesting policy of doing a throw-away run first, for 6 total. I’m sure the designers of this API have found that to be a useful heuristic, but for my usage, it isn’t sufficient.
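The iteration count is set through XCTMeasureOptions. A small sketch, with a placeholder workload:

```swift
import XCTest

final class IterationCountTests: XCTestCase {
    func testWithMoreIterations() {
        // More iterations than the default, to smooth out noisy runs.
        let options = XCTMeasureOptions()
        options.iterationCount = 10

        measure(metrics: [XCTClockMetric()], options: options) {
            // Placeholder workload standing in for real code under test.
            _ = (0..<100_000).reduce(0, &+)
        }
    }
}
```

More iterations make each test slower, which matters a lot when the block drives UI automation.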

Xcode’s testing system allows you to define baselines for your performance tests, along with acceptable standard deviation. I’ve had a really hard time tweaking the standard deviations to both allow the tests to regularly pass and also not miss regressions. This translates into frequent, spurious test failures. That’s a real pain, especially if you are using CI.

[Figure: XCTest results showing one very large outlier result. Caption: Outliers, nearly every time]

My tests tend to be long-running and disk-intensive. So, it’s totally possible this is just a bad interaction between my app and other processes running on my computer at the time. Perhaps your experience will be different, but it’s been a huge problem for me.

I think it would be really awesome if XCTest made this policy configurable, so I could have more fine-grained control. For example, throwing out the best and worst result.
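To illustrate the kind of policy I mean, here is a sketch of a trimmed average over a set of run times. This is not something XCTest offers; the function name is made up, and it only shows the arithmetic.

```swift
// Hypothetical helper: averages the samples after discarding the single
// best (smallest) and single worst (largest) result.
// Returns nil when there are fewer than three samples to work with.
func trimmedMean(_ samples: [Double]) -> Double? {
    guard samples.count >= 3 else { return nil }

    let kept = samples.sorted().dropFirst().dropLast()
    return kept.reduce(0, +) / Double(kept.count)
}

// Example: five run times in seconds, one of them a large outlier.
let runTimes = [1.0, 2.0, 3.0, 4.0, 100.0]
if let average = trimmedMean(runTimes) {
    print("Trimmed average: \(average)")
}
```

With a policy like this, a single bad run would no longer dominate the result.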

Well Worth the Issues

Ok, so yes, there are a lot of problems with this system. It is rough and poorly documented, but it can still be used effectively. And it is definitely worth it. Even if you never normally do automated testing, I bet you’ll still like that stock launch performance test.

For me, the biggest issue is the run-to-run variation. That makes it really tough to run these in a test suite automatically. I rarely get through an entire run without at least one failure due to an outlying result. This is a blocker for me running these tests automatically in CI. And that’s particularly painful because these kinds of tests can be extremely slow.

But, I am still wowed by the performance testing + UI Automation combination, months into my first try. I’ve used it to successfully optimize a number of complex interactions in Chime, as well as catch regressions I would never have noticed otherwise. XCTest now offers an incredible performance testing system, and I’d recommend anyone doing performance work take a look at it right away.

Sun, Mar 15, 2020 - Matt Massicotte
