Crash Reporting with MetricKit, Part 2

Building real, flexible crash reporting systems around MetricKit

When I originally wrote about MetricKit’s crash diagnostic capabilities, I didn’t plan on it turning into a multi-part series. But, that was really just a first look, all based on a beta of iOS 14. Since then, iOS 14 has shipped, and we now know what the final implementation looks like. I’ve also seen quite a bit of interest in using MetricKit to replace 3rd-party crash reporters. Seems like a good time to follow up.

Soooo, let’s take a look at building a real crash reporter with MetricKit!

Limitations

The final iOS 14 MetricKit API is very similar to what was delivered in the betas. There’s no API for MXCallStackTree. Its nested frame structure is… not to my taste. And, there’s still an awkward JSON-only system for encoding the data. It’s workable, but honestly it’s a real pain to actually use. I’ve updated Meter to include many more wrapper types for both parsing and interacting with MXDiagnosticPayload data, which makes on-device processing possible.
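To give a sense of what that nested structure involves, here is a minimal sketch of decoding a call stack tree and flattening its frames. The key names are an assumption based on the JSON that iOS 14 payloads appear to produce, and the `Frame`, `CallStack`, and `CallStackTree` types are hypothetical stand-ins, not Meter's actual wrappers.

```swift
import Foundation

// Assumed shape of one frame in the call stack tree JSON. The key names
// follow observed payloads and are not a documented contract.
struct Frame: Decodable {
    let binaryName: String?
    let binaryUUID: String?
    let address: UInt64
    let offsetIntoBinaryTextSegment: UInt64?
    let subFrames: [Frame]?

    // Return this frame followed by all of its nested frames, depth-first.
    func flattened() -> [Frame] {
        [self] + (subFrames ?? []).flatMap { $0.flattened() }
    }
}

struct CallStack: Decodable {
    let threadAttributed: Bool?
    let callStackRootFrames: [Frame]

    // All frames of this stack as a flat list, in nesting order.
    var frames: [Frame] { callStackRootFrames.flatMap { $0.flattened() } }
}

struct CallStackTree: Decodable {
    let callStacks: [CallStack]
}

// A made-up, heavily trimmed sample for illustration only.
let sample = """
{
  "callStackPerThread": true,
  "callStacks": [{
    "threadAttributed": true,
    "callStackRootFrames": [{
      "binaryName": "MyApp",
      "address": 4295000000,
      "subFrames": [{ "binaryName": "UIKitCore", "address": 4296000000 }]
    }]
  }]
}
"""

let tree = try JSONDecoder().decode(CallStackTree.self, from: Data(sample.utf8))
let names = tree.callStacks[0].frames.map { $0.binaryName ?? "?" }
```

Most fields are optional here on purpose, since I would not want decoding to fail because a key is missing from a particular payload.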

The big problem is the lack of details about uncaught NSException objects. I was really disappointed to discover this was missing from the betas, and that remains the case. This is a major oversight in my opinion. I believe that Apple’s crash reporting system for iOS strips out NSException name/message, and I have a feeling this policy may have been responsible. I’m also not entirely sure the throw-site exception stack trace, which can be critical, fits nicely into the MXCallStackTree abstraction. I’m hopeful this improves in future OSes, but right now I suspect this will be a major barrier to adoption for many developers.

There are also a few other minor things, like the lack of thread/queue names. Luckily, I believe the diagnostic JSON structure could accommodate these easily. Overall, the system is usable, but not particularly pleasant.

With all my complaining out of the way, let’s get to the good stuff.

Building Blocks

There are actually a number of components required to put together a real crash reporter. At a minimum, you’ll need to capture crash data, transmit it to a server, and symbolicate the resulting reports.

MetricKit’s diagnostic facilities really only give us a solution for capturing data, and that’s only if you are running iOS 14+. But, what about backwards compatibility for earlier OSes? Or, platforms that don’t yet support MetricKit? Well, this is one of the other problems Meter aims to address.

Capturing Crashes

Among other features, Meter includes an MXMetricManager stand-in. This class presents a MetricKit-compatible API that works across Apple’s platforms. This is a great way to write code against MetricKit’s API, but have a graceful fallback for when it isn’t supported. Once your minimum OS is 14, you can just rip out Meter and use MetricKit directly.

Even with this consistent interface, we still have a big problem. MetricKit supplies the actual crash data. On earlier OSes/unsupported platforms, we still need a way of capturing crashes. As it turns out, we happen to have another project that fits the bill: Impact. Impact is specifically built as a crash capturing system. It just records details about crash events, and does nothing else. It was intended to be used as a component of a larger crash reporter, and that’s exactly what we need here.

To hook them together, we’re releasing ImpactMeterAdapter. This library sets up Impact, translates its crash data, and delivers it via Meter as an MXCrashDiagnostic-compatible type. It still needs a little tweaking, but it works. Backwards-compatible and cross-platform crash reporting when MetricKit’s facility isn’t available, and real MetricKit data when it is! 😎

Because of its open-ended nature, other adapters could be created for Meter. If you have a diagnostics library you like, but are interested in offering a MetricKit upgrade path, get in touch! I’d be happy to help out.

Transmitting Data

All this work isn’t too useful if you aren’t able to actually get at these diagnostic reports. So, another important component of a crash reporting system is the data management and transmission. Networking is an incredibly well-covered topic, and most apps have extensive networking systems. But, just in case you want a simple, drop-in solution, we whipped one up: Wells.

Wells is all about transmitting data back to a server via an HTTP request. It uses NSURLSession background requests, and takes care of the file-management challenges associated with those operations. Similar to Impact, Wells makes no assumptions about the nature of the data it is transmitting. So, it is equally useful for reporting Impact logs directly, MetricKit data, or even a custom diagnostics data format of your own.
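The file management comes from the fact that background sessions only support uploads from a file on disk, not from in-memory data. Here is a rough sketch of that pattern. It is not how Wells is implemented, and `stagePayload`, `startBackgroundUpload`, and `UploadCleaner` are hypothetical names made up for illustration.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // only needed when building off Apple platforms
#endif

// Hypothetical helper: write the payload to disk so the system can read it
// while the app is suspended or no longer running.
func stagePayload(_ data: Data, in directory: URL) throws -> URL {
    try FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)

    let fileURL = directory.appendingPathComponent(UUID().uuidString)
    try data.write(to: fileURL, options: .atomic)

    return fileURL
}

// Hypothetical helper: stage the data, then hand the file to an upload task.
// The file path is stored on the task so it can be found again on completion.
func startBackgroundUpload(of data: Data, request: URLRequest,
                           session: URLSession, stagingDirectory: URL) throws {
    let fileURL = try stagePayload(data, in: stagingDirectory)
    let task = session.uploadTask(with: request, fromFile: fileURL)

    task.taskDescription = fileURL.path
    task.resume()
}

// Hypothetical delegate: remove the staged file once its upload succeeds.
// Failed uploads keep their file so they can be retried later.
final class UploadCleaner: NSObject, URLSessionTaskDelegate {
    func urlSession(_ session: URLSession, task: URLSessionTask,
                    didCompleteWithError error: Error?) {
        guard error == nil, let path = task.taskDescription else { return }

        try? FileManager.default.removeItem(atPath: path)
    }
}
```

On Apple platforms, the session would be created with `URLSessionConfiguration.background(withIdentifier:)` and an `UploadCleaner` as its delegate, since background sessions report results through a delegate and not through completion handlers. The identifier and staging directory are up to you.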

What About the Server?

So why are we bothering with all this if we don’t have a place to actually send all this data? Well, that’s a good question. But, before you get too excited, no, we don’t have a server-side system for you. I know this is a critical piece of the puzzle, and I have heard of a number of people that use a custom system today. And, who knows, maybe someone is working on this 😀 If you are, please let us know, I’d love to link to it.

Example Uses

Let’s get down to some actual examples of how all this might fit together.

Say you’re all-in on MetricKit, no backwards compatibility required. Wells might be helpful.

import Foundation
import MetricKit
import Wells

class MetricKitOnlyReporter: NSObject {
    private let reporter: WellsReporter
    private let endpoint = URL(string: "https://mydiagnosticservice.com")!

    override init() {
        self.reporter = WellsReporter()

        super.init()

        MXMetricManager.shared.add(self)
    }

    private func submitData(_ data: Data) {
        var request = URLRequest(url: endpoint)

        request.httpMethod = "PUT"

        // ok, yes, I have glossed over error handling
        try? reporter.submit(data, uploadRequest: request)
    }
}

extension MetricKitOnlyReporter: MXMetricManagerSubscriber {
    func didReceive(_ payloads: [MXMetricPayload]) {
    }

    func didReceive(_ payloads: [MXDiagnosticPayload]) {
        payloads.map({ $0.jsonRepresentation() }).forEach({ submitData($0) })
    }
}

Or, perhaps you want MetricKit reporting when available, but still want crash reports from older OSes or other platforms. ImpactMeterAdapter is built to do exactly this.

import Foundation
import Wells
import ImpactMeterAdapter

class MetricKitWithFallbackReporter: NSObject {
    private let reporter: WellsReporter
    private let endpoint = URL(string: "https://mydiagnosticservice.com")!

    override init() {
        self.reporter = WellsReporter()

        super.init()

        MeterPayloadManager.shared.add(self)

        // Configure Impact here, if needed
        ImpactMeterDiagnosticProvider.shared.start()
    }

    private func submitData(_ data: Data) {
        var request = URLRequest(url: endpoint)

        request.httpMethod = "PUT"

        // ok, yes, I have again glossed over error handling
        try? reporter.submit(data, uploadRequest: request)
    }
}

extension MetricKitWithFallbackReporter: MeterPayloadSubscriber {
    func didReceive(_ payloads: [DiagnosticPayloadProtocol]) {
        // MXCrashDiagnostics when available, emulated wrappers when not
        payloads.map({ $0.jsonRepresentation() }).forEach({ submitData($0) })
    }
}

Symbolication

One critical component of crash reporting is symbolication. I have made a little bit of progress on this, but I still don’t have much to report here. For now, you’ll have to make do with a custom system, possibly based around atos.

Remember that you need access to the same binaries loaded into memory at the time of crash to perform symbolication. You know that massive directory that Xcode builds up in ~/Library/Developer/Xcode/iOS DeviceSupport/? That directory actually contains all the binaries you need for a given OS/architecture. So, while not necessarily a scalable, or even convenient solution, it is one way to get access to the Apple symbols you might need. Someone’s also put together a massive repo with many (all?) iOS versions.
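As a starting point, here is a small sketch of building an atos invocation for a single frame. The `SymbolicationRequest` type and `atosArguments` function are hypothetical. How you derive the load address from a MetricKit frame is deliberately left out, because that is an assumption worth checking against real payloads.

```swift
import Foundation

// Hypothetical inputs needed to symbolicate one frame.
struct SymbolicationRequest {
    let binaryPath: String    // a binary from "iOS DeviceSupport", or from your app's dSYM
    let architecture: String  // for example "arm64"
    let loadAddress: UInt64   // where the binary was loaded in the crashed process
    let address: UInt64       // the frame's absolute address
}

// Build the argument list for atos: -o selects the binary, -arch the
// architecture slice, and -l the load address, followed by the address to look up.
func atosArguments(for request: SymbolicationRequest) -> [String] {
    func hex(_ value: UInt64) -> String { "0x" + String(value, radix: 16) }

    return [
        "-o", request.binaryPath,
        "-arch", request.architecture,
        "-l", hex(request.loadAddress),
        hex(request.address),
    ]
}

// Made-up values for illustration.
let request = SymbolicationRequest(binaryPath: "MyApp.app/MyApp",
                                   architecture: "arm64",
                                   loadAddress: 0x1_0000_0000,
                                   address: 0x1_0000_4a10)
let arguments = atosArguments(for: request)
```

On a Mac, these arguments could be passed to atos via `Process`. The binary you point at has to match the UUID recorded in the frame, or the output will be meaningless.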

This is still a big challenge for home-grown crash reporting. PRs welcome 😅

Shipping Something Real

Obviously, I’m excited about MetricKit’s crash reporting capabilities. And, I haven’t even touched on the non-crash diagnostics, which also look incredibly helpful. But, there are serious downsides. The symbolication requirement, lack of NSException support, and latency/availability associated with payload deliveries all hurt. It makes for a very limited system when compared to 3rd-party alternatives. Still, removing large, complex dependencies is likely very tempting. And, there are a number of termination types that are simply out of reach of anything but a first-party reporter. There are definitely trade-offs, but I can certainly see someone deciding to ship a MetricKit-based crash reporter.

Hopefully, the libraries I present here make that easier for you. I tend to favor smaller, more composable modules over larger systems, and I know that shows. If you’re the kind of person that avoids, or even shuns dependencies altogether, I get it. But, the payoff is the ability to really customize crash reporting behavior within your app. You want a transparent upgrade path to MetricKit for macOS? Done. You want an ultralight MetricKit-only system? No problem. I have a feeling this approach might appeal to those that are considering a fully-custom reporter in the first place.

If any of this is useful to you, or you actually use MetricKit crash reporting, I’d really love to hear from you!

Mon, Oct 12, 2020 - Matt Massicotte

Previous: MetricKit Crash Reporting