Last week we released a major update to our ad serving infrastructure, which provides enhanced targeting and dynamic control of inventory. As part of this effort we decided it was time to move away from our custom binary protocol to Apache Avro for our data serialization needs. Avro provides a large feature set out of the box, however, we settled on the use of a subset of its functionality and then made a few adjustments to best suit our needs in mobile.
Why Avro?
There are many good alternatives for serialization frameworks and just as many good articles that compare them to one another across several dimensions. Given that our focus will be on what made Avro a good fit for us instead of our research into other frameworks.
Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.
We maintain the invariant that the schema is always present when reading Avro data, however, unlike Avro’s RPC mechanism we never transport the schema with the data. While the Avro docs note this can be optimized to reduce the need to transport the schema in a handshake, such optimization could prove difficult and would likely not provide much added benefit for us. In mobile, we deal with many small requests from millions of devices that have transient connections. Fast, reliable transmissions with as little overhead as possible are our driving factors.
As an illustration of how this makes a large difference at scale, our AdRequest schema is roughly 1.5K while the binary-encoded datum is approximately 230 bytes on any request. At 50 million requests per day (we are actually well above that figure) this adds about 70GB of daily traffic across our band.To meet our network goals we version our endpoints to correspond to the protocol. We feel it is more transparent and maintainable to have a one to one schema mapping instead of relying on Avro to resolve the differences between a varied reader and writer schema. The result is a single call without the additional communication for schema negotiation while transporting only datum that has no per-value overhead of field identifiers.Mobile MattersAs a 3rd party library on mobile we had some additional areas we needed to address. First on Android, we needed to maintain support for Version 1.5+. Avro uses a couple of methods that were introduced in version 2.3, namely Arrays.copyOf() and String.getBytes(Charset). Those were easily changed to use System.arrayCopy and String.getBytes(String), but it did require us to build from source and repackage. On iOS, we need to avoid naming conflicts in case someone else integrates Avro in their library. We are able to do this by stripping the symbols. See this guide from Apple on that process. On Android, we use ProGuard, which solves conflict issues through obfuscation. Alternatively, you can use the jarjar tool to change your package naming.Sample Set UpWe’ve talked a lot about our Avro set up so now we’ll provide a sample of its use. These are just highlights, but we have made the full source available on our GitHub account at: https://github.com/flurry/avro-mobile. This repo includes a sample Java server, iOS client, and Android client.1. Set up your schema. We use a protocol file (avpr) vs a schema file (avsc) simply because we can define all schemas for our protocol in a single file. The protocol file is typically used with Avro’s embedded RPC mechanisms, but it doesn’t have to be used exclusively in that manner. Here are our sample schemas outlining a simple AdRequest and AdResponse:AdRequest:AdResponse:2. Pre-compile the schemas into Java objects with the avro-tools jar. We have included an ant build file that will perform this action for you.3. Build our sample self contained Avro Jetty server. The build file puts all necessary libs in a single jar (flurry-avro-server-1.0.jar) for ease of running. Run this from the command line with one argument for the port:
java -jar flurry-avro-server-1.0.jar 80804. Test. One of the things that distinguishes Avro from other serialization protocols is its support for dynamic typing. Since no code generation is required, we can construct a simple curl to verify our server implements the protocol as expected. This has proven useful for us since we have an independent verification measure for any issue that occurs in communication between our mobile clients and servers. We simply construct the identical request in JSON, send to our servers via curl, and hope we can blame the server guys for the issue. This also allows anyone working on the server to simulate requests without needing to instantiate mobile clients.Sample Curl:Note: To simulate an error, type the AdSpace Name as “throwError”.5. Setting up the iOS app takes a little more work than on the Java side simply because Avro-C does not support static compilation. The easiest way we have found to define the schemas is to go to the Java generated schema file and copy the JSON text from the SCHEMA$ definition:
Paste that text as a definition into your header file, then get the avro schema from the provided method avro_schema_from_json. From that point write/read from the fields in the schema using the provided Avro-C methods as done in AdNetworking.m.
In the sample app provided, run AvroClient > Sim or Device and send your ad request to your server (defaults to calling http://localhost:8080). To simulate an error, type the AdSpace Name as “throwError”.6. Since Android is based on Java, moving from the Avro Java Server to Android is a simple process. The same avpr and buildProtocol.xml files can be used to generate the Java objects. Constructing objects are equally as easy as seen here:ConclusionWe’re excited about the use of Apache Avro on our new Ad Serving platform. It provides consistency through structured schemas, but also allows for independent testing due to its support for dynamic typing and JSON. Our protocol is much easier to understand internally since the Avpr file is standard JSON and attaches data types to our fields. Once we understood the initial setup across all platforms it became very easy to iterate over new protocol versions. We would recommend anyone looking to simplify communication over multiple environments while maintaining high performance give Avro strong consideration.