Lyric: Linear Regression in Pure Javascript

Lyric is an open source implementation of linear regression in pure JavaScript, built on James Coglan's excellent JavaScript matrix library Sylvester and the popular jQuery library. Feel free to use it to implement your own trending and prediction charts, or read on to learn how it's built and how to use it. You can download or fork it on GitHub (Lyric) in the Flurry GitHub account.

What is Linear Regression?

Given a dataset including a dependent and an explanatory variable, linear regression is a statistical technique used to create a model (equation) that best represents the relationship between the variables. This model can then be used to predict additional values that will follow the same pattern. It is a technique commonly used to add trend lines to charts of time series data and in Machine Learning to predict future values from a given training set of data.

For example, given the following set of noisy time series data:

It might be hard to tell from this sample if the values are increasing or decreasing over time. Applying linear regression can yield a trendline to make the pattern clear, such as the following (in green):

Typically, linear regression is implemented using an optimization approach such as Gradient Descent, which starts with a rough approximation and improves its accuracy over a large number of iterations. While such an approach will optimize the model, it can be slow depending on the number of iterations required. In some cases the problem can be greatly simplified and solved in closed form using a derivation called the Normal Equation.

Lyric uses the Normal Equation, which makes it fast and efficient and should work well for most applications.

Using Lyric

First, make sure your data is represented as a 2xN array of elements, each with an 'x' and a 'y' value. The x value should be the explanatory variable and the y value the dependent variable.

Then you need to have Lyric determine the best equation to represent this data. The equation is known as the model and you can build it using the following:

Now that you have your model, you will want to apply it to a set of inputs. The newInput should be a 1xN array containing only the explanatory variable values for which you would like to calculate the dependent values. The result is a new 2xN array containing the predicted series.

The following is a complete example which, given some values for the explanatory values 1 through 5, estimates the values of 6 through 8:

If you wanted to create a trend line, such as in the example above, you would simply apply the model to the same x values you provided in the input to build the model. 

For more information on using Lyric (and more advanced options) please refer to the Lyric README.

How is Lyric implemented?

Lyric implements the normal equation using a series of matrix operations implemented by Sylvester. However, before the matrix operations can be applied, the input series x (explanatory values) must be modified to represent the degree of polynomial we want to use to fit the data. To do this, we simply take the vector of x values and create a matrix where each row is the values of x raised to a given power. For example, given the power = 3 (for a 3rd degree polynomial) the output O will be of the form:

O[0] = x^0  i.e. all ones

O[1] = x^1  i.e. the same as x

O[2] = x^2  i.e. all elements of x squared

O[3] = x^3  i.e. all elements of x cubed

If you are familiar with linear algebra, you'll recognize that this represents an equation of the form:

Ax^3+Bx^2+Cx+D

Once the input is in this form, the actual matrix operations are fairly simple, following the normal equation steps. 
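For reference, writing X for the design matrix whose rows are each observation's powers of x (the transpose of O above) and y for the vector of dependent values, the normal equation gives the coefficient vector theta in closed form:

    \theta = (X^{\top} X)^{-1} X^{\top} y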

The resulting theta matrix contains the values of the constants A, B, C and D from the equation above. Then, by expanding future values of x in the same way and multiplying by theta, we can predict the corresponding y values.

Learn More

If you'd like to learn more about linear regression, the Machine Learning class offered by Coursera covers it in detail (along with many other machine learning topics).

Apache Avro at Flurry

Last week we released a major update to our ad serving infrastructure, which provides enhanced targeting and dynamic control of inventory. As part of this effort we decided it was time to move away from our custom binary protocol to Apache Avro for our data serialization needs. Avro provides a large feature set out of the box; however, we settled on a subset of its functionality and then made a few adjustments to best suit our needs on mobile.

Why Avro?
 
There are many good alternatives for serialization frameworks, and just as many good articles that compare them to one another across several dimensions. Given that, our focus here will be on what made Avro a good fit for us rather than on our research into the other frameworks.

Perhaps the most critical element was platform support. Even though we are currently only applying Avro to our ad serving on iOS and Android, we wanted an encoder that provides uniform support across all of our SDKs (Windows Phone 7, HTML5, BlackBerry, and Java ME) in the event we extend its use to all services. We receive over 1.4 billion analytics sessions per day, so it was critical that we maintain a binary protocol that is small and efficient. While this binary protocol is a necessity, Avro also supports a JSON encoder. This has dual benefits for us, as it can be used for our HTML5 SDK and as a testing mechanism (we'll show later how easy it is to curl an endpoint). Lastly, Avro provides the rich data structures we need with records, enums, lists, and maps.

Our Subset

We are using Avro-C for our iOS SDK and Avro Java for our Android SDK and server. As mentioned earlier, Avro has a large feature set, so the first thing we had to decide was which components made sense for our environment. All language implementations of Avro support core, which includes schema parsing from JSON and the ability to read/write binary data. The ability to write data files, compress information, and send it over embedded RPC/HTTP channels is not as uniform. Our SDKs have always stored information separately from the transport mechanism, and likewise our server parses that data and stores it in a format efficient for backend processing, so we did not need to write data files. In addition, we have a sophisticated architecture for the reliable transport of data, and using Avro's embedded RPC mechanisms would affect our deployment and failover strategy. Given our environment, we integrated only Avro's core.

Integrating only core gave us a great deal of flexibility in how we use this efficient data serialization protocol. One of the departures we made was in our treatment of schemas. The Avro site describes schemas as follows:

Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.  

We maintain the invariant that the schema is always present when reading Avro data, however, unlike Avro’s RPC mechanism we never transport the schema with the data. While the Avro docs note this can be optimized to reduce the need to transport the schema in a handshake, such optimization could prove difficult and would likely not provide much added benefit for us. In mobile, we deal with many small requests from millions of devices that have transient connections. Fast, reliable transmissions with as little overhead as possible are our driving factors.

As an illustration of how much this matters at scale, our AdRequest schema is roughly 1.5K while the binary-encoded datum is approximately 230 bytes per request. At 50 million requests per day (we are actually well above that figure), sending the schema along with each request would add about 70GB of daily traffic across our network.

To meet our network goals we version our endpoints to correspond to the protocol version. We feel it is more transparent and maintainable to have a one-to-one schema mapping instead of relying on Avro to resolve the differences between varied reader and writer schemas. The result is a single call, with no additional communication for schema negotiation, that transports only the datum, which has no per-value overhead of field identifiers.

Mobile Matters

As a third-party library on mobile we had some additional areas to address. First, on Android, we needed to maintain support for version 1.5+. Avro uses a couple of methods that were only introduced in Android 2.3, namely Arrays.copyOf() and String.getBytes(Charset). Those were easily changed to use System.arraycopy() and String.getBytes(String), but it did require us to build from source and repackage.
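As a rough sketch of the kind of substitution involved (not the actual patch we applied), the replacements look like this:

    import java.io.UnsupportedEncodingException;

    public class CompatUtil {
        // Stand-in for Arrays.copyOf(src, newLength), unavailable before Android 2.3.
        public static byte[] copyOf(byte[] src, int newLength) {
            byte[] copy = new byte[newLength];
            System.arraycopy(src, 0, copy, 0, Math.min(src.length, newLength));
            return copy;
        }

        // Stand-in for s.getBytes(Charset), also unavailable before Android 2.3.
        public static byte[] utf8Bytes(String s) throws UnsupportedEncodingException {
            return s.getBytes("UTF-8");
        }
    }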

On iOS, we need to avoid naming conflicts in case someone else integrates Avro in their library. We are able to do this by stripping the symbols. See this guide from Apple on that process.  On Android, we use ProGuard, which solves conflict issues through obfuscation. Alternatively, you can use the jarjar tool to change your package naming.

Sample Set Up

We’ve talked a lot about our Avro setup, so now we’ll provide a sample of its use. These are just highlights, but we have made the full source available on our GitHub account at: https://github.com/flurry/avro-mobile. This repo includes a sample Java server, iOS client, and Android client.

1. Set up your schema. We use a protocol file (avpr) vs a schema file (avsc) simply because we can define all schemas for our protocol in a single file. The protocol file is typically used with Avro’s embedded RPC mechanisms, but it doesn’t have to be used exclusively in that manner.  Here are our sample schemas outlining a simple AdRequest and AdResponse:

AdRequest:

AdResponse:
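(The actual protocol file ships with the avro-mobile repo; purely as an illustration of the avpr format, a minimal protocol defining two such records might look like the following, with hypothetical field names.)

    {
      "protocol": "AdServing",
      "namespace": "com.example.ads",
      "types": [
        {"type": "record", "name": "AdRequest", "fields": [
          {"name": "apiKey", "type": "string"},
          {"name": "adSpaceName", "type": "string"}
        ]},
        {"type": "record", "name": "AdResponse", "fields": [
          {"name": "adSpaceName", "type": "string"},
          {"name": "adContent", "type": "string"}
        ]}
      ],
      "messages": {}
    }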

2. Pre-compile the schemas into Java objects with the avro-tools jar. We have included an ant build file that will perform this action for you.
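(If you prefer to run it by hand, the same step is a single avro-tools invocation; the file and directory names here are illustrative.)

    java -jar avro-tools.jar compile protocol AdServing.avpr src/generated/java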

3. Build our sample self contained Avro Jetty server. The build file puts all necessary libs in a single jar (flurry-avro-server-1.0.jar) for ease of running. Run this from the command line with one argument for the port:
java -jar flurry-avro-server-1.0.jar 8080

4. Test. One of the things that distinguishes Avro from other serialization protocols is its support for dynamic typing. Since no code generation is required, we can construct a simple curl to verify that our server implements the protocol as expected. This has proven useful for us since it gives us an independent verification measure for any issue that occurs in communication between our mobile clients and servers. We simply construct the identical request in JSON, send it to our servers via curl, and hope we can blame the server guys for the issue. This also allows anyone working on the server to simulate requests without needing to instantiate mobile clients.

Sample Curl:
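(The endpoint path and JSON body below are illustrative; they depend on your schema and server routing.)

    curl -X POST -H "Content-Type: application/json" \
         -d '{"apiKey": "test", "adSpaceName": "testAdSpace"}' \
         http://localhost:8080/adRequest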

Note: To simulate an error, type the AdSpace Name as “throwError”.

5. Setting up the iOS app takes a little more work than the Java side, simply because Avro-C does not support static schema compilation into generated code. The easiest way we have found to define the schemas is to go to the Java generated schema file and copy the JSON text from the SCHEMA$ definition:

Paste that text as a definition into your header file, then get the avro schema from the provided method avro_schema_from_json. From that point write/read from the fields in the schema using the provided Avro-C methods as done in AdNetworking.m.

In the sample app provided, run AvroClient > Sim or Device and send your ad request to your server (defaults to calling http://localhost:8080). To simulate an error, type the AdSpace Name as “throwError”.

6. Since Android is based on Java, moving from the Avro Java server to Android is a simple process. The same avpr and buildProtocol.xml files can be used to generate the Java objects. Constructing objects is equally easy, as seen here:
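As a sketch (the field names are illustrative, and this assumes a generated class with setters), building and binary-encoding a request looks roughly like this:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.specific.SpecificDatumWriter;

    public class AdRequestExample {
        // Builds a hypothetical AdRequest (generated from the avpr) and serializes it
        // with Avro's binary encoding, ready to send to the server.
        public static byte[] encodeRequest() throws IOException {
            AdRequest request = new AdRequest();   // class generated by avro-tools
            request.setApiKey("test");
            request.setAdSpaceName("testAdSpace");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new SpecificDatumWriter<AdRequest>(AdRequest.class).write(request, encoder);
            encoder.flush();
            return out.toByteArray();
        }
    }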

Conclusion

We’re excited about the use of Apache Avro on our new Ad Serving platform. It provides consistency through structured schemas, but also allows for independent testing thanks to its support for dynamic typing and JSON. Our protocol is much easier to understand internally since the avpr file is standard JSON and attaches data types to our fields. Once we understood the initial setup across all platforms, it became very easy to iterate on new protocol versions. We would recommend that anyone looking to simplify communication across multiple environments while maintaining high performance give Avro strong consideration.

 

Flurry APIs now support CORS

We recently upgraded all of our REST APIs to support Cross-Origin Resource Sharing (CORS). If you're not familiar with CORS, this means that you can now access Flurry APIs via JavaScript on web pages other than flurry.com. Get started now or read on to learn more about CORS.

Why do we need CORS?

Modern browsers implement something called a same-origin policy when processing and rendering the data associated with a website. This policy says that JavaScript in a web page can only load resources that come from the same host (origin) as the web page itself. For example, if you load flurry.com, your browser will only let the JavaScript in the page load resources from flurry.com.

Why is a same-origin policy important? Cross-site scripting attacks, which take advantage of cross-origin interactions, are one of the more common methods of stealing personal information these days. A basic attack would show a user a web page that looks just like the login page for your website. When the user enters their credentials into the malicious page, believing it to be yours, the malicious site takes the credentials and uses JavaScript to log them into your website using AJAX. After being logged in through JavaScript, the attacker can steal data, manipulate the account and change the password, all using AJAX behind the scenes without the user being aware.

Such attacks appear to the attacked service as legitimate traffic since they originate from a normal computer browser - complete with the cookies you have set. By preventing access to resources not hosted on the origin, and hence preventing AJAX from reaching another host, the browser is protecting you from this kind of attack. 

However, with the rise of HTML5, more and more web content is loaded dynamically through JavaScript and rendered in the browser. There are now very legitimate uses for cross-origin resource access in JavaScript, including widgets, applications and content management.

What is CORS?

CORS bridges the gap between security and flexibility by allowing a host to specify which resources are available from non-origin domains. This allows you to make REST APIs available for access from other domains in the browser, but not your login page. 

Adding CORS support is as simple as adding an extra HTTP response header that specifies what origins can access a given resource. To allow any domain to access a resource, you would include the following HTTP header in responses to requests for that resource:

Access-Control-Allow-Origin: *

Or, to only allow access from Flurry's website domain you would use the following:

Access-Control-Allow-Origin: http://flurry.com
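On a Java servlet stack, for example, this can be as small as a response filter; the sketch below (not Flurry's actual code) adds the permissive header to every response it wraps:

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletResponse;

    // Adds the CORS header to every response so browsers on other origins
    // are allowed to read the results of their AJAX calls.
    public class CorsFilter implements Filter {
        public void init(FilterConfig config) {}

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            ((HttpServletResponse) res).setHeader("Access-Control-Allow-Origin", "*");
            chain.doFilter(req, res);
        }

        public void destroy() {}
    }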

Note that since the CORS header is in the response, the request has already been made by the time your browser evaluates whether to give the page access to the result. This is important to keep in mind: even if the browser detects a CORS violation, the request will already have been processed on your servers.

Not all browsers support CORS right now but most modern browsers do. You can read more on the CORS Wikipedia page.

The Delicate Art of Organizing Data in HBase

Here at Flurry we make extensive use of HBase, a distributed column-store database system built on top of Apache Hadoop and HDFS. For those used to relational database systems, HBase may seem quite limited in the ways you can query data. Rows in HBase can have unlimited columns, but can only be accessed by a single row key. This is kind of like declaring a primary key on a SQL table, except you can only query the table by this key. So if you wish to look up a row by more than one column, as in SQL where you would perform select * from table where col1 = val1 and col2 = val2, you would have to read the entire table and then filter it. As you can probably tell, reading an entire table can be an extremely costly operation, especially at the data sizes that HBase usually handles. One of the solutions to this is to use a composite key, combining multiple pieces of data into a single row key.

What’s a composite key?

HBase uses plain byte arrays to store its data, in both row keys and columns. However, it would be tedious to manipulate raw byte arrays directly in our code, so we store our keys in a container class that knows how to serialize itself. For example:
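(The original listing isn't reproduced here; the sketch below, an int userId followed by a String applicationId written with DataOutput, is consistent with the discussion that follows.)

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Composite row key: userId first, applicationId second. The serialized byte
    // order is what HBase sorts on, so field order determines which scans are cheap.
    public class UserApplicationKey {
        private final int userId;
        private final String applicationId;

        public UserApplicationKey(int userId, String applicationId) {
            this.userId = userId;
            this.applicationId = applicationId;
        }

        public byte[] toBytes() throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(userId);          // 4 bytes, big-endian; assumes userId > 0
            out.writeUTF(applicationId);   // 2-byte length prefix, then the characters
            out.flush();
            return bytes.toByteArray();
        }
    }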

Here we have two fields forming a single key with a method to serialize it to a byte array. Now any time we want to look up or save a row, we can use this class to do so. 

While this allows us to instantly get a row if we know both the userId and applicationId, what if we want to look up multiple rows using just a subset of the fields in our row key? Because HBase stores rows sorted lexicographically by comparing the key byte arrays, we can perform a scan to effectively select on one of our index fields instead of requiring both. Using the above index as an example, if we wanted to get all rows with a specific userId:
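A sketch of that scan, reusing the key class above with the standard HBase client API of that era:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class UserScanExample {
        // Returns a scanner over every row whose key begins with the given userId.
        // The stop row is exclusive, so (userId + 1, "") ends the scan exactly at
        // the boundary of the next userId.
        public static ResultScanner scanByUserId(HTable table, int userId) throws IOException {
            Scan scan = new Scan();
            scan.setStartRow(new UserApplicationKey(userId, "").toBytes());
            scan.setStopRow(new UserApplicationKey(userId + 1, "").toBytes());
            return table.getScanner(scan);
        }
    }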

Since we’ve set the start and stop rows of the Scanner, HBase can give us all the rows with the specified userId without having to read the whole table. However there is a limitation here. Since we write out the userId first and the applicationId second, our data is organized such that all rows with the same userId are adjacent to each other, and rows with the same applicationId but different userIds are not. Thus if we want to query by just the applicationId in this example, we need to scan the entire table. 

There are two “gotchas” to this serialization approach that make it work as we expect. First, we assume here that userId is greater than 0. The binary representation of a negative two's complement integer sorts lexicographically after the largest positive number, so if we intend to have negative userIds we would need to change our serialization method to preserve ordering. Second, DataOutput.writeUTF serializes a string by first writing a short (2 bytes) for its length, then the characters of the string. With this serialization method, the empty string naturally sorts first lexicographically. If we serialized it using a different method, such that the empty string were not first, then our scan would stop somewhere inside the next userId.

As you can see, just putting our index data in our row key is not enough. The serialization format of our components determines how well we can exploit the properties of HBase lexicographic key ordering, and the order of the fields in our serialization determines what queries we are able to make. Deciding what order to write our row key components in needs to be heuristically driven by the reads we need to perform quickly. 

How to organize your key

The primary limitation of composite keys is that you can only query efficiently by known components of the composite key in the order they are serialized. Because of this limitation, I find it easiest to think of your key like a funnel: start with the piece of data you always need to partition on, and narrow it down to the more specific data you don't often need to distinguish. Using the above example, if we almost always partition our data by userId, putting it in our key first is a good idea. That way we can easily select all the applicationIds for a userId, and also select a specific applicationId for a userId when we need to. However, if we often look up data by applicationId across all users, then we would probably want to put the applicationId first.

As a caveat to this process, keep in mind that HBase partitions its data across region servers based on the same lexicographic ordering that gets us the behavior we’re exploiting. If your reads/writes are heavily concentrated into a few values for the first (or first few) components of your key, you will end up with poorly distributed load across region servers. HBase functions best when the distribution of reads/writes is uniform across all potential row key values. While a perfectly uniform distribution might be impossible, this should still be a consideration when constructing a composite key.

When your key gets too complex

If your queries go beyond the simple funnel model, then it’s time to consider adding another table. Those used to heavily normalized relational databases will instinctively shy away from repeating data in multiple tables, but HBase is designed to handle large amounts of data, so we need to make use of that to overcome the limitations of a single index. Adding another table that stores the same cells with a differently organized key can reduce your need to perform full table scans, which are extremely expensive time-wise, at the cost of space, which is significantly cheaper if you’re running at Hadoop scale. Using the above example, if we routinely needed to query by just userId or applicationId, we would probably want to create a second table. Assuming we still want to be able to query by both userId and applicationId, the field we use as a key for the second table would depend on the distribution of the relationship of user to application and vice versa. If a user has more applications than an application has users, then that would mean scanning a composite key of (userId, applicationId) would take longer than scanning it in (applicationId, userId) order, and vice versa.

The downside to this approach is the added complexity of ensuring that the data in both of your tables stays the same: whenever you write data, you have to write it to both tables. It helps to have all of your HBase reading and writing encapsulated so that individual producers or consumers are not accessing the HBase client code directly, and the use of multiple tables to represent a single HBase-backed entity is opaque to its users.

If you’ve got a lot of rows

Sometimes storing large parts of your data in the key can be hazardous. Scanning multiple rows is usually more costly than reading a single row, even if that single row has many columns. So if performing a scan on the first part of your composite key often returns many rows, then you might be better off reorganizing your table to convert some of the parts of your composite key to column names. The tradeoff there is that HBase reads entire rows at a time, so while you can instruct the HBase client to only return certain columns, this will not speed up your queries. In order to do that, you need to use column families. 

Getting it right the first time

Finally, always take extreme care to pick the right row key structure for your current and future querying needs. While most developers are probably used to throwing out most of the code they write, and refactoring many times as necessary for evolving business requirements, changing the structure of an HBase table is very costly, and should be avoided at all costs. Unlike with most relational databases, where adding a new index is a simple operation with a mild performance hit, changing an HBase table’s structure requires a full rewrite of all the data in the table. Operating on the scale that you’d normally use Hadoop and HBase for, this can be extremely time consuming. Furthermore, if you can’t afford your table to be offline during the process, you need to handle the migration process while still allowing reads from the old and new tables, and writes to the new table, all while maintaining data integrity.

HBase or not HBase

All of this can make using HBase intimidating to first-time users. Don't be afraid! These techniques are common to most NoSQL systems, which are the future of large-scale data storage. Mastering this new world allows you to unlock a massively powerful system for large data sets and perform analysis that was never possible before. While you need to spend more time up front on your data schema, you gain the ability to easily work with petabytes of data and tables containing hundreds of billions of rows. That's big data. We'll be writing more about HBase in the future; let us know if there's something you'd like us to cover.

Here at Flurry, we are constantly working to evolve our big data handling strategies. If that sounds interesting, we are hiring engineers! Please check out http://flurry.com/jobs for more information.

Scaling @ Flurry: Measure Twice, Plan Once

Working in web operations can be quite exciting when you get paged in the middle of the night to debug a problem, then burn away the night hours formulating a solution on the fly using only the resources you have at hand. It’s thrilling to make that sort of heroic save, but the business flows much more smoothly when you can prepare for the problems before they even exist. At Flurry we watch our systems closely, find patterns in traffic and systems’ behavioral changes, and generally put solutions in place before we encounter any capacity-related problems.

A good example of this sort of capacity planning took place late last year. During the Christmas season Flurry usually sees an increase in traffic as people fire up their new mobile phones, so we knew in advance of December 2011 that we'd need to accommodate a jump in traffic—but how much?  Fortunately, we had retained bandwidth and session data from earlier years, so we were able to estimate our maximum bandwidth draw based on the increases we experienced previously, and our estimate was within 5% of our actual maximum throughput. There are probably some variables we still need to account for in our model, but we were able to get sufficient resources in place to make it through the holiday season without any serious problems. Having worked in online retail, I can tell you that not getting paged over a holiday weekend is something wonderful.

Doubling of outbound bandwidth from Nov-Dec 2011, Dec 24th yellow

Outside of annual events, we also keep a constant eye on our daily peak traffic rates. For example, we watch bandwidth to ensure we aren’t hitting limits on any networking choke points, and requests-per-second is a valuable metric since it helps us determine scaling requirements like overall disk capacity (each request taking up a small amount of space in our data stores) and the processing throughput our systems can achieve overall. The overall throughput includes elements like networking devices (switches, routers, load balancers) or CPU time spent handling and analyzing the incoming data.

Other metrics we watch on a regular basis include disk utilization, number of incoming requests, latency for various operations (time to complete service requests, but also metrics like software deployment speed, mapreduce job runtimes, etc.), CPU, memory, and bandwidth utilization, as well as application metrics for services like MySQL, nginx, haproxy, and Flurry-specific application metrics. Taken altogether, these measurements allow us to gauge the overall health and trends of our systems’ traffic patterns, from which we can then extrapolate when certain pieces of the system might be nearing capacity limits.

Changes in traffic aren’t the only source of capacity flux, though. Because the Flurry software is a continually changing creature, Flurry operations regularly coordinates with the development teams on upcoming changes that might increase database connections, add processing time for each incoming request, or have other similar effects. Working closely with our developers also allows Flurry to achieve operational improvements like bandwidth offloading by integrating content delivery networks into our data delivery mechanisms.

One area I think we could improve is in understanding what our various services are capable of when going all-out. We’ve done some one-off load tests to get an idea of upper limits for requests per second per server, and have used that research as a baseline for rough determinations of hardware requirements, but the changing nature of our software makes that a moving target. Getting more automated capacity tests would be handy in both planning hardware requirements and for potentially surfacing performance-impacting software changes.

Overall, though, I think we’ve done pretty well. Successful capacity planning doesn’t prevent every problem, but paying attention to our significant metrics allows us to grow our infrastructure to meet upcoming demand, saving our urgent-problem-solving resources for the truly emergent behavior instead of scrambling to keep up with predictable capacity issues.

Using the Java ClassLoader to write efficient, modular applications

Java is an established platform for building applications that run in a server context. Writing fast, efficient, and stable code requires effective use of algorithms, data structures, and memory management, all of which are well supported and documented by the Java developer community. However, some applications need to leverage a core feature of the JVM whose nuances are not as accessible: the Java ClassLoader.

How Does Class Loading Work?

When a class is first needed, either through access to a static field or method or a call to a constructor, the JVM attempts to load its Class instance from the ClassLoader that loaded the referencing Class instance (see note 1). ClassLoaders can be chained together hierarchically, and the default strategy is to delegate to the parent ClassLoader before attempting to load the class from the child. After being loaded, the Class instance is tracked by the ClassLoader that loaded it; it persists in memory for the life of the loader. Any attempt to load the same class again from the same loader or its children will produce the same Class instance; however, attempts to load the class from another ClassLoader (that does not delegate to the first one) can produce a second instance of a Class with the same fully qualified name. This has the potential to cause confusing ClassCastExceptions, as shown below.

Why Not Use The Default ClassLoader?

The default ClassLoader created by the Java launcher is usually sufficient for applications that are either short-lived or have a relatively small, static set of classes needed at runtime (see note 2). Applications with a large or dynamic set of dependencies can, over time, fill up the memory space allocated for Class instances – the so-called “permanent generation.” This manifests as OutOfMemory errors in the PermGen space. Increasing the PermGen allocation may work temporarily; leaking Class memory will eventually require a restart. Fortunately, there are ways to solve this problem.

Class Unloading: Managing The Not-So-Permanent Generation

The first step to using Class memory efficiently is a modular application design. This should be familiar to anyone who has investigated object memory leaks on the heap. With heap memory, object references must be partitioned so the garbage collector can free clusters of inter-related objects when they are no longer referenced from the rest of the application. Similarly, Class memory in the so-called “permanent” generation can also be reclaimed by the garbage collector when it finds clusters of inter-related Class instances with no incoming Class references (see note 3).

To demonstrate, let's consider two Java projects with one class each: a container, and a module (see note 4).

For extremely modular applications where no communication is required between the container and its modules, there is an apparently easy solution: load the module in a new ClassLoader, release references to the ClassLoader when the module is no longer needed, and let the garbage collector do its thing. The following test demonstrates this approach:
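The actual test lives in the example repo; a minimal sketch of the idea (class and method names here are hypothetical) looks like this:

    import java.net.URL;
    import java.net.URLClassLoader;

    public class ContainerSketch {
        // Loads the module in its own ClassLoader, invokes it reflectively, and then
        // drops every reference. With nothing pointing at the loader, its Class
        // instances become eligible for garbage collection.
        public static void runModuleOnce(URL moduleJar) throws Exception {
            URLClassLoader loader = new URLClassLoader(new URL[] { moduleJar });
            Class<?> moduleClass = loader.loadClass("com.example.module.Module");
            Object module = moduleClass.newInstance();
            moduleClass.getMethod("run").invoke(module);
            // loader, moduleClass and module all go out of scope here.
        }
    }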

Success!

When Third Party Code Attacks

So you've been diligent about modularizing your own code, and each module runs in its own ClassLoader with no communication with the container. How could you have a Class memory leak? The answer could lie in third party code used by both the container and module:
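For example, a shared library with a static registry is all it takes (a sketch of the shape, not the repo's actual code):

    import java.util.HashMap;
    import java.util.Map;

    // A typical third-party registry: a static map that callers register classes in.
    // The static field lives as long as the ClassLoader that loaded ResourceLibrary,
    // and every registered Class pins its own ClassLoader in memory.
    public class ResourceLibrary {
        private static final Map<String, Class<?>> REGISTRY = new HashMap<String, Class<?>>();

        public static void register(String name, Class<?> type) {
            REGISTRY.put(name, type);
        }
    }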

Loading a module in its own ClassLoader is not enough to prevent Class memory leaks when using a ClassLoader with the default delegation strategy. In this case, the module's ResourceLibrary Class instance is the same as the Container's, so the ResourceLibrary's HashMap holds a reference to the module's Class instance – which references its ClassLoader, which references all other Class instances in the module. The following test demonstrates the problem, and a possible solution:

The result:

Although the test with the default ClassLoader fails due to a memory leak, the test with a stand-alone ClassLoader succeeds. Creating a ClassLoader with no parent forces it to load all Class instances itself – even for classes already loaded by another ClassLoader. The (leaky) ResourceLibrary Class instance referenced by the module is different from the one used by the container, so it gets garbage collected when its ClassLoader is released by the container – along with the rest of the module's Class instances. This fixes the Class memory leak; but what happens if you need some communication between the container and the module?

The stand-alone ClassLoader approach won't work now, because the IModule Class instance loaded in the container is different from the IModule Class instance loaded by the module. The result is a confusing ClassCastException when casting a Module to an IModule, as seen in this test:

The result:

This test introduces two new strategies for loading classes: post-delegation and conditional delegation. Post-delegation simply loads classes from the child first, then from the parent if not found. Unfortunately, that doesn't work if the module's classpath includes any of the classes shared with the container, as the test shows. This makes configuring the classpath a chore, especially if shared classes have dependencies on utility classes that both the container and module require. In this test, conditional delegation works best: it allows shared classes to be loaded from the parent ClassLoader, but sandboxes all other classes (like potentially leaky third-party code) to the child ClassLoader.
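A minimal sketch of a conditionally delegating loader (the package prefix and names are hypothetical; the version in the repo is more complete):

    import java.net.URL;
    import java.net.URLClassLoader;

    // Shared classes (e.g. the IModule interface) are loaded from the container's
    // ClassLoader so casts work across the boundary; everything else, including
    // potentially leaky third-party code, stays sandboxed in this loader.
    public class ConditionalClassLoader extends URLClassLoader {
        private final ClassLoader shared;
        private final String sharedPrefix;   // e.g. "com.example.api."

        public ConditionalClassLoader(URL[] urls, ClassLoader shared, String sharedPrefix) {
            super(urls, null);               // null parent: no automatic delegation
            this.shared = shared;
            this.sharedPrefix = sharedPrefix;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.startsWith(sharedPrefix)) {
                return shared.loadClass(name);
            }
            return super.loadClass(name, resolve);
        }
    }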

Conclusion

When using custom ClassLoaders in a Java application, there are many pitfalls to watch out for.  Dealing with ClassCastExceptions and ClassNotFoundExceptions can be frustrating if the overall design is not well planned.  Even worse, memory leaks and PermGen errors can be very difficult to reproduce and fix, especially if they only happen in long-running production systems.  Although there's not one right answer to using custom ClassLoaders, the techniques here can address most of the issues one might encounter (see note 5).

Creating a clear design and strategy for loading classes is not easy, but the reward is efficient, stable, robust server code.  The time spent planning is a small price for the time saved debugging, looking through memory dumps, and dealing with outages.  Please check out my examples on GitHub, find other ways to leak PermGen space, and post comments!

Note 1: in this article, “class” refers to a Java type; the term “Class instance” refers to a runtime object that models a class and is an instance of the type java.lang.Class. The difference is important, as one class can have many Class instances at runtime, if the application uses multiple ClassLoaders.

Note 2: There are other reasons to use custom ClassLoaders. Some common uses are in applications that dynamically update their code without restarting, applications that load classes from non-standard sources like encrypted jars, and applications that need to sandbox untrusted 3rd party code. This article focuses on a more general scenario, but the lessons are applicable to those cases as well.

Note 3: PermGen Garbage collection was a bit tricky in older versions of the JRE. By default it was never reclaimed (hence the name). Sun used to provide an alternate garbage collection strategy (concurrent mark-sweep) that could be configured to reclaim PermGen space; support from other vendors varied. However, in recent versions of Java, PermGen collection works quite well using the default configuration.

Note 4: the complete source code for these examples is posted in the public git repository here.

Note 5: for more in-depth analysis on how to find and fix PermGen leaks, see Frank Kieviet's excellent blog series on the topic: here, here, and here.

Modular Libraries in iOS

The Flurry SDK uses a dynamically-linked modular library design which breaks up features into different archives. This helps developers with large applications keep their application size as small as possible by avoiding unnecessary libraries. We frequently get asked how to implement dynamically-linked modular libraries in iOS, so here is how it works.

Why modular libraries?

The Flurry SDK has grown considerably since its first release. New analytics capabilities and new advertising products were introduced with AppCircle and AppCircle Clips. Keeping developers in mind, we want to provide you with the functionality necessary to accomplish your goals without requiring you to carry the weight of functionality that would go unused. For example, some developers use Analytics while some use Analytics and AppCircle. Others may use Analytics, AppCircle, and AppCircle Clips. The solution was to partition the existing library into smaller modular libraries that could be layered onto a core Analytics library.

Analytics is our core SDK library, and it is responsible for communicating session data and ad data between our servers and apps. A request/response exchange takes place when an app starts and, optionally, when it pauses or closes. Each request reports session data, and the response contains a shared buffer filled with ad data which is relayed to each enabled ad library. Our servers currently receive 25,000 requests per second and growing, so this design decision (routing all communication through a single point in the Analytics library) was made to keep traffic between our servers and apps efficient. This also creates asymmetric library dependencies: Analytics can operate with or without AppCircle, but AppCircle cannot operate without Analytics.

How modularization was accomplished

To use AppCircle, the AppCircle library must be enabled.

The call above serves two purposes. First, it tells Analytics that AppCircle is being used, so Analytics will request ad data on behalf of AppCircle. Second, it provides Analytics with a delegate, giving it access into the AppCircle library. The delegate is used as a callback to relay the ad data buffer into AppCircle, which creates the ads it provides through its API.

In Analytics, we maintain the delegate callback into AppCircle. The delegate is purposely declared as an NSObject. It is set by AppCircle's setAppCircleEnabled method described above.

Then in Analytics, the delegate to AppCircle is tested and its callback is invoked with the ad data buffer as shown below. If AppCircle is not enabled then this collaboration with AppCircle is skipped altogether.

Declaring the delegate as an NSObject and testing for the callback selector avoids any hard dependency on AppCircle, allowing Analytics to operate with or without AppCircle present. This pattern is also used for AppCircle Clips and for future modular libraries in the works.

Modularizing our libraries significantly reduced the size of the Analytics library, from a previous 6MB file to 1.7MB currently, and allows layering any or all of our ad libraries on top of the Analytics library, so developers can minimize the amount of Flurry library code they link into their applications. And because server communication remains the responsibility of Analytics during start, pause, or close of an app, our service can continue to perform well as traffic scales upward.

Flurry, by the numbers

Welcome to the Flurry Technology Blog!

We here at Flurry are lucky to be at the center of the mobile application revolution. Flurry Analytics is currently used by over 65,000 developers across over 170,000 mobile applications and tracks over 1.2 billion application sessions a day. Flurry AppCircle, our application recommendation network, currently reaches over 300 million users every month. We spend our days working with the developers of apps and games you probably use every day.

Up until now we haven’t shared much about how we have built our platform to handle the vast scale and amazing rate of growth seen by our products and the mobile application ecosystem. Let’s start with some statistics about the platform:

  • > 26,000 requests per second
  • 2 TB of new data processed per day
  • > 20 billion custom events tracked per day

Along the way we’ve had to find solutions to problems that many companies that are growing quickly have to face. This blog is our way of sharing what we’ve learned and opening a conversation with you about how to build Big Data services and mobile applications.

This blog will be technical, focusing on optimizing Linux servers, efficiently writing MapReduce jobs, and advanced areas of mobile platforms like iOS. If you're interested in learning more about the mobile application market in general, please visit the Flurry Blog, where we present market research and examine trends across the entire network of Flurry products.

We’ll be writing regular blog posts in the coming months. If there is something that you’d like to learn about or something that you think we can do better just let us know by posting here in the comments.

Enjoy.