Using the Java ClassLoader to write efficient, modular applications

Java is an established platform for building applications that run in a server context. Writing fast, efficient, and stable code requires effective use of algorithms, data structures, and memory management, all of which are well supported and documented by the Java developer community. However, some applications need to leverage a core feature of the JVM whose nuances are not as accessible: the Java ClassLoader.

How Does Class Loading Work?

When a class is first needed, either through access to a static field or method or a call to a constructor, the JVM attempts to load its Class instance from the ClassLoader that loaded the referencing Class instance (see note 1). ClassLoaders can be chained together hierarchically, and the default strategy is to delegate to the parent ClassLoader before attempting to load the class from the child. After being loaded, the Class instance is tracked by the ClassLoader that loaded it and stays in memory for the life of that loader. Any attempt to load the same class again from the same loader, or from children that delegate to it, will produce the same Class instance. However, an attempt to load the class from another ClassLoader (one that does not delegate to the first) can produce a second Class instance with the same fully qualified name. Because the JVM treats the two as unrelated types, an object created from one cannot be cast to the other, which can cause confusing ClassCastExceptions, as shown below.
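As a minimal sketch of that last point, the example below loads one class through two unrelated loaders and shows that the resulting Class instances are not interchangeable. The names DuplicateClassDemo, Widget, and IsolatingLoader are hypothetical and are not taken from the article's repository. The sketch assumes Java 9 or later (for InputStream.readAllBytes) and that the compiled class files are visible as resources on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;

public class DuplicateClassDemo {

    /** A trivial class that will be loaded twice, by two different loaders. */
    public static class Widget {
    }

    /**
     * Hypothetical loader with no parent: JDK classes come from the bootstrap
     * loader, everything else is defined here from bytes found on the classpath.
     */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(null);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            ClassLoader source = DuplicateClassDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes(); // Java 9+
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Returns true if an instance from the second loader cannot be cast to Widget. */
    static boolean castFails() throws Exception {
        Class<?> other = new IsolatingLoader().loadClass(Widget.class.getName());
        Object instance = other.getDeclaredConstructor().newInstance();
        try {
            Widget widget = (Widget) instance;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> other = new IsolatingLoader().loadClass(Widget.class.getName());
        System.out.println("same name:           " + other.getName().equals(Widget.class.getName()));
        System.out.println("same Class instance: " + (other == Widget.class));
        System.out.println("cast fails:          " + castFails());
    }
}
```

The two Class instances share a fully qualified name but have different defining loaders, so the JVM treats them as unrelated types.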

Why Not Use The Default ClassLoader?

The default ClassLoader created by the Java launcher is usually sufficient for applications that are either short-lived or have a relatively small, static set of classes needed at runtime (see note 2). Applications with a large or dynamic set of dependencies can, over time, fill up the memory space allocated for Class instances: the so-called “permanent generation.” This manifests as a java.lang.OutOfMemoryError with the message "PermGen space". Increasing the PermGen allocation (for example with -XX:MaxPermSize) may postpone the error, but an application that keeps leaking Class memory will eventually need a restart. Fortunately, there are ways to solve this problem.

Class Unloading: Managing The Not-So-Permanent Generation

The first step to using Class memory efficiently is a modular application design. This should be familiar to anyone who has investigated object memory leaks on the heap. With heap memory, object references must be partitioned so the garbage collector can free clusters of inter-related objects when they are no longer referenced from the rest of the application. Similarly, Class memory in the so-called “permanent” generation can be reclaimed by the garbage collector when it finds a cluster of inter-related Class instances with no incoming references (see note 3). Every Class instance references its ClassLoader, and the loader references every Class instance it has loaded, so the cluster is collected as a unit: a single outside reference to the loader, to one of its Class instances, or to an object of one of those classes keeps all of it alive.

To demonstrate, let's consider two Java projects with one class each: a container, and a module (see note 4).

For extremely modular applications where no communication is required between the container and its modules, there is an apparently easy solution: load the module in a new ClassLoader, release references to the ClassLoader when the module is no longer needed, and let the garbage collector do its thing. The following test demonstrates this approach:
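One way such a test might look is sketched below. It is not the listing from the article's repository: ModuleUnloadDemo, Module, and ModuleLoader are hypothetical names, and the module is assumed to talk to the container only through a JDK type (Runnable). A WeakReference to the loader is used to observe whether the garbage collector has reclaimed it. System.gc() is only a hint, so the reported result can vary between JVMs and configurations.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.ref.WeakReference;

public class ModuleUnloadDemo {

    /** Stand-in for a module; it is used only through the JDK type Runnable. */
    public static class Module implements Runnable {
        @Override
        public void run() {
            // Module work would go here.
        }
    }

    /** Hypothetical parentless loader that defines module classes from classpath bytes. */
    static class ModuleLoader extends ClassLoader {
        ModuleLoader() {
            super(null);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            ClassLoader source = ModuleUnloadDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes(); // Java 9+
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Runs the module in a fresh loader and keeps only a weak reference to that loader. */
    static WeakReference<ClassLoader> runModule() throws Exception {
        ClassLoader loader = new ModuleLoader();
        Runnable module = (Runnable) loader.loadClass(Module.class.getName())
                .getDeclaredConstructor().newInstance();
        module.run();
        return new WeakReference<>(loader);
    }

    /** Requests garbage collection a few times and reports whether the referent was cleared. */
    static boolean collected(WeakReference<?> ref) throws InterruptedException {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(20);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws Exception {
        WeakReference<ClassLoader> loaderRef = runModule();
        System.out.println("module loader collected: " + collected(loaderRef));
    }
}
```

Once runModule returns, nothing in the container holds the loader, the module instance, or its Class instance, so the whole cluster is eligible for collection.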

Success!

When Third Party Code Attacks

So you've been diligent about modularizing your own code, and each module runs in its own ClassLoader with no communication with the container. How could you have a Class memory leak? The answer could lie in third party code used by both the container and module:

Loading a module in its own ClassLoader is not enough to prevent Class memory leaks when using a ClassLoader with the default delegation strategy. In this case, the module's ResourceLibrary Class instance is the same as the Container's, so the ResourceLibrary's HashMap holds a reference to the module's Class instance – which references its ClassLoader, which references all other Class instances in the module. The following test demonstrates the problem, and a possible solution:
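The sketch below illustrates the mechanism under some loudly stated assumptions. All class names are hypothetical stand-ins rather than the article's original code. Because everything lives on one classpath here, the loader takes a set of class names (own) that simulates the module's classpath: names in the set are defined by the module loader, all others follow default delegation. It assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SharedLibraryLeakDemo {

    /** Stand-in for third-party code that caches values in a static map. */
    public static class ResourceLibrary {
        private static final Map<String, Object> CACHE = new HashMap<>();

        public static void register(String key, Object value) {
            CACHE.put(key, value);
        }
    }

    public static class Module implements Runnable {
        @Override
        public void run() {
            // Caching the module's own Class instance pins its ClassLoader
            // for as long as this copy of ResourceLibrary stays reachable.
            ResourceLibrary.register("module", Module.class);
        }
    }

    /** Defines the classes named in own itself and delegates everything else. */
    static class ModuleLoader extends ClassLoader {
        private final Set<String> own;

        ModuleLoader(ClassLoader parent, Set<String> own) {
            super(parent);
            this.own = own;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!own.contains(name)) {
                return super.loadClass(name, resolve); // default parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                return c != null ? c : defineFromClasspath(name);
            }
        }

        private Class<?> defineFromClasspath(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            ClassLoader source = SharedLibraryLeakDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes(); // Java 9+
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Runs the module in a fresh loader and keeps only a weak reference to that loader. */
    static WeakReference<ClassLoader> runModule(ClassLoader parent, Set<String> own) throws Exception {
        ModuleLoader loader = new ModuleLoader(parent, own);
        Runnable module = (Runnable) loader.loadClass(Module.class.getName())
                .getDeclaredConstructor().newInstance();
        module.run();
        return new WeakReference<>(loader);
    }

    static boolean collected(WeakReference<?> ref) throws InterruptedException {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(20);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws Exception {
        String module = Module.class.getName();
        String library = ResourceLibrary.class.getName();
        ClassLoader container = SharedLibraryLeakDemo.class.getClassLoader();

        // Default delegation: the module uses the container's ResourceLibrary.
        WeakReference<ClassLoader> delegating = runModule(container, Set.of(module));
        System.out.println("delegating loader collected:  " + collected(delegating));

        // Stand-alone: no parent, so the module gets a private ResourceLibrary.
        WeakReference<ClassLoader> standAlone = runModule(null, Set.of(module, library));
        System.out.println("stand-alone loader collected: " + collected(standAlone));
    }
}
```

In the delegating case the loader stays strongly reachable through the container's static map, so it cannot be collected. In the stand-alone case the only ResourceLibrary that references the module belongs to the module's own loader, so the pair becomes eligible for collection together; whether a given System.gc() call actually reclaims it depends on the JVM.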

The result:

Although the test with the default ClassLoader fails due to a memory leak, the test with a stand-alone ClassLoader succeeds. Creating a ClassLoader with no parent forces it to load all Class instances itself – even for classes already loaded by another ClassLoader. The (leaky) ResourceLibrary Class instance referenced by the module is different from the one used by the container, so it gets garbage collected when its ClassLoader is released by the container – along with the rest of the module's Class instances. This fixes the Class memory leak; but what happens if you need some communication between the container and the module?

The stand-alone ClassLoader approach won't work now, because the IModule Class instance loaded in the container is different from the IModule Class instance loaded by the module. The result is a confusing ClassCastException when casting a Module to an IModule, as seen in this test:

The result:

This test introduces two new strategies for loading classes: post-delegation and conditional delegation. Post-delegation simply loads classes from the child first, then from the parent if not found. Unfortunately, that doesn't work if the module's classpath includes any of the classes shared with the container, as the test shows. This makes configuring the classpath a chore, especially if shared classes have dependencies on utility classes that both the container and module require. In this test, conditional delegation works best: it allows shared classes to be loaded from the parent ClassLoader, but sandboxes all other classes (like potentially leaky third-party code) to the child ClassLoader.
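A conditional-delegation loader could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation from the article's repository: ConditionalLoader, its shared predicate, and the nested IModule and Module types are hypothetical. JDK classes are resolved through the platform class loader (Java 9 or later), shared names go to the parent, and everything else is defined by the child.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Predicate;

public class ConditionalDelegationDemo {

    /** Contract shared between the container and the module. */
    public interface IModule {
        String name();
    }

    public static class Module implements IModule {
        @Override
        public String name() {
            return "module";
        }
    }

    /** Delegates to the parent only for names accepted by shared; sandboxes the rest. */
    static class ConditionalLoader extends ClassLoader {
        private final Predicate<String> shared;

        ConditionalLoader(ClassLoader parent, Predicate<String> shared) {
            super(parent);
            this.shared = shared;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    c = shared.test(name) ? getParent().loadClass(name) : loadSandboxed(name);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        private Class<?> loadSandboxed(String name) throws ClassNotFoundException {
            try {
                return ClassLoader.getPlatformClassLoader().loadClass(name); // JDK classes
            } catch (ClassNotFoundException e) {
                return findClass(name); // everything else is defined by this loader
            }
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes(); // Java 9+
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Module in a fresh loader and tries to use it through the container's IModule. */
    static String loadAndCall(Predicate<String> shared) throws Exception {
        ClassLoader container = ConditionalDelegationDemo.class.getClassLoader();
        Class<?> moduleClass = new ConditionalLoader(container, shared).loadClass(Module.class.getName());
        Object module = moduleClass.getDeclaredConstructor().newInstance();
        try {
            return ((IModule) module).name();
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) throws Exception {
        String contract = IModule.class.getName();
        // Nothing shared: the child defines its own IModule, so the cast fails.
        System.out.println(loadAndCall(name -> false));    // prints "ClassCastException"
        // Only the contract is shared: the cast works, Module stays sandboxed.
        System.out.println(loadAndCall(contract::equals)); // prints "module"
    }
}
```

The first call behaves like post-delegation when the shared interface is also on the module's classpath: the child wins and the types diverge. In a real application the predicate would more likely match a package prefix for the shared API than a single class name.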

Conclusion

When using custom ClassLoaders in a Java application, there are many pitfalls to watch out for. Dealing with ClassCastExceptions and ClassNotFoundExceptions can be frustrating if the overall design is not well planned. Even worse, memory leaks and PermGen errors can be very difficult to reproduce and fix, especially if they only happen in long-running production systems. Although there's not one right answer to using custom ClassLoaders, the techniques here can address most of the issues one might encounter (see note 5).

Creating a clear design and strategy for loading classes is not easy, but the reward is efficient, stable, robust server code. The time spent planning is a small price for the time saved debugging, looking through memory dumps, and dealing with outages. Please check out my examples on GitHub, find other ways to leak PermGen space, and post comments!

Note 1: In this article, “class” refers to a Java type; the term “Class instance” refers to a runtime object that models a class and is an instance of the type java.lang.Class. The difference is important, because one class can have many Class instances at runtime if the application uses multiple ClassLoaders.

Note 2: There are other reasons to use custom ClassLoaders. Some common uses are in applications that dynamically update their code without restarting, applications that load classes from non-standard sources like encrypted jars, and applications that need to sandbox untrusted third-party code. This article focuses on a more general scenario, but the lessons are applicable to those cases as well.

Note 3: PermGen garbage collection was a bit tricky in older versions of the JRE. By default, PermGen space was never reclaimed (hence the name). Sun used to provide an alternate garbage collection strategy (concurrent mark-sweep) that could be configured to reclaim PermGen space; support from other vendors varied. However, in recent versions of Java, PermGen collection works quite well using the default configuration.

Note 4: The complete source code for these examples is posted in the public git repository here.

Note 5: For more in-depth analysis on how to find and fix PermGen leaks, see Frank Kieviet's excellent blog series on the topic: here, here, and here.