Clojure javaagent

tl;dr - download a minimum working example of a dynamically loadable javaagent in Clojure here.

While developing a memory-intensive Clojure application, I thought Instrumentation.getObjectSize might help in tracking down objects with high memory consumption [1]. The JVM exposes an instrumentation service that allows us to inspect the bytecode of a running JVM. Access to the Instrumentation interface is only provided through a special type of jar file called a javaagent. Getting a javaagent to work is, unsurprisingly, no straightforward task [2]. As with most things in Clojure, this process is both easier and harder than the analogous task in Java, because Clojure is a very dynamic and a very poorly documented language, respectively.

Generating the javaagent

The JVM, in its infinite staticness, will only load a javaagent from a jar file. While this is relatively unpleasant compared to the typical Cider-fueled REPLy goodness we’ve come to expect from the Clojure environment, the javaagent can be written to export its Instrumentation object so that it’s dynamically accessible anywhere, and the build process can be largely automated with Leiningen.

The code of the javaagent can be relatively simple, since we’ll be using the instrumentation interactively. When working from a REPL, you basically need the javaagent to provide a reference to an Instrumentation object, and the rest can be done on the fly.

In a file called, say, instrumentation.clj, do something like:

(ns clojure-javaagent.instrumentation
  (:import [java.lang.instrument Instrumentation])
  (:gen-class
   :methods [^:static [agentmain [String java.lang.instrument.Instrumentation] void]]))

(defn -agentmain [^String args ^Instrumentation instrumentation]
  (println "Loading agentmain")

  ; Give the Instrumentation object a binding for access from other namespaces
  (def instrumentation instrumentation))

The gen-class up top tells Clojure to make a fully fledged Java class out of the functions in this namespace. The single :methods declaration must conform to the signature required by the instrumentation spec. And lastly, we define the -agentmain method [3] to (a) tell us when it’s starting, which is helpful for debugging, and (b) create an externally accessible reference to the Instrumentation object that’s been passed to us.
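As a variation on the listing above, the agent could stash the Instrumentation object in an atom rather than calling def inside a function body. This is only a sketch of one alternative, not the approach used in the rest of this post: the names instrumentation-ref and get-instrumentation are made up for illustration, and the later examples still assume the original instrumentation var.

```clojure
(ns clojure-javaagent.instrumentation
  (:import [java.lang.instrument Instrumentation])
  (:gen-class
   :methods [^:static [agentmain [String java.lang.instrument.Instrumentation] void]]))

;; Hypothetical holder for the Instrumentation object; nil until the agent runs.
(defonce instrumentation-ref (atom nil))

(defn -agentmain [^String args ^Instrumentation inst]
  (println "Loading agentmain")
  ;; Store the object handed to us by the JVM so other namespaces can reach it.
  (reset! instrumentation-ref inst))

(defn get-instrumentation
  "Return the stored Instrumentation, or throw if the agent has not run yet."
  []
  (or @instrumentation-ref
      (throw (IllegalStateException. "javaagent has not been loaded yet"))))
```

The accessor fails loudly when the agent has not been loaded, which can be easier to diagnose than an unbound var.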

All that’s left in creating the javaagent is to get Leiningen to create a separate jar out of this code. We can add a separate profile to the project’s project.clj, like so:

:profiles {:instrumentation
           {:main clojure-javaagent.instrumentation
            :aot [clojure-javaagent.instrumentation]
            ; Note the '_' instead of '-' since we're describing a java package
            :manifest {"Agent-Class" "clojure_javaagent.instrumentation"}
            :uberjar-name "instrumentation.jar"}}

There are a couple of important aspects to this profile. Firstly, it defines the namespace in which our instrumentation lives, and tells Leiningen to compile it ahead of time (aot). Then it adds some secret sauce to the jar’s manifest to tell the JVM which (generated) Java class will contain the magic agentmain method needed to start the javaagent. Lastly, it gives a name to the jar we’ll be creating, which we’ll have to reference later.
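The '-' to '_' translation in the Agent-Class entry is easy to get wrong by hand. A small sketch at the REPL, using clojure.core's namespace-munge, shows the class name that corresponds to the agent namespace; the var names agent-ns and agent-class-name are only for illustration.

```clojure
;; namespace-munge converts a namespace name into a legal Java package name
;; by replacing every '-' with '_'.
(def agent-ns 'clojure-javaagent.instrumentation)

(def agent-class-name (namespace-munge agent-ns))

(println agent-class-name)
;; prints clojure_javaagent.instrumentation
```

With the profile in place, the jar would presumably be built with a command along the lines of lein with-profile instrumentation uberjar, though the exact invocation and output directory depend on the rest of your project.clj.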

Loading the javaagent at runtime

After creating our javaagent, we must then load it at runtime [4]. This process involves getting a path to the javaagent jar and a handle to the current JVM, which in turn requires knowing the process id (pid) of the running JVM. Getting a reference to the running JVM is facilitated by Sun’s tools.jar, which can be added to the classpath the hard way [5], or by using a third-party lein plugin: :plugins [[lein-jdk-tools "0.1.1"]]

(ns clojure-javaagent.core
  (:import [com.sun.tools.attach VirtualMachine]
           [java.lang.management ManagementFactory]))

(defn pid
  "PID of the running JVM"
  []
  ; The runtime name conventionally looks like "12345@hostname"
  (re-find #"\d+" (.getName (ManagementFactory/getRuntimeMXBean))))

(defn load-instrumentation
  "Load a javaagent from a jar file and attach it to this process"
  [jar-path]
  (.loadAgent (VirtualMachine/attach (pid)) jar-path))

(let [project-path   (.getCanonicalPath (clojure.java.io/file "."))
      javaagent-path (str project-path "/target/instrumentation.jar")]

  ; Attach the instrumentation javaagent to this process
  (load-instrumentation javaagent-path))
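The loader above never releases its attach handle. A slightly more defensive sketch, assuming the same com.sun.tools.attach classes are on the classpath, separates the pid parsing from the attach call and detaches once the agent is loaded. The namespace and the function names parse-pid, current-pid and load-agent-and-detach are hypothetical, and the pid@host shape of the runtime name is a convention rather than a guarantee.

```clojure
(ns clojure-javaagent.attach-sketch
  (:import [com.sun.tools.attach VirtualMachine]
           [java.lang.management ManagementFactory]))

(defn parse-pid
  "Pull the leading digits out of a runtime name such as \"12345@myhost\".
  Returns nil when the name does not start with digits."
  [^String runtime-name]
  (re-find #"^\d+" runtime-name))

(defn current-pid
  "PID of the running JVM, assuming the conventional pid@host runtime name."
  []
  (parse-pid (.getName (ManagementFactory/getRuntimeMXBean))))

(defn load-agent-and-detach
  "Attach to the JVM with the given pid, load the agent jar, then detach."
  [^String pid ^String jar-path]
  (let [vm (VirtualMachine/attach pid)]
    (try
      (.loadAgent vm jar-path)
      (finally
        ;; Release the attach handle even if loading the agent fails.
        (.detach vm)))))
```

Calling (load-agent-and-detach (current-pid) javaagent-path) would stand in for load-instrumentation. On Java 9 and later, a JVM attaching to itself is refused by default, so you may also need to start the JVM with -Djdk.attach.allowAttachSelf=true.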

At this point, we’ve successfully loaded the javaagent. We should’ve seen our startup message "Loading agentmain" print in the REPL. All that’s left is to use the instrumentation for whatever it was we wanted it for.

Using the Instrumentation object

Back in our instrumentation.clj file, we defined a var with the Instrumentation we received. This var is now available to other namespaces in our project.

; Load the instrumentation namespace to get access to its vars
(require '(clojure-javaagent [instrumentation :as inst]))

; Find the size of an object
(println "The size of a String is"
         (.getObjectSize inst/instrumentation "Hello World")
         "bytes")

And here’s where the heartbreak hits. The code outputs:

"The size of a String is 24 bytes"

Despite its wonderfully promising name, Instrumentation.getObjectSize only returns the memory footprint of the exact object passed to it, and none of the objects it references. Every String reports the same size regardless of its length, because its characters live in a separate array object, and the same goes for any collection that holds its contents by reference. In order for this method to be truly useful in debugging memory consumption, it’d need to recurse over every element stored in every object/datastructure. And this sounds like a job for an external library.
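To make the recursion concrete, here is a rough sketch of what such a traversal might look like. It is not a substitute for a real library: it walks array elements and instance fields reflectively, counts each object once by identity, and silently skips any field the JVM refuses to open up (which on recent JDKs includes the internals of core classes like String). The namespace and the names deep-size, referents and instance-fields are invented for this example, and the shallow-size argument is any function from an object to a byte count.

```clojure
(ns clojure-javaagent.deep-size-sketch
  (:import [java.lang.reflect Field Modifier]
           [java.util ArrayDeque Collections IdentityHashMap Set]))

(defn- instance-fields
  "All non-static fields declared on cls and its superclasses."
  [^Class cls]
  (->> (iterate #(.getSuperclass ^Class %) cls)
       (take-while some?)
       (mapcat #(.getDeclaredFields ^Class %))
       (remove #(Modifier/isStatic (.getModifiers ^Field %)))))

(defn- referents
  "Non-nil objects directly referenced by obj (array elements or field values)."
  [obj]
  (let [^Class cls (class obj)]
    (if (.isArray cls)
      (when-not (.isPrimitive (.getComponentType cls))
        (remove nil? (seq obj)))
      (keep (fn [^Field f]
              (when-not (.isPrimitive (.getType f))
                (try
                  (.setAccessible f true)
                  (.get f obj)
                  ;; Inaccessible fields are skipped, so totals may undercount.
                  (catch Exception _ nil))))
            (instance-fields cls)))))

(defn deep-size
  "Sum shallow-size over every object reachable from root, visiting each once."
  [shallow-size root]
  (let [^Set seen (Collections/newSetFromMap (IdentityHashMap.))
        ^ArrayDeque stack (ArrayDeque.)]
    (when (some? root) (.push stack root))
    (loop [total 0]
      (if (.isEmpty stack)
        total
        (let [obj (.pop stack)]
          (if (.add seen obj)
            (do (doseq [r (referents obj)] (.push stack r))
                (recur (+ total (long (shallow-size obj)))))
            (recur total)))))))
```

With the agent loaded, a call might look like (deep-size #(.getObjectSize inst/instrumentation %) my-data), where my-data is whatever structure you want to measure. Passing the measuring function as an argument keeps the traversal usable, and testable, without an Instrumentation object.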

To play with a fully working example, check out this GitHub repo.

Footnotes

[1] It wasn’t. But, Caliper’s ObjectGraphMeasurer was.

[2] It took me so many hours to figure out the magic incantations that (a) I decided I needed to write a blog post on it, and (b) I’m too embarrassed to say exactly how long I wasted in the process.

[3] The leading - is used because gen-class by default looks for functions prefixed by a single dash to turn into methods on the defined class.

[4] Alternatively, you can define a different method, premain, and load the jar upon JVM boot. This method is a bit less spontaneous, however, since you can’t just spit out a reference to an Instrumentation object inside your REPL as soon as you realize you need one.

[5] I know there is a hard way, but I don’t know how it’s done.