Thursday, November 29, 2012

Packed Objects in Java

Recently I spoke at EclipseCon Europe on some ideas for improving Java, citing the industry's need for rapid innovation and the business drivers that shape these priorities. Given that the talk was co-sponsored by the OSGi community event, I focused specifically on how modularity can help drive improved performance and better software engineering discipline, and on some ideas for evolving Java while maintaining compatibility by using versions to do more than just describe what’s in the box, essentially using versions to guide API evolution via JVM-assisted mediation.

One of the charts I showed had a long “laundry list” of improvements, each topic a whole talk on its own, which nicely triggered many fun technical discussions at the bar with various techies about their favorite wish-list items.

For the low-level types, I had listed a better Foreign Function Interface, better external memory access, and packed structures for efficient memory use. I want these to be first-class Java enhancements so that the JDK libraries can benefit as well as end-user applications.

And then I noticed this.

It’s not the first time I’ve seen unsafe hackery used for performance, and while it’s fun to see the abuse, even when well done it will be sensitive to processor architecture, memory and other hardware dependencies, and the code may even perform differently between releases of the same architecture. That may be OK if you don’t care about portability and have the deep technical skills to deal with fragile code, but it’s not a great solution for wider adoption, especially when the security code-review people find you (and they will).

The J9 VM team has been working on a generalized solution for more flexible data access which gives the user more control over how objects are laid out; Marcel Mitran has given a high-level overview. The goal is to improve the density of data structures in Java, improve access to data not in Java heap format (usually external structures), and allow new data types to be defined beyond what Java’s object model allows today, including value types if you also add read-only capabilities.

The main driver here has been our customers and their demanding applications: most recently high-performance big data analytics, virtualization, the need to support higher-density cloud deployments, and the lightweight data structures needed to "pack" data to deliver better performance. Martin's blog post triggered more discussion, specifically on the topic of reducing Java overhead.

While our "packed object" is somewhat related to the notion of "value types" (see JEP 169) there are some important differences that I will touch on as well. John Rose's blog post does an excellent job of introducing the implications and the advantages of having value types.

So, why have packed objects with more explicit control over layout?

In Java, any non-primitive instance field/array element is a reference, which in a JVM implementation, implies a pointer to an object that has a header. Apart from the space overhead associated with the reference and the header, the resulting "pointer chasing" incurs data cache misses (e.g. in something as common as object serialization). In addition, as the layout of the Java objects is completely abstracted away from the application code, Java is inherently challenged when required to inter-operate with other 'native' data structures. Java typically requires copying and marshalling/boxing of native data structures into the Java heap before they can be manipulated. This can be cumbersome/awkward from an application perspective while also incurring significant overhead.
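To make that overhead concrete, here is a plain-Java sketch (class and variable names are mine, purely illustrative) of the cost just described: the array holds references, and every element is a separately allocated object with its own header, reached through a pointer.

```java
// A regular Java array of small objects: the array holds references,
// and every element is a separate heap allocation with its own header.
class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class Layout {
    public static void main(String[] args) {
        Point[] pts = new Point[3];            // 3 references, not 3 points
        for (int i = 0; i < pts.length; i++) {
            pts[i] = new Point(i, i * 2);      // each new adds an object header
        }
        int sum = 0;
        for (Point p : pts) {
            sum += p.x + p.y;                  // array -> reference -> data:
        }                                      // a potential cache miss per element
        System.out.println(sum);
    }
}
```

A packed Point[] would instead lay the six ints out contiguously, with a single header for the whole array.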

Value types can solve some of these issues (such as packing new value types like complex numbers), though they do not address the "direct" native memory access scenario. My view is that if you separate the concerns of layout, packing and read-only attributes for objects, you can get all of the functionality mentioned above, including the value type use case.

The solution to these problems involves giving the Java programmer the capability to specify how instances of a class should be treated by the JVM, enabling specific types of operations:

  1. Object inlining: Given a "packed" class A and an instance field in the class (say A.f) declared to be of "packed" class B, this would give the user the ability to specify that the JVM allocate the space for an instance of B (for A.f specifically) along with the space necessary for an instance of A. This essentially gets rid of the header for B and co-locates the data access into a single object, which improves performance. For tightly coupled classes, where the two objects are always allocated together, this can be used for high-performance data structures.
  2. Object splitting: Given an instance of a "packed" class, this would give the user the ability to separate the "data" part of the instance from the "header" part where necessary, essentially allowing a user to describe external data via a Java definition.
To benefit, Java programmers would need to declare a class to be "packed" in their Java code, thereby specifying that they would like the ability to "inline" an instance of that class into some other packed class instance (and/or to have other packed classes inlined into it). For arrays of packed objects, the per-element header overhead is gone, which enables low-overhead arrays of arbitrary types of objects.
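As a source-level sketch of what inlining might look like: the @Packed annotation below is hypothetical and defined locally for illustration only; it is not the actual J9 syntax, and this code still runs with ordinary reference semantics today.

```java
// Hypothetical marker; a real implementation would come from the JVM/JDK,
// not from user code. Defined here only so the sketch compiles.
@interface Packed {}

@Packed
class PixelRGB {        // small, tightly coupled data: a good inlining candidate
    byte r, g, b;
}

@Packed
class Image {
    int width, height;
    PixelRGB topLeft;   // if inlined, topLeft's bytes would sit directly inside
}                       // the Image instance, with no header and no pointer chase

public class PackedSketch {
    public static void main(String[] args) {
        Image img = new Image();
        img.topLeft = new PixelRGB();   // today: a separate heap allocation
        img.topLeft.r = 10;
        System.out.println(img.topLeft.r);
    }
}
```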

A user could also explicitly specify when they do not want an instance field to be inlined. Intuitively, object inlining allows data to be packed into an object (improving data locality) whereas object splitting allows an object to refer to its data even when it is not located immediately following the object's header (as it traditionally does in most JVMs); once the data is separated from the header it can really reside anywhere, including in native memory or inlined inside some other object. These two capabilities are thus key enablers for allowing "nested" packed fields (e.g. A.f in our earlier example). Because such nested fields have only their data inlined into the containing object, the JVM would need to (completely transparently to the user) create a packed object header (that points to the nested field's data) any time the nested packed field is read in the bytecodes. A reference to the packed object header still "looks" like a normal reference except that the data in the object is separated from the header and the data could be located anywhere in the address space.
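For the external-data case, today's workaround is manual offset arithmetic against off-heap memory, as in this sketch (the struct layout and names are invented for illustration). A "split" packed object would replace the hand-maintained offsets with an ordinary Java class definition while leaving the data where it is.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Reading an external struct { int32 id; int16 flags; } today: the layout
// lives in hand-written offset constants, not in a Java type definition.
public class NativeStructToday {
    static final int ID_OFFSET = 0;
    static final int FLAGS_OFFSET = 4;

    public static void main(String[] args) {
        ByteBuffer struct = ByteBuffer.allocateDirect(8)
                                      .order(ByteOrder.LITTLE_ENDIAN);
        struct.putInt(ID_OFFSET, 42);             // stand-in for natively written data
        struct.putShort(FLAGS_OFFSET, (short) 3);

        // Every field access repeats the offset bookkeeping.
        System.out.println(struct.getInt(ID_OFFSET) + " "
                + struct.getShort(FLAGS_OFFSET));
    }
}
```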

We view this approach as an intermediate step on the evolutionary path to introducing full-blown value types in the Java language; yet it solves some important performance problems on its own. One key difference from full-blown value types is that the "packing" applies only to data accessed via "nested" fields (namely, inlined instance fields or array elements) in objects on the Java heap; this should not be confused with the fact that the "data" itself can reside anywhere in the address space. In particular, declaring a local variable or parameter to be of a packed class type does not necessarily result in the data being "packed" on the stack. Local variables and parameters are "references" regardless of their declared type. This also means that no change in behaviour is required at the astore/aload, getstatic/putstatic and call/return bytecodes in the JVM.

Immutability is another property that is typically implied when one discusses traditional value types; however, immutability is orthogonal to this solution, and mandating it would prevent some use cases such as read/write native memory structures. By allowing packed classes to optionally have an immutability property, we enable evolution to value types. As immutability is not mandatory, common operations on traditional value types, like equals and hashCode based solely on the field values, are not required and can be supplied by the user if desired.

In this approach, a "packed object" lies somewhere in between a traditional object and a value type instance in its properties; since packed object references are synthesized transparently by the JVM, it is also free to share such references (or not) as appropriate. This means that the semantics of common operations on traditional object references like identity, hashCode, synchronization etc. are not well defined for packed object references, because there may not be a reference pointer to each and every inlined instance (though there may be sensible ways to implement these methods in terms of the address where the data resides, rather than depending on the short-lived synthesized references).

Note that this approach differs from C#'s value types, which overload the meaning of the assignment operator to implicitly perform copy-by-reference, copy-by-value, or boxing. That does not align well with the evolution of Java, which has steered away from operator overloading; this proposal provides a consistent definition of the '=' operator as assignment-by-reference. Assignment-by-value semantics could be made explicit through the introduction of new operators such as ':=', if this is accepted as a Java language change. Other necessary language changes might be around packed array allocations and accesses, to explicitly tell the JVM when a packed array is in use and avoid confusion with traditional (i.e. reference) arrays.
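A small runnable reminder of the '=' semantics the proposal keeps: for object types, assignment copies the reference, so a mutation is visible through every name. The ':=' in the comment is the hypothetical copy-by-value operator mentioned above, not valid Java today.

```java
public class AssignSemantics {
    static class Pair { int a, b; }

    public static void main(String[] args) {
        Pair p = new Pair();
        Pair q = p;          // '=' : q and p now refer to the same instance
        q.a = 7;
        System.out.println(p.a);   // the write through q is visible through p
        // p := q;  (hypothetical) would instead copy q's fields into p's storage
    }
}
```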

Feedback is most welcome.

Friday, January 28, 2011


Yes, IBM joined OpenJDK, and it was big news... last fall.

Why then no blog entry from yours truly? Umm, I've been busy! You know, working on a seamless transition, ensuring IBM continues to deliver the best performing, highest quality Java runtimes on the planet :)

As Mark just announced we are working on the OpenJDK community bylaws - a draft soon to be posted. The goal is to create an environment that enables an open vibrant OpenJDK community. It's a project IBM is joining so we spent time understanding how the project works today and tried to build on the existing model. Any changes were evaluated in the spirit of ensuring an open, transparent, and meritocratic project that can be run in a lightweight and efficient manner. We want to be inclusive and hope contributors feel comfortable joining the project.

In terms of OpenJDK, IBM will be bringing much to the table: years of deep experience in Java runtimes, and support for a wide range of HW platforms. We'll be taking those skills and working on earning our commit rights so we can contribute directly in the OpenJDK codebase. I expect our joint efforts will strengthen the Java community, accelerate innovation in the overall Java ecosystem, and give a boost to the floundering blogging community currently suffering from a lack of things to write about :)

In OpenJDK, if we do it right, we'll see innovation from more than just the big players, and that will make the overall ecosystem richer. In the end, it will be our customers who benefit, from innovation, and a multi-vendor platform they can build their business on.

That said, while we will cooperate on the platform, we're still going to compete like mad on the implementations (delivering performance leadership :)), and if you haven't noticed, we're also gonna have some fun trash talking too.

Monday, June 14, 2010


Arrived at work today and noticed "the Answer to the Ultimate Question of Life, the Universe, and Everything" on my odometer.
Then, The Daily WTF had an awesome post on a spectacular use of binary.
Nice. Twice.

Saturday, February 20, 2010

Give it a shot, it might work

"Give it a shot. It might work." Made me laugh. An optimistic caveat delivered by CTV when browsed via Chrome (turns out it does work). Of course, they don't actually give you the opportunity to not "give it a shot" as the modal dialog prevents all other choices.

So, I mock it a little, but personally I think it's better than many of the checks we see for browser compatibility. Some web sites are very specific and annoyingly refuse if the browser is not exactly the one they tested with. So, good on ya, you optimistic Silverlight developers! But next time you can try out our long-running VM team phrasing, "totally bogus, never been tested, should work", as a positive alternative.
The video in question was Jon Montgomery's 4th run on his way to winning gold in the uber scary skeleton event. Nice display of Canadian values too, by drinking a pitcher of beer on his way to the TV interviews. He's welcome on any of our fishing trips anytime.

Monday, December 28, 2009


You take your most precious things, and you put 'em on some cheap slippery plastic and hurl them down snow-covered hills until they scream with joy (and then beg you to do it again and again). You only stop after two face plants, one noggin-with-ice-chunk collision, and a busted sled from a near miss between a 200 lb man and a 5 year old boy.

Don't worry, it's a "mom approved" activity, but if the moms were not actually there, dad would have sent them all down the steep side (which is great, btw!).


Happens to every phone I've ever owned, eventually the charger slot requires manual intervention to support charging. It's those darn custom super small charger plugs which can't withstand the eternal plug-unplug cycle. I suspect they are designed to fail during the holiday season when getting a new phone is the last thing I want to go out and do.

Meanwhile, MacGyver lent me some rubber bands and suggested I skip the fruits and go robot.

Saturday, December 26, 2009

173 years of blogging (1926 - 2099)

Google thinks I've been blogging for 173 years. I know this because, on a regular basis, the Google bot fetches blog entries from far past and future dates, slowly working its way through time, month by month (Google is just being very thorough).
I noticed this in my web server logs (you don't scan your logs once in a while?), where I found multiple entries generated by Google bot fetches. A bunch looked like this:
"GET /?month=11&year=2039 HTTP/1.1"
"GET /?month=12&year=2039 HTTP/1.1"
"GET /?month=1&year=2040 HTTP/1.1"
The earliest year was 1926, and I was about to break into Y2.1K territory on the high end. I was amused: Google was proactively fetching future blog entries in eager anticipation of their high quality and entertainment value. I fetched one of the future-date pages and, sure enough, a valid page is returned with no errors displayed; the calendar shows the correct future month, with links to the previous and next months as well. OK, that explains why Google is crawling forward in time: my silly blog software is publishing future links. I checked my front page and was surprised to see the link to the future date was not there; apparently the software is intelligent enough not to link into a future month if you are on the current month. It's just not smart enough to avoid linking to future dates if you happen to fetch a future month. Sounds like a simple fix to me.
But we do have a mystery: how did Google get that first link into the future if it's not published on my main page? I first assumed that Google was smart enough to manipulate patterns like "year=2009", increment to "year=2010", and walk through the entries that way, but that's just asking for bad links, so I doubt it. I think it's more likely there was a timing window, daylight saving time or possibly when I moved from the old server HW to the new server, where a date mismatch caused a future link to be generated, which started Google on the path forward (and backwards). It does make you wonder what these guys do: they generate future links from year to year, and I suspect they (and many other calendar web pages) will have Google and other sites fetching pages through time.
So, Google follows these links into next month and prev month and by the time I caught up with it, 173 years of blogging had passed. For reference, in the full log set (6 months worth), there were 5300 GET's with past or future dates.
I decided that instead of waiting to see "year=3001" in my blogs at some point, I should fix the code. Should be easy, right? Well, yes, it was so easy I fixed it three times. I check the main page and see a juicy-looking "calendar.js" script reference, so I grab that file. A couple of peeks later, I notice the code checks the current date and doesn't insert a future link if you are in the current month (m==currentMonth && y==currentYear). Clearly, only excluding the current month is bogus. I tack on the obvious (y > currentYear) and think to myself "that was easy". Time to test, and of course the change has no effect. I futz with it a few times and nope, the changes have no effect. I hack in some obviously visible changes ("December" -> "HACK DECEMBER") so I could see it and know I was in the right file. Nope, this code is not running.
So, I grep around (again) and voila! I find a *second* file that looks "calendar-like" but isn't called calendar (hidden code, crafty!), and again I find the exact same code, apply the same changes, and retest. Nope, no effect. This is getting silly. So, first, to prove the files are the real ones, I use the old trick: rename the two files to be sure they are being used at all, and rerun the tests. Nope, renaming the files has no effect. Obi-Wan speaks to me: These are not the files you're looking for.
Grep around some more, and hey, even more calendar code! (Must have been a three-for-one sale.) And voila! The same code there too, with an incorrect comment that says "prevent future dates from being displayed" and, of course, the exact same bogus code.
Third time's a charm, and the link into the future is gone, so the next time Google grabs the last cached link, it will stop the eternal march into the future (and it did). I also fixed the eternal past links, choosing an arbitrary cutoff so that Google doesn't march its way down to 1/1/0001 or something, since those old blog entries are soooooo embarrassing.
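For the curious, the corrected guard boils down to something like this (the blog's calendar code was JavaScript; this is a re-sketch of the same logic with invented names): only emit a "next month" link when the target month is not in the future.

```java
public class CalendarGuard {
    // Show a "next month" link from page (m, y) only if the following
    // month is not past the current month/year.
    static boolean showNextLink(int m, int y, int currentMonth, int currentYear) {
        int nextM = (m == 12) ? 1 : m + 1;     // month after the displayed page
        int nextY = (m == 12) ? y + 1 : y;
        boolean future = nextY > currentYear
                || (nextY == currentYear && nextM > currentMonth);
        return !future;
    }

    public static void main(String[] args) {
        System.out.println(showNextLink(11, 2039, 12, 2009)); // far-future page
        System.out.println(showNextLink(10, 2009, 12, 2009)); // normal past page
    }
}
```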

Thursday, November 05, 2009

The answer is yes

For those who live in warmer climates and ask me almost daily, yes, the answer is yes.
You may begin the weather related mocking now.