So I'm working on this idea of representing a Java heap visually as a TreeMap.
"A treemap is a data visualization technique that uses nested rectangles to represent hierarchical data". The surface of the rectangle is proportional to the size of a node, which is typically the sum of the size of its child nodes.
You've probably seen TreeMap used to represent directory sizes on file system (also, Sunburst chart).
The Java heap is a graph, not a tree. Can't TreeMap a graph.
However, you can compute the dominator tree of a graph.
Dominator Tree?
I talked about dominators here: https://androiddev.social/@py/113176284840571965
Each node has a dominator, so each dominator has its own dominator, recursively until the root. So that makes a tree.
Once you have the dominator tree, you can compute the shallow size of each object in the graph, then for each dominator you can compute its retained size as the sum of the shallow size of each object it dominates (including itself)
So you get a tree where each node has a "retained size" which represents how much memory could be reclaimed if that node became unreachable.
If you've seen a TreeMap representation of the file system before, you know that the purpose is exactly the same: "which directory & files are using all my disk / where should I go clean things"
So, showing a TreeMap based on the dominator graph of a Java heap makes a ton of sense!
I was surprised I've never seen any tool that does that. It feels like it'd be a very intuitive representation of the Java heap.
So, I plugged my heap dump parser (Shark) into the Dominator tree code from perflib (source: https://cs.android.com/android-studio/platform/tools/base/+/mirror-goog-studio-main:perflib/src/main/java/com/android/tools/perflib/heap/analysis/LinkEvalDominators.kt;l=36;drc=499fa43009666c0f0a686d8e21722dbea8b2ecf0)
Then I output the dominator tree as a CSV file, imported it into Google Spreadsheet, selected the 3 columns (dominated, dominator, size) and inserted a TreeMap chart.
Here's the result
Now, this TreeMap looks cool. For a second it felt really awesome.
Until I started trying to use it. The giant blocks are a few big arrays, not terribly interesting. Note that I'm only displaying the first level of depth here, so every rectangle we see is a root node. All these tiny rectangles are dominated by root.
I saw views and activities being dominated by root, when I really expected them to be a part of a parent rectangle that'd be a large block in the chart.
That's weird!
In graph theory parlance, the root of a dominator tree is the SOURCE node from which the traversal starts.
However, the heap if a graph with multiple source nodes. Those are called "GC roots". Dominator algorithms can totally deal with multiple sources: they just insert a single virtual "root" node as the source node of all sources. That virtual root node is the root of the dominator tree.
And that's what's making our TreeMap fairly useless here.
As we saw, gc root nodes are dominated by the virtual root of all roots.
What's happening is we have objects reachables (directly or indirectly) through multiple gc roots, where the only way to make these objects unreachable is to clear N gc roots.
Any such node is dominated by the virtual root.
And unfortunately this property is recursive, these nodes are now effectively acting like new gc roots from a dominator standpoint.
Hence many many roots.
This problem also generalizes at every layer of the dominator tree: even if we didn't have multiple sources / gc roots, we could render a tree map with a few big blocks but then a few levels down we'd find many many tiny nodes with a mostly flat hierarchy.
This means leveraging a classic dominators algorithm to render the Java heap as a TreeMap isn't terribly useful.
So, what now?
I have a few half baked ideas.
One is to help the tree structure resurface through allow/deny listing some nodes and references, leveraging what we know about the runtime.
For example, make it so that an a
*attached* view can only be reachable through its parent, or through its activity if it's the root view. Because we know all views are supposed to become unreachable once the activity is destroyed, so the activity really is the dominator of all views attached to it.
Another idea is to insert new intermediary virtual roots that are parents to groups of roots.
Or similarly, you could apply that approach for any node in the dominator tree: if a node is the immediate dominator of more than N nodes, find a way to create intermediate virtual dominators that represent a smaller group of dominated nodes, in a way that.. provides a somewhat balanced tree? idk.