Posted by & filed under Build.

An often overlooked extension to Structure101 is the “Headless” mode of operation. This lets you hook Structure101 into your nightly build so that it checks for things like newly introduced complexity or architectural rules being violated while you sleep. You specify what you want checked, and whether you want to break the build or just receive a warning.

Eran Harel (“Code Slut”) has blogged how he uses this in Orbitz.

Posted by & filed under Architecture, Architecture Diagrams, Dependency Management, Structure101.

Architecture Diagrams in Structure101 are mapped to the physical code by patterns associated with each cell in the diagram. This enables the visual specification of rules that can then be applied to a specific version of your code so that Structure101 can overlay any violations on the diagram and let you discover the offending code items.

When you let Structure101 create the diagrams for you, and stick to changes like adjusting the layering, expanding packages, or moving packages to new parents, etc., you can mostly leave it to Structure101 to handle the patterns. However, occasionally you may see some behavior that seems odd unless you understand how patterns interact.  And of course you may want to take advantage of the more advanced capabilities.

A key aspect that is not obvious at first is the “most specific pattern” rule. This says that where a physical class maps to more than one cell, the Architecture Diagram associates the class with the cell that has the “most specific” pattern.

Here’s an example diagram which was automatically generated from part of the Findbugs code-base:

18-11-2009 15-48-38

… and here is the same diagram showing the pattern associated with each cell:

18-11-2009 16-16-02

If I have a code-base that happens to contain a class edu.umd.cs.findbugs.classfile.impl.ClassFactory, then that class matches the pattern associated with the cell called ClassFactory. However it also matches the patterns for both ancestor cells “impl” and “classfile”. Since the ClassFactory cell’s pattern is the most specific (no wildcards), that is the cell to which Structure101 associates our physical class. And since a class is associated with at most one cell, that class is not associated with either ancestor cell, even though it matches their (less specific) patterns.

As expected, here are the associated items that Structure101 reports for the ClassFactory cell:

18-11-2009 16-51-41

… and the parent “impl” cell:

18-11-2009 16-55-15

Now I delete the 3 inner cells so that the diagram looks like this (the patterns for the remaining cells are unaltered):

18-11-2009 17-11-59

Since the cells with the more specific patterns are gone, the classes that were associated with them now match the next most specific pattern – the parent “impl” cell:

18-11-2009 17-17-54

This can be puzzling – I click on the “expand” button, the child cells are added to the diagram (which I expect), but the list of items associated with the parent is suddenly empty (which I don’t).

Why would we do it this way? We figured it was the best way to be future proof (the wildcards handle any added/removed classes) while at the same time supporting refactoring (e.g. if I move a class to a new package, the most specific pattern goes with it, but all the other classes still match to the original parent).

Posted by & filed under Programming Languages.

Go” is a new systems programming language created by Google. Syntax is based on C++, and it compiles (like greased lightning apparently) – even has a Printf()! But beyond trivial similarities it is a very different beast:

  • Interfaces replace class inheritance, but unlike Java interfaces, no explicit reference to the interface is required – as long as a type provides the methods named in the interface, it implements the interface.
  • Garbage collection.
  • Arrays are first class citizens – you don’t need to (can’t) use pointers to access the elements – instead you use [], and “slices”.
  • A “slice” is a section of an array and behaves very much like an array – you can create a slice of a slice.
  • Strings are first class and immutable, so memory allocation is not an issue.
  • “Maps” are hash tables and are also first class.
  • Threading is part of the language (no RTOS needed). “Goroutines” run in parallel and communicate via “channels”.
  • No header files (that would have ruled it out of my book). “Packages” are used to group and reference (“import”) stuff.
  • Reflection.
  • Type conversion is explicit, no function overloading (that’s an odd one – presume it’s to speed up compile time – expect a lot of awkward function names), no user-defined operators.

Google reckon it’s “fun” to use. Compared to C++ that’s presumably a no-brainer – I’d say Java programmers would probably take to it as easily. Assuming it does what it says on the can.

From a dependency management point of view, I don’t much like the implicit (read “hidden”) implementation of interfaces. But I like the absorbsion of concurrency into the language – should allow modelling of references that disappear into the OS in C++. But it’s not ready for primtime yet – it’ll be a while before we go building a Go backend for Structure101…

Posted by & filed under Structure.

Herbert Simon’s parable of the watchmakers was constructed to convey his belief that complex systems will evolve from simple systems much more rapidly if there are stable intermediate forms present in that evolutionary process than if they are not present.

Arthur Koestler built on this in his 1967 book “The Ghost in the Machine“, in the process coining the term holon to denote something that is simultaneously a whole or a part, depending on how you look at it. Here Mark Edwards explaining this duality:

Every identifiable unit of organization, such as a single cell in an animal or a family unit in a society, comprises more basic units (mitochondria and nucleus, parents and siblings) while at the same time forming a part of a larger unit of organization (a muscle tissue and organ, community and society). A holon, as Koestler devised the term, is an identifiable part of a system that has a unique identity, yet is made up of sub-ordinate parts and in turn is part of a larger whole.

Importantly, Koestler further described holons as

… autonomous, self-reliant units that possess a degree of independence and handle contingencies without asking higher authorities for instructions. These holons are also simultaneously subject to control from one or more of these higher authorities. The first property ensures that holons are stable forms that are able to withstand disturbances, while the latter property signifies that they are intermediate forms, providing a context for the proper functionality for the larger whole. [Summary text from wikipedia]

Though the terminology is different, I am sure the key tenets of Koestler’s principles will resonate with most people in software. Certainly, the importance of meaningful wholes (within the context of a wider system) is well recognized, and reflected in established principles such as Single Responsibility and Reuse Release Equivalency. Similarly, most would agree that the ability to withstand disturbances is hugely desirable, though we generally talk about this in terms of agility (or its converse fragility). The one aspect that might jar a little is the reference to higher authorities – I’ll revisit this later in the context of Emergent Design.

Koestler also introduced the term holarchy to denote a hierarchy of holons. As I suggested in my previous post on this subject area, I rather feel that, mostly, today’s software thinking tends to buy Koestler’s notions on holons but fall down on holarchy. Specifically, we tend to pay little or no attention to the world of complexity between the low-level coding constructs (classes, methods) and the unit of deployment (jar, dll).

Just as one example of this, see Bob Martin’s Principles of OO Development. He describes five principles that apply to the class level, and six that operate at the unit of deployment. Nothing inbetween. Similarly, and related, there are lots and lots of (not necessarily very useful) metrics that measure aspects of classes and methods, but there is an almost complete vacuum at the (what Booch would have called) “class cluster” level. One of the very few exceptions to this is DMS and related stability metrics for Java packages (based on Martin’s Acyclic Dependencies Principle). However, and somewhat amusingly, it would seem that these metrics only came into being because of confusion over Martin’s use of the term “package” (apparently, he actually intended this to denote unit of deployment)…

The situation changes instantly if we embrace hierarchy, holarchy. I do not see this as anything particularly radical, rather just a generalizing of existing principles. However, the ramifications could be quite far reaching. In the next two posts, I will explain for example how holarchy opens the door to automated visualization and holistic measurement.

Posted by & filed under Structure.

The parable of the two watchmakers was introduced by Nobel Prize winner Herbert Simon to describe the complex relationship of subsystems and their larger wholes.

There once were two watchmakers, named Hora and Tempus, who made very fine watches. The phones in their workshops rang frequently and new customers were constantly calling them. However, Hora prospered while Tempus became poorer and poorer. In the end, Tempus lost his shop. What was the reason behind this?

The watches consisted of about 1000 parts each. The watches that Tempus made were designed such that, when he had to put down a partly assembled watch, it immediately fell into pieces and had to be reassembled from the basic elements. Hora had designed his watches so that he could put together sub-assemblies of about ten components each, and each sub-assembly could be put down without falling apart. Ten of these sub-assemblies could be put together to make a larger sub-assembly, and ten of the larger sub-assemblies constituted the whole watch.

I am reasonably sure that most software people reading this little parable would be inclined to nod. For sure, modularity is and always has been a hugely desirable trait in our attempts at software development and design.

In fact, though, I would suggest that the overwhelming majority of software projects today follow the example of Tempus, who lost his shop, rather than Hora. Why?

Because, mostly, we only pay attention to aspects of modularity and component-ness at two levels of granularity: low-level code (classes, methods) at one end of the spectrum, and unit of deployment (jar, dll) at the other. Everything in between we tend to treat as a largely amorphous blob comprising hundreds or even thousands of interacting entities. Even in those case where we do have meaningful abstractions/layers between the low-level code and and the unit of deployment, these are generally invisible and unmeasured. In this context it is hardly surprising that they will tend to degrade over time.

Simon’s parable was one of the key drivers behind Koestler’s theory of holons and holarchy. I will follow up on this – and its (to my mind) huge relevance to software thinking today – in a future post.

Posted by & filed under Dependency Management.

Too much baggage

Code is like traveling: the less baggage the better. No bags is bliss, a little backpack hardly noticeable. Chunky wheelie bag: bearable but irksome. But several chunky wheelie bags, and it starts to get … logistically challenging. Not to mention increased risk of hernia.

Often, of course, some amount of baggage is unavoidable. If you are embarking on an expedition to the North Pole, for instance, you would be well advised to take a decent supply of warm underwear.

Pretty much all code has baggage, but some code has more baggage than other code. Ask a developer what they would prefer to tackle – implement a little standalone utility, or write something that sits astride all the obscure notions and constructs emitted by others – and I’m pretty sure I know which one most would pick.

Successful baggage limitation

Of course, you can’t code up a (meaningful) system without some number of building blocks. So even in a perfectly architected and layered system, you inevitably accumulate some baggage as you move up the stack. The trick, though, is to try and minimize this (while also hiding off the details of the contents).

This is hardly new or revolutionary: much of software theory is specifically dedicated to strategies that help us to avoid excessive coupling and so promote modularity. That said, I do rather like the baggage metaphor and am inclined to see minimizing baggage as a primary goal, with e.g. re-usability a side-effect, rather than the other way around.

So what do I mean by baggage? Very informally, I’m thinking of this as “stuff you need to know about” to implement another bit of stuff. In this sense, baggage is a universal aspect of software development, entirely independent of e.g. programming language or framework.

Where a code-base is written in a strongly typed language like Java, it is relatively easy for static analysis to detect most of the baggage automatically, for instance A carries B baggage because class A extends class B and/or method A.foo() calls method B.bar(). And tools like Structure101 for Java exploit this to provide visualization and analysis of the baggage landscape.

It is important to understand, however, that there are always likely to be blind spots in such tools. A highly specific example I came across recently was where class X emitted a convoluted (highly X-specific) string that got passed to class Y which contained custom code to parse that string. In a (simplistic) static analysis of the code (and in the absence of a class Z that wraps the string in some form), Y does not depend on X (or Z). Conceptually, however, Y is most definitely carrying X baggage.

Other (tool-specific) blind spots may become apparent, so to speak, if we look beyond the confines of the immediate code-base. For example, consider a piece of code that constructs and executes a gruesomely contorted SQL statement. Static analysis that only looks at the code reveals a dependency on say javax.sql.* but misses the additional baggage that arises from intimate knowledge of the database schema. The same kinds of issue arise if we are using e.g. internal DSLs as part of a wider solution.

Baggage landscape

Does this invalidate the use of static analysis tools as some have argued (see previous thread, especially the comments)? Well, strictly speaking, I guess it is a percentages game and depends to a large event on the specifics of the analysis engine and project in question. When it comes to blind spots outside the code-base (as in the database example above), the key factor to my mind is their contribution to overall system complexity. In the typical database scenario, I would tentatively suggest that this is generally marginal (assuming that the relevant code is suitably compartmentalized). As for within the code-base, clearly, the higher the correlation between reported and conceptual baggage, the greater the utility of the tool. In the case of Java (and strongly typed languages in general), I would say the correlation is extremely strong (though my viewpoint here may be rather predictable, given I am one of the guys behind Structure101). There is also the question of whether accurate subset visibility is preferable to no visibility at all…

Whose bag is it anyway?

When playing the percentages game, however, it is important not to confuse the baggage that you are genuinely carrying with other suitcases that just happen to be in the same space. This is the distinction between static and runtime views of the world. I’ll paraphrase the issue here as: I’ll worry about my baggage and let others worry about their’s.

For example, if I were given the job of coding up java.util.ArrayList (an array-style container implementation), my baggage would be (broadly) just my interface (java.util.List) and members (instances of java.lang.Object). At runtime, someone may use an ArrayList to hold a collection of Foo instances; so when the list’s get() method is invoked, the returned object is in fact an instance of Foo. But that does not mean that my ArrayList is in any way conceptually dependent on their Foo. This is their baggage, not mine.

A similar nugget in the Java space is reflection (and e.g. dependency injection a la Spring), often seen as a gap in static analysis tools in the sense that some dependencies are missed. However, this is really just the same issue as the list of Foos above. At coding time, all I need to know is that (say) some input string will be a class name that I can use to instantiate an appropriate implementation of something or other (often involving a cast to an interface that does get picked up by static analysis). The rest is runtime, someone else’s baggage.

That said, there is a scenario where reflection can be used to induce a blind spot wilfully. For example, I know that the object I am getting is a Foo and that I will invoke its bar() method, but I deliberately do this using reflection rather than casting. The baggage is there whichever approach I choose but in one case (typed invocation) the baggage is transparent while in the other (reflection) it is obfuscated. There is a danger that a blind adoption of rules and metrics around baggage measurement may, in extreme circumstances, encourage some team members to adopt the obfuscation approach. To “game the system”. I think here that there would be a static analysis counter-measure – namely to control access to reflection – but obviously the better approach is to address any such dysfunctionality at source…

Invisible baggage?

In this sense, dynamic languages are (of course) really just an extension of the reflection paradigm. The baggage is still there – it’s just a heck of a lot harder (though not necessarily impossible) to detect. This means that there tends to be way less tooling support, but also, and more importantly, it may be much more difficult for the developers to understand their baggage situation. Interestingly, this has led some to question whether dynamic languages can scale to larger code-bases and teams because of a finite “complexity budget”. For an overview of some of the issues here, see this post by Ted Neward.

Shit happens

Finally, if everyone pays attention to their baggage, does that mean that the system is guaranteed to work? No, of course not. When I check in my bags at the airport, I should ensure that they are securely closed and suitably labeled. That in itself, however, is absolutely no guarantee they will be there at the other end for me to collect (though it should at least make life easier for the airport’s baggage management system and so help to make the desired outcome more probable). The one and only thing I can be sure of is that any screwups will not be my fault. Seems to me that this is the essence of good software: lots and lots of well-defined, self-contained, autonomous units doing their own job faithfully and keeping fingers crossed that others do the same…

Posted by & filed under Dependency Management, Structure101.

Here an interesting use case.

I am currently working on a reuse project. We have a large legacy Java application that we are trying to farm for implementations of some high level functions in a new application. To do this we are identifying the top-level classes that provide the initial entry point(s) to the desired high level functionality and then trying to discover all of the classes in the old system needed to support the identified top-level classes.

I have been doing this manually by going to the collaboration perspective in Structure101, selecting the “go to suppliers” option of a top-level class, and then manually drilling down through all of the classes the top-level class uses (that is, for every class the top-level class uses, I select “go to suppliers” and find out all the classes that class uses, etc. etc. etc.), tracking the needed classes as I go. This is not feasible to do given the size of the project.

Is there anyway I can get Structure101 to basically give me the transitive closure of all the classes used by the identified top-level classes, preferably as plain old ASCII text? Stucture101 seems to have already computed all the information I need, I just cannot figure out a non-manual way of getting the information.

As it happens, there is no first class support in Structure101 for this specific feature. However, it is do-able by leveraging other features and model options. Here’s how you would go about it.

First step is to set Overview granularity in the project properties. With this setting, the model stops at the outer class level but still takes account of all the member-level dependencies. So if Foo.x() calls Bar.y(), the model shows Foo and Bar and a “uses” dependency from Foo to Bar.

Second, tagging. Select a top-level class A in e.g. the Composition perspective, right click and choose Tag / Used by selected / Indirectly. Tag adornments (little blue dots) will appear on all the classes that A uses directly or indirectly (transitive closure of A). Then right click again and choose Tag / Selected so that A is also tagged. Repeat this for other top-level classes.

Now you have all the classes tagged, so all you need to do is export the tag list. Unfortunately, Structure101 does not have a button for that (grrrr) so we have to find another way of getting there…

From the main menu choose Tag / Invert item tags followed by Tag / Hide tagged. You have just subsetted the model to contain only those classes that you are interested in. Now all we have to do is get them into one table where we can right click and choose Copy / Copy all. Easiest for this is probably to switch to the Slice perspective, choose the “Outer class” level, and then you’ll likely see a single main cluster. Select this (actually it will get automatically selected for you). Hey presto, the table bottom left (Items tab of the Graph Contents viewer) contains the full list so a right-click should seal the deal.

Posted by & filed under Structure101.

We just released our new generic jobbie, Structure101g.

If you already know Structure101 for Java or Structure101 for C/C++, you probably already have a good idea of what Structure101g might be. This is for those (and there are many, oh so many) who do not.

Graphviz is a wonderful tool that can be used to create graph visualizations of stuff. All you do is stick in a logical graph model via a text file and out pops a nice picture. For the seminal example, see this view of the Unix family tree (and here is the corresponding text input file). Visit their gallery for lots more examples.

Occasionally the input is written by hand, but mostly it is generated by some piece of code that parses the domain-specific artifacts. For example, Graphviz is widely used to obtain subset pictures in code-base scenarios, e.g. the set of files in a directory and the includes/imports relationships between those files.

Although it is generally totally wonderful, graphviz (and other tools of its ilk) have one big weakness: graph visualizations do not scale. Subset pictures work fine (as in the includes example above) but there is no way to get a meaningful visualization of all the files across all directories.

The key in Structure101g, as in all Structure101 products, is to view the big model through the prism of hierarchy. Divide & conquer – use slices as a mechanism to get both subset views and “big picture” views. The other key differentiators are rich browsing and analysis environment rather than bitmap (or SVG) image, and some nice stuff around plumbing so that end users can create models interactively without leaving the UI.

Ok, so far so product pitch. Here the important stuff:

  • To display data froma particular domain, Structure101g needs a meta-description (xml) of the entity and relationship types in that domain. We call this a flavor.
  • In nearly all real-world cases, a flavor has an associated runner: This is the piece of code that parses the domain specific artifacts (or perhaps just some glue on top of an existing parser).
  • At time of writing, there are flavor/runner implementations for the domains OSGi/Eclipse (bundles), Maven (POMs), Ant (targets and properties in a build.xml file), XSL (stylesheets), and Web (html pages and associated images, scripts, stylesheets, etc.).
  • Graphviz is completely free to all and sundry.
  • Structure101g is completely free for the above flavor/runner pairs, but we (Headway) have control-freakish tendencies so you need to talk to us about making a flavor generally available (either for free or commercially) or buy a domain license for proprietary usage.
  • Graphviz does lots of different graph types and layouts. Structure101g is hard-wired to directed graphs with hierarchical top-down layout.

For more info and background, see this demo or visit the Structure101g home page.

Posted by & filed under Complexity, Emergent Design.

My recent post on architectural erosion in the findbugs code-base was generally well received, but there were some skeptical voices.

In a comment, Emeric questioned whether cyclic dependencies at the package level are anything more than a smell (if that). Itay Maman was a little more forthright, offering a little series of posts arguing that I was peddling myths, tangled packages are the norm (so they must be okay), and all static analysis is in any case completely pointless.

In both cases, they honed in exclusively on the rather narrow issue of package tangles, while also ignoring the time dimension, and in this sense I think both rather missed the point (though perhaps some more than others).

As I said in the opening paragraph of the original post, the key for me is levels of abstraction above the raw code: architectural components within a code-base if you like. In the case of findbugs, there are several instances where you can see that an architectural decision was made, only for this to be come blurred and ultimately lost over time. In all the early releases (e.g. 0.8.6), and surely not by accident, the ba component does not use the findbugs component. In 0.8.8, a rogue dependency creeps in. If you follow the full series of snapshots, you will see that this back dependency steadily rises from an initial weight of 2 code-level references (that could be easily reversed out) to the point where the interdependency is deeply entrenched in the code. Other examples are the blurring of the relationship between config and findbugs, and the attempt to interface off the dependencies on the specific parser library (asm or bcel).

It is not in the least bit surprising to me that this form of erosion happens over time for the simple reason that it is generally invisible. How can we rationalize about that which we cannot see (or measure, or define)?

Enter Structure101. This is based on the simple principle that, in order for design items to be first class citizens in the code-base, we need to be able to see them and, especially, the interactions between them.

Note that I am using terms like “architectural component” and “design item”, not “package”. In general, I am loathe to assume that there is necessarily a one-to-one correspondence between the Java package hierarchy on the one hand, and the “design hierarchy” on the other – for sure, there is absolutely nothing in the language specification to say that this must be so. I can say with confidence that this correlation exists for the Structure101 code-base (because I co-own it) and the Spring code-base (because they make a lot of noise about it), but I think this is a dubious assumption in general. For findbugs, however, I think this little leap-of-faith was reasonable given the clarity of the package diagrams in the early releases.

Now suppose that you have do have a code-base with a formal design hierarchy but one that does not correspond to the package structure. If you point Structure101 at that code-base, the initial (default) views will leave you completely cold because you are looking at the interactions between arbitrary subsets of code. You don’t care about these in the slightest, and quite rightly so. However, that does not mean that the tool is of no use – you can use transformations to map the code so that the resulting hierarchy does mirror the design. This scenario is actually quite common, for instance where the first level of breakout is managed via separate IDE projects / jar files and the logical package view results in (unintended) package name collisions.

With that background in place, let’s take a closer look at some of the dissenting voices.

@Emeric
But what are your arguments to say that 2 or 3 interdependents packages are a “blob” which you imply is bad ?
Don’t these packages deserve a name in their own and an isolation of comprehension even if they have cyclic dependencies ?
I think they initially deserve the separation, even with cyclic dependencies.

This does not seem unreasonable and is in fact a good fit with the tangle of findbugs, config and filter in 0.8.8. Here is the raw package diagram (note that I’m excluding io and anttask as noise):

So let’s transform this model to introduce a new architectural component – controller – as the union of all three of these. Here’s what we get:

This gives us a much better view of the architectural components at this level in the code-base, and shows just the one (clearly) rogue dependency from ba back to the controller. Needless to say, if we drill into the controller component, we will see that it contains package tangles…

… but it is essentially up to you (the user, team, …) whether or not you choose to care about this. Indeed, you can formally capture those things you do care about (and, by implication, those you do not) using architecture diagrams. Note however that the XS (excessive complexity) metric will always punish tangles at higher levels in the code (in that it measures distance from a structural ideal) – more on this in a bit.

Let’s now take another look at the most recent version (1.3.5). We could apply the same principle here, and transform all the packages involved in the tangle into a single “architectural component”, though this time it’s harder to think of a name. Let’s go with … errr … blob.

And, hey presto, we have an acyclic graph at the first level of breakout. However, 99% of the code-base is now located within blob so we have not really achieved terribly much in terms of architectural divide & conquer. Also, needless to say, the package breakout within the blob component is still essentially anarchic.

This leads on to the final point about package tangles. There is always a simple fix way to fix any package tangle: just merge all the classes into a single package. Here is another view of the blob component (ok, so you’ll need very good eyesight for this one) but this time I tweaked the transformations to strip out all the sub-packaging.

This one is way too “fat” (642 classes and 6,830 dependencies) to do as a nice diagram so I switched to a matrix view and set the cell size to 1 pixel. In many ways, I prefer this view because it is a much more accurate representation of how the code really is (a bunch of classes), as opposed to the raw package view which is basically just showing arbitrary subsets. That said, this view doesn’t actually help me to understand the code-base in any way shape, or form. Instead, it makes me think of Jonathan Edwards’ fine quote that “the human mind can not grasp the complexity of a moderately sized program, much less the monster systems we build today“.

This aspect of structure – the dichotomy between tangles and fat – is important when it comes to measurement, since it is clearly insufficient to take account of one without the other. The XS metric does factor in both sides of this coin – I do not claim that it is perfect, but I do rather suspect it is the best we have.

@Emeric
I agree that cyclic dependencies is a bad smell, except perhaps when there is a large dependency [in one direction] and a small backward dependency [in the other].

I agree that cyclic dependencies between packages is merely a smell, but I would argue that cyclic dependencies between architectural components is more than that. Where the backward dependency is small, I would see this as probably indicating good abstractions but with some rogue code that should ideally be fixed some time (but see also Keep A Lid On It). Heavy interdependency between architectural components is essentially the same state as no architectural components…

@Emeric
May I ask you opinion on the possibility to refactor, in the future, findbugs with Java Modules and friends packages.

I think I have addressed this already in the sense that the design (module, component, …) hierarchy is not necessarily the same as the package hierarchy (friends or no friends) but please follow up if I’m missing something here.

In his “Mythbusting” series, Itay hauls out the scattergun and sprays it around pretty indiscriminately. I was concerned that it would take me a long time to respond to all the various points, but actually I am delighted to says that others have already done a far better job than ever I could have done, so for the most part I will just refer you to the meaty comments sections. However, there is one aspect I would like to follow up on.

Here is the very first line in the very first post:

@Itay
(Disclaimer: as always, the issue of reasoning about software quality is largely a matter of personal judgment)

This is later fleshed out a little more (in a comment):

@Itay
It all boils down to the fact that we don’t have a single-, absolute-, objective-, metric of software quality (we all wish we had). Hence, we are constantly looking for approximation techniques. We must be careful not to mistake the approximation for the real thing.

This standpoint is thoroughly defensible, and reminds me of a number of conversations I have had with customers about the Structure101 metrics. The general feeling here, however, has consistently been that metrics are critical in terms of bridging the gap between technical staff (who instinctively understand why a particular activity is needed) and management staff (who need things like line charts and red and blue bars to be able to justify such activities further up the food chain). The two most interesting metrics here are XS (mentioned above) and number-of-architectural-violations. The former is all predefined and set in stone (though you can tweak the thresholds) while the latter is totally in the control of the team because it is calculated based on the architecture diagrams that the team (not the tool) defines. Some customers use one, some the other, and some both. I think it is correct to say that all are careful and none would claim for a second that these are absolute measures of perfection.

Had this – being careful with stats and metrics – been the essence of Itay’s posts, I think he would have caused less of a storm (though perhaps the storm was always his objective). Instead, Itay chose to mostly bury the “being careful…”  bit and lead with sweeping statements such as “Dependency analysis is largely useless”. Shame…

Posted by & filed under Complexity, Emergent Design.

My particular area of interest in software these days is the importance of levels of abstraction above the raw code. In Java, the most natural place for these to manifest themselves is through the package structure (though this is certainly not the only possibility).

Recently I used Structure101 to do some analysis on the evolution of the findbugs code-base, and was rather stunned at what I saw. Here is the root dependency graph for the first public release (0.7.2) back in March 04.

This diagram shows us the top-level packages in the code-base and the interactions between them (the numbers beside the arrows denote the number of code-level references).

With just a little knowledge about what findbugs does (it is a static analysis tool that scans bytecode for potential bugs), it is easy to rationalize about how this code-base is internally architected:

  • graph is a re-usable (baggage-less) data structure to model the control flow within a method body
  • visitclass wraps a bytecode parser with a visitor pattern to shield the parser implementation from the interpretation of the parser callbacks
  • ba (bug analyzer?) is the bit where specific rules (policies, strategies, …) are implemented
  • findbugs is the controller that drives the interactions between the other components

The image below again shows the top-level breakout, but this time several releases later in Oct 04 (0.8.6).

Although the code-base has grown significantly, it is still absolutely possible to rationalize about the architecture and where the new pockets of code (annotations, xml, config, etc) fit into the “big picture”. The only apparent blemish is that io is now disconnected (perhaps dead code).

The first significant imperfection creeps in in April 05 (0.8.7).

The relationship between config and findbugs has become blurred. Since both packages are dependent on each other, it is no longer clear that either of these in isolation represent meaningful or useful abstractions, and it may make more sense to think about the relevant “component” as being the union of the two.

Skip forward just a month…

… and the confusion has spread (0.8.8). The filter package has been added but it too has a 2-way dependency with the findbugs package, so it seems reasonable to say that the whole world of controller and config (incl. filtering) is in essence a blob where the individual packages do not really contribute anything in isolation.

There is also a rogue dependency here from ba to findbugs. This is clearly contrary to the original architectural intent.  The weight is just 2, so this would have been very easy to reverse out had it been spotted.

If we fast-forward a year (1.0.0), however, we see that this rogue dependency has become entrenched (the weight has increased from 2 to 99).

Moreover, more and more packages are being pulled into the tangle such that it is hard or impossible to talk about these as meaningful entities in their own right. For example, what is the point of a util package if it contains code that depends on the findbugs package?

Nevertheless, we can still see evidence of meaningful architectural decisions. For example, the bcel and asm packages are presumably wrappers for the BCEL and ASM bytecode engineering libraries that, together with the classfile package, enable an element of plug&play in terms of which library actually gets used for the analysis.

However, moving on to Nov 07 (1.3.0)…

… we see that these too have been sucked into the tangle. From now on, it seems, all testing, deployment etc. will need to include both.

And here is the most recent snapshot from September 08 (1.3.5):

This diagram doesn’t help any more – nearly all the higher-level abstractions appear to have eroded away. Moreover, a peek under the hood reveals that there is a large code-level tangle involving 43% of all the classes and spanning 33 packages – this implies that the interdependency has become deeply entrenched in the code. Shame…

For a quick view of the full history, I did up a little animated gif showing the “progression” through all 27 releases. If you are interested in something meatier, see the “Structure101 in a Nutshell Part 1″ presentation on the Headway Take a Tour page.