Code is like traveling: the less baggage the better. No bags is bliss, a little backpack hardly noticeable. Chunky wheelie bag: bearable but irksome. But several chunky wheelie bags, and it starts to get … logistically challenging. Not to mention increased risk of hernia.
Often, of course, some amount of baggage is unavoidable. If you are embarking on an expedition to the North Pole, for instance, you would be well advised to take a decent supply of warm underwear.
Pretty much all code has baggage, but some code has more baggage than other code. Ask a developer what they would prefer to tackle – implement a little standalone utility, or write something that sits astride all the obscure notions and constructs emitted by others – and I’m pretty sure I know which one most would pick.
Of course, you can’t code up a (meaningful) system without some number of building blocks. So even in a perfectly architected and layered system, you inevitably accumulate some baggage as you move up the stack. The trick, though, is to try to minimize this (while also hiding away the details of the contents).
This is hardly new or revolutionary: much of software theory is specifically dedicated to strategies that help us to avoid excessive coupling and so promote modularity. That said, I do rather like the baggage metaphor and am inclined to see minimizing baggage as a primary goal, with e.g. re-usability a side-effect, rather than the other way around.
So what do I mean by baggage? Very informally, I’m thinking of this as “stuff you need to know about” to implement another bit of stuff. In this sense, baggage is a universal aspect of software development, entirely independent of e.g. programming language or framework.
Where a code-base is written in a strongly typed language like Java, it is relatively easy for static analysis to detect most of the baggage automatically, for instance A carries B baggage because class A extends class B and/or method A.foo() calls method B.bar(). And tools like Structure101 for Java exploit this to provide visualization and analysis of the baggage landscape.
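In the typed case the baggage is sitting right there in the source for any analyzer to read. A trivial sketch (class names invented for illustration):

```java
// Both dependencies below are plainly visible to static analysis.
class B {
    String bar() { return "from B"; }
}

class A extends B {              // A carries B baggage: the extends clause...
    String foo() {
        return bar() + " via A"; // ...and again: A.foo() calls B.bar()
    }
}
```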
It is important to understand, however, that there are always likely to be blind spots in such tools. A highly specific example I came across recently was where class X emitted a convoluted (highly X-specific) string that got passed to class Y which contained custom code to parse that string. In a (simplistic) static analysis of the code (and in the absence of a class Z that wraps the string in some form), Y does not depend on X (or Z). Conceptually, however, Y is most definitely carrying X baggage.
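To make that blind spot concrete, here is a minimal (entirely hypothetical) sketch of the pattern: X encodes its state into a custom string, Y contains hand-rolled code to parse it back, and no type-level link between the two exists for an analyzer to find.

```java
// X emits a convoluted, X-specific encoding.
class X {
    static String emit(String name, int count) {
        return "X|v2|name=" + name + ";count=" + count;
    }
}

// Y's parser silently depends on every detail of X's format.
// If X ever changes its encoding, Y breaks at runtime, not compile time --
// yet static analysis sees no X -> Y dependency at all.
class Y {
    static int extractCount(String payload) {
        for (String field : payload.split(";")) {
            if (field.startsWith("count=")) {
                return Integer.parseInt(field.substring("count=".length()));
            }
        }
        throw new IllegalArgumentException("no count field: " + payload);
    }
}
```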
Other (tool-specific) blind spots may become apparent, so to speak, if we look beyond the confines of the immediate code-base. For example, consider a piece of code that constructs and executes a gruesomely contorted SQL statement. Static analysis that only looks at the code reveals a dependency on say javax.sql.* but misses the additional baggage that arises from intimate knowledge of the database schema. The same kinds of issues arise if we are using e.g. internal DSLs as part of a wider solution.
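A sketch of the database case (table and column names invented for illustration): the only compile-time dependency below is on java.sql types, but the query string carries intimate knowledge of the schema as invisible baggage.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class OrderQueries {
    // Static analysis reports a dependency on java.sql.*; it cannot see
    // that this code silently depends on the orders and customers tables,
    // their column names, and the join relationship between them.
    static final String OPEN_ORDERS_FOR_REGION =
        "SELECT o.order_id, o.total, c.name " +
        "FROM orders o JOIN customers c ON o.customer_id = c.customer_id " +
        "WHERE c.region = ? AND o.status = 'OPEN'";

    static PreparedStatement openOrdersForRegion(Connection conn, String region)
            throws SQLException {
        PreparedStatement ps = conn.prepareStatement(OPEN_ORDERS_FOR_REGION);
        ps.setString(1, region);
        return ps;
    }
}
```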
Does this invalidate the use of static analysis tools as some have argued (see previous thread, especially the comments)? Well, strictly speaking, I guess it is a percentages game and depends to a large extent on the specifics of the analysis engine and project in question. When it comes to blind spots outside the code-base (as in the database example above), the key factor to my mind is their contribution to overall system complexity. In the typical database scenario, I would tentatively suggest that this is generally marginal (assuming that the relevant code is suitably compartmentalized). As for within the code-base, clearly, the higher the correlation between reported and conceptual baggage, the greater the utility of the tool. In the case of Java (and strongly typed languages in general), I would say the correlation is extremely strong (though my viewpoint here may be rather predictable, given that I am one of the guys behind Structure101). There is also the question of whether accurate visibility of a subset is preferable to no visibility at all…
When playing the percentages game, however, it is important not to confuse the baggage that you are genuinely carrying with other suitcases that just happen to be in the same space. This is the distinction between static and runtime views of the world. I’ll paraphrase the issue here as: I’ll worry about my baggage and let others worry about theirs.
For example, if I were given the job of coding up java.util.ArrayList (an array-style container implementation), my baggage would be (broadly) just my interface (java.util.List) and members (instances of java.lang.Object). At runtime, someone may use an ArrayList to hold a collection of Foo instances; so when the list’s get() method is invoked, the returned object is in fact an instance of Foo. But that does not mean that my ArrayList is in any way conceptually dependent on their Foo. This is their baggage, not mine.
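The split is easy to see from the calling side (Foo and the Caller class here are illustrative names): the list implementation knows nothing about the element type; the caller is the one carrying the Foo baggage.

```java
import java.util.ArrayList;
import java.util.List;

// A caller-defined class; ArrayList has no knowledge of it.
class Foo {
    final String name;
    Foo(String name) { this.name = name; }
}

class Caller {
    // At runtime the list hands back a Foo, but that dependency belongs
    // to this calling code, not to the container implementation.
    static String firstName(List<Foo> foos) {
        return foos.get(0).name;
    }
}
```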
A similar nugget in the Java space is reflection (and e.g. dependency injection a la Spring), often seen as a gap in static analysis tools in the sense that some dependencies are missed. However, this is really just the same issue as the list of Foos above. At coding time, all I need to know is that (say) some input string will be a class name that I can use to instantiate an appropriate implementation of something or other (often involving a cast to an interface that does get picked up by static analysis). The rest is runtime, someone else’s baggage.
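A minimal sketch of that pattern (Handler, EchoHandler and Factory are hypothetical names): the instantiating code only needs to know the interface it casts to; the concrete class name arrives as data.

```java
interface Handler {
    String handle(String input);
}

class EchoHandler implements Handler {
    public String handle(String input) { return "echo:" + input; }
}

class Factory {
    // Static analysis picks up the dependency on Handler (via the cast),
    // but not on EchoHandler -- that binding happens at runtime and is
    // the caller's baggage, not the factory's.
    static Handler create(String className) throws Exception {
        return (Handler) Class.forName(className)
                              .getDeclaredConstructor()
                              .newInstance();
    }
}
```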
That said, there is a scenario where reflection can be used to induce a blind spot wilfully. For example, I know that the object I am getting is a Foo and that I will invoke its bar() method, but I deliberately do this using reflection rather than casting. The baggage is there whichever approach I choose, but in one case (typed invocation) the baggage is transparent while in the other (reflection) it is obfuscated. There is a danger that a blind adoption of rules and metrics around baggage measurement may, in extreme circumstances, encourage some team members to adopt the obfuscation approach in order to “game the system”. I think here that there would be a static analysis counter-measure – namely to control access to reflection – but obviously the better approach is to address any such dysfunctionality at source…
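The two approaches side by side (names again hypothetical): both methods carry exactly the same Foo baggage, but only the first is visible to a dependency analyzer.

```java
import java.lang.reflect.Method;

class Foo {
    String bar() { return "bar result"; }
}

class Invoker {
    // Transparent: static analysis sees the cast and the call to Foo.bar().
    static String typed(Object o) {
        return ((Foo) o).bar();
    }

    // Obfuscated: the same conceptual dependency, but the analyzer only
    // sees strings and java.lang.reflect -- the Foo baggage is hidden.
    static String reflective(Object o) throws Exception {
        Method m = o.getClass().getDeclaredMethod("bar");
        return (String) m.invoke(o);
    }
}
```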
In this sense, dynamic languages are (of course) really just an extension of the reflection paradigm. The baggage is still there – it’s just a heck of a lot harder (though not necessarily impossible) to detect. This means that there tends to be way less tooling support, but also, and more importantly, it may be much more difficult for the developers to understand their baggage situation. Interestingly, this has led some to question whether dynamic languages can scale to larger code-bases and teams because of a finite “complexity budget”. For an overview of some of the issues here, see this post by Ted Neward.
Finally, if everyone pays attention to their baggage, does that mean that the system is guaranteed to work? No, of course not. When I check in my bags at the airport, I should ensure that they are securely closed and suitably labeled. That in itself, however, is absolutely no guarantee they will be there at the other end for me to collect (though it should at least make life easier for the airport’s baggage management system and so help to make the desired outcome more probable). The one and only thing I can be sure of is that any screwups will not be my fault. Seems to me that this is the essence of good software: lots and lots of well-defined, self-contained, autonomous units doing their own job faithfully and keeping fingers crossed that others do the same…