The value of codebase structure

Most experienced engineers would accept that there are attributes of a codebase, under the heading of “structure” or “architecture”, that make a substantial difference to the ease of development. However, since improving these attributes has a cost, I do get asked, rightly, for data that quantifies the benefit. The request usually comes from higher up the food chain, where code quality is less visible and hard numbers more so.

The benefits promised by good codebase structure are mainly in developer productivity. Developer productivity is hard enough to quantify in its own right, let alone predicting how it would differ between doing things one way and doing them another. Try suggesting that the team should develop the product twice so that you can answer your boss’s question! Luckily there is at least one study that did just this…

The study (see page F-8/9) was carried out by the US DoD. A change was required that added 3KLOC to a 50KLOC codebase. This was implemented once on the original codebase. Then the structure of the original codebase was improved, and the same change was made again. This time it cost only 1/2 as much to modify the structured software and took less than 1/2 the time, with 1/8 the number of bugs. This was not a simple matter of the same team doing a better job the second time round: they actually used “2 teams of equal ability”!

In “Avoiding architectural degeneration: an evaluation process for software architecture” the authors make a convincing case that maintainability (which I consider to be much the same as development productivity in iterative development) is substantially influenced by the degree of coupling between modules. The coupling can be quantified (they suggest two metrics), and so the relative ease of development between two structures can be assessed. However, the specific relationship between the coupling metrics and the cost of maintenance/development is not addressed, so unfortunately this remains qualitative.
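To make “coupling can be quantified” a little more concrete, here is a minimal Python sketch. It does not reproduce the paper's two metrics. It is just one simple stand-in of my own: count the dependencies that cross a module boundary, and report them as a share of all dependencies. The file names, the `module_of` mapping and both function names are hypothetical.

```python
from collections import Counter


def coupling_between_modules(module_of, dependencies):
    """Count the dependencies that cross each ordered pair of modules.

    module_of:    dict mapping a file name to the module that owns it
    dependencies: iterable of (source_file, target_file) pairs
    """
    counts = Counter()
    for src, dst in dependencies:
        a, b = module_of[src], module_of[dst]
        if a != b:  # dependencies inside one module are not coupling
            counts[(a, b)] += 1
    return counts


def cross_module_ratio(module_of, dependencies):
    """Share of all dependencies that cross a module boundary (0.0 to 1.0)."""
    dependencies = list(dependencies)
    if not dependencies:
        return 0.0
    crossing = sum(coupling_between_modules(module_of, dependencies).values())
    return crossing / len(dependencies)


# Hypothetical example data: four files in two modules.
module_of = {
    "ui/form.py": "ui",
    "ui/view.py": "ui",
    "db/store.py": "db",
    "db/query.py": "db",
}
dependencies = [
    ("ui/form.py", "ui/view.py"),   # inside ui
    ("ui/view.py", "db/store.py"),  # ui -> db
    ("db/store.py", "db/query.py"), # inside db
    ("ui/form.py", "db/query.py"),  # ui -> db
]

if __name__ == "__main__":
    print(coupling_between_modules(module_of, dependencies))
    print(cross_module_ratio(module_of, dependencies))
```

Computing the same number for two candidate structures gives a relative comparison only. As with the paper's metrics, nothing here translates the figure into a cost.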

Getting a bit closer to the meat of the matter, “A Case Study of Measuring Degeneration of Software Architectures From A Defect Perspective” (IEEE) looked at defect fixes that affected code in more than one component as one indicator of architectural degeneration. It is possible to infer a cost from these metrics, since a multi-component defect (MCD) was associated with more files in total requiring fixing than a defect confined to a single component. There was also a correlation between MCDs and the overall connectedness of the components, which would indicate a relationship between the complexity of the architecture (poor structure) and the impact/cost of change.
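The MCD idea is easy to try on your own defect data. The sketch below is not the study's tooling. It assumes you can list the files touched by each defect fix and that the top-level directory of a path names its component. Both are assumptions of mine, as are the function names and the sample fixes.

```python
def component_of(path):
    """Assumption: the first path segment names the component."""
    return path.split("/", 1)[0]


def is_multi_component(files):
    """A fix is an MCD if the files it touched span more than one component."""
    return len({component_of(f) for f in files}) > 1


def summarize(fixes):
    """Compare files touched per fix for MCDs and single-component defects.

    fixes: dict mapping a defect id to the list of files changed by its fix
    """
    mcd = [files for files in fixes.values() if is_multi_component(files)]
    single = [files for files in fixes.values() if not is_multi_component(files)]

    def mean_files(groups):
        return sum(len(g) for g in groups) / len(groups) if groups else 0.0

    return {
        "mcd_count": len(mcd),
        "single_count": len(single),
        "mean_files_mcd": mean_files(mcd),
        "mean_files_single": mean_files(single),
    }


# Hypothetical defect fixes, for illustration only.
fixes = {
    "D1": ["ui/form.py"],
    "D2": ["ui/view.py", "db/store.py", "db/query.py"],
    "D3": ["db/store.py", "db/query.py"],
}

if __name__ == "__main__":
    print(summarize(fixes))
```

A growing share of MCDs over successive releases, or a widening gap between the two means, would be the kind of signal the study treats as degeneration.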

A very nice study from Accenture and Infosys, “Modularization of a Large-Scale Business Application: A Case Study”, reports some concrete benefits of modularizing an “unmanageable monolith” with many MLOC. The effort to locate a fault fell by 50% for simple faults and by 20-25% for complex faults, since it is now only necessary to search one component (<10% of the codebase) rather than the entire codebase. A 15% reduction in the faults detected during regression testing was observed (they are working to reduce the number of test cases due to the improved structure, which they expect to reduce overall regression testing effort). Runtime memory requirements fell by 43%, since it is generally necessary to load only a subset of the modules and libraries. Link, build and startup times were reduced by 30%.

I’ll add to this when I come across more nice material.
