Rediscovering Modularity in Switzerland

I will be giving my talk to .NET user groups in Switzerland next week, in Bern, Luzern and Zurich.

  • Tuesday May 14 in Bern.
    • Register with Xing, or fill in the contact form on the Bern .NET user group site.
  • Wednesday May 15, Luzern.
  • Thursday May 16 in Zurich.

Come along for the chat and/or beer!

Rediscovering Modularity heads back to Germany

Chris is on the road again and back to Germany.

This time thanks to the good folks at INETA Germany, in particular Lars Keller and Alex Groß.

The focus will be on applying modularity to .NET codebases, but the principles are common to all programming languages. So if you missed Chris’s German Java User Group talks from last year, do come along.

It’s a busy schedule!

  • Tuesday March 12th 19:00 – 21:30
    Location: Berlin, Hotelplan CC Services GmbH, Erich-Weinert-Str. 145.
    More information.
  • Wednesday March 13th 17:00 – 20:00
    Location: Leipzig, Erich-Zeigner-Allee 69-73, 2. OG.
    More information.
  • Thursday March 14th 19:00
    Location: Dresden, InQu Informatics GmbH.
    To be rescheduled.
  • Tuesday March 19th
    Location: Hamburg.
    More information.
  • Wednesday March 20th 19:00
    Location: Braunschweig, Restaurant Zucker.
    More information.
  • Thursday March 21st 18:00
    Location: Bielefeld, Diamant Software, Sunderweg 2.
    More information.
  • Wednesday April 3rd
    Location: Ingolstadt, Hotel-Gasthof zum Anker in der Tränktorstraße 1
    More information.
  • Tuesday April 16 19:00 – 21:30
    Location: Osnabrück, Sedanstraße 61, Osnabrück.
    More information.
  • Wednesday April 17 18:00
    Location: Stuttgart, Haus der Vereine, “Fuchsbau”, Leonberger Straße 39
    More information.
  • Thursday April 18
    Location: Karlsruhe
    More information.
  • Tuesday April 23
    Location: Bonn, Comma Soft AG, Pützchens Chaussee 202-204a
    More information.
  • Wednesday April 24 18:00 – 20:00
    Location: Ratingen, 7P Solutions & Consulting AG, Calor-Emag-Straße 1, Ratingen
    More information.
  • Thursday April 25 19:00 – 22:30
    Location: Erlangen, CLEAR IT GmbH, Am Wolfsmantel 46, Erlangen
    More information.

Rediscovering Modularity Texas

The Rediscovering Modularity tour continues. This time 3 dates in Texas at the end of January.

Chris will be presenting again.

Tuesday January 29th 6:00-8:00pm
Location: San Antonio, New Horizons.

Wednesday January 30th 6:15-8:15pm
Location: Dallas, Matrix Consulting, 5151 Beltline Road, Suite 1010, Dallas, TX 75254.

Thursday January 31st 6:30-8:30pm
Location: Houston, TBD.

Many thanks to the Houston, San Antonio and Dallas Java User Group folks, Jim Bethancourt, Ryan Stewart and Dan Kern-Ekins for assisting in the organisation.

Be sure Chris gets to enjoy some of that great Texan prime beef!

Rediscovering Modularity in Brazil

We’re excited, first trip to Brazil!

Founder, Chris Chedgey, will be presenting on Rediscovering Modularity in Brazil next week.

First up in Sao Paulo at JavaOne Latin America on Tuesday December 4th at 13:30.

Then at the INE Auditorium, UFSC University, Florianopolis, on Thursday, December 6th at 18:30, with many thanks to Marcio Marchini at BetterDeveloper and Prof. Patricia Vilain at UFSC.

In between there will be one or two customer visits but I suspect there will be a day on the beach at Florianopolis.

Lucky boy, “the place to be” apparently!

Florianopolis

Rediscovering Modularity – upcoming presentations

Founder, Chris Chedgey, will be presenting on Rediscovering Modularity in an existing codebase (and more) over the coming weeks in The Netherlands, Sweden and Germany. Dates in the US will be announced shortly.

The Netherlands at The JFall Conference
Presentation details: Rediscovering Modularity
Wednesday October 31st 11:35 – 12:25
Location: Hart van Holland, Nijkerk

Sweden at The Oredev Conference
Tuesday 6th Nov 13:30 – 16:30
Rediscovering Modularity with Restructure101 (3 Hour Workshop)
Note: Registration for this workshop is required.
Friday 9th Nov 11:10 – 12:00
Retrofitting a software architecture to an existing code-base
Location: Malmö

Germany at German Java User Group meetings
Tuesday November 20th 18:30
Location: Große Falterstraße 6a, Stuttgart
Presentation details: Rediscovering Modularity

Wednesday November 21st 19:00
Location: Frankfurter Ring 105 – D-80807, Munich

Thursday November 22nd 18:00
Location: NCC NürnbergConvention Center Ost, Messezentrum, 90471 Nürnberg
Room: “Tagungsraum in der Zwischenebene”
Free entrance to DOAG conference (and party) starting 16:30
(U1, exit Messe) 

Synopsis

The principles of modularity have been applied to engineering projects since Gorak built the wheel, and Thag the barrow of the world’s first wheelbarrow. Thag’s barrow didn’t care that the wheel was first hewn from rock, and later upgraded to a lighter, wooden one, and the same wheel design was reused for the world’s first chariot.

Analogous abstraction techniques are taught in Software Engineering 101. We apply these routinely as we develop and continuously refactor the code encapsulated within classes. However, when the number of classes reaches some limit (‘Uncle’ Bob Martin has suggested 50 KLOC), higher-level abstractions are needed in order to manage the complexity of the growing codebase. This limit is usually overshot, and the team is soon drowning in an ocean of classes. It is time to organize the classes into a hierarchy of modules.


How to create a killer call graph for impact analysis

Sometimes you really want to cut through a code-base to discover all the functions that can get called in response to a specific function being invoked, with all the other code removed from the picture.

Or you might be coming at it bottom up – when you change a particular function, what is the subset of the code-base that can be affected? Or you might even want to ask the question from both ends – what are all the code items that could get called (or impacted) between 2 specific functions?
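Conceptually, all three questions reduce to computing the transitive closure of the call graph: follow call edges forward from a function, follow them backward to find callers, or intersect the two sets for the paths between two functions. A minimal sketch of that idea in Python (toy graph and names are made up; this is not how Structure101 is implemented internally):

```python
from collections import deque

def closure(graph, start):
    """All nodes reachable from `start` by following edges (excludes start itself)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def invert(graph):
    """Reverse every edge, turning a calls-map into a called-by map."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, set()).add(src)
    return rev

# Toy call graph: a calls b and c; b and c call d; d calls e.
calls = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": {"e"}}

downstream = closure(calls, "a")        # everything a can end up calling
upstream = closure(invert(calls), "e")  # everything that can be impacted by changing e
between = downstream & upstream         # items on some path from a to e
```

Here `downstream` is {"b", "c", "d", "e"}, `upstream` is {"a", "b", "c", "d"}, and `between` is {"b", "c", "d"} – the same forward, backward, and both-ends filtering the steps below achieve with tagging.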

(Updated to reflect current Structure101 UI changes, plus video added at the end).

Here are the steps to get you that diagram.

  1. Download and install the free trial of Structure101 Studio.
  2. Start Structure101 Studio, select File/New, and enter the minimal set of information. Mostly you are just pointing the wizard at your byte-code. Just make sure you set the “granularity” option to “detailed”. When you have created the project you will see something like this (a Maven project – actually THE Maven code-base, as it happens):

    This is a Levelized Structure Map (LSM) of your code-base, in which, as far as possible, dependencies flow downward.

  3. Assume you want a call tree starting at a function called getDownstreamProjects in a class called DefaultProjectDependencyGraph in the maven-core project. Drill down to this function by double-clicking on the relevant parent items:

  4. Right-click on the starting function and select tag/used by selected/indirectly.

  5. This causes a subset of the model to be tagged with a blue dot. You can now filter the model so that only the tagged items are included by using the “filter in (isolate) tagged items” button on the main toolbar:

    Which gives you something like this:

    Which is a heavily filtered LSM model – notice there are now just 3 POMs left, and if you drill into these, each contains just a subset of the packages and classes that are actually in the project. Drilling down in each of the POMs/packages/classes is feasible if there is not too much in the filtered LSM, and gives something like this:

  6. Now untag everything (just for tidiness really) using the relevant button on the main toolbar:
  7. Select the class slice from the LSM window toolbar:
    This removes all the POMs, JARs, and packages from the LSM, leaving just classes. If you expand each class (by double-clicking) you end up with something like this:

    Remember that you are only seeing code that is in the dependency closure of getDownstreamProjects, so for instance there are methods and fields inside those classes that are not shown. The 2 interfaces and 1 class are not expanded because they contain no child items – that is, they contain no methods or fields that are indirectly used by getDownstreamProjects – and so are not expandable. If you’re not seeing all the dependencies, but want to see them, click the “show all dependencies” button

    (default is to show only dependencies on selected items)

  8. You use the same principle to work upwards, to filter on items that may be impacted by a specific item. For example if you want to show only paths that start at getDownstreamProjects (the current filter) and end at Parent::groupId, then select groupId, tag/uses selected/indirectly, filter in tagged items, which gives this further reduced LSM:

  9. You can add back some structural context by selecting the “leaf package” slice (same menu as the “class slice” earlier on):

    Which gives this (you might have to expand through the packages by double clicking on each):

    Or switch off slicing altogether (deselect the slice button) to show the complete paths of the subset of items:

That’s it!

Here’s a short (silent) video that shows the screen as I went through these steps:

Fear the Ubergeek

Ubergeek is a strange and wonderful creature. He possesses supernatural powers for retaining vast swathes of detail in his head at one time. This makes him built for coding, a priceless gem when you need to get version 1 of a new project out the door ASAP. He is the 10x-productivity guy of programming lore.

However this uncanny ability to surf a vast sea of detail has a cost – Ubergeek has no need (or ability?) to construct the islands of abstraction that the more pedestrian human brain requires in order to navigate the otherwise featureless terrain. And when neither he nor others understand this fundamental difference, Ubergeek can be lethal to your project.

Some say Ubergeek can be spotted by his taste in clever, quirky/British movies such as Memento, Brazil, or Monty Python, though this stereotype is probably apocryphal. For a more reliable diagnosis, watch what happens when you distribute a complex spreadsheet; Ubergeek is the one ignoring your carefully crafted summary charts, and directly processing the rows and columns of numbers. Pretty pictures just hide an arbitrary subset of the truth, and as such serve no purpose for our master of detail. This is not a problem in the case where a regular geek produces the data for other regular geeks – no skin off your nose that Ubergeek ignores your summaries. It is however a problem when Ubergeek is the author of the raw data, and everyone else needs the abstraction that he will not be able to provide.

Of course Ubergeek doesn’t waste time making recreational spreadsheets. He goes for the hard drugs, the most complex engineering artefact we construct – the monolithic code-base.

As already mentioned, Ubergeek is perfect for creating the first version of a new code-base. While it is relatively small, there is less need for abstractions above the implementation level, even if there were regular geeks in need of such abstraction, which there may not be if you’re letting Ubergeek at it on a solo run, perhaps with just a couple of awe-struck underlings to do the bits that bore Ubergeek. Also, it is generally regarded as a good thing to let the higher-level structures emerge from the early, detailed implementation, rather than trying to be prescriptive before the natural structures are understood. So far, as long as your fledgling concept is sound, everything is set up to maximize success.

The waters start to get choppy when your project moves beyond Version 1. As the critical first step, v1 is so important that it is easy to forget that it is upon v2+ that your product idea or fledgling company floats or sinks. And simply continuing to develop v2 in the same way as you did v1 is not usually an option. As the code face grows, you have no choice but to add more engineers to the mix. Plus there is a distinct danger that, with the novelty diminishing, Ubergeek will grow restless, and you do not want this guy getting bored on your baby!

As you start beefing up your development team and processes, you will notice tension brewing between Ubergeek and the new team members. The new guys want to understand the big picture, but Ubergeek doesn’t do big pictures – he needs specific questions which he answers in terms of specific functions, classes and interfaces. He gets frustrated that the purity of his code keeps getting soiled, and the new guys are nervous about making changes that might incur his wrath.

Schedule pressure from above helps the team work through this awkwardness at first, and the code starts to change and grow rapidly. As changes are made by regular geeks, in the absence of a clearly understood architecture, the code drifts from Ubergeek’s implementation-level designs, inevitably becoming more and more complicated, until it is more than can fit in even his head. Now nobody understands the code-base. Now the vicious circle has started – complexity reduces comprehension, which increases complexity – with the expected impact on schedules, quality, and morale. “Fix the architecture!!” say the regular geeks; “F*ch the architecture, fix the code!!” retorts Ubergeek.

Management usually side with Ubergeek. Apart from the fact that he has been on the project since the start, and management know that he is exceptionally smart, they have a natural abhorrence of any talk of taking longer over the next iteration in order to make the next 10 iterations shorter – such advocates are seen as naïve idealists, more interested in the “art” of software engineering than in the software business itself.

So what is the outcome of this situation? Most often the project continues on with nobody happy. The “idealists” become despondent and resigned to the fact that the best they can do is to keep out of trouble and take home a salary. New recruits try valiantly for months to make sense of the codebase, but eventually go the way of the idealists. Ubergeek becomes a hacker, mopping up the code-base where he can, insomuch as he still has some understanding of how it “should” work. He probably gets tired of the raging and the fighting after a while, and moves on to the next new project, where he happily restores his faith in his own infallibility. The original project has become an average PITA project to work on, with below-average quality and productivity characteristics.

If I had the choice, I would go for a solid engineer and a couple of eager disciples for my new project, rather than an omnipotent Ubergeek-management coalition, every time. We may take longer to get to v1, but the real goal of a successful product and viable code-base is a possibility. Plus, who has time for all that drama?!

The best of all worlds is an enlightened Ubergeek/Engineer/management team, where everybody plays the role they’re designed for – now that would be truly invincible…

Restructure101 version 2 released

For Restructure101 version 2 (press release) we rolled in a load of feature requests that came back from users of version 1. A lot of these were around making the existing functionality more accessible. Others make version 1 use-cases much quicker/easier to achieve. In particular, the combination of filtering and slicing is a massive improvement in isolating the exact subset of structure that you need at any time – it is well worth getting very used to these new commands.


Above is the fully expanded UI. It has got a bit busier, but it is easier to control the display and hide the different panels. The main differences from version 1 are:

  • The overall layout is now the same as Structure101; the borders between the 4 main panels can be adjusted by dragging, and any panel can be maximized by double clicking its header.
  • The dependency breakout, which was on a pop up in v1, is now in the main UI center bottom (like Structure101).
  • All the Tangles and Fat related information (and only that info) is now co-located in the left panel. This means that the items which contribute to non-zero values in the chart are shown immediately below the chart.
  • The version 1 viewing control panel (which was on the right of the LSM) has moved to commands on the toolbars. For example, the show all/between/selected edges options are on the dropdown second from the left on the LSM toolbar.
  • Other than the tangles/fat, all the lists that were on the left-hand side of the LSM are now distributed across the 3 sections of the right panel. Don’t miss the drop-down to select the notables in the top right panel. Also we added a “tags” tab on the center-right viewer to make it easy to find tagged items on a large LSM.
  • A new filtering framework is accessed via the main toolbar, and the context menu. Filter commands work off the currently selected items, and you can filter items out (i.e. hide them), or filter items in (i.e. isolate them by filtering everything else out). You can unfilter the contents of just the selected items, and clear all filtering so that nothing is hidden. There is a filter stack that you can walk using the forward and back arrows.
  • Hiding of tagged items has been removed – similar functionality is available from the filter in/out tagged items buttons. This is a bit different from the old filter/tags in that it applies to only the current sandbox/action list.
  • You can now create “slice” views of the LSM from the dropdown on the main LSM viewer. These are like the Slice perspective in Structure101, removing composition from the visible model. This means that you no longer need to “flatten to classes” in order to see all the classes in a region of the codebase (“flatten to classes” remains as an action).
  • We changed the name from “sandbox” to “action list” (in preparation for some future features).
  • Setting an action list to be the “to do” action list is now done using the “share” button – we felt this was more familiar/”normal”.
  • We optimized the Over-complexity Chart to not calculate the fat and tangles values for every action on very large action lists, just every nth (configurable) action – this speeds up loading very large action lists on very large codebases.

There are lots of other less visible changes. If you can’t find something, it is there somewhere, just ask and we’ll tell you where we moved it.

Happy restructuring!

The value of codebase structure

Most experienced engineers would accept that there are attributes of a codebase, under the heading of “structure” or “architecture”, that make a substantial difference to the ease of development. However, since there is some cost to improving these attributes, I do rightly get asked for data that quantifies the benefit, usually from higher up the food chain, where code quality is less visible and hard numbers more so.

The benefits promised by good codebase structure are mainly in developer productivity. Developer productivity is hard enough to quantify in its own right, let alone predicting differences of its value between doing things one way versus doing them another. Try suggesting that the team should develop the product twice so that you can answer your boss’s question! Luckily there is at least one study that did just this…

The study (see page F-8/9) was carried out by the US DoD. A change was required that added 3KLOC to a 50KLOC code base. This was implemented once on the original code-base. Then the original codebase structure was improved, and the changes were made again. This time it cost only 1/2 as much to modify the structured software and took less than 1/2 the time, with 1/8 the number of bugs. This was not a simple matter of the same team doing a better job the second time round, they actually used “2 teams of equal ability”!

In “Avoiding architectural degeneration: an evaluation process for software architecture” the authors make a convincing case that maintainability (which I consider to be much the same as development productivity in iterative development) is substantially influenced by the degree of coupling between modules. The coupling can be quantified (they suggest 2 metrics), and so the relative ease of development between 2 structures can be assessed. However the specific relationship between the coupling metrics and the cost of maintenance/development is not addressed, so unfortunately this remains qualitative.
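To make “degree of coupling” concrete, one of the simplest ways to quantify it is to count the dependencies that cross module boundaries (this is an illustrative metric of my own, not necessarily one of the two the paper proposes; the class and module names are made up):

```python
def inter_module_coupling(deps, module_of):
    """Count dependencies whose source and target live in different modules."""
    return sum(1 for src, dst in deps if module_of[src] != module_of[dst])

# Hypothetical class-level dependencies and a module assignment.
deps = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
module_of = {"A": "ui", "B": "ui", "C": "core", "D": "core"}

coupling = inter_module_coupling(deps, module_of)
```

Here only A→C and B→C cross the ui/core boundary, so the metric is 2; comparing this count before and after a restructuring gives a crude but objective way to assess which of two structures should be easier to maintain.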

Getting a bit closer to the meat of the matter, “A Case Study of Measuring Degeneration of Software Architectures From A Defect Perspective” (IEEE) looked at defect fixes that affected code in more than one component as one indicator of architectural degeneration. It is possible to infer a cost from these metrics since a multi-component defect (MCD) was associated with more files in total requiring fixing (than defects confined to a single component). There was also a correlation between MCDs and the overall connectedness of the components, which would indicate a relationship between the complexity of the architecture (poor structure) and the impact/cost of change.

A very nice study from Accenture and Infosys, “Modularization of a Large-Scale Business Application: A Case Study”, reports some concrete benefits to modularizing an “unmanageable monolith” with many MLOC. The effort to locate a fault is reduced since it is now only necessary to search one component (<10% of the codebase) rather than the entire codebase, which reduced the effort by 50% for simple faults, and 20-25% for complex faults. A 15% reduction in the faults detected during regression testing was observed (they are working to reduce the number of test cases due to the improved structure, which they expect to reduce overall regression testing effort). Runtime memory requirements reduced by 43%, since it is generally necessary to load only a subset of the modules and libraries. Link, build and startup times were reduced by 30%.

I’ll add to this when I come across more nice material.

Structure101 adds Doxygen and Understand support for C/C++, Delphi/Pascal and Python

Thanks to Marcio Marchini, who developed Doxygen and Understand flavors of “third-party parser plugins”, Structure101, Restructure101 and Structure101 Build now support the parsing of:

  • C, C++ using Doxygen, or Understand from Scientific Toolworks;
  • Delphi/Pascal using ModelMaker, or Understand;
  • beta support for Python using Understand.

So head over to our downloads page if you are working with any of the above and want to:

  • better understand your software architecture,
  • improve communication of your architecture to your development team, or
  • refactor your architecture because there’s too much ‘everything uses everything’.

Getting started is easy! There is a wealth of general “how-to” videos to help.

As one customer who is already up and running put it:

Doxygen parses our ~20 000 C++ files and finds ~100 000 include dependencies between them in less than 15 minutes. Restructure101g eats it up in exactly 30 seconds and Structure101 checks the architecture in about the same time. It is quick and accurate!