Common sense and Code Quality - Part 2

This is the second article in my series on "Code Quality". You might want to read the first article of this series, Common sense and Code Quality - Part 1. It is also available at DZone.

Structural Analysis

Figure: Tangles, i.e. cyclic dependencies, in the structure of JUnit 4
In this article about "code quality" I am going to talk about "quality of the software structure" in particular.

This series of articles takes a very simple, common sense approach to code quality. The intent is to demystify code quality and help project teams pick the processes and tools that make sense to them.

I am going to try to keep the article as simple as I can. However, readers should be aware that this topic, i.e. "structural analysis of software code", has been and continues to be a fairly involved subject. Mathematicians and computer scientists have published seminal work on this subject since as early as the 1970s.

Fortunately, some excellent material is available on this subject in the public domain. I would particularly like to call out the following works that I have relied on heavily for data used in this article.
1. "A Complexity Measure" by Thomas J. McCabe, published in IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976.
2. "A Metrics Suite for Object Oriented Design" by Shyam R. Chidamber and Chris F. Kemerer, published in IEEE Transactions on Software Engineering, Vol. 20, No. 6, June 1994.
3. "OO Design Quality Metrics, An Analysis of Dependencies" by Robert Martin, 1994.
4. "Design Principles and Design Patterns" by Robert Martin.

Let me try and present the gist of my interpretation of these works, in the following sections.

Patterns

Let's start by listing the fundamental patterns of bad code structure. These are intuitive in nature and do not have a mathematical or scientific definition.

Patterns of Structural Flaws: Definition and Explanation
Rigidity: The software is difficult to change.
Fragility: Making changes in one part of the software causes breakage in a conceptually unrelated part of the software.
Immobility: It is difficult to move components of the code around because the code is not sufficiently modular.
Viscosity: Wrong practices are so deep-rooted in the software that it is easier to continue with the wrong practices than to introduce the right ones.
Opacity: The system is difficult to understand.

Metrics

Metrics are concrete, measurable items with scientific and mathematical definitions. Standard tools are available that will measure them for your code base. Of course, the list of metrics and tools supplied here is not exhaustive.

Metrics: Definition and Explanation, followed by Tools
Number of Classes: If a comparison is made between projects with identical functionality, those projects with more classes are better abstracted. That said, you would want to keep this number down; applying OOP concepts well should help. Tools: Sonar (free, open source).
Lines of Code (LOC): If a comparison is made between projects with identical functionality, those projects with fewer lines of code have a superior design and require less maintenance. Tools: Sonar (free, open source).
Number of Children (NOC): The number of immediate sub-classes of a class. Try to keep it down; otherwise classes become too complex. Tools: Stan4J.
Response for Class (RFC): The count of all methods implemented within the class plus the methods that those methods can call directly (the "response set" of the class). Try to keep it low. The higher the RFC, the higher the effort to make changes. Tools: Sonar (free, open source), Stan4J.
Depth of Inheritance Tree (DIT): The length of the longest inheritance path from the class to the root class. Try to keep it under 5. Tools: Stan4J.
Weighted Methods Per Class (WMC): The sum of the complexities of the methods defined in a class; if every method is given a weight of 1, it is simply the number of methods. Try to keep it under 14. Tools: Stan4J.
Coupling between Object Classes (CBO): The number of classes to which a class is coupled. Try to keep it under 14. Tools: Stan4J.
Lack of Cohesion of Methods (LCOM4): The number of "connected components" in a class. A low value suggests that the code is simpler and reusable. A high value suggests that the class should be broken up into smaller classes. Try to break down the class if this metric becomes higher than 2. Tools: Sonar (free, open source), Stan4J.
Cyclomatic Complexity (CC): A measure of the number of different executable paths through a module. A higher number of executable paths through the code means more effort to test it completely. This in turn makes the code more difficult to understand and change. Try to keep this value under 10. Tools: JDepend (free, open source).
Distance (D): The distance from the idealized line A + I = 1, which in its normalized form is D = |A + I - 1|. The smaller the distance of your software from the idealized line, the better.

Abstractness (A) = Na / Nc
where,
Na = Number of abstract classes in the package, and
Nc = Total number of classes in the package

Instability (I) = Ce / (Ce + Ca)
where,
Afferent Couplings (Ca) = The number of other packages that depend upon classes within the package.
Efferent Couplings (Ce) = The number of other packages that the classes in the package depend upon.
Tools: JDepend (free, open source), Stan4J.
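
To make the Distance formulas concrete, here is a minimal Java sketch of the arithmetic. It assumes Nc is the total number of classes in the package (as in Robert Martin's paper) and uses the normalized distance D = |A + I - 1|. The class name PackageMetrics and the sample figures are hypothetical; tools such as JDepend derive the inputs from your compiled code.

```java
/** Sketch of the package metrics: abstractness, instability and distance. */
final class PackageMetrics {

    /** A = Na / Nc: abstract classes divided by all classes in the package. */
    static double abstractness(int abstractClasses, int totalClasses) {
        return totalClasses == 0 ? 0.0 : (double) abstractClasses / totalClasses;
    }

    /** I = Ce / (Ce + Ca): 0 is maximally stable, 1 is maximally unstable. */
    static double instability(int efferent, int afferent) {
        int total = efferent + afferent;
        return total == 0 ? 0.0 : (double) efferent / total;
    }

    /** D = |A + I - 1|: normalized distance from the idealized line A + I = 1. */
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical package: 2 abstract types out of 10 classes,
        // depends on 3 other packages (Ce) and is used by 9 packages (Ca).
        double a = abstractness(2, 10);
        double i = instability(3, 9);
        System.out.println("A=" + a + " I=" + i + " D=" + distance(a, i));
    }
}
```

For the sample package, A = 0.2 and I = 0.25, so D = |0.2 + 0.25 - 1| = 0.55: the package is fairly stable (many others depend on it) yet not very abstract, which is why it sits some way off the idealized line.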


So, the metrics are there, and so are the thresholds and the tools to report on them. If you read the material I cited at the beginning of the article, you will find many more metrics. I can safely recommend using at least Sonar and the basic metrics that Sonar reports on. That is the very least that any enterprise-grade software should have.
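
Of the metrics listed above, LCOM4 is probably the least intuitive, so here is a rough Java sketch of the idea behind "connected components". It assumes the common definition in which two methods are related if they use the same field or one calls the other. The class name Lcom4 and the input format (a map from each method to the members it uses) are assumptions made for illustration; real tools extract this information from bytecode or source.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: LCOM4 as the number of connected components among a class's methods. */
final class Lcom4 {

    /**
     * @param uses for each method, the fields it accesses and the methods of
     *             the same class it calls (names must be unique, e.g. "pay()")
     * @return the number of groups of methods that share nothing with each other
     */
    static int lcom4(Map<String, Set<String>> uses) {
        Map<String, String> parent = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : uses.entrySet()) {
            find(parent, entry.getKey()); // register methods that use nothing
            for (String member : entry.getValue()) {
                union(parent, entry.getKey(), member);
            }
        }
        // Count the distinct groups that contain at least one method.
        Set<String> roots = new HashSet<>();
        for (String method : uses.keySet()) {
            roots.add(find(parent, method));
        }
        return roots.size();
    }

    /** Union-find lookup: follows parent links until it reaches the group's root. */
    private static String find(Map<String, String> parent, String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        parent.put(x, root); // shortcut for later lookups
        return root;
    }

    /** Merges the groups that contain a and b. */
    private static void union(Map<String, String> parent, String a, String b) {
        parent.put(find(parent, a), find(parent, b));
    }

    public static void main(String[] args) {
        // Hypothetical class mixing account logic with address formatting.
        Map<String, Set<String>> uses = new HashMap<>();
        uses.put("deposit()", Set.of("balance"));
        uses.put("withdraw()", Set.of("balance"));
        uses.put("printLabel()", Set.of("formatAddress()"));
        uses.put("formatAddress()", Set.of("address"));
        System.out.println("LCOM4 = " + lcom4(uses));
    }
}
```

In the hypothetical class, deposit() and withdraw() share the balance field, while printLabel() and formatAddress() only touch the address, so the methods fall into two unrelated groups. That is the kind of class the metric flags as a candidate for splitting.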

As I mentioned in the first article of this series, start by measuring. Compare against the figures of the same project, build on build, and make small incremental changes. Small baby steps in the right direction, taken diligently build on build, will do wonders. Just don't go for the big kill or measure yourself against the so-called "industry standard", and everything should be all right.
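
The build-on-build comparison can be as simple as the following Java sketch. It assumes you have exported the metric values of two builds into maps and that, for every metric tracked, lower is better (which holds for the thresholds listed earlier). The class name BuildTrend and the sample numbers are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: flag metrics that got worse between two builds of the same project. */
final class BuildTrend {

    /**
     * @return for each metric whose value increased, the size of the increase.
     *         Metrics missing from the previous build are ignored.
     */
    static Map<String, Double> regressions(Map<String, Double> previous,
                                           Map<String, Double> current) {
        Map<String, Double> worse = new TreeMap<>();
        for (Map.Entry<String, Double> entry : current.entrySet()) {
            Double before = previous.get(entry.getKey());
            if (before != null && entry.getValue() > before) {
                worse.put(entry.getKey(), entry.getValue() - before);
            }
        }
        return worse;
    }

    public static void main(String[] args) {
        // Hypothetical per-build maxima for two metrics.
        Map<String, Double> lastBuild = Map.of("CC", 12.0, "CBO", 15.0);
        Map<String, Double> thisBuild = Map.of("CC", 11.0, "CBO", 16.0);
        System.out.println("Got worse: " + regressions(lastBuild, thisBuild));
    }
}
```

Here cyclomatic complexity improved while coupling crept up by one, so only CBO is reported. The point is to compare the project against its own previous build rather than against an external benchmark.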

Beyond Metrics

With all due respect to the metrics, their utility is limited to doing a health check on the existing code. Given the number of metrics and the plethora of tools to measure them (add the conflicting views among technocrats about the efficacy of the tools and metrics), it soon gets confusing. It is like looking at a control panel with all the dials and bulbs going berserk while you frantically try to figure out how to appease them all.

What you also need is a tool that analyses all these metrics and points straight at the complicated and vulnerable parts of your code. Of course, it helps if it does so with an intuitive visual interface. There are a few tools that do just this (unfortunately, none of them is free). I have used and quite liked Structure 101. It reports on "fat" packages, classes, designs, etc., which is its way of saying that it considers the "fat" artifacts excessively complex and hence tentative candidates for refactoring or restructuring. These artifacts generally have tangles (cyclic dependencies), and the tool does an excellent job of showing those.
The figure shows 17 packages in a tangle. This tangle report is created from Junit4 code base.
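
To illustrate what a tangle is, the following Java sketch finds the packages that sit on at least one dependency cycle: a package is tangled if, by following dependencies, you can get back to it. This is only a simple reachability check with made-up package names, not how Structure 101 works internally. The class name TangleFinder is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: find packages that take part in a cyclic dependency (a tangle). */
final class TangleFinder {

    /** All packages reachable from start by following one or more dependencies. */
    static Set<String> reachable(Map<String, Set<String>> deps, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(deps.getOrDefault(start, Set.of()));
        while (!toVisit.isEmpty()) {
            String pkg = toVisit.pop();
            if (seen.add(pkg)) {
                toVisit.addAll(deps.getOrDefault(pkg, Set.of()));
            }
        }
        return seen;
    }

    /** A package is in a tangle if it can reach itself through its dependencies. */
    static Set<String> packagesInTangles(Map<String, Set<String>> deps) {
        Set<String> tangled = new TreeSet<>();
        for (String pkg : deps.keySet()) {
            if (reachable(deps, pkg).contains(pkg)) {
                tangled.add(pkg);
            }
        }
        return tangled;
    }

    public static void main(String[] args) {
        // Hypothetical package graph: service and model depend on each other.
        Map<String, Set<String>> deps = Map.of(
                "app.ui", Set.of("app.service"),
                "app.service", Set.of("app.model"),
                "app.model", Set.of("app.service"),
                "app.util", Set.<String>of());
        System.out.println("Tangled packages: " + packagesInTangles(deps));
    }
}
```

In this made-up graph, app.service and app.model form a tangle, while app.ui merely depends on the tangle without being part of it.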
That brings me to the end of this article. In conclusion, I just want to say that creating simple code is a complex business. It is not (only) labor. It is skill. And as with all skills, mastering the tools of the trade is important. Knowing which tools to pick from the free, open source basket and which ones to pay for (because they are worth it) is crucial.

In the next article we will talk about creating future state architecture for the project (assuming it is a long running support and upgrade project) and how to measure increased conformance of the code base to future state architecture, build by build. 

Until then, happy coding.

A slightly edited version of this article is also available at Javalobby.

If you want to get in touch, you can look me up on LinkedIn or Google+.
