Abstract
Software architecture conformance is a key software quality control activity that aims to reveal the progressive gap normally observed between concrete and planned software architectures. However, formally specifying an architecture can be difficult, as it must be done by an expert of the system having a high level understanding of it. In this paper, we present a lightweighted approach for architecture conformance based on a combination of static and historical source code analysis. The proposed approach relies on four heuristics for detecting absences (something expected was not found) and divergences (something prohibited was found) in source code based architectures. We also present an architecture conformance process based on the proposed approach. We followed this process to evaluate the architecture of two industrial-strength information systems, achieving an overall precision of 62.7 % and 53.8 %. We also evaluated our approach in an open-source information retrieval library, achieving an overall precision of 59.2 %. We envision that an heuristic-based approach for architecture conformance can be used to rapidly raise architectural warnings, without deeply involving experts in the process.
Similar content being viewed by others
Notes
In a depends predicate, the pattern _ (underscore) matches any value.
To improve the paper’s comprehension, we translated the class names from Portuguese to English.
Mean precision is the average precision of the iterations evaluated by the architect, whereas Overall precision is the total number of true warnings by the total number of warnings.
References
Araujo JE, Souza S, Valente MT (2011) Study on the relevance of the warnings reported by Java bug-finding tools. IET Software 5 (4):366–374
Baeza-Yates R, Ribeiro-Neto B (2011) Modern Information Retrieval: The Concepts and Technology Behind Search. Addison Wesley Professional
Bittencourt RA (2012) Enabling Static Architecture Conformance Checking of Evolving Software. Universidade Federal de Campina Grande, PhD thesis
Brunet J, Guerreiro D, Figueiredo J (2011) Structural conformance checking with design tests: An evaluation of usability and scalability. In: 27th International Conference on Software Maintenance (ICSM), pp 143–152
Brunet J, Murphy GC, Serey D, Figueiredo J (2014) Five years of software architecture checking: A case study of Eclipse. IEEE Software:1–6
Copeland T (2005) PMD Applied. Centennial Books
Ducasse S, Anquetil N, Bhatti MU, Hora A, Laval J, Girba T (2011) MSE and FAMIX 3.0 an Interexchange Format and Source Code Model Family. Technical report. Software Composition Group - SCG, RMOD - INRIA Lille - Nord Europe
Ducasse S, Pollet D (2009) Software architecture reconstruction: A process-oriented taxonomy. IEEE Trans Software Eng 35 (4):573–591
Eichberg M, Kloppenburg S, Klose K, Mezini M (2008) Defining and continuous checking of structural program dependencies. In: 30th International Conference on Software Engineering (ICSE), pp 391–400
Fowler M (2002) Patterns of Enterprise Application Architecture. Addison-Wesley
Hammad M, Collard ML, Maletic JI (2009) Automatically identifying changes that impact code-to-design traceability. In: 17th IEEE/ACM International Conference on Program Comprehension (ICPC), pp 20–29
Holt RC (1998) Structural manipulations of software architecture using tarski relational algebra. In: 5th Working Conference on Reverse Engineering (WCRE), pp 210–219
Hora A, Anquetil N, Ducasse S, Valente MT (2013) Mining system specific rules from change patterns. In: 20th Working Conference on Reverse Engineering (WCRE), pp 1–10
Hou D, Hoover JH (2006) Using SCL to specify and check design intent in source code. IEEE Trans Software Eng 32(6):404–423
Hovemeyer D, Pugh W (2004) Finding bugs is easy. SIGPLAN Notices 39(12):92–106
Johnson SC (1977) Lint: A C program checker. Technical Report 65, Bell Laboratories
Kim S, Ernst MD (2007) Which warnings should I fix first?. In: 15th International Symposium on Foundations of Software Engineering (FSE), pp 45–54
Kim S, Pan K, Whitehead EEJ Jr. (2006) Memories of bug fixes. In: 14th International Symposium on Foundations of Software Engineering (FSE), pp 35–45
Knodel J, Popescu D (2007) A comparison of static architecture compliance checking approaches. In: 6th Working IEEE/IFIP Conference on Software Architecture (WICSA), p 12
Koschke R (2010) Incremental reflexion analysis. In: 14th European Conference on Software Maintenance and Reengineering (CSMR), pp 1–10
Koschke R, Simon D (2003) Hierarchical reflexion models. In: 10th Working Conference on Reverse Engineering (WCRE), pp 36–45
Larus JR, Ball T, Das M, DeLine R, Fahndrich M, Pincus J, Rajamani SK., Venkatapathy R (2004) Righting software. IEEE Software 21(3):92–100
Livshits B, Zimmermann T (2005) DynaMine: finding common error patterns by mining software revision histories. In: 13th International Symposium on Foundations of Software Engineering (FSE), pp 296–305
Maffort C, Valente MT, Anquetil N, Hora A, Bigonha M (2013) Heuristics for discovering architectural violations. In: 20th Working Conference on Reverse Engineering (WCRE), pp 222–231
Maffort C, Valente MT, Bigonha M, Hora A, Anquetil N (2013) Mining architectural patterns using association rules. In: 25th International Conference on Software Engineering and Knowledge Engineering (SEKE), pp 375–380
Mens K, Kellens A, Pluquet F, Wuyts Roel (2006) Co-evolving code and design with intensional views A case study. Comput Languages Syst & Struct 32(2-3):140–156
Mileva YM, Wasylkowski A, Zeller A (2011) Mining evolution of object usage. In: 25th European conference on Object-oriented programming, pp 105–129
Murphy G, Notkin D, Sullivan K (1995) Software reflexion models Bridging the gap between source and high-level models. In: 3rd Symposium on Foundations of Software Engineering (FSE), pp 18–28
Murphy G, Notkin D, Sullivan KJ (2001) Software reflexion models bridging the gap between design and implementation. IEEE Trans Software Eng 27:364–380
Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi J, Nguyen TN (2010) Recurring bug fixes in object-oriented programs. In: 32nd International Conference on Software Engineering (ICSE), pp 315–324
Nierstrasz O, Ducasse S, Gı̌rba T (2005) The story of Moose: an agile reengineering environment. In: Foundations of Software Engineering (FSE), pp 1–10
Palomba F, Bavota G, Penta MD, Oliveto R, Lucia AD, Poshyvanyk D (2013) Detecting bad smells in source code using change history information. In: 28th International Conference on Automated Software Engineering (ASE), pp 268–278
Passos L, Terra R, Diniz R, Valente MT, Mendonca N (2010) Static architecture-conformance checking An illustrative overview. IEEE Software 27(5):82–89
Perry DE, Wolf AL (1992) Foundations for the study of software architecture. Software Eng Notes 17(4):40–52
Sarkar S, Maskeri G, Ramachandran S (2009) Discovery of architectural layers and measurement of layering violations in source code. J Syst Software 82:1891–1905
Santonu S, Ramachandran S, Kumar GS, Madhu K, Iyengar K, Rangarajan, Sivagnanam S (2009) Modularization of a large-scale business application: A case study. IEEE Software 26:28–35
Silva L, Valente MT, Maia M (2014) Assessing modularity using co-change clusters. In: 13th International Conference on Modularity, pp 49–60
Terra R, Valente MT (2009) A dependency constraint language to manage object-oriented software architectures. Software: Practice and Experience 32(12):1073–1094
Terra R, Valente MT, Czarnecki K, Bigonha R (2012) Recommending refactorings to reverse software architecture erosion. In: 16th European Conference on Software Maintenance and Reengineering (CSMR), pp 335–340
Terra R, Valente MT, Czarnecki K, Bigonha RS (2013) A recommendation system for repairing violations detected by static architecture conformance checking. Software: Practice and Experience:1–36
Acknowledgments
Our research is supported by CAPES, FAPEMIG, and CNPq. We thank the architects of the SGA and M2M systems for validating the warnings raised by the proposed approach.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by: Rocco Oliveto, Massimiliano Di Penta and Romain Robbes
Appendices
Appendix A: Formal Definition
In this appendix section, we describe the heuristics proposed by ArchLint.
1.1 Notation
The definition of the heuristics relies on the following notation:
-
\(\mathcal {C}=\{c_{1}, c_{2}, ..., c_{n}\}\) is the set of all classes in the system under analysis.
-
C P={c p 1,c p 2,...,c p n } is the set of components in the high-level component model.
-
d e p e n d s(c 1,c 2,t,v) indicates that class c 1 has a dependency of type t with class c 2 in a given version v.
-
c o m p(c) is the component cp of a class c.
-
m o d(c) is the module m of a class c.
-
fi r s t(c) is the version in which class c was originally inserted in the repository.
-
H is the identifier of the last version of the system in the repository.
In a depends predicate, the pattern _ (underscore) matches any value. For example,
d e p e n d s(c 1,c 2,_,_) indicates that class c 1 depends on class c 2, despite the dependency type and the version.
1.2 Detecting Absences
D e p C o m p C l a s s(c,t,c p) is the set of classes in a component cp that—in the current version of the system—have a dependency of type t with a class c, as follows:
C l a s s C o m p(c p) is the set of classes in the component cp, as follows:
D e p S c a R a t e(c,t,c p) is the ratio between (i) the number of classes in component cp that have a dependency of type t with a target class c and (ii) the total number of classes in component cp, as follows:
C r e a t e d W i t h o u t D e p(c,t,c p) is the set of classes of a component cp that were committed in the repository for the first time without a dependency of type t with a target class c, as defined next:
D e p A d d(c,t,c p) is the set of classes in component cp initially created without a dependency of type t with a target class c but that later were maintained to include this dependency, as follows:
D e p I n s R a t e(c,t,c p) is the ratio between (i) the number of classes in the component cp originally created without a dependency of type t with a target class c, but that have this dependency in the last version of the system under analysis, and (ii) the total number of classes in component cp originally created without a dependency of type t with class c, as follows:
Finally, the candidates for absences in a component cp are defined as follows:
1.3 Detecting Divergences
1.3.1 Heuristic #1
D e p S y s M o d(m) is the set of classes in the current version of the system that have a dependency with classes of a module m, as follows:
D e p C o m p M o d(m,c p) is the set of classes in component cp that have a dependency with a module m, as defined next:
D e p S c a R a t e(m,c p) is the ratio between (i) the number of classes in component cp that have a dependency with a module m and (ii) the total number of classes in the current version of the system that have a dependency with classes of m, as follows:
D e p A d d A n y(m,c p) is the set of classes in component cp that have established—in any version of the system—a dependency with a class in module m, as defined next:
D e p D e l(m,c p) is the set of classes returned by D e p A d d A n y(m,c p) that in the current version of the system no longer have a dependency with classes in module m, as defined next:
D e p D e l R a t e(m,c p) is the ratio between (i) the number of classes in component cp that no longer have a dependency with classes in module m and (ii) the total number of classes in component cp that have established a dependency with any class in module m, as defined next:
H e a v y U s e r(m) is a function that returns the component whose classes mostly depend on classes located in module m, i.e., the component cp that provides the following maximal value:
However, this maximal value must be greater than 0.5. Otherwise, the function H e a v y U s e r returns null.
Finally, the candidates for divergences in a given component cp are defined as follows:
1.3.2 Heuristic #2
D e p A d d A n y(c,t,c p) is the set of classes in component cp that have established—in any version of the system—a dependency of type t with a class c, as defined next:
D e p D e l(c,t,c p) is the set of classes returned by D e p A d d A n y(c,t,c p) that no longer have a dependency of type t with a class c (i.e., the dependencies were removed), as defined next:
Additionally, D e p D e l R a t e(c,t,c p) is the ratio between (i) the number of classes in component cp that no longer have a dependency of type t with a class c, and (ii) the total number of classes in component cp that have established a dependency of type t with a class c, as defined next:
Finally, the candidates for divergences in a given component cp are defined as follows:
1.3.3 Heuristic #3
This heuristic assumes that r f(c p 1,c p 2) denotes the number of references from classes in component c p 1 to classes in component c p 2, as defined next:
D e p D i r W e i g h t(c p 1,c p 2) is defined as follows:
Finally, the candidates for divergences in a given component cp are defined as follows:
Appendix B: M2M Conformance Process
In this section, we show the results achieved after each iteration when detecting architectural violations in the M2M system. Table 14 shows the iterations performed for detecting absences. Tables 15 and 16 shows the results achieved by the second and third heuristics for detecting divergences, respectively. Heuristic #1 for divergences did not report warnings in the M2M system.
Appendix C: Lucene Conformance Process
In this section, we show the results achieved after each iteration when detecting architectural violations in the Lucene system. Tables 17, 18, and 19 shows the results achieved by the first, second and third heuristics for detecting divergences, respectively. The heuristic for absences did not report warnings in the Lucene system.
Rights and permissions
About this article
Cite this article
Maffort, C., Valente, M.T., Terra, R. et al. Mining architectural violations from version history. Empir Software Eng 21, 854–895 (2016). https://doi.org/10.1007/s10664-014-9348-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10664-014-9348-2