Mining architectural violations from version history

Empirical Software Engineering
Abstract

Software architecture conformance is a key software quality control activity that aims to reveal the gap that normally grows between the concrete and the planned software architecture. However, formally specifying an architecture can be difficult, because it must be done by an expert with a high-level understanding of the system. In this paper, we present a lightweight approach for architecture conformance based on a combination of static and historical source code analysis. The proposed approach relies on four heuristics for detecting absences (something expected was not found) and divergences (something prohibited was found) in source-code-based architectures. We also present an architecture conformance process based on the proposed approach. We followed this process to evaluate the architecture of two industrial-strength information systems, achieving an overall precision of 62.7 % and 53.8 %. We also evaluated our approach on an open-source information retrieval library, achieving an overall precision of 59.2 %. We envision that a heuristic-based approach to architecture conformance can be used to rapidly raise architectural warnings without deeply involving experts in the process.



Notes

  1. In a depends predicate, the pattern _ (underscore) matches any value.

  2. https://gforge.inria.fr/projects/verveinej.

  3. To improve the paper’s comprehension, we translated the class names from Portuguese to English.

  4. Mean precision is the average precision of the iterations evaluated by the architect, whereas Overall precision is the total number of true warnings divided by the total number of warnings.


Acknowledgments

Our research is supported by CAPES, FAPEMIG, and CNPq. We thank the architects of the SGA and M2M systems for validating the warnings raised by the proposed approach.


Corresponding author

Correspondence to Cristiano Maffort.

Additional information

Communicated by: Rocco Oliveto, Massimiliano Di Penta and Romain Robbes

Appendices

Appendix A: Formal Definition

In this appendix, we formally define the heuristics proposed by ArchLint.

1.1 Notation

The definition of the heuristics relies on the following notation:

  • \(\mathcal{C}=\{c_{1}, c_{2}, ..., c_{n}\}\) is the set of all classes in the system under analysis.

  • \(CP=\{cp_{1}, cp_{2}, ..., cp_{n}\}\) is the set of components in the high-level component model.

  • \(depends(c_{1},c_{2},t,v)\) indicates that class \(c_{1}\) has a dependency of type t with class \(c_{2}\) in a given version v.

  • \(comp(c)\) is the component cp of a class c.

  • \(mod(c)\) is the module m of a class c.

  • \(first(c)\) is the version in which class c was originally inserted in the repository.

  • H is the identifier of the last version of the system in the repository.

In a depends predicate, the pattern _ (underscore) matches any value. For example, \(depends(c_{1},c_{2},\_,\_)\) indicates that class \(c_{1}\) depends on class \(c_{2}\), regardless of the dependency type and the version.
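To make the notation concrete, the fact base behind these predicates can be sketched as a set of (source, target, type, version) tuples. All names and data below are hypothetical illustrations, not ArchLint's actual implementation:

```python
# Hypothetical in-memory fact base for the predicates above: each tuple records
# that src depends on dst with dependency type `type` in version `version`.
from collections import namedtuple

Dep = namedtuple("Dep", "src dst type version")

FACTS = {
    Dep("ui.LoginView", "model.User", "access", 1),
    Dep("ui.LoginView", "dao.UserDAO", "call", 3),  # dependency added in a later version
    Dep("model.User", "dao.UserDAO", "call", 1),
}
H = 3  # identifier of the last version in the repository

def depends(c1, c2, t=None, v=None):
    """depends(c1, c2, t, v); None plays the role of the '_' wildcard."""
    return any(d.src == c1 and d.dst == c2
               and (t is None or d.type == t)
               and (v is None or d.version == v)
               for d in FACTS)

def first(c):
    """Version in which class c was originally inserted in the repository."""
    return min(d.version for d in FACTS if d.src == c or d.dst == c)
```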

1.2 Detecting Absences

\(DepCompClass(c,t,cp)\) is the set of classes in a component cp that—in the current version of the system—have a dependency of type t with a class c, as follows:

$$\begin{array}{@{}rcl@{}} DepCompClass(c,t,cp) = \left\{ x\in \mathcal{C}\ |\ depends(x,c,t,H) \wedge comp(x) = cp \right\} \end{array} $$

\(ClassComp(cp)\) is the set of classes in the component cp, as follows:

$$\begin{array}{@{}rcl@{}} ClassComp(cp) = \left\{ x \in \mathcal{C} \ | \ comp(x) = cp \right\} \end{array} $$

\(DepScaRate(c,t,cp)\) is the ratio between (i) the number of classes in component cp that have a dependency of type t with a target class c and (ii) the total number of classes in component cp, as follows:

$$\begin{array}{@{}rcl@{}} DepScaRate(c,t,cp)=\frac{|DepCompClass(c,t,cp)|}{|ClassComp(cp)|} \end{array} $$

\(CreatedWithoutDep(c,t,cp)\) is the set of classes of a component cp that were committed in the repository for the first time without a dependency of type t with a target class c, as defined next:

$$\begin{array}{@{}rcl@{}} CreatedWithoutDep(c,t,cp) = \left\{x \in \mathcal{C}\ |\ comp(x) = cp\ \wedge \neg depends(x,c,t,first(x))\right\} \end{array} $$

\(DepAdd(c,t,cp)\) is the set of classes in component cp initially created without a dependency of type t with a target class c but that were later maintained to include this dependency, as follows:

$$\begin{array}{@{}rcl@{}} DepAdd(c,t,cp) = \left\{x \in CreatedWithoutDep(c,t,cp)\ |\ depends(x,c,t,H)\right\} \end{array} $$

\(DepInsRate(c,t,cp)\) is the ratio between (i) the number of classes in the component cp originally created without a dependency of type t with a target class c, but that have this dependency in the last version of the system under analysis, and (ii) the total number of classes in component cp originally created without a dependency of type t with class c, as follows:

$$\begin{array}{@{}rcl@{}} DepInsRate(c,t,cp) = \frac{|DepAdd(c,t,cp)|}{|CreatedWithoutDep(c,t,cp)|} \end{array} $$

Finally, the candidates for absences in a component cp are defined as follows:

$$\begin{array}{@{}rcl@{}} Absences(cp) = \left\{(x,c,t) \ |\ comp(x) = cp\ \wedge \neg depends(x,c,t,H)\ \wedge \right. \\ DepScaRate(c,t,cp) \geq A_{sca}\ \wedge \\ \left. DepInsRate(c,t,cp) \geq A_{ins}\right\} \end{array} $$
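Under these definitions, the absence heuristic can be sketched in a few lines over a toy snapshot. The classes, thresholds, and data below are invented for illustration only:

```python
# Sketch of the absence heuristic on toy data: V1-V3 call Facade, V4 does not.
# CURRENT holds (class, target, type) triples present in the last version H;
# FIRST_VERSION_DEPS holds those already present when the class was committed.
CLASSES = {"V1": "View", "V2": "View", "V3": "View", "V4": "View"}
CURRENT = {("V1", "Facade", "call"), ("V2", "Facade", "call"), ("V3", "Facade", "call")}
FIRST_VERSION_DEPS = {("V1", "Facade", "call")}
A_SCA, A_INS = 0.7, 0.6  # illustrative thresholds

def class_comp(cp):
    return {x for x, comp in CLASSES.items() if comp == cp}

def dep_sca_rate(c, t, cp):
    # Share of cp's classes that depend on c in the current version.
    users = {x for x in class_comp(cp) if (x, c, t) in CURRENT}
    return len(users) / len(class_comp(cp))

def dep_ins_rate(c, t, cp):
    # Share of classes created without the dependency that later received it.
    created_without = {x for x in class_comp(cp) if (x, c, t) not in FIRST_VERSION_DEPS}
    added = {x for x in created_without if (x, c, t) in CURRENT}
    return len(added) / len(created_without)

def absences(cp, c, t):
    return {(x, c, t) for x in class_comp(cp)
            if (x, c, t) not in CURRENT
            and dep_sca_rate(c, t, cp) >= A_SCA
            and dep_ins_rate(c, t, cp) >= A_INS}
```

Here V4 is reported as a candidate absence: 75 % of the View classes call Facade, and two of the three classes created without the call later gained it.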

1.3 Detecting Divergences

1.3.1 Heuristic #1

\(DepSysMod(m)\) is the set of classes in the current version of the system that have a dependency with classes of a module m, as follows:

$$\begin{array}{@{}rcl@{}} DepSysMod(m) = \left\{x \in \mathcal{C}\ |\ depends(x,c,\_,H)\ \wedge\ mod(c)= m\right\} \end{array} $$

\(DepCompMod(m,cp)\) is the set of classes in component cp that have a dependency with a module m, as defined next:

$$\begin{array}{@{}rcl@{}} DepCompMod(m,cp) = \left\{ x \in DepSysMod(m)\ |\ comp(x) = cp\right\} \end{array} $$

\(DepScaRate(m,cp)\) is the ratio between (i) the number of classes in component cp that have a dependency with a module m and (ii) the total number of classes in the current version of the system that have a dependency with classes of m, as follows:

$$\begin{array}{@{}rcl@{}} DepScaRate(m,cp) = \frac{|DepCompMod(m,cp)|}{|DepSysMod(m)|} \end{array} $$

\(DepAddAny(m,cp)\) is the set of classes in component cp that have established—in any version of the system—a dependency with a class in module m, as defined next:

$$\begin{array}{@{}rcl@{}} DepAddAny(m,cp) = \left\{x \in \mathcal{C}\ |\ comp(x) = cp\ \wedge\ depends(x,c,\_,\_)\ \wedge mod(c) = m \right\} \end{array} $$

\(DepDel(m,cp)\) is the set of classes returned by \(DepAddAny(m,cp)\) that in the current version of the system no longer have a dependency with classes in module m, as defined next:

$$\begin{array}{@{}rcl@{}} DepDel(m,cp) = \left\{x \in DepAddAny(m,cp)\ |\ \neg depends(x,c,\_,H)\ \wedge\ mod(c) = m\right\} \end{array} $$

\(DepDelRate(m,cp)\) is the ratio between (i) the number of classes in component cp that no longer have a dependency with classes in module m and (ii) the total number of classes in component cp that have established a dependency with any class in module m, as defined next:

$$\begin{array}{@{}rcl@{}} DepDelRate(m,cp)= \frac{|DepDel(m,cp)|}{|DepAddAny(m,cp)|} \end{array} $$

\(HeavyUser(m)\) is a function that returns the component whose classes most depend on classes located in module m, i.e., the component cp that provides the following maximal value:

$$\underset{\forall cp\ \in \ CP}{\max} \left(\frac{|DepCompMod(m,cp)|}{|DepSysMod(m)|}\right) $$

However, this maximal value must be greater than 0.5. Otherwise, the function \(HeavyUser\) returns null.

Finally, the candidates for divergences in a given component cp are defined as follows:

$$\begin{array}{@{}rcl@{}} Div_{1}(cp) = \lbrace (x, c)\ |\ comp(x) = cp\ \wedge\ mod(c) = m\ \wedge\ depends(x, c,\_,H)\ \wedge \\ DepScaRate(m, cp) \leq D_{sca}\ \wedge \\ DepDelRate(m, cp) \geq D_{del}\ \wedge \\ HeavyUser(m) \neq cp \rbrace \end{array} $$
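The first divergence heuristic can likewise be sketched on toy data. In this invented scenario, a View class depends on a persistence module whose heavy user is the Controller component, and the same kind of dependency was removed from View in the past. All names and thresholds are illustrative:

```python
# Sketch of divergence Heuristic #1: flag a dependency from component cp on a
# module m when cp is a minor user of m (DepScaRate <= D_SCA), such
# dependencies were removed before (DepDelRate >= D_DEL), and cp is not the
# HeavyUser of m.
COMPONENT = {"V1": "View", "V2": "View", "C1": "Controller", "C2": "Controller"}
MODULE = {"db.Conn": "persistence"}
CURRENT = {("C1", "db.Conn"), ("C2", "db.Conn"), ("V1", "db.Conn")}  # deps in version H
EVER = CURRENT | {("V2", "db.Conn")}                                 # deps in any version
D_SCA, D_DEL = 0.5, 0.4

def dep_sys_mod(m):
    return {x for x, c in CURRENT if MODULE.get(c) == m}

def dep_sca_rate(m, cp):
    users = {x for x in dep_sys_mod(m) if COMPONENT[x] == cp}
    return len(users) / len(dep_sys_mod(m))

def dep_del_rate(m, cp):
    ever = {x for x, c in EVER if MODULE.get(c) == m and COMPONENT[x] == cp}
    deleted = {x for x in ever if not any((x, c) in CURRENT for c in MODULE)}
    return len(deleted) / len(ever)

def heavy_user(m):
    best = max(set(COMPONENT.values()), key=lambda cp: dep_sca_rate(m, cp))
    return best if dep_sca_rate(m, best) > 0.5 else None

def div1(cp):
    return {(x, c) for x, c in CURRENT
            if COMPONENT[x] == cp and MODULE.get(c) is not None
            and dep_sca_rate(MODULE[c], cp) <= D_SCA
            and dep_del_rate(MODULE[c], cp) >= D_DEL
            and heavy_user(MODULE[c]) != cp}
```

In this example Controller concentrates two thirds of the current references to the persistence module, so V1's lone dependency is reported, while the Controller classes are not.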

1.3.2 Heuristic #2

\(DepAddAny(c,t,cp)\) is the set of classes in component cp that have established—in any version of the system—a dependency of type t with a class c, as defined next:

$$\begin{array}{@{}rcl@{}} DepAddAny(c,t,cp) = \left\{x \in \mathcal{C}\ |\ comp(x) = cp\ \wedge\ depends(x,c,t,\_)\right\} \end{array} $$

\(DepDel(c,t,cp)\) is the set of classes returned by \(DepAddAny(c,t,cp)\) that no longer have a dependency of type t with a class c (i.e., the dependencies were removed), as defined next:

$$DepDel(c,t,cp) = \lbrace \; x \in DepAddAny(c,t,cp) \, | \, comp(x) = cp \ \wedge \ \neg depends(x, c, t,H) \; \rbrace $$

Additionally, \(DepDelRate(c,t,cp)\) is the ratio between (i) the number of classes in component cp that no longer have a dependency of type t with a class c, and (ii) the total number of classes in component cp that have established a dependency of type t with a class c, as defined next:

$$DepDelRate(c,t,cp) = \frac{|DepDel(c,t,cp)|}{|DepAddAny(c,t,cp)|} $$

Finally, the candidates for divergences in a given component cp are defined as follows:

$$\begin{array}{@{}rcl@{}} Div_{2}(cp) = \lbrace (x, c, t)\ |\ comp(x) = cp\ \wedge\ depends(x, c, t,H)\ \wedge \\ DepScaRate(c,t,cp) \leq D_{sca}\ \wedge \\ DepDelRate(c,t,cp) \geq D_{del} \rbrace \end{array} $$
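The second heuristic works analogously at class granularity, over (class, target, type) triples. A minimal sketch, with invented data and thresholds:

```python
# Sketch of divergence Heuristic #2: only A1 still calls util.Log, and the same
# dependency was removed from A2 and A3 during the history, so A1's remaining
# call is flagged as a divergence candidate.
COMPONENT = {"A1": "Service", "A2": "Service", "A3": "Service", "A4": "Service"}
CURRENT = {("A1", "util.Log", "call")}                                     # deps in version H
EVER = CURRENT | {("A2", "util.Log", "call"), ("A3", "util.Log", "call")}  # deps in any version
D_SCA, D_DEL = 0.5, 0.5

def comp_classes(cp):
    return {x for x, comp in COMPONENT.items() if comp == cp}

def dep_sca_rate(c, t, cp):
    users = {x for x in comp_classes(cp) if (x, c, t) in CURRENT}
    return len(users) / len(comp_classes(cp))

def dep_del_rate(c, t, cp):
    ever = {x for x, tc, tt in EVER if tc == c and tt == t and COMPONENT[x] == cp}
    deleted = {x for x in ever if (x, c, t) not in CURRENT}
    return len(deleted) / len(ever)

def div2(cp):
    return {(x, c, t) for x, c, t in CURRENT
            if COMPONENT[x] == cp
            and dep_sca_rate(c, t, cp) <= D_SCA
            and dep_del_rate(c, t, cp) >= D_DEL}
```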

1.3.3 Heuristic #3

This heuristic assumes that \(rf(cp_{1},cp_{2})\) denotes the number of references from classes in component \(cp_{1}\) to classes in component \(cp_{2}\).

\(DepDirWeight(cp_{1},cp_{2})\) is then defined as follows:

$$\begin{array}{@{}rcl@{}} DepDirWeight(cp_{1},cp_{2})=\frac{rf(cp_{1},cp_{2})}{rf(cp_{1},cp_{2})+rf(cp_{2},cp_{1})} \end{array} $$

Finally, the candidates for divergences in a given component cp are defined as follows:

$$\begin{array}{@{}rcl@{}} Div_{3}(cp_{1}) = \left\{ (x,c)\ |\ comp(x) = cp_{1}\ \wedge comp(c) = cp_{2}\ \wedge cp_{1} \neq cp_{2}\ \wedge \right. \\ depends(x,c,\_,H)\ \wedge \\ \left. D_{dir} \leqslant DepDirWeight(cp_{1},cp_{2}) < 0.5 \right\} \end{array} $$
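The intuition behind the third heuristic is just a ratio of reference counts; a minimal sketch, where `D_DIR` stands for the \(D_{dir}\) threshold and its value is illustrative:

```python
# Sketch of Heuristic #3: given the reference counts rf(cp1, cp2) and
# rf(cp2, cp1) between two components, flag dependencies that flow against
# the dominant direction. D_DIR filters out negligible back-references.
D_DIR = 0.1

def dep_dir_weight(rf_12, rf_21):
    """Fraction of all references between cp1 and cp2 that flow cp1 -> cp2."""
    return rf_12 / (rf_12 + rf_21)

def against_dominant_direction(rf_12, rf_21):
    # A dependency cp1 -> cp2 is suspicious when cp1 contributes less than
    # half of the traffic, but more than the noise threshold D_DIR.
    return D_DIR <= dep_dir_weight(rf_12, rf_21) < 0.5
```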

Appendix B: M2M Conformance Process

In this appendix, we show the results achieved after each iteration when detecting architectural violations in the M2M system. Table 14 shows the iterations performed for detecting absences. Tables 15 and 16 show the results achieved by the second and third heuristics for detecting divergences, respectively. Heuristic #1 for divergences did not report warnings in the M2M system.

Table 14 Detecting absences in the M2M system
Table 15 Detecting divergences in the M2M system using Heuristic #2
Table 16 Detecting divergences in the M2M system using Heuristic #3

Appendix C: Lucene Conformance Process

In this appendix, we show the results achieved after each iteration when detecting architectural violations in the Lucene system. Tables 17, 18, and 19 show the results achieved by the first, second, and third heuristics for detecting divergences, respectively. The heuristic for absences did not report warnings in the Lucene system.

Table 17 Detecting divergences in Lucene using Heuristic #1
Table 18 Detecting divergences in Lucene using Heuristic #2
Table 19 Detecting divergences in Lucene using Heuristic #3


Cite this article

Maffort, C., Valente, M.T., Terra, R. et al. Mining architectural violations from version history. Empir Software Eng 21, 854–895 (2016). https://doi.org/10.1007/s10664-014-9348-2
