Model inference for spreadsheets | Automated Software Engineering Skip to main content
Log in

Model inference for spreadsheets

  • Published:
Automated Software Engineering Aims and scope Submit manuscript

Abstract

Many errors in spreadsheet formulas can be avoided if spreadsheets are built automatically from higher-level models that can encode and enforce consistency constraints in the generated spreadsheets. Employing this strategy for legacy spreadsheets is difficult, because the model has to be reverse engineered from an existing spreadsheet and existing data must be transferred into the new model-generated spreadsheet. We have developed and implemented a technique that automatically infers relational schemas from spreadsheets. This technique uses particularities from the spreadsheet realm to create better schemas. We have evaluated this technique in two ways: first, we have demonstrated its applicability by using it on a set of real-world spreadsheets. Second, we have run an empirical study with users. The study has shown that the results produced by our technique are comparable to the ones developed by experts starting from the same (legacy) spreadsheet data. Although relational schemas are very useful to model data, they do not fit spreadsheets well, as they do not allow expressing layout. Thus, we have also introduced a mapping between relational schemas and ClassSheets. A ClassSheet controls further changes to the spreadsheet and safeguards it against a large class of formula errors. The developed tool is a contribution to spreadsheet (reverse) engineering, because it fills an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Notes

  1. More information about the degree can be found in http://mei.di.uminho.pt/?q=en/1213/ssd-uk.

  2. One spreadsheet, c6/Adbudget6.xls, was unreadable.

  3. We omit here the column labels, whose names depend on the number of columns in the generated table.

References

  • Abraham, R., Erwig, M.: Inferring templates from spreadsheets. In: ICSE ’06 Proceedings of the 28th International Conference on Software Engineering, pp. 182–191. ACM, New York, NY, USA (2006)

  • Abraham, R., Erwig, M., Kollmansberger, S., Seifert, E.: Visual specifications of correct spreadsheets. In: Proceedings of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VLHCC ’05), IEEE Computer Society, pp. 189–196 (2005)

  • Alhajj, R.: Extracting the extended entity-relationship model from a legacy relational database. Inf. Syst. 28(6), 597–618 (2003)

    Article  MATH  Google Scholar 

  • Cheng, T., Rival, X.: An abstract domain to infer types over zones in spreadsheets. In: Miné, A., Schmidt, D. (eds.) Static Analysis. Lecture Notes in Computer Science, vol. 7460, pp. 94–110. Springer, Berlin (2012)

    Chapter  Google Scholar 

  • Codd, E.F.: A relational model of data for large shared data banks. Commun ACM 13(6), 377–387 (1970)

    Article  MATH  Google Scholar 

  • Connolly, T., Begg, C.: Database Systems, A Practical Approach to Design, Implementation, and Management, 3rd edn. Addison-Wesley, Harlow (2002)

    MATH  Google Scholar 

  • Cunha, J.: Model-based spreadsheet engineering. PhD thesis, Universidade do Minho (2011)

  • Cunha, J., Saraiva, J., Visser, J.: Discovery-based edit assistance for spreadsheets. In: 2009 IEEE Symposium on Visual Languages and Human-Centric Computing, IEEE, pp. 233–237 (2009a)

  • Cunha, J., Saraiva, J., Visser, J.: From spreadsheets to relational databases and back. In: PEPM ’09 Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation. ACM, New York, NY, USA, pp. 179–188 (2009b)

  • Cunha, J., Erwig, M., Saraiva, J. Automatically inferring ClassSheet models from spreadsheets. In: Proceedings of the 2010 IEEE Symposium on Visual Languages and Human-Centric Computing, IEEE Computer Society, VLHCC ’10, pp. 93–100 (2010)

  • Cunha J, Mendes, J., Fernandes, J.P., Saraiva, J.: Embedding and evolution of spreadsheet models in spreadsheet systems. In: IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC’11), IEEE Computer Society, pp. 186–201 (2011a)

  • Cunha, J., Visser, J., Alves, T., Saraiva, J.: Type-safe evolution of spreadsheets. In: Giannakopoulou, D., Orejas, F. (eds.) Fundamental Approaches to Software Engineering. Lecture Notes in Computer Science, vol. 6603, pp. 186–201. Springer, Berlin (2011b)

    Chapter  Google Scholar 

  • Cunha, J., Fernandes, J.P., Mendes, J., Pacheco, H., Saraiva, J.: Bidirectional transformation of model-driven spreadsheets. In: Hu, Z., de Lara, J. (eds.) Theory and Practice of Model Transformations. Lecture Notes in Computer Science, vol. 7307, pp. 105–120. Springer, Berlin (2012a)

    Chapter  Google Scholar 

  • Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: MDSheet: a framework for model-driven Spreadsheet Engineering. In: Proceedings of the 34rd ACM International Conference on Software Engineering (ICSE’12), pp. 1412–1415 (2012b)

  • Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: Towards an evaluation of bidirectional model-driven spreadsheets. User evaluation for Software Engineering Researchers, ACM Digital Library, USER’ 12, 25–28 (2012c)

  • Cunha, J., Fernandes, J.P., Saraiva, J.: From Relational ClassSheets to UML+OCL. The Software Engineering Track at the 27th Annual ACM Symposium On Applied Computing (SAC 2012), Riva del Garda (Trento), pp. 1151–1158. ACM, Italy (2012d)

  • Cunha, J., Mendes, J., Saraiva, J., Visser, J.: Model-based programming environments for spreadsheets. Sci. Comput. Program. (2014a). doi:10.1016/j.scico.2014.02.002

  • Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: Embedding, evolution, and validation of spreadsheet models in spreadsheet systems. Tech. Rep. TR-HASLab:01:2014. HASLab—High-Assurance Software Laboratory & Universidade do Minho (2014b). http://alfa.di.uminho.pt/~jacome/down/tr_embedding.pdf

  • Engels, G., Erwig, M.: ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications. In: Redmiles, D., Ellman, T., Zisman, A. (eds.) 20th IEEE/ACM International Conference on Automated Software Engineering, Long Beach, USA, pp. 124–133. ACM (2005)

  • Erwig, M.: Software engineering for spreadsheets. IEEE Softw. 29(5), 25–30 (2009)

    Article  Google Scholar 

  • EuSpRIG. European Spreadsheet Risks Interest Group (2010) http://www.eusprig.org/

  • Hermans, F., Pinzger, M., van Deursen, A.: Automatically extracting class diagrams from spreadsheets. In: Proceedings of the 24th European conference on Object-oriented programming, Springer-Verlag, Berlin, Heidelberg, ECOOP’10, pp. 52–75 (2010)

  • Höst, M., Regnell, B., Wohlin, C.: Using students as subjects–a comparative study of students and professionals in lead-time impact assessment. Empir. Softw. Eng. 5(3), 201–214 (2000). doi:10.1023/A:1026586415054

    Article  MATH  Google Scholar 

  • Isakowitz, T., Schocken, S., Lucas Jr, H.C.: Toward a logical/physical theory of spreadsheet modeling. ACM Trans. Inf. Syst. 13(1), 1–37 (1995)

    Article  Google Scholar 

  • Lämmel, R., Visser, J. : A Strafunski application letter. In: Dahl, V., Wadler, P. (eds) Proceedings of Practical Aspects of Declarative Programming (PADL’03), LNCS, vol. 2562, pp. 357–375. Springer (2003)

  • Maier, D.: The Theory of Relational Databases. Computer Science Press, Rockville (1983)

    MATH  Google Scholar 

  • Novelli, N., Cicchetti, R.: Fun: an efficient algorithm for mining functional and embedded dependencies. In: ICDT ’01: Proceedings of the 8th International Conference on Database Theory, pp. 189–203. Springer-Verlag, London, UK (2001)

  • Panko, R.: Spreadsheet errors: what we know. what we think we can do. In: Proceedings of the European Spreadsheet Risks Interest Group Annual Conference (EuSpRIG), pp. 7–17 (2000)

  • Perry, D.E., Porter, A.A., Votta, L.G.: Empirical studies of software engineering: a roadmap. In: ICSE ’00 Proceedings of the Conference on The Future of Software Engineering, pp. 345–355. ACM, New York, NY, USA (2000)

  • Powell, S.G., Baker, K.R.: The Art of Modeling with Spreadsheets. Wiley, New York (2003)

    Google Scholar 

  • Rajalingham, K., Chadwick, D., Knight, B.: Classification of spreadsheet errors. In: Proceedings of the European Spreadsheet Risks Interest Group Annual Conference (EuSpRIG), pp. 23–34 (2001)

  • Ullman, J.D., Widom, J.: A First Course in Database Systems. Prentice Hall, Upper Saddle River (1997)

    Google Scholar 

  • Visser, E.: A survey of strategies in rule-based program transformation systems. J. Symb. Comput. 40, 831–873 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  • Visser, J., Saraiva, J.: Tutorial on strategic programming across programming paradigms. In: 8th Brazilian Symposium on Programming Languages, Niteroi, Brazil (2004)

Download references

Acknowledgments

We would like to thank Orlando Belo for his help on running and analyzing the empirical study. We would also like to thank Paulo Azevedo for his help in conducting the statistical analysis of our empirical study. We would also like to thank the anonymous reviewers for their suggestions which helped us to improve the paper. This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124- FEDER-010048. The first author was also supported by FCT grant SFRH/BPD/73358/2010.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jácome Cunha.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cunha, J., Erwig, M., Mendes, J. et al. Model inference for spreadsheets. Autom Softw Eng 23, 361–392 (2016). https://doi.org/10.1007/s10515-014-0167-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10515-014-0167-x

Keywords

Navigation