Abstract
Many errors in spreadsheet formulas can be avoided if spreadsheets are built automatically from higher-level models that can encode and enforce consistency constraints in the generated spreadsheets. Employing this strategy for legacy spreadsheets is difficult, because the model has to be reverse engineered from an existing spreadsheet and existing data must be transferred into the new model-generated spreadsheet. We have developed and implemented a technique that automatically infers relational schemas from spreadsheets. This technique uses particularities from the spreadsheet realm to create better schemas. We have evaluated this technique in two ways: first, we have demonstrated its applicability by using it on a set of real-world spreadsheets. Second, we have run an empirical study with users. The study has shown that the results produced by our technique are comparable to the ones developed by experts starting from the same (legacy) spreadsheet data. Although relational schemas are very useful to model data, they do not fit spreadsheets well, as they do not allow expressing layout. Thus, we have also introduced a mapping between relational schemas and ClassSheets. A ClassSheet controls further changes to the spreadsheet and safeguards it against a large class of formula errors. The developed tool is a contribution to spreadsheet (reverse) engineering, because it fills an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
More information about the degree can be found in http://mei.di.uminho.pt/?q=en/1213/ssd-uk.
One spreadsheet, c6/Adbudget6.xls, was unreadable.
We omit here the column labels, whose names depend on the number of columns in the generated table.
References
Abraham, R., Erwig, M.: Inferring templates from spreadsheets. In: ICSE ’06 Proceedings of the 28th International Conference on Software Engineering, pp. 182–191. ACM, New York, NY, USA (2006)
Abraham, R., Erwig, M., Kollmansberger, S., Seifert, E.: Visual specifications of correct spreadsheets. In: Proceedings of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VLHCC ’05), IEEE Computer Society, pp. 189–196 (2005)
Alhajj, R.: Extracting the extended entity-relationship model from a legacy relational database. Inf. Syst. 28(6), 597–618 (2003)
Cheng, T., Rival, X.: An abstract domain to infer types over zones in spreadsheets. In: Miné, A., Schmidt, D. (eds.) Static Analysis. Lecture Notes in Computer Science, vol. 7460, pp. 94–110. Springer, Berlin (2012)
Codd, E.F.: A relational model of data for large shared data banks. Commun ACM 13(6), 377–387 (1970)
Connolly, T., Begg, C.: Database Systems, A Practical Approach to Design, Implementation, and Management, 3rd edn. Addison-Wesley, Harlow (2002)
Cunha, J.: Model-based spreadsheet engineering. PhD thesis, Universidade do Minho (2011)
Cunha, J., Saraiva, J., Visser, J.: Discovery-based edit assistance for spreadsheets. In: 2009 IEEE Symposium on Visual Languages and Human-Centric Computing, IEEE, pp. 233–237 (2009a)
Cunha, J., Saraiva, J., Visser, J.: From spreadsheets to relational databases and back. In: PEPM ’09 Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation. ACM, New York, NY, USA, pp. 179–188 (2009b)
Cunha, J., Erwig, M., Saraiva, J. Automatically inferring ClassSheet models from spreadsheets. In: Proceedings of the 2010 IEEE Symposium on Visual Languages and Human-Centric Computing, IEEE Computer Society, VLHCC ’10, pp. 93–100 (2010)
Cunha J, Mendes, J., Fernandes, J.P., Saraiva, J.: Embedding and evolution of spreadsheet models in spreadsheet systems. In: IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC’11), IEEE Computer Society, pp. 186–201 (2011a)
Cunha, J., Visser, J., Alves, T., Saraiva, J.: Type-safe evolution of spreadsheets. In: Giannakopoulou, D., Orejas, F. (eds.) Fundamental Approaches to Software Engineering. Lecture Notes in Computer Science, vol. 6603, pp. 186–201. Springer, Berlin (2011b)
Cunha, J., Fernandes, J.P., Mendes, J., Pacheco, H., Saraiva, J.: Bidirectional transformation of model-driven spreadsheets. In: Hu, Z., de Lara, J. (eds.) Theory and Practice of Model Transformations. Lecture Notes in Computer Science, vol. 7307, pp. 105–120. Springer, Berlin (2012a)
Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: MDSheet: a framework for model-driven Spreadsheet Engineering. In: Proceedings of the 34rd ACM International Conference on Software Engineering (ICSE’12), pp. 1412–1415 (2012b)
Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: Towards an evaluation of bidirectional model-driven spreadsheets. User evaluation for Software Engineering Researchers, ACM Digital Library, USER’ 12, 25–28 (2012c)
Cunha, J., Fernandes, J.P., Saraiva, J.: From Relational ClassSheets to UML+OCL. The Software Engineering Track at the 27th Annual ACM Symposium On Applied Computing (SAC 2012), Riva del Garda (Trento), pp. 1151–1158. ACM, Italy (2012d)
Cunha, J., Mendes, J., Saraiva, J., Visser, J.: Model-based programming environments for spreadsheets. Sci. Comput. Program. (2014a). doi:10.1016/j.scico.2014.02.002
Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: Embedding, evolution, and validation of spreadsheet models in spreadsheet systems. Tech. Rep. TR-HASLab:01:2014. HASLab—High-Assurance Software Laboratory & Universidade do Minho (2014b). http://alfa.di.uminho.pt/~jacome/down/tr_embedding.pdf
Engels, G., Erwig, M.: ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications. In: Redmiles, D., Ellman, T., Zisman, A. (eds.) 20th IEEE/ACM International Conference on Automated Software Engineering, Long Beach, USA, pp. 124–133. ACM (2005)
Erwig, M.: Software engineering for spreadsheets. IEEE Softw. 29(5), 25–30 (2009)
EuSpRIG. European Spreadsheet Risks Interest Group (2010) http://www.eusprig.org/
Hermans, F., Pinzger, M., van Deursen, A.: Automatically extracting class diagrams from spreadsheets. In: Proceedings of the 24th European conference on Object-oriented programming, Springer-Verlag, Berlin, Heidelberg, ECOOP’10, pp. 52–75 (2010)
Höst, M., Regnell, B., Wohlin, C.: Using students as subjects–a comparative study of students and professionals in lead-time impact assessment. Empir. Softw. Eng. 5(3), 201–214 (2000). doi:10.1023/A:1026586415054
Isakowitz, T., Schocken, S., Lucas Jr, H.C.: Toward a logical/physical theory of spreadsheet modeling. ACM Trans. Inf. Syst. 13(1), 1–37 (1995)
Lämmel, R., Visser, J. : A Strafunski application letter. In: Dahl, V., Wadler, P. (eds) Proceedings of Practical Aspects of Declarative Programming (PADL’03), LNCS, vol. 2562, pp. 357–375. Springer (2003)
Maier, D.: The Theory of Relational Databases. Computer Science Press, Rockville (1983)
Novelli, N., Cicchetti, R.: Fun: an efficient algorithm for mining functional and embedded dependencies. In: ICDT ’01: Proceedings of the 8th International Conference on Database Theory, pp. 189–203. Springer-Verlag, London, UK (2001)
Panko, R.: Spreadsheet errors: what we know. what we think we can do. In: Proceedings of the European Spreadsheet Risks Interest Group Annual Conference (EuSpRIG), pp. 7–17 (2000)
Perry, D.E., Porter, A.A., Votta, L.G.: Empirical studies of software engineering: a roadmap. In: ICSE ’00 Proceedings of the Conference on The Future of Software Engineering, pp. 345–355. ACM, New York, NY, USA (2000)
Powell, S.G., Baker, K.R.: The Art of Modeling with Spreadsheets. Wiley, New York (2003)
Rajalingham, K., Chadwick, D., Knight, B.: Classification of spreadsheet errors. In: Proceedings of the European Spreadsheet Risks Interest Group Annual Conference (EuSpRIG), pp. 23–34 (2001)
Ullman, J.D., Widom, J.: A First Course in Database Systems. Prentice Hall, Upper Saddle River (1997)
Visser, E.: A survey of strategies in rule-based program transformation systems. J. Symb. Comput. 40, 831–873 (2005)
Visser, J., Saraiva, J.: Tutorial on strategic programming across programming paradigms. In: 8th Brazilian Symposium on Programming Languages, Niteroi, Brazil (2004)
Acknowledgments
We would like to thank Orlando Belo for his help on running and analyzing the empirical study. We would also like to thank Paulo Azevedo for his help in conducting the statistical analysis of our empirical study. We would also like to thank the anonymous reviewers for their suggestions which helped us to improve the paper. This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124- FEDER-010048. The first author was also supported by FCT grant SFRH/BPD/73358/2010.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Cunha, J., Erwig, M., Mendes, J. et al. Model inference for spreadsheets. Autom Softw Eng 23, 361–392 (2016). https://doi.org/10.1007/s10515-014-0167-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10515-014-0167-x