
A UML Profile for the Design, Quality Assessment and Deployment of Data-intensive Applications

  • Regular Paper
Software and Systems Modeling

Abstract

Big Data or Data-Intensive applications (DIAs) seek to mine, manipulate, extract or otherwise exploit the potential intelligence hidden behind Big Data. However, several practitioner surveys remark that the potential of DIAs is still untapped because their design, quality assessment and continuous refinement are very difficult and costly. To address this shortcoming, we propose the use of a UML domain-specific modeling language, or profile, specifically tailored to support the design, assessment and continuous deployment of DIAs. This article illustrates our DIA-specific profile and outlines its usage in the context of DIA performance engineering and deployment. For DIA performance engineering, we rely on the Apache Hadoop technology, while for DIA deployment, we leverage the TOSCA language. We conclude that the proposed profile offers a powerful language for data-intensive software and systems modeling, quality evaluation and automated deployment of DIAs on private or public clouds.



Notes

  1. TOSCA is a language to specify deployable blueprints in line with the emerging Infrastructure-as-Code (IaC) paradigm [18].

  2. Modeling and Analysis of Real-Time Embedded Systems.

  3. Dependability Analysis and Modeling.

  4. The DIA library is described in the technical “Appendix B”.

  5. See “Appendix B” for details on data types.

  6. In Fig. 5, stereotypes with dark gray background have been taken from MARTE and the light gray ones from DAM.

  7. Yet Another Resource Negotiator.

  8. https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2.

  9. https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os.

  10. https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryOverview.htm.

  11. https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster.

  12. We say “at least” because we use Erlang-k distributions for the firing times. These can be represented in a CTMC, but doing so further increases the number of states as a function of the number of Erlang-k transitions and the value of k.

  13. TOSCA is a language to specify deployable blueprints in line with the IaC paradigm [35]. See Appendix C for TOSCA details.

  14. http://cloudify.co/.

  15. http://cassandra.apache.org/.

  16. https://www.chef.io/.

  17. https://jujucharms.com/.

  18. http://getcloudify.org.

  19. http://ariatosca.org/.

  20. https://www.indigo-datacloud.eu/.

  21. https://brooklyn.apache.org/learnmore/.

References

  1. Ajmone-Marsan, M., Balbo, G., Conte, G., Donatelli, S., Franceschinis, G.: Modeling with Generalized Stochastic Petri Nets. Wiley, New York (1994)


  2. Ardagna, D., Bernardi, S., Gianniti, E., Karimian Aliabadi, S., Perez-Palacin, D., Requeno, J.I.: Modeling performance of hadoop applications: a journey from queueing networks to stochastic well formed nets. In: International Conference on Algorithms and Architectures for Parallel Processing, pp. 599–613. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49583-5_47

  3. Ardagna, D., Di Nitto, E., Casale, G., Petcu, D., Mohagheghi, P., Mosser, S., Matthews, P., Gericke, A., Ballagny, C., D’Andria, F., Nechifor, C.-S., Sheridan, C.: Modaclouds: a model-driven approach for the design and execution of applications on multiple clouds. In: Proceedings of the 4th International Workshop on Modeling in Software Engineering, MiSE’12, pp. 50–56. IEEE Press, Piscataway, NJ (2012). http://dl.acm.org/citation.cfm?id=2664431.2664439

  4. Artac, M., Borovsak, T., Di Nitto, E., Guerriero, M., Perez-Palacin, D., Tamburri, D.A.: Infrastructure-as-code for data-intensive architectures: a model-driven development approach. In: IEEE International Conference on Software Architecture, ICSA 2018, Seattle, WA, April 30–May 4, 2018, pp. 156–165. IEEE Computer Society (2018). https://doi.org/10.1109/ICSA.2018.00025

  5. ATC. Athens Technology Center Website (2018). https://www.atc.gr/default.aspx?page=home. Accessed Dec 2018

  6. Baresi, L., Guinea, S., Quattrocchi, G., Tamburri, D.A.: Microcloud: A container-based solution for efficient resource management in the cloud. In: 2016 IEEE International Conference on Smart Cloud (SmartCloud), pp. 218–223, Nov 2016. https://doi.org/10.1109/SmartCloud.2016.42

  7. Bell, G., Hey, T., Szalay, A.: Beyond the data deluge. Science 323(5919), 1297–1298 (2009)


  8. Bernardi, S., Dominguez, J.L., Gómez, A., Joubert, C., Merseguer, José, Perez-Palacin, D., Requeno, J.I., Romeu, A.: A systematic approach for performance assessment using process mining. Empir. Softw. Eng. (2018) (accepted for publication). https://doi.org/10.1007/s10664-018-9606-9

  9. Bernardi, S., Requeno, J.I., Joubert, C., Romeu, A.: A systematic approach for performance evaluation using process mining: the Posidonia Operations case study. In: Proceedings of the 2nd International Workshop on Quality-Aware DevOps, QUDOS 2016, pp. 24–29. ACM, New York, NY (2016). https://doi.org/10.1145/2945408.2945413

  10. Bernardi, S., Merseguer, J., Petriu, D.C.: A dependability profile within MARTE. Softw. Syst. Model. 10(3), 313–336 (2011)


  11. Bernardi, S., Merseguer, J., Petriu, D.C.: Model-Driven Dependability Assessment of Software Systems. Springer, New York (2013)


  12. Blu Age. Blu Age, Make IT Digital (2018). https://www.bluage.com. Accessed Dec 2018

  13. Casale, G., et al.: DICE: Quality-driven development of data-intensive cloud applications. In: Proceedings of the Seventh International Workshop on Modeling in Software Engineering, pp. 78–83, IEEE Press, NJ (2015). http://dl.acm.org/citation.cfm?id=2820489.2820507

  14. Chandrasekaran, K., Santurkar, S., Arora, A.: Stormgen—a domain specific language to create ad-hoc storm topologies. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M. (eds.) FedCSIS, pp. 1621–1628 (2014). http://dblp.uni-trier.de/db/conf/fedcsis/fedcsis2014.html#ChandrasekaranSA14

  15. Chen, C.L.P., Zhang, C.-Y.: Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf. Sci. 275, 314–347 (2014)


  16. Chiola, G., Dutheillet, C., Franceschinis, G., Haddad, S.: Stochastic well-formed colored nets and symmetric modeling applications. IEEE Trans. Comput. 42(11), 1343–1360 (1993). https://doi.org/10.1109/12.247838


  17. Clements, P., Kazman, R., Klein, M.: Evaluating Software Architectures: Methods and Case Studies. Addison-Wesley, Boston (2001)


  18. Cois, C.A., Yankel, J., Connell, A.: Modern devops: optimizing software development through effective system interactions. In: IPCC, pp. 1–7. IEEE (2014). http://dblp.uni-trier.de/db/conf/ipcc/ipcc2014.html#CoisYC14

  19. Colas, M., Finck, I., Buvat, J., Nambiar, R., Singh, R.R.: Cracking the data conundrum: how successful companies make big data operational. Technical report, Capgemini consulting (2015). https://www.capgemini-consulting.com/cracking-the-data-conundrum

  20. Cortellessa, V., Di Marco, A., Inverardi, P.: Model-Based Software Performance Analysis. Springer, New York (2011)


  21. Dean, J., Ghemawat, S.: Mapreduce: a flexible data processing tool. Commun. ACM 53(1), 72–77 (2010)


  22. Di Nitto, E., Matthews, P., Petcu, D., Solberg, A. (eds.): Model-Driven Development and Operation of Multi-Cloud Applications. PoliMI SpringerBriefs. Springer, New York (2017)

  23. Dipartimento di Informatica, Università di Torino. GRaphical Editor and Analyzer for Timed and Stochastic Petri Nets, Dec 2015. www.di.unito.it/~greatspn/index.html

  24. Gilmore, S., Hillston, J., Kloul, L., Ribaudo, M.: Pepa nets: a structured performance modelling formalism. Perform. Eval. 54(2), 79–104 (2003). https://doi.org/10.1016/S0166-5316(03)00069-5


  25. Gómez, A., Merseguer, J., Di Nitto, E., Tamburri, D.A.: Towards a UML profile for data intensive applications. In: Proceedings of the 2nd International Workshop on Quality-Aware DevOps, QUDOS 2016, pp. 18–23, ACM, New York, NY (2016). https://doi.org/10.1145/2945408.2945412

  26. Juniper Project: Experimental: models for big data stream processing (2015). Juniper Project Tutorial. http://forge.modelio.org/projects/juniper/wiki/Tutorial_on_Models_for_Big_Data_stream_processing. Accessed Dec 2018

  27. Kroß, J., Brunnert, A., Krcmar, H.: Modeling big data systems by extending the palladio component model. Softwaretechnik-Trends 35(3) (2015)

  28. Kroß, J., Krcmar, H.: Modeling and simulating Apache Spark streaming applications. Softwaretechnik-Trends 36(4), 1–3 (2016)


  29. Lagarde, F., Espinoza, H., Terrier, F., Gérard, S.: Improving UML profile design practices by leveraging conceptual domain models. In: 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), Atlanta (USA), ACM, Nov 2007, pp. 445–448

  30. Langheinrich, M.: Privacy by design. In: Abowd, G.D., Brumitt, B., Shafer, A. (eds.) UBICOMP 2001, pp. 273–291. Springer, New York (2001)


  31. Lazowska, E.D., Zahorjan, J., Scott Graham, G., Sevcik, C.: Quantitative System Performance: Computer System Analysis Using Queueing Network models. Prentice-Hall, Upper Saddle River (1984)


  32. Lipton, P., Palma, D., Rutkowski, M., Tamburri, D.A.: TOSCA solves big problems in the cloud and beyond. IEEE Cloud 21(11), 31–39 (2016)


  33. López-Grao, J.P., Merseguer, J., Campos, J.: From UML activity diagrams to stochastic petri nets: application to software performance engineering. In: Proceedings of the 4th International Workshop on Software and Performance, WOSP’04, pp. 25–36, ACM, New York, NY (2004). https://doi.org/10.1145/974044.974048

  34. Morris, K.: Infrastructure As Code: Managing Servers in the Cloud. O'Reilly & Associates Incorporated, Sebastopol (2016)


  35. Palma, D., Rutkowski, M., Spatzier, T.: Tosca simple profile in YAML version 1.0. Technical report, OASIS Committee Specification (2016). http://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.0/cs01/TOSCA-Simple-Profile-YAML-v1.0-cs01.html

  36. Perez-Palacin, D., Ridene, Y., Merseguer, J.: Quality assessment in DevOps: automated analysis of a tax fraud detection system. In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ICPE’17 Companion, pp. 133–138, ACM, New York, NY (2017)

  37. Petriu, D.C., Alhaj, M., Tawhid, R.: Software Performance Modeling. Lecture Notes in Computer Science, vol. 7320. Springer, Berlin (2012)


  38. Prodevelop: Prodevelop-Integrating Tech (2018). https://www.prodevelop.es/en. Accessed Dec 2018

  39. Rajbhoj, A., Kulkarni, V., Bellarykar, N.: Early experience with model-driven development of MapReduce based big data application. In: 2014 21st Asia-Pacific Software Engineering Conference (APSEC), vol. 1, pp. 94–97 (Dec 2014). https://doi.org/10.1109/APSEC.2014.23

  40. Ranjan, R.: Modeling and simulation in performance optimization of big data processing frameworks. IEEE Cloud Comput. 1(4), 14–19 (2014)


  41. Requeno, J.I., Merseguer, J., Bernardi, S., Perez-Palacin, D., Giotis, G., Papanikolaou, V.: Quantitative analysis of apache storm applications: the NewsAsset case study. Inf. Syst. Front. (2018) (accepted for publication). https://doi.org/10.1007/s10796-018-9851-x

  42. Requeno, J.-I., Merseguer, J., Bernardi, S.: Performance analysis of apache storm applications using stochastic petri nets. In: IEEE International Conference on Information Reuse and Integration (IRI), pp. 411–418 (2017). http://ieeexplore.ieee.org/document/8102965/, https://doi.org/10.1109/IRI.2017.64

  43. Sanders, W.H., Meyer, J.F.: Stochastic Activity Networks: Formal Definitions and Concepts. Lecture Notes in Computer Science, vol. 2090. Springer, Berlin (2001)


  44. Sandmann, G., Thompson, R.: Development of AUTOSAR software components within model-based design. SAE Technical Paper 04 (2008). https://doi.org/10.4271/2008-01-0383

  45. Santurkar, S., Arora, A., Chandrasekaran, K.: Stormgen—a domain specific language to create ad-hoc storm topologies. In: 2014 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 1621–1628 (Sept 2014). https://doi.org/10.15439/2014F278

  46. Scheidgen, M., Zubow, A.: Map/reduce on EMF models. In: MDHPCL@MoDELS. ACM (2012). http://dblp.uni-trier.de/db/conf/models/mdhpcl2012.html#ScheidgenZ12

  47. Selic, B.: A systematic approach to domain-specific language design using UML. In: Tenth IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC 2007), 7–9 May 2007, Santorini Island, Greece, pp. 2–9. IEEE Computer Society (2007)

  48. Selic, B., Gerard, S. (eds.): Modeling and Analysis of Real-Time and Embedded Systems with UML and MARTE. Morgan Kaufmann, Boston (2014)


  49. Smith, C.U., Williams, L.G.: Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA (2002)


  50. The Apache Software Foundation. Apache Cassandra. http://cassandra.apache.org/. Accessed Dec 2018

  51. The Apache Software Foundation. Apache Hadoop. http://hadoop.apache.org/. Accessed Dec 2018

  52. The Apache Software Foundation. Apache Kafka. http://kafka.apache.org/. Accessed Dec 2018

  53. The Apache Software Foundation. Apache Spark. http://spark.apache.org/. Accessed Dec 2018

  54. The Apache Software Foundation. Apache Storm. http://storm.apache.org/. Accessed Dec 2018

  55. The Apache Software Foundation. Apache Tez. http://tez.apache.org/. Accessed Dec 2018

  56. The DICE Consortium. DICE Models Repository, Jan 2017. https://github.com/dice-project/DICE-Models

  57. The DICE Consortium. DICE Profiles Repository, Sept 2017. https://github.com/dice-project/DICE-Profiles

  58. The DICE Consortium. DICE Profiles, Sept 2017. https://github.com/dice-project/DICE-Profiles

  59. The DICE Consortium. DICE Simulation tool, Oct 2017. https://github.com/dice-project/DICE-Simulation

  60. The DICE Consortium. DICE-Rollout, Sept 2017. https://github.com/dice-project/DICER

  61. The Object Management Group (OMG): Model-Driven Architecture Specification and Standardisation. Technical report (2018). http://www.omg.org/mda/

  62. The DICE Consortium. DICE simulation tools. Technical report, European Union’s Horizon 2020 research and innovation programme (2017). http://wp.doc.ic.ac.uk/dice-h2020/wp-content/uploads/sites/75/2017/08/D3.4_DICE-simulation-tools-Final-version.pdf

  63. The DICE Consortium. DICE transformations to Analysis Models. Technical report, European Union’s Horizon 2020 research and innovation programme (2016). http://wp.doc.ic.ac.uk/dice-h2020/wp-content/uploads/sites/75/2016/08/D3.1_Transformations-to-analysis-models.pdf

  64. UML Profile for MARTE: Modeling and Analysis of Real-Time and Embedded Systems (June 2011). Version 1.1, OMG document: formal/2011-06-02

  65. Unified Modeling Language: Infrastructure, 2017. Version 2.5.1, OMG document: formal/2017-12-05

  66. Wang, K., Khan, M.M.H.: Performance prediction for Apache Spark platform. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 166–173 (2015)

  67. Wettinger, J., Breitenbücher, U., Leymann, F.: Standards-based DevOps automation and integration using TOSCA. In: 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, pp. 59–68, Dec 2014. https://doi.org/10.1109/UCC.2014.14

  68. WikiMedia project. Wikistats, Dec 2016. https://www.mediawiki.org/wiki/Analytics/Wikistats

  69. Wille, R.: Formal concept analysis as mathematical theory of concepts and concept hierarchies. In: Formal Concept Analysis, pp. 1–33 (2005)

  70. Woodside, C.M., Petriu, D.C., Merseguer, J., Petriu, D.B., Alhaj, M.: Transformation challenges: from software models to performance models. Softw. Syst. Model. 13(4), 1529–1552 (2014). https://doi.org/10.1007/s10270-013-0385-x


  71. XLAB. XLAB, R&D (2018). https://www.xlab.si. Accessed Dec 2018


Acknowledgements

This work is supported by the European Commission Grant No. 644869 (H2020, Call 1), DICE. D. Perez-Palacin, J. Merseguer and J.I. Requeno have been supported by the project CyCriSec [TIN2014-58457-R] and Aragon Government Ref. T27-DISCO Research Group.

Author information


Corresponding author

Correspondence to José Merseguer.

Additional information

Communicated by Prof. Dorina Petriu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: MARTE and DAM profiles

MARTE [64] is a standard profile that extends UML for the performance and schedulability analysis of a system. MARTE consists of three main parts: MARTE Foundations, MARTE Design Model and MARTE Analysis Model. The Analysis Model is of interest to us since it enables QoS assessment by allowing the definition of QoS metrics and properties. The Analysis Model consists of a Generic Quantitative Analysis and Modelling (GQAM) profile and its specialization, the Performance Analysis and Modelling (PAM) profile. In addition, two other features are important for our DIA profile.

Fig. 24 Transformation of Apache Hadoop specific stereotypes to Petri nets

The first one is that MARTE enables the specification of quantitative non-functional properties (NFP) in UML models through its Value Specification Language (VSL). The VSL is useful for specifying the values of constraints, properties and stereotype attributes, particularly those related to NFPs. Moreover, VSL allows expressing basic types, data types, values (such as time and composite values), as well as variables, constants and expressions. This means that, using VSL, we can define complex metrics and requirements to express, for example, response times, utilizations or throughputs. MARTE also defines a library of primitive data types, a set of predefined NFP types and units of measure. Hence, our DIA profile inherits the VSL in its entirety.

To understand the VSL expressions that appear in this paper, it is useful to briefly recall the VSL syntax. An example of a VSL expression for a host demand tagged value of type NFP_Duration is:

figure i

This expression specifies that the Reducing activity in Fig. 13 demands 6 (1) milliseconds (2) of processing time, whose mean value (3) is obtained from an estimation in the real system (4). We could replace, for example, the value 6 with a variable \(\$host\_dem\) to parameterize the analysis of the model with different values for this host demand.
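The expression itself was a figure and is not reproduced here. As a rough illustration only, the following Python sketch assembles a VSL-style tuple from the four fields the paragraph enumerates. The key names `expr`, `unit`, `statQ` and `source`, and the abbreviation `est`, are assumptions based on common MARTE usage; the class `NfpDuration` is hypothetical and not part of the profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NfpDuration:
    """Hypothetical holder for the four fields discussed in the text."""
    expr: str    # (1) a literal value or a variable, e.g. "6" or "$host_dem"
    unit: str    # (2) unit of measure, e.g. "ms"
    stat_q: str  # (3) statistical qualifier, e.g. "mean"
    source: str  # (4) origin of the value, e.g. "est" (estimated); assumed name

    def to_vsl(self) -> str:
        # Key names are assumed, not copied from the paper's figure.
        return (f"(expr={self.expr}, unit={self.unit}, "
                f"statQ={self.stat_q}, source={self.source})")


fixed = NfpDuration("6", "ms", "mean", "est")
parameterized = NfpDuration("$host_dem", "ms", "mean", "est")
print(fixed.to_vsl())
print(parameterized.to_vsl())
```

Swapping the literal for a `$`-prefixed variable changes only the first field, which is what makes the annotation reusable across analysis runs with different host demands.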

The second feature is that the DAM [10] profile specializes MARTE-GQAM for dependability analysis (i.e., availability, reliability, safety and maintainability). Consequently, the DAM profile also inherits the VSL. Like MARTE, DAM consists of a library and a set of extensions to be applied at the model specification level. Our DIA profile inherits DAM with the purpose of addressing reliability analysis for DIAs.

Appendix B: DIA Profile Library

In this Appendix, we present the DIA library. The library defines the data types, basic and complex, used in the attributes of the stereotypes proposed for the three abstraction levels, DPIM, DTSM and DDSM. Basic types appear in Fig. 22, while complex ones in Fig. 23. From DAM, we have imported the DAM Library [10], which also imports the MARTE Library [64].

Appendix C: TOSCA

TOSCA provides a flexible and highly extensible DSL for modeling resources and software components. TOSCA blueprints are executable IaC composed of node templates and relationships, which define the topology of a hardware/software system. Node templates and relationships are instances of node types and relationship types that are either normative (i.e., defined in the standard), provided by the specific engine that executes a blueprint (the orchestrator), or an extension of one of the above, as in our case with DIA-specific node and relationship types. Node types are essentially used to describe hardware or virtual resources (machines or VMs) and software components. Relationship types predicate on the association between node types. For instance, a TOSCA node type representing the Wordpress CMS must be associated with a node type representing VMs through the relationship hosted_On. Each node type and relationship type also enables specifying interfaces, which are composed of operations that have to be carried out at specific stages of the deployment orchestration. Typical examples of interface operations include installing, configuring or starting components; they may take the form of Python/bash scripts or pointers to Chef recipes. Node and relationship templates are free to provide their own interface operations, extending or overriding the behavior defined in the corresponding types. TOSCA is supported by a number of orchestrators that, given a TOSCA blueprint and all the node and relationship types used in it, are able to execute it, deploying the corresponding system and managing its lifecycle. Examples of such orchestrators are Cloudify (note 18), ARIA TOSCA (note 19), Indigo (note 20), Apache Brooklyn (note 21) and ECoWare [6].
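To make the roles of node templates, hosting relationships and interface operations more concrete, here is a minimal Python sketch of how an orchestrator might order the work. It is not TOSCA syntax and not the behavior of any particular orchestrator: the class `NodeTemplate`, the type identifiers, the script names and the operation names in `LIFECYCLE` are all hypothetical, chosen to mirror the Wordpress-on-VM example in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Operation names follow the examples in the text; illustrative only.
LIFECYCLE = ("install", "configure", "start")


@dataclass
class NodeTemplate:
    name: str
    node_type: str                   # hypothetical type identifier
    hosted_on: Optional[str] = None  # target of a hosted_On relationship
    operations: Dict[str, str] = field(default_factory=dict)  # op -> script


def deployment_order(templates: List[NodeTemplate]) -> List[str]:
    """Order templates so that every host precedes what it hosts."""
    by_name = {t.name: t for t in templates}
    ordered: List[str] = []
    in_progress = set()

    def visit(t: NodeTemplate) -> None:
        if t.name in ordered:
            return
        if t.name in in_progress:
            raise ValueError(f"cyclic hosting relationship at {t.name}")
        in_progress.add(t.name)
        if t.hosted_on is not None:
            visit(by_name[t.hosted_on])
        ordered.append(t.name)

    for t in templates:
        visit(t)
    return ordered


def plan(templates: List[NodeTemplate]) -> List[Tuple[str, str, str]]:
    """List (node, operation, script) steps in execution order."""
    by_name = {t.name: t for t in templates}
    steps: List[Tuple[str, str, str]] = []
    for name in deployment_order(templates):
        ops = by_name[name].operations
        steps.extend((name, op, ops[op]) for op in LIFECYCLE if op in ops)
    return steps


blueprint = [
    NodeTemplate("wordpress", "example.nodes.WordpressCMS", hosted_on="vm",
                 operations={"install": "install_wp.sh", "start": "start_wp.sh"}),
    NodeTemplate("vm", "example.nodes.VM",
                 operations={"install": "provision_vm.py"}),
]
for step in plan(blueprint):
    print(step)
```

The sketch only handles a single hosting relationship per node; a real blueprint may declare several relationship kinds, each with its own interface operations.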

Table 4 Research questions on profile usability

Appendix D: Transformation of a DTSM design to a performance model

Stochastic Well-formed Nets (SWN) [16] are a modeling formalism suitable for performance analysis purposes. An SWN model is a bipartite graph formed by places and transitions. Places are graphically depicted as circles and may contain tokens. A token distribution in the places of an SWN, namely a marking, represents a state of the modeled system. The dynamics of the system are governed by the transition enabling and firing rules, where places represent pre- and post-conditions for transitions. In particular, the firing of a transition removes from each input place, and adds to each output place, as many tokens as the weight of the corresponding input or output arc. Transitions can be immediate, i.e., they fire in zero time, or timed, i.e., they fire after a delay sampled from a random variable with a given probability distribution function. Immediate transitions are graphically depicted as thin black bars, while timed ones are depicted as thick white bars. Tokens may also have an associated color, i.e., a data type, which enriches the expressiveness of the net and restricts the movement of tokens to compatible places and transitions.
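The enabling and firing rules can be sketched in a few lines of Python. This is a deliberately reduced illustration of a plain place/transition net: it ignores token colors, firing delays, the immediate/timed distinction and inhibitor arcs, all of which an SWN tool would handle. The place and transition names are hypothetical.

```python
from typing import Dict

Marking = Dict[str, int]  # place name -> number of tokens


class Transition:
    def __init__(self, name: str, inputs: Dict[str, int], outputs: Dict[str, int]):
        self.name = name
        self.inputs = inputs    # input place -> arc weight
        self.outputs = outputs  # output place -> arc weight


def is_enabled(marking: Marking, t: Transition) -> bool:
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in t.inputs.items())


def fire(marking: Marking, t: Transition) -> Marking:
    """Return the marking reached by firing t; the input marking is not modified."""
    if not is_enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new = dict(marking)
    for p, w in t.inputs.items():
        new[p] = new.get(p, 0) - w   # remove tokens from input places
    for p, w in t.outputs.items():
        new[p] = new.get(p, 0) + w   # add tokens to output places
    return new


# Hypothetical net: a queued task needs one free core to start running.
m0 = {"queued": 2, "free_cores": 1, "running": 0}
start = Transition("start", inputs={"queued": 1, "free_cores": 1},
                   outputs={"running": 1})
m1 = fire(m0, start)
print(m1)
print(is_enabled(m1, start))
```

After one firing the single core is taken, so `start` is no longer enabled even though a task is still queued; this is how places act as pre-conditions.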

Figure 24 depicts a schema of how Apache Hadoop stereotypes in UML-profiled models (left) are transformed into an analyzable model such as a SWN (right). Each stereotype is transformed into a sub-net by taking into account the information contained in the tags. For each transformation pattern in the Figure, the part of the Petri net inside the blue box corresponds to the part that the transformation creates. The part of the Petri net outside the blue box corresponds to referenced parts, which are in turn created by other stereotypes. Figure 24 depicts only the specific non-functional annotations for Apache Hadoop; the functional part of the UML diagram is transformed according to the works in [33, 70]. Eventually, all the sub-nets are composed into a single closed Petri net such as in Fig. 16.

A Hadoop cluster accepts several categories of users, whose jobs may be subdivided into different numbers of map–reduce tasks or may be assigned different amounts of hardware resources. Every user \(<i>\) has \(\$nC_i\) jobs waiting in the scheduler queue. The Hadoop scheduler periodically launches a new job at a given \(\$rate\), following a scheduling policy defined by the scenario (e.g., a shared common FIFO queue for all users). By default, our transformation assumes an independent FIFO queue for each user and guarantees that a job of each user is taken.

Jobs are labeled with the user they belong to (loop \(<i>\)-\(<i+1>\) in the net, where \(<i>\) represents each user). The scheduler waits for the assignment of resources to all tasks in the reduce phase of the preceding job \(<i>\) before launching the next job \(<i+1>\) (inhibitor arc section). This scheduling allows concurrency among jobs while giving preceding jobs priority over resources. Job \(<i>\) is divided into \(\$m_i\) map tasks and \(\$r_i\) reduce tasks, which run simultaneously on up to \(\$p_i\) cores (\(\sum _{i=1}^{n} \$p_i \ge \$host\), where \(\$host\) is the total number of cores in the cluster). We use the notation \(<i>\) for expressing the color of a token. For instance, each user is represented by a different color in the SWN. The notation \(\$m_i\) is used for expressing numerical values; for instance, the number of map tasks into which a job of type \(<i>\) is divided.
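As a reading aid for the notation, the per-user parameters can be gathered in a small Python structure and the stated relation between the \(\$p_i\) values and \(\$host\) checked directly. The class `UserClass`, the function name and all numeric values are hypothetical and do not come from the case study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UserClass:
    n_jobs: int        # $nC_i: jobs of user i waiting in the scheduler queue
    map_tasks: int     # $m_i: map tasks per job
    reduce_tasks: int  # $r_i: reduce tasks per job
    max_cores: int     # $p_i: cores the tasks of a job may use at once


def satisfies_core_relation(users: List[UserClass], host_cores: int) -> bool:
    """Check the relation stated in the text: sum of $p_i >= $host."""
    return sum(u.max_cores for u in users) >= host_cores


# Illustrative values only.
users = [UserClass(n_jobs=3, map_tasks=20, reduce_tasks=4, max_cores=8),
         UserClass(n_jobs=2, map_tasks=10, reduce_tasks=2, max_cores=8)]
print(satisfies_core_relation(users, host_cores=12))
print(satisfies_core_relation(users, host_cores=20))
```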

Appendix E: Usability of the profile

So far, the DIA profile has been validated with respect to its adequacy for QoS assessment and deployment. However, we also consider it important to learn about the usability of the profile, in terms of ease of use for engineers. It is often the case that tools, although offering the required functionality, do not meet expectations until they reach a certain degree of maturity in this regard.

The DIA profile has been used by engineers in four organizations: Prodevelop [38], ATC [5], BluAge [12] and XLAB R&D [71]. We prepared eight questions (see Table 4) for a total of eight engineers, who have extensively used the DIA profile in the context of the DICE project to carry out industrial applications. From the answers, we see that the profile has been useful for the engineers, especially for automatic deployment. However, the main shortcoming concerns the Papyrus implementation (see question #6), which also constrains the profile implementation. In fact, the engineers' advice (see question #8) was to improve the Papyrus implementation of the profile.


About this article


Cite this article

Perez-Palacin, D., Merseguer, J., Requeno, J.I. et al. A UML Profile for the Design, Quality Assessment and Deployment of Data-intensive Applications. Softw Syst Model 18, 3577–3614 (2019). https://doi.org/10.1007/s10270-019-00730-3


  • DOI: https://doi.org/10.1007/s10270-019-00730-3
