Keywords

1 Introduction

The increasing complexity of distributed embedded systems have been a motivation for the use of Petri nets [1, 2]. This graphical modelling formalism enables the explicit specification of concurrent systems, their synchronization and conflicts, and the share of resources [3]. As a result, they ensure that systems behavior conforms to the intended specification so as not to endanger people.

IOPT-Tools, which are online available at http://gres.uninova.pt/IOPT-Tools/ supports the development of embedded systems controller using Petri nets [4, 5]. This framework offers a set of tools to support the creation of IOPT-net models [1], their verification, and the automatic code generation (C and VHDL) [6, 7]. To enable the verification, an automatic code generator is used to compute the models’ reachability graphs [8]. As real world applications can present exponential reachability graphs with a huge number of states, their generation can take a long time to compute [2], requiring high computational performance [9]. Some tools generate the condensed reachability graph of a Petri net model, preventing the exponential growth of the graph [10,11,12].

The work presented in this paper is focused on obtain the complete reachability graph of a model, based on a new model-checking algorithm to compute the reachability graphs of IOPT Petri net models using NVIDIA’s CUDA. NVIDIA’s CUDA supports CPU-GPU co-processing for parallel computing [13]. As a matter of fact, GPU calculates all the child nodes of the reachability graph and handles their storing with the support of a hash-table. In addition, the CPU schedules the threads to be launched on the GPU, which are needed to process a new set of unprocessed states of the graph.

Section 2 mentions how this paper contributes to life improvement. In Sect. 3 IOPT Petri nets and IOPT-Tools are briefly described, including the current reachability graph generation tool. In the following section, it is mentioned how CUDA Toolkit contributes to program and run parallel C++ applications on GPU. Section 5 presents the proposed algorithm, for the computation of reachability graphs, for IOPT Petri net models. In Sect. 6 are presented the results supported by an NVIDIA Titan V GPU, and finally in Sect. 7 the conclusions about the results and future work are presented.

2 Relationship to Life Improvement

Currently, technological advances enabled the creation of many types of distributed embedded systems that contribute to a significant improvement in people’s quality of life across different areas, ranging from appliances and home products, medical and health solutions, surveillance and security equipment, to transportation and communication systems, among others.

Concerning to safety-critical systems, to make sure that they are safe and free of development errors, they must be formally verified, by checking all possible interactions and potential unwanted properties [14,15,16]. There are large number of Petri net tools [17], which support not only the models edition and simulation, but also their formal verification and analysis, to ensure that the system specification conforms to the desired properties or has no unwanted properties. This work, addressing the construction of reachability graph for IOPT nets, aims to contribute for the safety-critical systems properties verification.

3 IOPT Nets

The IOPT nets are a low-level and non-autonomous Petri net class proposed to develop automation and embedded systems, allowing the rapid prototyping of system controllers through IOPT-Tools framework [18,19,20]. The IOPT Petri net relies on signals and events to specify the interaction of the models with the environment: while input signals and events constraint the evolution of the net, directly associated with transition firing, outputs are updated according with the marking of the net and transition firing. In detail, a transition fires if it is enabled from the point of view of place marking, the associated events occur and if the associated guards are verified. When more than one transition is enabled, but not all of them are allowed to firing, transition priorities and test arcs can be used [8]. As a result, each system’s state is composed of a vector of all places’ marking; and an output event signal vector, with the values of all signals associated with output events [19].

3.1 Reachability Graph Generator

The IOPT-Tools offer a tool to compute the reachability graphs [19] of IOPT Petri net models. During graph generation, the tool gathers information about the model, namely the influence of input signals and events on the firing of transitions, as well as all combinations of all enabled transitions. The automatic C code generated by the reachability graph generator provides libraries to compute a model’s reachability graph, managed with a hash-table that allows ordering multiple states for each key, to help search for repeated states. At the end of the computation, the resulting reachability graph is stored in a hierarchical XML file, in which the connections between the states are represented.

The reachability graph algorithm [21] initiates with the creation of the database and initial state, obtained from the initial marking M0 and the values of all output event signals presented on the net at that moment. After that, the initial node is added into the database and hash-table, and the algorithm proceeds from there or any other state by modifying the value of net’s initial marking. The algorithm will continue until all the unprocessed states are treated, or the graph reaches the maximum size, which is specified according to the computation platform available resources. Each evaluation of an unprocessed state is carried out with the calculation of its child states (the next unprocessed states), by executing a function that recursively analyzes all transitions that are enabled to firing. Then, all child nodes are stored in the database and sorted in the hash-table if they are not repeated states, whose marking and outputs refer to previously existing nodes. Finally, the new child nodes are added to the set of unprocessed states, waiting to be processed. The generated reachability graph includes the nodes, the arcs that connect them, and links that represent existing nodes.

4 CUDA Architecture

To increase the performance of the IOPT reachability graphs generation, the use of GPU is proposed in this paper. The GPU is used to improve the processing of each state, parallelizing the calculation and analysis of its child nodes. For that, it was used a NVIDIA GPU and the CUDA Toolkit, which enable the co-processing of C++ programs in platforms with CPU and GPU [22, 23]. The execution starts and ends in the CPU, which exchanges data with the GPU and launch the kernels to running on the GPU.

A kernel is a sequential program that runs in parallel as many times as the number of threads running on the GPU distributed by blocks in a grid. Although, there is a limit to the number of threads per block, a kernel can be executed by multiple blocks in parallel, so the total number of threads launched is equal to the number of threads per block times the number of blocks thar compose the grid. A set of functions, such as __threadfence, __syncthreads, and atomic operations, can be used to ensure the correct access to shared and global memory, avoiding hazards that can occur from simultaneous read and write operations at the same memory address [24].

5 Proposed Approach and Algorithm

The reachability graph generation at IOPT-Tools is mainly composed of two parts: for each state, it is calculated its child nodes; and for each child node it is inspected if it is a new independent state or if it is equal to a previously existing one. In the generator of the IOPT tool framework, this entire process is done sequentially in CPU, with each state being analyzed one at a time. An algorithm proposed in [25] reused part of IOPT-Tools generator code and adapted it to run on a GPU used to perform the calculation all the child nodes of unprocessed states in parallel, while the CPU schedules threads on the GPU, handles the hash-table and the categorization of states.

The algorithm proposed in this paper also uses co-processing, taking even more advantage of the GPU. For that, the initialization of the database and the creation of the initial state are handled by the CPU, and from that moment onwards the task of searching for new states it’s responsibility of the GPU until all graph it’s done. During the computation, the CPU receives feedback on the status of the graph through memory copies from the device to the host, receiving the update of the number of calculated states and the number of states to be processed. This allows the continuity of the algorithm that ends only when there are no more unprocessed states to process or the maximum number of states associated with the allocation of the database is reached. So, three kernels were implemented, each one responsible for a specific part of the algorithm, as described in Fig. 1.

Fig. 1.
figure 1

Sequence of actions of the algorithm.

When the kernel calc_ChildrenStates, presented in Algorithm 1, is invoked, the values of the number of states to be processed and the current number of states stored in the database are passed. The first instruction performed is the assignment of an unprocessed state from the array states to the threads of a block. At this point, CPU has been launched as many blocks as the number of unprocessed states that need to be processed, and as many threads as the number of transitions of the Petri net. The way to provide each thread the state to process is using the unique index of the block, the size of the array states, that stores all the states, and the number of blocks launched.

figure a

Using the __syncthreads() function all the threads in the current block will synchronize, waiting for thread 0 to share the state that needs to be processed. The kernel continues with a recursively analysis of all enabled transitions of a state, calculating all the combinations between them. The transition with the highest priority will analyze the remaining ones; the second most priority will analyze the other ones except the first one, and so on. The latter transition will only consider itself. As there is one thread per transition, each one of them will analyze the combinations in parallel for each state, saving the founded new states inside the array childs.

After that, the kernel returns the number of child nodes calculated to proceed with the search of repeated children. The CPU launches a number of threads equal to the number of founded states, one for each thread to compare with the others. The comparison of the states is made at find_RepeatedChildren kernel, presented in Algorithm 2, by comparing the marking and outputs of the Petri net. If there is a repeated sibling, the value of the link flag and the id of the state are changed, followed by the storage of the state in the array links.

figure b

The last kernel, presented in Algorithm 3, compare the child nodes that have not been copied to the array links with all existing nodes. For that, it was implemented a multivalue hash-table where multiple values for the same key are represented by different key-state_id pairs. In the same way as the previous kernel, the CPU launches a number of threads equal to the number of founded states. Each thread will search on hash-table for repeated states starting by calculating the key based on the marking and outputs of the state, using it to limit the search to elements with the same key. Several threads could access the hash-table, including threads whose states have the same key. If there is no repeated state at the database, the value of the id of the child state is actualized, and the state is stored in the array states; if there is a repeated state it is stored in the array links.

figure c

6 Results of Experiments

The algorithm presented was applied to six IOPT-net models. These models were also used to document the results at [25], and are available online at http://gres.uninova.pt/IOPT-Tools/, in the user account “models”. Using these models, we could compare the results between the two approaches. The results are presented in Table 1. Considered the amount of time that GPU took to calculate the entire reachability graph and the time spent on co-processing we have obtained much better results with this algorithm. The results are, in some cases, two or three orders of magnitude better than the ones presented at [25].

Table 1. Results obtained with an GPU TITAN V.

The number of cycles and nodes presented express the graph with the respect to its size and format. This characteristic can affect the execution time depending on the number of states processed in parallel as well as the number of transitions of the net. The models PNSE−53b and ICIT13_denoise presented the same number of cycles processed; for the first was calculated 29 nodes and for the second 25. Although the number of transitions in the second is almost twice that of the first, the execution times obtained were approximately equal, such that parallel threads were launched and used at the same time to exploit the computing power of the GPU.

7 Conclusions and Future Work

The proposed algorithm presents an improvement in the computation of the reachability graph for IOPT Petri net models in GPU. Threads were used to analyze combinations of transitions in parallel, improving the calculation of child states; and the use of a hash-table implemented in GPU prevented the time spent with memory management. As future work we intend to analyze the impact of the proposed algorithm using GPU as a function of the number of global states obtained, namely as a function of the initial marking. Additionally, the impact of computation efforts necessary for evaluating transition enabling conditions and output expressions is intended to be analyzed.