1 Introduction

Modern computer networks are extremely complex, leading to many bugs and vulnerabilities which affect our daily life. Therefore, network verification is an increasingly important topic addressed by the programming languages and networking communities (e.g., see [9, 1418, 21, 30]). Previous network verification tools leverage a simple network forwarding model which renders the datapath immutable; i.e., normal packets going through the network do not change its forwarding behavior, and the control plane explicitly alters the forwarding state at relatively slow time scales. Thus, invariants can be verified before each control-plane initiated change and these invariants will be enforced until the next such change. While the notion of an immutable datapath supported by an assemblage of routers makes verification tractable, it does not reflect reality. Modern enterprise networks are comprised of roughly 2 / 3 routers and 1 / 3 middleboxes [31]. A simple example of a middlebox is a stateful firewall which permits traffic from untrusted hosts only after they have received a message from a trusted host. Middleboxes — such as firewalls, WAN optimizers, transcoders, proxies, load-balancers, intrusion detection systems (IDS) and the like — are the most common way to insert new functionality in the network datapath, and are commonly used to improve network performance and security. While useful, middleboxes are a common source of errors in the network [25], with middleboxes being responsible for over 40 % of all major incidents in networks.

This paper addresses the problem of verifying safety of networks with middleboxes, referred to as stateful networks. From a verification perspective, it is possible to view a middlebox as a procedure with local mutable state which is atomically changed every time a packet is transmitted. The local state determines the forwarding behavior.Footnote 1 Thus, the problem of network verification amounts to verifying the correctness of a specialized distributed system where each of the middleboxes operates atomically and the order of packet arrivals is arbitrary.

We model such a network as a finite undirected graph with two types of nodes: (i) hosts which can send packets, (ii) middleboxes which react to packet arrivals and forward modified packets. Each node in the network has a fixed number of ports, connected by network edges (links).

Real middleboxes are generally complex software programs implemented in several 100 s of thousands of lines of code. We follow [23, 24] in assuming that we are provided with middlebox models in the form of finite-state transducers. In our experience one can naturally model the behavior of most middleboxes this way. For every incoming packet, the transducer uses the packet header and the local state to compute the forwarding behavior (output) and to update state for future packets. The transducer can be non-deterministic to allow modelling of middleboxes like load-balancers whose behavior depends not just on state, but also on a random number source. We symbolically represent the local state of each middlebox by a fixed set of relations on finite elements, each with a fixed arity.

The Verification Problem. We define network safety by means of avoiding “bad” middlebox states (e.g., states from which a middlebox forwards a packet in a way that violates a network policy). Given a set of bad middlebox states, we are interested in showing that for all packet scenarios the bad states cannot be reached. This problem is hard since the number of packets is unbounded and the states of one middlebox can affect another via transmitted packets.

1.1 What Is Decidable About Middlebox Verification

In Sec. 3, we prove that for general stateful networks the verification problem is undecidable. This result relies on the observation that packet histories can be used to count, similarly to results in model checking of infinite ordered communication channels [8]. One may believe that undecidability arises from the presence of forwarding loops in the network which are usually avoided in real networks. However, we show that the verification problem is undecidable even for networks without forwarding loops.

In order to obtain decidability, we introduce an abstract semantics of networks where the order of packet processing on each channel (connecting two middleboxes or a middlebox and a host) is arbitrary, rather than FIFO. Thus, middlebox inputs are multisets of packets which can be processed in any order. This abstraction is conservative, i.e., whenever we verify that the network does not reach a bad state, it is indeed the case. However, the verification may fail even in correct networks. Since packets are atomically processed, we note that network designers can impose ordering even in this abstract model by sending acknowledgments for received packets. This is useful when enforcing authentication.

In fact, this abstraction closely corresponds to assumptions made by network engineers: since packets in modern networks can traverse multiple paths, be buffered, or be chosen for more complex analysis, network software cannot assume that packets sent from a source to a server are received by a server in order. Network protocols therefore commonly build on TCP, a protocol which uses acknowledgments and other mechanisms to ensure that servers receive packets in order. Since packet ordering is enforced by causality (by sending acknowledgments) and by software on the receiving end, rather than by the network semantics, correctness of such networks typically does not rely on the order of packet processing. Therefore we can successfully verify a majority of network applications despite our abstraction.

1.2 Complexity of Stateful Verification

In Sec. 6, we show that the problem of network verification when assuming a nondeterministic order of packet processing is complete for exponential space, i.e., it is decidable, and in the worst case, the decision procedure can take exponential space in terms of hosts and middleboxes. This is proved by showing that the network safety problem is equivalent to the coverability problem of Petri nets, which is known to be EXPSPACE-complete [26].

Fig. 1.
figure 1

Middlebox hierarchy.

Since the problem is complete, it is impossible to improve this upper-bound without further assumptions. Therefore, we also consider limited cases of middleboxes permitting more efficient verification procedures, as shown in Fig. 1. We identify four classes of middleboxes with increasing expressive power and verification complexity: (i) stateless middleboxes whose forwarding behavior is constant over time, (ii) increasing middleboxes whose forwarding behavior increases over time, (iii) progressing middleboxes whose forwarding behavior stabilizes after some fixed time, alternatively, the transition relation of the transducer does not include cycles besides self-cycles, and (iv) arbitrary middleboxes without any restriction. For example, NATs, Switches and simple ACL-based firewalls are stateless; hole-punching stateful firewalls are increasing; and learning-switches and cache-proxies are progressing and not increasing.

For stateless and increasing middleboxes, we prove that any packet which arrives once can arrive any number of times, leading to a polynomial-time verification algorithm, using dynamic programming. We note that efficient near linear-time algorithms for stateless verification are known (e.g., see [17]). Our result generalizes these results to increasing networks and is in line with the recent work in [13, 19].

For progressing middleboxes, we show that verification is coNP-complete. The main insight is that if a bad state is reachable then there exists a small (polynomial) input scenario leading to a bad state. This means that tools like SAT solvers which are frequently used for verification can be used to verify large networks in many cases but it also means that we cannot hope for a general efficient solution unless P=NP.

Finally, we note that unlike the known results in stateless networks, the absence of forwarding loops does not improve the upper bound, i.e., we show that our lower bounds also hold for networks without forwarding loops.

Packet Space Assumption. Previous works in stateless verification [14, 16] assume that packet headers have n-bits, simulating realistic packet headers which can be large in practice. This makes the complexity of checking safety of stateless networks PSPACE-hard. Our model avoids packet space explosion by only supporting three fields: source, destination, and packet tags. We make this simplification since our work primarily focuses on middlebox policies (rather than routing). As demonstrated in Sec. 5.1, middlebox policies are commonly specified in terms of the source and destination hosts of a packet and the network port (service) being accessed. For example, at the application level, firewalls may decide how to handle a packet according to a small set of application types (e.g., skype, ssh, etc.). Source, destination and packet tag are thus sufficient for reasoning about safety with respect to these policies. This simplification is also supported by recent works (e.g. [17]) which suggest that in practice the forwarding behavior depends only on a small set of bits.

Lossless Channels. Previous works on infinite ordered communication channels have introduced lossy channel systems [2] as an abstraction of ordered communication that recovers decidability. Lossy channel systems allow messages to be lost in transit, making the reachability problem decidable, but with a non-elementary lower bound on time complexity. In our model, packets cannot be lost. On the other hand, the order of packets arrival becomes nondeterministic. With this abstraction, we manage to obtain elementary time complexity for verification.

Initial Experience. We implemented a tool which accepts symbolic representations of middleboxes and a network configuration and verifies safety. For increasing (and stateless) networks, the tool generates a Datalog program and a query which holds iff a bad state is reachable. Then, the query is evaluated using existing Datalog engines [22].

For arbitrary networks (and for progressing networks), the tool generates a petri-net and a coverability property which holds iff the network reaches a bad state. To verify the coverability property we use LOLA [1, 28] — a Petri-Net model checker.

Main Results. The main contributions of the paper are: (i) We define a conservative abstraction of networks in which packets can be processed out of order, and show that the safety problem of stateful networks becomes decidable, but EXPSPACE-complete. (ii) We identify classes of networks, characterized by the forwarding behaviors of their middleboxes, which admit better complexity results (PTIME and coNP). We demonstrate that these classes capture real-world middleboxes. The upper bounds are made more realistic by stating them in terms of a symbolic representation of middleboxes. (iii) We present initial empirical results using Petri nets and Datalog engines to verify safety of networks. Due to space constraints, all proofs are omitted. More details and examples are provided in a technical report [34].

2 A Formal Model for Stateful Networks

In this section, we present a formal model of networks with stateful middleboxes.

A network \(\mathsf {N}\) is a finite undirected graph of hosts and middleboxes, equipped with a packet domain. Formally, \(\mathsf {N}= (H\cup M, E, P)\), where H is a finite set of hosts, M is a finite set of middleboxes, \(E \subseteq \{\{u,v\}\mid u,v\in H\cup M\}\) is the set of (undirected) edges and P is a set of packets. A host \(h \in H\) consists of a unique id and a set of packets \(h_P \subseteq P\) that it can send.

Packets. In real networks, a packet consists of a packet header and a payload. The packet header contains a source and destination host ids and additional arbitrary stream of control bits. The payload is the content of the packet and may consist of any arbitrary sequence of bits. In particular, the set of packets need not be finite. In this work, P is a set of abstract packets. An abstract packet \(p \in P\) consists of a header only in the form of a triple (sdt), where \(s,d\in H\) are the source and destination hosts (respectively) and t is a packet tag that ranges over a finite domain T. Intuitively, T stands for an abstract set of services or security policies. Therefore, \({P}= H\times H\times T\), making it a finite set. Middlebox behavior in our model is defined with respect to abstract packets and is oblivious of the underlying concrete packets.

2.1 Stateful Middleboxes

A middlebox \(m \in M\) in a network \(\mathsf {N}\) has a set of ports \(\mathsf {Pr}\), which consists of all the adjacent edges of m in the network \(\mathsf {N}\), and a forwarding transducer F.

The forwarding transducer of a middlebox is a tuple \(F = (\varSigma , \varGamma , Q_m, q^0_m, \delta _m)\) where \(\varSigma = P \times \mathsf {Pr}\) is the input alphabet in which each input letter consists of a packet and an input port, \(\varGamma = 2^{\varSigma }\) is the output alphabet describing (possibly empty) sets of packets over the different ports, \(Q_m\) is a possibly infinite set of states, \(q^0_m \in Q_m\) is the initial state, and \(\delta _m: Q_m \times \varSigma \rightarrow 2^{\varGamma \times Q_m}\) is the transition relation. Note that the alphabet \(\varSigma \) is finite (since abstract packets are considered). We extend \(\delta _m\) to sequences \(h \in (P\times \mathsf {Pr})^*\) in the natural way: \(\delta _m(q,\epsilon ) = \{(\epsilon , q)\}\) and \(\delta _m(q,h\cdot (p, pr )) = \{ (\gamma _i \cdot o', q') \mid \exists q_i \in Q_m.\, (\gamma _i,q_i) \in \delta _m(q,h) \wedge (o',q') \in \delta _m(q_i,(p, pr ))\}\). The language of a state \(q \in Q_m\) is \(L(q) = \{(h,\gamma ) \in (P\times \mathsf {Pr})^* \times (P\times \mathsf {Pr})^* \mid (\gamma ,q') \in \delta _m(q,h)\}\). The language of F, denoted L(F), is the language of \(q^0_m\). We also define the set of histories leading to \(q \in Q_m\) as \(h(q) = \{h \in (P\times \mathsf {Pr})^* \mid (\gamma ,q) \in \delta _m(q^0_m,h) \}\).

If F is deterministic, i.e., \(|\delta _m(q,(p, pr ))|\le 1\), then every history leads to at most one state and output, in which case F defines a possibly partial forwarding function \(\mathsf {f}: (P\times \mathsf {Pr})^* \times (P \times \mathsf {Pr}) \rightarrow 2^{P\times \mathsf {Pr}}\) where \(\mathsf {f}(h, (p, pr )) = o\) for the (unique) output o such that \((h \cdot (p, pr ), \gamma \cdot o) \in L(F)\). \(\mathsf {f}\) defines the (possibly empty) set of output packets (paired with output ports) that m will send to its neighbors following every history h of packets that m received in the past and input packet p arriving on input port pr. If F is nondeterministic, a forwarding relation is defined in a similar way.

Note that every forwarding function \(\mathsf {f}\) can be defined by an infinite-state deterministic transducer: \(Q_m\) will include a state for every possible history, with \(\epsilon \) as the initial state. \(\delta _m\) will map a state and an input packet to the set of output packets as defined by \(\mathsf {f}\), and will change the state by appending the packet to the history.

Finite-state Middleboxes. Arbitrary middlebox functionality, defined via infinite-state transducers, makes middleboxes Turing-complete, and hence impossible to analyze. To make the analysis tractable, we focus on abstract middleboxes, whose forwarding behavior is defined by finite-state transducers. Nondeterminsm can then be used to overapproximate the behavior of a concrete, possibly infinite-state, middlebox via a finite-state abstract middlebox, allowing a sound abstraction w.r.t. safety. Note that when nondeterministic transducers are considered, the correspondence between packet histories and transducer states no longer holds, as a single history might lead to multiple states.

In the sequel, unless explicitly stated otherwise, we consider abstract middleboxes. We identify a middlebox with its forwarding relation and the transducer that implements it, and use m to denote each of them.

Symbolic Representation of Middleboxes. We use a symbolic representation of finite-state middleboxes, where a state of m is described by the valuation of a finite set of relations \(R_1,\ldots , R_k\) defined over finite elements (e.g., packet header fields). The transition relation \(\delta _m\) is also described symbolically using (nondeterministic) update operations of the relations and output. Technically, we use guarded commands, where guards are Boolean expressions over relation membership predicates of the form \(\overline{e}\) in R and element equalities \(e_1 = e_2\). Each \(e_i\) is either a constant or a variable that refers to packet fields. Commands are of the form: (i) insert tuple \(\overline{e}\) to relation R, (ii) remove tuple \(\overline{e}\) from relation R, and (iii) output set of tuples.

Example 1

Figure 2a contains a symbolic representation of a hole-punching Firewall which uses a unary relation trusted. It assumes that port 1 connects hosts inside a private organization to the firewall and that port 2 connects public hosts. By default, messages from public hosts are considered untrusted and are dropped. trusted stores public hosts that become trusted once they receive a packet from private hosts.

Fig. 2.
figure 2

Symbolic representation of middleboxes.

Figure 2b contains a simplified, nondeterminitic, version of a Proxy server (or cache server). A proxy stores copies of documents (packet payloads) that passed through it. Subsequent requests for those documents are provided by the proxy, rather than being forwarded. Our modelling abstracts away the packet payloads and keeps only their types. Consequently we use nondeterminism to also account for different requests with the same type. The internal relation \(\texttt {cache}\) stores responses for packet types.

2.2 Concrete (FIFO) Network Semantics

The semantics of a network is given by a transition system defined over a set of configurations. In order to define the semantics we first need to define the notion of channels which capture the transmission of packets in the network. Formally, each (undirected) edge \(\{u,v\}\in E\) in the network induces two directed channels: (uv) and (vu). The channel (vu) is the ingress channel of u, as well as the egress channel of v. It consists of the sequence of packets that were sent from v to u and were not yet received by u (and similarly for the channel (uv)). The capacity of channels is unbounded, that is, the sequence of packets may be arbitrarily long.

Configurations and Runs. A configuration of a network consists of the content of each channel and the state of every middlebox. The initial configuration of a network consists of empty channels and initial states for all middleboxes. A configuration \(c_2\) is a successor of configuration \(c_1\) if it can be obtained by either: (i) some host h sending a sequence of packets \(p_1,\dots ,p_\ell \in h_P\) to a neighbor, thus appending these packets to the corresponding channel; or (ii) some middlebox m processing a packet p from the head of one of its ingress channels, changing its state to \(q'\) and appending output o to its egress channels if \((o,q')\in \delta _m(q,(p, pr ))\) (where q is the current state of m and \( pr \) is the port associated with the ingress channel). This model corresponds to asynchronous networks with non-deterministic event order.

A run of a network from configuration \(c_0\) is a sequence of configurations \(c_0,c_1,c_2,\dots \) such that \(c_{i+1}\) is a successor configuration of \(c_{i}\). A run is a run from the initial configuration. The set of reachable configurations from a configuration \(c_i\) is the set of all configurations that reside on a run from \(c_i\). The set of reachable configurations of a network is the set of reachable configurations from the initial configuration.

3 Verification of Safety Properties in Stateful Networks

In this section we define the safety verification problem in stateful networks, as well as the special case of isolation. We prove their undecidability w.r.t. the FIFO semantics.

To describe safety properties, we augment middleboxes with a special abort state that is reached whenever \(\delta _m(q,(p,pr)) = \emptyset \), i.e., the forwarding behavior is undefined (not to be confused with the case where \((\emptyset , q') \in \delta _m(q, (p, pr ))\) for some \(q' \in Q_m\)). This lets middleboxes function as “monitors” for safety properties. If \(\delta _m(q,(p,pr)) = \emptyset \), and \(h \in h(q)\), we say that m aborts on \(h \cdot (p, pr )\) (and every extension thereof). Similarly, we augment the symbolic representation with an abort command.

We define abort configurations as network configurations where at least one middlebox is in an abort state.

Safety. The input to the safety problem consists of a network \(\mathsf {N}\) (that possibly contains property middleboxes). The output is \(\text{ True }\) if no abort configuration is reachable in \(\mathsf {N}\), and \(\text{ False }\) otherwise.

Isolation. An important example of a safety property is isolation. In the isolation problem, the input is a network \(\mathsf {N}\), a set of hosts \(H_i \subseteq H\) and a forbidden set of packets \(P_i\subseteq P\). The output is \(\text{ True }\) if there is no run of \(\mathsf {N}\) in which a host from \(H_i\) receives a packet from \(P_i\), and \(\text{ False }\) otherwise. The isolation problem can be formulated as a safety problem by introducing an isolation middlebox \(m_{h_i}\) for every host \(h_i \in H_i\). The role of \(m_{h_i}\) is to monitor all traffic to \(h_i\), and abort if a forbidden packet \(p \in P_i\) arrives. All other packets are forwarded to \(h_i\). Clearly, isolation holds if and only if the resulting network is safe.

Example 2

Figure 3 shows several examples of interesting middlebox topologies for verification. In all of the topologies shown we want to verify a variant of the isolation property. In Fig. 3a we want to verify that A, a host, cannot send more than a fixed number of packets to B. Here \(r_1\) and \(r_2\) are rate limiters, i.e., they count the number of packets they have seen going from one host to the other, and lb is a load balancer that evenly spreads packets from A along both paths (to minimize the load on any one path). In Fig. 3b we want to ensure that host A cannot access data that originates in \(S_1\), but should be allowed to access data from \(S_2\), where f is a firewall and c is a proxy (cache) server. Finally in Fig. 3c we show a multi-tenant datacenter (e.g., Amazon EC2), where many independent tenants insert rules into firewalls (\(f_1\) and \(f_2\)) and we want to ensure that the overall behavior of these rules is correct. For example, we would like to ensure that \(pri_1^1\) cannot communicate with \(pri_2^1\), and \(pub_2^1\) communicates with \(pri_1^1\) only if \(pri_1^1\) initiates the connection.

Fig. 3.
figure 3

Interesting network topologies for verification.

Undecidability of Safety w.r.t. the FIFO Semantics. We prove undecidability even in networks with no forwarding loops. We show that undecidability holds for a network with a DAG topology (i.e., a network with uni-directional links and no directed cycles).

Theorem 1

The safety problem w.r.t. the FIFO network semantics is undecidable even for networks with finite-state middleboxes and without forwarding loops.

The proof of the theorem uses a reduction from the (undecidable) halting problem of a two-counter machine to the complement of the isolation problem. Interestingly, the reduction constructs a network with only three middleboxes, that do not change the packet header (namely, they just forward packets).

4 Abstract Network Semantics

In this section we define an abstract network semantics, called the unordered semantics, which recovers decidability of the safety problem.

In the concrete (FIFO) network semantics channels are ordered. In an ordered channel, if a packet \(p_1\) precedes a packet \(p_2\) in an ingress channel of some middlebox, then the middlebox will receive packet \(p_1\) before it receives packet \(p_2\). We abstract this semantics by an unordered network semantics, where the channels are unordered, i.e., there is no restriction on the order in which a middlebox receives packets from its ingress channel. In this case, the sequence of pending packets in a channel can be abstracted by a multiset of packets. Namely, the only relevant information is how many occurrences each packet has in the channel. The definitions of configurations and runs w.r.t. the unordered semantics are adapted accordingly.

Remark 1

Every run w.r.t. the FIFO network semantics is also a run w.r.t. the unordered semantics. Therefore, if safety holds w.r.t. the unordered semantics, then it also holds for the FIFO semantics, making the unordered semantics a sound abstraction of the FIFO semantics w.r.t. safety. The abstraction can introduce false alarms, where a violation exists w.r.t. the unordered semantics but not w.r.t. the concrete semantics. Still, in many cases, the abstraction is precise enough to enable verification. In particular, in Lemma 4 we show that for an important class of networks, the two semantics coincide w.r.t. safety.

Decidability of Safety w.r.t. the Unordered Semantics. In the unordered semantics, the network forms a special case of monotone transition systems: We define a partial order \(\le \) between network configurations such that \(c_1 \le c_2\) if the middlebox states in \(c_1\) and \(c_2\) are the same and \(c_2\) has at least the same packets (for every packet type) in every channel. The network is monotone in the sense that for every run from \(c_1\) there is a corresponding run from any bigger \(c_2\), since more packets over a channel can only add possible scenarios. The partial order is trivially a well-quasi-order (as the number of packets cannot be negative), and the predecessor relation is obviously computable. The classical results in [3, 12] prove that in monotone transition systems a backward reachability algorithm always terminates and thus, the safety problem is decidable. Formal arguments and complexity bounds are provided by Theorem 4.

5 Classification of Stateful Middleboxes

Encouraged by the decidability of safety w.r.t. the unordered semantics, we are now interested in investigating its complexity. As a first step, in this section, we identify three special classes of forwarding behaviors of middleboxes within the class of arbitrary middleboxes. Namely, stateless, increasing, and progressing middleboxes. We show that these classes capture the behaviors of real world middleboxes. The classes naturally extend to classes of networks: a network is stateless (respectively, increasing, progressing or arbitrary) if all of its middleboxes are. As we show in Sec. 6, each of these classes results in a different complexity of the safety problem.

Stateless Middlebox. A middlebox m is stateless if it can be implemented as a transducer with a single state (in addition to the abort state), i.e., its forwarding behavior does not depend on its history.

Increasing Middlebox. A middlebox m is increasing if its forwarding relation is monotonically increasing w.r.t. its history, where histories are ordered by the subsequence relationFootnote 2, denoted by \(\sqsubseteq \). Formally, a middlebox m is increasing if for every two histories \(h_1, h_2 \in (P\times \mathsf {Pr})^*\): if \(h_1 \sqsubseteq h_2\), then for every packet p and port \( pr \), if \((h_1 \cdot (p, pr ), \gamma _1 \cdot o_1) \in L_m\) then either m aborts on \(h_2\cdot (p, pr )\) or there is \(\gamma _2 \cdot o_2\) s.t. \((h_2 \cdot (p, pr ), \gamma _2 \cdot o_2) \in L_m\) and \(o_1 \subseteq o_2\), where \(L_m\) is the language of m’s transducer.

Progressing Middlebox. In order to define progressing middleboxes, we define an equivalence relation between middlebox states based on their forwarding behavior. States \(q,q'\) are equivalent, denoted \(q_1 \approx q_2\), if \(L(q_1) = L(q_2)\). A middlebox m is progressing if it can be implemented by a transducer in which whenever the state is changed into a non-equivalent state, it will never return to an equivalent state. Formally, if \((o',q') \in \delta _m(q,(p, pr ))\) and \(q' \not \approx q\) (where \(q,q'\) are reachable states of m) then for any history \(h\in (P\times \mathsf {Pr})^*\), if \((\gamma '', q'') \in \delta _m(q',h)\) then \(q'' \not \approx q\).

The next lemma summarizes the hierarchy of the classes (as illustrated by Fig. 1).

Lemma 1

  • Any stateless middlebox is also increasing.

  • Any increasing middlebox is also progressing.

Syntactic Characterization of Middlebox Classes. The classes of middleboxes defined above can be characterized via syntactic restrictions on their symbolic representation.

A middlebox representation is syntactically stateless if its representation does not use any insert or remove command on any relation. A middlebox representation is syntactically increasing if its representation does not use the remove command on any relation, and does not include any insert command under guards that include negated membership predicates. A middlebox representation is syntactically progressing if its representation does not use the remove command on any relation.

Lemma 2

A middlebox is stateless (respectively increasing, progressing) if and only if it has a stateless (respectively increasing, progressing) representation.

5.1 Examples

In this subsection, we introduce several middleboxes, each of which resides in one of the classes of the hierarchy presented above.

ACL Switches. An ACL switch has a fixed access control list (ACL) that indicates which packets it should forward and which packets it should discard. Typically the rules in the list refer to the port number or to hosts that are allowed to use a certain service. As such, the forwarding policy of an ACL switch is based only on the source host and/or ingress port of the current packet, and does not depend on previous packets. Hence, an ACL switch can be implemented by a stateless middlebox.

Hole-punching Firewalls. A hole-punching firewall is described in Example 1. As the set of trusted hosts depends on the history of the middlebox, a hole punching firewall cannot be captured by a stateless middlebox. (Formally, the same packet is handled differently when it follows different histories.) On the other hand, it is increasing. If for a certain history a host is trusted, then any additional packets (in the past or in the future) will not make it untrusted.

Learning Switch. A learning switch dynamically learns the topology of the network and constructs a routing table accordingly. Initially, the routing table of the switch is empty. For every host h the switch remembers the first port from which a packet with source h has arrived. When a packet arrives, if the port of the destination host is known, then the packet is forwarded to that port; otherwise, the packet is forwarded to all connected ports excluding the input-port.

A learning switch is a progressing middlebox. Intuitively, after the middlebox’s forwarding function has changed to incorporate the destination port for a certain host h, it will never revert to a state in which it has to flood a packet destined for h. A learning switch is however, not an increasing middlebox, as packets destined for a host whose location is not known are initially flooded, but after location of the host is learned, a single copy of all subsequent packets are sent.

Proxy Server. The Proxy server as described in Example 1 is an increasing middlebox. After it has stored a response, it nondeterministically replies with the stored response, or sends the request to the server again. However, in a concrete network model that does not abstract away the packet payload, a proxy is a progressing middlebox. Once a new request is responded by a proxy the forwarding behavior changes as it takes into account the new response, and it never returns to the previous forwarding behavior (as it does not “forget” the response). However, such a proxy is not an increasing middlebox: while it behaves in a monotonically increasing manner over its request port, it behaves in a monotonically decreasing manner over the response port.

Round-Robin Load Balancer. A load balancer is a device that distributes network traffic across a number of servers. In its simplest implementation, a round-robin balancer with n out-ports (each connected to a server) forwards the i-th packet it receives to out-port \(i\pmod n\). Round-robin load balancers are not progressing middleboxes, as the same forwarding function repeats after every cycle of n packets.

Remark 2

In practice, middlebox behavior can also be affected by timeouts and session termination. For example, in a firewall, a trusted host may become untrusted when a session terminates (which makes the firewall behavior no longer increasing). In this work, we do not model timeouts and session termination. In many practical cases, such as firewalls, resets can only prevent packets from being forwarded and therefore restrict reachability, thus not causing safety violations.

6 Complexity of Safety W.r.t. the Unordered Semantics

When considering the unordered network semantics, the safety problem becomes decidable for networks with finite-state middleboxes. In this section, we analyze its complexity. We provide tight bounds, as well as algorithms with matching complexity. The complexity bounds are w.r.t the input size, namely, (i) the number of hosts; (ii) number of middleboxes; and (iii) the encoding size of the middleboxes functionality, i.e., the size of the explicit state machine (if the encoding is explicit) or the number of characters in the symbolic representation (if the encoding is symbolic).

The following lemma summarizes the obtained lower bounds:

Lemma 3

The safety problem w.r.t. the unordered network semantics is coNP-hard for progressing networks, and EXPSPACE-hard for arbitrary stateful networks.

The coNP-hardness result is proved by a reduction from the complement of the Hamiltonian Path problem. The constructed network contains only stateless middleboxes and learning switches, making the coNP-hardness result apply already to such networks, which are used in practice. The second part of the lemma is proved by a reduction from the control state reachability problem of vector addition systems with states (VASS) which is known to be EXPSPACE-complete [10].

Upper Bounds. The rest of this section provides complexity upper bounds for the safety problem of stateful networks w.r.t. the unordered semantics of networks. Our complexity analysis considers symbolic representations of middleboxes (which might be exponentially more succinct than explicit-state representations). The obtained upper bounds match the lower bounds from Lemma 3 (hence, the bounds are tight).

Remark 3

The complexity upper bounds we present are under the assumption that all relations used to define middlebox states may have at most polynomial number of elements (polynomial in the size of the network and the size of the middlebox representation). To enforce this limitation we assume that the arity of relations is constant.

6.1 Unordered Safety of Increasing Networks is in PTIME

In this section, we show that safety of syntactically increasing networks is in PTIME. Further, we show that for increasing networks, safety w.r.t. the unordered semantics and the FIFO semantics coincide. As such, the polynomial upper bound applies to both.

Fig. 4.
figure 4

Safety checking of increasing networks.

Figure 4 presents a polynomial algorithm for determining safety of a syntactically increasing network. The algorithm performs a fixed-point computation of the set of all tuples present in middlebox relations in reachable middlebox states, as well as the set of all different packets transmitted in the network. For every middlebox \(m \in M\), the algorithm maintains the following sets:

  • \( StateData (m)\): a set of pairs of the form \((R,\overline{d})\) where R is a relation of m, and \(\overline{d}\) is a tuple in the domain of R, indicating that there is a run in which \(\overline{d}\in R\).

  • \( PacketData (m)\): a set of pairs of the form \((p, pr )\), where p is a packet and \( pr \) is a port of m, indicating that p can reach m from port \( pr \).

\( StateData (m)\) is initialized to reflect the initial values of all middlebox relations. \( PacketData (m)\) is initialized to include the packets that can be sent from neighbor hosts. As long as a fixed-point is not reached, the algorithm iterates over all middleboxes and their packet data. For each middlebox m and \((p, pr ) \in PacketData (m)\), m is run over \((p, pr )\) from the state q in which every relation R contains all the tuples \(\overline{d}\) such that \((R,\overline{d}) \in StateData (m)\). The sets \( StateData (m)\) and \( PacketData (m')\) for every neighbor \(m'\) of m, are updated to reflect the discovery of more elements in the relations (more reachable states), and more packets that can be transmitted.

As the algorithm only adds reachable states and packets, its running time is polynomial and bounded by \(|M|(|P||\mathsf {Pr}|\sum |R_i|)^2\).

The correctness of the algorithm relies on the property of increasing networks that if a packet is sent in some run from a reachable configuration, then a run where it is sent exists from every reachable configuration. The same goes for elements that are added to relations. Intuitively, this ensures that even though the algorithm considers “accumulative” middlebox states (by accumulating relation values) rather than exploring all possible reachable states, it does not miss any violation of safety. We conclude:

Theorem 2

The safety problem of syntactically increasing networks w.r.t. the unordered semantics is in PTIME.

Remark 4

If n-tag packet headers are allowed, i.e. \(P = H \times H \times T_1 \ldots \times T_n\), then |P| is no longer polynomial in the network representation, damaging the complexity analysis of the algorithm. In fact, in this case the safety problem w.r.t. the unordered semantics becomes PSPACE-hard even for stateless middleboxes (this is proved by reduction from the emptiness problem of the intersection of n automata).

Recall that in general, safety w.r.t. the FIFO semantics and the unordered semantics do not coincide. However, the following lemma shows that for increasing networks they do, making the same algorithm and complexity analysis applicable. The proof utilizes the property that in increasing networks if a packet p reaches a middlebox m once (in either semantics), then it can reach m again, thus enabling the simulation of unordered channels with ordered ones. The lemma applies also to infinite-state middleboxes.

Lemma 4

Let \(\mathsf {N}\) be an increasing network. Then the output of the safety problem in \(\mathsf {N}\) w.r.t. the FIFO semantics and w.r.t. the unordered semantics is identical.

6.2 Unordered Safety of Progressing Networks is in CoNP

We prove coNP-membership of the safety problem in syntactically progressing networks by proving that there exists a witness run for safety violation if and only if there exists a “short” witness run, where a witness run for safety violation is a run from the initial configuration in which at least one middlebox reaches an abort state. The key observation is formalized by the following lemma:

Lemma 5

Let \(\mathsf {N}\) be a syntactically progressing network whose middleboxes are defined via relations \(R_1,\dots ,R_n\) (in total). Then there is a run ending in an abort state if and only if there is such a run whose length is at most \((\sum _{i=1}^n |R_i|)^3|P||M|\).

The proof of the lemma considers the network states that arise in a run. A network state consists of the values of \((R_1,\dots ,R_n)\), i.e., it captures the states of all middleboxes (not to be confused with a network configuration, which also includes the content of every channel). In order to construct a shorter run, we bound both the number of different network states in a run and the number of steps in which a run stays in the same state. The former is bounded by \(\sum _{i=1}^n |R_i|\) due to the progress of the network. To provide a bound for the latter, we analyze the packets that “affect” the run, utilizing the property that steps that process packets that do not affect the run can be omitted.

Since the size of each relation is polynomial in the size of the network, and combined with the hardness result from Lemma 3, we conclude:

Theorem 3

The safety problem of syntactically progressing networks w.r.t. the unordered semantics is coNP-complete.

6.3 Unordered Safety of Arbitrary Networks is in EXPSPACE

In this section we show how to solve the non-safety problem of symbolic networks by a reduction to the coverability problem of vector addition systems (VAS), a.k.a. petri-nets, which is EXPSPACE-complete [26].

A VAS is a pair \((x_0\in \mathbb {N}^k, X\subset \mathbb {Z}^k)\), where \(x_0\) is the initial value vector and X is a set of transition vectors, each with k dimensions. A finite run in the VAS is a sequence of transitions \(x_1,x_2,\dots ,x_\ell \), such that for every \(i\in \{1,\dots ,\ell \}\) the sum \(x_0 + x_1 +\dots + x_i\) is non-negative in all dimensions. The coverability problem asks whether a VAS has a run \(x_1,x_2,\dots ,x_\ell \) with \(\sum _{i=0}^\ell x_i \ge y\), where y is an input vector.

VAS Construction. We sketch a polynomial encoding of a network as a VAS. Roughly speaking, the transitions of the VAS are used to simulate the processing of packets in the network. Their non-deterministic nature captures the non-deterministic order of network events. We first introduce the VAS dimensions and their roles in the simulation.

Channel Simulation: To keep track of the packets over the unbounded channels, we assign a packet dimension to every packet \(p\in P\) and every channel. The initial value of each packet dimension is 0, it is incremented whenever a packet is added to a channel, and decremented whenever a packet is processed.

Relation Simulation: To keep track of relation values, we assign two dimensions, active and inactive, to every relation R and every tuple \(\overline{d}\) in the domain of R. The active dimension indicates whether \(\overline{d} \in R\) and the inactive one indicates whether \(\overline{d} \not \in R\). Both dimensions will have only values of 0 or 1. We need two dimensions since the VAS semantics does not allow to encode negative (e.g., non-membership) conditions.

Single Step Simulation: To make sure that no two packets are simultaneously processed, we introduce a scheduler dimension. The scheduler dimension has initial value 1, it is decremented whenever a packet processing starts, and incremented when it ends. In addition, to keep track of which command needs to be executed, we assign a command dimension to every guard and command, including an abort dimension (if an abort command exists). The guard/command dimension has value 1 when the command needs to be executed. Finally, to keep track of values of variables (e.g., src, dst, tag, prt), we assign a dimension for every possible value d of variable \(e_i\). The dimension of \((e_i,d)\) has value 1 if and only if \(e_i\) has value d.

The VAS transitions increment and decrement these dimensions to simulate the start of a packet processing event, as well as the execution of each guarded command. In particular, decrements are used to enforce the execution of transitions only when the dimension has value 1 (and not 0).

Non-safety of the network then amounts to a run in the VAS where an abort dimension gets a positive value. The reduction, combined with the lower bound implies:

Theorem 4

The safety problem of arbitrary stateful networks w.r.t. the unordered semantics is EXPSPACE-complete.

7 Implementation and Case Studies

In this section, we describe a prototype implementation of a tool for verification of stateful networks, and describe our initial experience while running the tool on the networks listed in Example 2 and illustrated in Fig. 3. For the experiments we used quad core Intel Core i7-4790 CPU with 32 GB memory.

Increasing Middleboxes. Increasing networks are verified using LogicBlox, a Datalog based database system [5]. The Multi-Tenant Datacenter example is an increasing network. Our tool produced a datalog program with 35 predicates, 153 rules and 29 facts. LogicBlox successfully reached a fixed point in 3s, and proved all required properties.

Arbitrary Middleboxes. Progressing and Arbitrary networks are verified using LOLA, a Petri-Net model checker [1, 28]. In the Load Balancer and Rate Limiter example our tool created a P/T net with 243 places and 663 transitions; it was successfully verified in 30ms. In the Firewall and Proxy example our tool produced a P/T net with 530 places and 4447 transitions. LOLA successfully verified the resulting petri-net in 0.2s.

8 Conclusion and Related Work

In this paper, we investigated the complexity of reasoning about stateful networks. We developed three algorithms and several lower bounds. In the future we hope to develop practical verification methods utilizing the results in this paper. Below we survey some of the most closely related work.

Topology-independent Verification. The earliest use of formal verification in networking focused on proving correctness and checking security properties for protocols [11, 27]. Recent works such FlowLog [21] and VeriCon [6] also aim to verify the correctness of a given middlebox implementation w.r.t any possible network topology and configuration, e.g., flow table entries only contain forwarding rules from trusted hosts.

Immutable Topology-dependent Verification. Recent efforts in network verification [4, 9, 14, 16, 17, 20, 30, 32] have focused on verifying network properties by analyzing forwarding tables. Some of these tools including HSA [15], Libra [35] and VeriFlow [17]. These tools perform near real-time verification of simple properties, but they cannot handle dynamic (mutable) datapaths.

Mutable Topology-dependent Verification. SymNet [33] has suggested the need to extend these mechanisms to handle mutable datapath elements. In their mechanism the mutable middlebox states are encoded in the packet header. This technique is only applicable when state is not shared across a flow (i.e., the middlebox can punch holes, but do no more), and will not work for cache servers or learning switches.

The work in [24] is the most similar to our model. Their work considers Python-like syntax enriched with uninterpreted functions that model complicated functionality. However [24] do not define formal network semantic (e.g., FIFO vs ordered channels) and do not give any formal claim on the complexity of the solution.

Channel Systems. Channel systems, also called Finite State Communicating Machines, are systems of finite state automata that communicate via asynchronous unbounded FIFO channels [7, 8]. They are a natural model for asynchronous communication protocols. Verification of such systems in undecidable. Abdulla and Jonsson [2] introduced lossy channel systems where messages can be lost in transit. In their model the reachability problem is decidable but has a non-primitive lower bound [29].

In this work we use unordered (non-lossy) channels as a different relaxation for channel systems. The unordered semantics over-approximates the lossy semantics w.r.t. safety, as any violating run w.r.t. the lossy semantics can be simulated by a run w.r.t. the unordered semantics where “lost” packets are starved until the violation occurs. The unordered semantics admits verification procedures with elementary complexity, and turns out to be sufficiently precise for many network protocols.