Abstract
Static analysis of low-level programs (C or LLVM) requires modeling memory. To strike a good balance between precision and performance, most static analyzers rely on the C memory model in which a pointer is a numerical offset within a memory object. Finite partitioning of the address space is a common abstraction. For instance, the allocation-site abstraction creates partitions by merging all objects created at the same allocation site. Recency abstraction refines the allocation-site abstraction by distinguishing the most recent allocated memory object from the previous ones. Unfortunately, these abstractions are not often precise enough to infer invariants that are expressed over the contents of dynamically allocated data-structures such as linked lists. In those cases, more expensive abstractions such as shapes that consider connectivity patterns between memory locations are often needed.
Instead of resorting to expensive memory abstractions, we propose a new memory model, called region-based memory model (RBMM). RBMM is a refinement of the C memory model in which pointers have an extra component called regions. Thus, a memory object can spawn multiple regions which can greatly limit aliasing since regions are pairwise disjoint. Since RBMM requires that each memory instruction refers explicitly to a region, we first present a new intermediate representation (IR) based on regions which is the input of our abstract interpreter Crab. Second, we show how abstractions such as allocation-site and recency can be easily adapted to RBMM. Third, we evaluate Crab using our new IR and a simple allocation-site abstraction on widely-used C projects.
Jorge A. Navas has been supported in part by NSF grant 1816936. We also acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
In this paper, we use the terms smashing and allocation-site interchangeably.
- 2.
Usually, the term reference is used as an alias for an address that precludes pointer arithmetic. We do not place such a restriction. We use reference as an alternative to pointer, to stress that a reference belongs to a statically known region.
- 3.
Memory is word-addressable if all reads and writes access to the same, fixed amount of bytes (the CPU word length). Note that any C program can be ported to word-addressable memory by adding extra instructions that fill a word with empty bytes before a memory write or extract some bytes from a word after a memory read.
- 4.
Informally, casting any pointer to a char* type is allowed but the opposite is not. Moreover, signedness and constness do not affect the strict aliasing rules.
- 5.
Crab proves the four properties using smashing abstraction and unrolling the loop one iteration. Finite loop unrolling would not help Infer and IKOS.
- 6.
For instance, integer casts and casts between Boolean and integers are omitted.
- 7.
Note that the cell \(\langle 0,0 \rangle \) can either mean the null reference or the integer 0.
- 8.
We abuse notation and use \([\![.]\!](.)\) to evaluate integer and Boolean expressions defined as usual: \([\![c]\!](\sigma ) = c\) for constant c, \([\![v]\!](\langle \_,\_,numEnv,\_,\_,\_\rangle ) = numEnv(v)\) for variable v, \([\![e_1 + e_2]\!](\sigma ) = [\![e_1]\!](\sigma ) + [\![e_2]\!](\sigma )\) for expressions \(e_1\) and \(e_2\), etc.
- 9.
We use the notation \(\lambda x. \bot \) to represent the undefined function.
- 10.
Publicly available at https://github.com/seahorn/crab.
- 11.
SeaDsa, Clam, and Crab are also integrated into the SeaHorn verification framework [13].
- 12.
Publicly available at https://github.com/seahorn/clam.
- 13.
References
Balakrishnan, G., Reps, T.: Recency-abstraction for heap-allocated storage. In: Yi, K. (ed.) SAS 2006. LNCS, vol. 4134, pp. 221–239. Springer, Heidelberg (2006). https://doi.org/10.1007/11823230_15
Blanchet, B., et al.: A static analyzer for large safety-critical software. In: PLDI, pp. 196–207 (2003)
Blazy, S., Bühler, D., Yakobowski, B.: Structuring abstract interpreters through state and value abstractions. In: Bouajjani, A., Monniaux, D. (eds.) VMCAI 2017. LNCS, vol. 10145, pp. 112–130. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52234-0_7
Bouillaguet, Q., Bobot, F., Sighireanu, M., Yakobowski, B.: Exploiting pointer analysis in memory models for deductive verification. In: Enea, C., Piskac, R. (eds.) VMCAI 2019. LNCS, vol. 11388, pp. 160–182. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11245-5_8
Brat, G., Navas, J.A., Shi, N., Venet, A.: IKOS: a framework for static analysis based on abstract interpretation. In: Giannakopoulou, D., Salaün, G. (eds.) SEFM 2014. LNCS, vol. 8702, pp. 271–277. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10431-7_20
Calcagno, C., Distefano, D.: Infer: an automatic program verifier for memory safety of C programs. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 459–465. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_33
Chow, F., Chan, S., Liu, S.-M., Lo, R., Streich, M.: Effective representation of aliases and indirect memory operations in SSA form. In: Gyimóthy, T. (ed.) CC 1996. LNCS, vol. 1060, pp. 253–267. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61053-7_66
Conway, C.L., Dams, D., Namjoshi, K.S., Barrett, C.: Pointer analysis, conditional soundness, and proving the absence of errors. In: Alpuente, M., Vidal, G. (eds.) SAS 2008. LNCS, vol. 5079, pp. 62–77. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69166-2_5
Cornish, J.R.M., Gange, G., Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.: Analyzing array manipulating programs by program transformation. In: Proietti, M., Seki, H. (eds.) LOPSTR 2014. LNCS, vol. 8981, pp. 3–20. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-17822-6_1
Cuoq, P., Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-C: a software analysis perspective. In: Eleftherakis, G., Hinchey, M., Holcombe, M. (eds.) SEFM 2012. LNCS, vol. 7504, pp. 233–247. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33826-7_16
Filliâtre, J.-C., Marché, C.: Multi-prover verification of C programs. In: Davies, J., Schulte, W., Barnett, M. (eds.) ICFEM 2004. LNCS, vol. 3308, pp. 15–29. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30482-1_10
Gopan, D.: Numeric program analysis techniques with applications to array analysis and library summarization. Ph.D. thesis, University of Wisconsin (2007)
Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn verification framework. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 343–361. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_20
Gurfinkel, A., Navas, J.A.: A context-sensitive memory model for verification of C/C++ programs. In: Ranzato, F. (ed.) SAS 2017. LNCS, vol. 10422, pp. 148–168. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66706-5_8
Illous, H., Lemerre, M., Rival, X.: A relational shape abstract domain. In: Barrett, C., Davies, M., Kahsai, T. (eds.) NFM 2017. LNCS, vol. 10227, pp. 212–229. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57288-8_15
Jourdan, J., Laporte, V., Blazy, S., Leroy, X., Pichardie, D.: A formally-verified C static analyzer. In: POPL, pp. 247–259 (2015)
Journault, M., Miné, A., Monat, R., Ouadjaout, A.: Combinations of reusable abstract domains for a multilingual static analyzer. In: Chakraborty, S., Navas, J.A. (eds.) VSTTE 2019. LNCS, vol. 12031, pp. 1–18. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-41600-3_1
Journault, M., Miné, A., Ouadjaout, A.: Modular static analysis of string manipulations in C programs. In: Podelski, A. (ed.) SAS 2018. LNCS, vol. 11002, pp. 243–262. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99725-4_16
Kuderski, J., Navas, J.A., Gurfinkel, A.: Unification-based pointer analysis without oversharing. In: FMCAD, pp. 37–45 (2019)
Miné, A.: Field-sensitive value analysis of embedded C programs with union types and pointer arithmetics. In: LCTES, pp. 54–63 (2006)
Moy, Y.: Automatic modular static safety checking for C programs. Ph.D. thesis, Université Paris-Sud (2009)
Rakamarić, Z., Emmi, M.: SMACK: decoupling source language details from verifier implementations. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 106–113. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_7
Rakamarić, Z., Hu, A.J.: A scalable memory model for low-level code. In: Jones, N.D., Müller-Olm, M. (eds.) VMCAI 2009. LNCS, vol. 5403, pp. 290–304. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-93900-9_24
Sui, Y., Xue, J.: SVF: interprocedural static value-flow analysis in LLVM. In: CC, pp. 265–266 (2016)
Venet, A.: A scalable nonuniform pointer analysis for embedded programs. In: Giacobazzi, R. (ed.) SAS 2004. LNCS, vol. 3148, pp. 149–164. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27864-1_13
Wang, W., Barrett, C., Wies, T.: Cascade 2.0. In: McMillan, K.L., Rival, X. (eds.) VMCAI 2014. LNCS, vol. 8318, pp. 142–160. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54013-4_9
Warren, R., Hermenegildo, M.V., Debray, S.K.: On the practicality of global flow analysis of logic programs. In: ICLP, pp. 684–699 (1988)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Appendices
A Motivating Example from Sect. 2 in CrabIR
In this section, we show the translation of our motivating example from Sect. 2 to CrabIR. For readability purposes, variable names have been renamed to match those names used in the C program. We have also omitted types when they can be inferred from the context. We show two functions: mk_list in Fig. 11 and main in Fig. 12. The function init is translated in a very similar manner.
B Auxiliary Functions for Concrete Semantics
(See Fig. 14).
C Concrete Semantics of reftoint and inttoref
The concrete semantics of reftoint and inttoref is shown in Fig. 15. It is straightforward by delegating to the functions \(\mathsf {cell2Addr} \) and \(\mathsf {addr2Cell} \). The most interesting part occurs in \(ref := \texttt {inttoref}(rgn, x)\) where the program state (\(rgnAddrs\)) must add the new reference ref to the set of owned addresses by the region rgn.
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Gurfinkel, A., Navas, J.A. (2022). Abstract Interpretation of LLVM with a Region-Based Memory Model. In: Bloem, R., Dimitrova, R., Fan, C., Sharygina, N. (eds) Software Verification. NSV VSTTE 2021 2021. Lecture Notes in Computer Science(), vol 13124. Springer, Cham. https://doi.org/10.1007/978-3-030-95561-8_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-95561-8_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95560-1
Online ISBN: 978-3-030-95561-8
eBook Packages: Computer ScienceComputer Science (R0)