logo

SCIENCE CHINA Information Sciences, Volume 62, Issue 10: 200101(2019) https://doi.org/10.1007/s11432-018-9825-3

Evaluation of model checkers by verifying message passing programs

More info
  • ReceivedJul 30, 2018
  • AcceptedMar 6, 2019
  • PublishedSep 3, 2019

Abstract

Benchmarks and evaluation are important for the development of techniques and tools. Studies regarding evaluation of model checkers by large-scale benchmarks are few. The lack of such studies is mainly because of the language difference of existing model checkers and the requirement of intensive labor in building models. In this study, we present a large-scale benchmark for evaluating model checkers whose inputs are concurrent models. The benchmark consists of 2318 models that are generated automatically from real-world message passing interface (MPI) programs. The complexities of the models have been inspected to be well distributed and suitable for evaluating model checkers. Based on the benchmark, we have evaluated five state-of-the-art model checkers, i.e., PAT, FDR, Spin, PRISM, and NuSMV, by verifying the deadlock freedom property. The evaluation results demonstrate the ability and performance difference of these model checkers in verifying message passing programs.


Acknowledgment

This work was supported by National Key RD Program of China (Grant No. 2017YFB1001802) and National Natural Science Foundation of China (Garnt Nos. 61472440, 61632015, 61690203, 61532007).


References

[1] Clarke E M, Grumberg O, Peled D A. Model Checking. Cambridge: MIT Press, 2001. Google Scholar

[2] Frappier M, Fraikin B, Chossart R, et al. Comparison of model checking tools for information systems. In: Proceedings of the 12th International Conference on Formal Engineering Methods, 2010. 581--596. Google Scholar

[3] Pelánek R. BEEM: benchmarks for explicit model checkers. In: Proceedings of the 14th International SPIN Workshop on Model Checking Software, 2007. 263--267. Google Scholar

[4] Gopalakrishnan G, Kirby R M, Siegel S. Formal analysis of MPI-based parallel programs. Commun ACM, 2011, 54: 82-91 CrossRef Google Scholar

[5] Siegel S F. Verifying parallel programs with mpi-spin. In: Proceedings of Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2007. 13--14. Google Scholar

[6] Luo Z Q, Zheng M C, Siegel S F. Verification of MPI programs using CIVL. In: Proceedings of the 24th European MPI Users' Group Meeting, 2017. 6: 1--11. Google Scholar

[7] Yu H B, Chen Z B, Fu X J, et al. Combining symbolic execution and model checking to verify MPI programs. 2018,. arXiv Google Scholar

[8] King J C. Symbolic execution and program testing. Commun ACM, 1976, 19: 385-394 CrossRef Google Scholar

[9] Gibson-Robinson T, Armstrong P, Boulgakov A, et al. A Modern Refinement Checker for CSP. In: Proceedings of Tools and Algorithms for the Construction and Analysis of Systems, 2014. 187--201. Google Scholar

[10] Lattner C. Llvm and clang: next generation compiler technology. In: Proceedings of the BSD Conference, 2008. 1--2. Google Scholar

[11] Hoare C A R. Communicating Sequential Processes. Upper Saddle River: Prentice-Hall, 1985. Google Scholar

[12] Scattergood J B. The semantics and implementation of machine-readable CSP. Dissertation for Ph.D. Degree. Oxford: University of Oxford, 1998. Google Scholar

[13] McMillan K L. Symbolic model checking. Kluwer, 1993. Google Scholar

[14] Baier C, Katoen J. Principles of Model Checking. Cambridge: MIT Press, 2008. Google Scholar

[15] Siegel S F, Zirkel T K. TASS: The Toolkit for Accurate Scientific Software. MathComputSci, 2011, 5: 395-426 CrossRef Google Scholar

[16] Xue R N, Liu X Z, Wu M, et al. Mpiwiz: subgroup reproducible replay of mpi applications. In: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009. 251--260. Google Scholar

[17] Müller M, de Supinski B, Gopalakrishnan G, et al. Dealing with mpi bugs at scale: Best practices, automatic detection, debugging, and formal verification. 2011. Google Scholar

[18] Vakkalanka S. Efficient dynamic verification algorithms for MPI applications. 2010. Google Scholar

[19] Thompson J D, Higgins D G, Gibson T J. Clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 1994. 22: 4673-4680. Google Scholar

[20] Lattner C, Adve V S. LLVM: A compilation framework for lifelong program analysis & transformation. In: Proceedings of the 2nd IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004. 75--88. Google Scholar

[21] Just R, Jalali D, Inozemtseva L, et al. Are mutants a valid substitute for real faults in software testing? In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2014. 654--665. Google Scholar

[22] Newman M E J. The Structure and Function of Complex Networks. SIAM Rev, 2003, 45: 167-256 CrossRef ADS Google Scholar

[23] Hermann L R. Laplacian-isoparametric grid generation scheme. J Eng Mech Div, 1976, 102: 749--907. Google Scholar

[24] Godefroid P. Partial-Order Methods for the Verification of Concurrent Systems - An Approach to the State-Explosion Problem. In: Lecture Notes in Computer Science. Berlin: Springer, 1996. Google Scholar

[25] McKeeman W M. Differential testing for software. Digit Tech J, 1998, 10: 100--107. Google Scholar

[26] Vakkalanka S S, Gopalakrishnan G, Kirby R M. Dynamic verification of MPI programs with reductions in presence of split operations and relaxed orderings. In: Proceedings of the 20th International Conference on Computer Aided Verification, 2008. 66--79. Google Scholar

[27] Vakkalanka S S, Sharma S, Gopalakrishnan G, et al. ISP: a tool for model checking MPI programs. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008. 285--286. Google Scholar

[28] Forejt V, Joshi S, Kroening D, et al. Precise predictive analysis for discovering communication deadlocks in MPI programs. ACM Trans Program Lang Syst, 2017, 39: 1--27. Google Scholar

[29] Blom S, van de Pol J, Weber M. Ltsmin: Distributed and symbolic reachability. In: Proceedings of the 22nd International Conference on Computer Aided Verification, 2010. 354--359. Google Scholar

[30] Lal A, Reps T. Reducing concurrent analysis under a context bound to sequential analysis. Form Methods Syst Des, 2009, 35: 73-97 CrossRef Google Scholar

[31] Laarman A, van de Pol J, Weber M. Boosting multi-core reachability performance with shared hash tables. In: Proceedings of the 10th International Conference on Formal Methods in Computer-Aided Design, 2010. 247--255. Google Scholar

[32] Kwiatkowska M Z, Norman G, Parker D. The PRISM benchmark suite. In: Proceedings of the 9th International Conference on Quantitative Evaluation of Systems, 2012. 203--204. Google Scholar

[33] Atiya A, Catao N, Lttgen G. Towards a benchmark for model checkers of asynchronous concurrent systems. University of Warwick United Kingdom, 2005. Google Scholar

  • Figure 1

    Framework of benchmark and evaluation.

  • Figure 9

    Framework of mpisv[7].

  • Figure 12

    Cumulative distribution function.

  • Figure 13

    (Color online) Complexity in programs. Complexity under (a) 2 process; (b) 4 process; (c) 6 process; (d) 8 process; (e) 10 process.

  • Figure 14

    (Color online) Percentage of wildcard receive operations.

  • Table 1   An example of an MPI program.
    $P_0$$P_1$$P_2$
    verb"IRecv(*, 1)"verb"ISend(0, 1)"verb"ISend(0, 1)"
    verb"Recv(1, 1)" verb"Barrier" verb"Barrier"
    verb"Barrier"
  •   

    Algorithm 1 Benchmark translation procedure

    Require:A benchmark model $M$, the number of processes $n$.

    Output:A tool-specific model $M'$.

    for $i~\leftarrow~(0,\ldots,n-1)$

    $P'_i$ := skip; // the tool specific model for $P_i$.

    for ${\rm~op}~\leftarrow~{P}_i$

    if ${\rm~op}~=~\mathsf{Barrier}$ then

    ${P}'_i$ := ${P}'_i~~;~~B$;ELSIF${\rm~op}~=~\mathsf{Wait}({\rm~req})$

    ${P}'_i$ := ${P}'_i~~;~~W_{\rm~req}?0$;ELSIF${\rm~op}~=~\mathsf{Ssend}({\rm~obj,~tag})$

    ${P}'_i$ := ${P}'_i~;~C_{\rm~obj}~!~0$;ELSIF${\rm~op}~=~\mathsf{ISend}({\rm~obj,~tag,~req})$

    ${P}'_i$ := ${P}'_i~~;~~D_{\rm~obj}~!~0~~;~~W_{\rm~req}!0$;ELSIF${\rm~op}~=~\mathsf{IRecv}({\rm~obj,~tag,~req})$

    $P_r$ := ROM$(M,~{\rm~op},~i)$;

    ${P}'_i$ := ${P}'_i~\parallel~P_r$;ELSIF${\rm~op}~=~\mathsf{Recv}({\rm~obj,~tag})$

    $P_r$ := ROM$(M,~{\rm~op},~i)$;

    ${P}'_i$ := ${P}'_i~~;~~P_r$;

    end if

    end for

    end for

    $M'~:=~\parallel~\{~P'_i~\mid~0~\leq~i~\leq~(n-1)~\}$; // all the $P'_i$ models synchronize on $B$.

    return $M'$.

  •   

    Algorithm 2 ROM($M$, op, pid) //Receive operation modeling

    ELSIF${\rm~obj}~=~k$ ${\rm~match}_s~:=~{\rm~match}_s~\cup~\{~\mathsf{send}({\rm~pid})~\mid~\mathsf{send}({\rm~pid})~\in~P_{k}\}$; // $\mathsf{send}({\rm~pid})$ can be matched with op.

    Require:benchmark model $M$, operation ${\rm~op}~=~\mathsf{recv}({\rm~obj})$, and process number pid.

    Output:the model for the receive operation.

    ${\rm~match}_s~:=~\emptyset$;

    if $obj~=~*$ then

    for $j~\leftarrow~(0,\ldots,~n-1)$

    ${\rm~match}_s~:=~{\rm~match}_s~\cup~\{~\mathsf{send}({\rm~pid})~\mid~\mathsf{send}({\rm~pid})~\in~P_{j}\}$; // $\mathsf{send}({\rm~pid})$ can be matched with op.

    end for

  • Table 2   Programs for benchmark generation
    Program Line of code Brief descripttion
    verb"DTG" 90 Dependence transition group
    verb"Integrate_mw" 181 Integral computing
    verb"Diffusion2d" 197 Simulation of diffusion equation
    verb"Gauss_elim" 341 Gaussian elimination
    verb"Heat" 613 Heat equation solver
    verb"Pingpong" 220 Comm performance testing
    verb"Mandelbrot" 268 Mandelbrot set drawing
    verb"Image_manip" 360 Image manipulation
    verb"Kfray" 12728KF-Ray parallel raytracer
    verb"ClustalW" 23265 Multiple sequence alignment
    Total 38263 10 open source MPI programs
  • Table 3   Number of models extracted from different mutants
    Program o m1 m2 m3 m4 m5 Total
    verb"DTG" 1 2 1 1 1 1 7
    verb"Integrate_mw" 5 5 67 77
    verb"Diffusion2d" 2 2 18 80 3 2 107
    verb"Gauss_elim" 5 6 11
    verb"Heat" 5 8 6 6 6 6 37
    verb"Pingpong" 0 28 28 547 28 28 659
    verb"Mandelbrot" 39 330 327 696
    verb"Image_manip" 20 5 5 30
    verb"Kfray" 1 5 641 5 652
    verb"ClustalW" 5 5 5 17 5 5 42
    Total 83 396 1098 656 43 42 2318
  • Table 4   Number of models in different scales of parallelism
    Program 2 4 5 6 8 10 Total
    verb"DTG" 7 7
    verb"Integrate_mw" 3 3 3 59 9 77
    verb"Diffusion2d" 37 70 107
    verb"Gauss_elim" 2 2 2 3 2 11
    verb"Heat" 13 6 6 6 6 37
    verb"Pingpong" 659 659
    verb"Mandelbrot" 48 243 169 150 86 696
    verb"Image_manip" 6 6 6 6 6 30
    verb"Kfray" 292 208 138 11 3 652
    verb"ClustalW" 6 9 15 6 6 42
    Total 1029 514 7 409 241 118 2318
  • Table 5   Complexity of programs
    Program 2 4 6 8 10
    verb"Integrate_mw" 3.5 15.25 35.17 113.24 131.82
    verb"Diffusion2d" 48.28 110.67
    verb"Gauss_elim" 4.38 28.65 75.39 146.30 240.37
    verb"Heat" 12.93 42.51 85.80 141.46 211.41
    verb"Pingpong" 23.75
    verb"Mandelbrot" 10.26 38.10 67.59 97.49 149.37
    verb"Image_manip" 5.87 18.58 29.65 40.24 50.61
    verb"Kfray" 25.34 83.61 137.25 182.61 207.53
    verb"ClustalW" 5.79 60.44 144.47 256.61 421.95
  • Table 6   Experimental results
    Program Model checker Verified models Average time (s)
    DTG (7)PAT $\bf~7$ 0.29
    FDR $\bf~7$ 0.20
    SPIN $\bf~7$ 1.17
    PRISM $\bf~7$ 1.91
    NuSMV $\bf~7$ $\bf~0.02$
    Integrate_mw (77)PAT 73 2.13
    FDR 75 8.27
    SPIN $\bf~77$ $\bf~1.82$
    PRISM 68 126.62
    NuSMV 9 15.39
    Diffusion2d (107)PAT $\bf~107$ $\bf~1.05$
    FDR 2 31.61
    SPIN $\bf~107$ 3.18
    PRISM 0 (Memory_Error)
    NuSMV 0 (Seg_fault)
    Gauss_elim (11)PAT 10 5.11
    FDR 9 13.12
    SPIN $\bf~11$ 5.26
    PRISM 2 $\bf~1.61$
    NuSMV 4 1468.03
    Heat (37)PAT $\bf~37$ $\bf~0.30$
    FDR 24 9.60
    SPIN $\bf~37$ 2.24
    PRISM 13 2.76
    NuSMV 13 5.52
    Pingpong (659)PAT $\bf~659$ $\bf~0.31$
    FDR 0 Timeout
    SPIN $\bf~659$ 3.36
    PRISM 0 (Memory_Error)
    NuSMV 0 (Seg_fault)
    Mandelbrot (696)PAT 694 $\bf~0.75$
    FDR 610 7.41
    SPIN $\bf~696$ 1.59
    PRISM 530 97.41
    NuSMV 428 265.73
    Image_manip (30)PAT $\bf~30$ $\bf~0.41$
    FDR 25 0.67
    SPIN $\bf~30$ 1.34
    PRISM $\bf~30$ 7.00
    NuSMV 22 253.91
    Kfray (652)PAT 650 3.80
    FDR 0 Timeout
    SPIN $\bf~652$ $\bf~3.36$
    PRISM 0 Timeout
    NuSMV 0 Timeout
    ClustalW (42)PAT 41 10.88
    FDR 25 89.68
    SPIN $\bf~42$ 5.66
    PRISM 6 1.59
    NuSMV 6 $\bf~0.05$

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1