03 Sep 2020

Discussion meeting

This is a summary of two group meetings on 27 August and 03 September 2020.

All papers discussed are from CAV 2020.

Pengfei: Yizhak Yisrael Elboher, Justin Gottschlich, and Guy Katz. An Abstraction-Based Framework for Neural Network Verification

In this paper, they propose a general framework for over-approximating and refining DNNs, and resent an abstraction-refinement algorithm for DNN verification. Experimental results show that the abstraction-refinement approach can help state-of-art SMT based tools like Marabou to improve its efficiency.

How to find the indicator? Randomly chosen, or follow some convergence.
Abstraction-refinement solves fewer robustness verification problem.

One reason is that the robustness region is small, so the abstraction may be too imprecise, and many refinement steps are needed.

Depeng: Nima Roohi , Yu Wang et al. STMC: Statistical Model Checker with Stratified and Antithetic Sampling

用分层采样与对偶采样的方法做统计验证，成功率比符号验证与显式验证远远超出，采样数比一般统计验证方法大大减少。

一些思路：

Can statistical verification be used to verify differential privacy? Use the tool to test dp.
做采样，buechi inclusion checking；
准确度问题，prism interval valuation和其他值迭代算法(CAV ‘20)等等，探讨准确度。

Song: Sahai S, Subramanyan P, Sinha R. Verification of Quantitative Hyperproperties Using Trace Enumeration Relations. CAV2020

Quantitative Hyperproperties(QHP)是一类描述满足某些性质的trace的数量（上下界）的hyperproperty，本文通过几个QHP的例子(ZK_Hats, Path_ORAM等)介绍了：

specification and satisfication semantics for Quantitative HyperLTL over symbolic transition system
semi-automated QHP verification methodology；
model counting via SMT solver: inference rules for bounding the number of satisfiable solutions of FOL formula

具体思路：
1. construct transition system model
2. construct a relation \mathscr{U} and a predicate \mathscr{V}
3. prove \mathscr{U}, \mathscr{V} are enumerations
4. counting number of satisfying assignments to \mathscr{V}
优点：
1. 直接验证QHP需要同时对指数数量的traces进行推理，这个方法里只需要对2-3个trace上的non-Quantitative Hyperproperties进行验证
2. 用model counting of FOL formula进行trace数量上的估计，有成熟的SMT工具可以利用
Example cases of QHP:
1. Quantitative Non-interference: It states that information leakage to an adversary should be less than n bits. It can be expressed in HyperLTL, or equivalently with a counting operator,
  - Example case: password checker When the password checker leaks k bits of the secret password, at most 2^k traces of information is leaked.
2. Quantitative Deniability: It states that for every infinitely-long sequence of observations that an adversary makes, there are (exponentially) many different secrets that could have resulted in exactly these observations.
  - Example case: electronic purse Adversary wants to know the initial balance (the secret), by debiting a fixed amount from the purse until the balance is insufficient for another transaction. We can give a deniability property: there is a sufficient number of traces with identical attacker observations but different initial balances.

Ivy来自 Microsoft Research 的针对分布式协议和算法的设计和实现的一种模态验证工具，可以验证一阶线性时序逻辑(FO-LTL)的性质。其规范、实现和证明都具有模块化的特点。 Ivy支持通过三种模式证明参数化和无限状态系统的安全性和活性。三种模式分别是：SMT求解器 Z3 （基于VC和推理的验证），通过饥饿抽象调用ABC模型检测器以及自然演绎（手动证明）。并且由一个策略引擎将三种模式组合使用，也可以说 Ivy 是一个自动定理证明器。并且除了支持验证设计，Ivy还可以抽取得到 C++ 代码来获得可执行的分布式程序。 Ivy的设计原则包括可判定的自动推理，证明稳定性，失败透明性（失败时给出规范字哪个子公式导致问题超出可判定界）。

问：Invariant Checker 是做什么的，什么情况会失败，失败了会怎么处理？

答：用户在 .ivy 文件中书写 invariant，invariant check 去检查模型是否满足用户书写的 invariant，invariant 可以是一阶逻辑公式和线性时序逻辑公式。失败有两种可能，第一是超出可判定界限，这时失败透明性特性会工作，会建议用户将导致不可判定的子公式进行修改。第二是用户书写的 invariant 不正确，会有图形界面给出有限的反例，帮助用户修改 invariant。如果 invariant check 成功，invariant 就可以作为得到证明的定理被其他两种模式使用。

问：这个模型验证的性质是什么？

答：时序的安全性活性，一阶线性时序逻辑(FO-LTL)。在 .ivy 中在 ensure 和 temporal property 关键字后给出。

问：第三个模式 Logical Tactics 是做什么用的？

答：指的是提供了一个手写证明的 tactic language，这个证明引擎是基于 lambda-pi-calculus 和确定性的二阶匹配算法，有点类似 Isabelle/FOL。用户可以在 ensure 和 temporal property 后面通过 proof 关键字使用人工书写全部或者部分证明的功能，这些证明可以帮助自动定理证明的部分工作。

问：三个模式是如何工作的，轮流工作，一个不行用另一个吗？

答：不是的是由 tactic engine 管理，相互协调工作的，相互之间可以为其他模式提供证明或者传递证明任务（提供帮助和索取帮助）。其中 Invariant checker 和 eager abstraction 共享 VC generator，这是自动证明的部分，Logical tactic 指的是人工证明的部分。

Weizhi: Arnd Hartmanns and Benjamin Lucien Kaminski. Optimistic Value Iteration

主要贡献：

这篇文章提出了一种新的算法optimistic value iteration（OVI）来计算sound reachability probabilities和expected rewards。算法通过标准的value iteration方法来获得一个lower bound，并用结果来猜测一个upper bound，并证明后者的正确性。

相比标准value iteration算法，OVI算法可以以适度开销提供一个可靠性的结果；相比sound value iteration（SVI）和interval iteration（II），OVI算法的运行时间更快。

问题和讨论：

park induction：今天讨论中提到应用park induction来证明猜测的u为upper bound正确性的问题，的情况，有，视频中的assume gfp=lfp在文中解释是考虑unique fixed points的情况，此时，于是, u不是upper bound；其他情况下我们不知道的关系，因此依然需要修改, 重新迭代。
能处理多大状态数的模型使用的测试模型来自the Quantitative Verification Benchmark Set，没有具体说处理的模型状态数，但文中提到使用了“large and excluding clearly acyclic benchmarks”。
讨论读CAV17讲interval iteration的文章，进行比较。

Ying: Daniel J. Fremont, Johnathan Chiu, Dragos D. Margineantu, Denis Osipychev, and Sanjit A. Seshia.Formal Analysis and Redesign of a Neural Network-Based Aircraft Taxiing System with VerifAI

VerifAI is an integrated design and analysis tool chain which can be successfully applied to realistic industrial autonomous systems. They used verifAI to find several failure cases in taxinet, diagnose their root causes and eliminate them through better training set design. verifAI scenic and the interface they used are open source.

The authors plan to explore a variety of ways might further improve performance and further automate error analysis, building other techniques available with VerifAI and Scenic.

The definition of robustness of an MTL formula ρ? Definition in the paper:” VerifAI uses a continuous quantity ρ called the robustness of an MTL formula [4]. Its crucial property is that ρ ≥ 0 when the formula is satisﬁed, while ρ ≤ 0 when the formula is violated, so that ρ provides a metric of how close the system is to violating the property. The exact deﬁnition of ρ is not important here, but as an illustration, for it is (the negation of) the greatest deviation beyond the allowed 1.5m achieved over the whole simulation. ”

Definition in the [4] Robustness of temporal logic specifications:

How does the neural network process the images? I’ll search for more details about it.
VerifAI framework:

definition of the rubustness ρ:

Pengfei: Fourier Transformation in Image Processing

This report introduces the mathematical definition of Fourier transformation, and how it works in image processing.

What is the domain of F(u,v)? Actually [0,M]\times[0,N]. An observation is that F(u,v)=F(u+M,v) for any u,v. It is periodic on both dimensions.
Why is the high frequency part usually black? High frequency corresponds to high speed of grey scale changes, and for many pictures, such part is a minor. Can Fourier transformation be used to process images from space? If the frequency information corresponds to the features we are interested in, it is definitely yes.

Depeng: Global PAC Bounds for Learning Discrete Time Markov Chains (Hugo Bazille, Blaise Genest, Cyrille Jegourel, and Jun Sun)

This paper offers global PAC bounds for learning DTMC, which is used in describing the closeness of behaviors between the system and the learnt model. The behaviors are described as LTL and CTL. The learning assumption is that the structure of DTMC is known before, without knowing the transition matrix. The assumption is different with previous methods most of which require to know a minimal probability bound on all the transitions. It proves that learning cannot offer any global PAC bound for LTL formula, but can offer one for Time-To-Failure properties and the whole CTL formula.

Compared with Statistical Model Checking:

SMC uses hypothesis testing to learn the probability of a property in a system, while learning method learns a complete model. For a given TTF property, SMC and learning method takes almost the same number of samples; However, learning model can offer bounds for the whole CTL formula (definitely more samples), which is not the case for SMC. The experiments comparison (w.r.t a given unbounded property) are listed below, where the method in [31] (published in 2011) takes significantly fewer number of samples and shorter path lengths. But it meets TO and MO. (I think the implementation method matters and the author explains that they didn’t compare running time for the reason of different platforms and nature of results.)

Different platforms and implementations and different kinds of outputs are part of reasons for difference in experiments. More concrete, SMC experiment results from paper《Faster Statistical Model Checking for Unbounded Temporal Properties》in 2017, where hypothesis testing combined with SMC are used to evaluate the LTL and quantitative properties (mean payoff) by analyzing the reachability of BSCC. I can’t further explain the difference since all the models in Table 2 of MO and TO also appear in the testing paper, where the results don’t show a MO or TO for these models. The property to test in Table 2 is not clear (“We consider three test-cases from PRISM, satisfying the property that the sample stops with a conclusion (yes or no) with probability 1”).

Used for DP:

There are reasons why this method is not suitable for learning DP. DP property can be described by nest Next operators, which is an LTL formula. According to the paper’s proof, the learnt model will completely differ from the original model for such kind of formula. This is the main reason. Besides, normally the DP algorithm’s structure and transitions are completely known and there is no need to learn. If not, the assumption of learning is that we know the DTMC structure of DP algorithm, without transition probabilities. This situation is quite rare…

Shizhen: Dirk Beyer and Martin Spiessl LMU Munich, Munich, Germany. MetaVal: Witness Validation via Verification

MetaVal可以对其它verifier验证的ReachSafety, MemSafety, NoOverflows和Termination做有效性检验。它可以利用其它verifier生成的violation witness和correctness witness将原程序和性质做转换（论文中给出的程序转换方式是自动机相乘。）之后，它可以使用后端verifier来验证转换后的程序和性质是否满足，从而判断之前verifier给出的结论是否正确。这个工具的优点主要是两方面：一是可验证的性质广；二是可以直接利用verifier做有效性检验。

问：这篇论文中有提到MetaVal检查出verifier结论错误的情况么？

论文中实验部分只包括MetalVal验证其它verifier结论正确的例子，没有检查出其它verifier结论错误的数据。同时我后面会再阅读相关论文，增加对这个问题的理解。

问：这篇论文中提到的输入的correctness witness是有一个统一的格式么？

论文中将witness统一定义为protocol automata，其中correctness witness是由一种特殊的protocol automata即observe automata来表示的，所以我理解这些witness是有一种统一的输入格式。

Xie: Local Reasoning about the Presence of Bugs: Incorrectness Separation Logic

本篇文章结合了Incorrectness logic和Separation logic. 构造了新的Incorrectness Separation Logic. 并给出了相关的推理规则和语义定义，修改了Separation中free操作会导致heap的域减小从而导致incorrectness逻辑中的矛盾的问题。在修改的语义定义下，文章给出了ISL的Soundness证明。

ISL能够用来证明程序中的和内存与指针相关的bug的存在，此外文章还基于符号化的堆，给出了利用ISL的前向的符号执行的推理规则，并证明了Soundness.

该文章的代码实现被集成在Infer工具Pulse模块中，将于下个版本release.