06 Jan 2021

Discussion meeting

This is a summary of the group meeting on 6 January 2021

Polynomial Invariant Generation for Non-deterministic Recursive Program

该文章研究了对于含有多项式的非确定性含函数调用的程序中的多项式的不变式生成问题。本文的基本的思路是利用蕴含式左右两边的多项式组合关系，用待定系数的方式，建立系数的二次系统，并利用已有的二次求解器求是否有使得约束满足的系数，如果存在则找到多项式不变式。该文在理论上给出了一个semi-complete和sound的算法，并且算法的复杂度为subexponential，该结果在已有的理论结果中目前是最好的，其中soundness根据如上的idea容易得到，semi-completeness则是利用了Putinar’s Positivstellensatz定理以及一些归约进行证明。文章对该算法也给出了Java的实现，并在一些简单例子上进行了实验。根据实验结果以及和相关工具的对比，侧面说明了该算法对于处理多项式不变式和多项式程序的有效性。

问题以及思考：

该方法在Ranking Function合成中的对应。不变式能够利用以上方法是因为Hoare Logic中天然对于Inductive Invariant存在蕴含式，在Ranking Function中定义仅要求下降和有界。如果用以上的Idea将有界提出来（1）有界的条件可以写成多项式平方的和，或者可以证明某candidate ranking function在相应的程序空间上是大于某平方和。（2）对于下降的条件，目前的想法：如果是连续的ranking function，能否从导数上面找一些使得能够推出严格下降的条件。
该文章的不变式都是conjunctive的，能否结合之前disjunctive的文章？
其他invariant生成方法的调研，以及工具调研，例如UAutomizer等

Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks

Main contribution:

This paper study strategy synthesis for partially observable Markov decision process (POMDPs). They propose a novel method that combines from machine learning and formal verification.

First, they train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP.
Secondly, they restrict the RNN-based strategy to represent a finite-memory strategy and implement it on a specific POMDP. For the resulting finite Markov chain, formal verification provides guarantees against temporal logic specifications.
If the specification is not satisfied, they use the information from counterexamples to improve the strategy by iteratively training the RNN.

Experimental Results:

They evaluate the RNN-based synthesis procedure on benchmark examples and compare to PRISM-POMDP and the point-based solver SolverPOMDP. Experiments show that their method elevates the state-of-the-art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.

Comments and problems:

这篇文章研究的是一个比较经典的问题，有很多传统方法，在解决这种问题时神经网络可以作为一个很好的工具来应用，该问题和目前在做的案例关系很大，后面可以再仔细读一下研究一下：
1. 目前案例中实现的是传统的point-based value iteration算法求解strategy，且没有引入specification，后面可以考虑如何结合这篇文章的方法，用RNN来提取strategy。
LSTM一般处理固定长度的输入输出序列，如语音和文章等，这篇文章对POMDP是怎么处理的？
1. 针对特定问题固定了输入和输出的序列长度，因此这篇文章考虑的是finite horizon的POMDP问题和undiscounted accumulated cost.
2. 文章最后提到future work是考虑扩展该方法到continuous state spaces上，另外我们可以考虑如何做infinite的问题(discounted accumulated cost).
由于目前无人机案例还没做完，读了这篇文章的实验感觉有点担心用传统算法能不能很好地求解无人机防碰撞的strategy了（可能需要缩减状态数规模）。