07 Apr 2021

Discussion meeting

This is a summary of the group meeting on 7 April 2021.

L*-Based Learning of Markov Decision Processes

The authors provide two algorithms, both based on Angluin’s L* algorithm, for learning a black-box Markov decision process (MDP): one for exact learning and one for sample-based learning.

Q: What does "unambiguous" mean?
A: The compatibility relation is not an equivalence relation, so a state may belong to more than one class; such a state is called ambiguous. For an unambiguous state, there is exactly one class whose representative is compatible with it.
Q: How is equivalence checked in exact learning?
A: It first checks whether the hypothesis is isomorphic to the model. If so, finitely many output distribution queries are enough to determine all the transition probabilities.
Q: Is there a quantitative description of the convergence?
A: Not yet. The paper only gives a convergence argument based on the Borel-Cantelli lemma. It would be interesting to work out a PAC-style guarantee.
Q: Can MDP learning be used for model checking and other fields in software engineering?
A: The paper reports some experiments. This part needs more investigation.
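
To make the answer about ambiguity more concrete, here is a minimal Python sketch of why a compatibility relation can fail to be an equivalence relation. It is only an illustration: the tolerance-based `compatible` check, the names `compatible_classes` and `is_unambiguous`, and all the numbers are assumptions made for this note, not the statistical test used in the paper.

```python
# Hypothetical stand-in for a compatibility check: two estimated output
# distributions are "compatible" if they differ by at most `tol` on every
# output. This is an assumption for illustration, not the paper's test.
def compatible(p, q, tol=0.1):
    outputs = set(p) | set(q)
    return all(abs(p.get(o, 0.0) - q.get(o, 0.0)) <= tol for o in outputs)


def compatible_classes(state, representatives, tol=0.1):
    """Return the names of all class representatives compatible with `state`."""
    return [name for name, rep in representatives.items()
            if compatible(state, rep, tol)]


def is_unambiguous(state, representatives, tol=0.1):
    """A state is unambiguous if exactly one representative is compatible with it."""
    return len(compatible_classes(state, representatives, tol)) == 1


# Made-up estimated output distributions for two class representatives.
representatives = {
    "r1": {"a": 0.50, "b": 0.50},
    "r2": {"a": 0.68, "b": 0.32},
}
middle = {"a": 0.59, "b": 0.41}  # close to both representatives
clear = {"a": 0.49, "b": 0.51}   # close to r1 only

# The relation is reflexive and symmetric but not transitive:
# r1 ~ middle and middle ~ r2, yet r1 and r2 are not compatible.
print(compatible(representatives["r1"], representatives["r2"]))  # False
print(compatible_classes(middle, representatives))  # ['r1', 'r2'] (ambiguous)
print(compatible_classes(clear, representatives))   # ['r1'] (unambiguous)
```

In this toy setting, `middle` is an ambiguous state because it is compatible with both representatives, while `clear` is unambiguous.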