This is a summary of the group meeting on 7 April 2021.
L*-Based Learning of Markov Decision Processes
The paper provides two algorithms, both based on Angluin’s L* algorithm, for learning a given black-box Markov decision process: an exact learning algorithm and a sample-based learning algorithm.
Q: What does "unambiguous" mean?
A: The compatibility relation is not an equivalence relation, so a state may belong to more than one class; we call such a state ambiguous. For an unambiguous state, there is exactly one class whose representative is compatible with it.
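The distinction between ambiguous and unambiguous states can be illustrated with a minimal sketch. It assumes a Hoeffding-style test on observed output frequencies as the compatibility check; the function names (`compatible`, `compatible_classes`, `is_unambiguous`), the threshold `alpha`, and the sample counts are all hypothetical and chosen for illustration, not taken from the paper.

```python
import math

def compatible(counts_a, counts_b, alpha=0.05):
    """Assumed compatibility check: two output-frequency tables are
    compatible if no output's relative frequency differs by more than a
    Hoeffding-style bound. This relation is not transitive."""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    if n_a == 0 or n_b == 0:
        return True  # no evidence of a difference
    eps = math.sqrt(0.5 * math.log(2 / alpha)) * (
        1 / math.sqrt(n_a) + 1 / math.sqrt(n_b)
    )
    outputs = set(counts_a) | set(counts_b)
    return all(
        abs(counts_a.get(o, 0) / n_a - counts_b.get(o, 0) / n_b) <= eps
        for o in outputs
    )

def compatible_classes(state, representatives, alpha=0.05):
    """Names of all classes whose representative is compatible with state."""
    return [name for name, rep in representatives.items()
            if compatible(state, rep, alpha)]

def is_unambiguous(state, representatives, alpha=0.05):
    """A state is unambiguous if exactly one class is compatible with it."""
    return len(compatible_classes(state, representatives, alpha)) == 1

# Hypothetical sample counts for two class representatives and two states.
reps = {"r1": {"x": 90, "y": 10}, "r2": {"x": 60, "y": 40}}
s = {"x": 15, "y": 5}    # few samples: compatible with both r1 and r2
t = {"x": 95, "y": 5}    # many samples: compatible with r1 only

print(compatible(reps["r1"], reps["r2"]))  # r1 and r2 are distinguishable
print(compatible_classes(s, reps))         # s is ambiguous
print(is_unambiguous(t, reps))             # t is unambiguous
```

Here `s` is compatible with both representatives although the representatives are not compatible with each other, which is why compatibility fails to be an equivalence relation.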
Q: How to check equivalence in exact learning?
A: First it checks whether the hypothesis is isomorphic to the model. If so, only finitely many output distribution queries are needed to determine all the transition probabilities.
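As a rough sketch of what such a check could look like, the code below pairs up the reachable states of two deterministic labelled MDPs and then compares their transition probabilities. The dictionary encoding (`init`, `label`, `delta`), the function name `equivalent`, and the coin example are assumptions made for illustration; the paper's actual procedure works through queries to a teacher rather than direct access to the model.

```python
from collections import deque
from fractions import Fraction

def equivalent(m1, m2, inputs):
    """Check that the reachable parts of two deterministic labelled MDPs
    are isomorphic and carry identical transition probabilities.
    Assumed encoding: m["init"] is the initial state, m["label"][q] the
    output label of q, m["delta"][(q, i)] a dict {successor: probability}.
    Determinism means successors of (q, i) have pairwise distinct labels."""
    forward = {m1["init"]: m2["init"]}
    backward = {m2["init"]: m1["init"]}
    queue = deque([(m1["init"], m2["init"])])
    while queue:
        q1, q2 = queue.popleft()
        if m1["label"][q1] != m2["label"][q2]:
            return False
        for i in inputs:
            # Index successors by their label, which identifies them uniquely.
            succ1 = {m1["label"][s]: (s, p) for s, p in m1["delta"][(q1, i)].items()}
            succ2 = {m2["label"][s]: (s, p) for s, p in m2["delta"][(q2, i)].items()}
            if succ1.keys() != succ2.keys():
                return False
            for out, (s1, p1) in succ1.items():
                s2, p2 = succ2[out]
                if p1 != p2:
                    return False  # same structure, different probability
                if s1 not in forward and s2 not in backward:
                    forward[s1], backward[s2] = s2, s1
                    queue.append((s1, s2))
                elif forward.get(s1) != s2 or backward.get(s2) != s1:
                    return False  # the state pairing is not a bijection
    return True

half = Fraction(1, 2)
fair = {"init": "a",
        "label": {"a": "start", "b": "heads", "c": "tails"},
        "delta": {(q, "flip"): {"b": half, "c": half} for q in "abc"}}
renamed = {"init": 0,
           "label": {0: "start", 1: "heads", 2: "tails"},
           "delta": {(q, "flip"): {1: half, 2: half} for q in (0, 1, 2)}}
biased = {"init": 0,
          "label": {0: "start", 1: "heads", 2: "tails"},
          "delta": {(q, "flip"): {1: Fraction(1, 3), 2: Fraction(2, 3)}
                    for q in (0, 1, 2)}}

print(equivalent(fair, renamed, ["flip"]))  # same model up to state names
print(equivalent(fair, biased, ["flip"]))   # same structure, other probabilities
```

Exact fractions are used so that probabilities can be compared with `!=` without floating-point tolerance; the number of comparisons is bounded by the number of (state, input, output) triples, which matches the "finitely many" in the answer above.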
Q: Is there a quantitative description of the convergence?
A: Not yet. The paper only gives a convergence argument based on the Borel-Cantelli lemma. It would be interesting to consider a PAC-style guarantee.
Q: Can MDP learning be used for model checking and other fields in software engineering?
A: There are some experiments in the paper. We need more investigation on this part.
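For reference, the first Borel-Cantelli lemma and the way it typically yields convergence in the limit can be written as follows. The events $A_n$ ("the hypothesis produced in round $n$ is incorrect") are an assumed reading of how the lemma is applied, not a quotation from the paper.

```latex
% First Borel-Cantelli lemma: summable probabilities imply that,
% almost surely, only finitely many of the events occur.
\[
  \sum_{n=1}^{\infty} \Pr(A_n) < \infty
  \quad\Longrightarrow\quad
  \Pr\Bigl(\limsup_{n\to\infty} A_n\Bigr)
  = \Pr\Bigl(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k\Bigr)
  = 0 .
\]
% Assumed application: if A_n denotes "the hypothesis in round n is
% incorrect", then with probability 1 there is some N such that the
% hypothesis is correct in every round n >= N.
```

This gives correctness in the limit but says nothing about how large $N$ is, which is the gap a PAC-style bound would fill.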