07 Apr 2021

Discussion meeting

This is a summary of the group meeting on 7 April 2021.

L*-Based Learning of Markov Decision Processes

The authors provide two algorithms, both based on Angluin’s L* algorithm, for learning a black-box Markov decision process (MDP): an exact learning algorithm and a sample-based learning algorithm.

  • Q: What does "unambiguous" mean?
  • A: The compatibility relation is not an equivalence relation, so a state may be compatible with the representatives of more than one class; such a state is called ambiguous. An unambiguous state is one for which exactly one class has a representative compatible with it.
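
The point about ambiguity can be illustrated with a small sketch. It assumes, purely for illustration, that two states are compatible when their observed output distributions differ by at most a tolerance on every output; the names `compatible`, `compatible_classes`, `is_unambiguous` and the tolerance value are hypothetical and are not taken from the paper. Such a relation is reflexive and symmetric but not transitive, which is why one state can match several classes.

```python
from typing import Dict, List

Dist = Dict[str, float]  # output symbol -> observed relative frequency


def compatible(p: Dist, q: Dist, tol: float) -> bool:
    """Hypothetical test: distributions agree up to tol on every output."""
    outputs = set(p) | set(q)
    return all(abs(p.get(o, 0.0) - q.get(o, 0.0)) <= tol for o in outputs)


def compatible_classes(state: Dist, reps: Dict[str, Dist], tol: float) -> List[str]:
    """Names of all classes whose representative is compatible with state."""
    return sorted(name for name, rep in reps.items() if compatible(state, rep, tol))


def is_unambiguous(state: Dist, reps: Dict[str, Dist], tol: float) -> bool:
    """A state is unambiguous if exactly one class representative matches it."""
    return len(compatible_classes(state, reps, tol)) == 1


# Two class representatives that are NOT compatible with each other.
reps = {
    "r1": {"a": 0.50, "b": 0.50},
    "r2": {"a": 0.70, "b": 0.30},
}
tol = 0.12  # illustrative tolerance, chosen by hand

s_mid = {"a": 0.60, "b": 0.40}   # lies between both representatives
s_near = {"a": 0.52, "b": 0.48}  # close to r1 only

print(compatible_classes(s_mid, reps, tol))   # both classes match: ambiguous
print(compatible_classes(s_near, reps, tol))  # a single class matches
```

Here `s_mid` is compatible with both representatives even though the representatives are not compatible with each other, so transitivity fails and `s_mid` is ambiguous, while `s_near` is unambiguous.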

  • Q: How is equivalence checked in exact learning?
  • A: The check first tests whether the hypothesis is isomorphic to the model. If so, finitely many output distribution queries suffice to determine all the transition probabilities.

  • Q: Is there a quantitative description of the convergence?
  • A: Not yet. The paper only gives a convergence argument based on the Borel-Cantelli lemma. It would be interesting to consider a PAC-style guarantee.
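
For reference, the first Borel-Cantelli lemma can be stated as follows. The reading of the events is an assumption made here for illustration: E_n is taken to be the event that the n-th round of sampling leads to a wrong decision; the paper's own events may be defined differently.

```latex
% Assumption: E_n = "round n of sampling yields a wrong decision".
\[
  \sum_{n=1}^{\infty} \Pr(E_n) < \infty
  \quad\Longrightarrow\quad
  \Pr\Bigl(\limsup_{n\to\infty} E_n\Bigr)
  = \Pr\Bigl(\bigcap_{n\ge 1}\,\bigcup_{k\ge n} E_k\Bigr) = 0 .
\]
```

Under that reading, with probability 1 only finitely many rounds are wrong, so learning succeeds in the limit. The lemma gives no bound on when the last error occurs, which is the gap a PAC-style guarantee would fill.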

  • Q: Can MDP learning be used for model checking and in other areas of software engineering?
  • A: The paper reports some experiments, but this part needs more investigation.