
  • Machine Learning for Intelligent Agents

    N. Tziortziotis

    Ph.D. Dissertation

    – ♦ –

    Ioannina, March 2015

    DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING UNIVERSITY OF IOANNINA

  • �������� ��������� ������� ��� ��� �������� ������ ���������

    � ����������� ��������

    ����������� ����

    ��������� ��� ��� ������ ��������� ������� ��������

    ��� �µ�µ���� ��������� �/� & ������������ ���������� ��������

    ��� ���

    ������� �����������

    �� µ���� ��� ����������� ��� �� ���� ���

    ������������ ���������� ���� �����������

    ������� 2015

  • Three-member Advisory Committee

    • Konstantinos Blekas, Assistant Professor, Department of Computer Science & Engineering, University of Ioannina (Supervisor)

    • Aristidis Likas, Professor, Department of Computer Science & Engineering, University of Ioannina

    • Christophoros Nikou, Associate Professor, Department of Computer Science & Engineering, University of Ioannina

    Seven-member Examination Committee

    • Konstantinos Blekas, Assistant Professor, Department of Computer Science & Engineering, University of Ioannina (Supervisor)

    • Aristidis Likas, Professor, Department of Computer Science & Engineering, University of Ioannina

    • Christophoros Nikou, Associate Professor, Department of Computer Science & Engineering, University of Ioannina

    • Konstantinos Vlachos, Assistant Professor, Department of Computer Science & Engineering, University of Ioannina

    • Georgios Vouros, Professor, Department of Digital Systems, University of Piraeus

    • Michail Lagoudakis, Associate Professor, School of Electronic and Computer Engineering, Technical University of Crete

    • Andreas-Georgios Stafylopatis, Professor, School of Electrical and Computer Engineering, National Technical University of Athens

  • Dedication

    To my family.

  • Acknowledgement

    I would like to sincerely thank my academic advisor Prof. Konstantinos Blekas for his valuable motivation, encouragement, and assistance to my research effort during the elaboration of this dissertation, and for the time and effort he devoted to my guidance throughout all these years, dating back to 2008. He was always available when I needed him, and it was he who introduced me to the exciting area of reinforcement learning. The collaboration with him has been a pleasant and memorable experience.

    I would also like to thank Prof. Christos Dimitrakakis for his excellent advising during my studies as an exchange Ph.D. student at the Swiss Federal Institute of Technology in Lausanne. He gave me the opportunity to collaborate with him and to broaden my research horizons. Our conversations, along with his endless passion for research, will remain unforgettable. He is one of the most brilliant people I have ever met, and I feel truly blessed to have worked with him.

    Furthermore, I am grateful to the other members of my supervising committee, Prof. Aristidis Likas and Prof. Christophoros Nikou, for their suggestions and insightful remarks. I would also like to thank Prof. Michail G. Lagoudakis, Prof. Andreas-Georgios Stafylopatis, Prof. Konstantinos Vlachos, and Prof. Georgios Vouros for serving on the examination committee of my dissertation.

    I would also like to thank my colleagues Georgios Papagiannis and Konstantinos Tziortziotis for the excellent collaboration and the pleasant work environment during the last two years. I hope we will all meet again.

    A big thank-you goes to my parents, Vasileios and Anastasia, as well as to my sisters, Zoi and Eirini, for always believing in me and unconditionally encouraging me. This dissertation would definitely not have been possible without their support and sacrifices, especially during these tough years. I feel really grateful to have them in my life.

    Finally, I feel the need to express heartfelt thanks to Katerina Pandremmenou for her support and tolerance all these years. This journey would not have been so delightful without her.

    Ioannina, March 2015
    Nikolaos V. Tziortziotis

  • Contents

    1 Introduction
      1.1 Machine Learning on Intelligent Agents
      1.2 Reinforcement Learning
        1.2.1 Value Functions
        1.2.2 Value Function Approximation
      1.3 Policy Evaluation Problem
        1.3.1 Temporal Difference Learning
        1.3.2 Least Squares Temporal Difference Learning
      1.4 Control Problem
        1.4.1 Dynamic Programming
        1.4.2 Q-Learning
        1.4.3 Least Squares Policy Iteration
        1.4.4 Gaussian Process Reinforcement Learning
      1.5 Thesis Contribution
      1.6 Thesis Layout

    2 Value Function Approximation through Sparse Bayesian Modeling
      2.1 Gaussian Process Temporal Difference
        2.1.1 Online Sparsification
        2.1.2 Online Gaussian Process TD
      2.2 Relevance Vector Machine Temporal Difference
        2.2.1 Relevance Vector Machine TD for Policy Evaluation
        2.2.2 Sparse Bayesian Regression
        2.2.3 Episodic Tasks
        2.2.4 Relevance Vector Machine TD for Policy Improvement
      2.3 Empirical Evaluation
        2.3.1 Domains
        2.3.2 Results
      2.4 Summary

    3 Model-based Reinforcement Learning using an Online Clustering Scheme
      3.1 Clustering for Feature Selection
        3.1.1 Clustering using Mixture Models
        3.1.2 Online EM Learning
      3.2 Model-based Value Function Approximation
      3.3 Empirical Evaluation
        3.3.1 Domains