My name is Nat. I am a PhD student under the supervision of Prof. Murray Shanahan at Imperial College London. My research interests are in reinforcement learning, generative models, and Bayesian methods for deep learning.
15 June 2018: Our paper Deep Reinforcement Learning with Risk-Seeking Exploration has been accepted at SAB 2018
Deep Reinforcement Learning with Risk-Seeking Exploration
N. Dilokthanakul and M. Shanahan, in Proceedings of the 15th International Conference on Simulation of Adaptive Behavior (SAB 2018): From Animals to Animats 15, Springer International Publishing. https://doi.org/10.1007/978-3-319-97628-0_17
In most contemporary work in deep reinforcement learning (DRL), agents are trained in simulated environments. Not only are simulated environments fast and inexpensive, they are also 'safe'. By contrast, training in a real world environment (using robots, for example) is not only slow and costly, but actions can also result in irreversible damage, either to the environment or to the agent (robot) itself. In this paper, we consider taking advantage of the inherent safety in computer simulation by extending the Deep Q-Network (DQN) algorithm with an ability to measure and take risk. In essence, we propose a novel DRL algorithm that encourages risk-seeking behaviour to enhance information acquisition during training. We demonstrate the merit of the exploration heuristic by (i) arguing that our risk estimator implicitly contains both parametric uncertainty and inherent uncertainty of the environment, which are propagated back through the temporal-difference (TD) error across many time steps, and (ii) evaluating our method on three games in the Atari domain and showing that the technique works well on Montezuma's Revenge, a game that epitomises the challenge of sparse reward.
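The core idea can be illustrated in a much simpler setting than the paper's. Below is a minimal tabular sketch (not the paper's algorithm, which uses a DQN): alongside the Q-table, a second table estimates the expected squared TD error and is itself updated with a TD-style rule, so both parametric uncertainty and the environment's inherent stochasticity propagate across time steps; action selection then adds a risk-seeking bonus. The toy chain environment and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def risk_seeking_q_learning(n_states=6, n_actions=2, episodes=300,
                            alpha=0.5, gamma=0.95, beta=1.0, seed=0):
    """Tabular Q-learning with a risk-seeking exploration bonus (toy sketch)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    R = np.zeros((n_states, n_actions))  # risk estimate: roughly E[delta^2]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            # act greedily w.r.t. Q plus a risk bonus (risk-SEEKING, not averse)
            scores = Q[s] + beta * np.sqrt(R[s])
            a = int(rng.choice(np.flatnonzero(scores == scores.max())))
            # toy deterministic chain: action 1 moves right, action 0 stays put
            s_next = min(s + 1, n_states - 1) if a == 1 else s
            reward = 1.0 if s_next == n_states - 1 else 0.0
            delta = reward + gamma * Q[s_next].max() - Q[s, a]
            Q[s, a] += alpha * delta
            # TD-style update of the risk estimate using the squared TD error,
            # so "risk" is bootstrapped backwards across many time steps
            R[s, a] += alpha * (delta ** 2 + gamma * R[s_next].max() - R[s, a])
            s = s_next
            if s == n_states - 1:
                break
    return Q, R
```

Note how the bonus is self-extinguishing: as the Q-values converge the TD errors shrink, so the risk estimates decay and behaviour becomes greedy.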
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning
N. Dilokthanakul, C. Kaplanis, N. Pawlowski and M. Shanahan in arXiv preprint arXiv:1705.06769
The problem of sparse rewards is one of the hardest challenges in contemporary reinforcement learning. Hierarchical reinforcement learning (HRL) tackles this problem by using a set of temporally-extended actions, or options, each of which has its own subgoal. These subgoals are normally handcrafted for specific tasks. Here, though, we introduce a generic class of subgoals with broad applicability in the visual domain. Underlying our approach (in common with work using "auxiliary tasks") is the hypothesis that the ability to control aspects of the environment is an inherently useful skill to have. We incorporate such subgoals in an end-to-end hierarchical reinforcement learning system and test two variants of our algorithm on a number of games from the Atari suite. We highlight the advantage of our approach in one of the hardest games -- Montezuma's Revenge -- for which the ability to handle sparse rewards is key. Our agent learns several times faster than the current state-of-the-art HRL agent in this game, reaching a similar level of performance.
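A toy sketch of the "feature control" idea, under simplifying assumptions (function names and the choice of features are mine, not the paper's): the observation is summarised by a small set of features, here the mean intensity of each cell in a coarse grid over the frame, and each option receives an intrinsic reward for changing its assigned feature.

```python
import numpy as np

def grid_features(frame, grid=4):
    """Mean intensity of each cell in a grid x grid partition of a 2-D frame."""
    h, w = frame.shape
    cells = frame.reshape(grid, h // grid, grid, w // grid)
    return cells.mean(axis=(1, 3)).ravel()  # one feature per spatial cell

def feature_control_reward(prev_feats, next_feats, option):
    """Intrinsic reward for an option whose subgoal is changing feature `option`."""
    return abs(float(next_feats[option]) - float(prev_feats[option]))
```

The point of the construction is that the subgoals are generic: nothing here is handcrafted for a particular game, yet "make cell k change" tends to correlate with controlling on-screen entities.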
Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders
N. Dilokthanakul, P. A. M. Mediano, M. Garnelo, M. C. H. Lee, H. Salimbeni, K. Arulkumaran, and M. Shanahan, in arXiv preprint arXiv:1611.02648
We study a variant of the variational autoencoder model (VAE) with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the known problem of over-regularisation that has been shown to arise in regular VAEs also manifests itself in our model and leads to cluster degeneracy. We show that a heuristic called the minimum information constraint, which has been shown to mitigate this effect in VAEs, can also be applied to improve unsupervised clustering performance with our model. Furthermore, we analyse the effect of this heuristic and provide an intuition of the various processes with the help of visualisations. Finally, we demonstrate the performance of our model on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct and interpretable, and that our unsupervised clustering performance is competitive with the state of the art.
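The minimum information constraint itself is a one-line change to the training objective, and can be sketched numerically. In a minimal free-bits-style reading (my simplification, with an illustrative threshold), the KL term of the ELBO is only penalised above a floor lambda, which removes the gradient pressure that would otherwise over-regularise the latent code and collapse the clusters.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def constrained_kl(kl, lam=0.5):
    """Minimum information constraint (free-bits style): no penalty below lam,
    so the encoder is not pushed all the way into the prior."""
    return max(lam, float(kl))
```

In the full model the prior is a mixture, p(z) = sum_k pi_k N(z; mu_k, sigma_k^2), and the clamp keeps each component carrying enough information about its cluster to stay distinct.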
Classifying Options for Deep Reinforcement Learning
K. Arulkumaran, N. Dilokthanakul, M. Shanahan, and A. A. Bharath, presented at IJCAI Workshop on Deep Reinforcement Learning, 2016
In this paper we combine one method for hierarchical reinforcement learning -- the options framework -- with deep Q-networks (DQNs), through the use of different "option heads" on the policy network and a supervisory network for choosing between the options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample complexity when simultaneously learning subtasks with negative transfer, without degrading performance when learning subtasks with positive transfer.
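The architecture can be sketched at forward-pass level. The sketch below is a bare numpy analogue with made-up layer sizes, not the paper's network: a shared torso produces features, several option heads each compute their own Q-values over primitive actions, and a supervisory head scores the options; the chosen head then acts greedily.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    """Randomly initialised affine layer (illustrative stand-in for training)."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

# Illustrative sizes: observation, shared features, primitive actions, options.
n_obs, n_hidden, n_actions, n_options = 8, 16, 4, 3

W_torso, b_torso = linear(n_obs, n_hidden)                       # shared torso
heads = [linear(n_hidden, n_actions) for _ in range(n_options)]  # option heads
W_sup, b_sup = linear(n_hidden, n_options)                       # supervisory head

def act(obs):
    """Pick an option with the supervisory head, then act greedily with it."""
    h = np.maximum(0.0, obs @ W_torso + b_torso)  # shared ReLU features
    option = int(np.argmax(h @ W_sup + b_sup))    # which option head to use
    W, b = heads[option]
    q = h @ W + b                                 # that head's Q-values
    return option, int(np.argmax(q))
```

The design point is the sharing pattern: the torso is updated by every subtask while each head specialises, which is what gives the architecture its robustness to negative transfer between subtasks.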