Meta Reinforcement Learning - What Are You?

16/7/2021 ● 3 minutes to read

Machine learning (ML) is the study of computer algorithms that improve automatically through experience and the use of data. So far so good... but how do computers "experience" something? Even more interesting: can this "experience" be generalized to situations the machine has not encountered before? These are wonderful questions that ML researchers are trying to answer, with rapidly growing success. If we can (somewhat) learn a solution to a problem by "experiencing" the path to the solution across a large number of cases, then the obvious next step is "let's learn to learn!". Today, I would like to focus on the less famous sister of the supervised, unsupervised, and reinforcement learning family.

Just to make sure we are all in sync, Reinforcement Learning (RL) is the ability of a machine or robot to learn new skills by interacting with an environment and learning from its trials and errors, according to Yu et al. [1]. RL is based on the idea that the learner (e.g., the machine) is rewarded for right actions (and sometimes punished for wrong ones). This process encourages the machine to take “right” actions more often and avoid “wrong” actions when it can, which in the long run helps it approach optimal results. Hence, RL can be seen as a way of directing unsupervised machine learning.
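To make the reward idea concrete, here is a minimal sketch of reward-driven learning on a toy problem. It assumes a hypothetical two-armed bandit (two actions, each paying a reward of 1 with a fixed probability) and an epsilon-greedy learner; the function name `run_bandit` and all parameter values are chosen for illustration only and do not come from any of the cited papers.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a toy bandit (illustrative assumption)."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # how often each action was taken
    values = [0.0] * n_arms  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: take the action with the best average reward so far.
            arm = max(range(n_arms), key=lambda a: values[a])
        # The environment answers with a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update of the value estimate for this action.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

# Action 1 pays off more often, so the learner should come to prefer it.
counts, values = run_bandit([0.2, 0.8])
print("times each action was taken:", counts)
print("estimated reward per action:", values)
```

Nothing tells the learner which action is "right"; it only sees rewards, and the better-rewarded action ends up being chosen more often.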

If we take this idea a step further, one may ask how to find such a reward mechanism in a complex scenario. Or even better: can we find an optimal reward mechanism for a given scenario? Fortunately, the answer is yes! Please welcome Meta Reinforcement Learning (MRL). MRL is a version of a meta-learner algorithm that searches for appropriate learning algorithms tailored to specific learning tasks [2].

A big step forward for MRL is presented in the paper “learn2learn: A Library for Meta-Learning Research” [3], which provides low-level routines common across a wide range of meta-learning techniques (e.g., meta-descent, meta-reinforcement learning, few-shot learning) and builds standardized interfaces to algorithms and benchmarks on top of them.

You are probably asking yourself how all this magic comes to life… Well, let's take a quick look under the hood. Meta-learning can be done using several main techniques. The first is gradient-based meta-learning. Model-agnostic meta-learning (MAML) [4] aims to learn the initial parameters of a neural network such that taking one or several gradient descent steps from this initialization leads to effective (few-shot) generalization on new tasks. When presented with a new task, the model with the meta-learned initialization can then be quickly fine-tuned using a few data points from that task. The second is recurrence-based meta-learning, which uses recurrent models. Here the update function is itself learned: it corresponds to the weights of the recurrent model that update the hidden state. The parameters of the prediction model correspond to the remaining weights of the recurrent model together with the hidden state.
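The gradient-based idea can be sketched in a few lines. The example below is a toy illustration under loud assumptions, not the implementation from the MAML paper or the learn2learn API: the "network" is a single weight `w`, each hypothetical task is fitting y = a * x for a randomly drawn slope `a`, and the derivatives are written out by hand. All function names and hyperparameters are made up for this sketch.

```python
import random

def make_task(rng, n_points=10):
    """Sample a toy task y = a * x and return (support set, query set)."""
    a = rng.uniform(1.0, 3.0)
    def sample():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
        return xs, [a * x for x in xs]
    return sample(), sample()

def loss(w, data):
    """Mean squared error of the one-weight model y_hat = w * x."""
    xs, ys = data
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, data):
    """Derivative of the loss with respect to w."""
    xs, ys = data
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def hessian(data):
    """Second derivative of the loss with respect to w (constant in w)."""
    xs, _ = data
    return sum(2.0 * x * x for x in xs) / len(xs)

def adapt(w, support, inner_lr=0.1):
    """Inner loop: one gradient step on the task's few support points."""
    return w - inner_lr * grad(w, support)

def meta_gradient(w, support, query, inner_lr=0.1):
    """Derivative of the post-adaptation query loss w.r.t. the initialization."""
    w_adapted = adapt(w, support, inner_lr)
    # Chain rule through the inner step: d(w_adapted)/dw = 1 - inner_lr * hessian.
    return grad(w_adapted, query) * (1.0 - inner_lr * hessian(support))

def meta_train(steps=500, tasks_per_step=4, outer_lr=0.1, seed=0):
    """Outer loop: move the initialization so that one inner step works well."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        tasks = [make_task(rng) for _ in range(tasks_per_step)]
        g = sum(meta_gradient(w, s, q) for s, q in tasks) / tasks_per_step
        w -= outer_lr * g
    return w

w_meta = meta_train()
print("meta-learned initialization:", w_meta)
```

The inner loop (`adapt`) is the quick fine-tuning on a few data points, and the outer loop (`meta_train`) is the "learning to learn" part: it adjusts the starting point itself. In this toy setup the initialization should settle near the middle of the slope range, which is a sensible place to start when any slope between 1 and 3 may show up next.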

That said, the progress in the MRL direction is overwhelming. According to Google, there were around 10,400 papers about meta-learning in 2017, and the number kept increasing: 11,800, 13,300, and 15,600 for 2018, 2019, and 2020, and 8,960 for the first half of 2021. This data shows the growing interest in meta-learning in general and, by extension, MRL. MRL-based algorithms have been shown to be useful for many tasks. However, there are still many hurdles to overcome before a machine employing this approach succeeds in “out-of-the-lab” scenarios, and much more research is required to get us as a community to the point where MRL can be used as widely as, for example, supervised learning.

References

  1. Yu, T., Quillen, D., He, Z., Julian, R., Hausman, K., Finn, C., Levine, S., Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning. PMLR. 2020.
  2. Hochreiter, S., Younger, A. S., Conwell, P. R., Learning To Learn Using Gradient Descent. Proceedings of the International Conference on Artificial Neural Networks. 2001.
  3. Arnold, S. M. R., Mahajan, P., Datta, D., Bunner, I., Zarkias, K. S., learn2learn: A Library for Meta-Learning Research. KTH Royal Institute of Technology and Research Institutes of Sweden. 2020.
  4. Finn, C., Abbeel, P., Levine, S., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. CoRR. 2017.
