This list is out of date; please check my Google Scholar for a complete list of publications.

  • Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
    Aviral Kumar^, Rishabh Agarwal^, Dibya Ghosh, Sergey Levine [arXiv]
    International Conference on Learning Representations (ICLR), 2021

  • OPAL: Offline Primitive Discovery For Accelerating Reinforcement Learning
    Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum [arXiv]
    International Conference on Learning Representations (ICLR), 2021

  • Conservative Safety Critics for Exploration
    Homanga Bharadhwaj, Aviral Kumar, Nicholas Rhinehart, Sergey Levine, Florian Shkurti, Animesh Garg [arXiv]
    International Conference on Learning Representations (ICLR), 2021

  • COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
    Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, Sergey Levine
    [arXiv] [website] (October 2020)
    4th Conference on Robot Learning (CoRL), 2020
    In this work, we show how the simple conservative Q-learning (CQL) algorithm for offline RL allows us to effectively leverage prior, task-agnostic robotic interaction datasets to learn complex policies for novel downstream tasks, from a wide variety of initial conditions, using only minimal supervision from the downstream task, labeled only with a 0-1 sparse reward. No need for hierarchies, skills, or primitives.

  • Conservative Q-Learning for Offline Reinforcement Learning
    Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
    [arXiv] [website] [talk] (June 2020)
    Advances in Neural Information Processing Systems (NeurIPS), 2020
    In this work, we present a novel offline RL algorithm that learns a Q-function that lower-bounds the policy value by adding specific regularizers to Q-function training. By optimizing the policy against this conservative Q-function, we obtain state-of-the-art performance on a number of hard offline RL control tasks and on Atari games with limited data.

  • DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
    Aviral Kumar, Abhishek Gupta, Sergey Levine [arXiv] [BAIR Blog] (March 2020)
    Advances in Neural Information Processing Systems (NeurIPS) (Spotlight Presentation, 3% acceptance rate), 2020
    In this paper, we ask the following question: Which distribution should be used to train Q-functions? We show that the choice of data distribution affects the performance of deep RL algorithms. Contrary to what is commonly believed, even on-policy data collection may fail to correct errors during Q-learning. We then devise a method for optimizing data distributions based on maximizing error reduction.

  • Model Inversion Networks for Model-Based Optimization
    Aviral Kumar, Sergey Levine [arXiv] (December 2019)
    Advances in Neural Information Processing Systems (NeurIPS), 2020
    Model-based optimization (or black-box optimization) problems appear in several scenarios, such as protein design, aircraft design, or contextual bandits. In this work, we design a new model-based optimization method for extremely high-dimensional problems (>1000 dims) that is robust to going off the manifold of valid inputs.

  • One Solution is Not All You Need: Few-Shot Extrapolation Via Structured MaxEnt RL
    Saurabh Kumar, Aviral Kumar, Sergey Levine, Chelsea Finn [arXiv] (October 2020)
    Advances in Neural Information Processing Systems (NeurIPS), 2020
    In this paper, we propose to utilize “structured” diversity to obtain multiple different ways of solving a given task, allowing us to revert to a different way of solving the task under environment perturbations in a few-shot extrapolation scenario.

  • Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
    Aviral Kumar*, Justin Fu*, George Tucker, Sergey Levine
    Advances in Neural Information Processing Systems (NeurIPS), 2019
    [Paper] [Project Page] [BAIR Blog]
    In this paper, we attribute the instability of Q-function optimization in offline reinforcement learning to querying the Q-function at out-of-distribution action inputs during training. We propose a simple fix: constrain the support of the learned policy to lie within the support of the behavior policy, and use this constraint to perform offline actor-critic or Q-learning.

  • Diagnosing Bottlenecks in Deep Q-Learning Algorithms
    Justin Fu*, Aviral Kumar*, Matthew Soh, Sergey Levine
    36th International Conference on Machine Learning (ICML) 2019 [Paper] (* Equal Contribution)
    In this paper, we provide an empirical analysis of several design factors in deep Q-learning algorithms, including the capacity of the function approximator, sampling error, and the training distribution. We also devise a set of toy gridworld environments that test these properties and can be used for prototyping new algorithms.

  • Graph Normalizing Flows
    Jenny Liu*, Aviral Kumar*, Jimmy Ba, Jamie Kiros, Kevin Swersky
    Advances in Neural Information Processing Systems (NeurIPS), 2019
    [Paper] [Project Page] (* Equal Contribution)

  • Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings
    Aviral Kumar, Sunita Sarawagi, Ujjwal Jain
    35th International Conference on Machine Learning (ICML) 2018 [Main Paper]

  • GRevnet: Improving Graph Neural Networks with Reversible Computation
    Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky
    NeurIPS 2018 Relational Representation Learning Workshop [Paper]

  • Feudal Learning for Large Discrete Action Spaces with Recursive Substructure
    Aviral Kumar, Kevin Swersky, Geoffrey Hinton
    Hierarchical Reinforcement Learning Workshop, NIPS 2017
    [Main Paper]


  • Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
    Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
    [arXiv] (May 2020)
    In this tutorial paper, we give an introduction to offline RL, survey classical and current deep offline RL approaches, and discuss open problems and our perspectives on them.


  • Datasets for Data-Driven Reinforcement Learning
    Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine
    [arXiv] [Benchmark] (April 2020)
    At RL for Real Life Virtual Conference, 2020.
    In this paper, we introduce datasets and benchmarks for offline RL. Most prior offline RL research was carried out on datasets generated from RL policies or from the replay buffers of online RL agents. Here we introduce a suite of more realistic offline RL datasets, of the sort likely to be encountered in practice, to stress-test offline RL methods.

  • Reward-Conditioned Policies
    Aviral Kumar, Xue Bin Peng, Sergey Levine [arXiv] (December 2019)
    In this paper, we pose reinforcement learning as a supervised learning problem that requires predicting the distribution of actions conditioned on the return of the policy. This allows us to reuse suboptimal trajectories for training, as these trajectories are optimal conditioned on the return obtained by the policy that generated them.

  • Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
    Xue Bin Peng, Aviral Kumar, Grace Zhang, Sergey Levine [Project Page] (October 2019)
    In this paper, we propose a simple off-policy (and offline) reinforcement learning algorithm that uses a supervised regression approach to optimize the policy.

  • The Reach-Avoid Problem for Constant-Rate Multi-Mode Systems
    Krishna S, Aviral Kumar, Fabio Somenzi, Behrouz Touri, Ashutosh Trivedi
    (Alphabetically Sorted By Last Name)
    15th International Symposium on Automated Technology for Verification and Analysis (ATVA), 2017 [Springer Link]

  • Challenges and Tool Implementation of Hybrid Rapidly-Exploring Random Trees
    Stanley Bak, Sergiy Bogomolov, Thomas A. Henzinger, Aviral Kumar
    (Alphabetically Sorted By Last Name)
    10th International Workshop on Numerical Software Verification (NSV), CAV 2017
    [Springer Link]