Anikait Singh

I'm a first-year Ph.D. student at Stanford AI. I was a student researcher at Google DeepMind Robotics, based in Mountain View. My research is supported by the NSF Graduate Research Fellowship.

Previously, I was at UC Berkeley as part of BAIR, working on deep RL and robot learning, advised by Sergey Levine, Chelsea Finn, and Aviral Kumar.

Email  /  CV  /  Scholar  /  Twitter  /  Github



My primary research interests are decision-making methods such as reinforcement learning, and scaling them up. I believe a good target for my research is to produce foundation models for decision-making that leverage large, diverse data sources, generalize well, and enable rapid learning.

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Anikait Singh*, Fahim Tajwar*, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
project page / paper / code

Learning from preferences is a common paradigm for fine-tuning language models, yet it involves many algorithmic design decisions. Our new work finds that approaches employing on-policy sampling or negative gradients outperform offline, maximum-likelihood objectives.
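To make "negative gradient" concrete, the sketch below contrasts a maximum-likelihood loss on the preferred response with a DPO-style contrastive loss, used here only as one familiar example of an objective that also pushes down the dispreferred response. It works on scalar sequence log-probabilities; the function names, the beta = 0.1 default, and the scalar setup are assumptions for illustration, not code from the paper.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mle_loss(logp_chosen: float) -> float:
    """Maximum likelihood on the preferred response only: the rejected
    response does not appear, so nothing pushes its probability down."""
    return -logp_chosen


def dpo_margin(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Scaled difference between the policy-vs-reference log-ratios."""
    return beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(margin), computed as softplus(-margin)."""
    return softplus(-dpo_margin(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta))


def dpo_grad_rejected(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """d(loss)/d(logp_rejected) = beta * sigmoid(-margin) > 0, so a gradient
    descent step lowers the log-probability of the rejected response."""
    m = dpo_margin(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta)
    return beta * sigmoid(-m)


if __name__ == "__main__":
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
    print(dpo_grad_rejected(-10.0, -12.0, -11.0, -11.0))
```

The positive gradient with respect to the rejected log-probability is exactly what the maximum-likelihood loss lacks.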

Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Chethan Bhateja*, Derek Guo*, Dibya Ghosh*, Anikait Singh, Manan Tomar,
Quan Vuong, Yevgen Chebotar, Sergey Levine, Aviral Kumar
ICRA, 2024
paper / project page / videos

VPTR is a framework that combines pre-training on video data with robotic offline RL on diverse robot data, yielding value functions and policies for manipulation tasks that are robust and generalizable.
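Videos contain observations but no actions, so one generic way to learn values from them is temporal-difference learning over consecutive frames. The sketch below is a tabular TD(0) illustration on hypothetical, already-discretized frame IDs with an assumed goal-reaching reward; it only conveys the idea and does not reproduce VPTR's pre-training objective, which operates on images with neural networks.

```python
from collections import defaultdict


def td0_values_from_videos(videos, goal, gamma=0.9, lr=0.5, epochs=200):
    """Tabular TD(0) on action-free frame sequences.

    Each video is a list of hashable frame IDs (a stand-in for images).
    Assumed reward: 1 on the transition that reaches `goal`, else 0,
    with the episode treated as terminating at the goal.
    """
    values = defaultdict(float)
    for _ in range(epochs):
        for video in videos:
            for frame, next_frame in zip(video[:-1], video[1:]):
                if frame == goal:
                    break  # nothing to learn after the goal is reached
                reached = next_frame == goal
                reward = 1.0 if reached else 0.0
                # No bootstrapping past a terminal (goal) frame.
                bootstrap = 0.0 if reached else gamma * values[next_frame]
                values[frame] += lr * (reward + bootstrap - values[frame])
    return dict(values)


if __name__ == "__main__":
    print(td0_values_from_videos([[0, 1, 2, 3], [4, 2, 3]], goal=3))
```

Frames closer to the goal end up with higher values (about gamma raised to the number of remaining steps minus one), even though no actions were ever observed.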

Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration
ICRA, 2024
project page / paper / blog

This is an open-source dataset comprising data from a large collection of robot embodiments. We study how vision-language models trained on the X-Embodiment datasets can enable efficient adaptation to new robots, tasks, and environments.

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Google DeepMind Robotics
CoRL, 2023
project page / paper / blog

We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning.
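One way RT-2 connects a vision-language model to control is by expressing actions as text tokens: each continuous action dimension is discretized uniformly into 256 bins. A minimal sketch of that encode/decode step follows; the [-1, 1] bounds and the function names are assumptions for illustration, since real action ranges depend on the robot.

```python
def action_to_tokens(action, low=-1.0, high=1.0, n_bins=256):
    """Map each continuous action dimension to an integer bin in [0, n_bins)."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)
        index = int((clipped - low) / (high - low) * n_bins)
        tokens.append(min(index, n_bins - 1))  # value == high falls in the last bin
    return tokens


def tokens_to_action(tokens, low=-1.0, high=1.0, n_bins=256):
    """Decode bins back to continuous values at the bin centers."""
    width = (high - low) / n_bins
    return [low + (t + 0.5) * width for t in tokens]


def tokens_to_text(tokens):
    """Render tokens as a space-separated string a language model can emit."""
    return " ".join(str(t) for t in tokens)


if __name__ == "__main__":
    tokens = action_to_tokens([-0.73, 0.1, 0.999])
    print(tokens_to_text(tokens), tokens_to_action(tokens))
```

Decoding to bin centers bounds the round-trip error by half a bin width, here 1/256 per dimension.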

Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints
Anikait Singh*, Aviral Kumar*, Quan Vuong, Yevgen Chebotar, Sergey Levine
NeurIPS, 2023
paper / talk

CQL (ReDS) is an offline RL method that turns a typical distribution constraint into an approximate support constraint via re-weighting, enabling efficient learning from heteroskedastic datasets, where the variability of the behavior policy differs across states.
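ReDS builds on CQL. For reference, the sketch below shows the standard CQL(H) regularizer for a single state with discrete actions: it pushes down a soft maximum over all Q-values and pushes up the Q-value of the dataset action. ReDS's re-weighting is not reproduced here, and the function names are hypothetical.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_regularizer(q_values, data_action):
    """CQL(H) penalty for one state: soft-max over all actions minus the
    Q-value of the action taken in the dataset. It is >= 0 and grows when
    actions not seen in the data are assigned high Q-values."""
    return logsumexp(q_values) - q_values[data_action]


def cql_penalty(batch):
    """Average penalty over (q_values, data_action) pairs; in CQL this is
    scaled by a coefficient alpha and added to the usual TD loss."""
    return sum(cql_regularizer(q, a) for q, a in batch) / len(batch)


if __name__ == "__main__":
    print(cql_penalty([([1.0, 2.0], 1), ([3.0, 0.0], 0)]))
```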

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto*, Yuexiang Zhai*, Anikait Singh, Max Sobol Mark,
Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine
NeurIPS, 2023
project page / paper / video / code

Cal-QL learns a conservative value-function initialization from offline data that underestimates the value of the learned policy while also being calibrated, in the sense that the learned Q-values are at a reasonable scale. This enables effective online fine-tuning that retains the benefits of the offline initialization.
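Calibration can be obtained with a small change to a CQL-style regularizer: Q-values of the learned policy's actions are pushed down only until they reach a reference value, such as the Monte Carlo return of the behavior policy. A minimal single-state sketch follows; the function names and the plain-list setup are illustrative assumptions, not the released code.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return-to-go for each step of one trajectory."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def calql_regularizer(q_policy_actions, q_data_action, reference_value):
    """CQL-style push-down / push-up term with the push-down side clipped
    from below by a reference value (e.g. the behavior policy's MC return).

    Q-values already below the reference contribute a constant, so they
    are not pushed down further, keeping the learned Q-values on a
    reasonable scale."""
    clipped = [max(q, reference_value) for q in q_policy_actions]
    return sum(clipped) / len(clipped) - q_data_action


if __name__ == "__main__":
    ref = discounted_returns([0.0, 0.0, 1.0], gamma=0.9)[0]
    print(ref, calql_regularizer([-2.0, 1.5], 0.5, reference_value=ref))
```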

Pre-Training for Robots: Offline RL Enables Learning from a Handful of Trials
Aviral Kumar*, Anikait Singh*, Frederik Ebert*, Mitsuhiko Nakamoto,
Yanlai Yang, Chelsea Finn, Sergey Levine
RSS, 2023
project page / paper / video

PTR is a framework based on offline RL that aims to learn new tasks effectively by combining pre-training on existing robotic datasets with rapid fine-tuning on the new task, using as few as 10 demonstrations.

When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
ICLR, 2022
project page / paper

A theoretical paper that characterizes the properties of environments that allow offline RL methods to outperform BC methods, even when only expert data is available. Additionally, offline RL policies trained on sufficiently noisy suboptimal data can outperform BC algorithms trained on expert data, especially on long-horizon problems.

A Workflow for Offline Model-Free Robotic Reinforcement Learning
Aviral Kumar*, Anikait Singh*, Stephen Tian, Chelsea Finn, Sergey Levine
CoRL, 2021 (Oral Presentation)
project page / paper / talk

Our proposed workflow aims to detect overfitting and underfitting in model-free offline RL, and provides guidelines for addressing these issues via policy selection, regularization, and architecture design.
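As a rough illustration of checkpoint-based diagnosis, the sketch below flags a run as possibly overfitting when a monitored training statistic peaks and then declines, and suggests the peak checkpoint for policy selection. The choice of statistic (assumed here to be the average Q-value on dataset actions, logged once per checkpoint), the 5% tolerance, and the function name are hypothetical stand-ins, not the workflow's exact prescriptions.

```python
def diagnose_checkpoints(avg_dataset_q, tolerance=0.05):
    """Return (possibly_overfitting, suggested_checkpoint_index).

    avg_dataset_q[i] is the (assumed) average Q-value on dataset actions
    logged at checkpoint i. A drop from the peak by more than `tolerance`
    (relative to the peak's magnitude) is treated as a sign of overfitting,
    and the peak checkpoint is suggested for policy selection.
    """
    if not avg_dataset_q:
        raise ValueError("need at least one checkpoint")
    # Index of the largest logged value (first one on ties).
    peak_index = max(range(len(avg_dataset_q)), key=avg_dataset_q.__getitem__)
    drop = avg_dataset_q[peak_index] - avg_dataset_q[-1]
    possibly_overfitting = drop > tolerance * abs(avg_dataset_q[peak_index])
    return possibly_overfitting, peak_index


if __name__ == "__main__":
    print(diagnose_checkpoints([1.0, 2.0, 3.0, 2.5, 2.0]))
```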

A Mobile Application for Keyword Search in Real-World Scenes
Shrinivas Pundlik, Anikait Singh, Gautam Baghel, Vilte Baliutaviciute, Gang Luo
IEEE Journal of Translational Engineering in Health and Medicine, 2019

A system that helps visually impaired patients locate words in cluttered environments. It combines OCR and Levenshtein distance matching with specialized audio cues and additional assistive features to enable efficient, intuitive search in crowded, diverse environments.
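OCR output is noisy, so rather than requiring an exact match, each recognized word can be compared with the keyword by Levenshtein (edit) distance. The sketch below assumes the OCR step has already produced a list of word strings; the lowercase normalization, the max_distance=1 threshold, and the function names are assumptions for illustration, and word locations and audio cues are not modeled.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def find_keyword(keyword, ocr_words, max_distance=1):
    """Indices of OCR words within max_distance edits of the keyword (case-insensitive)."""
    target = keyword.lower()
    return [i for i, word in enumerate(ocr_words)
            if levenshtein(target, word.lower()) <= max_distance]


if __name__ == "__main__":
    print(find_keyword("exit", ["EXlT", "entrance", "exit", "Exits"]))
```

A small threshold tolerates typical OCR confusions (such as "l" for "i") without matching unrelated words.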


Undergraduate Student Instructor, CS285 Fall 2022
Undergraduate Student Instructor, CS188 Spring 2022
Undergraduate Student Instructor, CS285 Fall 2021
Program Coordinator, Mentor, Deep Learning Portal 2024

Website template by Jon Barron.