About

I am a PhD student in computer science at the University of Chicago. I am fortunate to be advised by Dr. Allyson Ettinger.

I also work closely with Dr. Kanishka Misra.

My research focuses on data efficiency in learning, i.e., selecting small subsets of training data that not only maintain accuracy but also preserve data diversity. I also study how new concepts are encoded in LLMs during training.

Prior to joining UChicago, I was a software engineer in industry, working on distributed file systems.

I received my Master's in Computer Science from IIT Kanpur.

Publications

  • Sorting through the noise: Testing robustness of information processing in pre-trained language models [pdf] — EMNLP 2021
  • Pragmatic competence of pre-trained language models through the lens of discourse connectives [pdf] — CoNLL 2021

Blog

Occasional posts about data-centric AI, learning dynamics, and model training.