Welcome
I am a Research Scientist at Meta working on research and development of efficient optimization algorithms, large batch training and distributed training methods for pre-training large language models.
My research interests lie in Large Scale Machine Learning, Optimization Algorithms, Distributed Learning, and applications to real-world problems in NLP, Ranking & Recommender Systems and Information Retrieval. In the past, I have also worked on scalable parameter estimation techniques for Bayesian models. I obtained my PhD in Computer Science from UC Santa Cruz, working with Prof. S.V.N. Vishwanathan on hybrid-parallel and decentralized stochastic optimization algorithms for large-scale machine learning models. Before that, I received my Master's in Computer Science from Georgia Tech.
Internship opportunities in our group: We have exciting opportunities for PhD student interns to work on projects related to optimization, large-scale training of deep learning models, and LLMs. If you are interested and have a strong research background with hands-on implementation experience, please get in touch.
News
| [Apr 2026] | Our work on adaptive batch sizes using optimizer-dependent gradient noise scales, Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent, has been accepted to ICML 2026. |
| [Apr 2026] | Code for GPA (Generalized Primal Averaging), our new optimizer for LLM training, is publicly available. |
| [Jan 2026] | Our work on adaptive batch sizes using optimizer-dependent gradient noise scales, Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent, is available on ArXiv. |
| [Dec 2025] | Our work on Generalized Primal Averaging (GPA), Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs, is available on ArXiv. |
| [Jan 2025] | Our work on a memory-efficient sharpness-aware minimizer, nuSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints, has been accepted to Transactions on Machine Learning Research (TMLR) 2025. |
| [Dec 2024] | Our work on pre-training LLMs on AWS Trainium, HLAT: High-quality Large Language Model Pre-trained on AWS Trainium, has been accepted to IEEE Big Data 2024. |
| [May 2024] | Three of our recent works, MADA: Meta-Adaptive Optimizers through hyper-gradient Descent, EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence, and Variance-reduced Zero Order Optimization for LLM Fine-tuning, were accepted to ICML 2024. |
| [Apr 2024] | Pre-print of our paper, HLAT: High-quality Large Language Model Pre-trained on AWS Trainium, is available on ArXiv. |
| [Apr 2024] | Pre-print of our paper, EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence, is available on ArXiv. |
| [Apr 2024] | Pre-print of our paper, Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models, is available on ArXiv. |
| [Jan 2024] | Our paper, Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate, has been accepted to AISTATS 2024. |
| [Jan 2024] | Pre-print of our paper, MADA: Meta-Adaptive Optimizers through hyper-gradient Descent, is available on ArXiv. |
| [Jan 2024] | Elevated to the grade of IEEE Senior Member. |
| [Feb 2023] | Updated pre-print version of our work, Contractive error feedback for gradient compression, is available on ArXiv. |
| [Jul 2020] | Received a certificate of appreciation for contributions as a reviewer for ICML 2020, to be hosted in Vienna. |
| [Apr 2020] | Joined as an Applied Scientist at Amazon. |
| [Apr 2020] | Pre-print version of our work on Scalable Factorization Machines, DS-FACTO: Doubly Separable Factorization Machines, is available on ArXiv. |
| [Jan 2020] | Delivered a talk on Scaling Multinomial Logistic Regression through Hybrid-Parallelism at Fiddler.ai, Palo Alto. |
| [Jan 2020] | Delivered a talk on my PhD thesis work, Hybrid-Parallel Parameter Estimation for Bayesian and Frequentist Models, at IBM Research, Almaden. |
| [Dec 2019] | Defended my PhD dissertation titled Hybrid-Parallel Parameter Estimation for Bayesian and Frequentist Models. |
| [Apr 2019] | Our paper, Scaling Multinomial Logistic Regression via Hybrid Parallelism, has been accepted to KDD 2019 as an oral presentation (9.16% acceptance rate). |
| [Dec 2018] | Our paper on scaling inference for mixtures of exponential family models, Extreme Stochastic Variational Inference: Distributed and Asynchronous, has been accepted to AISTATS 2019. |
| [May 2018] | Invited to attend the TRIPODS Madison summer school 2018 on "Fundamentals of Data Analysis" at the University of Wisconsin, Madison. |
| [May 2018] | Received NSF travel award to MLSE 2018. Invited to present a poster at the CMU-Georgia Tech Symposium on Machine Learning in Science and Engineering, at CMU, Pittsburgh. |
| [May 2015] | Attending AT&T Machine Learning Summit hosted by AT&T Research, New York. The summit will feature talks and roundtable discussions on Applications in Intelligent Systems, Big Data, and Security. |
| [Sep 2014] | Our work, Ranking via Robust Binary Classification, was accepted to NeurIPS 2014 (19.79% acceptance rate). |