Welcome

I am a Research Scientist at Meta working on research and development of efficient optimization algorithms, large batch training and distributed training methods for pre-training large language models.

My research interests lie in Large Scale Machine Learning, Optimization Algorithms, Distributed Learning, and applications to real-world problems in NLP, Ranking & Recommender Systems and Information Retrieval. In the past, I have also worked on scalable parameter estimation techniques for Bayesian models. I obtained my PhD in Computer Science from UC Santa Cruz, working with Prof. S.V.N. Vishwanathan on hybrid-parallel and decentralized stochastic optimization algorithms for large-scale machine learning models. Before that, I received my Master's in Computer Science from Georgia Tech.

Internship opportunities in our group: We have exciting opportunities for PhD student interns to work on projects related to optimization, large-scale training of deep learning models, and LLMs. If you are interested and have a strong research background with hands-on implementation experience, please get in touch.

News

[Apr 2026]	Our work on adaptive batch sizes using optimizer-dependent gradient noise scales, Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent, has been accepted to ICML 2026.
[Apr 2026]	Code for GPA (Generalized Primal Averaging), our new optimizer for LLM training, is publicly available.
[Jan 2026]	Our work on adaptive batch sizes using optimizer-dependent gradient noise scales, Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent, is available on ArXiv.
[Dec 2025]	Our work on Generalized Primal Averaging (GPA), Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs, is available on ArXiv.
[Jan 2025]	Our work on memory-efficient sharpness-aware minimization, nuSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints, has been accepted to Transactions on Machine Learning Research (TMLR) 2025.
[Dec 2024]	Our work on pre-training LLMs on AWS Trainium, HLAT: High-quality Large Language Model Pre-trained on AWS Trainium, has been accepted to IEEE Big Data 2024.
[May 2024]	Three of our recent works, MADA: Meta-Adaptive Optimizers through hyper-gradient Descent, EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence, and Variance-reduced Zero Order Optimization for LLM Fine-tuning, were accepted to ICML 2024.
[Apr 2024]	Pre-print of our paper, HLAT: High-quality Large Language Model Pre-trained on AWS Trainium, is available on ArXiv.
[Apr 2024]	Pre-print of our paper, EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence, is available on ArXiv.
[Apr 2024]	Pre-print of our paper, Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models, is available on ArXiv.
[Jan 2024]	Our paper, Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate, has been accepted to AISTATS, 2024.
[Jan 2024]	Pre-print of our paper, MADA: Meta-Adaptive Optimizers through hyper-gradient Descent, is available on ArXiv.
[Jan 2024]	Elevated to the grade of IEEE Senior Member.
[Feb 2023]	Updated pre-print version of our work, Contractive error feedback for gradient compression, is available on ArXiv.
[Jul 2020]	Received a certificate of appreciation for contributions as a reviewer for ICML 2020, to be hosted in Vienna.
[Apr 2020]	Joined as an Applied Scientist at Amazon.
[Apr 2020]	Pre-print version of our work on Scalable Factorization Machines, DS-FACTO: Doubly Separable Factorization Machines, is available on ArXiv.
[Jan 2020]	Delivered a talk on Scaling Multinomial Logistic Regression through Hybrid-Parallelism at Fiddler.ai, Palo Alto.
[Jan 2020]	Delivered a talk on my PhD thesis work, Hybrid-Parallel Parameter Estimation for Bayesian and Frequentist Models, at IBM Research, Almaden.
[Dec 2019]	Defended my PhD dissertation titled Hybrid-Parallel Parameter Estimation for Bayesian and Frequentist Models.
[Apr 2019]	Our paper, Scaling Multinomial Logistic Regression via Hybrid Parallelism, has been accepted to KDD, 2019 as an Oral Presentation (9.16% acceptance rate).
[Dec 2018]	Our paper on scaling inference for mixtures of exponential family models, titled Extreme Stochastic Variational Inference: Distributed and Asynchronous, has been accepted to AISTATS, 2019.
[May 2018]	Invited to attend the TRIPODS Madison summer school 2018 on "Fundamentals of Data Analysis" at the University of Wisconsin, Madison.
[May 2018]	Received an NSF travel award to MLSE 2018. Invited to present a poster at the CMU-Georgia Tech Symposium on Machine Learning in Science and Engineering, at CMU, Pittsburgh.
[May 2015]	Attending AT&T Machine Learning Summit hosted by AT&T Research, New York. The summit will feature talks and roundtable discussions on Applications in Intelligent Systems, Big Data, and Security.
[Sep 2014]	Our work, Ranking via Robust Binary Classification, was accepted to NeurIPS, 2014 (19.79% acceptance rate).

Parameswaran Raman
