Senior Machine Learning Engineer - CoreAI

Redmond, Washington, United States

Microsoft

Discover Microsoft products and services for your home or business. Shop Microsoft 365, Copilot, Teams, Xbox, Windows, Azure, Surface, and more.

The Microsoft CoreAI Post-Training team is dedicated to advancing post-training methods for both OpenAI and open-source models. Their work encompasses continual pre-training, large-scale deep reinforcement learning running on extensive GPU resources, and significant efforts to curate and synthesize training data. In addition, the team employs various fine-tuning approaches to support both research and product development.  

 

The team also develops advanced AI technologies that integrate language and multi-modality for a range of Microsoft products. The team is particularly active in developing code-specific models, including those used in GitHub Copilot and Visual Studio Code, such as the code completion model and the software engineering (SWE) agent models.

 

The team has also produced publications as by-products, including work such as LoRA, DeBERTa, Oscar, Rho-1, Florence, and the open-source Phi models.

  

We are looking for a Senior Machine Learning Engineer - CoreAI with significant experience in large-scale model training, data curation, and hands-on coding, ideally from leading research labs. You will develop LLMs, SLMs, multimodal, and coding models using both proprietary and open-source frameworks. Key responsibilities include improving model quality and training efficiency through advanced techniques and data strategies, and managing the full pipeline from data ingestion through evaluation to inference.

 

Our team values startup-style efficiency and practical problem-solving. We are seeking a curious, adaptable problem-solver who thrives on continuous learning, embraces changing priorities, and is motivated by creating meaningful impact. Candidates must be self-driven, able to write high-quality code and debug complex systems, document their work clearly, and demonstrate solid experience in shipping ML systems. The ability to quickly translate ideas into working code for rapid experimentation would be a plus. You may include information about any individual who can serve as your referral in your application.  

 

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

  

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day. 

Responsibilities

Core Qualifications & Responsibilities 

  • Implement large-scale model training, especially with LLMs, SLMs, multimodal, or code-specific models.
  • Develop robust evaluation frameworks to assess model performance, conduct systematic benchmarking, and address identified weaknesses while ensuring compliance with customer standards.
  • Write efficient, production-quality code and debug complex distributed systems.
  • Build and maintain internal tools to streamline training and evaluation workflows and automate repetitive tasks within secure development environments. 

Research & Innovation 

  • Contribute to or build on existing innovations, such as the technical reports of well-known models.
  • Help develop models powering tools like GitHub Copilot, Cursor, and VS Code suggestions. 

Qualifications

Required/Minimum Qualifications

  • Doctorate in relevant field
    • OR equivalent experience.
  • 2+ years of experience with large-scale LLMs, especially fine-tuning LLMs, SLMs, multimodal, or code-specific models
  • 2+ years of coding and debugging experience in Python, and experience with ML frameworks such as PyTorch or Triton
  • 2+ years of experience in experimentation and analysis: designing experiments and creating high-quality evaluations that push models to handle real usage patterns
  • 2+ years of software engineering experience, with the ability to write efficient, maintainable, production-quality code and debug complex distributed systems
  • 2+ years of experience with cloud platforms and ML infrastructure - demonstrated proficiency in building and maintaining ML production pipelines

Other Requirements: 

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred/Additional Qualifications

  • Proven track record of impactful research, preferably at leading research labs, with published work or real-world deployments
  • Extensive experience with foundation models, including advanced reinforcement learning or inference-time search techniques, benchmark development, and model optimization under complex setups
  • Hands-on experience with inference optimization, including quantization, distillation, and speculative decoding
  • Proficiency in programming tools/containerization (Docker / Kubernetes)
  • Demonstrated ability to work in cross-functional teams and collaborate effectively with researchers, product managers, and other engineers to deliver complex ML solutions
  • Startup-style mindset: agile, solution-oriented, and able to operate with minimal overhead
  • Self-driven and organized with the ability to take ownership of projects and document findings clearly and effectively
  • AI-forward approach with a demonstrated willingness to incorporate AI tools in day-to-day work to enhance productivity and innovation 

 

Research Sciences IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft posts positions for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled. 

 

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

 

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.


Tags: Agile Copilot Deep Learning Distributed Systems Docker Engineering GitHub GPU Kubernetes LLMs LoRA Machine Learning ML infrastructure Model training OpenAI Open Source Pipelines Python PyTorch Reinforcement Learning Research Security

Perks/benefits: Career development Medical leave Startup environment

Region: North America
Country: United States
