Member of Technical Staff - Data Infrastructure Engineer (DevOps|SRE|Platform Engineering|MLOps)

New York City, New York, United States

Microsoft

Explore Microsoft products and services for your home or business. Shop Microsoft 365, Copilot, Teams, Xbox, Windows, Azure, Surface, and more.

As Microsoft continues to lead the frontier of artificial intelligence, we are seeking passionate and driven engineers to solve some of the most challenging and impactful AI problems of our time. Our vision is bold: to build intelligent systems across agents, applications, services, and infrastructure — and to make this intelligence universally accessible for consumers, businesses, and developers alike.

Microsoft AI (MAI) is looking for an experienced Data Infrastructure Engineer to join the team behind personal AI and Copilot systems. We are building mission-critical platform components that drive data pipelines, enable seamless human-AI interactions, and power the evolution of intelligent systems. This role blends platform engineering, DevOps/SRE practices, and MLOps to support large-scale data workflows and AI model development.

You’ll bring technical depth, a passion for automation and observability, fluency in distributed systems, and the creativity to architect solutions that scale. Just as importantly, you’ll bring empathy, a collaborative spirit, and a growth mindset to support a world-class engineering culture.

This position is based in New York, NY or Redmond, WA, with an in-office requirement of 3 days per week.

Responsibilities

  • Design, build, and maintain scalable, reliable, and observable data and ML infrastructure that powers mission-critical AI applications.
  • Implement DevOps and SRE best practices, including automated deployments, service monitoring, and incident response.
  • Develop self-service tooling and workflows that streamline developer and researcher productivity.
  • Create robust CI/CD pipelines and automate infrastructure provisioning using Infrastructure as Code (Bicep, Terraform, ARM).
  • Collaborate closely with AI researchers, platform engineers, and application developers to deliver seamless and secure data workflows.
  • Participate in technical design reviews and contribute to maintaining a clean, secure, and well-documented codebase.
  • Proactively identify and resolve bottlenecks and inefficiencies in data pipelines and infrastructure.
  • Embody and promote Microsoft’s culture and values of respect, integrity, accountability, and inclusion.

Qualifications

Required Qualifications:

  • Bachelor’s degree in Computer Science, Mathematics, or a related field AND 4+ years of experience in a data infrastructure, DevOps, SRE, or MLOps role supporting high-volume, low-latency data systems
    • OR equivalent experience
  • 3+ years of experience managing and scaling distributed systems, from bare-metal to Kubernetes, including deep knowledge across the full stack (UI, middleware, platform services).
  • 2+ years building and deploying containerized applications with Kubernetes and Helm/Kustomize.
  • Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell, with proven experience automating operational tasks, including health checks, alerting, and observability for data and ML systems.
  • Demonstrated success in troubleshooting and supporting critical production systems, as well as managing CI/CD pipelines and release automation.

Preferred Qualifications:

  • Experience with Azure, AWS, or GCP and cloud-native data infrastructure.
  • Hands-on experience with modern data storage and processing technologies, including relational and NoSQL databases, key-value stores, Spark compute engines, distributed file systems such as HDFS and ADLS Gen2, as well as messaging systems like Event Hub, Kafka, and RabbitMQ.
  • Collaboration experience with Data Engineers, Data Scientists, ML Engineers, Networking, and Security teams.
  • Familiarity with modern web stacks: TypeScript, Node.js, React, PHP (a plus).
  • Understanding of MLOps principles: model training pipelines, artifact versioning, and experiment tracking.
  • Familiarity with agentic workflows, deep learning, or AI frameworks is an advantage.
  • Practical experience using LLMs (e.g., GPT-based models) in daily workflows — such as automating documentation, code generation, code review, or operational intelligence.
  • Demonstrated understanding of prompt engineering techniques to effectively design, optimize, and evaluate interactions with large language models (LLMs).
  • Ability to resolve complex performance and scalability issues across services and infrastructure layers.
  • Interpersonal and communication skills, with a passion for continuous learning and mentorship.
  • Experience applying LLMs to accelerate DevOps tasks, enhance incident response, or streamline cross-functional collaboration is a strong plus.

Data Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until June 9, 2025.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

#MicrosoftAI #Copilot

Tags: AWS Azure CI/CD Computer Science Copilot Data pipelines Deep Learning DevOps Distributed Systems Engineering GCP GPT HDFS Helm Kafka Kubernetes LLMs Machine Learning Mathematics ML infrastructure ML models MLOps Model training Node.js NoSQL PHP Pipelines Prompt engineering Python RabbitMQ React Security Spark Terraform TypeScript

Perks/benefits: Career development Health care Medical leave

Region: North America
Country: United States
