Applied ML Engineer

USA-CA - Promontory B, United States

Broadcom

Broadcom Inc. is a global technology leader that designs, develops and supplies a broad range of semiconductor, enterprise software and security solutions.


Please Note:

1. If you are a first-time user, please create your candidate login account before you apply for a job. (Click Sign In > Create Account)

2. If you already have a Candidate Account, please sign in before you apply.

Job Description:

The VMware Cloud Foundation (VCF) Division enables organizations worldwide to run their business-critical and modern applications securely, resiliently, and cost-efficiently.

With our flagship VMware Cloud Foundation platform and our industry-leading technologies, including vSphere, vSAN, NSX, VCF Automation, and VCF Operations, Broadcom customers receive the scale and agility of a public cloud with the security and performance of a private cloud. Modern infrastructures, accelerated application innovation, and predictable TCO savings and investment returns are just a few benefits of having a private cloud infrastructure powered by VMware Cloud Foundation.

Together, our bold group of technology professionals with diverse backgrounds – spanning engineering, products, marketing, partners, professional services, and global support services – is focusing on what can be for the largest enterprises, governments, financial services, healthcare, manufacturing and educational institutions of the world.

We seek an experienced ML Engineer to join our team to build a platform to enable Agentic AI development. The ideal candidate will have a strong background in machine learning, data engineering, and DevOps practices, with specific expertise in managing the lifecycle of LLMs.

The Elevator Pitch: Why will you enjoy this new opportunity?

As a senior LLMOps engineer in the AI & Advanced Services organization, you will work with a team of AI researchers, ML engineers, and product teams to incorporate GenAI capabilities into the VCF technology stack. With the unprecedented progress of AI, substantial machine learning applications can now automate workflows in VCF and simplify the life of an IT administrator. These applications are creating exciting research and product opportunities to rethink technology stacks, support diverse use cases, and radically simplify infrastructure deployment and operations.


Success in the Role: What performance goals will you work towards completing over the first 6-12 months?

Within your first 6 months

  • Understand the current LLMOps practices, techniques, and tools used to deploy, monitor, and maintain large language models efficiently.

  • Collaborate with data scientists, DevOps engineers, and IT professionals to streamline LLM operations.

  • Design and optimize LLM development lifecycles, including data ingestion, data preparation, prompt engineering, model fine-tuning, and deployment.

After 6 months

  • Oversee data preparation and prompt engineering efforts to improve model performance.

  • Develop and maintain model review and governance processes to ensure compliance and quality.

  • Set up and manage model monitoring systems with human feedback loops.

  • Optimize LLM pipelines for efficiency, scalability, and risk reduction.

The Work: What type of work will you be doing? What assignments, requirements, or skills will you be performing regularly?

  • Constant collaboration: Work closely with data scientists, engineers, and other stakeholders to understand requirements, provide technical guidance, and communicate model performance and limitations

  • Data management and curation: Manage and curate large datasets for model training, evaluation, and testing, ensuring data quality, integrity, and compliance.

  • Model deployment and integration: Deploy trained models to on-premises infrastructure

  • Model serving and inference: Optimize and maintain model serving systems to support real-time inference and batch processing

  • Model monitoring and maintenance: Continuously monitor model performance, identify issues, and perform updates, retraining, or rollbacks as needed

Job Requirements

Required Qualifications:

  • Bachelor's degree in Computer Science, Data Science, or a related field and 12+ years of experience in full-stack software development, with at least two years focused on GenAI and LLM applications; or a Master's degree and 10+ years of experience.

  • Strong programming skills in Python and experience with ML frameworks such as PyTorch or TensorFlow

  • Experience post-training LLMs for instruction tuning and preference alignment 

  • Familiarity with MLOps platforms (e.g. MLflow, Weights & Biases)

  • Knowledge of vector databases and similarity search techniques

Preferred Qualifications:

  • 3-5 years of experience in machine learning operations, with at least 2 years focused on LLMs

  • Knowledge of LangChain, LlamaIndex or similar frameworks for building LLM applications

  • Experience with model serving (vLLM, Triton, llama.cpp, etc.) and API integration for LLMs

  • Understanding of human-in-the-loop feedback systems for LLM evaluation

  • Experience with agentic systems development frameworks such as AutoGen and CrewAI.

Skills and Attributes:

  • Strong problem-solving and analytical skills

  • Excellent communication and collaboration abilities

  • Ability to work in a fast-paced, cross-functional environment

  • Passion for staying current with LLM research and industry trends

  • Strong focus on operational efficiency and scalability

Additional Job Description:

Compensation and Benefits 

The annual base salary range for this position is $127,000 - $225,000.

This position is also eligible for a discretionary annual bonus in accordance with relevant plan documents, and equity in accordance with equity plan documents and equity award agreements.

Broadcom offers a competitive and comprehensive benefits package: medical, dental and vision plans, 401(k) participation including company matching, Employee Stock Purchase Program (ESPP), Employee Assistance Program (EAP), company-paid holidays, paid sick leave and vacation time. The company follows all applicable laws for Paid Family Leave and other leaves of absence.

Broadcom is proud to be an equal opportunity employer.  We will consider qualified applicants without regard to race, color, creed, religion, sex, sexual orientation, gender identity, national origin, citizenship, disability status, medical condition, pregnancy, protected veteran status or any other characteristic protected by federal, state, or local law.  We will also consider qualified applicants with arrest and conviction records consistent with local law.

If you are located outside the USA, please be sure to fill out a home address as this will be used for future correspondence.



Region: North America
Country: United States
