Senior Principal AI Product Engineer – Data
5000 - Vertex US - Boston, United States
Vertex Pharmaceuticals
Vertex Pharmaceuticals invests in scientific innovation to create transformative medicines for people with serious diseases.
Job Description
General Summary:
At Vertex, we are pioneering the use of large language models and generative AI to create transformative solutions across our organization. Our corporate data science team is at the forefront of that effort, and we are looking for a Senior AI Product Engineer – Data to build the foundations for the next generation of human-augmented AI products.
The ideal candidate brings strong expertise in data engineering for AI products and is well-versed in the latest Large Language Model (LLM) technologies including Retrieval-Augmented Generation (RAG) and unstructured knowledge bases. We are seeking engineers with strong experience collaborating with data scientists to prepare and organize data for AI solutions, and to help transition these solutions from pilot to production.
You will work on a highly collaborative, centralized team of data scientists, product engineers, and strategists who drive value and impact for our highest-priority business needs. You will work side-by-side with internal partners across clinical, commercial, manufacturing, and general and administrative areas to develop creative solutions that contribute meaningfully to our business and patients.
You will develop AI-focused data products for managing text document libraries, including automating the generation of NLP embeddings and their loading into vector database assets, automating RAG procedures, and deploying these into a full data pipeline in collaboration with UI developers.
Key Duties and Responsibilities:
- Identify, evaluate, and integrate diverse data sources into data products for the Generative AI program
- Develop and optimize data processing workflows for chunking, indexing, and vectorization for both text and non-text data sources
- Benchmark and implement various vector stores, embedding techniques, and retrieval methods
- Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search)
- Implement and maintain auto-tagging systems and data preparation processes for LLMs
- Integrate and optimize workflows using various vector store technologies
- Write clean, maintainable data pipelines that generate AI and data science-ready outputs
- Collaborate with data scientists and front-end engineers to design scalable AI-driven solutions
- Communicate the technical approaches and associated benefits or drawbacks to multiple stakeholders via oral presentations or written documentation
- Provide engineering oversight across projects and support data scientists and engineers in optimizing approaches
- Continually scan the external environment for novel technological innovations and bring in those that can be applied to our projects
Knowledge and Skills:
- Deep understanding of modern data architectures incorporating AI/ML components including LLMs from vendor platforms such as AWS, Snowflake, Azure, and Databricks
- Expertise in SQL, Python and related tools
- Experience with RAG frameworks, knowledge base construction, and vector store technologies and their applications in AI
- Experience automating experimental and quasi-experimental design techniques, such as A/B testing, to measure solution impact
- Experience with data cleaning, tagging, and annotation processes
- Experience with embedding techniques, similarity search algorithms, and information retrieval concepts
- Deep understanding of data privacy standards and ethics
Education and Experience:
- Bachelor’s degree in mathematics, computer science, or a related field; advanced degree is a plus
- 6+ years of experience in data engineering or a related field, preferably in AI/ML contexts, or an equivalent combination of education and experience
- Experience in defining and developing data products
- Experience with data quality, cleaning and masking techniques
- Experience with and understanding of data science concepts
- Experience in organizing and incorporating complex systems requirements into product features and effectively prioritizing features
Flex Designation:
Hybrid-Eligible or On-Site Eligible
Flex Eligibility Status:
In this Hybrid-Eligible role, you can choose to be designated as:
1. Hybrid: work remotely up to two days per week; or select
2. On-Site: work five days per week on-site with ad hoc flexibility.
Note: The Flex status for this position is subject to Vertex’s Policy on Flex @ Vertex Program and may be changed at any time.
Company Information
Vertex is a global biotechnology company that invests in scientific innovation.
Vertex is committed to equal employment opportunity and non-discrimination for all employees and qualified applicants without regard to a person's race, color, sex, gender identity or expression, age, religion, national origin, ancestry, ethnicity, disability, veteran status, genetic information, sexual orientation, marital status, or any characteristic protected under applicable law. Vertex is an E-Verify Employer in the United States. Vertex will make reasonable accommodations for qualified individuals with known disabilities, in accordance with applicable law.
Any applicant requiring an accommodation in connection with the hiring process and/or to perform the essential functions of the position for which the applicant has applied should make a request to the recruiter or hiring manager, or contact Talent Acquisition at ApplicationAssistance@vrtx.com.