Principal Data Engineer

New York, NY, United States



The Enterprise Corporate Data Team is looking for a Principal Data Engineer, a senior technical leader responsible for architecting the core data infrastructure and platforms that power enterprise-scale AI applications. Reporting to the VP of Engineering, this role will focus on building systems for content tagging, semantic ontology generation, and persona modeling, and on integrating content metadata with behavioral data to support personalization, audience development, and intelligent content discovery.
 

The Principal Data Engineer will lead the end-to-end design and implementation of scalable pipelines, platforms, and systems that support semantic analysis and knowledge graph generation across massive volumes of unstructured data using Gen AI systems. This individual will also coordinate with an offshore team of engineers, ensuring consistent delivery, code quality, and alignment with business and technical goals. The ideal candidate will possess an entrepreneurial ethos, an ability to operate in a dynamic environment, and a working knowledge of the current digital media landscape. This role is based in New York City.
 

Key Responsibilities: 


●    Lead the design and implementation of high-performance data pipelines and infrastructure to support automated generation of semantic ontologies and knowledge graphs.
●    Architect scalable data platforms that integrate structured and unstructured data—including behavioral signals, content metadata, and user engagement data—for Gen AI use cases. 
●    Build systems that enable semantic enrichment of content through entity recognition, disambiguation, normalization and deduplication techniques. 
●    Drive the creation and maintenance of flexible ontologies and taxonomies to organize media content for personalization, recommendation, and audience segmentation. 
●    Partner closely with ML engineers and data scientists to deploy and operationalize models for content and audience intelligence. 
●    Oversee and coordinate with an offshore engineering team, providing technical guidance, code reviews, and project oversight to ensure timely, high-quality deliverables.
●    Ensure best practices in data governance, quality, observability, and documentation across all engineering workflows. 
●    Collaborate with stakeholders across product, marketing, and data science to translate business needs into scalable AI data systems. 
●    Well versed in architecting, designing, and developing large-scale OLTP and OLAP systems.
●    Experience building and operating streaming systems using messaging systems like Kafka, Pub/Sub, or SQS.
●    Experience building a RAG system with Google, OpenAI, or another Gen AI platform.
●    Experience building a knowledge graph using Neo4j, Spanner, Neptune, or another tool is a plus.


Qualifications: 


●    10+ years of experience in data engineering, with significant experience building large-scale, distributed data systems to support data analysis, AI/ML, and key business use cases.
●    Proven expertise in content classification, tagging, and ontology/taxonomy development, especially using NLP and semantic techniques. 
●    Strong coding and data architecture skills using TypeScript, Python, SQL, and tools like Apache Spark, Kafka, Airflow, Node.js, and cloud-native platforms (e.g., AWS, GCP, or Azure).
●    Hands-on experience integrating ML models into production environments for tasks such as entity extraction, text classification, or semantic search. 
●    Deep understanding of working with unstructured data (text, images, video), metadata enrichment, and knowledge graph integration. 
●    Experience managing and mentoring distributed/offshore engineering teams, with a track record of driving execution across time zones. 
●    Excellent communication and collaboration skills, with the ability to bridge technical execution and business strategy. 
 

Preferred Qualifications: 

●    Experience in digital media, publishing, ad tech, or content platforms. 
●    Bachelor’s, Master’s, or Ph.D. in Computer Science, Data Engineering, or a related field.
●    Knowledge of LLMs and generative AI in applied settings (e.g., content summarization, auto-tagging, retrieval augmentation).
●    Working experience with OLAP and OLTP systems is a plus.


In accordance with applicable law, Hearst is required to include a reasonable estimate of the compensation for this role if hired in New York City. The reasonable estimate, if hired in New York City, is $325,000-$350,000. Please note this information is specific to those hired in New York City. If this role is open to candidates outside of New York City, the salary range would be aligned to that specific location. A final decision on the successful candidate’s starting salary will be based on a number of permissible, non-discriminatory factors, including but not limited to skills and experience, training, certifications, and education. Hearst provides a competitive benefits package, including medical, dental, vision, disability, and life insurance, 401(k), paid holidays and paid time off, employee assistance programs, and more. 
 

Hearst is one of the nation’s largest global, diversified information, services and media companies.


Hearst has been innovating for more than a century, leading with purpose, integrity and a culture of care, with a mission to inform audiences and improve lives.


The company’s diverse portfolio includes global financial services leader Fitch Group; Hearst Health, a group of medical information and services businesses; Hearst Transportation, which includes CAMP Systems International, a major provider of software-as-a-service solutions for managing maintenance of jets and helicopters; ownership in cable television networks such as A&E, HISTORY, Lifetime and ESPN; 35 television stations; 24 daily and 52 weekly newspapers; digital services businesses; and more than 200 magazines around the world.


Hearst is always moving forward, investing in healthcare solutions to improve patient outcomes and technology that curbs emissions; providing vital analysis, data and software to the global financial services industry; delivering important service and investigative journalism; and inspiring audiences with sports and entertainment programming.


With a commitment to maintaining the highest quality in its products and services, Hearst is dedicated to serving the communities it operates in, both civically and philanthropically.


Hearst is an Equal Employment Opportunity employer.  We do not discriminate in hiring on the basis of race, color, national origin, religion, creed, sex or gender, gender identity, gender expression, sexual orientation, age, physical or mental disability, military or veteran status, or any other characteristic protected by federal, state, or local law.

Category: Engineering Jobs

Tags: Airflow Architecture AWS Azure Classification Computer Science Data analysis Data governance Data pipelines Engineering GCP Generative AI Kafka LLMs Machine Learning ML models Neo4j NLP Node.js OLAP OpenAI Pipelines Python RAG Semantic Analysis Spark SQL Streaming TypeScript Unstructured data

Perks/benefits: Competitive pay Flex hours Flex vacation Health care Insurance

Region: North America
Country: United States
