Principal Engineer - Data Ingestion & AI Pipeline
USD 122K-173K (estimate) | Senior-level | Full Time
Tasks
- Balance innovation with reliability, security, compliance, scalability, and cost
- Build pipeline orchestration, monitoring, retry, and error handling
- Create reusable patterns, reference architectures, standards, and guardrails
- Define data quality checks for completeness, accuracy, duplication, stale content, and sensitive information
- Define engineering approach for program portfolio solutions
- Define enterprise technology stack
- Design incremental ingestion, change detection, delta processing, and reprocessing
- Develop scalable data ingestion pipelines for RAG and AI
- Ensure ingestion compliance with security, privacy, retention, and regulatory requirements
- Establish ingestion standards for lineage, freshness, versioning, access controls, and auditability
- Extract, transform, enrich, validate, chunk, classify, and load structured and unstructured data
- Implement OCR, document intelligence, parsing, entity extraction, and content classification
- Implement transformation patterns for parsing, text extraction, normalization, deduplication, enrichment, chunking, classification, and metadata generation
- Lead planning, definition, and design across multiple teams
- Mentor engineers and guide technical direction
- Optimize ingested data for embeddings, indexing, and retrieval
- Own architecture decisions across systems and domains
- Provide technical leadership across data engineering, AI platform, cloud, and application teams
- Provide technical oversight, design reviews, and code within the domain
- Structure enriched data for LLM reasoning
Skills/Tech-stack
Access Control | Apache Airflow | Apache Spark | Azure Data Factory | Azure Event Hubs | Azure Fabric | Azure Synapse Analytics | Batch Processing | Change detection | Chunking | Content Classification | DBT | Data Governance | Data Lineage | Data Quality | Data Retention | Data Security | Data orchestration | Databricks | Deduplication | Delta Processing | Document processing | ELT | ETL | Embeddings | Entity Extraction | Incremental processing | Kafka | LLM | Metadata generation | Normalization | OCR | Privacy Compliance | RAG | Reprocessing | Schema inference | Semantic Search | Spark SQL | Streaming | Text extraction | Vector Databases | Vector Search