Senior Big Data Engineer

Hanoi, Hanoi, VN

ActiveFence

Protect your platform with AI safety solutions built to detect harmful content, manage AI risks, and ensure secure, compliant user experiences.

Description

We are seeking a Senior Big Data Engineer with a strong background in managing structured and unstructured data pipelines, who thrives in a fast-paced AI-focused environment. You will be instrumental in building and scaling our data lake architecture, supporting a system designed to fuel intelligent AI agents for data collection, labeling, and analytical reasoning. This includes integrating vector databases and optimizing for retrieval-augmented generation (RAG) workflows deployed on AWS Bedrock and other AI stacks.

Responsibilities

  • Design and implement scalable ingestion pipelines for structured/unstructured data using AWS and Databricks Unity Catalog.
  • Build and maintain high-throughput ETL/ELT pipelines with Apache Airflow and Databricks.
  • Architect and manage data modeling, storage, and indexing strategies in PostgreSQL and RDS, ensuring compatibility with AI retrieval systems.
  • Integrate and manage vector databases to support fast semantic and embedding-based search in RAG pipelines.
  • Collaborate with AI engineers to ensure seamless compatibility with LangGraph and LangSmith agent systems.
  • Implement robust data validation, lineage, and governance systems using Unity Catalog.
  • Optimize performance across distributed compute environments (Databricks, EC2).
  • Deploy and maintain Lambda-based microservices for scalable, real-time data ingestion and enrichment.

Requirements

  • 5+ years working with big data systems in production environments.
  • Proven expertise with Databricks, Unity Catalog, and Apache Spark.
  • Proficiency in Airflow, AWS stack (Lambda, EC2, RDS), and cloud-based data lake architectures.
  • Strong SQL and database design skills (PostgreSQL preferred).
  • Working knowledge of vector databases (Chroma, Pinecone, FAISS).
  • Solid understanding of data lifecycle management in ML/AI contexts.
  • Bonus: Familiarity with LangGraph, LangSmith, LangChain, or similar agent orchestration tools.

Preferred Qualifications

  • Experience with AI agent pipelines or large-scale ML model support.
  • Strong focus on data observability, security, and lineage tracking.
  • Hands-on experience with RAG architecture, including vector storage and semantic retrieval.
  • Exposure to AWS Bedrock and model deployment orchestration.

About ActiveFence

ActiveFence is the leading provider of security and safety solutions for online experiences, safeguarding more than 3 billion users, top foundation models, and the world’s largest enterprises and tech platforms every day.

As a trusted ally to major technology firms and Fortune 500 brands that build user-generated and GenAI products, ActiveFence empowers security, AI, and policy teams with low-latency Real-Time Guardrails and a continuous Red Teaming program that pressure-tests systems with adversarial prompts and emerging threat techniques. Powered by deep threat intelligence, unmatched harmful-content detection, and coverage of 117+ languages, ActiveFence enables organizations to deliver engaging and trustworthy experiences at global scale while operating safely and responsibly across all threat landscapes.

Tags: Airflow Architecture AWS Big Data Databricks Data pipelines EC2 ELT ETL FAISS Generative AI Lambda LangChain Machine Learning Microservices Model deployment Pinecone Pipelines PostgreSQL RAG Security Spark SQL Unstructured data

Region: Asia/Pacific
Country: Vietnam
