Data Engineer
New York, New York, United States
21.co Technologies
About the Company
At 21.co Technologies, our mission centers on building scalable bridges into the world of cryptocurrency. By making DeFi accessible through traditional financial standards, we bring ourselves one step closer to the equitable financial future we all believe in.
About the Role
We are seeking a highly motivated and skilled Data Engineer with a focus on MLOps and Large Language Models (LLMs) to join our team and help us design, build, and maintain robust data pipelines and infrastructure. As a Data Engineer with expertise in LLMs, you will be responsible for ensuring data is accessible, reliable, and optimally structured to support analytics, machine learning, and LLM-driven applications. You will work on cutting-edge technologies and collaborate closely with cross-functional teams, enabling you to make a significant impact on our data-driven and AI-focused strategies.
This role integrates core data engineering principles with MLOps practices to support the full lifecycle of LLM-driven applications, from data preparation to production monitoring. It also offers opportunities for growth, innovation, and learning in a dynamic, fast-paced environment.
Our culture values diversity, communication, collaboration, and a shared passion for using data and AI to drive business outcomes.
Responsibilities and Scope
- Design and maintain scalable data pipelines tailored to LLM requirements, including preprocessing unstructured text data from various sources, implementing chunking strategies, and optimizing embedding generation for vector databases.
- Build and manage data infrastructure, including data warehouses, data lakes, and streaming solutions, specifically optimized for LLM workflows.
- Deploy LLMs into production environments using containerization (Docker) and orchestration tools (Kubernetes).
- Automate CI/CD pipelines for model versioning, A/B testing, and rollback procedures, ensuring seamless updates to fine-tuned models.
- Optimize data systems for performance, reliability, and scalability, particularly for real-time inference in applications such as chatbots or document analysis.
- Implement MLOps-driven model deployment and monitoring, tracking key metrics such as inference latency, token usage costs, and output quality drift.
- Manage vector databases (e.g., Qdrant, Pinecone, FAISS) and design indexing strategies for Retrieval-Augmented Generation (RAG) architectures.
- Collaborate with data scientists, analysts, and other stakeholders to understand data and LLM requirements and deliver solutions.
- Implement data governance best practices to ensure data quality, security, and compliance, including lineage tracking for text sources and PII detection and redaction pipelines.
- Monitor and troubleshoot data pipelines and LLM deployments, resolving issues in a timely manner.
- Create and maintain documentation for all data-related processes, procedures, and workflows, including LLM-specific pipelines and deployments.
- Research and stay up-to-date with the latest trends, technologies, and best practices in data engineering, MLOps, and LLM technologies.
What You Will Need To Be Great In This Role
- 5+ years of experience as a Data Engineer, including 2+ years focused on MLOps.
- Strong proficiency in Python, SQL, and data orchestration tools (e.g., Airflow).
- Experience with cloud platforms like AWS (SageMaker), Google Cloud Platform (Vertex AI), or Azure Machine Learning for managed LLM deployments.
- Familiarity with data warehouse solutions such as Snowflake or BigQuery.
- Experience with big data technologies like Spark, Hadoop, or Kafka.
- Understanding of data modeling and schema design (e.g., dimensional modeling).
- Proficiency with version control systems like Git.
- Excellent problem-solving and debugging skills.
- Strong communication skills and the ability to work collaboratively with cross-functional teams.
- Experience working in Agile development environments.
- Hands-on experience with Hugging Face Transformers, LangChain for prompt engineering, and LlamaIndex for document indexing.
- Portfolio demonstrating deployed LLM applications with measurable performance metrics.
Our Stack
- Languages: Python, SQL
- Tools: Apache Airflow, Kafka (MSK, Redpanda), LangChain, LangSmith
- Cloud Platforms: AWS (S3), Databricks
- Databases: Postgres, MongoDB, Vector Databases (Qdrant)
- Version Control: Git
Preferred
- Experience with containerization tools like Docker and orchestration platforms like Kubernetes.
- Familiarity with modern data streaming tools (e.g., Kafka, Kinesis).
- Familiarity with Natural Language Processing (NLP) and LLMs.
- Familiarity with chunking and data transformation for LLMs.
- Familiarity with Vector Databases / Embedding Stores.
- Hands-on experience with real-time analytics or machine learning pipelines.
- Exposure to or interest in data visualization tools like Tableau, Looker, or Streamlit.
- Experience with specialized LLM techniques.
- Experience implementing OpenTelemetry for distributed tracing and integrating it with Betterstack/Grafana dashboards.
This role is based in New York City, and you will be expected to work from our New York office Monday through Wednesday.
Compensation (NYC Only)
Pursuant to Section 8-102 of Title 8 of the New York City Administrative Code, the base salary range for this role is $140,000.00 - $180,000.00. Total compensation packages are based on various factors unique to each candidate, including but not limited to skill set, years and depth of experience, certifications, and specific office location.