Senior AI Data Engineer
Colombia
Lean Tech
Lean Solutions Group is a top workforce optimization company. Explore our offshore and nearshore staffing solutions to transform your business operations.
Company Overview:
Lean Tech is a rapidly expanding organization situated in Medellín, Colombia. We pride ourselves on possessing one of the most influential networks within software development and IT services for the entertainment, financial, and logistics sectors. Our corporate projections offer many opportunities for professionals to elevate their careers and experience substantial growth. Joining our team means engaging with expansive engineering teams across Latin America and the United States, contributing to cutting-edge developments in multiple industries.
As a Gen AI Engineer / Data Engineer, you will play a vital role in building and managing data ingestion pipelines and Gen AI infrastructure that powers innovative, intelligent systems.
Position Title: Senior AI Data Engineer
Location: Remote - Colombia
What you will be doing:
As the Gen AI Engineer / Data Engineer, you will be responsible for designing and owning robust ingestion pipelines and supporting large-scale Gen AI workflows. You will be deeply involved in the preparation, parsing, and processing of unstructured data, ensuring that information is ready for downstream use in LLM-powered applications. Your responsibilities will include:
- Data Ingestion Ownership: Own and maintain all ingestion pipelines from various document formats including PDF, PowerPoint (PPTX), Word (DOCX), Excel (XLSX), TXT, and Markdown (MD).
- Distributed Processing: Leverage PySpark to efficiently process large-scale datasets in distributed environments.
- LLM Frameworks: Utilize Gen AI tools such as LangChain, LlamaParser, docling, OpenAI, and Hugging Face for tasks like document chunking, embedding, and indexing.
- Vector Database Management: Work with vector databases such as Databricks Vector Search, Pinecone, and Azure AI Search to store and retrieve high-dimensional data for LLM use.
- Workflow Management: Design and manage Databricks workflows and scheduled jobs for automated pipeline execution.
- Client Interaction: Serve as a client-facing engineer, ensuring ingestion pipelines meet business needs and perform reliably in production environments.
Requirements & Qualifications:
To excel in this role, you should possess:
- Proven experience in processing unstructured data from formats like PDF, DOCX, PPTX, XLSX, TXT, and MD
- Strong proficiency in PySpark for distributed data processing
- Experience with Gen AI/LLM tools including LangChain, LlamaParser, docling, OpenAI, and Hugging Face
- Solid understanding of chunking, embedding, and indexing techniques
- Experience with vector databases such as Databricks Vector Search, Pinecone, and Azure AI Search
- Hands-on experience managing Databricks workflows and scheduled jobs
- Ability to interface with clients, understand requirements, and deliver tailored solutions
Desired Skills:
- Experience in deploying Gen AI pipelines in production environments
- Familiarity with optimization of vector search for LLM applications
- Understanding of retrieval-augmented generation (RAG) architectures
- Knowledge of data governance and access control within enterprise data platforms
Why you will love Lean Tech:
- Join a powerful tech workforce and help us change the world through technology
- Professional development opportunities with international customers
- Collaborative work environment
- Career path and mentorship programs to help you advance to new levels
Categories:
Deep Learning Jobs
Engineering Jobs
Tags: Architecture Azure Databricks Data governance Engineering Excel Generative AI LangChain LLMs ML infrastructure OpenAI Pinecone Pipelines PySpark RAG Unstructured data
Perks/benefits: Career development Startup environment
Region:
South America
Country:
Colombia