Senior AI Data Engineer

Colombia

Lean Tech

Lean Solutions Group is a top workforce optimization company. Explore our offshore and nearshore staffing solutions to transform your business operations.


Company Overview:
Lean Tech is a rapidly expanding organization situated in Medellín, Colombia. We pride ourselves on possessing one of the most influential networks within software development and IT services for the entertainment, financial, and logistics sectors. Our corporate projections offer many opportunities for professionals to elevate their careers and experience substantial growth. Joining our team means engaging with expansive engineering teams across Latin America and the United States, contributing to cutting-edge developments in multiple industries.

As a Gen AI Engineer / Data Engineer, you will play a vital role in building and managing data ingestion pipelines and Gen AI infrastructure that powers innovative, intelligent systems.

 

Position Title: Senior AI Data Engineer

 

Location: Remote - Colombia

What you will be doing:
As a Gen AI Engineer / Data Engineer, you will be responsible for designing and owning robust ingestion pipelines and supporting large-scale Gen AI workflows. You will be deeply involved in the preparation, parsing, and processing of unstructured data, ensuring that information is ready for downstream use in LLM-powered applications. Your responsibilities will include:
  • Data Ingestion Ownership: Own and maintain all ingestion pipelines from various document formats including PDF, PowerPoint (PPTX), Word (DOCX), Excel (XLSX), TXT, and Markdown (MD).
  • Distributed Processing: Leverage PySpark to efficiently process large-scale datasets in distributed environments.
  • LLM Frameworks: Utilize Gen AI tools such as LangChain, LlamaParser, docling, OpenAI, and Hugging Face for tasks like document chunking, embedding, and indexing.
  • Vector Database Management: Work with vector databases such as Databricks Vector Search, Pinecone, and Azure AI Search to store and retrieve high-dimensional data for LLM use.
  • Workflow Management: Design and manage Databricks workflows and scheduled jobs for automated pipeline execution.
  • Client Interaction: Serve as a client-facing engineer, ensuring ingestion pipelines meet business needs and perform reliably in production environments.

 

Requirements & Qualifications:
To excel in this role, you should possess:
  • Proven experience in processing unstructured data from formats like PDF, DOCX, PPTX, XLSX, TXT, and MD
  • Strong proficiency in PySpark for distributed data processing
  • Experience with Gen AI/LLM tools including LangChain, LlamaParser, docling, OpenAI, and Hugging Face
  • Solid understanding of chunking, embedding, and indexing techniques
  • Experience with vector databases such as Databricks Vector Search, Pinecone, and Azure AI Search
  • Hands-on experience managing Databricks workflows and scheduled jobs
  • Ability to interface with clients, understand requirements, and deliver tailored solutions

 

Desired Skills:
  • Experience in deploying Gen AI pipelines in production environments
  • Familiarity with optimization of vector search for LLM applications
  • Understanding of retrieval-augmented generation (RAG) architectures
  • Knowledge of data governance and access control within enterprise data platforms

 

Why you will love Lean Tech:
  • Join a powerful tech workforce and help us change the world through technology
  • Professional development opportunities with international customers
  • Collaborative work environment
  • Career path and mentorship programs that help you advance to new levels
Join Lean Tech and contribute to shaping the data landscape within a dynamic and growing organization. Your skills will be honed, and your contributions will be vital to our continued success. Lean Tech is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Tags: Architecture Azure Databricks Data governance Engineering Excel Generative AI LangChain LLMs ML infrastructure OpenAI Pinecone Pipelines PySpark RAG Unstructured data

Perks/benefits: Career development Startup environment

Region: South America
Country: Colombia
