Data Engineer for AI

Philippines (Remote)

Offshorly Ltd.

A cost-effective digital agency

This is a remote position.

What the engineer will actually do:
  • P1 | Build and schedule Python parsers that extract structured JSON from PowerPoint, PDF, and Excel documents, then land the data in Databricks Bronze → Silver tables.
  • P1 | Develop/maintain simple Auto Loader or Fivetran pipelines for ERP and ticketing systems.
  • P2 | Add basic text‑embedding or LLM‑based entity extraction (LangChain or open‑source transformers) to enrich the document feed.
  • P3 | Write unit tests and lightweight data‑quality checks (Great Expectations) so parsing errors do not break the pipeline.
  • P3 | Produce concise handover docs for our future data architect.

Skill Set:
Must‑have (core):
  • 2‑4 years building ETL or ELT pipelines with Databricks or Snowflake (Delta/Parquet, Spark SQL, Airflow or similar).
  • Solid Python (pandas, PySpark) and experience parsing Office files with libraries such as python‑pptx, openpyxl, pdfplumber, or PyPDF.
  • Basic SQL tuning and ability to work with structured schemas.
  • Git and CI/CD familiarity.
Nice‑to‑have (bonus):
  • Exposure to LangChain, Hugging Face Transformers, or any LLM inference workflow.
  • Experience adding embeddings to tables for downstream ML or search.
  • Great Expectations or similar data‑quality tooling.
  • Familiarity with Unity Catalog or Snowflake RBAC concepts.


Tags: Airflow CI/CD Databricks ELT ETL Excel Fivetran Git JSON LangChain LLMs Machine Learning Pandas Parquet Pipelines PySpark Python Snowflake Spark SQL Transformers

Regions: Remote/Anywhere Asia/Pacific
Country: Philippines
