Data Engineer

Vilnius

Oxylabs

The best proxy service platform with 100M+ Residential and 2M Datacenter IP proxies. Extract public data from any website with ease!


We’re a team of 500+ professionals who develop cutting-edge web data gathering solutions for thousands of the world’s best-known businesses, including Fortune 500 companies.
What’s in store for you:
As our Data Engineer, you will be responsible for tackling a diverse and challenging range of problems to help us make better business decisions at Oxylabs.io. In this role, you will focus on making data (internal/external, structured/unstructured, batch/real-time, etc.) accessible across Oxylabs.io while ensuring its accuracy and timeliness. You will also have the opportunity to work with various agent frameworks and vector databases, leveraging the latest AI models (incl. OpenAI’s GPT, Anthropic’s Claude, Perplexity’s Sonar and Meta’s Llama) to build internal tools that our commercial teams will use daily.
This is an excellent opportunity for an analytical thinker who thrives in a fast-paced environment, has a passion for AI, and is eager to learn new technologies.

Our tech stack*:

  • Google Cloud Platform (Pub/Sub, GCS, BQ) for ingestion, storage and warehousing;
  • dbt and SQL for data modelling;
  • PowerBI and Superset for data visualisation;
  • Python for automation;
  • Airflow for orchestration;
  • Kubernetes/Argo CD for deployment;
  • Streamlit for data applications;
  • LangChain for agentic pipelines;
  • nexos.ai for LLM routing.

  • *We do not expect candidates to already know or have mastered all of these technologies.

Your day-to-day:

  • Build a solid understanding of the relationships between our data sources through data analysis and modelling;
  • Use Google Cloud Platform together with Airflow to build data pipelines that apply business logic, making sound choices about aggregation levels and about how fields are grouped and transformed without compromising scalability or performance, while taking responsibility for data quality;
  • Build custom scalable LLM applications using Python that leverage vector databases and RAG techniques for efficient knowledge retrieval;
  • Fine-tune and optimize prompt strategies to enhance accuracy, reliability and efficiency of LLM outputs;
  • Conduct research and rapid prototyping with new LLM technologies and frameworks to keep our AI capabilities at the cutting edge;
  • Collaborate with cross-functional teams to identify AI-driven opportunities and integrate solutions into their operations;
  • Develop internal documentation and best practices for AI implementation across the company.
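One of the responsibilities above involves RAG-style knowledge retrieval over vector databases. Purely as an illustration (the documents, vectors, and function names below are hypothetical, and a real pipeline would use model-generated embeddings and a proper vector store rather than a dict), the core similarity-search step behind RAG can be sketched in plain Python:

```python
import math

# Toy embedding lookup: in practice these vectors would come from an
# embedding model; here they are hand-written 3-d vectors for illustration.
EMBEDDINGS = {
    "proxy pricing tiers": [0.9, 0.1, 0.0],
    "residential IP pool size": [0.8, 0.2, 0.1],
    "office lunch menu": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def retrieve(query_vec, k=2):
    """Return the k documents most similar to the query vector."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda doc: cosine(query_vec, EMBEDDINGS[doc]),
        reverse=True,
    )
    return ranked[:k]

# A query vector close to the pricing/pool-size documents.
print(retrieve([0.85, 0.15, 0.05]))
# → ['proxy pricing tiers', 'residential IP pool size']
```

In a production setup the retrieved documents would then be injected into the LLM prompt as context; the sketch only covers the retrieval half of that loop.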

Your skills & experience:

  • Excellent SQL knowledge;
  • Strong proficiency in at least one programming language, ideally Python;
  • Knowledge of the main data modelling/data warehousing principles;
  • Experience with cloud infrastructure (ideally, Google Cloud);
  • Knowledge of DataOps best practices and tools, including version control systems (e.g., Git), Docker, and CI/CD;
  • You have an interest and passion for AI-powered applications, as evidenced by personal projects or academic research;
  • You have a technical aptitude and don’t shy away from engineering-related discussions;
  • You have a keen eye for detail and troubleshooting as well as a high degree of ownership;
  • You have a structured approach to solving any kind of problem and are able to present the outcome in a clear and concise way;
  • You have an ability to prototype solutions quickly and translate them into production-ready code;
  • You’re excellent at communicating in both written and spoken Lithuanian and English.

Nice-to-have experience:

  • Experience with LLMs (e.g., OpenAI/Anthropic/Perplexity APIs);
  • Proficiency in deploying and optimizing open-source AI models (e.g., Llama), along with familiarity with related tooling (e.g., Ollama);
  • Background in machine learning beyond LLMs (e.g., computer vision, traditional ML).

Salary:

  • Gross salary: 3,400-5,000 EUR/month. Keep in mind that we are open to discussing a different salary based on your skills and experience.

Up for the challenge? Let’s talk!


Category: Engineering Jobs


Region: Europe
Country: Lithuania
