Data Engineering
Bucharest, Romania
General role: Contribute to the business value of data-oriented products built on an on-premise Datalake or in cloud environments, by implementing end-to-end data processing chains, from ingestion to API exposure and data visualization.
General responsibility: Quality of the data transformed in the Datalake, proper functioning of the data processing chains, and optimized use of on-premise or cloud cluster resources by those chains.
General skills: Experience implementing end-to-end data processing chains and big data architectures in the cloud (GCP); mastery of the languages and frameworks for large-scale data processing, in particular in streaming mode (Beam on Dataflow, Java, Spark/Scala on Dataproc); practice of agile methods.
Role
You will set up end-to-end data processing chains in cloud environments and in a DevOps culture. You will work on brand-new products across a wide variety of functional areas (Engineering, Connected Vehicle, Manufacturing, IoT, Commerce, Quality, Finance), with a solid team to support you.
Main responsibilities
During the project definition phase
- Design of data ingestion chains
- Design of data preparation chains
- Design of basic ML algorithms
- Data product design
- Design of NoSQL data models
- Data visualization design
- Participation in the selection of services/solutions to use, depending on the use cases
- Participation in the development of a data toolbox
During the iterative realization phase
- Implementation of data ingestion chains (see the sketch after this list)
- Implementation of data preparation chains
- Implementation of basic ML algorithms
- Implementation of data visualizations
- Use of ML frameworks
- Implementation of data products
- Exposure of data products
- Configuration of NoSQL databases
- Distributed processing implementation
- Use of functional languages
- Debugging distributed processing and algorithms
- Identification and cataloging of reusable components
- Contribution to the evolution of work standards
- Contribution and advice on data processing problems
During integration and deployment
- Participation in problem solving
During serial life (production)
- Participation in operations monitoring
- Participation in problem solving
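To give a concrete flavor of the ingestion and preparation chains listed above, here is a minimal, illustrative sketch of a streaming Beam pipeline in Java that reads events from Pub/Sub and appends them to BigQuery. The project, topic, and table names are hypothetical placeholders, not part of the actual product stack.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class IngestionChainSketch {
  public static void main(String[] args) {
    // Passing --runner=DataflowRunner on the command line runs the same code on Dataflow;
    // without it the pipeline runs locally on the DirectRunner.
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
    options.setStreaming(true);

    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Ingestion: read raw JSON messages from a (hypothetical) Pub/Sub topic.
        .apply("ReadEvents",
            PubsubIO.readStrings().fromTopic("projects/my-project/topics/vehicle-events"))
        // Preparation: turn each raw message into a BigQuery row (parsing kept trivial here).
        .apply("ToTableRow",
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via(message -> new TableRow().set("payload", message)))
        // Exposure: append the rows to a (hypothetical) existing BigQuery table.
        .apply("WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to("my-project:datalake.vehicle_events")
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    pipeline.run();
  }
}
```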
Skills
- Expertise in the implementation of end-to-end data processing chains
- Mastery of distributed development
- Basic knowledge of, and interest in, the development of ML algorithms
- Knowledge of ingestion frameworks
- Knowledge of Beam and its different execution modes on Dataflow
- Knowledge of Spark and its different modules
- Mastery of Java (+ Scala and Python)
- Knowledge of the GCP ecosystem (Dataproc, Dataflow, BigQuery, Pub/Sub, PostgreSQL, Composer, Cloud Functions, Stackdriver)
- Knowledge of the use of Solace
- Experience using generative AI tools (GitHub Copilot, GitLab Duo, etc.)
- Knowledge of Spotfire & Dynatrace
- Knowledge of the NoSQL database ecosystem
- Knowledge in building data product APIs
- Knowledge of Dataviz tools and libraries
- Ease in debugging Beam (and Spark) jobs and distributed systems
- Ability to explain complex systems in accessible terms
- Mastery of data notebooks
- Expertise in data testing strategies (see the test sketch after this list)
- Strong problem-solving skills, intelligence, initiative and ability to work under pressure
- Excellent interpersonal and communication skills (ability to go into detail)
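As an illustration of the testing and debugging skills above, here is a minimal sketch of a Beam transform test using TestPipeline and PAssert; the transform itself (a trivial identifier normalization) is hypothetical.

```java
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.junit.Rule;
import org.junit.Test;

public class PreparationStepTest {
  // TestPipeline runs the transform locally (DirectRunner) inside the unit test.
  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  public void normalizesVehicleIds() {
    PCollection<String> output =
        pipeline
            .apply(Create.of("vin-001 ", " vin-002"))
            // Hypothetical preparation step: trim and upper-case raw identifiers.
            .apply(MapElements.into(TypeDescriptors.strings())
                .via(raw -> raw.trim().toUpperCase()));

    // PAssert verifies the content of the PCollection once the pipeline has run.
    PAssert.that(output).containsInAnyOrder("VIN-001", "VIN-002");
    pipeline.run().waitUntilFinish();
  }
}
```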