Data Engineer (Quantexa, Spark, Scala, Elasticsearch)

Singapore, Singapore, Singapore

We are seeking a talented and experienced Data Engineer (Quantexa) with expertise in Hadoop, Scala, Spark, Elasticsearch, OpenShift Container Platform (OCP), and DevOps practices to join our team. As a Data Engineer, you will play a crucial role in designing, developing, and optimizing big data solutions using Apache Spark, Scala, and Elasticsearch. You will collaborate with cross-functional teams to build scalable and efficient data processing pipelines and search applications. Knowledge of and experience in the Compliance / AML domain is a plus. Working experience with the Quantexa tool is a must.

Responsibilities:

·        Implement data transformation, aggregation, and enrichment processes to support various data analytics and machine learning initiatives

·        Collaborate with cross-functional teams to understand data requirements and translate them into effective data engineering solutions

·        Design, develop, and implement Spark Scala applications and data processing pipelines to process large volumes of structured and unstructured data

·        Integrate Elasticsearch with Spark to enable efficient indexing, querying, and retrieval of data

·        Optimize and tune Spark jobs for performance and scalability, ensuring efficient data processing and indexing in Elasticsearch

·        Implement data transformations, aggregations, and computations using Spark RDDs, DataFrames, and Datasets, and integrate them with Elasticsearch

·        Develop and maintain scalable and fault-tolerant Spark applications, adhering to industry best practices and coding standards

·        Troubleshoot and resolve issues related to data processing, performance, and data quality in the Spark-Elasticsearch integration

·        Monitor and analyze job performance metrics, identify bottlenecks, and propose optimizations in both Spark and Elasticsearch components

·        Ensure data quality and integrity throughout the data processing lifecycle

·        Design and deploy data engineering solutions on OpenShift Container Platform (OCP) using containerization and orchestration techniques

·        Optimize data engineering workflows for containerized deployment and efficient resource utilization

·        Collaborate with DevOps teams to streamline deployment processes, implement CI/CD pipelines, and ensure platform stability

·        Implement data governance practices, data lineage, and metadata management to ensure data accuracy, traceability, and compliance

·        Monitor and optimize data pipeline performance, troubleshoot issues, and implement necessary enhancements

·        Implement monitoring and logging mechanisms to ensure the health, availability, and performance of the data infrastructure

·        Document data engineering processes, workflows, and infrastructure configurations for knowledge sharing and reference

Requirements:

  1. More than 5 years of experience as a Data Engineer
  2. Bachelor's or Master's degree in Computer Science, Software Engineering, or a related discipline
  3. Possession of Quantexa certification as a Data Engineer or Data Architect, with proficiency in the tool
  4. Demonstrated experience as a Data Engineer, utilizing Hadoop, Spark, and data processing technologies in large-scale environments
  5. Expertise in the Scala programming language and familiarity with functional programming principles
  6. Prior experience with the Quantexa tool is highly desirable
  7. Comprehensive understanding of Apache Spark architecture, including RDDs, DataFrames, and Spark SQL
  8. Advanced proficiency in designing and developing data infrastructure utilizing Hadoop, Spark, and associated tools (HDFS, Hive, Pig, etc.)
  9. Experience with containerization platforms such as OpenShift Container Platform (OCP) and container orchestration via Kubernetes
  10. Proficiency in programming languages commonly used in data engineering, such as Python, Scala, or Java, and in the Spark framework
  11. Knowledge of DevOps methodologies, CI/CD pipelines, and infrastructure automation tools (e.g., Docker, Jenkins, Ansible, Bitbucket)
  12. Experience with Grafana, Prometheus, and Splunk will be considered an added advantage
  13. Background in integrating and utilizing Elasticsearch for data indexing and search applications
  14. Solid understanding of Elasticsearch data modeling, indexing strategies, and query optimization techniques
  15. Experience with distributed computing, parallel processing, and handling large datasets
  16. Proficient in performance tuning and optimization methods for Spark applications and Elasticsearch queries
  17. Strong problem-solving and analytical capabilities with the capacity to debug and resolve intricate issues
  18. Familiarity with version control systems (e.g., Git) and collaborative development environments

Category: Engineering Jobs

Tags: Ansible Architecture Big Data Bitbucket CI/CD Computer Science Data Analytics Data governance Data quality DevOps Docker Elasticsearch Engineering Git Hadoop HDFS Java Jenkins Kubernetes Machine Learning Pipelines Python Scala Spark Splunk SQL Unstructured data

Region: Asia/Pacific
Country: Singapore