Senior Engineer - Python PySpark
Bangalore Anchorage, India
Thales
From Aerospace, Space, Defence to Security & Transportation, Thales helps its customers to create a safer world by giving them the tools they need to perform critical tasks.
Essential Functions / Key Areas of Responsibility
- Expert in Python, with knowledge of PySpark
- Writing distributed data processing code with PySpark
- Optimizing Spark jobs using the DataFrame API and RDDs
- Working with Spark SQL to query large datasets (structured SQL/MySQL databases)
- Implementing UDFs (User Defined Functions) for custom transformations
- Performance tuning (caching, partitioning, broadcast joins, threading, multiprocessing, predicate pushdown)
- ETL development using Databricks workflows and Delta Live Tables
- Building data pipelines with Databricks Jobs
- Streaming data processing with Structured Streaming (Azure Event Hubs or AWS SQS)
- Processing file-based data in columnar formats such as Parquet
- Writing reusable, testable and efficient code that ensures the application's performance, security and scalability.
- Excellent problem-solving ability with solid communication and collaboration skills.
- Providing technical documentation for systems, features, and components
- Following and supporting agile methodologies and practices by actively participating in all Scrum ceremonies
Minimum Requirements: Skills, Experience, Education, Technical/Specialized Knowledge, Certifications, Language.
- Bachelor’s degree in Engineering, Computer Science or related study
- Proven experience as a Python developer working with PySpark
- Minimum of 3 years of experience
- Understanding of data structures, data modeling and software architecture
- Ability to write robust code in Python
- Outstanding analytical and problem-solving skills
- Experience with Git, JIRA, Confluence
- Experience working on Linux.
- Excellent written and verbal communication skills.