Senior Data Engineer
Hyderabad (Office), India
Novartis
Working together, we can reimagine medicine to improve and extend people’s lives.
Job Description Summary
data42 is Novartis’ ground-breaking initiative that harnesses the power of R&D data in one of the largest and most diverse datasets in the pharmaceutical industry to reimagine medicine. data42 applies machine learning, artificial intelligence, and sophisticated analytics to generate new insights that increase our understanding of disease and medicines, improve R&D decision-making and ultimately reimagine drug discovery and development. And to take this a step further, we are expanding data42 to create a first-of-its-kind, diverse ecosystem.
A key aspect of the program is to centralize & streamline preclinical data collected across Novartis to enable secondary research. The Preclinical pipeline team focuses on streamlining end-to-end operations, developing new pipelines and building products that will help the Preclinical team gain data-driven insights and bridge the preclinical and clinical domains. The position will work closely with the preclinical pipeline lead and the data engineering team.
Job Description
Major accountabilities:
- Responsible for Data Engineering - developing, testing and maintaining production-grade Foundry Data Pipelines
- Working closely with the Preclinical pipeline lead, tech lead and engineering team on data engineering requirements. Actively participate in agile work practices.
- Evaluate and validate new Foundry platform features and align with the pipeline team to realize / enable tech spikes on the Foundry platform
- Collaboration with data scientists, data analysts and technology teams to gather requirements and implement solutions that will be tested and documented
- Coordinating with the preclinical pipeline team to ensure quality controls, naming conventions & best practices are followed.
- Participate in PoC development to deliver products to address business needs.
- Understanding of the Foundry platform landscape/roadmap (preferable).
Key performance indicators:
- Delivery of data pipeline engineering activities in a timely manner for the program.
- Execute CI/CD DevOps principles and maintain technical documentation of any new development.
- Apply Quality Engineering principles to ensure high-quality delivery.
Job Dimensions:
Impact on the organization: Responsible for the development and maintenance of the Preclinical Data Pipeline and for delivering high-quality PoCs that align with business needs and help drive data-driven insights.
Minimum Requirements:
Education: Bachelor’s/Master’s degree in Computer Science, Applied Mathematics, Engineering, or any other technology-related field; equivalent working experience may also be accepted
Work Experience:
- 6+ years of IT experience, 4+ years of experience in Data Engineering on a Big Data platform.
- Able to design and implement data integration of different data modalities.
- Hands-on experience with programming languages, primarily Python, PySpark and Spark.
- Hands-on experience with Git workflows and strong knowledge of DevOps (CI/CD and agile framework).
- Hands-on experience working with JIRA/Confluence for technical documentation.
- Strong analytical thinking and problem-solving skills.
- Experience building scalable solutions and pipelines on big data platforms.
- Hands-on experience with the Palantir Foundry platform and its components for developing data pipelines, such as Code Repository, Code Workbook and Data Connection (preferable).
- Knowledge of AI/ML concepts with hands-on experience will be valuable.
- Knowledge of preclinical in-vivo study data, e.g. the CDISC SEND standard, is desirable.
Skills:
- Back-End Development.
- Code Analysis.
- Big Data Platforms.
- Data Wrangling.
- Software Documentation.
- Software/Data Engineering.
- Software/Data Testing.
- Analytical thinking.
- CDISC SEND.
- Palantir Foundry.
- Unit Testing.
Desired Skills:
OOD (Object-Oriented Design), REST (Representational State Transfer), Software Design, Software Documentation, Software Engineering, Software Testing, Palantir Foundry, Unit Testing, CDISC SEND, Analytical thinking
Languages:
- Fluent English (Oral and Written)
Skills Desired
Clinical Data Management, Databases, Data Governance, Data Integrity, Data Management, Data Quality, Data Science, Waterfall Model
Perks/benefits: Career development Team events