Data Engineer II - PySpark

NY-New York, USA

Memorial Sloan Kettering Cancer Center




Company Overview

The people of Memorial Sloan Kettering Cancer Center (MSK) are united by a singular mission: ending cancer for life. Our specialized care teams provide personalized, compassionate, expert care to patients of all ages. Informed by basic research done at our Sloan Kettering Institute, scientists across MSK collaborate to conduct innovative translational and clinical research that is driving a revolution in our understanding of cancer as a disease and improving the ability to prevent, diagnose, and treat it. MSK is dedicated to training the next generation of scientists and clinicians, who go on to pursue our mission at MSK and around the globe. 

 

Please review important announcements about vaccination requirements and our upcoming EHR implementation by clicking here.

 

Important Note for MSK Employees: 

Your Career Hub profile is submitted to the hiring team as your internal resume. Please be sure your profile is fully complete with your skills, relevant experience and education (if required). Click here to learn more. Please note, this link is only accessible for MSK employees.  

Job Description

Data Engineer II

Exciting Opportunity at MSK

MSK is seeking an experienced data infrastructure engineer to help drive its data modernization journey and to be a key player in our mission to fight cancer. You will design, implement, and maintain data infrastructure solutions for a new hybrid and multi-cloud data ecosystem that serves multiple purposes, including data science and analytics. You will apply your skills with Spark, data lakehouse, and cloud technologies to design, develop, and maintain scalable data processing systems that let us derive valuable insights from our data.

 

Position Summary:

  • Work closely with stakeholders to understand data requirements and provide technical solutions.
  • Develop ELT data pipelines using PySpark coding best practices.
  • Use CI/CD and other DataOps best practices.
  • Transform data into conformed models and data products.
  • Monitor and optimize data processing jobs for performance, ensuring efficient resource utilization, well-structured data storage, and minimal processing time.
  • Implement data validation and quality checks to ensure the integrity and reliability of data throughout the processing lifecycle.
  • Create and maintain documentation for data pipelines, processes, and best practices.
  • Keep abreast of the latest developments in data engineering and related technologies.

Required Skills:

  • Strong proficiency with PySpark development and troubleshooting.
  • Experience with Databricks and medallion architecture.
  • Solid understanding of database design fundamentals.
  • Solid understanding of cloud infrastructure concepts including network and security.
  • Experience with CI/CD for development lifecycle and testing automation.
  • Implementation expertise on AWS technologies including S3, EC2, Glue, MWAA.
  • Strong analytical, problem-solving, and organizational skills.
  • Excellent communication and collaboration skills.
  • At least 4 years of working experience as a cloud data engineer.

Core Skills:

  • Ability to engage in depth with both business and technical discussions.
  • Excellent communication and collaboration skills.
  • Strong analytical, problem-solving, and organizational skills.

Additional Information:

  • Location: 633 Third Avenue, NY
  • Reporting to Director, Data Management
  • Schedule: Hybrid, 4 days a month onsite

Pay Range: $118,800 - $196,200

Helpful Links:

  • MSK Compensation Philosophy
  • Review Our Great Benefits Offerings

Closing

MSK is an equal opportunity and affirmative action employer committed to diversity and inclusion in all aspects of recruiting and employment. All qualified individuals are encouraged to apply and will receive consideration without regard to race, color, gender, gender identity or expression, sexual orientation, national origin, age, religion, creed, disability, veteran status or any other factor which cannot lawfully be used as a basis for an employment decision.  

 

Federal law requires employers to provide reasonable accommodation to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job or to perform your job. Examples of reasonable accommodation include making a change to the application process or work procedures, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment.

Category: Engineering Jobs

Tags: Architecture AWS CI/CD Databricks Data management DataOps Data pipelines EC2 ELT Engineering Pipelines PySpark Research Security Spark Testing

Perks/benefits: Career development

Region: North America
Country: United States
