Principal Data Scientist, Accelerated Apache Spark
Santa Clara, CA, United States
NVIDIA
NVIDIA invented the GPU and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.
NVIDIA is looking for a Principal Data Scientist to join the GPU-accelerated Apache Spark team. Data scientists spend a considerable amount of time exploring data and iterating on machine learning (ML) experiments. Apache Spark is the most popular data processing engine for data science in data centers. It is used throughout interactive data science, from data preparation to running ML experiments and all the way to deploying ML applications. You will work with the open source community to accelerate Apache Spark with GPUs. You will apply the latest ML/AI methods to help enterprises migrate Spark workloads onto GPUs at scale. Come join NVIDIA and apply data science to help grow the adoption of GPU-accelerated Spark.
What you’ll be doing:
Develop ML models to predict the performance of GPU-accelerated Apache Spark on existing workloads.
Develop ML models to tune GPU-accelerated Apache Spark configurations to optimize performance on specific workloads.
Work on systems that continuously adapt and improve these ML models.
Work on ML/AI agents that can help fix and optimize GPU-accelerated Apache Spark applications.
Work on new functionality for GPU-accelerated Apache Spark to facilitate large-scale ML model training and inference.
Create examples showcasing how best to use GPU-accelerated Apache Spark and Spark MLlib for large-scale ML and deep learning (DL) training and inference.
Work with NVIDIA partners and customers to deploy GPU-accelerated Spark ML algorithms in the cloud or on-premises.
Keep up with published advances in ML systems and algorithms.
Provide technical mentorship in data science and ML to a team of engineers.
What we need to see:
BS, MS, or PhD in Data Science, Statistics, Computer Science, Computer Engineering, or closely related field (or equivalent experience).
12+ years of work or research experience in ML model development, including 5+ years as a technical lead.
2+ years of hands-on experience with Apache Spark.
Proven technical skills in crafting, implementing, and productionizing high-quality ML solutions.
Proven ability to use modern techniques and tools for all aspects of ML model development, deployment, and maintenance.
Excellent programming skills in Python and Python data science libraries such as NumPy, pandas, scikit-learn, SciPy, PyTorch, and TensorFlow.
Experience developing solutions based on boosted tree models, using libraries such as XGBoost.
Background in developing LLM/GenAI-based solutions.
Experience in feature engineering and feature importance assessment.
Familiarity with agile software development practices.
Ways to stand out from the crowd:
Knowledge of the Apache Spark architecture is a strong plus.
Familiarity with NVIDIA GPUs and CUDA is a strong plus.
Experience coding in Scala, Java, and/or C++ is a strong plus.
Ability to work well with cross-functional teams across organizational boundaries and geographies.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most skilled and dedicated people in the world working for us. If you are passionate about what you do, creative and driven, we want to hear from you!
The base salary range is 272,000 USD - 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.