Research Engineer

San Francisco, CA

Anyscale

Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.


About Anyscale:
At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.
With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.
Proud to be backed by Andreessen Horowitz, NEA, and Addition with $250+ million raised to date.
Anyscale is based in San Francisco, CA. Employees are required to come into the office three days a week.
About the role
The Anyscale Research team is looking for a strong ML Engineer and Researcher who is passionate about pushing the boundaries of what’s possible with Ray. In this role, you will be instrumental in developing cutting-edge features, such as our new Accelerated DAG (ADAG) API, to establish Ray as a leader in large-scale training. You will play a key role in exploring new directions and driving our vision for Ray’s future.
This position is ideal for individuals with a strong engineering background and a passion for ML systems. You will spend approximately 70% of your time coding and engineering, while the remainder will focus on research to support our strategic vision and innovation. The role demands a deep understanding of ML systems and applications, including LLMs and multimodal models, with a strong emphasis on technical skills and engineering expertise.
About the team
We are a newly formed team dedicated to applied research in both ML modeling and systems. Our mission is to advance the capabilities of Ray and improve machine learning workloads on the Anyscale Platform. We operate at the intersection of research and engineering, collaborating closely with the Ray Core and Ray Train teams to bridge gaps and develop innovative solutions that push the frontier of ML and systems research.

We'd love to hear from you if you have

  • Production-level experience in Machine Learning and in distributed ML systems (Python/PyTorch)
  • 4+ years of experience in one of these fields: Machine Learning, NLP, CV, or ML Systems
  • Graduate degree (MSc or PhD) in one of the fields above
  • Publications in a top-tier AI conference (NeurIPS, ICML, ICLR, CVPR, ACL, etc.)

A snapshot of projects you may work on

  • Enhancing Ray for Large-Scale Training: Collaborate with the Ray Core and Ray Train teams to adapt and optimize Ray for efficient, large-scale GPU-heavy training, addressing current limitations and expanding its capabilities.
  • Developing the ADAG API: Explore and potentially implement an Accelerated DAG (ADAG) API for Ray, aiming to improve performance and scalability for complex ML workflows.
  • System Integration and Optimization: Create and refine integrations between Ray and other components, such as Ray Data, to streamline large-scale ML processes and ensure seamless operation across different systems.
  • Research and Innovation: Contribute to cutting-edge research in ML systems, identifying new opportunities and methods to push the boundaries of what Ray can achieve in large-scale training environments.
  • Prototype and Benchmarking: Design and build prototypes to test new features or enhancements, and conduct benchmarking to assess performance improvements and validate the effectiveness of your solutions.
  • Work on applied research, pushing state-of-the-art on large-scale model training
  • Advance Ray as the best open source library for large-scale machine learning

Compensation

  • At Anyscale, we take a market-based approach to compensation. We are data-driven, transparent, and consistent. The target salary range for this role is $170,112 to $237,000. As the market data changes over time, the target salary for this role may be adjusted.
  • This role is also eligible to participate in Anyscale's Equity and Benefits offerings, including:
  • Stock Options
  • Healthcare plans, with premiums covered by Anyscale at 99%
  • 401k Retirement Plan
  • Wellness stipend
  • Education stipend
  • Paid Parental Leave
  • Flexible Time Off
  • Commute Reimbursement
  • 100% of in-office meals covered
Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. 
Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.

Tags: APIs, Distributed Systems, Engineering, GPU, ICLR, ICML, LLMs, Machine Learning, Model training, NeurIPS, NLP, OpenAI, Open Source, PhD, Python, PyTorch, Research

Perks/benefits: 401(k) matching, Career development, Equity / stock options, Flex vacation, Parental leave, Wellness

Region: North America
Country: United States
