Infinia Engineering Intern
Remote, United States
DDN
Overview
This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.
"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC
“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA
DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.
Our success is driven by our unwavering commitment to innovation and customer success, and by a team of passionate professionals who bring expertise and dedication to every project. That commitment, together with our market leadership, makes this an exciting and rewarding role for a driven person who wants to make a lasting impact at a company shaping the future of AI and data storage.
Job Description
DDN is currently seeking interns to join our Infinia Engineering Team, where you'll have the opportunity to work on cutting-edge data storage and management solutions designed for high-performance computing and artificial intelligence applications.
Responsibilities / Potential Projects:
- Design and implement integrations between data ingestion and streaming pipelines and open-source tools such as Ray Data, MosaicML Streaming, tf.data, and the PyTorch DataLoader.
- Design optimizations for training, such as asynchronous checkpointing, and for inference, such as KV caching and LoRAX.
- Guide the integration of MLflow with DDN’s Infinia product for comprehensive experiment tracking, model versioning, and deployment.
- Drive the implementation and scaling of Retrieval-Augmented Generation (RAG) pipelines to enhance generative model performance.
- Stay abreast of the latest developments in AIOps, AI frameworks, optimization, and accelerated execution.
- Identify and implement solutions to optimize training and inference pipeline performance, runtime, and resource utilization on Infinia.
Required background/skills:
- Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field.
- Some experience in machine learning operations (MLOps) or related roles.
- Some experience in building and scaling AI/ML pipelines.
- Strong understanding of machine learning frameworks and libraries (TensorFlow, PyTorch, NVIDIA NeMo, vLLM, TensorRT-LLM).
- Some understanding of cloud infrastructure (AWS, GCP, Azure) and distributed computing.
- Some experience with containerization and orchestration tools (Docker, Kubernetes) and infrastructure as code.
- Excellent problem-solving and troubleshooting skills, with attention to detail and performance optimization.
- Strong communication and collaboration skills.
Preferred (Nice to have):
- Implementation-level understanding of ML frameworks, data loaders and data formats.
- Experience with scaling RAG pipelines and integrating them with generative AI models.
- Experience in operationalizing AI/ML models in production environments.
- Experience in deploying open-source vector databases at scale.
Work Conditions:
- Remote work
- 0%-10% Travel Required
DDN
DDN places strong emphasis on the following four characteristics, and every successful employee demonstrates them:
Self-Starter - Takes independent action to identify and solve problems. Seeks out relevant information needed to make decisions. Gets involved with new initiatives.
Success/Achievement Orientation - Delivers quality results consistently. Targets, achieves (or exceeds) measurable results. Sets challenging goals, focuses on critical priorities, and is accountable.
Problem Solving - Recognizes problems and responds with a systematic assessment that identifies and addresses the cause of the issue. Practical, realistic, and resourceful.
Innovative - Builds and improves key business processes that enhance the effectiveness of DDN. Generates new ideas, challenges the status quo, and solves problems creatively.
DataDirect Networks, Inc. is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.