Engineer, Supercomputing & Distributed Systems
Tasks
- Build and operate distributed training and inference infrastructure
- Build fault-tolerance systems for large-scale pretraining
- Build systems bridging cluster capacity and research output
- Debug InfiniBand networking and distributed training runs
- Deploy and combine large language models for data captioning
- Design multi-stage data pipelines for large-scale datasets
- Develop large-scale ETL systems
- Implement distributed shot boundary detection pipelines
- Manage distributed training on GPU Kubernetes clusters
- Orchestrate and scale large-scale GPU job processing
- Profile and optimize streaming dataloaders
Perks/Benefits
- N/A
Skills/Tech-stack
Docker | DuckDB | ETL | InfiniBand | Kafka | Kubernetes | Linux | NCCL | Networking | NumPy | Pandas | Pulsar | PyArrow | PyTorch | Python | RDMA | SQL | Streaming
Education
N/A