HPC Specialist
Tasks
- Architect distributed serving solutions
- Configure GPU server fleets
- Configure firewalls
- Configure inter-node communication
- Configure load balancers
- Deploy GPU infrastructure
- Evaluate model serving frameworks
- Handle incident response
- Implement inference acceleration techniques
- Implement multi-node, multi-GPU model deployments
- Implement storage solutions
- Improve reliability with monitoring and alerting
- Maintain GPU infrastructure
- Manage GPU-enabled Kubernetes clusters
- Optimize GPU infrastructure
- Optimize inference cache storage
- Optimize infrastructure performance
- Optimize storage for model weights
- Perform capacity planning
- Profile model performance
- Provision GPU server fleets
- Research GPU technologies
- Troubleshoot performance bottlenecks
Perks/Benefits
- N/A
Skills/Tech-stack
Ansible | Bash | Distributed Systems | Firewalls | GPU drivers | GPU infrastructure | Grafana | HTTP/2 | Infrastructure as Code | Kubernetes | Large Language Model | Large language model inference | Linux | Load Balancing | Model Inference | Model Serving | Network Configuration | Network Load Balancing | Prometheus | Python | SGLang | Storage Optimization | TCP/IP | Terraform | vLLM
Education
Bachelor of Engineering | Bachelor of Science | Master of Science