Data Center Technician
Menlo Park
Lamini
Lamini is the enterprise LLM platform for existing software teams to quickly develop and control their own LLMs. Lamini has built-in best practices for specializing LLMs on billions of proprietary documents to improve performance, reduce...
Lamini enables every enterprise to safely, quickly, and cost-effectively build its own Expert AI. Our customers own their own models, trained on their data. Lamini optimizes for Expert AI workloads with minimal hallucination, enterprise-grade security, and enterprise flexibility, running on any infrastructure. Our team is made up of highly committed engineers, researchers, and tech industry veterans excited by our mission and technology. We’re backed by leading VCs as well as computing and technology companies.
We are looking for a skilled Data Center Technician to oversee the physical and technical aspects of our GPU cluster. This role is essential for maintaining a stable and efficient computing environment, optimizing system performance, and minimizing downtime. You’ll be hands-on with hardware, responsible for troubleshooting, maintenance, and upgrades, and you’ll collaborate with our engineering teams to support their research and production workloads.
Key Responsibilities:
- Cluster Management: Oversee day-to-day operations of our GPU cluster, including hardware and software maintenance, troubleshooting, and repairs to ensure optimal performance.
- Deployment & Configuration: Assist with the deployment, configuration, and calibration of GPU servers, racks, and networking equipment.
- Hardware Upgrades: Implement and support hardware upgrades, including new GPU installations, networking installations, and other critical infrastructure updates.
- Monitoring & Optimization: Continuously monitor system performance, capacity, and health using monitoring tools and alerts, and take proactive steps to optimize resource allocation and prevent downtime.
- Troubleshooting: Quickly diagnose and resolve hardware and network issues, coordinating with team members to minimize disruptions.
- Documentation: Maintain accurate records of configurations, maintenance schedules, and hardware inventory for efficient and organized data center management.
- Collaboration: Serve as data center liaison for vendor support personnel and manage support tickets with hardware vendors as needed. Work closely with AI researchers and engineers to understand their hardware requirements and support them in running large-scale ML and DL workloads. Prioritize and communicate issues, and provide clear and accurate SLAs.
- On-Call: Work in an environment that operates 24/7, participate in the on-call rotation, and provide after-hours support as needed.
Requirements:
- Technical Education/Experience: Bachelor’s degree in Computer Science, IT, Electrical Engineering, or a related field, or equivalent hands-on experience.
- Data Center Expertise: 2+ years of experience in a data center environment, with a strong understanding of server maintenance, networking, and hardware troubleshooting.
- GPU Knowledge: Experience working with GPU hardware, preferably in an AI or high-performance computing environment (experience with AMD GPUs is a plus).
- Networking Skills: Familiarity with networking concepts (TCP/IP, DNS, DHCP, RoCE, redundancy) and experience with network hardware in a data center setting.
- Problem-Solving Skills: Strong analytical skills and the ability to quickly diagnose and resolve technical issues.
- Team Player: Effective communication skills and the ability to work collaboratively with engineering and research teams.
Preferred Skills:
- Scripting & Automation: Basic scripting skills (Python, Bash) to automate routine tasks.
- Hands-On Data Center Experience: A deep understanding of server hardware, BMC-based manageability, BIOS settings, and firmware deployment. Familiarity with InfiniBand switches and network topology.
- Monitoring Tools: Familiarity with monitoring and logging tools, such as Prometheus, Grafana, or similar. Basic Linux system administration expertise.
Tags: Computer Science Engineering GPU Grafana InfiniBand Linux Machine Learning Python Research Security Testing
Region: North America
Country: United States