Systems Validation Engineer
San Jose
Etched
Transformers etched into silicon. By burning the transformer architecture into our chips, we're creating the world's most powerful servers for transformer inference.
About Etched
Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents.
Role Summary
We are seeking a highly skilled Systems Validation Engineer to take ownership of the full platform server product and ensure seamless integration, functionality, and performance. If you are passionate about AI, hardware, and systems architecture, and thrive in a fast-paced, hands-on environment, we’d love to hear from you.
Key Responsibilities
Full Platform Ownership
Own the end-to-end validation of the full platform server product, including hardware, firmware, and software integration.
Develop and execute a comprehensive test plan to validate server functionality, performance, and reliability.
Sohu Integration
Lead the integration of the Sohu platform into the server, ensuring full support for side-band communication, telemetry, and BMC (Baseboard Management Controller) integration.
Ensure that all system-level telemetry and communication channels are functioning correctly and efficiently.
Debugging and Root Cause Analysis
Proactively identify, diagnose, and resolve critical system issues, driving root cause analysis to the component level.
Work closely with hardware, firmware, and software teams to address and resolve complex system-level issues.
System-Level Expertise
Ensure proper functioning of all server subsystems, including power, thermal, storage, networking, and PCIe interfaces.
Evaluate and optimize system performance under various workloads and stress conditions.
Provide expert guidance on server architecture and component-level interactions.
Command Line and Linux Expertise
Utilize Linux command line extensively for system validation, debugging, and monitoring.
Develop and maintain automated test scripts, diagnostic tools, and Python scripts.
Use the IPMI protocol to make direct calls to the BMC.
Cross-Functional Collaboration
Collaborate with design, firmware, software, and manufacturing teams to identify and resolve issues early in the development cycle.
Provide feedback to improve product design and manufacturing processes.
You may be a good fit if you have
Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or a related field.
5+ years of experience in systems validation, hardware integration, or related roles.
Strong understanding of server architecture and subsystems (power, thermal, storage, networking, PCIe, etc.).
Experience with side-band communication protocols, telemetry, and BMC integration.
Proficiency with Linux command line and debugging tools.
Strong problem-solving skills with the ability to debug down to the component level.
Excellent understanding of electronic components and schematics; ability to work closely with hardware design teams.
Experience in writing and executing test plans and automating validation processes.
Strong communication and cross-functional collaboration skills.
Benefits
Full medical, dental, and vision packages, with 100% of premiums covered
Housing subsidy of $2,000/month for those living within walking distance of the office
Daily lunch and dinner in our office
Relocation support for those moving to West San Jose
How we’re different
Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
We are a fully in-person team in West San Jose, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.