Systems Validation Engineer

San Jose

Etched

Transformers etched into silicon. By burning the transformer architecture into our chips, we're creating the world's most powerful servers for transformer inference.

About Etched

Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents.

Role Summary

We are seeking a highly skilled Systems Validation Engineer to take ownership of the full platform server product and ensure seamless integration, functionality, and performance. If you are passionate about AI, hardware, and systems architecture, and thrive in a fast-paced, hands-on environment, we’d love to hear from you.

Key Responsibilities

  • Full Platform Ownership

    • Own the end-to-end validation of the full platform server product, including hardware, firmware, and software integration.

    • Develop and execute a comprehensive test plan to validate server functionality, performance, and reliability.

  • Sohu Integration

    • Lead the integration of the Sohu platform into the server, ensuring full support for side-band communication, telemetry, and BMC (Baseboard Management Controller) integration.

    • Ensure that all system-level telemetry and communication channels are functioning correctly and efficiently.

  • Debugging and Root Cause Analysis

    • Proactively identify, diagnose, and resolve critical system issues, driving root cause analysis to the component level.

    • Work closely with hardware, firmware, and software teams to address and resolve complex system-level issues.

  • System-Level Expertise

    • Ensure proper functioning of all server subsystems, including power, thermal, storage, networking, and PCIe interfaces.

    • Evaluate and optimize system performance under various workloads and stress conditions.

    • Provide expert guidance on server architecture and component-level interactions.

  • Command Line and Linux Expertise

    • Utilize Linux command line extensively for system validation, debugging, and monitoring.

    • Develop and maintain automated test scripts and diagnostic tools, including Python scripts.

    • Issue direct IPMI calls to the BMC for validation and debugging.

  • Cross-Functional Collaboration

    • Collaborate with design, firmware, software, and manufacturing teams to identify and resolve issues early in the development cycle.

    • Provide feedback to improve product design and manufacturing processes.

You may be a good fit if you have

  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or a related field.

  • 5+ years of experience in systems validation, hardware integration, or related roles.

  • Strong understanding of server architecture and subsystems (power, thermal, storage, networking, PCIe, etc.).

  • Experience with side-band communication protocols, telemetry, and BMC integration.

  • Proficiency with Linux command line and debugging tools.

  • Strong problem-solving skills with the ability to debug down to the component level.

  • Excellent understanding of electronic components and schematics; ability to work closely with hardware design teams.

  • Experience in writing and executing test plans and automating validation processes.

  • Strong communication and cross-functional collaboration skills.

Benefits

  • Full medical, dental, and vision packages, with 100% of premiums covered

  • Housing subsidy of $2,000/month for those living within walking distance of the office

  • Daily lunch and dinner in our office

  • Relocation support for those moving to West San Jose

How we’re different

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in West San Jose, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.
