AI Engineer (Vision) - Enterprise

San Francisco & Palo Alto, CA


About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity.

We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important.

All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

You will work directly with our enterprise customers, owning the strategy and execution of Vision or Video Understanding integrations. You’ll act like the CTO of a specialized AI startup focused on vision-driven technologies, leading high-stakes projects and delivering measurable impact. If you excel at combining deep technical expertise with customer-focused innovation, particularly in the Vision domain, we’d love to hear from you. Your day-to-day work may include:

  • Designing and building end-to-end AI solutions, from understanding customer pain points to scoping product specs and deploying VLM-powered vision interfaces.
  • Benchmarking vision models, writing evaluations, or analyzing performance to identify weaknesses in image recognition, object detection, or visual understanding.
  • Improving model performance through system prompt tuning and fine-tuning VLMs.
  • Working with multimodal teams to generate data for research efforts.
  • Generating synthetic data or kicking off campaigns to generate human data with the help of our AI tutors.
  • Analyzing vision request logs, image data, or video inputs to enhance system accuracy and user experience.
  • Building internal tools to automate VLM workflows, such as image processing pipelines or real-time visual analysis.

Focus

  • Deep expertise in working with vision or video understanding models, delivering robust and scalable solutions.
  • Ability to handle ambiguity, adapt to evolving requirements, and prioritize effectively in a fast-paced startup setting.
  • Exceptional communication skills to clarify specific requirements with customers and drive projects to successful completion.
  • Emphasis on designing, implementing, and maintaining efficient architectures, including image recognition, object detection, and real-time visual processing.
  • Proficiency in managing complex codebases and optimizing vision data pipelines for high-throughput, low-latency performance.
  • Define critical benchmarks for Vision or Video Understanding performance: Establish benchmarks tailored to enterprise vision use cases, such as image classification accuracy, object detection precision, and real-time latency, that reflect customer data distributions.
  • Initiate human data collection: Design and manage campaigns to acquire high-quality image and video data from diverse enterprise contexts, supporting model training and validation.
  • Drive Vision model integration with enterprise partners: Collaborate with cross-functional teams to integrate Vision capabilities into enterprise workflows, enabling seamless adoption in areas like automated quality control, surveillance, and augmented reality.

Requirements

An ideal candidate meets at least the following:

  • Strong engineering background.
  • Experience interfacing between technical and customer-facing teams.
  • Excellent verbal and written communication skills in English.
  • Ability to translate business and vision-specific product needs into technical solutions.
  • Proven experience implementing VLM or machine learning products, including APIs, back-end, and front-end vision interfaces.
  • Strong proficiency in Python and/or TypeScript.
  • Solid understanding of the HTTP protocol and real-time communication protocols (e.g., WebRTC for video streaming).

Standout Experiences

Candidates may distinguish themselves with:

  • Building evaluations for Vision capabilities, such as image recognition accuracy or robustness of object detection.
  • Demonstrating expertise in machine learning fundamentals, including vision model evaluation, training, or fine-tuning.
  • Deploying Vision models to production, optimizing for low-latency and high-reliability environments.
  • Writing developer documentation or creating vision-specific SDKs.
  • Working with large-scale image or video datasets, optimizing vision processing pipelines, or scaling systems for enterprise-grade workloads.
  • Using infrastructure tools like Pulumi or Terraform for deploying Vision systems.

Interview Process

After submitting your application, our team reviews your CV and Statement of Exceptional Work. If selected, you’ll be invited to a 15-minute technical phone interview where we’ll discuss your background in VLMs/LLMs. Successful candidates proceed to the main process:

  • 15 min Technical Screen
  • 2x 45 min Coding Interview (focused on Vision models or related challenges)

The Statement of Exceptional Work is a critical factor in our evaluation.

We aim to complete the main process within one week. All applications are reviewed by our technical team, not recruiters. Interviews are conducted via Google Meet or in person.

Benefits

  • Competitive cash-based compensation
  • xAI equity
  • Private health and dental insurance
  • Unlimited time off subject to prior approval

Annual Salary Range

$180,000 - $440,000 USD



xAI is an equal opportunity employer and does not unlawfully discriminate based on race, color, religion, ethnicity, ancestry, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, disability, medical conditions, genetic information, marital status, military or veteran status, or any other applicable legally protected characteristics. 

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all applicable federal, state, and local laws, including the San Francisco Fair Chance Ordinance, Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. 

For Los Angeles County (unincorporated) Candidates:

xAI reasonably believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: 

  • Access to information technology systems and confidential information, including proprietary and trade secret information, and/or user data;
  • Interacting with internal and/or external clients and colleagues; and
  • Exercising sound judgment.

California Consumer Privacy Act (CCPA) Notice




Region: North America
Country: United States
