Full Stack AI/ML Engineer
Palo Alto, California, United States
Key
Unlock new networking and career opportunities with Key, the leading AI-powered professional networking platform. Build and manage communities, host and join exclusive events, and create meaningful connections to supercharge your growth all in...
About Us
This role is for one of our clients. Our client is one of the world's fastest-growing AI companies, pushing the boundaries of AI-assisted software development. Their mission is to empower the next generation of AI systems to reason about and work with real-world software repositories. You'll be working at the intersection of software engineering, open-source ecosystems, and frontier AI.
Project Overview
Our client is building high-quality evaluation and training datasets to improve how Large Language Models (LLMs) interact with realistic software engineering tasks. A key focus of this project is curating verifiable software engineering challenges from public GitHub repository histories using a human-in-the-loop process.
Why This Role Is Unique
- Collaborate directly with AI researchers shaping the future of AI-powered software development.
- Work with high-impact open-source projects and evaluate how LLMs perform on real bugs, issues, and developer tasks.
- Influence dataset design that will train and benchmark next-gen LLMs.
Role Overview
What Does a Typical Day Look Like?
- Review and compare 3-4 model-generated code responses per task using a structured ranking system.
- Evaluate code diffs for correctness, code quality, style, and efficiency.
- Provide clear, detailed rationales explaining the reasoning behind each ranking decision.
- Maintain high consistency and objectivity across evaluations.
- Collaborate with the team to identify edge cases and ambiguities in model behavior.
Required Skills & Experience
- 5+ years of software engineering experience, including 2+ continuous years at a top-tier product company (e.g., Stripe, Netflix, Datadog, Dropbox, Shopify, PayPal, IBM Research).
- Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools.
- Deep understanding of software architecture, design, development, debugging, and code quality/review assessment.
- Proven ability to review code diffs and evaluate correctness, maintainability, and efficiency.
- Excellent oral and written communication skills for clear, structured evaluation rationales.
Bonus Points
- Experience in LLM research, developer agents, or AI evaluation projects.
- Background in building or scaling developer tools or automation systems.
Engagement Details
- Commitment: ~10-20 hours/week (partial PST overlap required)
- Type: Contractor (no medical/paid leave)
- Duration: 1 month - starting next week; potential extensions based on performance and fit.
Tags: Architecture Engineering GitHub LLMs Machine Learning Open Source Research