Staff Machine Learning Engineer

San Francisco, CA or Remote (USA)


Full Time | Senior-level / Expert | USD 220K - 280K

Fieldguide

The Fieldguide AI Platform for Advisory & Audit is an engagement automation platform that helps advisory and audit firms save time, increase margins, and improve client satisfaction.



Posted 9 hours ago

About Us:

Fieldguide is establishing a new state of trust for global commerce and capital markets by automating and streamlining the work of assurance and audit practitioners, specifically within cybersecurity, privacy, and ESG (Environmental, Social, Governance). Put simply, we build software for the people who enable trust between businesses.

We’re based in San Francisco, CA, but built as a remote-first company that enables you to do your best work from anywhere. We're backed by top investors including Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, Elad Gil, and more.

We value diversity in backgrounds and in experiences. We need people from all backgrounds and walks of life to help build the future of audit and advisory. Fieldguide’s team is inclusive, driven, humble, and supportive. We are deliberate and self-reflective about the kind of team and culture that we are building, and we seek teammates who are not only strong in their own aptitudes but also care deeply about supporting each other's growth.

As an early-stage startup employee, you’ll have the opportunity to build out the future of business trust. We make audit practitioners’ lives easier by eliminating up to 50% of their work and giving them better work-life balance. If you share our values and enthusiasm for building a great culture and product, you will find a home at Fieldguide.

About the Role

As a Staff Machine Learning Engineer at Fieldguide, you will lead the development of next-generation AI-driven features on our platform, transforming the audit and advisory industry through cutting-edge generative AI solutions. You’ll focus on applying advanced Machine Learning (ML) and Large Language Models (LLMs) to solve complex problems for our customers, while guiding the technical direction of our ML team in a high-growth startup environment. This role is both strategic and hands-on – you will set best practices for our generative AI efforts and also dive into coding and architecture as needed to drive critical projects from concept to production.

In this role, you will be the go-to expert for generative AI at Fieldguide. You’ll establish standards for prompt engineering, context management, and model evaluation, ensuring our use of LLMs is effective, safe, and scalable. As a Staff MLE, you will also act as a multiplier for the entire engineering team: reviewing architectures for AI features, mentoring other engineers, and fostering a culture of excellence in ML. You’ll collaborate closely with cross-functional stakeholders, from product managers and designers to high-profile clients, to translate business needs into technical solutions and to communicate how our AI-driven approach creates value. This is a unique opportunity to shape the future of Fieldguide’s AI capabilities and establish yourself as a technical leader in the burgeoning field of generative AI.

What You’ll Do

Architect Generative AI Solutions: Design and oversee the architecture of systems that leverage LLMs and retrieval-augmented generation (RAG) techniques. You will make key decisions on how we integrate LLMs with our existing platform and data stores, including building agent-based frameworks where LLMs interact with tools and knowledge bases (e.g. creating AI “co-pilots” for auditors). You’ll conduct rigorous architectural reviews and ensure our designs meet high standards for scalability, security, and reliability.
Establish Prompt Engineering Best Practices: Develop and codify best practices for prompt engineering and context management in our AI applications. You will guide the team in crafting effective prompts, choosing model parameters, and managing conversation context to optimize LLM performance. This includes building internal libraries or templates for prompts and educating engineers on how to avoid common failure modes. By setting this technical quality bar, you’ll ensure consistency and excellence in how we build GenAI features.
Develop Evaluation Frameworks: Create and implement frameworks to evaluate generative AI outputs for quality, accuracy, bias, and safety. You will define ML performance metrics specific to generative models (e.g. factual correctness rates, relevance scores, user feedback loops) and possibly leverage tools or develop custom evaluators (such as automated prompts or human-in-the-loop reviews). These evaluation strategies will inform model improvements and help establish standards for GenAI system evaluation across the company.
Lead High-Impact ML Projects: Take ownership of our most critical AI projects from ideation to production. You will collaborate with stakeholders to identify high-impact opportunities where AI can solve business problems, then roadmap solutions and drive their execution. This could range from developing an NLP feature that auto-identifies risks in audit documents, to launching a new GPT-based analytics module for our platform. You’ll coordinate across product and engineering teams to deliver these initiatives and clearly communicate their results and business impact.
Technical Leadership & Mentorship: Serve as a technical leader and mentor within the engineering org. You will guide more junior ML Engineers through code reviews, design discussions, and one-on-one mentorship, helping them level up their skills. You might lead an internal “ML Guild” or chapter, hosting knowledge-sharing sessions on topics like prompt tuning or vector databases. By instilling best practices and providing hands-on guidance, you’ll raise the technical proficiency of the entire team.
Cross-Team and External Collaboration: Work closely with cross-functional teams and occasionally directly with customers to ensure our AI solutions meet real-world needs. You’ll act as an expert liaison for high-profile or demanding clients when deep technical expertise is required to shape requirements or explain AI results. In these settings, you should be comfortable being “the most technical person in the room” and able to communicate complex ML concepts in a clear, business-aligned manner. Your ability to earn trust and align expectations with both internal and external stakeholders will be key.
Innovation and Thought Leadership: Stay at the forefront of ML and GenAI advancements, and bring new ideas into Fieldguide. You’ll continuously research emerging techniques in LLMs, from fine-tuning methods to new open-source models, and assess how we can leverage them. You may also contribute to the broader tech community through publications, blog posts, or speaking at conferences, representing Fieldguide’s technical work externally. While not required, we highly value this kind of thought leadership as it reinforces our credibility in the AI space.
ML Ops & Future Model Development: In addition to immediate project work, you will help shape our longer-term ML infrastructure. This includes guiding how we productionize models (monitoring, CI/CD for ML, data pipelines) and preparing for future needs such as custom model training. As we evolve to possibly fine-tune or train domain-specific models, you’ll provide direction on the initial pipeline setup and best practices. Essentially, you’ll make sure our ML systems and team processes (data flywheels, feedback loops, and so on) scale effectively as product usage grows.

Who You Are

8+ years of experience in applied machine learning, software engineering, or related fields, including 3+ years in technical leadership roles (such as leading project teams or architecting major systems). You have a track record of delivering significant ML projects that drove business or industry impact.
Deep expertise in Generative AI: Extensive experience working with generative AI technologies. You have hands-on knowledge of modern LLMs (GPT-style models, etc.), including deploying them in production and optimizing their performance. Experience with other NLP techniques is a plus.
Prompt Engineering & LLM Evaluation: Strong familiarity with prompt engineering concepts and strategies for large language models. You understand how to craft and refine prompts to achieve desired outcomes, and how to manage context and memory in LLM applications. Additionally, you have experience evaluating AI models – whether through quantitative metrics, user studies, or tools – and using those evaluations to iterate on solutions.
Strong ML Engineering and Coding Skills: Fluency in Python and the ML/PyData ecosystem (NumPy, pandas, scikit-learn, TensorFlow/PyTorch, etc.). You write clean, efficient code and are experienced in building and maintaining data pipelines and ETL processes for ML. You are comfortable with version control (Git) and CI/CD pipelines for deploying ML models.
Architectural Design & Systems Thinking: Demonstrated ability to design complex software systems. You’ve worked with cloud-based architectures (e.g. using AWS or similar) and understand how to integrate ML components into larger products. Experience with RAG architectures (combining LLMs with vector databases or search indices) and building agent-based systems is highly valuable. You approach design with scalability, maintainability, and security in mind, and you’re adept at conducting technical reviews and providing guidance to ensure high-quality delivery.
Leadership & Mentorship: Proven experience mentoring engineers or leading teams. You elevate those around you – for example, by improving coding standards, introducing best practices, or organizing knowledge-sharing forums. You’re able to influence without authority, earning the respect of your peers through your expertise and collaborative approach.
Excellent Communication Skills: Exceptional ability to communicate technical concepts to diverse audiences. You can translate between technical jargon and business objectives effortlessly, whether it’s writing design docs, giving an internal tech talk, or discussing project scope with an executive or customer. You are a patient listener and effective explainer, which makes you a key bridge between the engineering team and other stakeholders.
Startup Mindset: Ability to work in a fast-paced, evolving startup environment. You are proactive, adaptable, and comfortable with ambiguity. You take ownership of challenges, exhibit a bias for action, and can balance strategic thinking with getting hands-on when necessary (no task is too small or too big). You also understand the stage-appropriate needs of a growing company – knowing when to build quick prototypes versus scalable solutions.

Bonus Points

Domain Knowledge: Background in or exposure to the audit, advisory, or fintech industries. Understanding the domain can help you tailor AI solutions to our clients’ needs more effectively.
MLOps & Data Engineering: Experience setting up ML pipelines, model deployment workflows, and monitoring in production (MLflow, Kubernetes, etc.). Any experience with data lakehouses or integrating with data warehouses for ML is a plus.
Vector Databases/Search: Familiarity with technologies for semantic search and retrieval (e.g. Pinecone, Elasticsearch, FAISS). Experience building RAG systems or integrating knowledge bases with LLMs would be useful.
AI Safety & Ethics: Experience with AI safety research, bias mitigation techniques, or responsible AI frameworks to ensure our generative models uphold trust and compliance (important in audit/financial contexts).
Publications or Open Source: Contributions to research publications or popular open-source projects in ML/AI. A demonstrated interest in thought leadership (blog posts, conference talks) in the generative AI space.
Community Building: Experience in building internal communities of practice (e.g. leading an AI guild or chapter). This could mean you’ve set up mentorship programs or led tech talks in previous organizations, indicating you’ll enrich Fieldguide’s engineering culture.