Senior Machine Learning Engineer - Circuit and Fault Intelligence
San Francisco, CA
Gridware
With more than 90 million field hours and counting, Gridware is already detecting the small changes that lead to big problems. Together with utility companies, we’re preventing hazards from causing harm. We’re helping crews restore power quickly...
Role Overview
Gridware is creating cutting-edge technology to increase hazard awareness on the electric distribution system. We are building the observability layer of a safer and more efficient grid.
We are seeking a Machine Learning Engineer to lead the development of robust models and data pipelines for detecting and interpreting events from distributed sensors installed on electrical distribution infrastructure. You will support the full ML product lifecycle, from model development and prototyping to building stable, fully supported production systems for real-time event detection.
You will collaborate with a diverse team of scientists and engineers to build the hardware, software, and operational systems that deliver actionable information to utility operators.
Responsibilities
- Design, train, and deploy ML models for real-time and batch detection of events, anomalies, or faults from distributed sensor networks.
- Build end-to-end data and ML pipelines for sensor ingestion, preprocessing, feature extraction, and model inference.
- Collaborate with hardware teams, data engineers, and product managers to define ML system requirements.
- Work with distributed streaming systems (e.g., Apache Kafka, Spark Structured Streaming) for real-time data processing and inference.
- Develop tools and processes for continuous model evaluation, retraining, and performance monitoring (MLOps best practices).
- Lead the adoption of scalable frameworks for spatial-temporal and graph-based modeling of sensor systems.
- Mentor junior engineers and participate in architecture and design reviews.
Required Skills
- 5+ years of experience designing and deploying ML systems in production environments.
- Proficiency in Python and ML libraries (e.g., PyTorch, TensorFlow, scikit-learn, XGBoost).
- Strong background in time-series analysis, anomaly detection, or sensor fusion.
- Experience with real-time or distributed data systems: Spark, Kafka, Flink, or similar.
- Solid understanding of data engineering fundamentals, including ETL and batch/streaming processing.
- Experience deploying models via REST APIs or frameworks like MLflow, TorchServe, or FastAPI.
- Familiarity with cloud-native architectures (AWS, Azure, GCP) and containerization (Docker, Kubernetes).
Bonus Skills
- Experience with Graph Neural Networks (GNNs), spatial-temporal modeling, or edge ML.
- Exposure to sensor networks in domains like energy, industrial IoT, transportation, or environmental monitoring.
- Contributions to open-source ML or data systems.
Benefits
- Health, Dental & Vision (Gold and Platinum, with some providers' plans fully covered)
- Paid parental leave
- Alternating day off (every other Monday)
- “Off the Grid”, a two-week paid break each year for all employees
- Commuter allowance
- Company-paid training