Principal Site Reliability Engineering Lead

Oakland, CA, US, 94612

Pacific Gas and Electric Company

Pacific Gas and Electric Company (PG&E) provides natural gas and electric service to residential and business customers in northern and central California.

Requisition ID # 165427 

Job Category: Information Technology 

Job Level: Manager/Principal

Business Unit: Information Technology

Work Type: Hybrid

Job Location: Oakland

 

 

Department Overview

The Data Solutions Architecture Team at Pacific Gas & Electric Company is responsible for driving long-term, enterprise-wide data solutions, target state architecture, and overall excellence with the application of data, analytics and information to critical business challenges and opportunities. This team is chartered to develop the strategy, roadmap, and accompanying standards that will enable better use of data and information and to develop analytics maturity at PG&E.

 

Position Summary

The Digital Utility runs on data and information. At PG&E, we have many teams building data products that need support, and our operations teams are on the hook for ensuring reliability and support across all data products. The Principal Site Reliability Engineering Lead fills a critical role in empowering our operations teams to do their best work.

 

The Principal Site Reliability Engineering Lead will drive our operations strategy in Data Analytics & Insights (DA&I), working with operations teams to implement best practices, mentor junior engineers, drive automation, and build a continuously improving operations practice. You will work with operations management and operations engineers to create scalable DevOps practices for key data platforms at DA&I, notably Palantir Foundry, Snowflake, and Informatica. You will also get hands-on with operational problems and build out operations tooling for the team.

 

We strive for a team that will make a difference in the new PG&E. As Site Reliability Engineering Lead, you will have a direct impact on the day-to-day life of data solutions and delivery, and on the safety of California. You will collaborate with other technical leaders and Executive Leadership to help reshape a first-class operations team that delivers high levels of reliability for the data products we, and our customers, rely on the most. You will work closely with supportive Operations management, a talented team in need of your guidance, and an organization looking to you to support their key products.

 

The Principal Site Reliability Engineering Lead will report to the Senior Manager of Data Solutions Architecture in the Data Analytics & Insights department of Information Technology, and work closely with the Data Ecosystem Operations team.


PG&E is providing the salary range that the company in good faith believes it might pay for this position at the time of the job posting. This compensation range is specific to the locality of the job. The actual salary paid to an individual will be based on multiple factors, including, but not limited to, specific skills, education, licenses or certifications, experience, market value, geographic location, and internal equity. We would not anticipate that the individual hired into this role would land at or near the top half of the range described below, but the decision will be dependent on the facts and circumstances of each case.


A reasonable salary range is:

Bay Area Minimum: $155,000.00

Bay Area Maximum: $265,000.00

 

Job Responsibilities

  • Technical Support and Collaboration: Provide applications engineering support to product teams. Collaborate with product teams, support teams, and customers on shared goals, cross-team projects, and new initiatives.
  • Continuous Improvement and Reliability Practices: Strive for continuous improvement in processes and reliability practices. Develop and evolve improved operations workflows.
  • Leadership and Mentoring: Show teams how to improve quality and eliminate waste by implementing improvements with them.
  • Hands-on Troubleshooting: As a member of the Operations team, you will join them on-call and be available to help with escalated issues, or issues requiring your additional experience and steady hand.
  • Operations tooling: You will build tools for improved operational workflows in collaboration with, and leading, members of the Operations team.
  • Efficiency: Identify wasteful processes and procedures. Work with teams to streamline and automate tasks.
  • Performance Monitoring and Improvement: Monitor, measure, and enhance the performance and state-awareness of systems. Identify and drive improvements in infrastructure and system reliability, performance, and monitoring.
  • Root Cause Analysis and Investigation: Lead investigations into repetitive damage and failure rates, utilizing root cause analysis techniques. Implement corrective and preventive actions based on findings.
  • Reliability and Capital Planning: Participate in annual and long-term reliability planning, ensuring alignment with operational objectives. Contribute to the development and execution of life cycle asset management processes.
  • Architecture: Own the Information Architecture and related Technical Architecture for the Operations sub-domain of the Data & Information Architecture domain.
  • Technology Life Cycle: Develop and execute strategies to introduce new capabilities needed, evolve and mature existing capabilities, and retire capabilities at their end of life.
  • Documentation and Governance: Develop and maintain architectural guidance documents and artifacts, practices and procedures, and governance to support the above.
  • Strategic Planning: Support technology strategy, planning, and roadmapping activities across IT and at the enterprise level.
  • Data Analysis and Predictive Modeling: Perform statistical data analysis. Utilize data insights for capacity planning, demand forecasting, and identifying performance bottlenecks.

 

Qualifications

Minimum:

  • Bachelor's degree in Computer Science or a job-related discipline, or equivalent experience
  • 7 years of relevant work experience in Information Technology, Data Management, Business Intelligence, and Analytics, to include experience in both IT and line of business departments


Desired:

  • Experience working directly with line of business stakeholders demonstrating job-related skills.
  • 5 or more years of experience with Site Reliability Engineering/DevOps practices.
  • Experience with analytics and data management principles such as: data acquisition and modeling, data warehousing, business intelligence, metadata management, master data management, advanced analytics and data science, “big data” techniques, public/hybrid/private cloud data management and analytics services, data security, and data and analytics governance.
  • Ability to achieve a deep understanding of line of business strategies, priorities, needs, and current capabilities.
  • Ability to work collaboratively to engage and influence business and IT stakeholders, senior leadership and external partners.
  • Customer management and negotiation skills that enable the ability to mediate opposing viewpoints and articulate the advantages of a preferred solution.
  • Excellent written and oral communication skills across all levels; ability to communicate complex technical concepts to leaders, business sponsors and stakeholders in clear, concise language that inspires confidence and earns trust.
  • Strong leadership skills in the technology and operations domain and a high level of drive, initiative and assertiveness.
  • Extensive experience with SRE/DevOps practices and tooling.
  • At least 3 years of experience developing operations automation tools in Python or another high-level scripting language commonly used on Unix systems.
  • Familiarity with two or more of: Scaled Agile, Scrum development methodology, DevOps/DevSecOps, LEAN, Six Sigma, or ITIL practices.
  • Experience with any of the following: Data Architecture, Airflow, Palantir Foundry, Informatica, Spark, Snowflake, Teradata, and other database and BI technologies, data access languages such as SQL, SAS, R, Python, Scala, etc.
  • Experience working in the Utility Industry and a working knowledge of Utility concepts and challenges is a plus.

Tags: Agile Airflow Architecture Big Data Business Intelligence Computer Science Data analysis Data Analytics Data management Data Warehousing DevOps Engineering Informatica ITIL Predictive modeling Python R SAS Scala Scrum Security Snowflake Spark SQL Statistics Teradata

Perks/benefits: Equity / stock options Team events

Region: North America
Country: United States
