Principal Domain Architect

Warwick, GB, CV34 6DA

Full Time Senior-level / Expert GBP 63K - 79K

National Grid

We are one of the world’s largest investor-owned energy companies, committed to delivering electricity and gas safely, reliably and efficiently to the customers and communities we serve.

About us

Every day we deliver safe and secure energy to homes, communities, and businesses. We are there when people need us the most. We connect people to the energy they need for the lives they live. The pace of change in society and our industry is accelerating, and our expertise and track record put us in an unparalleled position to shape the sustainable future of our industry.

To be successful, we must anticipate the needs of our customers, reduce the cost of energy delivery today and pioneer the flexible energy systems of tomorrow. This requires us to deliver on our promises and always look for new opportunities to grow, both ourselves and our business.

IT and Digital works in a harmonised partnership with the National Grid group of diverse energy businesses to deliver technology which revolutionises the way we operate. As we lead the charge towards a carbon-free future, our teams are embracing disruptive changes in our industry by working with Agile methodologies and adopting Digital mindsets to drive efficiency and bring new capabilities for our internal and external customers.

Our work here is critical. National Grid moves energy to millions of homes and businesses in the UK and US, and the technology we use to complete that task is down to us. The successful applicant for this position will be an integral contributor towards this goal, and we will support your professional development as part of our multi-cultural, customer-centric global team.

National Grid is hiring a Principal Domain Architect. This is a hybrid role open to our office locations in Waltham, MA; Brooklyn, NY; and Syracuse, NY.

Job Purpose

As a Principal Domain Architect for AI Ops and Site Reliability Engineering, your primary objective is to design and oversee the implementation of complex systems that meet functional and non-functional requirements. You will play a key role in developing system design policies, standards, and innovation processes specific to AI Ops and SRE. Additionally, you will actively monitor emerging technologies and assess their potential impact on the organization. Your responsibilities will include driving the strategic vision for AI Ops and SRE within the domain, ensuring alignment among stakeholders and promoting a cohesive approach.

What you'll do

• Developing AI Ops and Site Reliability Engineering (SRE) Strategies: As a Principal Domain Architect, your primary responsibility is to develop comprehensive strategies and architectures for implementing AI Ops and SRE practices within the data center and cloud domain. This involves understanding business requirements, assessing technical capabilities, and identifying areas where AI and automation can be leveraged to enhance reliability, performance, and operational efficiency.
• Designing Cloud Architecture Solutions: You will be responsible for designing cloud and on-premises architecture solutions that integrate AI technologies and SRE principles into the existing infrastructure. This includes designing scalable and resilient systems, implementing monitoring and alerting mechanisms, and ensuring high availability and fault tolerance in the cloud environment.
• Collaborating with Development and Operations Teams: As a Principal Architect, you will work closely with development and operations teams to provide technical guidance and ensure the successful implementation of AI Ops and SRE practices. This involves reviewing designs, providing recommendations, and promoting best practices for building and operating reliable and efficient cloud-based applications.
• Implementing AI-Driven Monitoring and Analytics: You will be responsible for implementing AI-driven monitoring and analytics solutions in the cloud domain. This includes leveraging machine learning and data analysis techniques to identify and predict system anomalies, performance bottlenecks, and potential failures. These insights help teams address issues proactively and optimize the performance of cloud-based systems.
• Establishing Incident Response and Resolution Processes: You will define and establish incident response and resolution processes aligned with SRE practices within the cloud and on-premises domain. This includes setting up incident management frameworks, defining escalation paths, and implementing effective incident response strategies to minimize downtime and ensure quick resolution in the cloud environment.
• Driving Continuous Improvement and Optimization: As a Principal Architect, you will drive continuous improvement and optimization efforts within the cloud domain. This involves analyzing system metrics, conducting root cause analysis, and implementing changes to optimize cloud performance, reliability, and efficiency. Automation and self-healing mechanisms are often employed to enhance system resilience and reduce manual intervention.

What you'll get

A competitive salary between £63,000 and £79,000, depending on capability

As well as your base salary, you will receive a bonus of up to 15% of your salary for stretch performance and a competitive contributory pension scheme where we will double match your contribution to a maximum company contribution of 12%. You will also have access to a number of flexible benefits such as a share incentive plan, salary sacrifice car and technology schemes, support via employee assistance lines and matched charity giving to name a few.

About you

• Bachelor's degree in a relevant discipline, or an equivalent combination of education, training, and experience.
• Foster one-team culture with ownership, collaboration, and empathy across functions.
• Manage risks and communicate project status, issues, and risks to stakeholders clearly and in a timely manner.
• Cloud Platforms: Experience with cloud platforms such as Azure (preferred), Amazon Web Services (AWS), or Google Cloud Platform (GCP) is essential for managing and optimizing cloud-based infrastructure.
• Containerization and Orchestration: Proficiency in containerization technologies like Docker and container orchestration platforms like Kubernetes is important for deploying and managing containerized applications at scale.
• Infrastructure-as-Code (IaC): Knowledge of infrastructure-as-code tools such as Terraform or AWS CloudFormation is valuable for automating the provisioning and management of infrastructure resources.
• Monitoring and Observability: Familiarity with monitoring and observability tools like Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), or Splunk is crucial for monitoring system performance, analyzing logs, and troubleshooting issues.
• Continuous Integration and Continuous Deployment (CI/CD): Experience with CI/CD pipelines and related tools such as GitHub or GitLab CI/CD.
• Configuration Management: Knowledge of configuration management tools like Ansible, Puppet, or Chef is valuable for managing and automating configuration changes across infrastructure and application environments.
• Proficiency in incident management tools like ServiceNow, PagerDuty, or VictorOps, as well as collaboration platforms like Slack or Microsoft Teams, is essential for effective incident response and coordination.
• Understanding of networking concepts, protocols, and security best practices is important for managing network infrastructure, implementing secure access controls, and ensuring system and data protection.
• Scripting and Programming Languages: Familiarity with scripting languages like Python, Bash, or PowerShell, as well as programming languages like Java, Go, or Ruby, enables automation and customization of various tasks and workflows.
• Database Technologies: Knowledge of databases such as MySQL, PostgreSQL, MongoDB, or Redis is valuable for managing and optimizing database systems and ensuring data integrity and availability.

More Information

The closing date for this vacancy is 23rd of June. However, we encourage candidates to submit their applications as early as possible rather than waiting until the published closing date. National Grid's recruitment periods may vary. We reserve the right to remove this advert or close it to further applications at any point during the recruitment process.

DE & I statement

At National Grid, we work towards the highest standards in everything we do, including how we support, value and develop our people. Our aim is to encourage and support employees to thrive and be the best they can be. We celebrate the difference people can bring to our organisation, welcome and encourage applicants with diverse experiences and backgrounds, and offer flexible and tailored support, at home and in the office.
Our goal is to drive, develop and operate our business in a way that results in a more inclusive culture. All employment decisions are based on qualifications, business need, and the innovation that diverse teams and perspectives bring. We are committed to building a workforce that represents the communities we serve, and a working environment in which each individual feels valued, respected, fairly treated, and able to reach their full potential.

#LI-RK1 #LI-HYBRID
