Senior Site Reliability Engineer
Spain (Remote)
Nextiva
Nextiva unites every conversation along the entire customer journey. One business communication platform for voice, video, chat, social media, and email. Redefine the future of customer experiences. One conversation at a time.
We’re changing the game with a first-of-its-kind, conversation-centric platform that unifies team collaboration and customer experience in one place. Powered by AI, built by amazing humans.
Our culture is forward-thinking, customer-obsessed, and built on an unwavering belief that connection fuels business and life: connections to our customers through our signature Amazing Service®, to our products and services, and, most importantly, to each other. Since 2008, 100,000+ companies and 1M+ users have relied on Nextiva for customer and team communication.
If you’re ready to collaborate and create with amazing people, let your personality shine and be on the frontlines of helping businesses deliver amazing experiences, you’re in the right place.
Build Amazing - Deliver Amazing - Live Amazing - Be Amazing
We are looking for a Senior Site Reliability Engineer (SRE) to join our Middleware Engineering team based in Bangalore. In this highly dynamic environment, you'll be responsible for supporting and scaling our Kafka and Elasticsearch infrastructure - core systems that power our SaaS platform.
We're looking for someone who thrives on automation, embraces AI-driven observability, and is eager to learn and adopt new technologies quickly. You'll not only respond to production issues, but proactively build intelligent, resilient systems to prevent them.
If you enjoy owning systems end to end, writing clean automation, and working in a fast-moving team that values innovation, this role is for you.
Key Responsibilities
- Triage, troubleshoot, and resolve complex production issues involving Kafka and Elasticsearch
- Design and build automated monitoring, alerting, and logging systems - leveraging AI/ML techniques where possible
- Write tools and infrastructure software to support self-healing, auto-scaling, and incident prevention
- Automate system administration tasks - from patching and upgrades to config and deployment workflows
- Use and manage GitHub extensively for infrastructure-as-code, release management, and collaboration
- Partner with development, QA, and performance teams to ensure middleware systems are production-ready
- Participate in the on-call rotation and continuously improve incident response and resolution playbooks
- Mentor junior engineers and contribute to a culture of automation, learning, and accountability
- Lead large-scale reliability and observability projects in collaboration with global teams
Qualifications
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
- Fluent English communication skills (spoken and written)
Core Competencies
- 6+ years of experience in software development, automation, or infrastructure engineering
- Deep experience with Kafka and/or Elasticsearch in production environments
- Strong Linux systems expertise and 6+ years managing Linux-based environments
- Hands-on experience with cloud platforms - GCP and/or AWS required
- Proficient in scripting languages such as Python and Bash
- Automation-first mindset - deep experience with Ansible, Terraform, Jenkins
- Expert-level understanding of Git and GitHub workflows for CI/CD and infrastructure-as-code
- Proficient with container tools (Docker) and orchestrators (Kubernetes)
- Strong understanding of SRE principles - SLAs/SLOs, alerting, observability, and incident management
- Experience with SQL, caching systems (e.g., Redis), and troubleshooting distributed systems
- Quick learner with a strong curiosity for new tools, frameworks, and AI/ML use cases in operations
Nice to Have
- Observability Tools: Datadog, Splunk, Kibana, Opsgenie
- Programming: Java/Spring, JavaScript/React
- Middleware: RabbitMQ, Tomcat
- Experience with AI/ML-based anomaly detection, AIOps platforms, and LLM integrations for infrastructure
- Azure cloud experience
Why Join Us
- Shape the future of middleware reliability using AI and intelligent automation
- Work with a global team that values initiative, innovation, and ownership
- Grow in a fast-paced environment where learning and experimentation are part of the culture
- Drive technical leadership, mentor others, and make a meaningful platform-wide impact
How to Apply
If you're passionate about automation, AIOps, MLOps, and scalable middleware infrastructure, and you're ready to move fast, learn constantly, and own critical systems - we'd love to connect with you.
Nextiva DNA (Core Competencies)
Nextiva’s most successful team members share common traits and behaviors:
- Drives Results: Action-oriented with a passion for solving problems. They bring clarity and simplicity to ambiguous situations, challenge the status quo, and ask what can be done differently. They lead and drive change, celebrating success to build more success.
- Critical Thinker: Understands the "why" and identifies key drivers, learning from the past. They are fact-based and data-driven, forward-thinking, and see problems a few steps ahead. They provide options, recommendations, and actions, understanding risks and dependencies.
- Right Attitude: They are team-oriented, collaborative, competitive, and hate losing. They are resilient, able to bounce back from setbacks, zoom in and out, and get in the trenches to help solve important problems. They cultivate a culture of service, learning, support, and respect, caring for customers and teams.
Total Rewards
Our Total Rewards offerings are designed to allow Nexties to take care of themselves and their families so they can do their best.
Our compensation packages are tailored to each role and candidate's qualifications. We consider a wide range of factors, including skills, experience, training, and certifications, when determining compensation. We aim to offer competitive salaries or wages that reflect the value you bring to our team. Depending on the position, compensation may include base salary, incentives, or bonuses.
- Health 🍏 - Comprehensive medical coverage, including dental care
- Insurance 💼 - Life and disability insurance
- Work-Life Balance ⚖️ - PTO and paid sick time as per CBA, plus paid parental leave
- Financial Security 💰 - Private pension plan available
- Wellness 🤸 - Employee Assistance Program and comprehensive wellness initiatives
- Growth 🌱 - Access to ongoing learning and development opportunities and career advancement
At Nextiva, we're committed to supporting our employees' health, well-being, and professional growth. Join us and build a rewarding career!