Site Reliability Engineer

Remote US and Canada

Replicant

Experience the only conversational AI platform that autonomously resolves up to 80% of customer service interactions.

Replicant was founded on the belief that machines are ready to have useful, complex conversations that will transform the way they interact with the world, starting with customer service.
As the leader in Contact Center Automation, Replicant helps companies automate their most common customer service calls while empowering agents to focus on more complex and nuanced customer challenges. Replicant's AI platform allows consumers to engage in natural conversations across voice, messaging, and other digital channels to resolve their customer support issues 24/7, without the wait. We are now leading the way in using Large Language Models (LLMs) to transform customer service, again.
If you're excited by AI, ChatGPT, and LLMs, and want to make an impact alongside other great technologists and strong go-to-market leaders, look no further. We've tripled the size of our team, quadrupled revenue, and were named a top enterprise AI company by The Information. We currently serve Fortune 500 customers, run millions of AI calls per month in production, and are expanding our footprint globally.
We're searching for a skilled Site Reliability Engineer to play a crucial role in scaling the infrastructure and systems that power Replicant. As our company expands, we need your expertise to optimize how Replicant's data is managed and delivered, improve the connectivity of our software applications, and strike the right balance between engineering autonomy and standardization. Our core technology stack includes TypeScript/Node.js and Python running on Kubernetes in GCP, along with tools such as Helm, Terraform, Datadog, and Prometheus.

What You'll Do

  • Ensure the smooth operation and high availability of Replicant's production systems
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Develop and maintain tools and automation to prevent and quickly resolve incidents
  • Collaborate with engineering teams to improve the reliability and scalability of our applications and infrastructure
  • Participate in on-call rotation to address production issues and ensure service uptime
  • Contribute to infrastructure design and implementation, focusing on scalability, security, and cost-effectiveness
  • Stay up to date on industry best practices and emerging technologies in SRE and DevOps

What You'll Bring

  • Proven experience in managing and troubleshooting complex, distributed systems in a production environment
  • Strong understanding of cloud platforms (GCP preferred) and containerization technologies (Kubernetes)
  • Proficiency in scripting languages and automation tools (e.g., Python, Bash, Terraform)
  • Experience with monitoring and observability systems (e.g., Datadog, Prometheus)
  • Excellent problem-solving skills and a proactive approach to identifying and mitigating potential issues
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment
  • A passion for ensuring the reliability and performance of critical systems

Bonus Points

  • Experience with CI/CD pipelines and infrastructure-as-code practices
  • Knowledge of networking concepts and protocols
  • Familiarity with security best practices for cloud-based systems
  • Familiarity with telephony applications

For all full-time employees, we offer:
🏠 Remote working environment that respects time zone differences
💸 Highly competitive salaries, equity, and, for US employees, a 401(k) plan
🏥 Top-of-the-line healthcare (medical, vision, and dental)
🏋️ Health and wellness perk
🖥️ Equipment stipend
🌴 Flexible vacation policy
✈️ Amazing team trips and offsites where you can find our CEO baking bread for the team
🌺 Replicants are eligible for a 5-week sabbatical after 4.5 years at the company

Our Values

Replicant has three core values. It is critical that everyone who joins the team feels excited and moved by these values, because every new team member shapes our culture.
Blade Runners: We take ownership of, and pride in, the outcomes of our goals. We are successful and, like a Blade Runner, use the tools at our disposal to reach our objectives. We value open and honest communication and proactively seek feedback along the way. We are a company driven to grow and achieve, both individually and as a team.
Bread Makers: We are humble and strive toward an egalitarian culture. No task is too big or too small. We work together to achieve our goals and develop our company mission. We believe that the whole is greater than the sum of its parts in everything that we do.
Självdistans (Self-Distance): Självdistans is Swedish for self-distance. It's the ability to critically reflect on oneself and one's relations from an external perspective. With this in mind, we act with objectivity and always remember that we are not our work. There's no perfect science to growing a team or business, but we trust everyone at Replicant to point out our blind spots and humbly admit their own.

Replicant is proud to be an equal opportunity employer. We are committed to fostering an inclusive, diverse and equitable workplace that is built on trust, support and respect. We welcome all individuals and do not discriminate on the basis of gender identity and expression, race, ethnicity, disability, sexual orientation, colour, religion, creed, gender, national origin, age, marital status, pregnancy, sex, citizenship, education, languages spoken or veteran status. Accommodation is available upon request at any point during our recruitment process. If you require an accommodation, please speak to your talent acquisition partner or email us at hr@replicant.ai and we’ll work to meet your needs.

Tags: ChatGPT CI/CD DevOps Distributed Systems Engineering GCP GPT Helm Kubernetes LLMs Node.js Pipelines Python Security Terraform TypeScript

Perks/benefits: Flex hours Flex vacation Health care Home office stipend Salary bonus Wellness

Regions: Remote/Anywhere North America
Countries: Canada United States