Site Reliability Engineer
Jakarta Selatan, DKI Jakarta, Indonesia
DANA
Start easy, secure transactions with DANA, the best digital wallet for everyday needs. Send money, pay with QRIS, and enjoy convenient transactions through DANA. Registered with and supervised by Bank Indonesia and KOMINFO.
Automation Site Reliability Engineer
You will support internal business processes by handling ad hoc requests, working on projects to improve reliability and efficiency, and building your own initiatives to streamline operations. You'll participate in daily tasks and various projects, including system drills, load testing, deployments, and researching new technologies to drive innovation.
Key Responsibilities:
- Cloud Infrastructure Management: Oversee and optimize cloud-based infrastructure, with a preference for expertise in Alibaba Cloud.
- Linux Administration: Utilize advanced Linux/Unix expertise to manage and maintain systems, ensuring stability and operational efficiency.
- Kubernetes and Containerized Applications: Manage and orchestrate containerized applications and microservices using Kubernetes to ensure scalability, reliability, and efficiency.
- CI/CD Pipelines: Design, implement, and optimize CI/CD workflows using tools such as GitLab CI and ArgoCD; automate deployments and enhance release processes.
- MLOps and AI/ML Integration: Support MLOps pipelines and machine learning workflows, with a focus on incorporating AI/ML models, including large language models (LLMs), into infrastructure.
- Monitoring and Logging: Maintain and enhance observability through tools such as Grafana and ELK (Elasticsearch, Logstash, Kibana).
- Automation and Process Improvement: Develop and maintain scripts using languages like Python, Go, and Bash to streamline operations and optimize workflows.
- System Drills and Load Testing: Participate in system drills, load testing, and performance optimization to ensure high reliability and scalability.
- Security and Best Practices: Enforce robust security measures to protect infrastructure and ensure compliance with best practices.
- Disaster Recovery and Backups: Develop, test, and implement disaster recovery plans and manage system backups.
- Ad Hoc Support: Respond to and resolve ad hoc operational requests to maintain system health and user satisfaction.
- Research and Innovation: Explore and implement new tools, technologies, and practices to drive innovation and improve operational capabilities.
Requirements:
- Operating Systems: Advanced knowledge of Linux/Unix systems.
- Cloud Platforms: Familiarity with cloud platforms such as Alibaba Cloud, AWS, or GCP.
- Kubernetes and Containerization: Experience in deploying, managing, and orchestrating containerized applications using Kubernetes.
- CI/CD: Strong understanding of continuous integration and deployment practices using tools like GitLab CI, ArgoCD, or Jenkins.
- Programming: Proficiency in at least one programming language, such as Python, Go, Java, or Bash scripting.
- MLOps: Familiarity with MLOps pipelines and machine learning models, including large language models (LLMs).
- Monitoring and Logging: Experience with monitoring and logging tools like Grafana and ELK.
Preferred Skills:
- Database Administration: Experience with database performance tuning and administrative tasks.
- System Design and Architecture: Ability to design and architect scalable, efficient systems.
- Agile Development: Experience working within Agile and Scrum environments.
This role offers a chance to develop and expand your skills within a collaborative team environment, helping build and maintain robust and efficient processes across our infrastructure.
---------------------------------------------------------------------------------------------------
Introduction to Automation Team
We exist to free people from repetitive, soul-crushing work, transforming tasks that limit human potential into opportunities for meaningful growth. We believe that automation should allow people to focus on work that drives creativity, problem-solving, and innovation, rather than being weighed down by monotonous routines. By automating these tasks, we enable our team and company to reach new heights of impact and purpose.
Our aim is to drive exponential change, not settle for small gains. We are committed to creating tools and systems that empower our team to accomplish in a day what our competitors might take a month to achieve. By targeting improvements of 10x or 100x, we’re not just enhancing processes; we’re fundamentally transforming the speed, scale, and efficiency with which we operate. Our goal is to set new standards, making breakthroughs that multiply our impact and redefine what’s possible in our industry.
For our team, automation is a path of empowerment, learning, and even a bit of magic. We love the thrill of creating solutions that feel like magic, bringing ideas to life in ways that delight our customers and make our work more meaningful. This journey is about creating, innovating, and achieving together, as each team member takes ownership, solves real problems, and sees the tangible results of their efforts. It’s a process of discovery, growth, and witnessing the transformation of our own potential along the way.
Ultimately, our purpose extends beyond the company—we aim to serve society by proving that automation can drive a future of abundance for all. We’re working toward a world where technology benefits everyone, where automation empowers rather than replaces, and where progress uplifts society as a whole. This is our vision, our mission, and the reason we’re driven to innovate every day.
Core Principles of Automation Team:
- Strong Ownership - We take full responsibility for our actions, proactively solving problems and continually improving our skills and processes.
- Bias for Action - We make quick, results-driven decisions and aren’t afraid to try new approaches.
- Customer Obsession - We deeply empathize with our customers, innovating to exceed their expectations.
- Truth Seeking - We respectfully challenge decisions when necessary and, once a decision is made, commit to it fully.
- Deliver Results 100X - We strive to achieve results that multiply our impact significantly.
- Strive to Be the Best Work Environment - We make this a dream job by fostering growth and celebrating team successes.
Tags: Agile Architecture AWS CI/CD Elasticsearch ELK GCP GitLab Grafana Java Jenkins Kibana Kubernetes Linux LLMs Logstash Machine Learning Microservices ML models MLOps Pipelines Python Research Scrum Security Testing
Perks/benefits: Career development Health care Startup environment