System Reliability Engineer (Big Data)
Pune, India
Fulcrum Digital
Fulcrum Digital is at the forefront of digital transformation services, offering advanced digital engineering and acceleration solutions to drive business growth. It is an agile, next-generation digital accelerator providing digital transformation and technology services from ideation to implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.
The Role
- Plan, manage, and oversee all aspects of a production environment for big data platforms.
- Define strategies for application performance monitoring and optimization in the production environment.
- Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time.
- Ensure that batch production scheduling and processing are accurate and timely.
- Create and execute queries against the big data platform and relational data tables to identify process issues or to perform mass updates (preferred).
- Perform ad hoc requests from users, such as data research, file manipulation/transfer, and research of process issues.
- Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
- Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
- Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Work with a global team spread across tech hubs in multiple geographies and time zones.
- Share knowledge and explain processes and procedures to others.
Requirements
- Experience with Linux and knowledge of ITSM/ITIL.
- Experience with big data technologies (Hadoop, Spark, NiFi, Impala).
- 4+ years of experience running big data production systems.
- Experience with industry-standard CI/CD tools such as Git/Bitbucket, Jenkins, and Maven is good to have.
- Solid grasp of SQL/Oracle fundamentals.
- Experience with scripting, pipeline management, and software design.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive; ability to help debug and optimize code and automate routine tasks.
- Ability to support many different stakeholders; experience dealing with difficult situations and making decisions with a sense of urgency is needed.
- Appetite for change and for pushing the boundaries of what can be done with automation; experience working across development, operations, and product teams to prioritize needs and build relationships is a must.
- Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is desired.
- Good handle on the change management and release management aspects of software.