Assistant Manager/Data Science Engineer

India-Bangalore

Full Time Mid-level / Intermediate USD 24K - 56K *

Genpact

Artificial Intelligence. Real Outcomes. AI is changing big businesses, and so are we. Discover how cutting-edge AI drives unparalleled value.

Posted 3 hours ago

Assistant Manager/Data Science Engineer-ANA015637

Genpact (NYSE: G) is a global professional services and solutions firm delivering outcomes that shape the future. Our 125,000+ people across 30+ countries are driven by our innate curiosity, entrepreneurial agility, and desire to create lasting value for clients. Powered by our purpose – the relentless pursuit of a world that works better for people – we serve and transform leading enterprises, including the Fortune Global 500, with our deep business and industry knowledge, digital operations services, and expertise in data, technology, and AI.

Inviting applications for the role of Assistant Manager/Data Science Engineer
The Data Science Engineer will support the Membership, Digital Analytics & Data Science pillar of the Enterprise Analytics Team and will be responsible for maintaining, optimizing, and developing our internal data science solutions. The role reports directly to an offshore Data Science Engineering Manager. We are seeking candidates with a strong foundation in data engineering, machine learning operations (MLOps), and DevOps practices. Candidates should also demonstrate proficiency in cloud-based architectures (AWS), ETL pipeline management, model deployment, and CI/CD automation.

Responsibilities
Under the supervision of the offshore Data Science Engineering Manager, using agile project management methods:
Acquisition products (Bingxin):
• Create and maintain ETL pipelines using PySpark and other tools, and handle large, complex data pipelines with the AWS technology stack
• Develop model, DNA, and ETL code pipelines, including identifying new opportunities
• Understand the business strategies behind the current production jobs
• Implement DevOps practices, including CI/CD and version control with Git
• Recommend code improvements and optimizations with regard to speed, resource utilization, and efficiency
• Collaborate on coding, manage feature development branches and raise/review pull requests using Bitbucket
• Deploy models using cloud-based strategies and make recommendations on improving model efficiency, performance & deployment mechanisms
• Utilize AWS services such as SageMaker, EC2, S3, and Lambda for data engineering and model deployment
• Execute our end-to-end campaign targeting engine, which involves updating the ETL/DNA processes, machine learning models, direct mail assignment module, and measurement module, using CI/CD tools and the AWS technology stack
• Lead the execution of campaigns and ensure comprehensive support throughout the campaign lifecycle, from design to measurement and optimization

Personalization products (Hazim):
• Develop and optimize ETL pipelines using AWS Glue and PySpark for efficient data processing
• Optimize Spark parameters to reduce run times and improve resource utilization
• Build and refine machine learning models using AWS SageMaker for personalization and recommendation systems
• Preprocess and clean data to ensure high-quality input for models
• Use SQL for querying and managing large datasets
• Deploy models to production using AWS SageMaker, ensuring scalability and performance
• Containerize applications using Docker for efficient deployment across environments
• Set up CI/CD pipelines with Jenkins and AWS services to automate build, testing, and deployment processes
• Utilize AWS services including Glue, SageMaker, EC2, S3, and Lambda for data engineering and model deployment
• Optimize code and infrastructure for improved speed and efficiency
• Work with Bitbucket or another version control system for collaborative development
• Support execution of assignments such as collaborative filtering and trip propensity modeling, and engage in test design, measurement, and execution
• Lead the execution of campaigns and ensure comprehensive support throughout the campaign lifecycle, from design to measurement and optimization
• Lead the full execution of campaigns, including updating data pipelines, refining machine learning models, managing assignment modules, and measuring outcomes, while leveraging CI/CD tools and AWS infrastructure
• Collaborate on test design and measurement, focusing on performance analysis, campaign execution, and continuous improvement

Voice of Member:
• Build, maintain, and optimize automated ETL pipelines to process and transform large datasets from diverse sources efficiently
• Develop and deploy machine learning models, with a focus on NLP, and ensure seamless integration of these models into data pipelines
• Implement and manage data ingestion from external APIs, ensuring smooth data flow into existing workflows for analysis and processing
• Oversee the organization, querying, and maintenance of large-scale SQL and NoSQL databases, with a focus on text-based data management
• Utilize cloud platforms (AWS, Google Cloud, or Azure) for scalable data processing, storage, and workflow automation

Qualifications we seek in you!
Minimum Qualifications / Skills
• BSc in Computer Science, Engineering (or related area), Statistics, or Mathematics, or equivalent experience

Preferred Qualifications / Skills
• Experience in algorithms, data structures, and object-oriented programming
• Experience in developing machine learning models for retail use cases
• Experience in crafting ad hoc data mining analyses for retail use cases
• Proven experience in the implementation of machine learning algorithms and applications for production-level systems
• Strong programming skills in Python, PySpark, SQL, or Java, with a focus on scalable solutions
• Experience with cluster-computing frameworks (e.g., Spark, Hadoop) and optimizing them for performance and resource efficiency
• Expertise in configuring and managing large, complex data pipelines using AWS services (e.g., Glue, SageMaker, Lambda), with experience in GCP/Azure as a plus
• Familiarity with DevOps tools and processes, including CI/CD pipelines, version control (Git), and automation tools like Jenkins
• Experience with AWS Glue for ETL processes and Spark parameter optimization for performance improvement
• Experience using development and collaboration tools like Jira and Confluence
• Ability to extract actionable insights from complex datasets using data mining, statistics, and database techniques to improve member acquisition/engagement KPIs
• Strong communication and collaboration skills, working effectively across teams and stakeholders

Genpact is an Equal Opportunity Employer and considers applicants for all positions without regard to race, color, religion or belief, sex, age, national origin, citizenship status, marital status, military/veteran status, genetic information, sexual orientation, gender identity, physical or mental disability or any other characteristic protected by applicable laws. Genpact is committed to creating a dynamic work environment that values diversity and inclusion, respect and integrity, customer focus, and innovation. Get to know us at genpact.com and on LinkedIn, X, YouTube, and Facebook.

Please note that Genpact does not charge fees to process job applications, and applicants are not required to pay to participate in our hiring process in any other way. Examples of scams to watch for include being asked to purchase a 'starter kit,' pay to apply, or purchase equipment or training.