AI Platform Engineer – BFSI Domain

Mumbai, MH, India

Expleo

Expleo is a trusted partner for end-to-end, integrated engineering, quality services and management consulting for digital transformation.

Overview

 

Key Responsibilities

Platform Development and Evangelism:

  • Build scalable AI platforms that are customer-facing.
  • Evangelize the platform with customers and internal stakeholders.
  • Ensure platform scalability, reliability, and performance to meet business needs.

Machine Learning Pipeline Design:

  • Design ML pipelines for experiment management, model management, feature management, and model retraining.
  • Implement A/B testing of models.
  • Design APIs for model inferencing at scale.
  • Proven expertise with MLflow, SageMaker, Vertex AI, and Azure AI.

LLM Serving and GPU Architecture:

  • Serve as an SME in LLM serving paradigms.
  • Possess deep knowledge of GPU architectures.
  • Expertise in distributed training and serving of large language models.
  • Proficient in model-parallel and data-parallel training using frameworks like DeepSpeed and serving frameworks like vLLM.

Model Fine-Tuning and Optimization:

  • Demonstrate proven expertise in model fine-tuning and optimization techniques.
  • Improve the latency and accuracy of model results.
  • Reduce training and resource requirements for fine-tuning LLMs and LVMs.

LLM Models and Use Cases:

  • Have extensive knowledge of different LLMs.
  • Provide insights on the applicability of each model based on use cases.
  • Proven experience in delivering end-to-end solutions from engineering to production for specific customer use cases.

DevOps and LLMOps Proficiency:

  • Proven expertise in DevOps and LLMOps practices.
  • Knowledgeable in Kubernetes, Docker, and container orchestration.
  • Deep understanding of LLM orchestration frameworks like Flowise, Langflow, and LangGraph.

 

Skill Matrix

LLM: Hugging Face OSS LLMs, GPT, Gemini, Claude, Mixtral, Llama

LLMOps: MLflow, LangChain, LangGraph, Langflow, Flowise, LlamaIndex, SageMaker, AWS Bedrock, Vertex AI, Azure AI

Databases/Data Warehouses: DynamoDB, Cosmos DB, MongoDB, RDS, MySQL, PostgreSQL, Aurora, Spanner, Google BigQuery

Cloud Knowledge: AWS/Azure/GCP

DevOps (Knowledge): Kubernetes, Docker, Fluentd, Kibana, Grafana, Prometheus

Cloud Certifications (Bonus): AWS Certified Solutions Architect - Professional, AWS Certified Machine Learning - Specialty, Azure Solutions Architect Expert

Proficient in Python, SQL, JavaScript

 

Qualifications

 

  • 3–5 years in AI/ML product development.

Experience

  • Up to 6 years

Tags: A/B testing APIs Architecture AWS Azure BigQuery Claude DevOps Docker DynamoDB Engineering GCP Gemini GPT GPU Grafana JavaScript Kibana Kubernetes LangChain LLaMA LLMOps LLMs Machine Learning MLFlow MongoDB MySQL Pipelines PostgreSQL Python SageMaker SQL Testing Vertex AI vLLM

Region: Asia/Pacific
Country: India