Master Principal Artificial Intelligence Architect

Singapore

Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and cloud platform services.

The JAPAC Cloud Engineering Center of Excellence (CE CoE) is a divisional organization across Japan and Asia Pacific whose role is to accelerate consumption of Oracle Cloud Services by providing a world-class customer experience on Oracle Cloud. Customers have access to cutting-edge, deep technical subject matter expertise and solutions across their technical lifecycle, from Incubate to Implement. The primary mission of the CoE teams is to help customers move their workloads to Oracle Cloud Infrastructure (OCI) and to act as their engineering partner.

We are seeking a Senior AI Architect & Infrastructure Specialist with at least 15 to 20 years of deep, hands-on technical expertise in designing and implementing AI and HPC infrastructure and scalable architectures to drive the next phase of our AI growth initiatives. You have great communication skills and can interact at all levels (CIO, CTO, Product Engineering, IT Architects and developers). The role requires you to engage cross-functional teams, including data scientists, AI engineers and DevOps teams, to align AI infrastructure with evolving business and technical needs. You will be accountable for, and empowered to drive, the building of innovative experiences in a fast-paced, startup-like environment. Success is measured by providing the field and customers with world-class technical subject matter expertise and by driving the following outcomes:

  • Cloud consumption growth
  • New cloud customer acquisition
  • A high-performance, innovative, agile and collaborative team
  • Continuous improvement of time to value

Career Level - IC5

Responsibilities 

Why Join Us?

OCI (Oracle Cloud Infrastructure) AI Infrastructure is at the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. The CoE offers the opportunity to be part of the AI revolution, architecting customer-centric systems and solving real-world business problems with AI.

Requirements - Technical

You bring proven experience in three or more of the following areas. Experience with AI infrastructure and with LLMs is a MUST.

AI Infrastructure Design: Lead the architecture and implementation of AI and HPC infrastructure, including the use of GPUs/TPUs, high-performance networking and scalable storage solutions to support GenAI/AI/ML workloads.

AI Deployment: Experience in deploying large models in production on public clouds (OCI, AWS, Azure, GCP) and in hybrid cloud environments, including the use of microservices and containerization (Docker, Kubernetes) to ensure smooth deployment, scaling and monitoring of AI/ML models in production.

AI/ML Tools & Frameworks: Design and implement AI systems using industry-standard training, inference and deployment tools such as Kubeflow, Ray, CUDA, PyTorch, and TensorFlow, ensuring optimal performance in training and deployment. Exposure to scheduling and automation tools such as Slurm and Terraform is desirable.

Large Language Models (LLMs): Expertise in working with closed and/or open-source LLMs (e.g., GPT, BERT, BLOOM, LLaMA) and an understanding of the full AI life cycle, including training, fine-tuning, and deploying these models for inference in production environments.

Performance Optimization: Drive the optimization of AI infrastructure and applications on OCI, focusing on improvements in computational speed and resource-management efficiency.

Security & Compliance: Ensure all AI infrastructure and solutions comply with industry standards and organizational policies related to security, privacy and data governance.

Operating Systems, Protocols and Tools: Strong Linux skills with hands-on experience in Oracle Linux/RHEL/CentOS, Ubuntu and Debian distributions, including system administration, package management and shell scripting. Strong knowledge of networking protocols (TCP/IP, InfiniBand, RDMA, UDP, HTTP) is a significant advantage. Experience with high-performance storage is desirable.

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s problems. True innovation starts with diverse perspectives and various abilities and backgrounds.

When everyone’s voice is heard, we’re inspired to go beyond what’s been done before. It’s why we’re committed to expanding our inclusive workforce that promotes diverse insights and perspectives.

We’ve partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer a highly competitive suite of employee benefits designed on the principles of parity and consistency. We put our people first with flexible medical, life insurance and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by calling +1 888 404 2494, option one.

Disclaimer:

Oracle is an Equal Employment Opportunity Employer*. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

* Which includes being a United States Affirmative Action Employer

Category: Architecture Jobs

Tags: Agile Architecture AWS Azure BERT CUDA CX Data governance DevOps Docker Engineering GCP Generative AI GPT GPU HPC InfiniBand Kubeflow Kubernetes Linux LLaMA LLMs Machine Learning Microservices ML infrastructure ML models Open Source Oracle Privacy PyTorch Security Shell scripting TensorFlow Terraform

Perks/benefits: Flex hours Insurance Startup environment

Region: Asia/Pacific
Country: Singapore