LLM Architect
Remote, NY, US
AllCloud
AllCloud combines the expertise of cloud integration with custom solutions for proven success across top technologies including AWS and Salesforce.
Description
LLM Architect
Location: US / Canada (Eastern Time) - Home based
Job Type: Full-time, Permanent
About AllCloud
AllCloud is a global professional services company providing organizations with cloud enablement and transformation tools. As an AWS Premier Consulting Partner and audited MSP, a Salesforce Platinum Partner, and a Snowflake Premier Partner, AllCloud helps clients connect their front and back offices by building a new operating model that harnesses the benefits of cloud technology, data, and analytics.
Job Summary
We are looking for an innovative LLM Architect to lead the design and development of custom language models at AllCloud. This role will be responsible for architecting, training, and optimizing large language models based on modified transformer architectures. The ideal candidate will have deep expertise in NLP, transformer model design, and efficient training methodologies. You'll work alongside GPU Engineers and ML Engineers to create state-of-the-art language models that meet our customers' specific requirements, pushing the boundaries of what's possible with generative AI.
Responsibilities
- Design custom transformer-based language model architectures tailored to specific use cases
- Develop and implement modifications to transformer architectures to enhance performance, efficiency, or capabilities
- Create and execute model pre-training, fine-tuning, and evaluation strategies
- Implement techniques like quantization, pruning, and knowledge distillation to optimize model size and performance
- Design and implement training data pipelines, including data selection, cleaning, and augmentation
- Establish rigorous evaluation frameworks to assess model performance, fairness, and safety
- Research and implement state-of-the-art techniques in LLM development
- Create detailed documentation on model architectures, training methodologies, and performance characteristics
- Collaborate with GPU Engineers to implement efficient training strategies across distributed systems
- Work with customers to understand their unique requirements and translate them into model design decisions
Requirements
Summary of Key Requirements
- 4+ years of experience in deep learning research or development with a focus on NLP and transformer models
- Strong understanding of transformer architecture and its variants (GPT, BERT, T5, etc.)
- Experience designing and training large language models from scratch
- Expertise in PyTorch or TensorFlow for implementing custom model architectures
- Knowledge of distributed training approaches for large models (DeepSpeed, Megatron, etc.)
- Experience with model compression techniques (quantization, pruning, knowledge distillation)
- Strong background in mathematics, particularly linear algebra, differential equations, probability, and statistics
- Familiarity with current research in LLM development, including attention mechanisms, mixture of experts, and efficient training methods
- Master's or PhD in Computer Science, Machine Learning, or related field
- Publication record in NLP, LLMs, or transformer architecture (strongly preferred)
Certifications
- AWS Machine Learning Specialty (Strongly Preferred)
- NVIDIA-Certified Associate - Generative AI Multimodal (Preferred)
Why work for us?
Our team inspires progress in each other and in our customers through our relentless pursuit of excellence; you will work with leaders who promote learning and personal development.
AllCloud is an Equal Opportunity Employer and considers applicants for employment without regard to race, color, religion, sex, orientation, national origin, age, disability, genetics or any other basis forbidden under federal, provincial, or local law.
Tags: Architecture AWS BERT Computer Science Consulting Data pipelines Deep Learning Distributed Systems Generative AI GPT GPU Linear algebra LLMs Machine Learning Mathematics Model design NLP PhD Pipelines PyTorch Research Salesforce Snowflake Statistics TensorFlow
Perks/benefits: Career development