Senior Expert – Vision-Language Models and Generative AI (GenAI)
Bengaluru, India
Bosch Group
Company Description
Bosch Global Software Technologies Private Limited is a 100% owned subsidiary of Robert Bosch GmbH, one of the world's leading global suppliers of technology and services, offering end-to-end Engineering, IT and Business Solutions. With over 28,200 associates, it is the largest software development center of Bosch outside Germany, making it the Technology Powerhouse of Bosch in India with a global footprint and presence in the US, Europe and the Asia Pacific region.
Job Description
Roles & Responsibilities:
Conduct deep research in:
Vision-Language and Multimodal AI for perception and semantic grounding
Cross-modal representation learning for real-world sensor fusion (camera, lidar, radar, text); see the contrastive-alignment sketch after the Responsibilities list
Multimodal generative models for scene prediction, intent inference, or simulation
Efficient model architectures for edge deployment in automotive and factory systems
Evaluation methods for explainability, alignment, and safety of VLMs in mission-critical applications
Initiate new research directions and drive AI research programs for autonomous driving, ADAS, and Industry 4.0 applications.
Create new collaborations within and outside of Bosch in relevant domains.
Contribute to Bosch’s internal knowledge base, open research assets, and patent portfolio.
Lead internal research clusters or thematic initiatives across autonomous systems or industrial AI.
Mentor and guide research associates, interns, and young scientists.
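The cross-modal representation learning referenced in the Responsibilities list is commonly built on CLIP-style contrastive alignment between modalities. Below is a minimal sketch in PyTorch using randomly generated embeddings purely for illustration; the function name, embedding dimensions, and temperature are assumptions, not specifics of this role.

import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # L2-normalize embeddings from the two modalities
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Scaled pairwise cosine similarities; matching pairs sit on the diagonal
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    # Symmetric InfoNCE: image-to-text and text-to-image cross-entropy
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random stand-ins for camera and text embeddings
image_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
print(clip_style_contrastive_loss(image_emb, text_emb))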
Qualifications
Educational qualification:
Ph.D. in Computer Science / Machine Learning / AI / Computer Vision or equivalent
Experience:
8+ years of post-PhD experience in AI related to vision and language modalities, with strong exposure to and hands-on research in GenAI, VLMs, Multimodal AI, or Applied AI Research.
Mandatory/Required Skills:
Deep expertise in:
Vision-Language Models (CLIP, Flamingo, Kosmos, BLIP, GIT) and multimodal transformers
Open- and closed-source LLMs (e.g., LLaMA, GPT, Claude, Gemini) with visual grounding extensions
Contrastive learning, cross-modal fusion, and structured generative outputs (e.g., scene graphs)
PyTorch, HuggingFace, OpenCLIP, and the deep learning stack for computer vision (see the zero-shot matching sketch after this list)
Evaluation on ADAS/mobility benchmarks (e.g., nuScenes, BDD100k) and industrial datasets
Strong track record of publications in relevant AI/ML/vision venues
Demonstrated capability to lead independent research programs
Familiarity with multi-agent architectures, RLHF, and goal-conditioned VLMs for autonomous agents
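To illustrate the VLM tooling named in the list above (PyTorch, HuggingFace, CLIP), here is a minimal zero-shot image-text matching sketch. The checkpoint, image path, and prompt texts are illustrative assumptions only, not part of the role description.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Publicly available CLIP checkpoint used purely as an example
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # hypothetical local image
texts = [
    "a pedestrian crossing the road",
    "an empty highway",
    "a factory conveyor belt",
]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Softmax over image-text similarity scores gives zero-shot class probabilities
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))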
Preferred Skills:
Hands-on experience with:
Perception stacks for ADAS, SLAM, or autonomous robots
Vision pipeline tools (MMDetection, Detectron2, YOLOv8) and video understanding models
Semantic segmentation, depth estimation, 3D vision, and temporal models
Industrial datasets and tasks: defect detection, visual inspection, operator assistance
Lightweight or compressed VLMs for embedded hardware (e.g., in vehicle ECUs or factory edge devices); see the quantization sketch after this list
Knowledge of reinforcement learning or planning in an embodied AI context
Strong academic or industry research collaborations
Understanding of Bosch domains and workflows in mobility and manufacturing
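As a rough sketch of what compressing a VLM for embedded targets can involve, the snippet below applies dynamic int8 quantization to a CLIP checkpoint with PyTorch. The checkpoint name is an assumption; production ECU or factory-edge deployment would typically go further (static quantization, pruning, distillation, or export to an embedded runtime).

import torch
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Dynamic quantization converts the weights of linear layers to int8,
# shrinking the model and speeding up CPU inference as a first step
# toward edge deployment.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(type(quantized))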