Data Scientist / Engineer

USA-MA-Burlington - Blue Sky

Applications have closed

Broadcom

Broadcom Inc. is a global technology leader that designs, develops and supplies a broad range of semiconductor, enterprise software and security solutions.

Posted 6 months ago

Please Note:

1. If you are a first-time user, please create your candidate login account before you apply for a job. (Click Sign In > Create Account)

2. If you already have a Candidate Account, please sign in before you apply.

Job Description:

Seeking a Data Scientist/Engineer practitioner to design and develop a network operations AI assistant expert system using generative AI, traditional machine learning, and statistical analysis tools and techniques.

Recommended Skill Set List

NetOps Data Engineer/Scientist

Classic ML and Statistical Analysis Focused Skills

Machine Learning Skills:

Supervised Learning: Regression (linear, logistic, etc.), classification (SVM, decision trees, random forests, etc.), and model evaluation metrics.
Unsupervised Learning: Clustering (k-means, hierarchical), dimensionality reduction (PCA, t-SNE).
Deep Learning: Neural networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), and experience with deep learning frameworks.
Model Selection & Tuning: Cross-validation, hyperparameter optimization, and model selection techniques.

Statistical Analysis & Modeling Skills:

Solid understanding of statistical modeling and machine learning concepts, including:
- Time series analysis
- Regression analysis
- Classification analysis
- Clustering analysis
- Hypothesis testing
- Statistical significance
- Bayesian methods

Programming & Scripting Skills:

Python: Proficient in data manipulation (Pandas, NumPy), data visualization (Matplotlib, Seaborn, Plotly), and machine learning libraries (Scikit-learn, TensorFlow, PyTorch).
R: Experience with data manipulation, statistical analysis, and visualization packages (dplyr, tidyr, ggplot2).
SQL: Ability to query and manipulate large datasets from relational databases.
Cypher/GQL (optional but beneficial): Ability to query (filter, aggregate, analyze) and manipulate complex graph-based datasets.

Data Visualization & Communication Skills:

Creating clear and effective visualizations using various tools.
Communicating complex technical information to both technical and non-technical audiences.
Data storytelling and presentation skills.

Software Engineering Skills:

Programming Languages: Proficiency in Python (essential) and potentially other languages like Java, Scala, or Go.
API Development: Building and consuming APIs to integrate different components of the system.
Version Control (Git): Essential for collaborative development.
Software Design Principles: Ability to design scalable and maintainable systems.

GenAI Focused Skills

Includes all of the above skills plus these GenAI-specific skills:

GenAI Model Integration & Application:

Experience integrating and deploying large language models (LLMs), diffusion models, and other generative AI models into data pipelines and applications. This includes understanding API interactions, prompt engineering, and model limitations.

Data Wrangling & Preprocessing for GenAI:

Expertise in cleaning, transforming, and preparing data for use in generative AI models. This includes handling unstructured data, text normalization, feature engineering specific to generative models, and data augmentation techniques.

Prompt Engineering & Optimization:

Deep understanding of prompt engineering techniques for maximizing the quality and relevance of outputs from generative AI models. Experience with iterative prompt refinement and A/B testing different prompt strategies.

AI and Machine Learning Skills:

Large Language Models (LLMs): Deep understanding of how LLMs work and how to effectively integrate them into a retrieval-augmented generation (RAG) system. This includes prompt engineering techniques to obtain desired outputs.
Retrieval Methods: Familiarity with different retrieval techniques, such as BM25, dense retrieval (using embeddings), and hybrid approaches.
Model Selection & Tuning: Ability to choose and fine-tune appropriate models for different tasks within the RAG pipeline.
Explainable AI (XAI): Understanding how to make the system's reasoning transparent and understandable, which is especially important for building trust in GenAI applications.

Experience: Bachelor's degree and 8+ years of related experience

Additional Job Description:

Compensation and Benefits

The annual base salary range for this position is $119,000 - $190,000.

This position is also eligible for a discretionary annual bonus in accordance with relevant plan documents, and equity in accordance with equity plan documents and equity award agreements.

Broadcom offers a competitive and comprehensive benefits package: medical, dental and vision plans, 401(k) participation including company matching, Employee Stock Purchase Program (ESPP), Employee Assistance Program (EAP), company-paid holidays, paid sick leave and vacation time. The company follows all applicable laws for Paid Family Leave and other leaves of absence.

Broadcom is proud to be an equal opportunity employer. We will consider qualified applicants without regard to race, color, creed, religion, sex, sexual orientation, gender identity, national origin, citizenship, disability status, medical condition, pregnancy, protected veteran status or any other characteristic protected by federal, state, or local law. We will also consider qualified applicants with arrest and conviction records consistent with local law.

If you are located outside the USA, please be sure to fill out a home address, as this will be used for future correspondence.
