Senior Specialist, Digital Innovation - Data Scientist
Hyderabad, India
Responsibilities
We are seeking a Data Scientist to support our custom Product Finder application by providing insights derived from analysis of company data. The ideal candidate will use large external and internal data sets to identify product similarities and differences, and will recommend courses of action based on the findings. The candidate should have extensive experience with a range of data mining and analysis methods and tools, with building and implementing models, with developing algorithms, and with running simulations.
Qualifications
Required Knowledge/Skills/Abilities
- Programming Proficiency: Strong experience with programming languages such as R, Python, and SQL. Proficient in manipulating data and extracting insights from large datasets.
- Machine Learning Frameworks: Familiarity with machine learning frameworks like TensorFlow and PyTorch, and libraries designed for working with LLMs (Large Language Models), RAG (Retrieval-Augmented Generation), and LMMs (Linear Mixed Models).
- Large Language Models: In-depth knowledge and hands-on experience with LLMs, including models like GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers). Understanding of their mechanisms, strengths, limitations, and ethical considerations.
- RAG Techniques: Demonstrated experience with RAG techniques and their applications, leveraging RAG to enhance language model performance by combining retrieval and generative capabilities for more accurate, contextually relevant responses.
- Linear Mixed Models: Solid understanding and practical application of LMMs in analyzing data with multiple levels of correlation or non-constant variability. Experience applying LMMs to complex datasets, ensuring accurate data interpretation and decision-making.
- Data Preparation: Experience in preprocessing, cleaning, and structuring large datasets for use with advanced models like LLMs and RAG systems. Efficient in managing and manipulating big datasets to ensure high-quality inputs for model training and analysis.
- Machine Learning Techniques: Solid understanding and application experience with various machine learning techniques, including clustering, decision tree learning, and artificial neural networks, along with their real-world advantages and limitations.
- Statistical Techniques: Deep knowledge of advanced statistical techniques and concepts (e.g., regression analysis, distribution properties, statistical testing) and experience applying these techniques to data analysis and modeling. Familiarity with data mining techniques such as GLM/Regression, Random Forest, Boosting, Trees, text mining, and social network analysis.
- Critical Thinking and Problem-Solving: Ability to apply critical thinking and problem-solving skills to leverage LLMs, RAG, and LMMs in addressing complex business challenges. Design and implement models to effectively analyze data, predict outcomes, and provide insights.
- Communication Skills: Excellent verbal and written communication skills, capable of articulating complex concepts and findings to both technical and non-technical stakeholders. Proven ability to collaborate with cross-functional teams to drive projects to completion.
- Interdisciplinary Collaboration: Ability to work independently with experts from various fields, including chemical engineers, process engineers, and environmental scientists, to integrate diverse data sources and insights.
- Experience and Education: 7+ years of experience manipulating datasets and building statistical models. A Bachelor's or Master's degree in Statistics, Mathematics, Computer Science, or another quantitative field.
Desirable Knowledge/Skills/Abilities
- Coding APIs: Knowledge of coding APIs and experience with languages like JavaScript and open-source frameworks such as Streamlit.
- Cloud Services: Experience with cloud services (AWS, Azure, Google Cloud) and cloud data warehouses (Snowflake). Understanding how to leverage these for scalable data analysis.
- Data Visualization: Proficiency in data visualization tools and libraries (e.g., Power BI, Matplotlib, Seaborn) to communicate findings visually.
- Data Analysis Tools: Experience using Alteryx for data analysis and manipulation.
- Data Architecture and Pipelines: Knowledge of data architecture and pipelines, including experience with big data technologies and database management systems.
- Enterprise Systems: Knowledge of SAP and Salesforce is a plus.
- Chemistry Background: Background in Chemistry is an advantage.
Tags: APIs Architecture AWS Azure BERT Big Data Chemistry Clustering Computer Science Data analysis Data Mining Data visualization GCP Google Cloud GPT JavaScript LLMs Machine Learning Mathematics Matplotlib Model training Open Source Pipelines Power BI Python PyTorch R RAG Salesforce Seaborn Snowflake SQL Statistics Streamlit TensorFlow Testing Transformers