Senior Specialist, Digital Innovation - Data Scientist

Hyderabad, India

Celanese

Posted 5 months ago

Responsibilities

We are seeking a Data Scientist to support our custom Product Finder application by providing insights derived from analysis of company data. The ideal candidate will use large external and internal data sets to identify product similarities and differences and recommend courses of action based on the findings. The candidate should have extensive experience with data mining and analysis methods and tools, building and implementing models, developing algorithms, and running simulations.

Qualifications

Required Knowledge/Skills/Abilities

Programming Proficiency: Strong experience with programming languages such as R, Python, and SQL. Proficient in manipulating data and extracting insights from large datasets.
Machine Learning Frameworks: Familiarity with machine learning frameworks like TensorFlow and PyTorch, and libraries designed for working with LLMs (Large Language Models), RAG (Retrieval-Augmented Generation), and LMMs (Linear Mixed Models).
Large Language Models: In-depth knowledge of and hands-on experience with LLMs, including models like GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers). Understanding of their mechanisms, strengths, limitations, and ethical considerations.
RAG Techniques: Demonstrated experience with RAG techniques and their applications, leveraging RAG to enhance language model performance by combining retrieval and generative capabilities for more accurate, contextually relevant responses.
Linear Mixed Models: Solid understanding and practical application of LMMs in analyzing data with multiple levels of correlation or non-constant variability. Experience applying LMMs to complex datasets, ensuring accurate data interpretation and decision-making.
Data Preparation: Experience in preprocessing, cleaning, and structuring large datasets for use with advanced models like LLMs and RAG systems. Efficient in managing and manipulating big datasets to ensure high-quality inputs for model training and analysis.
Machine Learning Techniques: Solid understanding and application experience with various machine learning techniques, including clustering, decision tree learning, and artificial neural networks, along with their real-world advantages and limitations.
Statistical Techniques: Deep knowledge of advanced statistical techniques and concepts (e.g., regression analysis, distribution properties, statistical testing) and experience applying these techniques to data analysis and modeling. Familiarity with data mining techniques such as GLM/Regression, Random Forest, Boosting, Trees, text mining, and social network analysis.
Critical Thinking and Problem-Solving: Ability to apply critical thinking and problem-solving skills to leverage LLMs, RAG, and LMMs in addressing complex business challenges. Ability to design and implement models that analyze data, predict outcomes, and provide insights.
Communication Skills: Excellent verbal and written communication skills, capable of articulating complex concepts and findings to both technical and non-technical stakeholders. Proven ability to collaborate with cross-functional teams to drive projects to completion.
Interdisciplinary Collaboration: Ability to work independently with experts from various fields, including chemical engineers, process engineers, and environmental scientists, to integrate diverse data sources and insights.
Experience and Education: 7+ years of experience manipulating datasets and building statistical models. A Bachelor's or Master's degree in Statistics, Mathematics, Computer Science, or another quantitative field.

Desirable Knowledge/Skills/Abilities

Coding APIs: Knowledge of coding APIs and experience with languages like JavaScript and open-source frameworks such as Streamlit.
Cloud Services: Experience with cloud services (AWS, Azure, Google Cloud) and cloud data warehouses (Snowflake). Understanding how to leverage these for scalable data analysis.
Data Visualization: Proficiency in data visualization tools and libraries (e.g., Power BI, Matplotlib, Seaborn) to communicate findings visually.
Data Analysis Tools: Experience using Alteryx for data analysis and manipulation.
Data Architecture and Pipelines: Knowledge of data architecture and pipelines, including experience with big data technologies and database management systems.
Enterprise Systems: Knowledge of SAP and Salesforce is a plus.
Chemistry Background: Background in Chemistry is an advantage.