Data Scientist (NLP)
Asia
Binance
Binance is the largest cryptocurrency exchange by trading volume, serving 185M+ users across 180+ countries. With over 350 listed altcoins, it is the world’s leading crypto exchange.
This position sits within our Risk AI team and focuses on NLP-related projects. You will use internal data to train models and develop applications based on those models. As a data scientist, you will leverage petabyte-scale data and state-of-the-art machine learning infrastructure to create data products for millions of cryptocurrency users. You will collaborate with engineers, data analysts, business operations, and product/marketing managers to define and build solutions, features, algorithms, and products.
Responsibilities:
- Apply Natural Language Processing (NLP) techniques to preprocess, analyse, and extract insights from large textual datasets. Develop and fine-tune Large Language Models (LLMs) and multimodal models to derive actionable insights and enhance business decision-making processes.
- Work closely with business units to identify opportunities for leveraging company data and AI models to drive innovative business solutions and improve decision-making processes.
- Perform data cleaning, transformation, and preprocessing to create high-quality datasets for analysis and modeling. Ensure data integrity and consistency throughout the process.
- Conduct exploratory data analysis to uncover patterns, trends, and relationships within the data. Generate visualisations and summaries to effectively communicate findings to stakeholders and support data-driven decision-making.
- Stay abreast of the latest developments in artificial intelligence, with a particular focus on advancements in multimodal AI, to ensure the integration of cutting-edge technologies and methodologies into our data-driven solutions.
- Develop and apply feature engineering techniques to create meaningful features that improve the performance of models. This includes deriving new features from raw data, selecting relevant features, and transforming existing features to enhance model accuracy and efficiency.
Requirements:
- Holds a Master's degree or higher in Computer Science, Data Science, Statistics, Mathematics, Computational Linguistics, or a related field.
- A minimum of 3 years of relevant industry experience in AI/ML and Natural Language Processing is required. Experience in multimodal AI is highly preferred.
- Proficient in big data technologies such as Apache Spark, Apache Hadoop, and Apache Kafka, as well as vector databases.
- Deep understanding of modern machine learning techniques and their mathematical underpinnings, such as classification, neural networks, and hyperparameter optimisation.
- Solid understanding and practical experience with deep learning architectures, including transformer models (e.g., BERT, GPT). Ability to implement, optimize, and fine-tune these models for various tasks using techniques such as LoRA.
- Proficiency in programming languages such as Python, Java, or similar, with experience in machine learning (ML), natural language processing (NLP) libraries, and deep learning frameworks such as TensorFlow, PyTorch, Scikit-learn, SpaCy, and NLTK.
- Demonstrated experience in handling severely imbalanced datasets. Knowledge of techniques and strategies to address imbalances in data.
Binance is committed to being an equal opportunity employer. We believe that having a diverse workforce is fundamental to our success.
By submitting a job application, you confirm that you have read and agree to our Candidate Privacy Notice.
Tags: Architecture BERT Big Data Blockchain Computer Science Data analysis Deep Learning EDA Engineering Feature engineering Finance GPT Hadoop Java Kafka Linguistics LLMs LoRA Machine Learning Mathematics ML infrastructure NLP NLTK Privacy Python PyTorch Research Scikit-learn Security spaCy Spark Statistics TensorFlow
Perks/benefits: Career development Competitive pay Startup environment