Sr Specialist Data Scientist (machine learning, statistical modeling and NLP applications)
IND:KA:Bengaluru / Innovator Building, Itpb, Whitefield Rd - Adm: Intl Tech Park, Innovator Bldg, India
AT&T
Job Description:
Key Responsibilities
**Text Embeddings & NLP**
- Design and implement pipelines leveraging text embeddings for semantic search, classification, clustering, and document retrieval.
- Work with embedding techniques such as TF-IDF, Word2Vec, GloVe, FastText, and transformer-based models including BERT, Sentence-BERT, OpenAI, and Azure OpenAI embeddings.
- Apply dimensionality reduction methods (PCA, t-SNE, UMAP) to analyze and visualize embedding spaces.
- Use cosine similarity, Euclidean distance, and approximate nearest neighbor libraries such as FAISS and ScaNN for similarity search and clustering.
- Integrate embedding outputs into downstream applications such as intent detection, topic modeling, semantic deduplication, document ranking, and retrieval systems.
**Traditional Machine Learning & Statistical Modeling**
- Build and deploy predictive models with logistic/linear regression, random forests, gradient boosting techniques (XGBoost, LightGBM), SVM, Naive Bayes, k-means, and hierarchical clustering.
- Employ statistical inference techniques including hypothesis testing, confidence intervals, bootstrapping, Bayesian inference, multicollinearity diagnostics, residual analysis, and time series forecasting (ARIMA, SARIMA).
- Evaluate model performance using ROC/Precision-Recall curves, AUC, confusion matrices, F1-score, lift/gain charts, and KS statistics.
- Conduct feature selection via Lasso/Ridge regression, recursive feature elimination (RFE), and SHAP values for interpretability.
**Experimentation & Causal Inference**
- Design and analyze A/B tests, multivariate tests, and designed experiments (DOE), and apply causal inference methods including propensity score matching, causal forests, and difference-in-differences.
- Translate experimental results into clear, actionable business insights that drive measurable outcomes.
**Data Engineering & Productionization**
- Develop scalable data pipelines using PySpark, SQL, and Azure Data Factory on platforms including Azure Data Lake, Databricks, MongoDB, and Cosmos DB.
- Deploy machine learning solutions with FastAPI, Docker containers, and Azure App Services endpoints, while monitoring model health and model drift with MLflow.
**Collaboration & Leadership**
- Partner effectively with engineering, product, and business teams to define problem statements and deliver impactful solutions.
- Lead technical discussions, perform code reviews, and mentor junior data scientists to foster technical growth.
- Communicate complex analytical insights clearly to both technical and non-technical stakeholders.
Required Skills and Qualifications
- Hands-on experience in machine learning, statistical modeling, and NLP applications.
- Deep expertise in text embeddings and their real-world applications.
- Proficiency in Python, PySpark, and SQL.
- Strong foundation in statistical inference, model diagnostics, and evaluation metrics.
- Experience working with Azure cloud ecosystem, Databricks, and production deployment of ML models.
- Proven ability to design, execute, and interpret experiments with statistical rigor.
Preferred (Good-to-Have) Skills
- Familiarity with transformer-based large language models (LLMs), LangChain, or OpenAI APIs.
- Experience with MLOps tools such as MLflow and GitHub Actions CI/CD pipelines with Azure App Services.
- Exposure to graph analytics, retrieval-augmented generation (RAG) pipelines, or agent-based systems.
Day-to-Day Responsibilities
You will architect and implement advanced NLP and machine learning pipelines leveraging diverse text embeddings for semantic search, classification, and clustering tasks. Applying sound statistical modeling and causal inference techniques, you will lead experimentation efforts and build scalable data workflows using PySpark, SQL, and Azure services. Cross-functional collaboration will be a core part of your role as you translate analytical insights into strategic business outcomes.
Weekly Hours: 40
Time Type: Regular
Location: IND:KA:Bengaluru / Innovator Building, Itpb, Whitefield Rd - Adm: Intl Tech Park, Innovator Bldg
It is the policy of AT&T to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, AT&T will provide reasonable accommodations for qualified individuals with disabilities. AT&T is a fair chance employer and does not initiate a background check until an offer is made.
Tags: APIs Azure Bayesian BERT Causal inference CI/CD Classification Clustering Cosmos DB Databricks Data pipelines Docker Engineering FAISS FastAPI FastText GitHub GloVe LangChain LightGBM LLMs Machine Learning MLFlow ML models MLOps MongoDB NLP OpenAI Pipelines PySpark Python RAG SBERT SQL Statistical modeling Statistics Testing Topic modeling Word2Vec XGBoost
Perks/benefits: Career development