Lead Data Scientist - Research and Development - Graph Intelligence
PA, Working at Home - Pennsylvania, United States
Full Time Senior-level / Expert USD 108K - 201K
Highmark Health
Company: Highmark Health
Job Description:
JOB SUMMARY
Are you an architect of interconnected data, driven by the belief that relationships hold the key to uncovering society's most complex challenges? Do you see medical records not just as disparate facts, but as a vast, dynamic network of patient journeys, disease progressions, and treatment efficacy? Highmark Health is seeking a groundbreaking Lead Data Scientist, Research & Development, specializing in Graph Intelligence, who will not just work with graph data, but will define the future of how we harness relational insights in healthcare.
This is a premier R&D position where you will lead the charge in inventing and proving out transformative graph-native analytical solutions. Your core mission is to push the boundaries of what's possible in advanced analytics by pioneering novel methodologies that leverage network science, knowledge graphs, and Graph Machine Learning (GML) to solve critical problems across the healthcare continuum. From personalizing patient care pathways to detecting complex fraud rings and understanding population health dynamics, your work will directly impact millions of lives.
As our Lead Graph Intelligence specialist, you will be the spearhead of cutting-edge research projects. This means deeply engaging with graph theory, building and enriching large-scale knowledge graphs, and developing next-generation Graph Neural Networks (GNNs), graph convolutional networks (GCNs), graph attention networks (GATs), and other advanced network algorithms. You'll architect unique graph embeddings, perform sophisticated link prediction, community detection, and anomaly detection on complex healthcare data. Your responsibilities will include designing rigorous experiments, building robust proof-of-concept models, and meticulously evaluating the performance and interpretability of these novel graph algorithms to ensure their real-world applicability.
You are not merely a data scientist; you are a relational data innovator with a strategic mindset. You inherently understand that the explicit modeling of entities and their relationships unlocks a deeper layer of intelligence that traditional tabular or sequential data models cannot. You will proactively identify opportunities to construct and leverage comprehensive healthcare knowledge graphs, integrating diverse patient, provider, claims, and clinical data to uncover hidden patterns, propagate insights through networks, and develop groundbreaking analytical solutions that exploit the rich, multi-modal structures within healthcare.
Leveraging your profound expertise in graph databases (e.g., Neo4j, ArangoDB, Amazon Neptune, Ontotext GraphDB), distributed graph processing frameworks (e.g., Apache Spark GraphX, Dask-Graph), and leading GML libraries (e.g., PyTorch Geometric, DGL, Spektral), you will conduct in-depth research and construct sophisticated predictive, prescriptive, and diagnostic models directly on graph structures. You will drive initiatives from theoretical concept to validated, scalable prototypes. You are a vigilant scout of the graph AI landscape, continuously scanning, rigorously evaluating, and championing the adoption of emerging graph platforms, algorithms, and tools. Furthermore, you will actively foster collaborations with leading academic institutions, healthcare research experts, and the broader graph community. Your contributions will extend to publishing seminal research findings at top-tier conferences and leading the dialogue on the transformative power of graph intelligence in healthcare.
ESSENTIAL RESPONSIBILITIES
- Work directly with the business to understand their business processes and aims, then identify how analytical solutions could help deliver value for them. This would include being accountable for:
  - Outlining complex new use cases and creating high-level impact estimates.
  - Identifying the data elements needed and where to get them (including proxies).
  - Assembling data sets independently using knowledge of Highmark operational and analytic data structures.
  - Delivering analytical solutions to several complex business problems simultaneously.
  - Documenting objectives, assumptions, and processes, and enriching/expanding our standards as needed.
- Select and apply the appropriate advanced modeling/machine learning techniques to these data sets to deliver business insight, ensuring that the final analysis is well researched, accurate, and documented. This requires proficiency in a substantial number of advanced analytical techniques and mastery of a few, as evidenced by in-depth knowledge and a delivery record (for example, regression models, tree-based learning, neural networks, clustering techniques, natural language processing).
- Consult with the business to contextualize and translate the results of our analysis into a form the business can understand and act upon. This will include written reports, presentations, and data visualizations for a broad range of audiences that draw clear lines between the high-level problem specifications, the analyses performed, and how the results link directly back to business objectives, as well as leading the implementation that drives frontline workflows.
- Plan, prepare, and deliver/coordinate all elements of several analyses largely independently, so that each is delivered on time, to a high standard, and ready to implement on a production basis (including dissemination through the Organization's user systems). This includes identifying the best route to implementation and developing the analytical solution accordingly.
- Bring expertise and an in-depth understanding of the subject: be the face of major projects within ED&A, build an external presence and earned credibility (conferences, white papers, local/national associations), and mentor and teach others.
- Other duties as assigned or requested.
EDUCATION
Required
- Master's degree in Analytics, Mathematics, Physics, Computer and Information Science, Engineering Technology, or a related field OR a Bachelor's degree plus 3 years of relevant work experience in lieu of a Master's degree
Preferred
- Doctoral degree (Ph.D.) in Analytics, Mathematics, Physics, Computer and Information Science, Engineering Technology, or a related field.
EXPERIENCE
Required
- 5 years of Data Science experience
- 3 years of Data Science experience (with a Ph.D.)
Preferred
- Deep Expertise in Graph Theory & Network Science: Comprehensive understanding of fundamental graph algorithms (centrality, community detection, pathfinding, clustering), knowledge graph principles, and network analysis techniques for complex systems.
- Advanced Graph Machine Learning (GML): Proven experience designing, implementing, and optimizing various Graph Neural Network (GNN) architectures, graph convolutional networks, and other graph-specific machine learning models for tasks like node classification, link prediction, and anomaly detection in graphs.
- Knowledge Graph Engineering: Hands-on experience in the entire lifecycle of knowledge graphs, including schema design (ontologies, RDF, OWL), data ingestion, graph construction, data cleaning, entity resolution, and advanced graph querying (e.g., Cypher, SPARQL).
- Graph Database & Platform Experience: Practical experience with one or more leading graph databases (e.g., Neo4j, Google Spanner Graph) and distributed graph processing frameworks.
- GML Libraries & Frameworks: Strong command of specialized GML libraries like PyTorch Geometric (PyG), Deep Graph Library (DGL), Spektral, or StellarGraph.
- Cloud Platform & MLOps: Experience deploying and managing ML models, particularly GML pipelines, in cloud environments (e.g., AWS, Azure, GCP) and familiarity with MLOps principles for research projects.
- Research & Publication Acumen: A track record of contributing to cutting-edge research, including peer-reviewed publications in top-tier conferences or demonstrated experience in driving novel solution development from concept to prototype.
- Healthcare Data Familiarity: Understanding of healthcare data domains (claims, clinical, EMR) and related ontologies or standards (e.g., SNOMED CT, ICD) is a significant plus.
- Experimental Design & Rigor: Demonstrated ability to design robust experiments, rigorously evaluate model performance, interpret complex results, and contribute to the scientific understanding of graph-based solutions.
LICENSES OR CERTIFICATIONS
Required
- None
Preferred
- None
SKILLS
- Analysis of business problems/needs
- Analytical and Logical Reasoning/Thinking
- Collaborative Problem Solving
- Data Analysis with SQL, BigQuery
- Statistical Analysis with Python, R
- Written & Oral Presentation Skills
- Basic prototyping/front-end skills
Language (other than English)
None
Travel Required
0% - 25%
PHYSICAL, MENTAL DEMANDS and WORKING CONDITIONS
Position Type
Office-Based
Teaches / trains others regularly
Occasionally
Travel regularly from the office to various work sites or from site-to-site
Never
Works primarily out-of-the office selling products/services (sales employees)
Never
Physical work site required
No
Lifting: up to 10 pounds
Frequently
Lifting: 10 to 25 pounds
Occasionally
Lifting: 25 to 50 pounds
Rarely
Disclaimer: The job description has been designed to indicate the general nature and essential duties and responsibilities of work performed by employees within this job title. It may not contain a comprehensive inventory of all duties, responsibilities, and qualifications required of employees to do this job.
Compliance Requirement: This position adheres to the ethical and legal standards and behavioral expectations as set forth in the code of business conduct and company policies.
As a component of job responsibilities, employees may have access to covered information, cardholder data, or other confidential customer information that must be protected at all times. In connection with this, all employees must comply with both the Health Insurance Portability and Accountability Act of 1996 (HIPAA) as described in the Notice of Privacy Practices and Privacy Policies and Procedures as well as all data security guidelines established within the Company’s Handbook of Privacy Policies and Practices and Information Security Policy.
Furthermore, it is every employee’s responsibility to comply with the company’s Code of Business Conduct. This includes but is not limited to adherence to applicable federal and state laws, rules, and regulations as well as company policies and training requirements.
Pay Range Minimum: $108,000.00
Pay Range Maximum: $201,800.00
Base pay is determined by a variety of factors including a candidate’s qualifications, experience, and expected contributions, as well as internal peer equity, market, and business considerations. The displayed salary range does not reflect any geographic differential Highmark may apply for certain locations based upon comparative markets.
Highmark Health and its affiliates prohibit discrimination against qualified individuals based on their status as protected veterans or individuals with disabilities and prohibit discrimination against all individuals based on any category protected by applicable federal, state, or local law.
We endeavor to make this site accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact the email below.
For accommodation requests, please contact HR Services Online at HRServices@highmarkhealth.org
California Consumer Privacy Act Employees, Contractors, and Applicants Notice