Reference Data, Data Engineer

India - Hyderabad

Full Time Senior-level / Expert USD 58K - 109K *

Amgen

Amgen is committed to unlocking the potential of biology for patients suffering from serious illnesses by discovering, developing, manufacturing and delivering innovative human therapeutics.

Posted 3 hours ago

Career Category

Information Systems

Job Description

Join Amgen’s Mission of Serving Patients

At Amgen, if you feel like you’re part of something bigger, it’s because you are. Our shared mission—to serve patients living with serious illnesses—drives all that we do.

Since 1980, we’ve helped pioneer the world of biotech in our fight against the world’s toughest diseases. With our focus on four therapeutic areas (Oncology, Inflammation, General Medicine, and Rare Disease), we reach millions of patients each year. As a member of the Amgen team, you’ll help make a lasting impact on the lives of patients as we research, manufacture, and deliver innovative medicines to help people live longer, fuller, happier lives.

Our award-winning culture is collaborative, innovative, and science-based. If you have a passion for challenges and the opportunities that lie within them, you’ll thrive as part of the Amgen team. Join us and transform the lives of patients while transforming your career.

What you will do

As a member of the Reference Data Product team in the Enterprise Data Management organization, you will be responsible for managing and promoting the use of reference data, partnering with business Subject Matter Experts to create vocabularies, taxonomies, and ontologies, and developing analytic solutions using semantic technologies.

Work with the Reference Data Product Owner, external resources, and other engineers as part of the product team
Develop and maintain semantically appropriate concepts
Identify and address conceptual gaps in both content and taxonomy
Maintain ontology source vocabularies for new or edited codes
Support product teams to help them leverage taxonomic solutions
Analyze data from public and internal datasets
Develop a data model/schema for each taxonomy
Create taxonomies in Semaphore Ontology Editor
Bulk-import data templates into Semaphore to add or update terms in taxonomies
Prepare SPARQL queries to generate ad hoc reports
Perform gap analysis on current and updated data to facilitate interim governance
Maintain taxonomies in Semaphore through the Change Management process
Develop and optimize automated data ingestion pipelines in Python/PySpark when APIs are available
Collaborate with cross-functional teams to understand data requirements and design solutions that meet business needs
Identify and resolve complex data-related challenges
Participate in sprint planning meetings and provide estimations on technical implementation

What we expect of you

Master’s degree and 4 to 6 years of Computer Science, IT, or related field experience OR

Bachelor’s degree and 6 to 8 years of Computer Science, IT, or related field experience OR

Diploma and 10 to 12 years of Computer Science, IT, or related field experience

Basic Qualifications:

Knowledge of controlled vocabularies, classification, ontology and taxonomy
Experience in ontology development using Semaphore, TopBraid, or a similar tool
Experience performing document classification leveraging taxonomies and using a classification tool such as Semaphore
Hands-on experience writing SPARQL queries on graph data
Excellent problem-solving skills and the ability to work with large, complex datasets
Strong understanding of data modeling, data warehousing, and data integration concepts

Preferred Qualifications:

Hands-on experience writing SQL using any RDBMS (Redshift, Postgres, MySQL, Teradata, Oracle, etc.)
Experience using cloud services such as AWS, Azure, or GCP
Experience working in a product team environment
Knowledge of Python/R, Databricks, cloud data platforms
Knowledge of NLP (Natural Language Processing) and AI (Artificial Intelligence) for extracting and standardizing controlled vocabularies.
Strong understanding of data governance frameworks, tools, and best practices

Professional Certifications:

Databricks Certificate preferred
SAFe® Practitioner Certificate preferred

Soft Skills:

Critical thinking and problem-solving skills
Excellent communication and collaboration skills
Demonstrated awareness of how to function in a team setting
Demonstrated presentation skills

What you can expect of us

As we work to develop treatments that take care of others, we also work to care for your professional and personal growth and well-being. From our competitive benefits to our collaborative culture, we’ll support your journey every step of the way.

In addition to the base salary, Amgen offers competitive and comprehensive Total Rewards Plans that are aligned with local industry standards.

Apply now for a career that defies imagination

Objects in your future are closer than they appear. Join us.

careers.amgen.com

As an organization dedicated to improving the quality of life for people around the world, Amgen fosters an inclusive environment of diverse, ethical, committed and highly accomplished people who respect each other and live the Amgen values to continue advancing science to serve patients. Together, we compete in the fight against serious disease.

Amgen is an Equal Opportunity employer and will consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or any other basis protected by applicable law.

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

* Salary range is an estimate based on our AI, ML, Data Science Salary Index 💰

Category: Engineering Jobs

Tags: APIs AWS Azure Classification Computer Science Databricks Data governance Data management Data Warehousing GCP MySQL NLP Oracle Pipelines PostgreSQL PySpark Python R RDBMS Redshift Research SQL Teradata