Senior Level Data Scientist

Arlington, VA, United States

PeopleTec

Delivering world-class solutions to the Department of Defense and Civilian Federal Sectors from Huntsville, Alabama.

Responsibilities

PeopleTec is currently seeking a Senior Level Data Scientist to support the DC-area offices of the Chief Digital and Artificial Intelligence Office (CDAO) at the Falls Church, Pentagon, Alexandria, and Arlington locations.

 

Duties Include:

  • Develop a generalized tool to semantically search, summarize, and categorize unstructured data

  • Participate in DoD and government AI/ML Task Forces, connect with others in DoD working on similar capabilities, and share best practices with an LLM community of practice

  • Extend a generalized API deployed to NIPR to semantically search, summarize, and categorize unstructured data and enable others across the Department to use the API within the paradigm of CDAO / Advana 1.2's self-service model

  • Support the installation of the capability on other networks at different classification levels, including SIPRnet and JWICS

  • The capability includes a set of swappable containers with different functions that provide inputs and outputs through an API

  • Develop a methodology to test Search performance (with varying levels of prompt engineering)

  • Contribute to and drive a demand signal for a data operations playbook for unstructured data

  • Develop a cost model for semantic search API use cases

  • Develop and document a strategy and implementation plan to ingest and consistently store unstructured data on the Advana platform, following the Bronze/Silver/Gold table paradigm (i.e., raw files in bronze; parsed/transformed data in silver; cleaned, processed, query-ready data in gold)

  • Develop an approach to address issues arising from maintaining semantic indices associated with document change management and version control for unstructured data, such as when a new manual comes out to replace a previous version

Major Duties/Tasks:

  • Designs, configures, develops, tests, and supports informatics and data science solutions for a wide array of technical use cases

  • Collaborate with cross-functional teams, including data scientists and software engineers, to integrate AI solutions developed by other elements of CDAO or the DoD community into Search Portfolio products when appropriate

  • Optimize AI models for performance, scalability, and efficiency, leveraging cloud-based resources and distributed computing frameworks, specifically Apache Spark/Databricks. Adapt the code base to also run on GPU-enabled Kubernetes clusters.

  • Stay updated on and contribute to the latest advancements in AI research, applying new findings to improve Search Portfolio products

  • Manage the lifecycle of AI/ML components used in Search Portfolio products from research and development to deployment and optimization

  • Applies analytical methodologies to diagnose data-related challenges, implement solutions, and evaluate performance

  • Documents and presents requirements, design alternatives, and findings to team members and clients

  • Ability to develop strategic, baselined data modeling processes and to accurately determine cause-and-effect relationships

  • Experience with integrated development environments, data integration, data visualization, data mining, and analysis tools.

  • Maintains and guides the development of common libraries and tools used by multiple teams.

  • Aids in formulating a strategy on how to achieve rapid prototyping

Qualifications

Required Skills/Experience:

  • Experience with ML fields, e.g., natural language processing, computer vision, statistical learning theory
  • Hands-on experience with Natural Language Processing (NLP), Large Language Models, text embedding, semantic query, use of generative AI for text, and retrieval augmented generation (RAG)
  • Familiarity with data preprocessing, feature engineering, and model evaluation techniques essential for machine learning projects
  • Strong understanding of various machine learning algorithms, including supervised and unsupervised learning, reinforcement learning, and neural networks
  • Experience with version control systems like Git, enabling effective collaboration and code management
  • Experience in an ML engineer or data scientist role building ML models
  • Experience writing code in Python, R, Scala, Java, C++ with documentation for reproducibility
  • Experience using Apache Spark/Databricks distributed compute environments for AI/ML workloads
  • Experience handling petabyte-scale datasets, diving into data to discover hidden patterns, using data visualization tools, writing SQL, and working with GPUs to develop models
  • Experience with cloud-based data persistence products, especially RDS PostgreSQL and PostgreSQL extensions such as pgvector.
  • Experience writing and speaking about technical concepts to business, technical, and lay audiences and giving data-driven presentations
  • Travel: <10%
  • Must be a U.S. Citizen
  • An active DoD Top Secret clearance with SCI eligibility is required to perform this work. Candidates are required to have an active Top Secret clearance with SCI eligibility upon hire, and the ability to maintain this level of clearance during their employment.

Education Requirements:

  • Bachelor’s degree plus 7-10 years of experience, or a Master’s degree plus 5 years of experience.

Overview

People First. Technology Always.

 

PeopleTec, Inc. is an employee-owned small business founded in Huntsville, AL that provides exceptional customer support by employing and retaining a highly skilled workforce.

 

Culture: The name "PeopleTec" was deliberately chosen to remind us of our core value system - our people. Our company's foundation was built on placing our employees and customers first. With an award-winning atmosphere, we have matured into a company that boasts the best and brightest across multiple technical fields.

 

Career: At PeopleTec, we value your long-term goals. Whether it's through our continuing-education opportunities, our robust training programs, or our "People First" benefits package, PeopleTec truly believes that our best investments are our people.

 

Come Experience It.

 

EEO Statement

 

PeopleTec, Inc. is an Equal Employment Opportunity employer and provides reasonable accommodation for qualified individuals with disabilities and disabled veterans in its job application procedures. If you have any difficulty using our online system and you need an accommodation due to a disability, you may use the following email address (applicationhelp@peopletec.com) and/or phone number (256.319.3800) to contact us about your interest in employment with PeopleTec, Inc.

 

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, genetic information, citizenship, ancestry, marital status, protected veteran status, disability status or any other status protected by federal, state, or local law. PeopleTec, Inc. participates in E-Verify.

Category: Data Science Jobs

Tags: APIs Classification Computer Vision Databricks Data Mining DataOps Data visualization Engineering Feature engineering Generative AI Git GPU Java Kubernetes LLMs Machine Learning ML models NLP PostgreSQL Prompt engineering Prototyping Python R RAG Reinforcement Learning Research Scala Spark SQL Statistics Unstructured data Unsupervised Learning

Perks/benefits: Career development

Region: North America
Country: United States
