2025-0155 Big Data AI Tech for Raw Data to Searchable Archives (NS) - WED 11 Jun
The Hague, South Holland, Netherlands
EMW, Inc.
Deadline Date: Wednesday 11 June 2025
Requirement: Big Data and AI Technology for Raw Data to Searchable Archives - Data Processing Pipeline
Location: The Hague, NL
Full Time On-Site: Yes
Time On-Site: 100%
Period of Performance: 2025 BASE: As soon as possible but no later than 30 June 2025 – 19 December 2025
2026 Option 1: 5 January 2026 – 30 June 2026
2026 Option 2: 1 July 2026 – 18 December 2026
Required Security Clearance: NATO SECRET
1 INTRODUCTION
The NATO Communications and Information Agency (NCIA), located in The Hague, Netherlands, is currently processing vast amounts of highly varied data coming from theatre for the purpose of efficient archiving.
In light of these activities, the Exploiting Data Science and Artificial Intelligence (EDS&AI) team within the NCIA Chief Technology Office is tasked with applying Big Data and AI technology to prepare, run and adjust pipelines that process various source data into archiving formats and metadata, and to prepare that data for (semantic) search.
NATO has an obligation to support national investigations into situations that occurred in theatre. To support the different teams involved optimally, the EDS&AI team brings the expertise to extract and exploit the vast and varied data available, using the Agency’s high-performance computing classified sandbox.
The EDS&AI team provides the core data science skills and technology needed for big data analysis and AI.
The EDS&AI team applies innovative technology to data whenever it is not possible to extract value with conventional approaches.
2 OBJECTIVE
This Statement of Work (SOW) describes the work necessary to provide specific AI and Data Exploitation activities – which the NCIA CTO/EDS&AI team provides – for processing raw data from theatre to searchable archives. The services described below will be provided to the NCIA CTO/EDS&AI team, as they deliver specialised Data Science and AI results to their stakeholders in NATO Headquarters and NATO Allied Command Operations.
Overarching objectives:
• make required documents from theatre accessible and searchable by archivists during execution
• capture document contents into long term preservation formats
• capture Functional Area System (FAS; back-up) contents into long term preservation formats
• identify (and remove) duplicate documents, records of temporary value and non-records that are not required for archiving
• provide (interim/final) data reports describing actions and results
3 SCOPE OF WORK
Under the direction of CTO-EDS&AI, the contractor will execute 4-week sprints which cover, in line with the overarching objectives, the following:
• Setting up / improving pipelines that process all required documents and that uniquely identify and trace decisions and processing steps. This is to be conducted on the provided classified sandbox environment, with the provided high-performance hardware and toolsets.
• Implementing / improving (missing) pipeline steps for marking duplicate files, based on file attributes, path (structure) and content (similarity), and rules for considering a file or structure a duplicate.
• Extracting document-format records from Functional Area Systems (FAS) databases and from back-ups made by other means. Archiving SMEs and system SMEs are available for guidance on target formats, source system structure and data interpretation. Each FAS is processed separately; not all sprints touch upon this item.
• Processing, and monitoring the progress of, various office, image and video file types into the accepted archiving formats, including extraction of metadata and preparation of semantic search indexes.
• Automating the registration of all processed documents and their semantic indexes with the sandbox natural language search tool.
• Automating the final copy of all non-duplicate and extracted archive documents with content and metadata to the NATO archiving system.
• Reporting status, progress and statistics of the (raw) files being processed to archive formats, metadata and search indexes.
• Delivering full reporting of results, trace of pipeline steps taken and (stakeholder) accepted failures.
• Quarterly updates.
Not all listed items are expected to occur in all sprints. In general, most items will translate in most sprints to one of four actions: build (new pipeline/processing step), execute (reported progress on data batches), improve (optimized or corrected pipeline or processing step) or monitor (check on logs and progress statistics).
Pipeline orchestration is expected to utilize KNIME. Reporting efforts are expected to target Microsoft Power BI dashboards. GitLab is expected to be used for source code management and documentation.
The content, scope, timelines and acceptance criteria of each sprint will be agreed in writing with the service delivery manager during the sprint-planning meeting.
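As an illustration of the duplicate-marking step listed above, the following is a minimal Python sketch, not the Agency's actual pipeline. It assumes that "duplicate" means byte-identical content and uses file size as a cheap pre-filter before hashing; the function names `content_digest` and `mark_duplicates` are hypothetical, and the path-structure rules and content-similarity measures mentioned in the scope are not covered.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mark_duplicates(root: Path) -> dict[Path, Path]:
    """Map each duplicate file under root to the copy kept as the original.

    Assumption: two files are duplicates only if their bytes are identical.
    The first file in sorted path order is treated as the original.
    """
    # Group by size first: files of different size cannot be identical,
    # so only same-size candidates need to be hashed.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    duplicates: dict[Path, Path] = {}
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        first_seen: dict[str, Path] = {}
        for p in paths:
            digest = content_digest(p)
            if digest in first_seen:
                duplicates[p] = first_seen[digest]  # mark, do not delete
            else:
                first_seen[digest] = p
    return duplicates
```

The sketch only marks duplicates and returns the mapping, so that each decision can be traced and reported before anything is excluded from archiving.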
4 DELIVERABLES AND PAYMENT MILESTONES
All of the deliverables and the underlying work under this contract will be in English. Deliverables will be sent via e-mail or removable media.
Disclosure of any or all of the deliverables to another third party, besides NCI Agency, will require the prior agreement of NCI Agency.
The contractor will provide services using an iterative approach based on multiple sprints. Each sprint is scoped for a duration of 4 weeks. It is envisaged that there will be a gap of at least 2 weeks each year in the summer period (July-September) and that no services will be provided between Christmas and New Year’s Day. Payment milestones will follow each sprint completion.
The following deliverables are expected from this statement of work:
Deliverable 01: BASE 2025: 7 sprints in 2025
Quantity: 7 (The number of sprints is an estimate; the actual number of sprints will be adjusted depending on the starting date.)
Payment Milestones: Upon completion of each sprint, based on successful acceptance of Annex C (DAS)
Deliverable 02: OPTION 1 2026 (1st half): 6 sprints
Quantity: 6
Cost Ceiling: Price will be determined by applying the price adjustment formula as outlined in CO-115786-AAS+ Special Provisions article 6.5.
Payment Milestones: Upon completion of each sprint, based on successful acceptance of Annex C (DAS)
Deliverable 03: OPTION 2 2026 (2nd half): 6 sprints
Quantity: 6
Cost Ceiling: Price will be determined by applying the price adjustment formula as outlined in CO-115786-AAS+ Special Provisions article 6.5.
Payment Milestones: Upon completion of each sprint, based on successful acceptance of Annex C (DAS)
1) Produce sprint completion reports (format: e-mail update on section 3 items), which include details of activities performed.
2) Participate in meetings and boards on an as-needed basis, or as requested by NCI Agency.
The NCIA team reserves the possibility to exercise a number of options, based on the same sprint deliverable timeframe and cost, at a later time, depending on the project priorities and requirements.
The payment shall be dependent upon successful acceptance of the Sprint Report and Delivery Acceptance Sheet (DAS) – (Annex C).
Invoices shall be accompanied with a Delivery Acceptance Sheet (Annex C) signed by the Contractor and the project authority.
5 COORDINATION AND REPORTING
The Contractor shall provide services on-site at the NATO Communications and Information Agency in The Hague. The Contractor shall coordinate with, and report to, the NCIA EDS&AI team in The Hague. One-day, in-person visits to Brunssum and/or Mons are envisaged approximately every second sprint.
The Contractor shall be given access to necessary NATO IT systems, and comply with all necessary policies and procedures.
The Contractor shall participate in regular status update meetings and other meetings, either physically in the office or remotely via electronic means using Conference Call capabilities, according to their manager’s instructions.
For each sprint to be considered complete and payable, the contractor must report the outcome of his/her work during the sprint, first verbally during the retrospective meeting and then in writing within three (3) days after the sprint’s end date. A report in the format of a short email shall be sent to the nominated point of contact of the NCI Agency, briefly describing the work performed and the development achievements during the sprint.
6 SCHEDULE
This task order will be active immediately after signing of the contract by both parties.
The 2025 BASE period of performance is as soon as possible but not later than 30 June 2025 to 19 December 2025.
If the 2026 options are exercised, the periods of performance shall be 05 January 2026 to 30 June 2026 (Option 1) and 01 July 2026 to 18 December 2026 (Option 2).
7 SECURITY
The services required to be provided through this SOW require a valid NATO SECRET security clearance prior to the start of the engagement.
8 CONSTRAINTS
All the documentation provided under this statement of work will be based on NCI Agency templates and/or agreed with the NCIA service delivery point of contact.
All support, maintenance, documentation and required code will be stored under configuration management and/or in the provided NCI Agency tools.
9 PRACTICAL ARRANGEMENTS
The contractor is expected to provide services on-site at NATO Communications and Information Agency, The Hague, The Netherlands.
The services shall be provided during normal office hours following the on-site location calendar.
He or she will provide services under the direction and guidance of the CTO-EDS&AI or their designated representative.
ONE contractor must accomplish this work. In the event the contractor leaves during the contract period, a new contractor who has the proven required qualifications and is evaluated as qualified and suitable shall replace them. All normal AAS+ Framework Contract Terms and Conditions apply.
10 SPECIFIC REQUIREMENTS
[See Requirements]
11 EDUCATIONAL QUALIFICATIONS
[See Requirements]
Requirements
10 SPECIFIC REQUIREMENTS
- At least 3 years of practical experience in the field of data science and/or data analytics;
- Experience using data processing/visualization/analytics software packages and development environments, preferably such as KNIME, VS Code, GitLab, Power BI, Jupyter Lab, and Docker-based API;
- Experience with Big Data processing, creating and utilizing containerized building blocks, and running containers (APIs) on Kubernetes clusters;
- Experience with programming/scripting in languages like Python, R, SQL and working with data formats like CSV, XML, JSON;
- Experience performing content extraction from files/databases/systems, (LLM-based) embedding models, entity-extraction, key-word-extraction and content similarity measures;
- Creative, flexible and pro-active in overcoming obstacles;
- Good drafting, communication and presentation skills in English, including technical and non-technical levels;
- High attention to detail and accuracy;
- Valid NATO SECRET Security Clearance.
11 EDUCATIONAL QUALIFICATIONS
- Master’s degree in Computer Science, Engineering or a relevant field.
- A higher degree in Data Science is preferred.
Tags: APIs Big Data Computer Science CSV Data analysis Data Analytics Docker Engineering GitLab HPC JSON Jupyter KNIME Kubernetes LLMs Pipelines Power BI Python R Security SQL Statistics XML
Perks/benefits: Flex hours Startup environment Team events