Cloud-native architecture

What we built

Cloud migration too often means "limited in the same way, but now on a managed VM".

The architecture we built instead delivered on the promise of abstraction and elasticity. Every asset in the archive became immediately available, searchable, and access-controlled, at a scale bounded only by the managed services underneath it.

Ingestion and migration

A custom Node.js orchestration service managed the transfer from local storage to cloud object storage. It handled checksummed multipart uploads, triggered transcoding workflows for thumbnail and mezzanine generation, and managed the queue of assets awaiting migration. The entire historical archive was migrated off magnetic tape, using spare capacity in the legacy retrieval system to avoid disrupting live operations. Deployment, updates, and configuration of the on-prem workers running this service were managed through the service provider's management console.
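
The checksummed multipart step can be sketched roughly as follows. This is an illustration under stated assumptions, not the production service: the PartPlan shape, the function names, and the choice of SHA-256 are all hypothetical, and the actual upload call to the object store is left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical description of one part of a multipart upload.
interface PartPlan {
  partNumber: number; // 1-based part index
  offset: number;     // byte offset into the source data
  length: number;     // number of bytes in this part
  sha256: string;     // hex digest of the part's bytes
}

// Split the data into fixed-size parts and checksum each part.
function planParts(data: Uint8Array, partSize: number): PartPlan[] {
  if (!Number.isInteger(partSize) || partSize <= 0) {
    throw new RangeError("partSize must be a positive integer");
  }
  const parts: PartPlan[] = [];
  for (let offset = 0; offset < data.length; offset += partSize) {
    const chunk = data.subarray(offset, offset + partSize);
    parts.push({
      partNumber: parts.length + 1,
      offset,
      length: chunk.length,
      sha256: createHash("sha256").update(chunk).digest("hex"),
    });
  }
  return parts;
}

// Compare the checksums reported by the receiving side with the plan and
// return the part numbers that are missing or mismatched.
function failedParts(plan: PartPlan[], reported: Map<number, string>): number[] {
  return plan
    .filter((part) => reported.get(part.partNumber) !== part.sha256)
    .map((part) => part.partNumber);
}
```

Under this sketch, only the parts returned by failedParts would need to be sent again, which is the main attraction of checksumming per part rather than per file.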

Storage and metadata

We chose a NoSQL document store as the canonical data layer. This decision was driven by three constraints: the schema was still being discovered through delivery and could not be pinned down in advance; write loads were extremely bursty, concentrated in the 24 to 48 hours after each race weekend; and our target was constant-time access to any individual asset and its metadata at any future scale.

A serverless pipeline of queues and functions transformed legacy metadata into structured records and kept the document store synchronised with a dedicated search cluster. The search cluster supported the complex but well-understood, server-bound business-as-usual (BAU) operations. The foundation of the archive can still scale independently to serve other client applications.
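
As a rough sketch of the transform-and-synchronise step, the function below maps one flat legacy row to a structured record and writes it to both stores. Everything named here is an assumption for illustration: the legacy field names (ASSET_ID, YEAR, CIRCUIT, DRIVERS), the AssetRecord shape, and the two in-memory maps standing in for the document store and the search cluster.

```typescript
// Hypothetical legacy row: flat string fields exported from the old catalogue.
type LegacyRow = Record<string, string>;

interface AssetRecord {
  assetId: string;
  year?: number;
  circuit?: string;
  drivers: string[];
  extra: Record<string, string>; // unmapped fields are kept, since the schema is still evolving
}

// Map one legacy row to a structured record.
function toAssetRecord(row: LegacyRow): AssetRecord {
  const { ASSET_ID, YEAR, CIRCUIT, DRIVERS, ...extra } = row;
  const assetId = (ASSET_ID ?? "").trim();
  if (assetId.length === 0) throw new Error("legacy row has no ASSET_ID");

  const record: AssetRecord = {
    assetId,
    drivers: (DRIVERS ?? "")
      .split(";")
      .map((d) => d.trim())
      .filter((d) => d.length > 0),
    extra,
  };
  const year = YEAR ? Number.parseInt(YEAR, 10) : Number.NaN;
  if (Number.isInteger(year)) record.year = year;
  const circuit = (CIRCUIT ?? "").trim();
  if (circuit.length > 0) record.circuit = circuit;
  return record;
}

// Handle one batch from the queue. The maps are in-memory stand-ins for the
// document store (canonical) and the search cluster (derived).
function handleBatch(
  rows: LegacyRow[],
  store: Map<string, AssetRecord>,
  searchIndex: Map<string, string>,
): void {
  for (const row of rows) {
    const record = toAssetRecord(row);
    store.set(record.assetId, record); // canonical write first
    const text = [record.circuit ?? "", ...record.drivers].join(" ");
    searchIndex.set(record.assetId, text.toLowerCase().trim());
  }
}
```

Keeping unmapped fields in extra is one way to tolerate a schema that is still being discovered: nothing is lost at ingestion, and fields can be promoted to first-class properties later.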

Search and discovery

A search cluster provided sub-second full-text and faceted search across the entire archive. A GraphQL API shaped query responses precisely to the needs of the front-end application, avoiding over-fetching and extra round trips. The React application provided rapid UI-based filtering by driver, circuit, year, camera source, content type, and broadcast format. This functionality previously required manual metadata trawling and days of physical tape retrieval.
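
The faceted filtering described above can be illustrated with a small in-memory sketch. In the real system this work is done by the search cluster; the Asset shape, the facet field names, and both functions below are assumptions chosen only to show the idea of combining filters and counting facet values.

```typescript
// Hypothetical subset of asset metadata used for filtering.
interface Asset {
  id: string;
  driver: string;
  circuit: string;
  year: number;
  contentType: string;
}

type FacetField = "driver" | "circuit" | "year" | "contentType";
type Filters = Partial<Record<FacetField, string | number>>;

// Keep only the assets that match every active filter.
function applyFilters(assets: Asset[], filters: Filters): Asset[] {
  const active = (Object.keys(filters) as FacetField[]).filter(
    (field) => filters[field] !== undefined,
  );
  return assets.filter((asset) =>
    active.every((field) => asset[field] === filters[field]),
  );
}

// Count how many assets fall under each value of one facet.
function facetCounts(assets: Asset[], field: FacetField): Map<string, number> {
  const counts = new Map<string, number>();
  for (const asset of assets) {
    const key = String(asset[field]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

A filter panel would typically call applyFilters with the current selection and then facetCounts on the result, so that each remaining option shows how many assets it would return.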

The permissions solution

We resolved the circular dependency by going below the search cluster's abstraction layer to its underlying engine: Apache Lucene. Within a serverless function, we instantiated a Lucene index in-process, indexed the incoming asset's metadata into that ephemeral instance, and evaluated every group permission query against it. If a query returned a non-zero relevance score, the asset was visible to that group.

We computed the full set of permitted groups before the asset ever reached the production search cluster, which received only fully-permissioned records. This eliminated the circular dependency entirely, with no batch reprocessing and no eventual-consistency risk.
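
The shape of that evaluation can be sketched as below. This is not Lucene and does not use its API or scoring: a toy single-document index and a toy all-clauses-must-match score stand in for it, purely to show the flow of indexing one asset, running every group query against it, and keeping the groups that score above zero. The GroupQuery shape and all function names are hypothetical.

```typescript
type Metadata = Record<string, string>;

// Hypothetical permission query: the asset is visible to the group when
// every clause matches (field contains term).
interface GroupQuery {
  group: string;
  clauses: { field: string; term: string }[];
}

// Build a throwaway single-document index: field -> set of lower-cased tokens.
// The tokeniser is deliberately naive and only handles ASCII letters and digits.
function indexAsset(metadata: Metadata): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const [field, value] of Object.entries(metadata)) {
    const tokens = value
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((token) => token.length > 0);
    index.set(field, new Set(tokens));
  }
  return index;
}

// Toy score: the number of clauses when all of them match, otherwise 0.
function score(index: Map<string, Set<string>>, query: GroupQuery): number {
  if (query.clauses.length === 0) return 0;
  const matched = query.clauses.filter(
    (clause) => index.get(clause.field)?.has(clause.term.toLowerCase()) ?? false,
  ).length;
  return matched === query.clauses.length ? matched : 0;
}

// Evaluate every group query against the ephemeral index and return the
// groups with a non-zero score, before anything reaches the search cluster.
function permittedGroups(metadata: Metadata, queries: GroupQuery[]): string[] {
  const index = indexAsset(metadata);
  return queries
    .filter((query) => score(index, query) > 0)
    .map((query) => query.group)
    .sort();
}
```

Because the index holds a single document and is discarded after the function returns, the cost of the check grows with the number of group queries rather than with the size of the archive.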

Access control at the edge

User authentication was federated across two cloud identity platforms, with group memberships encoded into authentication tokens. An edge function on the content distribution network checked each asset request against the document store using a primary-key lookup, a constant-time operation. Asset-level access control ran at the same speed regardless of whether the archive contained ten thousand or ten million items. The system could serve permissioned content to concurrent users anywhere in the world without a central bottleneck in the access check.
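
A minimal sketch of the edge check, assuming the token has already been verified and decoded upstream. The TokenClaims and AssetAccessRecord shapes, the Decision type, and the map standing in for the document store are all hypothetical; the point is that the decision needs one keyed lookup and a group intersection.

```typescript
// Hypothetical claims taken from an already-verified authentication token.
interface TokenClaims {
  sub: string;
  groups: string[];
}

// Hypothetical record stored against each asset's primary key.
interface AssetAccessRecord {
  assetId: string;
  permittedGroups: string[];
}

type Decision =
  | { allowed: true }
  | { allowed: false; reason: "not_found" | "forbidden" };

// Decide whether the caller may fetch the asset. The map is an in-memory
// stand-in for a primary-key read against the document store.
function checkAccess(
  claims: TokenClaims,
  assetId: string,
  store: ReadonlyMap<string, AssetAccessRecord>,
): Decision {
  const record = store.get(assetId); // single keyed lookup, no scan
  if (!record) return { allowed: false, reason: "not_found" };

  const userGroups = new Set(claims.groups);
  const permitted = record.permittedGroups.some((group) => userGroups.has(group));
  return permitted ? { allowed: true } : { allowed: false, reason: "forbidden" };
}
```

The work done per request depends on the number of groups on the token and on the asset, not on how many assets the archive holds.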

Tiered storage management

The application provided editorial teams with self-service controls over storage tiers. Master files could be sent to deep archival storage to reduce cost, while lower-resolution copies remained immediately available. Editors could request temporary restoration of full-resolution masters, with automatic return to archival storage after a defined period.
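
The tier lifecycle for a master file can be sketched as a small state model. The tier names, the MasterFile shape, and the sweep function are assumptions for illustration; real archival storage services add retrieval delays and their own lifecycle rules, which this sketch ignores.

```typescript
// Hypothetical tiers for a full-resolution master file.
type Tier = "immediate" | "archived" | "restored";

interface MasterFile {
  assetId: string;
  tier: Tier;
  restoreExpiresAt?: number; // epoch milliseconds, present only while restored
}

// Send a master to deep archival storage.
function archive(file: MasterFile): MasterFile {
  return { assetId: file.assetId, tier: "archived" };
}

// Request a temporary restoration of an archived master.
function requestRestore(file: MasterFile, now: number, durationMs: number): MasterFile {
  if (file.tier !== "archived") {
    throw new Error(`cannot restore from tier "${file.tier}"`);
  }
  if (durationMs <= 0) {
    throw new RangeError("durationMs must be positive");
  }
  return { assetId: file.assetId, tier: "restored", restoreExpiresAt: now + durationMs };
}

// Intended to run periodically: return expired restorations to archival storage.
function sweep(files: MasterFile[], now: number): MasterFile[] {
  return files.map((file) =>
    file.tier === "restored" &&
    file.restoreExpiresAt !== undefined &&
    file.restoreExpiresAt <= now
      ? archive(file)
      : file,
  );
}
```

Passing the current time in as a parameter, rather than reading the clock inside each function, keeps the expiry rule easy to test.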