From days to seconds
A global sports organisation needed to transform a 50-year, multi-petabyte broadcast archive from physical tape storage into a cloud-native system capable of serving content within seconds to production teams worldwide.
The world's premier single-seater motorsport championship had a problem most organisations would envy: 50 years of race history, meticulously captured on tape, completely inaccessible at speed. Finding a single clip meant knowing where to look, who to ask, and waiting. The MARS project (Media Asset Retrieval System) was the mandate to change that.
The brief
Transform a tape-based archive spanning 1972 to the present into an instant-access, cloud-native platform. Every frame of footage, every piece of commentary, every telemetry snapshot: searchable, retrievable, and deliverable within seconds to broadcast teams around the world.
The scope: 15 petabytes of media, 2 million-plus content pieces, tens of millions of timestamped metadata tags. Timeline: three years.
Architecture decisions
The core challenge was not storage (S3 solved that) but retrieval. The archive had been organised around physical tape structure, not semantic meaning. Migrating the bytes was the easy part. Building the metadata layer that made them findable was the work.
We designed an event-driven, serverless architecture on AWS: Lambda for processing, AppSync GraphQL for the API layer, DynamoDB for hot metadata, OpenSearch for full-text and faceted search, and MediaConvert for on-demand transcoding. S3 held the canonical archive; CloudFront served derivatives at edge.
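To make the search layer concrete, here is a minimal sketch of the kind of request body a full-text and faceted search might send to OpenSearch. The query DSL keywords (bool, must, filter, match, term, aggs, terms) are standard; the field names description, season and asset_type and the function build_search_query are assumptions for illustration, not the project's actual schema.

```python
from typing import Any, Dict, List, Optional


def build_search_query(text: str, season: Optional[int] = None, size: int = 20) -> Dict[str, Any]:
    """Build an OpenSearch request body: full-text match plus an optional facet filter."""
    filters: List[Dict[str, Any]] = []
    if season is not None:
        # Exact-match filter on a hypothetical "season" field.
        filters.append({"term": {"season": season}})

    return {
        "size": size,
        "query": {
            "bool": {
                # Relevance-scored full-text match on a hypothetical "description" field.
                "must": [{"match": {"description": text}}],
                # Filters narrow the result set without affecting the score.
                "filter": filters,
            }
        },
        # Terms aggregations return facet counts alongside the hits.
        "aggs": {
            "by_season": {"terms": {"field": "season"}},
            "by_asset_type": {"terms": {"field": "asset_type"}},
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_search_query("safety car restart", season=1994), indent=2))
```

Keeping facet selections in the filter clause rather than the must clause means they restrict results without distorting relevance ranking, which is the usual split for faceted search.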
Every ingest triggered an event chain: Lambda functions extracted technical metadata, cross-referenced it against the race calendar, enriched it with existing editorial tags, and indexed the result into OpenSearch, all without human intervention once the pipeline was running.
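The chain above can be sketched as a single handler. This is an illustration under stated assumptions, not the production code: the object key layout season/event/filename, the lookup tables RACE_CALENDAR and EDITORIAL_TAGS, and the plain dict standing in for the OpenSearch index are all hypothetical. Only the shape of the S3 notification event (Records, s3.bucket.name, s3.object.key) follows the real AWS format.

```python
import os
from typing import Any, Dict
from urllib.parse import unquote_plus

# Hypothetical lookup tables with placeholder values; a real pipeline would
# read these from a datastore rather than module-level dicts.
RACE_CALENDAR = {("1988", "monaco"): {"event_name": "Monaco Grand Prix"}}
EDITORIAL_TAGS = {"1988/monaco/race_feed.mxf": ["onboard", "world-feed"]}


def extract_technical_metadata(bucket: str, key: str, size: int) -> Dict[str, Any]:
    """Derive basic fields from the object key (assumed layout: season/event/filename)."""
    season, event, filename = key.split("/", 2)
    container = os.path.splitext(filename)[1].lstrip(".").lower()
    return {
        "bucket": bucket,
        "key": key,
        "season": season,
        "event": event,
        "container": container,
        "size_bytes": size,
    }


def handle_ingest(event: Dict[str, Any], index: Dict[str, Dict[str, Any]]) -> int:
    """Process an S3 object-created notification; return the number of records indexed."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)

        doc = extract_technical_metadata(bucket, key, size)          # 1. extract
        doc["race"] = RACE_CALENDAR.get((doc["season"], doc["event"]))  # 2. cross-reference
        doc["tags"] = EDITORIAL_TAGS.get(key, [])                    # 3. enrich
        index[f"{bucket}/{key}"] = doc                               # 4. index (dict stand-in)
    return len(event["Records"])
```

Each step is a pure function of the incoming record plus a lookup, which is what lets a chain like this run unattended: a failed record can be retried without side effects on the others.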
The migration
Fifteen petabytes cannot travel over a wire in any reasonable timeframe. We used AWS Snowball Edge units on-site at the archive facility, staging content in batches organised by decade, championship, and asset type. Each batch was validated against checksums before the source tapes were marked for retirement.
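A quick back-of-envelope calculation shows why. The link speeds below are assumptions chosen for illustration (the archive facility's actual connectivity is not stated here), and the figures assume a fully saturated, dedicated link with no protocol overhead, so real transfers would take longer.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float, utilisation: float = 1.0) -> float:
    """Days needed to move size_bytes over a link running at the given utilisation."""
    seconds = size_bytes * 8 / (link_bits_per_second * utilisation)
    return seconds / 86_400  # seconds per day


ARCHIVE_BYTES = 15 * 10**15  # 15 PB, decimal petabytes

if __name__ == "__main__":
    for label, bps in (("1 Gbps", 1e9), ("10 Gbps", 1e10)):
        print(f"{label}: {transfer_days(ARCHIVE_BYTES, bps):,.0f} days")
    # 1 Gbps: 1,389 days
    # 10 Gbps: 139 days
```

At 1 Gbps the transfer alone would take roughly 3.8 years, longer than the whole project timeline. Even at 10 Gbps it would occupy a dedicated link for about four and a half months before accounting for tape read speeds.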
The migration ran in parallel with platform development. Production went live on a rolling basis as each decade of content came online, rather than waiting for a single big-bang launch.
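The per-batch checksum gate described in this section could look something like the sketch below. The choice of SHA-256, the manifest format (relative path mapped to expected hex digest) and the function names are assumptions; the checksum algorithm and manifest layout actually used are not specified here.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large media files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_batch(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths that are missing or whose digest differs from the manifest."""
    failures: List[str] = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures


def batch_ready_for_retirement(manifest: Dict[str, str], root: Path) -> bool:
    """A batch passes only if every file in it verifies; one failure holds the whole batch."""
    return not validate_batch(manifest, root)
```

Gating on the whole batch rather than on individual files keeps the retirement decision simple: source tapes are only marked once nothing in the batch is missing or mismatched.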
What shipped
A broadcast-grade search and delivery platform used by the championship's global production teams, accessible from any connected device, returning results in under two seconds from a corpus that previously required days of manual research to navigate.
From first engineer on the project through to final operations handover: three years, one architecture, no data loss.
Photo by Rob Wingate on Unsplash
