Friday October 9, 2026 10:00am - 10:30am GMT-03
This paper presents a high-speed video search system for NHK Archives, a large-scale video archive, that searches on image features instead of relying on manually created metadata. Conventional video retrieval systems typically require extensive manual metadata annotation. In contrast, the proposed system extracts still images from video clips and computes their features using the CLIP model, enabling multimodal search in which a text or image query retrieves semantically related scenes.
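The retrieval idea described above can be illustrated with a minimal sketch. It assumes that still-image features and query features live in one shared vector space, as CLIP features do, and that relevance is scored by cosine similarity; the abstract does not state the scoring function, so that choice is an assumption. The three-dimensional vectors and the names `frame_index`, `search`, and the `clip_00x/frame_0` identifiers are hypothetical stand-ins, not real CLIP output or the system's actual data model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, frame_index, top_k=2):
    """Exhaustively score every frame and return the top_k frame ids."""
    scored = [
        (cosine_similarity(query_vec, vec), frame_id)
        for frame_id, vec in frame_index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [frame_id for _, frame_id in scored[:top_k]]


# Hypothetical toy features standing in for CLIP image embeddings.
frame_index = {
    "clip_001/frame_0": [0.9, 0.1, 0.0],
    "clip_002/frame_0": [0.1, 0.9, 0.1],
    "clip_003/frame_0": [0.8, 0.2, 0.1],
}

# Hypothetical embedding of a text or image query in the same space.
query = [1.0, 0.0, 0.0]
results = search(query, frame_index)
print(results)
```

Because text and image queries are embedded into the same space as the stored frames, the same `search` function would serve both query types in this sketch. The exhaustive scan shown here compares the query against every stored frame, so its cost grows linearly with archive size.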
To achieve practical response times over a massive dataset, the system employs approximate nearest neighbor (ANN) search. In addition, a prototype function was developed to automatically generate program structures by combining speech recognition and large language models (LLMs), improving the interpretability of search results by clearly indicating the corresponding programs and scenes.
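The abstract does not say which ANN method the system uses. As one illustration of the general idea, the sketch below uses random-hyperplane locality-sensitive hashing (LSH), chosen here only because it fits in a few lines of standard-library Python; the class name `HyperplaneLSH`, its parameters, and the random test vectors are all hypothetical. Vectors that point in similar directions tend to fall on the same side of each random hyperplane, so a query only needs to be compared against the vectors in its own bucket.

```python
import math
import random
from collections import defaultdict


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class HyperplaneLSH:
    """Toy single-table LSH index (illustrative, not the system's method)."""

    def __init__(self, dim, num_planes=4, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)
        ]
        self.buckets = defaultdict(list)
        self.vectors = {}

    def _hash(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in self.planes
        )

    def add(self, item_id, vec):
        self.vectors[item_id] = vec
        self.buckets[self._hash(vec)].append(item_id)

    def query(self, vec, top_k=3):
        # Only candidates in the query's bucket are scored exactly.
        candidates = self.buckets.get(self._hash(vec), [])
        ranked = sorted(
            candidates,
            key=lambda i: cosine_similarity(vec, self.vectors[i]),
            reverse=True,
        )
        return ranked[:top_k]


# Index 200 hypothetical 16-dimensional feature vectors.
data_rng = random.Random(1)
index = HyperplaneLSH(dim=16)
for n in range(200):
    index.add(f"frame_{n}", [data_rng.gauss(0.0, 1.0) for _ in range(16)])

hits = index.query(index.vectors["frame_42"])
print(hits)
```

The trade-off is that a true nearest neighbor can land in a different bucket and be missed, which is what makes the search approximate. Production ANN libraries typically reduce this risk with multiple hash tables or with graph-based and quantization-based indexes.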
The system adopts a hybrid architecture in which computationally intensive processes (clip extraction, CLIP feature computation, and speech recognition) are executed on-premise, while structure generation and search operations are performed in the cloud. This approach makes video content more accessible while reducing the cost of metadata creation for large-scale video archives.
Speakers
Tomoya Kusunoki

Media System Engineer, Japan Broadcasting Corporation
Tomoya Kusunoki is an engineer at the Engineering and System Solutions Center of Japan Broadcasting Corporation (NHK). His work focuses on broadcast systems, media workflows, and archive-related technologies and systems.
Grande Otelo Room
