Project - MIXTAPE -- The Podcast Search Engine
MIXTAPE improves podcast discovery by enabling topic-based search of individual episodes using speech processing and natural language understanding.
- Financed by: Horizon 2020
- Duration
- Keywords: Media AI Indexing & Discovery
Overview
Podcasts have grown steadily in recent years, both in the number of listeners and in the number of content providers. In 2018, Apple reported more than 550,000 active podcast channels, with over 18.5 million episodes in more than 100 languages worldwide. With so much content available, finding episodes worth listening to is one of the main problems listeners face, because podcasts are generally organised in lists and catalogues for listeners to subscribe to, or sorted by popularity.
In MIXTAPE we improve the way podcasts are searched by allowing listeners to find content based on topics of interest. For the first time, it will be possible to find individual podcast episodes that match your preferences and precise interests. To enable this, we are adding speech processing and natural language understanding as key components of our existing content management platform.
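To illustrate what episode-level, topic-based search means in practice, here is a minimal sketch of an inverted index over episode transcripts. It assumes the transcripts are already available as plain text (for example, as the output of speech recognition). The function names `build_index` and `search` and the simple word-overlap ranking are illustrative assumptions, not MIXTAPE's or Feedle's actual implementation.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def build_index(episodes: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of episode IDs whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for episode_id, transcript in episodes.items():
        for word in tokenize(transcript):
            index[word].add(episode_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank episodes by how many distinct query words their transcript contains."""
    scores: dict[str, int] = defaultdict(int)
    for word in set(tokenize(query)):
        for episode_id in index.get(word, ()):
            scores[episode_id] += 1
    # Highest score first; ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda e: (-scores[e], e))
```

For example, with three hypothetical episodes, a query for "energy football" returns the episode mentioning both topics first, followed by the episodes mentioning only one of them. A production engine would add ranking signals such as term weighting, but the key point is that the index is built per episode rather than per channel.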
The project builds upon the MyMeedia Platform, IN2's intelligent, modular and flexible content platform for orchestrating processing pipelines, including pipelines for:
- Text processing: language detection, stop-word elimination, tag-cloud computation, document classification, topic detection, named entity detection
- Audio processing: audio segmentation, classification (speech/non-speech, background), speaker classification and automatic speech recognition
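As a rough sketch of what "orchestrating processing pipelines" from modular components can look like, the example below chains three of the text-processing steps listed above (tokenisation, stop-word elimination and tag-cloud computation) as independent stages that each take and return a document dictionary. The stage functions, the tiny English `STOP_WORDS` set and the `run_pipeline` helper are hypothetical; they do not reflect the MyMeedia Platform's real API.

```python
import re
from collections import Counter
from typing import Any, Callable

# Hypothetical, deliberately tiny English stop-word list. A real system would
# choose a per-language list after the language-detection step.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}

Doc = dict[str, Any]
Stage = Callable[[Doc], Doc]


def tokenize(doc: Doc) -> Doc:
    """Split the raw text into lowercase word tokens."""
    doc["tokens"] = re.findall(r"\w+", doc["text"].lower())
    return doc


def remove_stop_words(doc: Doc) -> Doc:
    """Stop-word elimination: drop very common words that carry little topic information."""
    doc["tokens"] = [t for t in doc["tokens"] if t not in STOP_WORDS]
    return doc


def tag_cloud(doc: Doc, top_n: int = 5) -> Doc:
    """Tag-cloud computation: the most frequent remaining terms with their counts."""
    doc["tag_cloud"] = Counter(doc["tokens"]).most_common(top_n)
    return doc


def run_pipeline(doc: Doc, stages: list[Stage]) -> Doc:
    """Apply each stage in order; stages can be added, removed or reordered freely."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

Calling `run_pipeline({"text": "The energy of the energy transition and the grid"}, [tokenize, remove_stop_words, tag_cloud])` leaves the tokens `["energy", "energy", "transition", "grid"]` and a tag cloud headed by `("energy", 2)`. Keeping every stage behind the same document-in, document-out interface is one simple way to make components interchangeable, which is the kind of modularity described above.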
The results of IN2's participation in MIXTAPE were incorporated into Feedle, a search engine for blogs and podcasts.
Our role
- Music/speech classification
- Speaker segmentation
- Speech recognition
- Topic detection and keyword extraction
- Pilots and applications with the project's end users
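The page does not say which keyword-extraction method was used, so the following example is only a sketch of one standard technique, TF-IDF, applied to episode transcripts. It assumes the transcripts are plain text produced by speech recognition. A term scores highly when it is frequent in one episode but rare across the collection, while a term that appears in every episode (here, "podcast") scores zero. The function name `tfidf_keywords` is hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_keywords(transcripts: dict[str, str], top_n: int = 3) -> dict[str, list[str]]:
    """Return the top_n TF-IDF-ranked terms for each episode transcript."""
    tokenized = {eid: re.findall(r"\w+", text.lower()) for eid, text in transcripts.items()}
    n_docs = len(tokenized)

    # Document frequency: the number of transcripts in which each term appears.
    df: Counter[str] = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    keywords: dict[str, list[str]] = {}
    for eid, tokens in tokenized.items():
        tf = Counter(tokens)
        total = len(tokens)
        # Term frequency (share of the transcript) times inverse document frequency.
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[eid] = ranked[:top_n]
    return keywords
```

In a topic-detection setting, these per-episode keywords could feed the same episode index that search queries run against. More elaborate methods, such as topic models or entity-aware extraction, would replace the scoring function but not the overall flow.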