Last week the first version of the Azure Media Indexer processor was announced, which lets us analyze our media content so that we can search for keywords and get the timestamps at which they are spoken. Another useful feature is the ability to automatically generate captions. Working with this new processor is exactly the same as performing transcoding with Windows Azure Media Encoder:
```csharp
using Microsoft.WindowsAzure.MediaServices.Client;
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Indexer
{
    class Program
    {
        static void Main(string[] args)
        {
            //0. Constants
            const string AssetName = "brain-mp4-Source";
            const string AccountName = "[YOUR_ACCOUNT_NAME]";
            const string AccountKey = "[YOUR_ACCOUNT_KEY]";

            //1. Install NuGet packages
            //1.1 NuGet: Install-Package windowsazure.mediaservices

            //2. Get AMS context
            var context = new CloudMediaContext(AccountName, AccountKey);

            //3. Get the asset to index
            var asset = context.Assets.Where(a => a.Name == AssetName).FirstOrDefault();

            //4. Get the Indexer processor
            var processor = context.MediaProcessors.GetLatestMediaProcessorByName("Azure Media Indexer");

            //5. Create a job
            var job = context.Jobs.Create("Indexing job for " + AssetName);

            //6. Get the task configuration
            var configuration = File.ReadAllText("IndexerConfigurationTask.xml");

            //7. Create a task
            var task = job.Tasks.AddNew("Indexing task", processor, configuration, TaskOptions.None);
            task.InputAssets.Add(asset);
            task.OutputAssets.AddNew(string.Format("{0} Indexed", asset.Name), AssetCreationOptions.None);
            job.Submit();

            //8. Check job execution and wait for the job to finish
            Task progressJobTask = job.GetExecutionProgressTask(CancellationToken.None);
            progressJobTask.Wait();

            Console.WriteLine("Job finished. Final state: {0}", job.State);
            Console.ReadLine();
        }
    }
}
```
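Once the job finishes, the files produced by the indexer can be downloaded with the same SDK. Here is a minimal sketch that continues from the snippet above (the local output folder is a placeholder of my own, not part of the original sample):

```csharp
// Minimal sketch: download every file from the indexer's output asset.
// Assumes the "job" variable from the snippet above; the folder path
// is a placeholder.
var outputAsset = job.OutputMediaAssets.First();
foreach (var assetFile in outputAsset.AssetFiles)
{
    // Saves JobResult.txt, brain.mp4.ttml, brain.mp4.kw.xml, etc. locally.
    assetFile.Download(Path.Combine(@"C:\IndexerOutput", assetFile.Name));
}
```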
For this type of task, we use the following XML configuration, where metadata helps improve the interpretation of the spoken words. In this case, I used a famous TED Talk video.
```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration version="2.0">
  <input>
    <metadata key="title" value="Helen Fisher: The brain in love" />
    <metadata key="description" value="Why do we crave love so much, even to the point that we would die for it? To learn more about our very real, very physical need for romantic love, Helen Fisher and her research team took MRIs of people in love and people who had just been dumped." />
  </input>
  <settings>
  </settings>
</configuration>
```
Once the process is complete, the output asset contains the following files:
- JobResult.txt: This is the log of the task.
- brain.mp4.aib: Audio indexing blob file, used with SQL Server through the Azure Media Indexer SQL Add-on.
- brain.mp4.kw.xml: XML with keywords found during the process.
- brain.mp4.smi: File for Microsoft Synchronized Accessible Media Interchange (SAMI), used by Windows Media Player.
- brain.mp4.ttml: File that contains the captions in Timed Text Markup Language (TTML) format (see the parsing sketch after this list).
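If you want to work with the timestamps directly instead of playing the video, the TTML file is easy to read with LINQ to XML. A minimal sketch, assuming the standard TTML namespace (the indexer's output may declare a different one, so check the root element of your file):

```csharp
using System;
using System.Xml.Linq;

class TtmlCaptions
{
    static void Main()
    {
        // Standard TTML namespace; verify it against your file's root element.
        XNamespace tt = "http://www.w3.org/ns/ttml";
        var doc = XDocument.Load("brain.mp4.ttml");

        // Each <p> element is one caption with begin/end timestamps.
        foreach (var p in doc.Descendants(tt + "p"))
        {
            Console.WriteLine("{0} --> {1}: {2}",
                (string)p.Attribute("begin"),
                (string)p.Attribute("end"),
                p.Value);
        }
    }
}
```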
In this post, to check the result, I downloaded the file with the TTML extension and used the HTML5 video tag to display the generated subtitles:
```html
<!DOCTYPE html>
<html>
<head>
    <title></title>
</head>
<body>
    <video controls autoplay>
        <source type="video/mp4" src="https://gismedia.blob.core.windows.net/asset-9e046f86-169b-4184-bab0-ab8973cea239/brain.mp4?sv=2012-02-12&sr=c&si=ccbce462-05aa-45e0-9ff9-5a11ad11257c&sig=1kRY5epwh0HiielC%2F5em8bixUQm3CtyRi7v%2BBgN2qf8%3D&st=2014-09-16T07%3A12%3A19Z&se=2016-09-15T07%3A12%3A19Z">
        <track src="brain.ttml" label="English captions" kind="subtitles" srclang="en-us" default>
    </video>
</body>
</html>
```
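The long SAS URL in the src attribute comes from a locator on the asset. A minimal sketch of how such a URL can be built, assuming the same context and asset as the indexing code (the policy name and duration here are my own choices, not part of the original sample):

```csharp
// Hypothetical sketch: create a read-only SAS locator so the MP4 can be
// referenced from the <video> tag. Assumes "context" and "asset" from the
// indexing snippet; the policy name and duration are illustrative.
var policy = context.AccessPolicies.Create(
    "Read policy", TimeSpan.FromDays(730), AccessPermissions.Read);
var locator = context.Locators.CreateSasLocator(asset, policy);

// locator.Path already carries the SAS query string; insert the file name
// into the path portion to get a playable URL.
var builder = new UriBuilder(locator.Path);
builder.Path += "/brain.mp4";
Console.WriteLine(builder.Uri);
```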
To check the results you need a browser that implements TTML captions; in this case, we use Internet Explorer 11:
For now, the indexer only recognizes English.
Happy indexing!