Index videos using Azure Media Indexer and generate captions automatically

Last week, the first version of the Azure Media Indexer processor was announced. It allows us to analyze our media content so that we can search it and get the timestamps at which the keywords we are looking for are spoken. Another useful feature is the ability to generate captions automatically. Working with this new processor is exactly the same as performing transcoding with Windows Azure Media Encoder:

using Microsoft.WindowsAzure.MediaServices.Client;
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Indexer
{
    class Program
    {
        static void Main(string[] args)
        {
            //0. Constants
            const string AssetName = "brain-mp4-Source";
            const string AccountName = "[YOUR_ACCOUNT_NAME]";
            const string AccountKey = "[YOUR_ACCOUNT_KEY]";

            //1. Install NuGet packages
            //1.1 NuGet: Install-Package windowsazure.mediaservices

            //2. Get AMS context
            var context = new CloudMediaContext(AccountName, AccountKey);

            //3. Get the asset to index
            var asset = context.Assets.Where(a => a.Name == AssetName).FirstOrDefault();

            //4. Get the Indexer processor
            var processor = context.MediaProcessors.GetLatestMediaProcessorByName("Azure Media Indexer");

            //5. Create a job
            var job = context.Jobs.Create("Indexing job for " + AssetName);

            //6. Get the task configuration
            var configuration = File.ReadAllText("IndexerConfigurationTask.xml");

            //7. Create a task
            var task = job.Tasks.AddNew("Indexing task", processor, configuration, TaskOptions.None);

            //8. Add the input asset and create the output asset
            task.InputAssets.Add(asset);
            task.OutputAssets.AddNew(string.Format("{0} Indexed", asset.Name), AssetCreationOptions.None);

            //9. Submit the job and wait for it to finish
            job.Submit();
            Task progressJobTask = job.GetExecutionProgressTask(CancellationToken.None);
            progressJobTask.Wait();

            Console.WriteLine("Job finished. Final state: {0}", job.State);
        }
    }
}

For this type of task, we use the following XML configuration, where we provide metadata in order to improve the interpretation of the spoken words. In this case, I used a famous TED Talks video.

<?xml version="1.0" encoding="utf-8"?>
<configuration version="2.0">
    <metadata key="title" value="Helen Fisher: The brain in love " />
    <metadata key="description" value="Why do we crave love so much, even to the point that we would die for it? To learn more about our very real, very physical need for romantic love, Helen Fisher and her research team took MRIs of people in love and people who had just been dumped." />
</configuration>

Once the process is complete, we can see that we have the following files:

indexer result

In this post, to check the result, I downloaded the file with the TTML extension and used the HTML5 video tag to display the generated subtitles.

<!DOCTYPE html>
<html>
<body>
    <video controls autoplay>
        <source type="video/mp4" src="">
        <track src="brain.ttml" label="English captions" kind="subtitles" srclang="en-us" default>
    </video>
</body>
</html>
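The generated caption file is a TTML document: XML containing timed text cues. As a rough sketch (the exact namespaces, styling, and attributes in the file produced by the indexer may differ), a minimal TTML caption file looks like this:

```xml
<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en-us">
  <body>
    <div>
      <!-- each p element is one caption cue with begin/end times -->
      <p begin="00:00:01.000" end="00:00:03.500">Why do we crave love so much?</p>
      <p begin="00:00:03.500" end="00:00:06.000">Even to the point that we would die for it?</p>
    </div>
  </body>
</tt>
```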

To check the results, you need a browser that implements TTML rendering for the track element. In this case, we use Internet Explorer 11:

Azure Media Indexer TTML
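If your target browser does not render TTML, one workaround (not part of the original workflow) is to convert the captions to WebVTT, which has broad browser support. Below is a minimal sketch in Python, assuming a simple TTML structure where each `p` element carries `begin`/`end` attributes in `hh:mm:ss.fff` form; a real file from the indexer may need extra handling:

```python
import xml.etree.ElementTree as ET

TTML_NS = "{http://www.w3.org/ns/ttml}"

def ttml_to_vtt(ttml_text):
    """Convert a simple TTML document into WebVTT cues."""
    root = ET.fromstring(ttml_text)
    lines = ["WEBVTT", ""]
    for p in root.iter(TTML_NS + "p"):
        # TTML hh:mm:ss.fff timestamps are also valid WebVTT timestamps
        lines.append("{0} --> {1}".format(p.get("begin"), p.get("end")))
        lines.append("".join(p.itertext()).strip())
        lines.append("")
    return "\n".join(lines)

sample = """<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">Why do we crave love so much?</p>
    </div>
  </body>
</tt>"""

print(ttml_to_vtt(sample))
```

The resulting `.vtt` file can then be referenced from the same `<track>` element instead of the TTML file.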

For now, it only recognizes English.

Happy indexing!