toCaptions()

Converts the output from transcribe() into an array of Caption objects, so you can use the functions from @remotion/captions.

import {toCaptions, transcribe, resampleTo16Khz} from '@remotion/whisper-web';

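// Placeholder: in a real app, pass a File containing actual audio data,
// for example one selected through an <input type="file"> element.
const file = new File([], 'audio.wav');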

const channelWaveform = await resampleTo16Khz({
  file,
});

const whisperWebOutput = await transcribe({
  channelWaveform,
  model: 'tiny.en',
});

const {captions} = toCaptions({
  whisperWebOutput,
});

console.log(captions); /*
 [
    {
      text: "William",
      startMs: 40,
      endMs: 420,
      timestampMs: 240,
      confidence: 0.813602,
    }, {
      text: " just",
      startMs: 420,
      endMs: 650,
      timestampMs: 480,
      confidence: 0.990905,
    }, {
      text: " hit",
      startMs: 650,
      endMs: 810,
      timestampMs: 700,
      confidence: 0.981798,
    }
  ]
*/
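
Once you have the captions array, you can work with it like any other array. The following is a minimal sketch of filtering out low-confidence words and joining the rest into plain text. It is not part of the package: the `Caption` type is a local stand-in that mirrors the fields in the sample output above, and `captionsToPlainText` and the `minConfidence` threshold are hypothetical names introduced for illustration.

```typescript
// Local stand-in for the Caption shape shown in the sample output above.
// Assumption: timestampMs and confidence may be null for some captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper, not an API of @remotion/whisper-web or @remotion/captions.
// Keeps captions whose confidence is unknown or at least `minConfidence`,
// then concatenates their text. Whisper tokens carry their own leading spaces,
// so the pieces are joined without a separator and trimmed at the ends.
const captionsToPlainText = (
  captions: Caption[],
  minConfidence: number,
): string => {
  return captions
    .filter((c) => c.confidence === null || c.confidence >= minConfidence)
    .map((c) => c.text)
    .join('')
    .trim();
};

// Sample data copied from the output shown above.
const sample: Caption[] = [
  {text: 'William', startMs: 40, endMs: 420, timestampMs: 240, confidence: 0.813602},
  {text: ' just', startMs: 420, endMs: 650, timestampMs: 480, confidence: 0.990905},
  {text: ' hit', startMs: 650, endMs: 810, timestampMs: 700, confidence: 0.981798},
];

console.log(captionsToPlainText(sample, 0.5)); // "William just hit"
console.log(captionsToPlainText(sample, 0.9)); // "just hit"
```

A threshold of 0.9 drops "William" because its confidence is about 0.81. Which threshold is appropriate depends on the model and the audio, so treat the values here as examples only.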

See also