Speech and Singing Alignment

This module aligns a subtitle file with the speech or singing in an input audio file, producing word-level and line-level timestamps for accessibility, translation, or further analysis.

Documentation

Settings

  • Name
    use_segments
    Type
    boolean
    Description

  • Name
    language
    Type
    string
    Description

    The language of the input audio file. Must be one of the supported language codes.
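
As an illustration, the two settings could be assembled and type-checked before a request is sent. This is only a minimal sketch: the helper name `build_settings` is hypothetical, and because the supported language codes are not listed here, the caller has to pass them in.

```python
def build_settings(use_segments, language, supported_languages):
    """Return a settings dict keyed by the documented names.

    `supported_languages` is an assumption: the actual list of
    supported codes is not given in this section.
    """
    if not isinstance(use_segments, bool):
        raise TypeError("use_segments must be a boolean")
    if not isinstance(language, str):
        raise TypeError("language must be a string")
    if language not in supported_languages:
        raise ValueError(f"unsupported language code: {language!r}")
    return {"use_segments": use_segments, "language": language}
```

Rejecting a bad value locally gives a clearer error than waiting for the module to refuse the request.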

Input

  • Name
    subtitleInputFileUrl
    Type
    string
    Description

    URL of the subtitle file you want to align with the audio

  • Name
    audioInputFileUrl
    Type
    string
    Description

    URL of the audio file containing speech or singing that you want to align with the subtitles
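
The two input fields might be put together as in the sketch below. The helper name `build_input` is hypothetical, and the check that both values are http(s) URLs is an assumption based on the field names; this section does not say which URL schemes are accepted.

```python
from urllib.parse import urlparse


def build_input(subtitle_url, audio_url):
    """Return an input dict keyed by the documented field names."""
    fields = {
        "subtitleInputFileUrl": subtitle_url,
        "audioInputFileUrl": audio_url,
    }
    for name, url in fields.items():
        parsed = urlparse(url)
        # Assumption: inputs are reachable over http or https.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{name} is not a valid http(s) URL: {url!r}")
    return fields
```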

Output

  • Name
    alignedByWord
    Type
    string
    Description

    JSON file containing individual words and timestamps aligned with the input audio

  • Name
    alignedByLine
    Type
    string
    Description

    JSON file containing lines from the subtitle file and timestamps aligned with the input audio
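
Once one of the output files has been downloaded, its contents could be read as sketched below. The JSON layout is not specified in this section, so the field names `text`, `start`, and `end` (with times in seconds) are purely hypothetical and would need to be adapted to the real files.

```python
import json


def format_alignment(json_text):
    """Turn alignment JSON into readable 'start-end text' lines.

    Assumes a list of objects with hypothetical keys
    'text', 'start', and 'end'; the real schema may differ.
    """
    entries = json.loads(json_text)
    return [
        f"{entry['start']:.2f}-{entry['end']:.2f} {entry['text']}"
        for entry in entries
    ]
```

The same function would work for either alignedByWord or alignedByLine if both files share this assumed layout.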
