Speech Recognition Skill

Converting Voice to Text

The Speech Recognition Skill converts an audio file (speech) into text. This is essential for processing voice interactions, meeting recordings, or voicemails within your workflows.

Summarizing Customer Service Call Transcripts

Imagine your call center records customer service calls. You want an automated workflow that takes these audio recordings, converts them to text, and then passes the text to other AI Skills that summarize the call or extract key information.

The Challenge

Analyzing audio recordings manually is time-consuming and difficult to scale.

The Solution

Use a Speech Recognition Skill to automatically transcribe the call audio into text. This text can then be fed into other AI Skills, such as a Prompt Skill for summarization or an Entity Recognition Skill for extracting key information.
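To make the chaining concrete, here is a minimal Python sketch of the same idea: transcribe first, and only pass the text on if transcription succeeded. The names `process_call`, `transcribe`, and `summarize` are hypothetical stand-ins for workflow nodes and are not part of the product. Only the `transcription` and `statusCode` fields come from the Skill's output as described under "Understanding the Outcome (Output)".

```python
from typing import Callable, Optional


def process_call(
    media_file_link: str,
    transcribe: Callable[[str], dict],
    summarize: Callable[[str], str],
) -> Optional[str]:
    """Transcribe a recording, then summarize the transcript.

    `transcribe` stands in for the Speech Recognition Skill and is assumed
    to return a dict with `transcription`, `statusCode`, and `error` keys.
    `summarize` stands in for a downstream Skill (for example, a Prompt Skill).
    Returns None when the transcription step did not succeed.
    """
    result = transcribe(media_file_link)
    if result.get("statusCode") != 200:
        # Do not feed a failed transcription into downstream Skills.
        return None
    return summarize(result["transcription"])
```

In the Workflow Builder this corresponds to connecting the Speech Recognition Skill's transcription output to the input of the next Skill, rather than writing code.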

Setting Up the Speech Recognition Skill

  1. Locate the Node: Drag and drop the Speech Recognition Skill onto your Workflow Builder canvas. Place it after a Skill that provides an audio file (e.g., a "Document Fetch Skill" that retrieves a call recording from storage).

  2. Configure "Media File Link": Provide the direct download URL for the audio file.

[Figure: Configuring the Speech Recognition Skill with the audio file URL]
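If the audio URL is produced by an earlier step, a quick shape check before it reaches the Skill can catch obvious mistakes such as a relative path. The sketch below is an illustration only: `looks_like_direct_url` is a hypothetical helper, and the restriction to `http` and `https` is an assumption, since this guide does not list the URL schemes the Skill accepts. It checks the form of the URL, not whether the file is actually reachable.

```python
from urllib.parse import urlparse


def looks_like_direct_url(media_file_link: str) -> bool:
    """Return True if the value is an absolute http(s) URL with a host.

    Assumption: only http and https links are treated as valid here.
    This does not confirm that the file exists or can be downloaded.
    """
    parsed = urlparse(media_file_link.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, `looks_like_direct_url("calls/recording.mp3")` is rejected because it has no scheme or host, while a full `https://` link passes the check.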

Understanding the Outcome (Output)

The Speech Recognition Skill provides the following information after processing:

  • transcription: The full text transcription of the audio file. This is the primary output you'll use in subsequent nodes.

  • statusCode: A number indicating the result of the transcription attempt:

    • 200: Success – The transcription was generated successfully.

    • 400: Invalid or inaccessible URL – The URL provided in "Media File Link" (mediaFileLink) could not be resolved or accessed.

    • 500: Service-side failure – An internal transcription error occurred.

  • error: A string with a descriptive error message. This will be null if the statusCode is 200.
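The three output fields above lend themselves to simple branching in whatever consumes the result. The following Python sketch shows one way to do that, assuming the output arrives as a dictionary with the keys `transcription`, `statusCode`, and `error`. The function name `route_result` and the branch labels are hypothetical, and retrying on a 500 is a suggestion rather than documented behavior.

```python
from typing import Tuple


def route_result(result: dict) -> Tuple[str, str]:
    """Map a Speech Recognition output to a (branch, payload) pair.

    Branch labels are illustrative:
      "continue"  -> payload is the transcription text
      "fix_input" -> payload is the error message (statusCode 400)
      "retry"     -> payload is the error message (statusCode 500)
      "unknown"   -> payload describes an undocumented status code
    """
    status = result.get("statusCode")
    error = result.get("error") or "no error message provided"

    if status == 200:
        return "continue", result["transcription"]
    if status == 400:
        # The media file link could not be resolved or accessed.
        return "fix_input", error
    if status == 500:
        # Internal transcription error; retrying later is an assumption.
        return "retry", error
    return "unknown", f"unexpected statusCode {status}: {error}"
```

A successful result such as `{"transcription": "Hello", "statusCode": 200, "error": None}` routes to the "continue" branch with the transcription as its payload.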