Speech Recognition Node
Audio Processing Technical Deep Dive
The Speech Recognition Node performs automatic speech-to-text transcription on a provided audio file. This document details its technical specifications, including purpose, how it works, configuration schema, input/output data formats, and error handling.
Purpose and How the Node Works
The Speech Recognition Node converts audio input into textual form, enabling downstream tasks like text analysis, classification, or storage.
Input Resolution: The node's main input is a downloadUrl pointing to the audio file. The URL can reference $input (the previous node's output) or $secret (vault secrets), either to construct the URL itself or to inject authentication headers when needed.
Example Reference: https://example.com/audio/$input.filePath
Request/Processing:
The node fetches the audio file from the provided downloadUrl.
It sends the audio stream to an underlying speech recognition service.
It receives the transcribed text from the service.
Execution Model: Blocking – the node waits for the transcription to complete before the workflow proceeds.
Response Handling:
Success: Returns a JSON object containing the transcription.
Failure: Returns null in the transcription field and includes the appropriate statusCode and an error message.
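The flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the node's actual implementation: resolve_url, run_node, and the injected fetch and transcribe callables are hypothetical names, the placeholder syntax is limited to a single-level $input.<key>, and the mapping from exception types to status codes is assumed for the example.

```python
import re
from typing import Any, Callable, Dict

# Assumed placeholder syntax: "$input.<key>" with a single-level key,
# modeled on the example reference in this document.
_PLACEHOLDER = re.compile(r"\$input\.(\w+)")


def resolve_url(template: str, node_input: Dict[str, Any]) -> str:
    """Replace each $input.<key> placeholder with the matching input value."""
    def lookup(match: "re.Match[str]") -> str:
        return str(node_input[match.group(1)])  # KeyError if the key is missing
    return _PLACEHOLDER.sub(lookup, template)


def run_node(
    config: Dict[str, Any],
    node_input: Dict[str, Any],
    fetch: Callable[[str], bytes],        # hypothetical: downloads the audio
    transcribe: Callable[[bytes], str],   # hypothetical: calls the speech service
) -> Dict[str, Any]:
    """Blocking sketch: resolve, fetch, transcribe, then build the result."""
    try:
        url = resolve_url(config["downloadUrl"], node_input)
        audio = fetch(url)
    except Exception as exc:
        return {"transcription": None, "statusCode": 400,
                "error": f"Invalid or inaccessible URL: {exc}"}
    try:
        text = transcribe(audio)
    except ValueError as exc:
        # Assumption: the service signals an unsupported format with ValueError.
        return {"transcription": None, "statusCode": 422,
                "error": f"Unsupported file format: {exc}"}
    except Exception as exc:
        return {"transcription": None, "statusCode": 500,
                "error": f"Internal transcription error: {exc}"}
    return {"transcription": text, "statusCode": 200, "error": None}
```

Passing fetch and transcribe as arguments keeps the sketch independent of any particular HTTP client or speech provider, so it can be exercised with stubs.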
Configuration Schema
The Speech Recognition Node has a straightforward configuration:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| downloadUrl | string | ✅ | The direct URL to the audio file for transcription. |
| name | string | Optional | Optional display name for this node instance. |
| description | string | Optional | Optional description to document this node's purpose. |
Note: The node itself requires no authentication. If the file behind downloadUrl is protected, supply the credentials through $secret (for example, in request headers).
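As an illustration of the schema, the following sketch validates a raw configuration object before use. SpeechRecognitionConfig and parse_config are hypothetical names introduced for this example, and the rejection of empty strings is an assumption that the schema does not state.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class SpeechRecognitionConfig:
    """Typed view of the node configuration (hypothetical helper class)."""
    download_url: str
    name: Optional[str] = None
    description: Optional[str] = None


def parse_config(raw: Mapping[str, Any]) -> SpeechRecognitionConfig:
    """Check the required and optional fields, then build a typed config."""
    url = raw.get("downloadUrl")
    # Assumption: an empty or whitespace-only URL is treated as missing.
    if not isinstance(url, str) or not url.strip():
        raise ValueError("downloadUrl is required and must be a non-empty string")
    for key in ("name", "description"):
        value = raw.get(key)
        if value is not None and not isinstance(value, str):
            raise ValueError(f"{key} must be a string when provided")
    return SpeechRecognitionConfig(url, raw.get("name"), raw.get("description"))
```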
Output Schema
The node's output port (speechRecognitionResult) will always conform to the following schema:
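| Field | Type | Always | Description |
| --- | --- | --- | --- |
| transcription | string or null | ✅ | Contains the transcribed text; null on failure. |
| statusCode | number | ✅ | HTTP-style status code for the outcome. |
| error | string or null | No | Error message describing the failure; null on success. |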
statusCode Details:
200: Success – Transcription generated successfully.
400: Invalid or inaccessible URL – The provided downloadUrl could not be resolved or accessed.
422: Unsupported file format – The file at downloadUrl is not a recognized audio format.
500: Internal transcription error – A service-side failure occurred during transcription.
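A downstream consumer might branch on statusCode as in the sketch below. TranscriptionError and unwrap_transcription are hypothetical helpers, not part of the node. They only show one way to turn the result object into either a string or an exception.

```python
from typing import Any, Mapping


class TranscriptionError(Exception):
    """Raised when the node result carries a non-200 statusCode."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def unwrap_transcription(result: Mapping[str, Any]) -> str:
    """Return the transcription on success, otherwise raise with the status."""
    status = result["statusCode"]
    if status == 200:
        return result["transcription"]
    # The error field may be absent or null, so fall back to a generic message.
    raise TranscriptionError(status, result.get("error") or "unknown error")
```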
Examples
Success Example
Configuration:
JSON
{
"downloadUrl": "https://example.com/audio-file.wav",
"name": "Meeting Transcription",
"description": "Transcribes the weekly team meeting audio."
}
Output:
JSON
{
"transcription": "This is the transcript of the audio file.",
"statusCode": 200,
"error": null
}
Failure Example (Invalid URL)
Configuration:
JSON
{
"downloadUrl": "https://example.com/invalid-audio.wav"
}
Output:
JSON
{
"transcription": null,
"statusCode": 400,
"error": "Invalid or inaccessible URL: 'https://example.com/invalid-audio.wav' could not be resolved."
}
Single-Node Test API
For testing a node in isolation (e.g., via the UI "Test" button or a dedicated API), the following endpoint is used:
Path: skill-runtime/workflows/nodes/SpeechRecognition/execute
Method: POST
Purpose: Execute one node in isolation.
Request Body:
JSON
{
"config": {
"downloadUrl": "https://example.com/audio.wav"
},
"input": {}
}
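A request like the one above could be built from Python with the standard library, as sketched here. The BASE_URL host is a placeholder and the Content-Type header is an assumption, since this document specifies only the path, the method, and the body.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://workflows.example.com/"  # hypothetical host, replace with yours
NODE_PATH = "skill-runtime/workflows/nodes/SpeechRecognition/execute"


def build_test_request(
    download_url: str, node_input: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a single-node test run."""
    body = json.dumps(
        {"config": {"downloadUrl": download_url}, "input": node_input or {}}
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + NODE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and read the response body, adding whatever authentication your deployment requires.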
Error Handling
| Error | statusCode | Cause |
| --- | --- | --- |
| Invalid or inaccessible URL | 400 | downloadUrl doesn't resolve or is forbidden. |
| Unsupported file format | 422 | Non-audio file or unsupported audio codec. |
| Internal transcription error | 500 | Service-side failure. |
Security Notes
Secured Audio Files: If the audio file requires signed or secured access (e.g., from a private S3 bucket), use $secret to inject necessary headers or tokens into the downloadUrl resolution process.
Sensitive URL Logging: Do not log or persist sensitive downloadUrl values directly in workflow configurations or logs.
Transcription Content Redaction: Implement logging policies to redact transcription contents if they are flagged as sensitive, to prevent PII leakage in logs.
External Service Integration: Be aware that the underlying speech recognition service may be an external third-party provider. Ensure compliance with data privacy regulations regarding data sent to such services.
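The logging guidance above can be illustrated with a small redaction sketch. redact_url and loggable_result are hypothetical helpers, and this version redacts every transcription unconditionally instead of checking a sensitivity flag. It also leaves the error field untouched, although error messages may themselves contain the URL.

```python
from typing import Any, Dict
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Drop userinfo, query string, and fragment, where tokens usually live."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def loggable_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the node result that is safer to write to logs."""
    safe = dict(result)
    text = safe.get("transcription")
    if text is not None:
        # Keep only the length so logs stay useful without exposing content.
        safe["transcription"] = f"<redacted, {len(text)} chars>"
    return safe
```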