---
comments: true
---
# Multilingual Speech Recognition Pipeline User Guide
## 1. Introduction to the Multilingual Speech Recognition Pipeline
Speech recognition automatically converts spoken language into the corresponding text or commands. It plays an important role in fields such as intelligent customer service, voice assistants, and meeting records. Multilingual speech recognition additionally detects the spoken language automatically and supports recognition of multiple languages.
| Method | Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| `print()` | Print the result to the terminal | `format_json` | `bool` | Whether to format the output content using JSON indentation | `True` |
| | | `indent` | `int` | Specify the indentation level to beautify the output JSON data, making it more readable. Effective only when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Control whether to escape non-ASCII characters to Unicode. When set to `True`, all non-ASCII characters will be escaped; `False` will retain the original characters. Effective only when `format_json` is `True` | `False` |
| `save_to_json()` | Save the result as a JSON file | `save_path` | `str` | Path to save the file. When it is a directory, the saved file is named after the input file | `None` |
| | | `indent` | `int` | Specify the indentation level to beautify the output JSON data, making it more readable. Effective only when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Control whether to escape non-ASCII characters to Unicode. When set to `True`, all non-ASCII characters will be escaped; `False` will retain the original characters. Effective only when `format_json` is `True` | `False` |
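
The `indent` and `ensure_ascii` parameters share their names with the arguments of Python's standard `json.dumps`. The sketch below assumes they behave the same way (an assumption, not something this guide confirms) and uses a made-up result fragment to show the effect of each setting.

```python
import json

# Hypothetical result fragment, used only for illustration.
sample = {"text": "你好", "language": "zh"}


def render(data, indent=4, ensure_ascii=False):
    """Serialize `data` with the standard library, using the table's defaults."""
    return json.dumps(data, indent=indent, ensure_ascii=ensure_ascii)


readable = render(sample)                    # keeps the original characters
escaped = render(sample, ensure_ascii=True)  # escapes non-ASCII as \uXXXX

print(readable)
print(escaped)
```

With `ensure_ascii=False` the recognized text stays human-readable, which is usually what you want for non-English transcripts.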
- Calling the `print()` method will print the result to the terminal, with the printed content explained as follows:
    - `input_path`: The path where the input audio is stored
    - `result`: Recognition result
        - `text`: The text result of speech recognition
        - `segments`: The result text with timestamps
            * `id`: ID
            * `seek`: Audio segment pointer
            * `start`: Segment start time
            * `end`: Segment end time
            * `text`: Text recognized in the segment
            * `tokens`: Token IDs of the segment text
            * `temperature`: Speed variation ratio
            * `avg_logprob`: Average log probability
            * `compression_ratio`: Compression ratio
            * `no_speech_prob`: Non-speech probability
        - `language`: Recognized language
- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_audio_basename}.json`; if specified as a file, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, the `numpy.array` types will be converted to lists.
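
As a rough illustration of the result layout and the save-path rule described above, the following sketch walks a hand-written result dictionary and derives the JSON file path. The helper names `format_segments` and `resolve_json_path`, the sample values, and the rule that a path without a `.json` suffix counts as a directory are all assumptions made for this example, not part of the pipeline API.

```python
from pathlib import Path

# Made-up result with the fields listed above (values are illustrative only).
sample = {
    "input_path": "./audio/demo.wav",
    "result": {
        "text": "Hello world.",
        "segments": [
            {"id": 0, "seek": 0, "start": 0.0, "end": 1.5, "text": " Hello"},
            {"id": 1, "seek": 0, "start": 1.5, "end": 2.8, "text": " world."},
        ],
        "language": "en",
    },
}


def format_segments(result):
    """Return one '[start - end] text' line per segment."""
    return [
        f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {seg['text'].strip()}"
        for seg in result["result"]["segments"]
    ]


def resolve_json_path(save_path, input_path):
    """Append '{audio_basename}.json' when save_path looks like a directory.

    Assumption: a path without a '.json' suffix is treated as a directory.
    """
    save_path = Path(save_path)
    if save_path.suffix.lower() == ".json":
        return save_path
    return save_path / f"{Path(input_path).stem}.json"


for line in format_segments(sample):
    print(line)
print(resolve_json_path("./output", sample["input_path"]))
```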
* Visualized images and prediction results can also be obtained through attributes.
Multilingual Service Call Examples
Python

```python
import requests

API_URL = "http://localhost:8080/multilingual-speech-recognition"  # Service URL
# Placeholder: URL or path of an audio file accessible by the server
audio_path = "./demo.wav"

payload = {"audio": audio_path}

# Call API
response = requests.post(API_URL, json=payload)

# Process API response
assert response.status_code == 200
result = response.json()["result"]
print("Text:", result["text"])
print("Language:", result["language"])
print("\nSegments:")
for segment in result["segments"]:
    print(segment)
```
C++

```cpp
#include <iostream>
#include <string>
#include "cpp-httplib/httplib.h" // https://github.com/Huiyicc/cpp-httplib
#include "nlohmann/json.hpp" // https://github.com/nlohmann/json

int main() {
    httplib::Client client("localhost:8080");
    // Placeholder: URL or path of an audio file accessible by the server
    const std::string audioPath = "./demo.wav";

    httplib::Headers headers = {
        {"Content-Type", "application/json"}
    };

    nlohmann::json jsonObj;
    jsonObj["audio"] = audioPath;
    std::string body = jsonObj.dump();

    // Call API
    auto response = client.Post("/multilingual-speech-recognition", headers, body, "application/json");

    // Process API response
    if (response && response->status == 200) {
        nlohmann::json jsonResponse = nlohmann::json::parse(response->body);
        auto result = jsonResponse["result"];

        std::cout << "Text: " << result["text"].get<std::string>() << std::endl;
        std::cout << "Language: " << result["language"].get<std::string>() << std::endl;
        std::cout << "\nSegments:" << std::endl;
        for (const auto& segment : result["segments"]) {
            std::cout << segment << std::endl;
        }
    } else {
        std::cout << "Failed to send HTTP request." << std::endl;
        return 1;
    }

    return 0;
}
```
Java

```java
import okhttp3.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        String API_URL = "http://localhost:8080/multilingual-speech-recognition"; // Service URL
        // Placeholder: URL or path of an audio file accessible by the server
        String audioPath = "./demo.wav";

        ObjectMapper objectMapper = new ObjectMapper();
        ObjectNode params = objectMapper.createObjectNode();
        params.put("audio", audioPath);

        // Create OkHttpClient instance
        OkHttpClient client = new OkHttpClient();
        MediaType JSON = MediaType.Companion.get("application/json; charset=utf-8");
        RequestBody body = RequestBody.Companion.create(params.toString(), JSON);
        Request request = new Request.Builder()
                .url(API_URL)
                .post(body)
                .build();

        // Call API and process API response
        try (Response response = client.newCall(request).execute()) {
            if (response.isSuccessful()) {
                String responseBody = response.body().string();
                JsonNode resultNode = objectMapper.readTree(responseBody);
                JsonNode result = resultNode.get("result");

                System.out.println("Text: " + result.get("text").asText());
                System.out.println("Language: " + result.get("language").asText());
                System.out.println("\nSegments: " + result.get("segments").toString());
            } else {
                System.err.println("Request failed with code: " + response.code());
            }
        }
    }
}
```
Go

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    API_URL := "http://localhost:8080/multilingual-speech-recognition"
    // Placeholder: URL or path of an audio file accessible by the server
    audioPath := "./demo.wav"

    payload := map[string]string{"audio": audioPath}
    payloadBytes, err := json.Marshal(payload)
    if err != nil {
        fmt.Println("Error marshaling payload:", err)
        return
    }

    // Call the API
    client := &http.Client{}
    req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
    if err != nil {
        fmt.Println("Error creating request:", err)
        return
    }
    req.Header.Set("Content-Type", "application/json")

    res, err := client.Do(req)
    if err != nil {
        fmt.Println("Error sending request:", err)
        return
    }
    defer res.Body.Close()

    // Handle the API response
    body, err := io.ReadAll(res.Body)
    if err != nil {
        fmt.Println("Error reading response body:", err)
        return
    }
    if res.StatusCode != http.StatusOK {
        fmt.Println("Request failed with status:", res.StatusCode)
        return
    }

    type Response struct {
        Result struct {
            Text     string                   `json:"text"`
            Segments []map[string]interface{} `json:"segments"`
            Language string                   `json:"language"`
        } `json:"result"`
    }
    var respData Response
    err = json.Unmarshal(body, &respData)
    if err != nil {
        fmt.Println("Error unmarshaling response body:", err)
        return
    }

    fmt.Println("Text:", respData.Result.Text)
    fmt.Println("Language:", respData.Result.Language)
    fmt.Println("\nSegments:")
    for _, segment := range respData.Result.Segments {
        fmt.Println(segment)
    }
}
```
C#

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

class Program
{
    static readonly string API_URL = "http://localhost:8080/multilingual-speech-recognition";
    // Placeholder: URL or path of an audio file accessible by the server
    static readonly string audioPath = "./demo.wav";

    static async Task Main(string[] args)
    {
        var httpClient = new HttpClient();

        var payload = new JObject{ { "audio", audioPath } };
        var content = new StringContent(payload.ToString(), Encoding.UTF8, "application/json");

        // Call the API
        HttpResponseMessage response = await httpClient.PostAsync(API_URL, content);
        response.EnsureSuccessStatusCode();

        // Handle the API response
        string responseBody = await response.Content.ReadAsStringAsync();
        JObject jsonResponse = JObject.Parse(responseBody);

        Console.WriteLine("Text: " + jsonResponse["result"]["text"].ToString());
        Console.WriteLine("Language: " + jsonResponse["result"]["language"].ToString());
        Console.WriteLine("\nSegments:");
        Console.WriteLine(jsonResponse["result"]["segments"].ToString());
    }
}
```
Node.js

```js
const axios = require('axios');

const API_URL = 'http://localhost:8080/multilingual-speech-recognition';
// Placeholder: URL or path of an audio file accessible by the server
const audioPath = './demo.wav';

let config = {
    method: 'POST',
    maxBodyLength: Infinity,
    url: API_URL,
    headers: { 'Content-Type': 'application/json' },
    data: JSON.stringify({
        'audio': audioPath
    })
};

// Call the API
axios.request(config)
.then((response) => {
    // Process the API response
    const result = response.data["result"];
    console.log("Text:", result["text"]);
    console.log("Language:", result["language"]);
    console.log("\nSegments:");
    console.log(result["segments"]);
})
.catch((error) => {
    console.log(error);
});
```
PHP

```php
<?php

$API_URL = "http://localhost:8080/multilingual-speech-recognition"; // Service URL
// Placeholder: URL or path of an audio file accessible by the server
$audio_path = "./demo.wav";

$payload = array("audio" => $audio_path);

// Call the API
$ch = curl_init($API_URL);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/json"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

// Process the API response
$result = json_decode($response, true)["result"];
echo "Text: " . $result["text"] . "\n";
echo "Language: " . $result["language"] . "\n";
echo "\nSegments:\n";
print_r($result["segments"]);

?>
```
API Reference
For the main operations provided by the service:
- The HTTP request method is POST.
- Both the request body and response body are JSON data (JSON objects).
- When the request is processed successfully, the response status code is `200`, and the properties of the response body are as follows:

| Name | Type | Meaning |
|---|---|---|
| `logId` | `string` | The UUID of the request. |
| `errorCode` | `integer` | Error code. Fixed as `0`. |
| `errorMsg` | `string` | Error message. Fixed as `"Success"`. |
| `result` | `object` | The result of the operation. |
- When the request is not processed successfully, the properties of the response body are as follows:

| Name | Type | Meaning |
|---|---|---|
| `logId` | `string` | The UUID of the request. |
| `errorCode` | `integer` | Error code. Same as the response status code. |
| `errorMsg` | `string` | Error message. |
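
The two response shapes above can be handled with a single helper. The sketch below is one possible way to do it on the client side; `ServiceError`, `unwrap_result`, and the sample bodies (including the error text) are hypothetical names and values introduced for illustration, not part of the service.

```python
class ServiceError(Exception):
    """Raised when the service reports a failed request."""

    def __init__(self, log_id, error_code, error_msg):
        super().__init__(f"[{log_id}] error {error_code}: {error_msg}")
        self.log_id = log_id
        self.error_code = error_code
        self.error_msg = error_msg


def unwrap_result(status_code, body):
    """Return body['result'] on success, otherwise raise ServiceError."""
    if status_code == 200 and body.get("errorCode") == 0:
        return body["result"]
    raise ServiceError(
        body.get("logId"),
        body.get("errorCode", status_code),
        body.get("errorMsg", ""),
    )


# Made-up response bodies that follow the two tables above.
ok_body = {
    "logId": "example-log-id",
    "errorCode": 0,
    "errorMsg": "Success",
    "result": {"text": "hello", "segments": [], "language": "en"},
}
bad_body = {"logId": "example-log-id", "errorCode": 422, "errorMsg": "Unprocessable input"}

print(unwrap_result(200, ok_body)["text"])
try:
    unwrap_result(422, bad_body)
except ServiceError as exc:
    print(exc)
```

Keeping `logId` on the exception makes it easier to correlate a client-side failure with the server logs.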
The main operations provided by the service are as follows:

Perform multilingual speech recognition on audio.

`POST /multilingual-speech-recognition`

- The properties of the request body are as follows:

| Name | Type | Meaning | Required |
|---|---|---|---|
| `audio` | `string` | The URL or path of the audio file accessible by the server. | Yes |

- When the request is processed successfully, the `result` of the response body has the following properties:

| Name | Type | Meaning |
|---|---|---|
| `text` | `string` | The text result of speech recognition. |
| `segments` | `array` | The result text with timestamps. |
| `language` | `string` | The recognized language. |

Each element in `segments` is an object with the following properties:

| Name | Type | Meaning |
|---|---|---|
| `id` | `integer` | The ID of the audio segment. |
| `seek` | `integer` | The pointer of the audio segment. |
| `start` | `number` | The start time of the audio segment. |
| `end` | `number` | The end time of the audio segment. |
| `text` | `string` | The recognized text of the audio segment. |
| `tokens` | `array` | The token IDs of the audio segment. |
| `temperature` | `number` | The speed change ratio. |
| `avgLogProb` | `number` | The average log probability. |
| `compressionRatio` | `number` | The compression ratio. |
| `noSpeechProb` | `number` | The probability of no speech. |
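
The service returns segment properties in camelCase (`avgLogProb`, `compressionRatio`, `noSpeechProb`), while the Python result printed by `print()` uses `avg_logprob`, `compression_ratio`, and `no_speech_prob`. If client code needs to treat both the same way, a small mapping such as the one below may help. The helper name `to_python_segment` and the sample values are hypothetical.

```python
# Keys that differ between the service response and the Python result.
SERVICE_TO_PYTHON_KEYS = {
    "avgLogProb": "avg_logprob",
    "compressionRatio": "compression_ratio",
    "noSpeechProb": "no_speech_prob",
}


def to_python_segment(segment):
    """Rename camelCase service keys; leave all other keys unchanged."""
    return {SERVICE_TO_PYTHON_KEYS.get(key, key): value for key, value in segment.items()}


# Made-up segment in the service's format, for illustration only.
service_segment = {
    "id": 0,
    "seek": 0,
    "start": 0.0,
    "end": 1.5,
    "text": " Hello",
    "tokens": [1, 2, 3],
    "temperature": 0.0,
    "avgLogProb": -0.25,
    "compressionRatio": 1.1,
    "noSpeechProb": 0.02,
}

print(to_python_segment(service_segment))
```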