---
comments: true
---
# Document Image Preprocessing Pipeline Tutorial
## 1. Introduction to the Document Image Preprocessing Pipeline
The document image preprocessing pipeline combines two functions: document orientation classification and geometric distortion correction. Orientation classification identifies which of four orientations (0°, 90°, 180°, 270°) a document is in, so that subsequent tasks process it in the correct direction. Geometric distortion correction removes the distortions introduced when a document is photographed or scanned, restoring its original shape and proportions. The pipeline is suitable for digital document management, preprocessing for OCR, and any other scenario where document image quality needs to improve. By automating both corrections, it improves the accuracy and efficiency of document processing and gives later image analysis a more reliable foundation.

The pipeline also offers flexible service deployment options and can be invoked from various programming languages on multiple hardware platforms. It supports further development as well: you can train or fine-tune its models on your own dataset and integrate the trained models back into the pipeline.
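
To make the orientation step concrete, here is a minimal sketch that uses plain nested lists in place of real image arrays. It assumes the predicted angle is the clockwise rotation present in the input, so the correction rotates counterclockwise by the same amount. That convention and the helper names `rotate_cw_90`, `rotate_ccw_90`, and `undo_rotation` are illustrative assumptions, not part of the pipeline's API.

```python
from typing import List

Grid = List[List[int]]  # row-major stand-in for an image

VALID_ANGLES = (0, 90, 180, 270)


def rotate_cw_90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_ccw_90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def undo_rotation(grid: Grid, angle: int) -> Grid:
    """Return the upright grid, assuming `angle` is the clockwise rotation in the input."""
    if angle not in VALID_ANGLES:
        raise ValueError(f"angle must be one of {VALID_ANGLES}, got {angle}")
    for _ in range(angle // 90):
        grid = rotate_ccw_90(grid)
    return grid


upright = [[1, 2, 3], [4, 5, 6]]
sideways = rotate_cw_90(upright)        # simulate a page captured rotated by 90 degrees
restored = undo_rotation(sideways, 90)  # apply the correction for the predicted angle
print(restored == upright)
```

Restricting the classifier to four angles keeps the correction lossless: each case is a pure rearrangement of pixels with no interpolation.
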
**The general document image preprocessing pipeline includes an optional document image orientation classification module and an optional document image correction module**, with the following models included.

| Parameter | Description | Type | Options | Default |
|---|---|---|---|---|
| `input` | Data to be predicted. Required; several input types are supported. | `Python Var\|str\|list` | - **Python Var**: such as image data represented by `numpy.ndarray`<br>- **str**: the local path of an image file or PDF file, such as `/root/data/img.jpg`; the network URL of an image file or PDF file; or a local directory containing the images to be predicted, such as `/root/data/` (directory prediction is currently not supported for PDFs; a PDF must be specified by its exact file path)<br>- **list**: elements must be of the above types, such as `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module | `bool\|None` | - **bool**: `True` or `False`<br>- **None**: the default value initialized by the pipeline is used, which is `True` | `None` |
| `use_doc_unwarping` | Whether to use the document unwarping correction module | `bool\|None` | - **bool**: `True` or `False`<br>- **None**: the default value initialized by the pipeline is used, which is `True` | `None` |
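
The `None` defaults above amount to "fall back to whatever the pipeline was initialized with". The sketch below mirrors that documented behavior in plain Python. The names `PIPELINE_DEFAULTS` and `resolve_flags` are hypothetical and used only for illustration; this is not the library's own code.

```python
from typing import Dict, Optional

# Values the pipeline is initialized with, per the parameter descriptions (both True).
PIPELINE_DEFAULTS: Dict[str, bool] = {
    "use_doc_orientation_classify": True,
    "use_doc_unwarping": True,
}


def resolve_flags(
    use_doc_orientation_classify: Optional[bool] = None,
    use_doc_unwarping: Optional[bool] = None,
) -> Dict[str, bool]:
    """Replace each None with the pipeline-level default; keep explicit booleans."""
    requested = {
        "use_doc_orientation_classify": use_doc_orientation_classify,
        "use_doc_unwarping": use_doc_unwarping,
    }
    return {
        name: PIPELINE_DEFAULTS[name] if value is None else value
        for name, value in requested.items()
    }


print(resolve_flags())                         # nothing overridden
print(resolve_flags(use_doc_unwarping=False))  # one module switched off for this call
```

Testing for `None` explicitly, rather than relying on truthiness, is what lets an explicit `False` override a default of `True`.
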
(3) Process the prediction results. The prediction result for each sample is of `dict` type and supports operations such as printing, saving as an image, and saving as a `json` file:

| Method | Method Description | Parameter | Type | Parameter Description | Default |
|---|---|---|---|---|---|
| `print()` | Prints the results to the terminal | `format_json` | `bool` | Whether to format the output using JSON indentation | `True` |
| | | `indent` | `int` | Specifies the indentation level to beautify the output JSON data for better readability; effective only when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Controls whether non-ASCII characters are escaped as Unicode. When set to `True`, all non-ASCII characters are escaped; `False` retains the original characters. Effective only when `format_json` is `True` | `False` |
| `save_to_json()` | Saves the results as a JSON file | `save_path` | `str` | The path to save to. When it is a directory, the saved file is named consistently with the input file | `None` |
| | | `indent` | `int` | Specifies the indentation level to beautify the output JSON data for better readability; effective only when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Controls whether non-ASCII characters are escaped as Unicode. When set to `True`, all non-ASCII characters are escaped; `False` retains the original characters. Effective only when `format_json` is `True` | `False` |
| `save_to_img()` | Saves the results as an image file | `save_path` | `str` | The path to save to; either a directory or a file path | `None` |
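
The effect of `indent` and `ensure_ascii` can be previewed with the standard library's `json.dumps`, which takes parameters of the same names. This is only a stand-in: whether the result object uses `json.dumps` internally is an assumption, and the sample dictionary, including its file name, is made up for illustration.

```python
import json

result = {
    "input_path": "发票.jpg",  # hypothetical file name containing non-ASCII characters
    "model_settings": {
        "use_doc_orientation_classify": True,
        "use_doc_unwarping": True,
    },
    "angle": 0,
}

# indent=4, ensure_ascii=False: indented output, original characters kept.
pretty = json.dumps(result, indent=4, ensure_ascii=False)

# ensure_ascii=True: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(result, indent=4, ensure_ascii=True)

print(pretty)
print(escaped)
```

Both strings decode to the same data; the setting only affects how readable the file is in tools that do not render the escapes.
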
- Calling the `print()` method outputs the results to the terminal. The printed content is explained as follows:
    - `input_path`: `(str)` The input path of the image to be predicted.
    - `model_settings`: `(Dict[str, bool])` Model parameters required for configuring the pipeline.
        - `use_doc_orientation_classify`: `(bool)` Controls whether to enable the document orientation classification module.
        - `use_doc_unwarping`: `(bool)` Controls whether to enable the document unwarping module.
    - `angle`: `(int)` The prediction result of the document orientation classification. When the module is enabled, the value is one of 0, 90, 180, or 270; when it is not enabled, the value is -1.
- Calling the `save_to_json()` method saves the above content to the specified `save_path`. If a directory is specified, the path is `save_path/{your_img_basename}.json`; if a file is specified, the content is saved directly to that file. Since JSON files do not support saving NumPy arrays, any `numpy.array` values are converted to lists.
- Calling the `save_to_img()` method saves the visualized results to the specified `save_path`. If a directory is specified, the path is `save_path/{your_img_basename}_doc_preprocessor_res_img.{your_img_extension}`; if a file is specified, the results are saved directly to that file. Because the pipeline typically produces multiple result images, specifying a single file path is not recommended: each image overwrites the previous one, leaving only the last.
- Additionally, the visualized images and the prediction results can also be obtained through attributes.
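
The naming rules for saved files, and the meaning of `angle`, can be sketched with `pathlib`. The helpers below are hypothetical and only reproduce the documented file name patterns. In particular, deciding between "file" and "directory" by looking at the suffix is an assumption, not necessarily how the library decides.

```python
from pathlib import Path


def json_output_path(save_path: str, input_path: str) -> Path:
    """Path a JSON result would be written to under the documented naming rule."""
    target = Path(save_path)
    if target.suffix.lower() == ".json":
        # Assumption: a ".json" suffix marks save_path as a file, not a directory.
        return target
    return target / f"{Path(input_path).stem}.json"


def image_output_path(save_path: str, input_path: str) -> Path:
    """Path a visualization image would be written to under the documented naming rule."""
    source = Path(input_path)
    target = Path(save_path)
    if target.suffix:
        # Assumption: any file suffix marks save_path as a file, not a directory.
        return target
    return target / f"{source.stem}_doc_preprocessor_res_img{source.suffix}"


def describe_angle(angle: int) -> str:
    """Turn the `angle` field into a readable message (-1 means the classifier was off)."""
    if angle == -1:
        return "orientation classification disabled"
    return f"document detected at {angle} degrees"


print(json_output_path("./output", "/root/data/img.jpg").as_posix())
print(image_output_path("./output", "/root/data/img.jpg").as_posix())
print(describe_angle(-1))
```

Passing a directory gives every input its own output file, which is why a directory is the safer choice when several images are processed in one run.
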
API Reference
For the main operations provided by the service:
- The HTTP request method is POST.
- Both the request body and response body are JSON data (JSON objects).
- When the request is processed successfully, the response status code is `200`, and the attributes of the response body are as follows:

| Name | Type | Meaning |
|---|---|---|
| `logId` | `string` | The UUID of the request. |
| `errorCode` | `integer` | Error code. Fixed as `0`. |
| `errorMsg` | `string` | Error message. Fixed as `"Success"`. |
| `result` | `object` | The result of the operation. |

- When the request is not processed successfully, the attributes of the response body are as follows:

| Name | Type | Meaning |
|---|---|---|
| `logId` | `string` | The UUID of the request. |
| `errorCode` | `integer` | Error code. Same as the response status code. |
| `errorMsg` | `string` | Error message. |

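
A client can treat the two response shapes uniformly by unwrapping `result` on success and raising otherwise. The sketch below works on already-parsed JSON. `ServiceError` and `unwrap_result` are hypothetical helper names, and the sample bodies are invented for illustration.

```python
from typing import Any, Dict


class ServiceError(RuntimeError):
    """Raised when the response body reports a failure."""

    def __init__(self, log_id: Any, error_code: Any, error_msg: str) -> None:
        super().__init__(f"request {log_id} failed with code {error_code}: {error_msg}")
        self.log_id = log_id
        self.error_code = error_code
        self.error_msg = error_msg


def unwrap_result(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Return body["result"] on success; raise ServiceError otherwise."""
    if status_code == 200 and body.get("errorCode") == 0:
        return body["result"]
    raise ServiceError(
        body.get("logId"),
        body.get("errorCode", status_code),  # documented to equal the status code
        body.get("errorMsg", ""),
    )


ok_body = {"logId": "example-id", "errorCode": 0, "errorMsg": "Success", "result": {"dataInfo": {}}}
print(unwrap_result(200, ok_body))

try:
    unwrap_result(422, {"logId": "example-id", "errorCode": 422, "errorMsg": "Invalid input"})
except ServiceError as exc:
    print(exc)
```

Keeping `logId` on the exception makes it easier to match a client-side failure with the corresponding server log entry.
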
The main operations provided by the service are as follows:

Obtain the document image preprocessing results.

`POST /document-preprocessing`

- The attributes of the request body are as follows:

| Name | Type | Meaning | Required |
|---|---|---|---|
| `file` | `string` | The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. By default, for PDF files exceeding 10 pages, only the first 10 pages are processed. To remove the page limit, set `max_num_input_imgs: null` under `Serving.extra` in the pipeline configuration file, as in the snippet after this table. | Yes |
| `fileType` | `integer` \| `null` | The type of the file: `0` for PDF files, `1` for image files. If this attribute is missing, the file type is inferred from the URL. | No |
| `useDocOrientationClassify` | `boolean` \| `null` | Please refer to the description of the `use_doc_orientation_classify` parameter of the pipeline object's `predict` method. | No |
| `useDocUnwarping` | `boolean` \| `null` | Please refer to the description of the `use_doc_unwarping` parameter of the pipeline object's `predict` method. | No |

```yaml
Serving:
  extra:
    max_num_input_imgs: null
```

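
The request body can be assembled with a small helper that maps Python-style arguments onto the camelCase attributes. This is a sketch under stated assumptions: `build_payload` and the two `FILE_TYPE_*` constants are hypothetical names, and leaving optional attributes out when they are `None` relies on them being documented as not required.

```python
import base64
from typing import Any, Dict, Optional

FILE_TYPE_PDF = 0
FILE_TYPE_IMAGE = 1


def build_payload(
    file_bytes: bytes,
    file_type: int,
    use_doc_orientation_classify: Optional[bool] = None,
    use_doc_unwarping: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /document-preprocessing from raw file bytes."""
    if file_type not in (FILE_TYPE_PDF, FILE_TYPE_IMAGE):
        raise ValueError("file_type must be 0 (PDF) or 1 (image)")
    payload: Dict[str, Any] = {
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "fileType": file_type,
    }
    # Optional attributes are omitted when None, leaving the choice to the service.
    if use_doc_orientation_classify is not None:
        payload["useDocOrientationClassify"] = use_doc_orientation_classify
    if use_doc_unwarping is not None:
        payload["useDocUnwarping"] = use_doc_unwarping
    return payload


payload = build_payload(b"fake image bytes", FILE_TYPE_IMAGE, use_doc_unwarping=False)
print(sorted(payload))
```

Setting `fileType` explicitly matters for Base64 input, since there is no URL from which the service could infer the type.
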
- When the request is processed successfully, the `result` in the response body has the following attributes:

| Name | Type | Meaning |
|---|---|---|
| `docPreprocessingResults` | `array` | Document image preprocessing results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of one page actually processed in the PDF file. |
| `dataInfo` | `object` | Information about the input data. |

Each element in `docPreprocessingResults` is an object with the following attributes:

| Name | Type | Meaning |
|---|---|---|
| `outputImage` | `string` | The preprocessed image. The image is in PNG format and is Base64-encoded. |
| `prunedResult` | `object` | A simplified version of the `res` field in the JSON representation of the result generated by the pipeline object's `predict` method, excluding the `input_path` and `page_index` fields. |
| `docPreprocessingImage` | `string` \| `null` | The visualization result image. The image is in JPEG format and is Base64-encoded. |
| `inputImage` | `string` \| `null` | The input image. The image is in JPEG format and is Base64-encoded. |

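
Since two of the image attributes may be `null`, a client should check each one before decoding. The following sketch writes whichever images are present to disk. `save_result_images` and the output file naming are hypothetical, and the demo item contains only a PNG signature plus filler bytes rather than a real image.

```python
import base64
import tempfile
from pathlib import Path
from typing import Any, Dict, List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file

# Attribute name -> file extension, following the formats stated in the table.
IMAGE_FIELDS = {
    "outputImage": "png",
    "docPreprocessingImage": "jpg",
    "inputImage": "jpg",
}


def save_result_images(item: Dict[str, Any], index: int, out_dir: str) -> List[Path]:
    """Decode and save every non-null image attribute of one result element."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for field, extension in IMAGE_FIELDS.items():
        encoded = item.get(field)
        if encoded is None:  # nullable attributes are simply skipped
            continue
        path = directory / f"{field}_{index}.{extension}"
        path.write_bytes(base64.b64decode(encoded))
        written.append(path)
    return written


demo_item = {
    "outputImage": base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii"),
    "docPreprocessingImage": None,
}
with tempfile.TemporaryDirectory() as tmp_dir:
    for saved in save_result_images(demo_item, 0, tmp_dir):
        print(saved.name, saved.read_bytes().startswith(PNG_SIGNATURE))
```
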
Multi-language Service Call Example
Python

```python
import base64

import requests

API_URL = "http://localhost:8080/document-preprocessing"
file_path = "./demo.jpg"

# Read the local image and Base64-encode it for the JSON request body.
with open(file_path, "rb") as file:
    file_bytes = file.read()
    file_data = base64.b64encode(file_bytes).decode("ascii")

payload = {"file": file_data, "fileType": 1}

response = requests.post(API_URL, json=payload)
assert response.status_code == 200

result = response.json()["result"]
for i, res in enumerate(result["docPreprocessingResults"]):
    print(res["prunedResult"])
    output_img_path = f"out_{i}.png"
    with open(output_img_path, "wb") as f:
        f.write(base64.b64decode(res["outputImage"]))
    print(f"Output image saved at {output_img_path}")
```

C++

```cpp
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include "cpp-httplib/httplib.h" // https://github.com/Huiyicc/cpp-httplib
#include "nlohmann/json.hpp" // https://github.com/nlohmann/json
#include "base64.hpp" // https://github.com/tobiaslocker/base64

int main() {
    httplib::Client client("localhost", 8080);

    const std::string filePath = "./demo.jpg";

    // Open at the end of the file so tellg() reports its size.
    std::ifstream file(filePath, std::ios::binary | std::ios::ate);
    if (!file) {
        std::cerr << "Error opening file: " << filePath << std::endl;
        return 1;
    }

    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> buffer(size);
    if (!file.read(buffer.data(), size)) {
        std::cerr << "Error reading file." << std::endl;
        return 1;
    }

    std::string bufferStr(buffer.data(), static_cast<size_t>(size));
    std::string encodedFile = base64::to_base64(bufferStr);

    nlohmann::json jsonObj;
    jsonObj["file"] = encodedFile;
    jsonObj["fileType"] = 1;

    auto response = client.Post("/document-preprocessing", jsonObj.dump(), "application/json");

    if (response && response->status == 200) {
        nlohmann::json jsonResponse = nlohmann::json::parse(response->body);
        auto result = jsonResponse["result"];

        if (!result.is_object() || !result["docPreprocessingResults"].is_array()) {
            std::cerr << "Unexpected response format." << std::endl;
            return 1;
        }

        for (size_t i = 0; i < result["docPreprocessingResults"].size(); ++i) {
            auto res = result["docPreprocessingResults"][i];

            if (res.contains("prunedResult")) {
                std::cout << "Preprocessed result: " << res["prunedResult"].dump() << std::endl;
            }

            if (res.contains("outputImage")) {
                std::string outputImgPath = "out_" + std::to_string(i) + ".png";
                std::string decodedImage = base64::from_base64(res["outputImage"].get<std::string>());

                std::ofstream outFile(outputImgPath, std::ios::binary);
                if (outFile.is_open()) {
                    outFile.write(decodedImage.c_str(), decodedImage.size());
                    outFile.close();
                    std::cout << "Saved image: " << outputImgPath << std::endl;
                } else {
                    std::cerr << "Failed to write image: " << outputImgPath << std::endl;
                }
            }
        }
    } else {
        std::cerr << "Request failed." << std::endl;
        if (response) {
            std::cerr << "HTTP status: " << response->status << std::endl;
            std::cerr << "Response body: " << response->body << std::endl;
        }
        return 1;
    }

    return 0;
}
```

Java

```java
import okhttp3.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;

public class Main {
    public static void main(String[] args) throws IOException {
        String API_URL = "http://localhost:8080/document-preprocessing";
        String imagePath = "./demo.jpg";

        File file = new File(imagePath);
        byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath());
        String base64Image = Base64.getEncoder().encodeToString(fileContent);

        ObjectMapper objectMapper = new ObjectMapper();
        ObjectNode payload = objectMapper.createObjectNode();
        payload.put("file", base64Image);
        payload.put("fileType", 1);

        OkHttpClient client = new OkHttpClient();
        MediaType JSON = MediaType.get("application/json; charset=utf-8");
        RequestBody body = RequestBody.create(JSON, payload.toString());

        Request request = new Request.Builder()
                .url(API_URL)
                .post(body)
                .build();

        try (Response response = client.newCall(request).execute()) {
            if (response.isSuccessful()) {
                String responseBody = response.body().string();
                JsonNode root = objectMapper.readTree(responseBody);
                JsonNode result = root.get("result");

                JsonNode docPreprocessingResults = result.get("docPreprocessingResults");
                for (int i = 0; i < docPreprocessingResults.size(); i++) {
                    JsonNode item = docPreprocessingResults.get(i);
                    JsonNode prunedResult = item.get("prunedResult");
                    System.out.println("Pruned Result [" + i + "]: " + prunedResult.toString());

                    String outputImgBase64 = item.get("outputImage").asText();
                    byte[] outputImgBytes = Base64.getDecoder().decode(outputImgBase64);
                    String outputImgPath = "out_" + i + ".png";
                    try (FileOutputStream fos = new FileOutputStream(outputImgPath)) {
                        fos.write(outputImgBytes);
                        System.out.println("Saved output image: " + outputImgPath);
                    }

                    // inputImage is nullable, so check before decoding.
                    JsonNode inputImageNode = item.get("inputImage");
                    if (inputImageNode != null && !inputImageNode.isNull()) {
                        String inputImageBase64 = inputImageNode.asText();
                        byte[] inputImageBytes = Base64.getDecoder().decode(inputImageBase64);
                        String inputImgPath = "inputImage_" + i + ".jpg";
                        try (FileOutputStream fos = new FileOutputStream(inputImgPath)) {
                            fos.write(inputImageBytes);
                            System.out.println("Saved input image to: " + inputImgPath);
                        }
                    }
                }
            } else {
                System.err.println("Request failed with HTTP code: " + response.code());
            }
        }
    }
}
```

Go

```go
package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "os"
)

func main() {
    API_URL := "http://localhost:8080/document-preprocessing"
    filePath := "./demo.jpg"

    fileBytes, err := os.ReadFile(filePath)
    if err != nil {
        fmt.Printf("Error reading file: %v\n", err)
        return
    }
    fileData := base64.StdEncoding.EncodeToString(fileBytes)

    payload := map[string]interface{}{
        "file":     fileData,
        "fileType": 1,
    }
    payloadBytes, err := json.Marshal(payload)
    if err != nil {
        fmt.Printf("Error marshaling payload: %v\n", err)
        return
    }

    client := &http.Client{}
    req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
    if err != nil {
        fmt.Printf("Error creating request: %v\n", err)
        return
    }
    req.Header.Set("Content-Type", "application/json")

    res, err := client.Do(req)
    if err != nil {
        fmt.Printf("Error sending request: %v\n", err)
        return
    }
    defer res.Body.Close()

    if res.StatusCode != http.StatusOK {
        fmt.Printf("Unexpected status code: %d\n", res.StatusCode)
        return
    }

    body, err := io.ReadAll(res.Body)
    if err != nil {
        fmt.Printf("Error reading response body: %v\n", err)
        return
    }

    // Nullable image attributes are modeled as *string.
    type DocPreprocessingResult struct {
        PrunedResult          map[string]interface{} `json:"prunedResult"`
        OutputImage           string                 `json:"outputImage"`
        DocPreprocessingImage *string                `json:"docPreprocessingImage"`
        InputImage            *string                `json:"inputImage"`
    }

    type Response struct {
        Result struct {
            DocPreprocessingResults []DocPreprocessingResult `json:"docPreprocessingResults"`
            DataInfo                interface{}              `json:"dataInfo"`
        } `json:"result"`
    }

    var respData Response
    if err := json.Unmarshal(body, &respData); err != nil {
        fmt.Printf("Error unmarshaling response: %v\n", err)
        return
    }

    for i, res := range respData.Result.DocPreprocessingResults {
        fmt.Printf("Result %d - prunedResult: %+v\n", i, res.PrunedResult)

        imgBytes, err := base64.StdEncoding.DecodeString(res.OutputImage)
        if err != nil {
            fmt.Printf("Error decoding outputImage at index %d: %v\n", i, err)
            continue
        }

        filename := fmt.Sprintf("out_%d.png", i)
        if err := os.WriteFile(filename, imgBytes, 0644); err != nil {
            fmt.Printf("Error saving image %s: %v\n", filename, err)
            continue
        }
        fmt.Printf("Saved output image to %s\n", filename)
    }
}
```

C#

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

class Program
{
    static readonly string API_URL = "http://localhost:8080/document-preprocessing";
    static readonly string inputFilePath = "./demo.jpg";

    static async Task Main(string[] args)
    {
        var httpClient = new HttpClient();

        byte[] fileBytes = File.ReadAllBytes(inputFilePath);
        string fileData = Convert.ToBase64String(fileBytes);

        var payload = new JObject
        {
            { "file", fileData },
            { "fileType", 1 }
        };
        var content = new StringContent(payload.ToString(), Encoding.UTF8, "application/json");

        HttpResponseMessage response = await httpClient.PostAsync(API_URL, content);
        response.EnsureSuccessStatusCode();

        string responseBody = await response.Content.ReadAsStringAsync();
        JObject jsonResponse = JObject.Parse(responseBody);

        JArray docPreResults = (JArray)jsonResponse["result"]["docPreprocessingResults"];
        for (int i = 0; i < docPreResults.Count; i++)
        {
            var res = docPreResults[i];
            Console.WriteLine($"[{i}] prunedResult:\n{res["prunedResult"]}");

            string base64Image = res["outputImage"]?.ToString();
            if (!string.IsNullOrEmpty(base64Image))
            {
                string outputPath = $"out_{i}.png";
                byte[] imageBytes = Convert.FromBase64String(base64Image);
                File.WriteAllBytes(outputPath, imageBytes);
                Console.WriteLine($"Output image saved at {outputPath}");
            }
            else
            {
                Console.WriteLine($"outputImage at index {i} is null.");
            }
        }
    }
}
```

Node.js

```js
const axios = require('axios');
const fs = require('fs');

const API_URL = 'http://localhost:8080/document-preprocessing';
const imagePath = './demo.jpg';

// Read the file and return its content as a Base64 string.
function encodeImageToBase64(filePath) {
  const bitmap = fs.readFileSync(filePath);
  return Buffer.from(bitmap).toString('base64');
}

const payload = {
  file: encodeImageToBase64(imagePath),
  fileType: 1
};

axios.post(API_URL, payload, {
  headers: {
    'Content-Type': 'application/json'
  },
  maxBodyLength: Infinity
})
  .then((response) => {
    const results = response.data.result.docPreprocessingResults;

    results.forEach((res, index) => {
      console.log(`\n[${index}] prunedResult:`);
      console.log(res.prunedResult);

      const base64Image = res.outputImage;
      if (base64Image) {
        const outputImagePath = `out_${index}.png`;
        const imageBuffer = Buffer.from(base64Image, 'base64');
        fs.writeFileSync(outputImagePath, imageBuffer);
        console.log(`Output image saved at ${outputImagePath}`);
      } else {
        console.log(`outputImage at index ${index} is null.`);
      }
    });
  })
  .catch((error) => {
    console.error('API error:', error.message);
  });
```

PHP

```php
<?php

$API_URL = "http://localhost:8080/document-preprocessing";
$image_path = "./demo.jpg";

$image_data = base64_encode(file_get_contents($image_path));
$payload = array("file" => $image_data, "fileType" => 1);

$ch = curl_init($API_URL);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

$result = json_decode($response, true)["result"]["docPreprocessingResults"];

foreach ($result as $i => $item) {
    echo "[$i] prunedResult:\n";
    print_r($item["prunedResult"]);

    if (!empty($item["outputImage"])) {
        $output_image_path = "out_" . $i . ".png";
        file_put_contents($output_image_path, base64_decode($item["outputImage"]));
        echo "Output image saved at $output_image_path\n";
    } else {
        echo "No outputImage found for item $i\n";
    }
}
?>
```