## compute_bleu Algorithm Explained

### 1. BLEU Overview

BLEU (Bilingual Evaluation Understudy) is the standard algorithm for evaluating **machine translation** and **text generation** quality. It scores a candidate translation by measuring its **n-gram overlap** with one or more reference translations.

### 2. Core Algorithm Flow

```mermaid
graph TD
    A[Input: candidate + reference translations] --> B[Extract n-grams]
    B --> C[Compute n-gram overlap]
    C --> D[Compute precisions]
    D --> E[Geometric mean]
    E --> F[Brevity penalty BP]
    F --> G[Final BLEU score]
    B --> B1[1-gram: word level]
    B --> B2[2-gram: word-pair level]
    B --> B3[3-gram: three-word combinations]
    B --> B4[4-gram: four-word combinations]
    style A fill:#e1f5fe
    style G fill:#e8f5e8
    style E fill:#fff3e0
    style F fill:#ffebee
```

### 3. Algorithm Steps in Detail

#### 3.1 N-gram extraction

````python
import collections


def _get_ngrams(segment, max_order):
    """Extract all n-grams up to the maximum order."""
    ngram_counts = collections.Counter()
    for order in range(1, max_order + 1):
        for i in range(0, len(segment) - order + 1):
            ngram = tuple(segment[i : i + order])
            ngram_counts[ngram] += 1
    return ngram_counts
````

**Example:**

```python
segment = ["the", "cat", "is", "on", "the", "mat"]
# 1-grams: ("the",), ("cat",), ("is",), ("on",), ("the",), ("mat",)
# 2-grams: ("the","cat"), ("cat","is"), ("is","on"), ("on","the"), ("the","mat")
# 3-grams: ("the","cat","is"), ("cat","is","on"), ("is","on","the"), ("on","the","mat")
# 4-grams: ("the","cat","is","on"), ("cat","is","on","the"), ("is","on","the","mat")
```

#### 3.2 N-gram overlap

````python
# Merge the n-grams of all reference translations
merged_ref_ngram_counts = collections.Counter()
for reference in references:
    merged_ref_ngram_counts |= _get_ngrams(reference, max_order)

# N-grams of the candidate translation
translation_ngram_counts = _get_ngrams(translation, max_order)

# Clipped overlap between candidate and references
overlap = translation_ngram_counts & merged_ref_ngram_counts
for ngram in overlap:
    matches_by_order[len(ngram) - 1] += overlap[ngram]
````

**How the overlap is counted:**

- For each n-gram, take the **minimum count** between the candidate translation and the references (count clipping).
- This prevents the same n-gram from being credited more often than it appears in the references.

The implementation also accumulates `possible_matches_by_order`, the total number of candidate n-grams of each order (`len(translation) - order + 1` per sentence); it is the denominator of the precision computed in the next step.

#### 3.3 Precision computation

````python
precisions = [0] * max_order
for i in range(0, max_order):
    if smooth:
        # Lin et al. (2004) smoothing
        precisions[i] = (matches_by_order[i] + 1.0) / (
            possible_matches_by_order[i] + 1.0
        )
    else:
        if possible_matches_by_order[i] > 0:
            precisions[i] = (
                float(matches_by_order[i]) / possible_matches_by_order[i]
            )
        else:
            precisions[i] = 0.0
````

**Precision formula:**

```
P_n = number of matched n-grams / total number of n-grams in the candidate
```

#### 3.4 Geometric mean

````python
if min(precisions) > 0:
    p_log_sum = sum((1.0 / max_order) * math.log(p) for p in precisions)
    geo_mean = math.exp(p_log_sum)
else:
    geo_mean = 0
````

**Geometric mean (with max_order = 4):**

```
geo_mean = (P_1 × P_2 × P_3 × P_4)^(1/4)
```

#### 3.5 Brevity Penalty (BP)

````python
ratio = float(translation_length) / reference_length
if ratio > 1.0:
    bp = 1.0  # candidate is longer than the reference: no penalty
else:
    bp = math.exp(1 - 1.0 / ratio)  # candidate is shorter: apply a penalty
````

**What BP does:**

- Prevents overly short translations from receiving inflated scores.
- Encourages outputs of appropriate length.

### 4. Final BLEU Score

````python
bleu = geo_mean * bp
````

**Full formula:**

```
BLEU = BP × exp(∑ w_n × log(P_n))
```

where:

- `BP`: brevity penalty factor
- `w_n`: n-gram weights (typically 1/N)
- `P_n`: n-gram precision
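Putting the pieces together, here is a minimal, self-contained sketch that assembles the fragments above into one function. It follows the widely used reference implementation of `compute_bleu`; the exact signature, the `(bleu, precisions, bp)` return value, and the use of the shortest reference for the length statistic are assumptions in this sketch and may differ from the function actually shipped in the codebase.

````python
import collections
import math


def _get_ngrams(segment, max_order):
    """Extract all n-grams up to max_order from a list of tokens."""
    ngram_counts = collections.Counter()
    for order in range(1, max_order + 1):
        for i in range(0, len(segment) - order + 1):
            ngram_counts[tuple(segment[i : i + order])] += 1
    return ngram_counts


def compute_bleu(reference_corpus, translation_corpus, max_order=4, smooth=False):
    """Corpus-level BLEU (sketch).

    reference_corpus: one list of references (each a token list) per candidate.
    translation_corpus: list of candidate token lists.
    Returns (bleu, precisions, bp); the real implementation may return more.
    """
    matches_by_order = [0] * max_order
    possible_matches_by_order = [0] * max_order
    reference_length = 0
    translation_length = 0

    for references, translation in zip(reference_corpus, translation_corpus):
        # Assumption: use the shortest reference for the length statistic.
        reference_length += min(len(r) for r in references)
        translation_length += len(translation)

        # Merge reference n-grams, keeping the maximum count per n-gram.
        merged_ref_ngram_counts = collections.Counter()
        for reference in references:
            merged_ref_ngram_counts |= _get_ngrams(reference, max_order)
        translation_ngram_counts = _get_ngrams(translation, max_order)

        # Clipped matches: min(candidate count, merged reference count).
        overlap = translation_ngram_counts & merged_ref_ngram_counts
        for ngram in overlap:
            matches_by_order[len(ngram) - 1] += overlap[ngram]

        # Total candidate n-grams per order (the denominator of P_n).
        for order in range(1, max_order + 1):
            possible_matches = len(translation) - order + 1
            if possible_matches > 0:
                possible_matches_by_order[order - 1] += possible_matches

    precisions = [0.0] * max_order
    for i in range(max_order):
        if smooth:
            # Lin et al. (2004) add-one smoothing.
            precisions[i] = (matches_by_order[i] + 1.0) / (
                possible_matches_by_order[i] + 1.0)
        elif possible_matches_by_order[i] > 0:
            precisions[i] = matches_by_order[i] / possible_matches_by_order[i]

    if min(precisions) > 0:
        geo_mean = math.exp(
            sum((1.0 / max_order) * math.log(p) for p in precisions))
    else:
        geo_mean = 0.0

    ratio = float(translation_length) / reference_length
    bp = 1.0 if ratio > 1.0 else math.exp(1 - 1.0 / ratio)
    return geo_mean * bp, precisions, bp
````

With this shape, `compute_bleu([[ref_tokens]], [cand_tokens])` scores a one-sentence corpus and also exposes the per-order precisions and the brevity penalty, which makes it easy to see which n-gram order is dragging the score down.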
### 5. Worked Example

#### 5.1 Step-by-step computation

**Candidate:** "the cat is on the mat"
**Reference:** "the cat sits on the mat"

**Step 1: N-gram extraction**

```python
# Candidate n-grams:
1-gram: the(2), cat(1), is(1), on(1), mat(1)
2-gram: (the,cat)(1), (cat,is)(1), (is,on)(1), (on,the)(1), (the,mat)(1)

# Reference n-grams:
1-gram: the(2), cat(1), sits(1), on(1), mat(1)
2-gram: (the,cat)(1), (cat,sits)(1), (sits,on)(1), (on,the)(1), (the,mat)(1)
```

**Step 2: Overlap**

```python
# 1-gram matches: the(2), cat(1), on(1), mat(1) = 5
# 2-gram matches: (the,cat)(1), (on,the)(1), (the,mat)(1) = 3
# 3-gram matches: (on,the,mat)(1) = 1
# 4-gram matches: 0
```

**Step 3: Precisions**

```python
P1 = 5/6 = 0.833  # 5 matches / 6 candidate 1-grams
P2 = 3/5 = 0.600  # 3 matches / 5 candidate 2-grams
P3 = 1/4 = 0.250  # 1 match  / 4 candidate 3-grams
P4 = 0/3 = 0.000  # 0 matches / 3 candidate 4-grams
```

**Step 4: Geometric mean**

```python
# Because P4 is 0, the geometric mean is 0
geo_mean = 0
```

**Step 5: Brevity penalty**

```python
ratio = 6/6 = 1.0
BP = 1.0  # no penalty
```

**Final BLEU: 0 × 1.0 = 0**

### 6. Smoothing

If any n-gram precision is 0, the geometric mean is also 0 and the whole BLEU score collapses to 0. Smoothing addresses this:

````python
if smooth:
    precisions[i] = (matches_by_order[i] + 1.0) / (
        possible_matches_by_order[i] + 1.0
    )
````

**Lin et al. (2004) smoothing:** add 1 to both the numerator and the denominator so that no precision is ever zero.

### 7. Handling Multiple References

````python
# Merge the n-grams of multiple reference translations
merged_ref_ngram_counts = collections.Counter()
for reference in references:
    merged_ref_ngram_counts |= _get_ngrams(reference, max_order)
````

**Strategy:** for each n-gram, keep the **maximum count** found across all references (`Counter |=` takes the element-wise maximum).

### 8. Characteristics of the Algorithm

#### 8.1 Strengths

- **Standardized**: an internationally accepted evaluation metric
- **Multi-granularity**: considers 1-gram to 4-gram matches at the same time
- **Robustness**: handles multiple reference translations
- **Length penalty**: discourages overly short outputs

#### 8.2 Limitations

- **Word-order sensitive**: the same words in a different order score differently
- **Synonym-blind**: does not recognize semantically equivalent words
- **Reference-dependent**: relies on high-quality reference translations

### 9. Use in PaddleOCR

Typical BLEU scores reported in the documentation:

| Model | Task | BLEU score | Notes |
|-------|------|------------|-------|
| UniMERNet | Mathematical formula recognition | 85.91% | High-quality formula recognition |
| LaTeX-OCR | LaTeX formula recognition | ~80% | Handles complex formulas |
| PP-FormulaNet | Formula recognition | ~75% | Balances accuracy and speed |

BLEU gives PaddleOCR's sequence-generation tasks (formula recognition in particular) an internationally standard quality metric that measures both the accuracy and the fluency of the generated text.
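As a quick sanity check of the numbers in Sections 5 and 6, the snippet below runs the sentence pair from the worked example through the `compute_bleu` sketch shown after Section 4 (the function name, signature, and return shape come from that sketch, not necessarily from the library's own API). It reproduces the unsmoothed score of 0 and shows how add-one smoothing yields a non-zero score.

```python
# Sentence pair from the worked example in Section 5
candidate = "the cat is on the mat".split()
reference = "the cat sits on the mat".split()

# Unsmoothed: P4 = 0 collapses the geometric mean, so BLEU = 0
bleu, precisions, bp = compute_bleu([[reference]], [candidate], smooth=False)
print(precisions)   # [0.833..., 0.6, 0.25, 0.0]
print(bleu, bp)     # 0.0 1.0

# With Lin et al. (2004) add-one smoothing the score becomes non-zero
bleu_smooth, _, _ = compute_bleu([[reference]], [candidate], smooth=True)
print(round(bleu_smooth, 3))  # ~0.489
```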