|
|
@@ -1,50 +1,64 @@
|
|
|
+<div id="top"></div>
|
|
|
+<div align="center">
|
|
|
|
|
|
+[](https://github.com/magicpdf/Magic-PDF)
|
|
|
+[](https://github.com/magicpdf/Magic-PDF)
|
|
|
+[](https://github.com/magicpdf/Magic-PDF/tree/main/LICENSE)
|
|
|
+[](https://github.com/magicpdf/Magic-PDF/issues)
|
|
|
+[](https://github.com/magicpdf/Magic-PDF/issues)
|
|
|
+
|
|
|
+[English](README.md) | [简体中文](README_zh-CN.md)
|
|
|
+
|
|
|
+</div>
|
|
|
+
|
|
|
+<div align="center">
|
|
|
+
|
|
|
+</div>
|
|
|
|
|
|
# Magic-PDF
|
|
|
|
|
|
-Convert PDFs to Markdown documents conveniently and accurately
|
|
|
+## Introduction
|
|
|
|
|
|
+Magic-PDF converts PDF documents to Markdown. It can process files stored locally or in object storage that supports the S3 protocol.
|
|
|
|
|
|
-### Getting Started Guide
|
|
|
+Key features include:
|
|
|
|
|
|
-###### Prerequisites for development
|
|
|
+- Support for multiple front-end model inputs
|
|
|
+- Removal of headers, footers, footnotes, and page numbers
|
|
|
+- Human-readable layout formatting
|
|
|
+- Extraction and display of images and tables within markdown
|
|
|
+- Conversion of equations into LaTeX format
|
|
|
+- Automatic detection and conversion of garbled PDFs
|
|
|
+- Compatibility with CPU and GPU environments
|
|
|
+- Available for Windows, Linux, and macOS platforms
|
|
|
|
|
|
-python 3.9+
|
|
|
+## Getting Started
|
|
|
|
|
|
-###### **Installation steps**
|
|
|
+### Requirements
|
|
|
|
|
|
-1.Clone the repo
|
|
|
+- Python 3.9 or newer
|
|
|
|
|
|
-```sh
|
|
|
-git clone https://github.com/magicpdf/Magic-PDF.git
|
|
|
-```
|
|
|
+### Usage Instructions
|
|
|
|
|
|
-2.Install the requirements
|
|
|
+1. **Install Magic-PDF**
|
|
|
|
|
|
-```sh
|
|
|
-cd Magic-PDF
|
|
|
-pip install -r requirements.txt
|
|
|
+```bash
|
|
|
+pip install "magic-pdf[cpu]" # Install the CPU version
|
|
|
+# or
|
|
|
+pip install "magic-pdf[gpu]" # Install the GPU version
|
|
|
```
|
|
|
|
|
|
-3.Run the command line
|
|
|
+2. **Usage via Command Line**
|
|
|
|
|
|
-```sh
|
|
|
-linux/osx
|
|
|
-export PYTHONPATH=.
|
|
|
-win
|
|
|
-$env:PYTHONPATH += ";.\Magic-PDF\magic_pdf"
|
|
|
-```
|
|
|
-```
|
|
|
-python magic_pdf/cli/magicpdf.py --help
|
|
|
+```bash
|
|
|
+magic-pdf --help
|
|
|
```
|
|
|
|
|
|
-### License
|
|
|
+## License Information
|
|
|
|
|
|
-[LICENSE.md](https://github.com/magicpdf/Magic-PDF/blob/master/LICENSE.md)
|
|
|
+See [LICENSE.md](https://github.com/magicpdf/Magic-PDF/blob/master/LICENSE.md) for details.
|
|
|
|
|
|
-### Acknowledgments
|
|
|
+## Acknowledgments
|
|
|
|
|
|
+- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
|
|
|
- [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
|
|
|
-
|
|
|
-
|
|
|
-
|
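
The new Requirements section asks for Python 3.9 or newer. A minimal sketch of how a user could check this before running `pip install` is shown here. The function name `meets_requirement` and the constant `MINIMUM_PYTHON` are hypothetical names chosen for illustration and are not part of Magic-PDF.

```python
import sys

# Minimum version taken from the README's "Python 3.9 or newer" requirement.
MINIMUM_PYTHON = (3, 9)


def meets_requirement(version=None, minimum=MINIMUM_PYTHON):
    """Return True if `version` is at least `minimum` (major, minor).

    `version` defaults to the running interpreter's version.
    This helper is illustrative only; it is not a Magic-PDF API.
    """
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_requirement():
        found = ".".join(map(str, sys.version_info[:3]))
        sys.exit(f"Python 3.9 or newer is required, found {found}")
    print("Python version OK")
```

Running the script with the interpreter that will be used for installation exits with a non-zero status and a message if that interpreter is older than 3.9.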