{ "cells": [ { "cell_type": "markdown", "id": "874f60a8", "metadata": {}, "source": [ "#### 练习7. Naive RAG\n", "1. 下载上市公司年报数据,数据要求\n", " 1. 9家公司:3家上交所,3家港交所,3家美国上市公司\n", " 2. 每一家上市公司至少过去连续3年的财报文件\n", "2. Indexing:对文件进行读取,切片,向量化保存到向量库中\n", "3. Retrieval & Generation:提出任意和年报有关的问题,能够精确的找出相关问题的片段并回答。问题至少包括以下几个方面\n", " 1. 语义理解类型问题:xxxx公司xxxx年业绩主要的变化\n", " 2. 指标提取类:xxxx年xx 公司\n", " 3. 【可选】指标计算类:xxxx年xx公司的流动比率(Current Ratio)是多少?\n" ] }, { "cell_type": "code", "execution_count": 68, "id": "a3404d48", "metadata": {}, "outputs": [], "source": [ "import os\n", "from dotenv import load_dotenv\n", "from langchain_community.embeddings import JinaEmbeddings\n", "from langchain_openai import ChatOpenAI\n", "import weaviate\n", "from langchain_weaviate import WeaviateVectorStore\n", "\n", "load_dotenv()\n", "os.environ[\"HF_ENDPOINT\"] = \"https://hf-mirror.com\"\n", "jina_api_key = os.getenv('JINA_API_KEY')\n", "qwen_base_url = os.getenv('BAILIAN_API_BASE_URL')\n", "qwen_api_key = os.getenv('BAILIAN_API_KEY')\n", "\n", "embeddings = JinaEmbeddings(\n", " jina_api_key=jina_api_key,\n", " model_name=\"jina-embeddings-v4\",\n", ")\n", "\n", "client = weaviate.connect_to_local()\n", "\n", "data_dir = \"data/年报\"\n", "\n", "# vectorstore = WeaviateVectorStore(\n", "# client=client,\n", "# index_name=\"AnnualReports\",\n", "# text_key=\"content\",\n", "# embedding=embeddings\n", "# )\n", "\n", "vectorstore = WeaviateVectorStore(\n", " client=client,\n", " index_name=\"AnnualReports_test7\",\n", " text_key=\"text\",\n", " embedding=embeddings\n", ")\n", "\n", "llm = ChatOpenAI(\n", " base_url=qwen_base_url,\n", " api_key=qwen_api_key,\n", " model=\"qwen3-30b-a3b\",\n", " temperature=0.1,\n", " streaming=True\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "52b58233", "metadata": {}, "outputs": [], "source": [ "from langchain.docstore.document import Document\n", "from unstructured.partition.pdf import partition_pdf\n", "from unstructured.chunking.title import chunk_by_title\n", "\n", "def load_with_unstructured(file_path, metadata=None):\n", " \"\"\"\n", " 使用unstructured加载PDF文档,保持结构化内容\n", " \"\"\"\n", " if metadata is None:\n", " metadata = {\"source\": file_path}\n", " \n", " try:\n", " # 使用unstructured处理PDF,保持表格结构\n", " elements = partition_pdf(\n", " filename=file_path,\n", " strategy=\"auto\",\n", " infer_table_structure=True, # 推断表格结构\n", " include_page_breaks=True, # 包含页面分隔\n", " extract_images_in_pdf=False, # 不提取图片\n", " languages=[\"eng\", \"chi_sim\", \"chi_tra\"] # 支持中英文\n", " )\n", " \n", " documents = []\n", " \n", " for element in elements:\n", " # 检查元素类型\n", " element_type = str(type(element)).split('.')[-1].replace(\"'>\", \"\")\n", " \n", " # 获取元素内容\n", " content = str(element).strip()\n", " if not content:\n", " continue\n", " \n", " # 创建文档\n", " doc = Document(\n", " page_content=content,\n", " metadata={\n", " **metadata,\n", " \"element_type\": element_type,\n", " \"page_number\": getattr(element, \"metadata\", {})[\"page_number\"]\n", " }\n", " )\n", " documents.append(doc)\n", " \n", " return documents\n", " \n", " except Exception as e:\n", " print(f\"使用unstructured处理文件时出错: {str(e)}\")\n", " # 如果unstructured失败,降级使用PyPDF\n", " print(f\" 尝试使用PyPDF作为备选方案...\")\n", " return load_with_pypdf_fallback(file_path, metadata)\n", "\n", "def load_with_pypdf_fallback(file_path, metadata=None):\n", " \"\"\"\n", " 当unstructured失败时的PyPDF备选方案\n", " \"\"\"\n", " try:\n", " from langchain.document_loaders import PyPDFLoader\n", " \n", " loader = PyPDFLoader(file_path)\n", " documents = loader.load()\n", " \n", " processed_docs = []\n", " for doc in documents:\n", " doc.metadata.update(metadata or {})\n", " doc.metadata['element_type'] = 'text' # PyPDF只能处理文本\n", " processed_docs.append(doc)\n", " \n", " print(f\" ✓ PyPDF备选方案成功处理\")\n", " return processed_docs\n", " \n", " except Exception as e:\n", " print(f\" ❌ PyPDF备选方案也失败: {str(e)}\")\n", " return []\n", "\n", "def create_smart_chunks(documents, max_chunk_size=1000, chunk_overlap=200):\n", " \"\"\"\n", " 创建智能分块,确保表格完整性\n", " \"\"\"\n", " chunked_docs = []\n", " \n", " for doc in documents:\n", " element_type = doc.metadata[\"element_type\"]\n", " \n", " # 表格类型的文档保持完整,不进行分块\n", " if \"Table\" in element_type:\n", " chunked_docs.append(doc)\n", " else:\n", " # 对文本内容进行分块\n", " content = doc.page_content\n", " \n", " if len(content) <= max_chunk_size:\n", " chunked_docs.append(doc)\n", " else:\n", " # 按段落分割\n", " paragraphs = content.split('\\n\\n')\n", " current_chunk = \"\"\n", " \n", " for paragraph in paragraphs:\n", " # 检查添加当前段落是否会超出大小限制\n", " potential_chunk = current_chunk + \"\\n\\n\" + paragraph if current_chunk else paragraph\n", " \n", " if len(potential_chunk) <= max_chunk_size:\n", " current_chunk = potential_chunk\n", " else:\n", " # 保存当前块\n", " if current_chunk:\n", " chunk_doc = Document(\n", " page_content=current_chunk,\n", " metadata=doc.metadata.copy()\n", " )\n", " chunked_docs.append(chunk_doc)\n", " \n", " # 开始新块\n", " current_chunk = paragraph\n", " \n", " # 保存最后一块\n", " if current_chunk:\n", " chunk_doc = Document(\n", " page_content=current_chunk,\n", " metadata=doc.metadata.copy()\n", " )\n", " chunked_docs.append(chunk_doc)\n", " \n", " return chunked_docs\n", "\n", "def process_documents_with_unstructured(data_dir, vectorstore):\n", " \"\"\"\n", " 使用unstructured处理文档并索引到向量存储\n", " \"\"\"\n", " total_chunks = 0\n", " successful_files = 0\n", " failed_files = 0\n", " \n", " for company_folder in os.listdir(data_dir):\n", " company_path = os.path.join(data_dir, company_folder)\n", " if os.path.isdir(company_path):\n", " print(f\"\\n正在处理:{company_folder} 的年报...\")\n", " \n", " # 获取PDF文件\n", " pdf_files = [f for f in os.listdir(company_path) if f.endswith('.pdf')]\n", " \n", " for pdf_file in pdf_files:\n", " pdf_path = os.path.join(company_path, pdf_file)\n", " print(f\"\\n 加载文件:{pdf_file}\")\n", " \n", " try:\n", " # 使用unstructured加载文档\n", " metadata = {\n", " 'company': company_folder,\n", " 'file': pdf_file\n", " }\n", " documents = load_with_unstructured(pdf_path, metadata)\n", " \n", " if not documents:\n", " print(f\" ❌ 无法从文件 {pdf_file} 中提取内容\")\n", " failed_files += 1\n", " continue\n", " \n", " # 创建智能分块\n", " split_docs = create_smart_chunks(documents)\n", " print(f\" 分割成 {len(split_docs)} 个文本块(包含表格块)\")\n", " \n", " # 统计表格和文本块数量\n", " table_count = sum(1 for doc in split_docs if \"Table\" in doc.metadata[\"element_type\"])\n", " text_count = len(split_docs) - table_count\n", " print(f\" - 表格块: {table_count} 个\")\n", " print(f\" - 文本块: {text_count} 个\")\n", " \n", " # 添加到向量存储\n", " print(\" 添加文档到向量存储...\")\n", " vectorstore.add_documents(split_docs)\n", " \n", " total_chunks += len(split_docs)\n", " successful_files += 1\n", " print(f\" ✓ 已处理完成,累计文本块数:{total_chunks}\")\n", " \n", " except Exception as e:\n", " print(f\" ❌ 处理文件 {pdf_file} 时出错: {str(e)}\")\n", " failed_files += 1\n", " continue\n", " \n", " print(f\"\\n🎉 所有文档处理完成!\")\n", " print(f\"📊 总计处理了 {total_chunks} 个文本块\")\n", " print(f\"✅ 成功处理 {successful_files} 个文件\")\n", " print(f\"❌ 失败 {failed_files} 个文件\")\n", " print(\"✅ 所有文档已成功索引到 Weaviate!\")\n", "\n", "# 执行文档处理\n", "process_documents_with_unstructured(data_dir, vectorstore)" ] }, { "cell_type": "code", "execution_count": 69, "id": "6587f75c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MinerU JSON处理函数已定义完成\n" ] } ], "source": [ "# MinerU解析后json处理\n", "import json\n", "import os\n", "from langchain.docstore.document import Document\n", "from bs4 import BeautifulSoup\n", "import pandas as pd\n", "from io import StringIO\n", "\n", "def extract_text_from_block(block):\n", " \"\"\"从块中提取文本内容\"\"\"\n", " text_parts = []\n", " \n", " if 'lines' in block:\n", " for line in block['lines']:\n", " if 'spans' in line:\n", " for span in line['spans']:\n", " if span.get('type') == 'text' and 'content' in span:\n", " text_parts.append(span['content'])\n", " \n", " return ' '.join(text_parts).strip()\n", "\n", "def extract_table_html_from_block(block):\n", " \"\"\"从表格块中提取HTML内容\"\"\"\n", " if block.get('type') != 'table':\n", " return None\n", " \n", " # 在blocks -> lines -> spans中查找HTML\n", " blocks = block.get('blocks', [])\n", " for sub_block in blocks:\n", " lines = sub_block.get('lines', [])\n", " for line in lines:\n", " spans = line.get('spans', [])\n", " for span in spans:\n", " if span.get('type') == 'table' and 'html' in span:\n", " return span['html']\n", " \n", " return None\n", "\n", "def extract_table_text_with_pandas(table_html):\n", " \"\"\"使用pandas从表格HTML中提取结构化文本内容\"\"\"\n", " if not table_html:\n", " return \"\"\n", " \n", " try:\n", " # 使用pandas读取HTML表格\n", " dfs = pd.read_html(StringIO(table_html))\n", " if not dfs:\n", " return \"\"\n", " \n", " # 获取第一个表格(通常只有一个)\n", " df = dfs[0]\n", " \n", " # 处理空值和清理数据\n", " df = df.fillna('')\n", " \n", " # 转换为更适合嵌入的文本格式\n", " table_text_parts = []\n", " \n", " # 添加列标题\n", " headers = [str(col) for col in df.columns if str(col) != 'nan']\n", " if headers:\n", " table_text_parts.append(\"表格列标题: \" + \" | \".join(headers))\n", " \n", " # 添加每一行数据\n", " for idx, row in df.iterrows():\n", " row_data = []\n", " for col_idx, value in enumerate(row):\n", " if str(value).strip() and str(value) != 'nan':\n", " # 如果有列名,使用\"列名: 值\"的格式\n", " if col_idx < len(headers) and headers[col_idx]:\n", " row_data.append(f\"{headers[col_idx]}: {value}\")\n", " else:\n", " row_data.append(str(value))\n", " \n", " if row_data:\n", " table_text_parts.append(\" | \".join(row_data))\n", " \n", " result = '\\n'.join(table_text_parts)\n", " \n", " # 如果pandas处理失败,降级使用BeautifulSoup\n", " if not result.strip():\n", " return extract_table_text_fallback(table_html)\n", " \n", " return result\n", " \n", " except Exception as e:\n", " print(f\"pandas表格解析错误,降级使用BeautifulSoup: {e}\")\n", " return extract_table_text_fallback(table_html)\n", "\n", "def extract_table_text_fallback(table_html):\n", " \"\"\"当pandas失败时的BeautifulSoup备选方案\"\"\"\n", " if not table_html:\n", " return \"\"\n", " \n", " try:\n", " soup = BeautifulSoup(table_html, 'html.parser')\n", " table = soup.find('table')\n", " if not table:\n", " return \"\"\n", " \n", " rows = []\n", " for tr in table.find_all('tr'):\n", " cells = []\n", " for td in tr.find_all(['td', 'th']):\n", " cell_text = td.get_text(strip=True)\n", " if cell_text:\n", " cells.append(cell_text)\n", " if cells:\n", " rows.append(' | '.join(cells))\n", " \n", " return '\\n'.join(rows)\n", " except Exception as e:\n", " print(f\"BeautifulSoup表格解析也失败: {e}\")\n", " return \"\"\n", "\n", "def extract_table_text(table_html):\n", " \"\"\"优化版表格文本提取 - 简洁清晰的格式\"\"\"\n", " if not table_html:\n", " return \"\"\n", " \n", " try:\n", " # 使用pandas提取表格结构\n", " dfs = pd.read_html(StringIO(table_html))\n", " if dfs:\n", " df = dfs[0].fillna('')\n", " \n", " # 智能识别表格类型\n", " html_lower = table_html.lower()\n", " content_text = ' '.join([str(cell) for row in df.values for cell in row if pd.notna(cell)]).lower()\n", " \n", " # 识别表格类型\n", " table_type = \"\"\n", " if (any(term in html_lower for term in ['consolidated statements of operations', 'income statement', 'profit loss', 'earnings', '合并利润表', '利润表', '收益表']) or\n", " any(term in content_text for term in ['net sales', 'total net sales', 'net income', 'gross profit', 'operating income', 'revenue', '营业收入', '净利润', '营业利润', '利润总额', '基本每股收益'])):\n", " table_type = \"合并利润表/损益表\"\n", " \n", " elif (any(term in html_lower for term in ['balance sheet', 'financial position', 'statement of financial position', '合并资产负债表', '资产负债表', '财务状况表']) or\n", " any(term in content_text for term in ['total assets', 'current assets', 'total liabilities', 'shareholders equity', '资产总计', '流动资产', '非流动资产', '负债总计', '股东权益', '所有者权益', '流动负债', '非流动负债'])):\n", " table_type = \"合并资产负债表\"\n", " \n", " elif (any(term in html_lower for term in ['cash flow', 'cash flows', '合并现金流量表', '现金流量表']) or\n", " any(term in content_text for term in ['operating activities', 'investing activities', 'financing activities', '经营活动产生的现金流量', '投资活动产生的现金流量', '筹资活动产生的现金流量', '现金及现金等价物'])):\n", " table_type = \"合并现金流量表\"\n", " \n", " else:\n", " table_type = \"财务数据表\"\n", " \n", " # 生成结构化文本摘要\n", " table_rows = []\n", " for _, row in df.iterrows():\n", " row_data = []\n", " for col_idx, value in enumerate(row):\n", " if pd.notna(value) and str(value).strip():\n", " text = str(value).strip()\n", " col_name = str(df.columns[col_idx]) if col_idx < len(df.columns) and str(df.columns[col_idx]) != 'nan' else \"\"\n", " \n", " # 只对重要的财务数据添加列名\n", " if col_name and any(term in text.lower() for term in [\n", " 'assets', 'liabilities', 'equity', 'revenue', 'income', 'profit', 'cash',\n", " '资产', '负债', '权益', '收入', '利润', '现金', '流动', '非流动'\n", " ]):\n", " row_data.append(f\"{col_name}: {text}\")\n", " else:\n", " row_data.append(text)\n", " \n", " if row_data:\n", " table_rows.append(\" | \".join(row_data))\n", " \n", " # 构建最终输出 - 更简洁的格式\n", " if table_type and table_rows:\n", " result = f\"\"\"[{table_type}]\n", "{chr(10).join(table_rows[:10])}\n", "\n", "{table_html.strip()}\"\"\"\n", " else:\n", " result = table_html.strip()\n", " \n", " return result\n", " \n", " except Exception as e:\n", " print(f\"表格解析失败,使用原始HTML: {e}\")\n", " return table_html.strip()\n", " \n", " return table_html.strip()\n", "\n", "def extract_table_text_hybrid(table_html):\n", " \"\"\"混合方案:HTML + 结构化摘要\"\"\"\n", " if not table_html:\n", " return \"\"\n", " \n", " # 1. 保留完整HTML(供展示)\n", " html_content = table_html.strip()\n", " \n", " # 2. 添加结构化文本摘要(供搜索)\n", " try:\n", " dfs = pd.read_html(StringIO(table_html))\n", " if dfs:\n", " df = dfs[0].fillna('')\n", " \n", " # 生成关键摘要\n", " summary_parts = [\"=== 表格数据摘要 ===\"]\n", " for _, row in df.iterrows():\n", " row_text = []\n", " for col, value in zip(df.columns, row):\n", " if str(value).strip() and str(value) != 'nan':\n", " row_text.append(f\"{col}: {value}\")\n", " if row_text:\n", " summary_parts.append(\" | \".join(row_text))\n", " \n", " summary = \"\\n\".join(summary_parts)\n", " \n", " # 3. 组合:摘要 + HTML\n", " return f\"{summary}\\n\\n=== 原始表格 ===\\n{html_content}\"\n", " except:\n", " pass\n", " \n", " return html_content\n", "\n", "def process_mineru_json(json_file_path, metadata=None):\n", " \"\"\"\n", " 处理MinerU生成的JSON文件 - 改进版,解决表格分割问题\n", " \n", " Args:\n", " json_file_path: JSON文件路径\n", " metadata: 额外的元数据\n", " \n", " Returns:\n", " List[Document]: 处理后的文档列表\n", " \"\"\"\n", " if metadata is None:\n", " metadata = {\"source\": json_file_path}\n", " \n", " try:\n", " with open(json_file_path, 'r', encoding='utf-8') as f:\n", " data = json.load(f)\n", " \n", " documents = []\n", " \n", " # 处理每一页\n", " for page_idx, page_info in enumerate(data.get('pdf_info', []), 1):\n", " para_blocks = page_info.get('para_blocks', [])\n", " \n", " # 找出所有表格块的位置\n", " table_blocks_info = []\n", " for i, block in enumerate(para_blocks):\n", " if block.get('type') == 'table':\n", " table_blocks_info.append({'index': i, 'block': block})\n", " \n", " # 改进的表格合并逻辑\n", " processed_table_indices = set()\n", " \n", " for table_info in table_blocks_info:\n", " table_idx = table_info['index']\n", " if table_idx in processed_table_indices:\n", " continue\n", " \n", " # 收集相邻的表格块(更激进的合并策略)\n", " related_tables = [table_info]\n", " related_indices = [table_idx]\n", " \n", " # 向前和向后查找相邻的表格块\n", " for other_table in table_blocks_info:\n", " other_idx = other_table['index']\n", " if other_idx != table_idx and other_idx not in processed_table_indices:\n", " # 如果距离很近(8个块以内),尝试合并\n", " if abs(other_idx - table_idx) <= 8:\n", " # 检查是否是财务相关的表格\n", " if is_financial_related_table(table_info['block'], other_table['block']):\n", " related_tables.append(other_table)\n", " related_indices.append(other_idx)\n", " \n", " # 标记这些表格为已处理\n", " processed_table_indices.update(related_indices)\n", " \n", " # 收集上下文标题\n", " title_blocks = []\n", " min_idx = min(related_indices)\n", " max_idx = max(related_indices)\n", " \n", " # 扩大搜索范围,查找更远的上下文\n", " for j in range(max(0, min_idx - 10), min_idx):\n", " if j not in related_indices:\n", " prev_block = para_blocks[j]\n", " prev_text = extract_text_from_block(prev_block)\n", " \n", " # 更宽松的标题识别条件\n", " if prev_text and len(prev_text) < 500:\n", " if any(keyword in prev_text.lower() for keyword in [\n", " 'consolidated statements of operations', 'income statement', \n", " 'statements of operations', 'operations', 'financial',\n", " 'consolidated', 'statements', 'apple', 'net sales', 'net income',\n", " 'table', 'statement', 'income', 'balance', 'cash', 'revenue', \n", " 'years ended', 'comprehensive income', 'financial position',\n", " '报表', '财务', '收益', '利润', '损益', '资产负债', '现金流'\n", " ]):\n", " title_blocks.insert(0, prev_text)\n", " \n", " # 构建合并的表格内容\n", " merged_content_parts = []\n", " \n", " # 添加标题\n", " if title_blocks:\n", " merged_content_parts.extend(title_blocks)\n", " \n", " # 合并所有相关表格\n", " all_table_htmls = []\n", " all_table_texts = []\n", " \n", " for table_info in sorted(related_tables, key=lambda x: x['index']):\n", " table_html = extract_table_html_from_block(table_info['block'])\n", " if table_html:\n", " all_table_htmls.append(table_html)\n", " table_text = extract_table_text(table_html)\n", " if table_text:\n", " all_table_texts.append(table_text)\n", " else:\n", " # 如果没有HTML,尝试提取文本\n", " table_text = extract_text_from_block(table_info['block'])\n", " if table_text:\n", " all_table_texts.append(table_text)\n", " \n", " # 如果有多个表格,创建一个综合的表格文档\n", " if all_table_texts:\n", " if len(all_table_texts) > 1:\n", " # 多表格合并 - 标记为merged类型\n", " combined_text = \"\\n\\n=== 合并表格内容 ===\\n\\n\".join(all_table_texts)\n", " merged_content_parts.append(combined_text)\n", " element_type = \"table_html_merged\"\n", " else:\n", " # 单个表格\n", " merged_content_parts.append(all_table_texts[0])\n", " element_type = \"table_html\"\n", " \n", " if merged_content_parts:\n", " table_doc = Document(\n", " page_content='\\n\\n'.join(merged_content_parts),\n", " metadata={\n", " **metadata,\n", " \"page_number\": page_idx,\n", " \"element_type\": element_type,\n", " \"block_indices\": related_indices,\n", " \"table_count\": len(related_tables)\n", " }\n", " )\n", " documents.append(table_doc)\n", " \n", " # 处理非表格内容(保持原有逻辑)\n", " current_chunk = []\n", " current_chunk_size = 0\n", " max_chunk_size = 1000\n", " min_chunk_size = 100\n", " \n", " for i, block in enumerate(para_blocks):\n", " if block.get('type') != 'table': # 跳过表格块\n", " text_content = extract_text_from_block(block)\n", " if text_content:\n", " if current_chunk_size + len(text_content) > max_chunk_size and current_chunk:\n", " chunk_content = '\\n\\n'.join(current_chunk)\n", " if len(chunk_content) >= min_chunk_size:\n", " chunk_doc = Document(\n", " page_content=chunk_content,\n", " metadata={\n", " **metadata,\n", " \"page_number\": page_idx,\n", " \"element_type\": \"text\"\n", " }\n", " )\n", " documents.append(chunk_doc)\n", " current_chunk = [text_content]\n", " current_chunk_size = len(text_content)\n", " else:\n", " current_chunk.append(text_content)\n", " current_chunk_size += len(text_content)\n", " \n", " # 处理页面结束时剩余的chunk\n", " if current_chunk:\n", " chunk_content = '\\n\\n'.join(current_chunk)\n", " if len(chunk_content) >= min_chunk_size:\n", " chunk_doc = Document(\n", " page_content=chunk_content,\n", " metadata={\n", " **metadata,\n", " \"page_number\": page_idx,\n", " \"element_type\": \"text\"\n", " }\n", " )\n", " documents.append(chunk_doc)\n", " \n", " return documents\n", " \n", " except Exception as e:\n", " print(f\"处理JSON文件时出错: {e}\")\n", " return []\n", "\n", "def is_financial_related_table(block1, block2):\n", " \"\"\"\n", " 检查两个表格块是否是相关的财务表格\n", " \"\"\"\n", " # 提取两个表格的HTML内容\n", " html1 = extract_table_html_from_block(block1)\n", " html2 = extract_table_html_from_block(block2)\n", " \n", " if not html1 or not html2:\n", " return False\n", " \n", " financial_keywords = [\n", " 'net sales', 'total net sales', 'revenue', 'income', 'profit', 'loss',\n", " 'operating', 'gross', 'expenses', 'cost', '$', 'million', 'billion',\n", " 'consolidated', 'statements', 'operations',\n", " '营业收入', '营业利润', '利润总额', '净利润', '人民币', '百万元', '千元',\n", " '资产', '负债', '权益', '现金流量', '合并', '报表', '财务',\n", " '通信服务收入', '销售产品收入', '网络运营', '折旧', '摊销',\n", " '流动资产', '非流动资产', '流动负债', '股东权益', '每股收益'\n", " ]\n", " \n", " html1_lower = html1.lower()\n", " html2_lower = html2.lower()\n", " \n", " # 计算每个表格包含的财务关键词数量\n", " count1 = sum(1 for kw in financial_keywords if kw in html1_lower)\n", " count2 = sum(1 for kw in financial_keywords if kw in html2_lower)\n", " \n", " # 如果都包含足够的财务关键词,认为是相关的\n", " return count1 >= 2 and count2 >= 2\n", "\n", "def process_mineru_directory(data_dir, vectorstore, skip_files=None, batch_size=100):\n", " \"\"\"\n", " 处理包含MinerU JSON文件的目录\n", " \n", " Args:\n", " data_dir: 数据目录路径\n", " vectorstore: 向量存储实例\n", " skip_files: 要跳过的文件列表 (默认为None)\n", " batch_size: 批处理大小,避免超过向量化限制 (默认100)\n", " \"\"\"\n", " if skip_files is None:\n", " skip_files = []\n", " \n", " total_chunks = 0\n", " successful_files = 0\n", " failed_files = 0\n", " processed_files = []\n", " \n", " for company_folder in os.listdir(data_dir):\n", " company_path = os.path.join(data_dir, company_folder)\n", " if os.path.isdir(company_path):\n", " print(f\"\\n正在处理:{company_folder} 的年报...\")\n", " \n", " # 获取JSON文件\n", " json_files = [f for f in os.listdir(company_path) if f.endswith('.json')]\n", " \n", " for json_file in json_files:\n", " # 检查是否需要跳过此文件\n", " if json_file in skip_files:\n", " print(f\"\\n ⏭️ 跳过文件:{json_file}\")\n", " continue\n", " \n", " json_path = os.path.join(company_path, json_file)\n", " print(f\"\\n 加载文件:{json_file}\")\n", " \n", " try:\n", " # 验证JSON文件格式\n", " with open(json_path, 'r', encoding='utf-8') as f:\n", " test_data = json.load(f)\n", " \n", " # 处理JSON文件\n", " metadata = {\n", " 'company': company_folder,\n", " 'file': json_file.replace('.json', '.pdf') # 假设JSON对应PDF\n", " }\n", " documents = process_mineru_json(json_path, metadata)\n", " \n", " if not documents:\n", " print(f\" ❌ 无法从文件 {json_file} 中提取内容\")\n", " failed_files += 1\n", " continue\n", " \n", " print(f\" 提取到 {len(documents)} 个文档块\")\n", " \n", " # 统计表格和文本块数量\n", " table_count = sum(1 for doc in documents if doc.metadata.get(\"element_type\") == \"table_html\")\n", " text_count = len(documents) - table_count\n", " print(f\" - HTML表格块: {table_count} 个\")\n", " print(f\" - 文本块: {text_count} 个\")\n", " \n", " # 分批添加到向量存储,避免超过512项限制\n", " print(f\" 分批添加文档到向量存储(批大小: {batch_size})...\")\n", " for i in range(0, len(documents), batch_size):\n", " batch = documents[i:i + batch_size]\n", " try:\n", " vectorstore.add_documents(batch)\n", " print(f\" ✓ 已处理批次 {i//batch_size + 1}/{(len(documents)-1)//batch_size + 1}\")\n", " except Exception as batch_error:\n", " print(f\" ❌ 批次 {i//batch_size + 1} 处理失败: {str(batch_error)}\")\n", " # 如果批处理失败,尝试单个处理\n", " print(f\" 🔄 尝试单个文档处理...\")\n", " for doc in batch:\n", " try:\n", " vectorstore.add_documents([doc])\n", " except Exception as single_error:\n", " print(f\" ❌ 单个文档处理失败: {str(single_error)}\")\n", " continue\n", " \n", " total_chunks += len(documents)\n", " successful_files += 1\n", " processed_files.append(json_file)\n", " print(f\" ✓ 已处理完成,累计文本块数:{total_chunks}\")\n", " \n", " except json.JSONDecodeError as je:\n", " print(f\" ❌ JSON格式错误 {json_file}: {str(je)}\")\n", " print(f\" 建议检查文件是否完整或重新生成\")\n", " failed_files += 1\n", " continue\n", " except Exception as e:\n", " error_msg = str(e)\n", " if \"512 items\" in error_msg:\n", " print(f\" ❌ 文档数量超限 {json_file}: 尝试减小batch_size\")\n", " elif \"Internal server error\" in error_msg:\n", " print(f\" ❌ 服务器错误 {json_file}: 建议稍后重试\")\n", " else:\n", " print(f\" ❌ 处理文件 {json_file} 时出错: {error_msg}\")\n", " failed_files += 1\n", " continue\n", " \n", " print(f\"\\n🎉 所有JSON文档处理完成!\")\n", " print(f\"📊 总计处理了 {total_chunks} 个文本块\")\n", " print(f\"✅ 成功处理 {successful_files} 个文件\")\n", " print(f\"❌ 失败 {failed_files} 个文件\")\n", " \n", " if processed_files:\n", " print(f\"\\n✅ 成功处理的文件:\")\n", " for pf in processed_files:\n", " print(f\" - {pf}\")\n", " \n", " if failed_files > 0:\n", " print(f\"\\n⚠️ 失败文件处理建议:\")\n", " print(f\" - JSON格式错误:检查文件完整性,重新生成MinerU解析\")\n", " print(f\" - 512项限制:使用更小的batch_size参数\")\n", " print(f\" - 服务器错误:稍后重试,或检查网络连接\")\n", " print(f\" - 使用skip_files参数跳过已处理的文件\")\n", " \n", " print(\"✅ 所有文档已成功索引到 Weaviate!\")\n", " \n", " return {\n", " 'total_chunks': total_chunks,\n", " 'successful_files': successful_files,\n", " 'failed_files': failed_files,\n", " 'processed_files': processed_files\n", " }\n", "\n", "def retry_failed_files(data_dir, vectorstore, failed_files, batch_size=30):\n", " \"\"\"\n", " 专门重试失败的文件\n", " \"\"\"\n", " print(f\"🔄 重试 {len(failed_files)} 个失败的文件...\")\n", " \n", " for failed_file in failed_files:\n", " print(f\"\\n🎯 重试处理: {failed_file}\")\n", " \n", " # 查找文件所在的公司文件夹\n", " found = False\n", " for company_folder in os.listdir(data_dir):\n", " company_path = os.path.join(data_dir, company_folder)\n", " if os.path.isdir(company_path):\n", " json_path = os.path.join(company_path, failed_file)\n", " if os.path.exists(json_path):\n", " found = True\n", " print(f\" 📁 在 {company_folder} 中找到文件\")\n", " \n", " try:\n", " # 验证JSON格式\n", " with open(json_path, 'r', encoding='utf-8') as f:\n", " data = json.load(f)\n", " \n", " # 处理文件\n", " metadata = {\n", " 'company': company_folder,\n", " 'file': failed_file.replace('.json', '.pdf')\n", " }\n", " documents = process_mineru_json(json_path, metadata)\n", " \n", " if documents:\n", " print(f\" ✅ 提取到 {len(documents)} 个文档块\")\n", " \n", " # 超小批次处理\n", " for i in range(0, len(documents), batch_size):\n", " batch = documents[i:i + batch_size]\n", " try:\n", " vectorstore.add_documents(batch)\n", " print(f\" ✓ 批次 {i//batch_size + 1} 处理成功\")\n", " except Exception as e:\n", " print(f\" ❌ 批次失败: {str(e)}\")\n", " # 逐个处理\n", " for doc in batch:\n", " try:\n", " vectorstore.add_documents([doc])\n", " except:\n", " continue\n", " \n", " print(f\" ✅ {failed_file} 重试成功!\")\n", " else:\n", " print(f\" ❌ {failed_file} 无法提取内容\")\n", " \n", " except json.JSONDecodeError:\n", " print(f\" ❌ {failed_file} JSON格式错误,建议重新生成\")\n", " except Exception as e:\n", " print(f\" ❌ {failed_file} 重试失败: {str(e)}\")\n", " \n", " break\n", " \n", " if not found:\n", " print(f\" ❌ 未找到文件: {failed_file}\")\n", "\n", "# 如果您有MinerU JSON文件,请使用以下代码处理\n", "# process_mineru_directory(\"data/年报_json\", vectorstore)\n", "\n", "print(\"MinerU JSON处理函数已定义完成\")" ] }, { "cell_type": "code", "execution_count": 70, "id": "a8d18467", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "正在处理:中国移动(A股)[600941] 的年报...\n", "\n", " 加载文件:中国移动:2023年度报告.json\n", " 提取到 420 个文档块\n", " - HTML表格块: 194 个\n", " - 文本块: 226 个\n", " 分批添加文档到向量存储(批大小: 50)...\n", " 提取到 420 个文档块\n", " - HTML表格块: 194 个\n", " - 文本块: 226 个\n", " 分批添加文档到向量存储(批大小: 50)...\n", " ❌ 批次 1 处理失败: HTTPSConnectionPool(host='api.jina.ai', port=443): Max retries exceeded with url: /v1/embeddings (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1016)')))\n", " 🔄 尝试单个文档处理...\n", " ❌ 批次 1 处理失败: HTTPSConnectionPool(host='api.jina.ai', port=443): Max retries exceeded with url: /v1/embeddings (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1016)')))\n", " 🔄 尝试单个文档处理...\n", " ✓ 已处理批次 2/9\n", " ✓ 已处理批次 2/9\n", " ✓ 已处理批次 3/9\n", " ✓ 已处理批次 3/9\n", " ✓ 已处理批次 4/9\n", " ✓ 已处理批次 4/9\n", " ✓ 已处理批次 5/9\n", " ✓ 已处理批次 5/9\n", " ✓ 已处理批次 6/9\n", " ✓ 已处理批次 6/9\n", " ✓ 已处理批次 7/9\n", " ✓ 已处理批次 7/9\n", " ✓ 已处理批次 8/9\n", " ✓ 已处理批次 8/9\n", " ✓ 已处理批次 9/9\n", " ✓ 已处理完成,累计文本块数:420\n", "\n", "🎉 所有JSON文档处理完成!\n", "📊 总计处理了 420 个文本块\n", "✅ 成功处理 1 个文件\n", "❌ 失败 0 个文件\n", "\n", "✅ 成功处理的文件:\n", " - 中国移动:2023年度报告.json\n", "✅ 所有文档已成功索引到 Weaviate!\n", "\n", "📋 处理结果总结:\n", "✅ 本次成功: 1 个文件\n", "❌ 本次失败: 0 个文件\n", "📊 本次新增: 420 个文档块\n", " ✓ 已处理批次 9/9\n", " ✓ 已处理完成,累计文本块数:420\n", "\n", "🎉 所有JSON文档处理完成!\n", "📊 总计处理了 420 个文本块\n", "✅ 成功处理 1 个文件\n", "❌ 失败 0 个文件\n", "\n", "✅ 成功处理的文件:\n", " - 中国移动:2023年度报告.json\n", "✅ 所有文档已成功索引到 Weaviate!\n", "\n", "📋 处理结果总结:\n", "✅ 本次成功: 1 个文件\n", "❌ 本次失败: 0 个文件\n", "📊 本次新增: 420 个文档块\n" ] } ], "source": [ "skip_successful_files = []\n", "\n", "# 定义失败的文件,可以单独重试\n", "failed_files_to_retry = [\n", " '中国移动:2024年度报告.json',\n", " '腾讯音乐-SW:2022年度报告.json',\n", " '腾讯音乐-SW:2023年度报告.json', \n", " '腾讯音乐-SW:2024年度报告.json'\n", "]\n", "\n", "# 处理失败的文件,使用更小的批处理大小\n", "result = process_mineru_directory(\n", " data_dir, \n", " vectorstore, \n", " skip_files=skip_successful_files, # 跳过已成功处理的文件\n", " batch_size=50 # 使用更小的批处理大小,避免512项限制\n", ")\n", "\n", "print(f\"\\n📋 处理结果总结:\")\n", "print(f\"✅ 本次成功: {result['successful_files']} 个文件\")\n", "print(f\"❌ 本次失败: {result['failed_files']} 个文件\") \n", "print(f\"📊 本次新增: {result['total_chunks']} 个文档块\")" ] }, { "cell_type": "code", "execution_count": 11, "id": "34e62fe3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔄 重试 4 个失败的文件...\n", "\n", "🎯 重试处理: 中国移动:2024年度报告.json\n", " 📁 在 中国移动(A股)[600941] 中找到文件\n", " ✅ 提取到 439 个文档块\n", " ✅ 提取到 439 个文档块\n", " ✓ 批次 1 处理成功\n", " ✓ 批次 1 处理成功\n", " ✓ 批次 2 处理成功\n", " ✓ 批次 2 处理成功\n", " ✓ 批次 3 处理成功\n", " ✓ 批次 3 处理成功\n", " ✓ 批次 4 处理成功\n", " ✓ 批次 4 处理成功\n", " ✓ 批次 5 处理成功\n", " ✓ 批次 5 处理成功\n", " ✓ 批次 6 处理成功\n", " ✓ 批次 6 处理成功\n", " ✓ 批次 7 处理成功\n", " ✓ 批次 7 处理成功\n", " ✓ 批次 8 处理成功\n", " ✓ 批次 8 处理成功\n", " ✓ 批次 9 处理成功\n", " ✓ 批次 9 处理成功\n", " ✓ 批次 10 处理成功\n", " ✓ 批次 10 处理成功\n", " ✓ 批次 11 处理成功\n", " ✓ 批次 11 处理成功\n", " ✓ 批次 12 处理成功\n", " ✓ 批次 12 处理成功\n", " ✓ 批次 13 处理成功\n", " ✓ 批次 13 处理成功\n", " ✓ 批次 14 处理成功\n", " ✓ 批次 14 处理成功\n", " ✓ 批次 15 处理成功\n", " ✅ 中国移动:2024年度报告.json 重试成功!\n", "\n", "🎯 重试处理: 腾讯音乐-SW:2022年度报告.json\n", " 📁 在 腾讯音乐-SW(港股)[01698] 中找到文件\n", " ✓ 批次 15 处理成功\n", " ✅ 中国移动:2024年度报告.json 重试成功!\n", "\n", "🎯 重试处理: 腾讯音乐-SW:2022年度报告.json\n", " 📁 在 腾讯音乐-SW(港股)[01698] 中找到文件\n", " ✅ 提取到 513 个文档块\n", " ✅ 提取到 513 个文档块\n", " ✓ 批次 1 处理成功\n", " ✓ 批次 1 处理成功\n", " ✓ 批次 2 处理成功\n", " ✓ 批次 2 处理成功\n", " ✓ 批次 3 处理成功\n", " ✓ 批次 3 处理成功\n", " ✓ 批次 4 处理成功\n", " ✓ 批次 4 处理成功\n", " ✓ 批次 5 处理成功\n", " ✓ 批次 5 处理成功\n", " ✓ 批次 6 处理成功\n", " ✓ 批次 6 处理成功\n", " ✓ 批次 7 处理成功\n", " ✓ 批次 7 处理成功\n", " ✓ 批次 8 处理成功\n", " ✓ 批次 8 处理成功\n", " ✓ 批次 9 处理成功\n", " ✓ 批次 9 处理成功\n", " ✓ 批次 10 处理成功\n", " ✓ 批次 10 处理成功\n", " ✓ 批次 11 处理成功\n", " ✓ 批次 11 处理成功\n", " ✓ 批次 12 处理成功\n", " ✓ 批次 12 处理成功\n", " ✓ 批次 13 处理成功\n", " ✓ 批次 13 处理成功\n", " ✓ 批次 14 处理成功\n", " ✓ 批次 14 处理成功\n", " ✓ 批次 15 处理成功\n", " ✓ 批次 15 处理成功\n", " ✓ 批次 16 处理成功\n", " ✓ 批次 16 处理成功\n", " ✓ 批次 17 处理成功\n", " ✓ 批次 17 处理成功\n", " ✓ 批次 18 处理成功\n", " ✅ 腾讯音乐-SW:2022年度报告.json 重试成功!\n", "\n", "🎯 重试处理: 腾讯音乐-SW:2023年度报告.json\n", " 📁 在 腾讯音乐-SW(港股)[01698] 中找到文件\n", " ✓ 批次 18 处理成功\n", " ✅ 腾讯音乐-SW:2022年度报告.json 重试成功!\n", "\n", "🎯 重试处理: 腾讯音乐-SW:2023年度报告.json\n", " 📁 在 腾讯音乐-SW(港股)[01698] 中找到文件\n", " ✅ 提取到 504 个文档块\n", " ✅ 提取到 504 个文档块\n", " ✓ 批次 1 处理成功\n", " ✓ 批次 1 处理成功\n", " ✓ 批次 2 处理成功\n", " ✓ 批次 2 处理成功\n", " ✓ 批次 3 处理成功\n", " ✓ 批次 3 处理成功\n", " ✓ 批次 4 处理成功\n", " ✓ 批次 4 处理成功\n", " ✓ 批次 5 处理成功\n", " ✓ 批次 5 处理成功\n", " ✓ 批次 6 处理成功\n", " ✓ 批次 6 处理成功\n", " ✓ 批次 7 处理成功\n", " ✓ 批次 7 处理成功\n", " ✓ 批次 8 处理成功\n", " ✓ 批次 8 处理成功\n", " ✓ 批次 9 处理成功\n", " ✓ 批次 9 处理成功\n", " ✓ 批次 10 处理成功\n", " ✓ 批次 10 处理成功\n", " ✓ 批次 11 处理成功\n", " ✓ 批次 11 处理成功\n", " ✓ 批次 12 处理成功\n", " ✓ 批次 12 处理成功\n", " ✓ 批次 13 处理成功\n", " ✓ 批次 13 处理成功\n", " ✓ 批次 14 处理成功\n", " ✓ 批次 14 处理成功\n", " ✓ 批次 15 处理成功\n", " ✓ 批次 15 处理成功\n", " ✓ 批次 16 处理成功\n", " ✓ 批次 16 处理成功\n", " ✓ 批次 17 处理成功\n", " ✅ 腾讯音乐-SW:2023年度报告.json 重试成功!\n", "\n", "🎯 重试处理: 腾讯音乐-SW:2024年度报告.json\n", " 📁 在 腾讯音乐-SW(港股)[01698] 中找到文件\n", " ✓ 批次 17 处理成功\n", " ✅ 腾讯音乐-SW:2023年度报告.json 重试成功!\n", "\n", "🎯 重试处理: 腾讯音乐-SW:2024年度报告.json\n", " 📁 在 腾讯音乐-SW(港股)[01698] 中找到文件\n", " ✅ 提取到 501 个文档块\n", " ✅ 提取到 501 个文档块\n", " ✓ 批次 1 处理成功\n", " ✓ 批次 1 处理成功\n", " ✓ 批次 2 处理成功\n", " ✓ 批次 2 处理成功\n", " ✓ 批次 3 处理成功\n", " ✓ 批次 3 处理成功\n", " ✓ 批次 4 处理成功\n", " ✓ 批次 4 处理成功\n", " ✓ 批次 5 处理成功\n", " ✓ 批次 5 处理成功\n", " ✓ 批次 6 处理成功\n", " ✓ 批次 6 处理成功\n", " ✓ 批次 7 处理成功\n", " ✓ 批次 7 处理成功\n", " ✓ 批次 8 处理成功\n", " ✓ 批次 8 处理成功\n", " ✓ 批次 9 处理成功\n", " ✓ 批次 9 处理成功\n", " ✓ 批次 10 处理成功\n", " ✓ 批次 10 处理成功\n", " ✓ 批次 11 处理成功\n", " ✓ 批次 11 处理成功\n", " ✓ 批次 12 处理成功\n", " ✓ 批次 12 处理成功\n", " ✓ 批次 13 处理成功\n", " ✓ 批次 13 处理成功\n", " ✓ 批次 14 处理成功\n", " ✓ 批次 14 处理成功\n", " ✓ 批次 15 处理成功\n", " ✓ 批次 15 处理成功\n", " ✓ 批次 16 处理成功\n", " ✓ 批次 16 处理成功\n", " ✓ 批次 17 处理成功\n", " ✅ 腾讯音乐-SW:2024年度报告.json 重试成功!\n", " ✓ 批次 17 处理成功\n", " ✅ 腾讯音乐-SW:2024年度报告.json 重试成功!\n" ] } ], "source": [ "retry_failed_files(data_dir, vectorstore, failed_files_to_retry, batch_size=30)" ] }, { "cell_type": "code", "execution_count": 7, "id": "d08f8034", "metadata": {}, "outputs": [], "source": [ "client.close()" ] }, { "cell_type": "code", "execution_count": 6, "id": "5efc33d3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "result: ['Report of Independent Registered Public Accounting Firm\\n\\nTo the Shareholders and the Board of Directors of Apple Inc.\\n\\nOpinion on the Financial Statements\\n\\nWe have audited the accompanying consolidated balance sheets of Apple Inc. as of September 30, 2023 and September 24, 2022, the related consolidated statements of operations, comprehensive income, shareholders\\' equity and cash flows for each of the three years in the period ended September 30, 2023, and the related notes (collectively referred to as the \"financial statements\"). In our opinion, the financial statements present fairly, in all material respects, the financial position of Apple Inc. at September 30, 2023 and September 24, 2022, and the results of its operations and its cash flows for each of the three years in the period ended September 30, 2023, in conformity with U.S. generally accepted accounting principles.', \"Apple Inc.\\n\\nNotes to Consolidated Financial Statements\\n\\nNote 1 - Summary of Significant Accounting Policies\\n\\nBasis of Presentation and Preparation\\n\\nThe consolidated financial statements include the accounts of Apple Inc. and its wholly owned subsidiaries. The preparation of these consolidated financial statements and accompanying notes in conformity with GAAP requires the use of management estimates. Certain prior period amounts in the consolidated financial statements and accompanying notes have been reclassified to conform to the current period's presentation.\", 'Apple Inc.\\n\\nCONSOLIDATED BALANCE SHEETS\\n\\n(In millions, except number of shares, which are reflected in thousands, and par value)\\n\\nSee accompanying Notes to Consolidated Financial Statements.', 'Apple Inc.\\n\\nCONSOLIDATED STATEMENTS OF OPERATIONS\\n\\n(In millions, except number of shares, which are reflected in thousands, and per- share amounts)\\n\\nNet sales: Products Services Total net sales\\n\\nCost of sales: Products Services Total cost of sales Gross margin\\n\\nOperating expenses: Research and development Selling, general and administrative Total operating expenses\\n\\nOperating income Other income/(expense), net Income before provision for income taxes Provision for income taxes Net income\\n\\nEarnings per share: Basic Diluted\\n\\nShares used in computing earnings per share:\\n\\nBasic Diluted\\n\\n15,744,231 16,215,963 16,701,272 15,812,547 16,325,819 16,864,919\\n\\nSee accompanying Notes to Consolidated Financial Statements.', 'We also have audited, in accordance with the standards of the Public Company Accounting Oversight Board (United States) (the \"PCAOB\"), the consolidated balance sheets of Apple Inc. as of September 30, 2023 and September 24, 2022, the related consolidated statements of operations, comprehensive income, shareholders\\' equity and cash flows for each of the three years in the period ended September 30, 2023, and the related notes and our report dated November 2, 2023 expressed an unqualified opinion thereon.\\n\\nBasis for Opinion', \"Item 8. Financial Statements and Supplementary Data\\n\\n=== 表格数据摘要 ===\\n0: Index to Consolidated Financial Statements | 1: Page\\n0: Consolidated Statements of Operations for the years ended September 30, 2023, September 24, 2022 and September 25, 2021 | 1: 28\\n0: Consolidated Statements of Comprehensive Income for the years ended September 30, 2023, September 24, 2022 and September 25, 2021 | 1: 29\\n0: Consolidated Balance Sheets as of September 30, 2023 and September 24, 2022 | 1: 30\\n0: Consolidated Statements of Shareholders' Equity for the years ended September 30, 2023, September 24, 2022 and September 25, 2021 | 1: 31\\n0: Consolidated Statements of Cash Flows for the years ended September 30, 2023, September 24, 2022 and September 25, 2021 | 1: 32\\n0: Notes to Consolidated Financial Statements | 1: 33\\n0: Reports of Independent Registered Public Accounting Firm | 1: 49\\n\\n=== 原始表格 ===\\n
Index to Consolidated Financial StatementsPage
Consolidated Statements of Operations for the years ended September 30, 2023, September 24, 2022 and September 25, 202128
Consolidated Statements of Comprehensive Income for the years ended September 30, 2023, September 24, 2022 and September 25, 202129
Consolidated Balance Sheets as of September 30, 2023 and September 24, 202230
Consolidated Statements of Shareholders' Equity for the years ended September 30, 2023, September 24, 2022 and September 25, 202131
Consolidated Statements of Cash Flows for the years ended September 30, 2023, September 24, 2022 and September 25, 202132
Notes to Consolidated Financial Statements33
Reports of Independent Registered Public Accounting Firm49
\", \"PART IV\\n\\nItem 15. Exhibit and Financial Statement Schedules\\n\\n(a) Documents filed as part of this report\\n\\n(1) All financial statements\\n\\nIndex to Consolidated Financial Statements\\n\\nConsolidated Statements of Operations for the years ended September 30, 2023, September 24, 2022 and September 25, 2021 28 Consolidated Statements of Comprehensive Income for the years ended September 30, 2023, September 24, 2022 and September 25, 2021 29 Consolidated Balance Sheets as of September 30, 2023 and September 24, 2022 30 Consolidated Statements of Shareholders' Equity for the years ended September 30, 2023, September 24, 2022 and September 25, 2021 31 Consolidated Statements of Cash Flows for the years ended September 30, 2023, September 24, 2022 and September 25, 2021 32 Notes to Consolidated Financial Statements 33 Reports of Independent Registered Public Accounting Firm* 49 * Ernst & Young LLP, PCAOB Firm ID No. 00042.\\n\\nErnst & Young LLP, PCAOB Firm ID No.00042.\\n\\n(2) Financial Statement Schedules\", 'Item 7. Management\\'s Discussion and Analysis of Financial Condition and Results of Operations\\n\\nItem 7. Management\\'s Discussion and Analysis of Financial Condition and Results of OperationsThe following discussion should be read in conjunction with the consolidated financial statements and accompanying notes included in Part II, Item 8 of this Form 10- K. This item generally discusses 2023 and 2022 items and year- to- year comparisons between 2023 and 2022. Discussions of 2021 items and year- to- year comparisons between 2022 and 2021 are not included, and can be found in \"Management\\'s Discussion and Analysis of Financial Condition and Results of Operations\" in Part II, Item 7 of the Company\\'s Annual Report on Form 10- K for the fiscal year ended September 24, 2022.\\n\\nFiscal Period', \"UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549\\n\\nFORM 10-K\\n\\n(Mark One)\\n\\nANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the fiscal year ended September 30, 2023 or\\n\\nTRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\\n\\nFor the transition period from to Commission File Number: 001- 36743\\n\\nApple Inc.\\n\\n(Exact name of Registrant as specified in its charter)\\n\\nCalifornia (State or other jurisdiction of incorporation of organization)\\n\\n94- 2404110 (R.S. Employer Identification No.)\\n\\nOne Apple Park Way Cupertino, California (Address of principal executive offices)\\n\\n95014 (Zip Code)\\n\\n(408) 996-1010 (Registrant's telephone number, including area code)\\n\\nSecurities registered pursuant to Section 12(b) of the Act:\\n\\nSecurities registered pursuant to Section 12(g) of the Act: None\\n\\nIndicate by check mark if the Registrant is a well- known seasoned issuer, as defined in Rule 405 of the Securities Act.\\n\\nYes No\", 'We also have audited, in accordance with the standards of the Public Company Accounting Oversight Board (United States) (the \"PCAOB\"), Apple Inc.\\'s internal control over financial reporting as of September 30, 2023, based on criteria established in Internal Control - Integrated Framework issued by the Committee of Sponsoring Organizations of the Treadway Commission (2013 framework) and our report dated November 2, 2023 expressed an unqualified opinion thereon.\\n\\nBasis for Opinion\\n\\nThese financial statements are the responsibility of Apple Inc.\\'s management. Our responsibility is to express an opinion on Apple Inc.\\'s financial statements based on our audits. We are a public accounting firm registered with the PCAOB and are required to be independent with respect to Apple Inc. in accordance with the U.S. federal securities laws and the applicable rules and regulations of the U.S. Securities and Exchange Commission and the PCAOB.', 'Apple Inc.\\n\\nCONSOLIDATED STATEMENTS OF CASH FLOWS (In millions)\\n\\nCash, cash equivalents and restricted cash, beginning balances\\n\\nOperating activities:\\n\\nNet income Adjustments to reconcile net income to cash generated by operating activities: Depreciation and amortization Share- based compensation expense Other Changes in operating assets and liabilities: Accounts receivable, net Vendor non- trade receivables Inventories Other current and non- current assets Accounts payable Other current and non- current liabilities Cash generated by operating activities\\n\\nInvesting activities:\\n\\nPurchases of marketable securities Proceeds from maturities of marketable securities Proceeds from sales of marketable securities Payments for acquisition of property, plant and equipment Other Cash generated by/(used in) investing activities\\n\\nFinancing activities:', 'Report of Independent Registered Public Accounting Firm\\n\\nTo the Shareholders and the Board of Directors of Apple Inc.\\n\\nOpinion on Internal Control Over Financial Reporting\\n\\nWe have audited Apple Inc\\'s internal control over financial reporting as of September 30, 2023, based on criteria established in Internal Control - Integrated Framework issued by the Committee of Sponsoring Organizations of the Treadway Commission (2013 framework) (the \"COSO criteria). In our opinion, Apple Inc. maintained, in all material respects, effective internal control over financial reporting as of September 30, 2023, based on the COSO criteria.', 'Apple Inc.\\n\\nCONSOLIDATED STATEMENTS OF COMPREHENSIVE INCOME (In millions)\\n\\nNet income Other comprehensive income/(loss): Change in foreign currency translation, net of tax\\n\\nChange in unrealized gains/losses on derivative instruments, net of tax: Change in fair value of derivative instruments Adjustment for net (gains)/losses realized and included in net income Total change in unrealized gains/losses on derivative instruments\\n\\nChange in unrealized gains/losses on marketable debt securities, net of tax:\\n\\nChange in fair value of marketable debt securities Adjustment for net (gains)/losses realized and included in net income Total change in unrealized gains/losses on marketable debt securities\\n\\nTotal other comprehensive income/(loss): Total comprehensive income\\n\\nSee accompanying Notes to Consolidated Financial Statements.', 'This section should be read in conjunction with Part II, Item 7, \"Management\\'s Discussion and Analysis of Financial Condition and Results of Operations\" and the consolidated financial statements and accompanying notes in Part II, Item 8, \"Financial Statements and Supplementary Data\" of this Form 10- K.\\n\\nMacroeconomic and Industry Risks\\n\\nThe Company\\'s operations and performance depend significantly on global and regional economic conditions and adverse economic conditions can materially adversely affect the Company\\'s business, results of operations and financial condition.', 'SIGNATURES\\n\\nPursuant to the requirements of Section 13 or 15(d) of the Securities Exchange Act of 1934, the Registrant has duly caused this report to be signed on its behalf by the undersigned, thereunto duly authorized.\\n\\nDate: November 2, 2023\\n\\nApple Inc.\\n\\nBy: /s/ Luca Maestri\\n\\nLuca Maestri Senior Vice President, Chief Financial Officer\\n\\nPower of Attorney\\n\\nKNOW ALL PERSONS BY THESE PRESENTS, that each person whose signature appears below constitutes and appoints Timothy D. Cook and Luca Maestri, jointly and severally, his or her attorneys- in- fact, each with the power of substitution, for him or her in any and all capacities, to sign any amendments to this Annual Report on Form 10- K, and to file the same, with exhibits thereto and other documents in connection therewith, with the Securities and Exchange Commission, hereby ratifying and confirming all that each of said attorneys- in- fact, or his substitute or substitutes, may do or cause to be done by virtue hereof.', 'All financial statement schedules have been omitted, since the required information is not applicable or is not present in amounts sufficient to require submission of the schedule, or because the information required is included in the consolidated financial statements and accompanying notes included in this Form 10- K.', 'Unless otherwise stated, all information presented herein is based on the Company\\'s fiscal calendar, and references to particular years, quarters, months or periods refer to the Company\\'s fiscal years ended in September, and the associated quarters, months and periods of those fiscal years. Each of the terms the \"Company\" and \"Apple\" as used herein refers collectively to Apple Inc. and its wholly owned subsidiaries, unless otherwise stated.\\n\\nPART I\\n\\nItem 1. Business\\n\\nCompany Background\\n\\nThe Company designs, manufactures and markets smartphones, personal computers, tablets, wearables and accessories, and sells a variety of related services. The Company\\'s fiscal year is the 52- or 53- week period that ends on the last Saturday of September.\\n\\nProducts\\n\\niPhone\\n\\niPhone is the Company\\'s line of smartphones based on its iOS operating system. The iPhone line includes iPhone 15 Pro, iPhone 15, iPhone 14, iPhone 13 and iPhone SE.\\n\\nMac', \"=== 表格数据摘要 ===\\n0: Exhibit Number | 1: Exhibit Description | 2: Form | 3: Exhibit | 4: Filing Date/ Period End Date\\n0: 4.29 | 1: Officer's Certificate of the Registrant, dated as of May 10, 2023, including forms of global notes representing the 4.421% Notes due 2026, 4.000% Notes due 2028, 4.150% Notes due 2030, 4.300% Notes due 2033 and 4.850% Notes due 2053. | 2: 8-K | 3: 4.1 | 4: 5/10/23\\n0: 4.30* | 1: Apple Inc. Deferred Compensation Plan. | 2: S-8 | 3: 4.1 | 4: 8/23/18\\n0: 10.1* | 1: Apple Inc. Employee Stock Purchase Plan, as amended and restated as of March 10, 2015. | 2: 8-K | 3: 10.1 | 4: 3/13/15\\n0: 10.2* | 1: Form of Indemnification Agreement between the Registrant and each director and executive officer of the Registrant. | 2: 10-Q | 3: 10.2 | 4: 6/27/09\\n0: 10.3* | 1: Apple Inc. Non-Employee Director Stock Plan, as amended November 9, 2021. | 2: 10-Q | 3: 10.1 | 4: 12/25/21\\n0: 10.4* | 1: Apple Inc. 2014 Employee Stock Plan, as amended and restated as of October 1, 2017. | 2: 10-K | 3: 10.8 | 4: 9/30/17\\n0: 10.5* | 1: Form of Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of September 26, 2017. | 2: 10-K | 3: 10.20 | 4: 9/30/17\\n0: 10.6* | 1: Form of Restricted Stock Unit Award Agreement under Non-Employee Director Stock Plan effective as of February 13, 2018. | 2: 10-Q | 3: 10.2 | 4: 3/31/18\\n0: 10.7* | 1: Form of Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of August 21, 2018. | 2: 10-K | 3: 10.17 | 4: 9/29/18\\n0: 10.8* | 1: Form of Performance Award Agreement under 2014 Employee Stock Plan effective as of August 21, 2018. | 2: 10-K | 3: 10.18 | 4: 9/29/18\\n0: 10.9* | 1: Form of Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of September 29, 2019. | 2: 10-K | 3: 10.15 | 4: 9/28/19\\n0: 10.10* | 1: Form of Performance Award Agreement under 2014 Employee Stock Plan effective as of September 29, 2019. | 2: 10-K | 3: 10.16 | 4: 9/28/19\\n0: 10.11* | 1: Form of Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of August 18, 2020. | 2: 10-K | 3: 10.16 | 4: 9/26/20\\n0: 10.12* | 1: Form of Performance Award Agreement under 2014 Employee Stock Plan effective as of August 18, 2020. | 2: 10-K | 3: 10.17 | 4: 9/26/20\\n0: 10.13* | 1: Form of CEO Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of September 27, 2020. | 2: 10-Q | 3: 10.1 | 4: 12/26/20\\n0: 10.14* | 1: Form of CEO Performance Award Agreement under 2014 Employee Stock Plan effective as of September 27, 2020. | 2: 10-Q | 3: 10.2 | 4: 12/26/20\\n0: 10.15* | 1: Apple Inc. 2022 Employee Stock Plan. | 2: 8-K | 3: 10.1 | 4: 3/4/22\\n0: 10.16* | 1: Form of Restricted Stock Unit Award Agreement under 2022 Employee Stock Plan effective as of March 4, 2022. | 2: 8-K | 3: 10.2 | 4: 3/4/22\\n0: 10.17* | 1: Form of Performance Award Agreement under 2022 Employee Stock Plan effective as of March 4, 2022. | 2: 8-K | 3: 10.3 | 4: 3/4/22\\n0: 10.18* | 1: Apple Inc. Executive Cash Incentive Plan. | 2: 8-K | 3: 10.1 | 4: 8/19/22\\n0: 10.19* | 1: Form of CEO Restricted Stock Unit Award Agreement under 2022 Employee Stock Plan effective as of September 25, 2022. | 2: 10-Q | 3: 10.1 | 4: 12/31/22\\n0: 10.20* | 1: Form of CEO Performance Award Agreement under 2022 Employee Stock Plan effective as of September 25, 2022. | 2: 10-Q | 3: 10.2 | 4: 12/31/22\\n0: 21.1** | 1: Subsidiaries of the Registrant.\\n0: 23.1** | 1: Consent of Independent Registered Public Accounting Firm.\\n0: 24.1** | 1: Power of Attorney (included on the Signatures page of this Annual Report in Form 10-K).\\n0: 31.1** | 1: Rule 13a-14(a) / 15d-14(a) Certification of Chief Executive Officer.\\n0: 31.2** | 1: Rule 13a-14(a) / 15d-14(a) Certification of Chief Financial Officer.\\n0: 32.1** | 1: Section 1350 Certifications of Chief Executive Officer and Chief Financial Officer.\\n0: 101** | 1: Inline XBRL Document Set for the consolidated financial statements and accompanying notes in Part II, Item 8, “Financial Statements and Supplementary Data” of this Annual Report on Form 10-K.\\n\\n=== 原始表格 ===\\n
Exhibit NumberExhibit DescriptionFormExhibitFiling Date/ Period End Date
4.29Officer's Certificate of the Registrant, dated as of May 10, 2023, including forms of global notes representing the 4.421% Notes due 2026, 4.000% Notes due 2028, 4.150% Notes due 2030, 4.300% Notes due 2033 and 4.850% Notes due 2053.8-K4.15/10/23
4.30*Apple Inc. Deferred Compensation Plan.S-84.18/23/18
10.1*Apple Inc. Employee Stock Purchase Plan, as amended and restated as of March 10, 2015.8-K10.13/13/15
10.2*Form of Indemnification Agreement between the Registrant and each director and executive officer of the Registrant.10-Q10.26/27/09
10.3*Apple Inc. Non-Employee Director Stock Plan, as amended November 9, 2021.10-Q10.112/25/21
10.4*Apple Inc. 2014 Employee Stock Plan, as amended and restated as of October 1, 2017.10-K10.89/30/17
10.5*Form of Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of September 26, 2017.10-K10.209/30/17
10.6*Form of Restricted Stock Unit Award Agreement under Non-Employee Director Stock Plan effective as of February 13, 2018.10-Q10.23/31/18
10.7*Form of Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of August 21, 2018.10-K10.179/29/18
10.8*Form of Performance Award Agreement under 2014 Employee Stock Plan effective as of August 21, 2018.10-K10.189/29/18
10.9*Form of Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of September 29, 2019.10-K10.159/28/19
10.10*Form of Performance Award Agreement under 2014 Employee Stock Plan effective as of September 29, 2019.10-K10.169/28/19
10.11*Form of Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of August 18, 2020.10-K10.169/26/20
10.12*Form of Performance Award Agreement under 2014 Employee Stock Plan effective as of August 18, 2020.10-K10.179/26/20
10.13*Form of CEO Restricted Stock Unit Award Agreement under 2014 Employee Stock Plan effective as of September 27, 2020.10-Q10.112/26/20
10.14*Form of CEO Performance Award Agreement under 2014 Employee Stock Plan effective as of September 27, 2020.10-Q10.212/26/20
10.15*Apple Inc. 2022 Employee Stock Plan.8-K10.13/4/22
10.16*Form of Restricted Stock Unit Award Agreement under 2022 Employee Stock Plan effective as of March 4, 2022.8-K10.23/4/22
10.17*Form of Performance Award Agreement under 2022 Employee Stock Plan effective as of March 4, 2022.8-K10.33/4/22
10.18*Apple Inc. Executive Cash Incentive Plan.8-K10.18/19/22
10.19*Form of CEO Restricted Stock Unit Award Agreement under 2022 Employee Stock Plan effective as of September 25, 2022.10-Q10.112/31/22
10.20*Form of CEO Performance Award Agreement under 2022 Employee Stock Plan effective as of September 25, 2022.10-Q10.212/31/22
21.1**Subsidiaries of the Registrant.
23.1**Consent of Independent Registered Public Accounting Firm.
24.1**Power of Attorney (included on the Signatures page of this Annual Report in Form 10-K).
31.1**Rule 13a-14(a) / 15d-14(a) Certification of Chief Executive Officer.
31.2**Rule 13a-14(a) / 15d-14(a) Certification of Chief Financial Officer.
32.1**Section 1350 Certifications of Chief Executive Officer and Chief Financial Officer.
101**Inline XBRL Document Set for the consolidated financial statements and accompanying notes in Part II, Item 8, “Financial Statements and Supplementary Data” of this Annual Report on Form 10-K.
\", \"As discussed in Note 7 to the financial statements, Apple Inc. is subject to taxation and files income tax returns in the U.S. federal jurisdiction and many state and foreign jurisdictions. As of September 30, 2023, the total amount of gross unrecognized tax benefits was 9.5 billion, if recognized, would impact Apple Inc.'s effective tax rate. In accounting for some of the uncertain tax positions, Apple Inc. uses significant judgment in the interpretation and application of complex domestic and international tax laws.\\n\\nAuditing management's evaluation of whether an uncertain tax position is more likely than not to be sustained and the measurement of the benefit of various tax positions can be complex, involves significant judgment, and is based on interpretations of tax laws and legal rulings.\", 'Item 8. Financial Statements and Supplementary Data\\n\\nAll financial statement schedules have been omitted, since the required information is not applicable or is not present in amounts sufficient to require submission of the schedule, or because the information required is included in the consolidated financial statements and accompanying notes.', \"The Company evaluates the performance of its reportable segments based on net sales and operating income. Net sales for geographic segments are generally based on the location of customers and sales through the Company's retail stores located in those geographic locations. Operating income for each segment consists of net sales to third parties, related cost of sales, and operating expenses directly attributable to the segment. The information provided to the Company's chief operating decision maker for purposes of making decisions and assessing segment performance excludes asset information.\\n\\nA reconciliation of the Company's segment operating income to the Consolidated Statements of Operations for 2023, 2022 and 2021 is as follows (in millions):\\n\\n(1) Includes corporate marketing expenses, certain share-based compensation expenses, various nonrecurring charges, and other separately managed general and administrative costs.\", 'Form 10-K\\n\\nFor the Fiscal Year Ended September 30, 2023\\n\\nTABLE OF CONTENTS\\n\\n=== 表格数据摘要 ===\\n0: Page | 1: Page\\n0: Part I | 1: Part I\\n0: Item 1. | 1: Business\\n0: Item 1A. | 1: Risk Factors\\n0: Item 1B. | 1: Unresolved Staff Comments\\n0: Item 1C. | 1: Cybersecurity\\n0: Item 2. | 1: Properties\\n0: Item 3. | 1: Legal Proceedings\\n0: Item 4. | 1: Mine Safety Disclosures\\n0: Part II | 1: Part II\\n0: Item 5. | 1: Market for Registrant\\'s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities\\n0: Item 6. | 1: [Reserved]\\n0: Item 7. | 1: Management\\'s Discussion and Analysis of Financial Condition and Results of Operations\\n0: Item 7A. | 1: Quantitative and Qualitative Disclosures About Market Risk\\n0: Item 8. | 1: Financial Statements and Supplementary Data\\n0: Item 9. | 1: Changes in and Disagreements with Accountants on Accounting and Financial Disclosure\\n0: Item 9A. | 1: Controls and Procedures\\n0: Item 9B. | 1: Other Information\\n0: Item 9C. | 1: Disclosure Regarding Foreign Jurisdictions that Prevent Inspections\\n0: Part III | 1: Part III\\n0: Item 10. | 1: Directors, Executive Officers and Corporate Governance\\n0: Item 11. | 1: Executive Compensation\\n0: Item 12. | 1: Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters\\n0: Item 13. | 1: Certain Relationships and Related Transactions, and Director Independence\\n0: Item 14. | 1: Principal Accountant Fees and Services\\n0: Part IV | 1: Part IV\\n0: Item 15. | 1: Exhibit and Financial Statement Schedules\\n0: Item 16. | 1: Form 10-K Summary\\n\\n=== 原始表格 ===\\n
Page
Part I
Item 1.Business
Item 1A.Risk Factors
Item 1B.Unresolved Staff Comments
Item 1C.Cybersecurity
Item 2.Properties
Item 3.Legal Proceedings
Item 4.Mine Safety Disclosures
Part II
Item 5.Market for Registrant's Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities
Item 6.[Reserved]
Item 7.Management's Discussion and Analysis of Financial Condition and Results of Operations
Item 7A.Quantitative and Qualitative Disclosures About Market Risk
Item 8.Financial Statements and Supplementary Data
Item 9.Changes in and Disagreements with Accountants on Accounting and Financial Disclosure
Item 9A.Controls and Procedures
Item 9B.Other Information
Item 9C.Disclosure Regarding Foreign Jurisdictions that Prevent Inspections
Part III
Item 10.Directors, Executive Officers and Corporate Governance
Item 11.Executive Compensation
Item 12.Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters
Item 13.Certain Relationships and Related Transactions, and Director Independence
Item 14.Principal Accountant Fees and Services
Part IV
Item 15.Exhibit and Financial Statement Schedules
Item 16.Form 10-K Summary
', 'Share-Based Compensation\\n\\nThe following table shows share- based compensation expense and the related income tax benefit included in the Consolidated Statements of Operations for 2023, 2022 and 2021 (in millions):\\n\\nShare- based compensation expense Income tax benefit related to share- based compensation expense\\n\\nAs of September 30, 2023, the total unrecognized compensation cost related to outstanding RSUs was \\\\(\\\\) 18.6$ billion, which the Company expects to recognize over a weighted- average period of 2.5 years.\\n\\nNote 12 - Commitments, Contingencies and Supply Concentrations\\n\\nUnconditional Purchase Obligations', 'Available InformationThe Company\\'s Annual Reports on Form 10- K, Quarterly Reports on Form 10- Q, Current Reports on Form 8- K, and amendments to reports filed pursuant to Sections 13(a) and 15(d) of the Securities Exchange Act of 1934, as amended (the \"Exchange Act\"), are filed with the U.S. Securities and Exchange Commission (the \"SEC\"). Such reports and other information filed by the Company with the SEC are available free of charge at investor.apple.com/investor- relations/sec- filings/default.aspx when such reports are available on the SEC\\'s website. The Company periodically provides certain information for investors on its corporate website, www.apple.com, and its investor relations website, investor.apple.com. This includes press releases and other information about financial performance, information on environmental, social and governance matters, and details related to the Company\\'s annual meeting of shareholders. The information contained on the websites referenced in this Form 10- K is not incorporated by reference into this filing. Further, the Company\\'s references to website URLs are intended to be inactive textual references only.', \"Fiscal PeriodThe Company's fiscal year is the 52- or 53- week period that ends on the last Saturday of September. An additional week is included in the first fiscal quarter every five or six years to realign the Company's fiscal quarters with calendar quarters, which occurred in the first quarter of 2023. The Company's fiscal year 2023 spanned 53 weeks, whereas fiscal years 2022 and 2021 spanned 52 weeks each.\\n\\nFiscal Year Highlights\\n\\nThe Company's total net sales were \\\\ billion during 2023.\\n\\nThe Company's total net sales decreased \\\\(3\\\\%\\\\) or \\\\(\\\\) 11.0\\\\(billion during 2023 compared to 2022. The weakness in foreign currencies relative to the U.S. dollar accounted for more than the entire year - over - year decrease in total net sales, which consisted primarily of lower net sales of Mac and iPhone, partially offset by higher net sales of Services.\"]\n" ] } ], "source": [ "def query_vector_collection(client: weaviate.WeaviateClient, collection_name: str, query: str, k: int) -> list:\n", " \"\"\"\n", " 从集合中查询数据\n", " :param client: Weaviate 客户端\n", " :param collection_name: 集合名称\n", " :param query: 查询的【向量】\n", " :param k: 返回的结果数量\n", " :return: 查询结果列表\n", " \"\"\"\n", " \n", " collection = client.collections.get(collection_name)\n", " response = collection.query.hybrid(\n", " query=query,\n", " vector=embeddings.embed_query(query)\n", " )\n", " documents = [res.properties['text'] for res in response.objects]\n", " return documents\n", "\n", "result = query_vector_collection(client, \"AnnualReports_test3\", query=\"2023 Apple CONSOLIDATED STATEMENTS OF OPERATIONS 表格\", k=5)\n", "print(\"result: \", result)" ] }, { "cell_type": "code", "execution_count": 44, "id": "5bc7f016", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "d:\\yusys\\202507\\ai_learning\\.venv\\Lib\\site-packages\\weaviate\\warnings.py:292: ResourceWarning: Con004: The connection to Weaviate was not closed properly. This can lead to memory leaks.\n", " Please make sure to close the connection using `client.close()`.\n", " warnings.warn(\n", "C:\\Users\\ASUS\\AppData\\Local\\Temp\\ipykernel_27156\\3473560771.py:4: ResourceWarning: unclosed \n", " retriever = vectorstore.as_retriever(\n", "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" ] } ], "source": [ "from langchain.chains import RetrievalQA\n", "from langchain.prompts import PromptTemplate\n", "\n", "retriever = vectorstore.as_retriever(\n", " search_type=\"mmr\",\n", " search_kwargs={\"k\": 7, \"lambda_mult\": 0.85}\n", ")\n", "\n", "prompt_template = \"\"\"\n", "你是一名资深专业财务分析师、股票分析师和金融投资专家。你将看到来自上市公司发布的年度报告中相关的文档片段。\n", "基于以下相关文档片段,请准确回答用户的问题。如果文档中没有足够信息回答问题,请说明缺少相关信息。\n", "请根据文档片段的元数据确定该片段属于哪个公司、什么年份,如果提供的文档片段与问题中指定的公司或年份不相关,请不要使用该片段回答问题。\n", "\n", "相关文档:\n", "{context}\n", "\n", "问题: {question}\n", "\n", "请提供详细和准确的答案,并给出相关的文档片段作为支持依据。请确保答案是基于提供的文档内容,而不是外部知识或假设。:\n", "\"\"\"\n", "\n", "PROMPT = PromptTemplate(\n", " template=prompt_template,\n", " input_variables=[\"context\", \"question\"]\n", ")\n", "\n", "# Create RetrievalQA chain\n", "qa_chain = RetrievalQA.from_chain_type(\n", " llm=llm,\n", " chain_type=\"stuff\",\n", " retriever=retriever,\n", " chain_type_kwargs={\"prompt\": PROMPT},\n", " return_source_documents=True\n", ")\n", "\n", "def ask_question(question):\n", " print(f\"\\n问题: {question}\")\n", " print(\"-\" * 80)\n", " \n", " result = qa_chain({\"query\": question})\n", " \n", " print(f\"答案: {result['result']}\")\n", " print(f\"\\n相关文档片段数量: {len(result['source_documents'])}\")\n", " \n", " # Show source documents\n", " for i, doc in enumerate(result['source_documents'][:3]): # Show top 3 sources\n", " print(f\"\\n来源 {i+1}:\")\n", " print(f\"公司: {doc.metadata.get('company', 'Unknown')}\")\n", " print(f\"文件: {doc.metadata.get('file', 'Unknown')}\")\n", " print(f\"内容片段: {doc.page_content[:200]}...\")\n", " \n", " return result" ] }, { "cell_type": "code", "execution_count": 50, "id": "25dd9d53", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🚀 针对中国移动优化的问答系统已就绪!\n", "使用方法:ask_question_with_company_filter('问题', '中国移动')\n", "例如:ask_question_with_company_filter('中国移动2023年营业收入是多少?', '中国移动')\n", "支持的财务报表类型:合并利润表、合并资产负债表、合并现金流量表、合并股东权益变动表\n" ] } ], "source": [ "# 改进的问答系统 - 带公司过滤功能\n", "from langchain.chains import RetrievalQA\n", "from langchain.prompts import PromptTemplate\n", "\n", "def ask_question_with_company_filter(question, target_company=None, k=8):\n", " \"\"\"\n", " 带公司过滤的问答函数,在检索阶段就过滤公司文档\n", " \n", " Args:\n", " question: 用户问题\n", " target_company: 目标公司名称(可选,如\"Apple\"、\"中国移动\"等)\n", " k: 最终返回的文档数量\n", " \"\"\"\n", " print(f\"\\n问题: {question}\")\n", " if target_company:\n", " print(f\"🎯 目标公司: {target_company}\")\n", " print(\"-\" * 80)\n", " \n", " # 检查问题中是否明确提到表格\n", " def check_table_keywords(question):\n", " \"\"\"检查问题中是否包含表格相关关键词\"\"\"\n", " table_keywords = [\n", " \"表格\", \"表\", \"table\", \n", " \"利润表\", \"损益表\", \"资产负债表\", \"现金流量表\", \"财务报表\",\n", " \"income statement\", \"balance sheet\", \"cash flow\", \"financial statement\",\n", " \"consolidated statements\", \"operations\", \"合并报表\",\n", " \"合并利润表\", \"合并资产负债表\", \"合并现金流量表\", \"合并股东权益变动表\",\n", " \"母公司利润表\", \"母公司资产负债表\", \"母公司现金流量表\"\n", " ]\n", " question_lower = question.lower()\n", " found_keywords = [kw for kw in table_keywords if kw in question_lower]\n", " return len(found_keywords) > 0, found_keywords\n", " \n", " # 定义公司关键词\n", " def get_company_keywords(company):\n", " if not company:\n", " return []\n", " \n", " keywords = [company.lower()]\n", " if \"apple\" in company.lower():\n", " keywords.extend([\"apple\", \"aapl\", \"苹果\"])\n", " elif \"中国移动\" in company:\n", " keywords.extend([\"中国移动\", \"china mobile\", \"600941\", \"中国移动通信\", \"移动通信\", \"中国移动有限公司\", \"china mobile limited\"])\n", " elif \"中国石油\" in company:\n", " keywords.extend([\"中国石油\", \"petro\", \"601857\"])\n", " elif \"腾讯\" in company:\n", " keywords.extend([\"腾讯\", \"tencent\", \"tme\"])\n", " \n", " return keywords\n", " \n", " # 过滤文档的函数\n", " def filter_documents_by_company_and_type(docs, target_company, prefer_tables=False):\n", " if not target_company:\n", " company_filtered = docs\n", " else:\n", " company_keywords = get_company_keywords(target_company)\n", " company_filtered = []\n", " other_company_docs = []\n", " \n", " for doc in docs:\n", " company_name = doc.metadata.get('company', '').lower()\n", " file_name = doc.metadata.get('file', '').lower()\n", " \n", " # 检查是否匹配目标公司\n", " is_match = any(keyword in company_name or keyword in file_name \n", " for keyword in company_keywords)\n", " \n", " if is_match:\n", " company_filtered.append(doc)\n", " else:\n", " other_company_docs.append(doc)\n", " \n", " # 如果问题明确提到表格,优先筛选表格文档\n", " if prefer_tables:\n", " table_docs = []\n", " text_docs = []\n", " \n", " for doc in company_filtered:\n", " element_type = doc.metadata.get('element_type', '').lower()\n", " if 'table' in element_type:\n", " table_docs.append(doc)\n", " else:\n", " text_docs.append(doc)\n", " \n", " print(f\"📋 表格类型筛选:\")\n", " print(f\" - 表格文档: {len(table_docs)} 个\")\n", " print(f\" - 文本文档: {len(text_docs)} 个\")\n", " \n", " # 优先使用表格文档,如果数量不足再补充文本文档\n", " if table_docs:\n", " # 优先返回表格文档,如果不够k个再加入文本文档\n", " filtered_docs = table_docs + text_docs\n", " print(f\" ✅ 优先使用表格文档\")\n", " else:\n", " filtered_docs = company_filtered\n", " print(f\" ⚠️ 未找到表格文档,使用所有文档\")\n", " else:\n", " filtered_docs = company_filtered\n", " \n", " # 打印过滤统计\n", " print(f\"📊 检索过滤统计:\")\n", " print(f\" - 总检索文档: {len(docs)}\")\n", " if target_company:\n", " print(f\" - 匹配 {target_company}: {len(company_filtered)}\")\n", " if 'other_company_docs' in locals():\n", " print(f\" - 过滤掉其他公司: {len(other_company_docs)}\")\n", " print(f\" - 最终文档: {len(filtered_docs)}\")\n", " \n", " return filtered_docs\n", " \n", " # 检查是否需要优先筛选表格\n", " is_table_question, table_keywords = check_table_keywords(question)\n", " if is_table_question:\n", " print(f\"📋 检测到表格相关问题,关键词: {table_keywords}\")\n", " print(f\"✅ 将优先筛选表格类型文档\")\n", " \n", " # 先进行向量检索,检索更多文档以提高命中率\n", " retriever = vectorstore.as_retriever(\n", " search_type=\"mmr\",\n", " search_kwargs={\"k\": max(k * 3, 20), \"lambda_mult\": 0.85} # 检索更多文档\n", " )\n", " \n", " # 获取原始检索结果\n", " docs = retriever.get_relevant_documents(question)\n", " \n", " # 过滤文档(公司 + 表格类型)\n", " filtered_docs = filter_documents_by_company_and_type(docs, target_company, is_table_question)\n", " \n", " # 检查过滤结果\n", " if target_company and len(filtered_docs) == 0:\n", " print(f\"\\n❌ 未找到 {target_company} 的相关文档\")\n", " print(\"建议:\")\n", " print(\"1. 检查公司名称是否正确\")\n", " print(\"2. 尝试使用公司的其他称呼\")\n", " print(\"3. 确认该公司数据是否已被索引\")\n", " return {\n", " 'result': f\"未找到 {target_company} 的相关财务数据。请检查公司名称或确认数据是否已被索引。\",\n", " 'source_documents': []\n", " }\n", " elif target_company and len(filtered_docs) < 3:\n", " print(f\"\\n⚠️ 仅找到 {len(filtered_docs)} 个相关文档,结果可能不够全面\")\n", " \n", " # 限制最终文档数量\n", " final_docs = filtered_docs[:k]\n", " \n", " enhanced_prompt = \"\"\"\n", "你是一名资深专业财务分析师、股票分析师和金融投资专家。你将看到来自上市公司发布的年度报告中相关的文档片段。\n", "\n", "基于以下相关文档片段,请准确回答用户的问题。如果文档中没有足够信息回答问题,请说明缺少相关信息。\n", "\n", "特别注意:\n", "- 对于中国移动等中国公司,重点关注合并利润表、合并资产负债表、合并现金流量表\n", "- 识别中文财务术语如:营业收入、净利润、归属于母公司股东的净利润、基本每股收益等\n", "- 金额单位通常为人民币百万元或千元,请注意单位转换\n", "- 通信行业特有指标如:通信服务收入、网络运营成本、用户数等\n", "\n", "相关文档:\n", "{context}\n", "\n", "问题: {question}\n", "\n", "请提供详细和准确的答案,并给出相关的文档片段作为支持依据。请确保答案是基于提供的文档内容,而不是外部知识或假设。对于财务数据,请明确标注金额单位。\n", "\"\"\"\n", " \n", " ENHANCED_PROMPT = PromptTemplate(\n", " template=enhanced_prompt,\n", " input_variables=[\"context\", \"question\"]\n", " )\n", " \n", " # 手动构建上下文\n", " context = \"\\n\\n\".join([f\"文档 {i+1}:\\n{doc.page_content}\" for i, doc in enumerate(final_docs)])\n", " \n", " # 直接调用LLM\n", " formatted_prompt = ENHANCED_PROMPT.format(context=context, question=question)\n", " response = llm.invoke(formatted_prompt)\n", " \n", " # 构建返回结果\n", " result = {\n", " 'result': response.content,\n", " 'source_documents': final_docs\n", " }\n", " \n", " print(f\"\\n答案: {result['result']}\")\n", " print(f\"\\n📄 最终使用文档数量: {len(final_docs)}\")\n", " \n", " # 显示源文档\n", " for i, doc in enumerate(final_docs[:3]): # Show top 3 sources\n", " print(f\"\\n来源 {i+1}:\")\n", " print(f\" 公司: {doc.metadata.get('company', 'Unknown')}\")\n", " print(f\" 文件: {doc.metadata.get('file', 'Unknown')}\")\n", " print(f\" 类型: {doc.metadata.get('element_type', 'Unknown')}\")\n", " print(f\" 内容: {doc.page_content[:150]}...\")\n", " \n", " return result\n", "\n", "# 测试改进的问答系统 - 专门针对中国移动优化\n", "print(\"🚀 针对中国移动优化的问答系统已就绪!\")\n", "print(\"使用方法:ask_question_with_company_filter('问题', '中国移动')\")\n", "print(\"例如:ask_question_with_company_filter('中国移动2023年营业收入是多少?', '中国移动')\")\n", "print(\"支持的财务报表类型:合并利润表、合并资产负债表、合并现金流量表、合并股东权益变动表\")" ] }, { "cell_type": "code", "execution_count": null, "id": "deab6a1b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "问题: 中国移动公司2022年业绩主要的变化是什么?\n", "--------------------------------------------------------------------------------\n", "答案: 根据提供的文档片段,中国移动有限公司2022年业绩的主要变化可归纳如下:\n", "\n", "### 1. **营业收入**\n", " - **数据**:2022年,中国移动的营业收入达到人民币**9,372.59亿元**。\n", " - **依据**:文档明确提到“2022年,营业收入达到人民币9,372.59亿元。”\n", " - **分析**:这是2022年业绩的核心财务指标,但文档未提供2021年同期数据,因此无法判断具体增长或下降幅度。\n", "\n", "### 2. **客户规模**\n", " - **移动客户**:截至2022年底,移动客户总数达到**9.75亿户**。\n", " - **有线宽带客户**:有线宽带客户总数达到**2.72亿户**。\n", " - **依据**:文档提到“截至2022年12月31日,移动客户总数达到9.75亿户,有线宽带客户总数达到2.72亿户。”\n", " - **分析**:客户数量的庞大基数反映了公司在通信服务领域的市场渗透率,但同样缺乏历史数据对比。\n", "\n", "### 3. **战略转型与业务布局**\n", " - **新型信息基础设施**:公司持续推动5G、算力网络、能力中台等新型信息基础设施建设。\n", " - **新型信息服务体系**:创新构建“连接+算力+能力”服务体系,覆盖生产、生活、治理全场景的数智化需求。\n", " - **依据**:文档提到“系统打造以5G、算力网络、能力中台为重点的新型信息基础设施……创新构建‘连接+算力+能力’新型信息服务体系”。\n", " - **分析**:这一战略调整可能带来长期业绩增长点,但属于业务结构优化,需结合后续财务数据验证效果。\n", "\n", "### 4. **品牌与信用评级**\n", " - **品牌价值**:2022年,“中国移动”品牌位列Millward Brown“BrandZ™全球最具价值品牌100强”第88位。\n", " - **信用评级**:债信评级保持与中国国家主权评级一致,为标普A+/前景稳定和穆迪A1/前景稳定。\n", " - **依据**:文档提到“‘中国移动’品牌在2022年再次荣登Millward Brown的‘BrandZ™全球最具价值品牌100强’第88位”以及“债信评级等同于中国国家主权评级”。\n", " - **分析**:品牌排名和信用评级的稳定反映了公司市场地位和财务稳健性,但属于非财务业绩指标。\n", "\n", "### 5. **公司治理与资本结构**\n", " - **A股上市**:2022年1月5日,中国移动A股在上交所挂牌上市;2022年12月13日,终止美国存托股票的注册及报告义务。\n", " - **股权结构**:中国移动集团公司直接和间接持有约69.82%的股份,其他股东持有30.18%。\n", " - **依据**:文档提到“2022年1月5日,本公司人民币普通股(A股)于上海证券交易所挂牌上市”以及股权结构数据。\n", " - **分析**:A股上市可能增强融资能力,但未直接体现2022年业绩变化。\n", "\n", "---\n", "\n", "### **总结**\n", "文档中明确提及的2022年业绩变化主要集中在**营业收入规模**、**客户基数**、**战略转型**、**品牌与信用评级**等方面。然而,由于缺乏**历史数据对比**(如2021年收入、客户增长数据)和**具体财务指标变化**(如净利润、毛利率等),无法全面量化业绩的增减趋势。若需进一步分析,需补充相关财务数据和行业对比信息。\n", "\n", "相关文档片段数量: 5\n", "\n", "来源 1:\n", "公司: 中国移动(A股)[600941]\n", "文件: 中国移动:2022年度报告.pdf\n", "内容片段: 2022 年,本公司再次被《福布斯》选入其“全球2000 领先企业榜”、被《财富》杂志\n", "选入其“全球500 强”。“中国移动”品牌在2022 年再次荣登Millward Brown 的“BrandZ\n", "™全球最具价值品牌100 强”第 88 位。目前,本公司的债信评级等同于中国国家主权评级,\n", "为标普A+/前景稳定和穆迪A1/前景稳定。 \n", "中国移动始终秉承做“网络强国、数字中国、智慧社会主力军”的目标...\n", "\n", "来源 2:\n", "公司: 中国移动(A股)[600941]\n", "文件: 中国移动:2022年度报告.pdf\n", "内容片段: 2022 年年度报告 \n", "4 \n", "公司简介 \n", "本公司于 1997 年 9 月 3 日在中国香港特别行政区(“香港”)注册,并于1997 年 10\n", "月 22 日和23 日分别在纽约证券交易所有限责任公司(“纽约交易所”)和香港联合交易所\n", "有限公司(“香港联交所”)上市。公司股票在1998 年 1 月 27 日成为香港恒生指数成份\n", "股。2021 年5 月 7 日,纽约交易所向美国证券交易委员会(“美国证交...\n", "\n", "来源 3:\n", "公司: 中国移动(A股)[600941]\n", "文件: 中国移动:2023年度报告.pdf\n", "内容片段: 2023 年年度报告 \n", "4 \n", " \n", " \n", "注1:A 股股东持有公司 4.22%股份中包含中国移动集团公司直接持有公司的 0.20%股份 \n", "注2:除中国移动通信集团财务有限公司(“中移财务公司”)由中国移动通信有限公司(“中移通信”)直接及间接\n", "持股92%、中国移动集团公司持股 8%,以及中国移动通信集团终端有限公司由中移通信持股 99.97%、中国移动集团\n", "公司持股 0.03%外,其他专业子公司均由中...\n", "\n", "====================================================================================================\n", "\n" ] } ], "source": [ "question = \"中国移动公司2022年业绩主要的变化是什么?\"\n", "\n", "ask_question(question)\n", "print(\"\\n\" + \"=\"*100 + \"\\n\")" ] }, { "cell_type": "markdown", "id": "878504d7", "metadata": {}, "source": [ "![](https://rean-blog-bucket.oss-cn-guangzhou.aliyuncs.com/assets/essay/20250801101608.png)" ] }, { "cell_type": "code", "execution_count": 45, "id": "21337a40", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "问题: 与2022年比较,中国移动公司2023年业绩主要的变化是什么?\n", "--------------------------------------------------------------------------------\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\ASUS\\AppData\\Local\\Temp\\ipykernel_27156\\3473560771.py:40: LangChainDeprecationWarning: The method `Chain.__call__` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.\n", " result = qa_chain({\"query\": question})\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "答案: 根据提供的文档片段,该内容属于苹果公司(Apple Inc.)2024年年度报告(Form 10-K)中的“Management's Discussion and Analysis of Financial Condition and Results of Operations”部分,而非中国移动公司。文档中未提及任何关于“中国移动公司”的财务数据或业绩信息,因此无法回答该问题。\n", "\n", "**关键依据:**\n", "1. **公司身份**:文档中提到的产品(如MacBook Pro、iMac)和财务分析(如毛利率、研发费用)均指向苹果公司,而非中国移动。\n", "2. **地理区域描述**:文档中提到的“Greater China”(大中华区)是苹果公司对中国的市场划分,但并未涉及中国移动公司的具体业绩。\n", "3. **财务数据范围**:文档中的财务数据(如2023年运营费用、各地区净销售额)均属于苹果公司,未包含中国移动公司的任何信息。\n", "\n", "**结论**: \n", "提供的文档片段与“中国移动公司”无关,因此无法基于此回答问题。若需分析中国移动公司的业绩变化,需提供其对应的财务报告数据。\n", "\n", "相关文档片段数量: 7\n", "\n", "来源 1:\n", "公司: Apple Inc.(美股)[AAPL]\n", "文件: aapl-20240928-10-K.pdf\n", "内容片段: Item 7. Management's Discussion and Analysis of Financial Condition and Results of Operations\n", "\n", "The following discussion should be read in conjunction with the consolidated financial statements and acc...\n", "\n", "来源 2:\n", "公司: Apple Inc.(美股)[AAPL]\n", "文件: aapl-20240928-10-K.pdf\n", "内容片段: The Company's future gross margins can be impacted by a variety of factors, as discussed in Part I, item 1A of this Form 10- K under the heading \"Risk Factors.\" As a result, the Company believes, in g...\n", "\n", "来源 3:\n", "公司: Apple Inc.(美股)[AAPL]\n", "文件: aapl-20240928-10-K.pdf\n", "内容片段: Segment Operating Performance\n", "\n", "The following table shows net sales by reportable segment for 2024, 2023 and 2022 (dollars in millions):\n", "\n", "Americas\n", "\n", "Americas net sales increased during 2024 compared to ...\n", "\n", "====================================================================================================\n", "\n" ] } ], "source": [ "question = \"与2022年比较,中国移动公司2023年业绩主要的变化是什么?\"\n", "\n", "ask_question(question)\n", "print(\"\\n\" + \"=\"*100 + \"\\n\")" ] }, { "cell_type": "markdown", "id": "5dd6ec42", "metadata": {}, "source": [ "![](https://rean-blog-bucket.oss-cn-guangzhou.aliyuncs.com/assets/essay/20250801102549.png)" ] }, { "cell_type": "code", "execution_count": 19, "id": "55280d18", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "问题: 从利润表上看,中国石油2023年的总收入、净利润分別是多少?\n", "--------------------------------------------------------------------------------\n", "答案: 根据中国石油天然气股份有限公司2023年度财务报表附注中的**合并综合收益表**,2023年的总收入和净利润如下:\n", "\n", "### **总收入(营业收入)** \n", "**3,011,012 百万元** \n", "**依据文档片段**: \n", "在“合并综合收益表”中,营业收入(“营业收入”行)明确列示为 **3,011,012 人民币百万元**(2023年数据)。 \n", "> 文档原文: \n", "> **营业收入 6 3,011,012 3,239,167**\n", "\n", "---\n", "\n", "### **净利润** \n", "**180,293 百万元** \n", "**依据文档片段**: \n", "在“合并综合收益表”中,净利润(“净利润”行)明确列示为 **180,293 人民币百万元**(2023年数据)。 \n", "> 文档原文: \n", "> **净利润 180,293 163,348**\n", "\n", "---\n", "\n", "### **补充说明** \n", "1. **总收入的构成**: \n", " 根据附注中的“营业收入”分类,总收入由油气和新能源、炼油化工和新材料、销售、天然气销售等业务板块构成,合计为 **3,011,012 百万元**(见文档中“营业收入”部分的分项数据及合计数)。 \n", "\n", "2. **净利润的计算逻辑**: \n", " 净利润是通过税前利润(237,462 百万元)减去所得税费用(57,169 百万元)得出,最终结果为 **180,293 百万元**。 \n", "\n", "以上数据均直接来源于提供的财务报表附注,未涉及外部假设或计算。\n", "\n", "相关文档片段数量: 5\n", "\n", "来源 1:\n", "公司: 中国石油(A股)[601857]\n", "文件: 中国石油:2023年度报告.pdf\n", "内容片段: 中国石油天然气股份有限公司 \n", "2023 年度财务报表附注 \n", "(除特别注明外,金额单位为人民币百万元) \n", " \n", "- 116 - \n", "1 公司简介 \n", " \n", "中国石油天然气股份有限公司 (“本公司”)是由中国石油天然气集团公司根据\n", "中华人民共和国(“中国”)原国家经济贸易委员会 《关于同意设立中国石油天然气股\n", "份有限公司的复函》(国经贸企改[1999]1024 号),将核心业务及与这些业务相关的资\n", "产和负债进...\n", "\n", "来源 2:\n", "公司: 中国石油(A股)[601857]\n", "文件: 中国石油:2023年度报告.pdf\n", "内容片段: 中国石油天然气股份有限公司 \n", "合并综合收益表 \n", "截至 2023 年 12 月 31 日止年度 \n", "(除特别注明外,金额单位为人民币百万元) \n", "- 213 - \n", " \n", " 附注 2023 年 2022 年 \n", " 人民币 人民币 \n", " \n", "营业收入 6 3,011,012 3,239,167 \n", " \n", "经营支出 \n", "采购、服务及其他 ...\n", "\n", "来源 3:\n", "公司: 中国石油(A股)[601857]\n", "文件: 中国石油:2023年度报告.pdf\n", "内容片段: 中国石油天然气股份有限公司 \n", "2023 年度财务报表附注 \n", "(除特别注明外,金额单位为人民币百万元) \n", " \n", "- 137 - \n", "4 主要会计政策和会计估计(续) \n", " \n", "(22) 收入确认 \n", " \n", "收入是本集团日常活动中形成的,会导致股东权益增加且与股东投入资本无关\n", "的经济利益的总流入。 \n", " \n", "本集团在履行了合同中的履约义务,即在客户取得相关商品或服务的控制权时,\n", "确认收入。 \n", " \n", "合同中包含两项或多项...\n", "答案: 根据中国石油天然气股份有限公司2023年度财务报表附注中的**合并综合收益表**,2023年的总收入和净利润如下:\n", "\n", "### **总收入(营业收入)** \n", "**3,011,012 百万元** \n", "**依据文档片段**: \n", "在“合并综合收益表”中,营业收入(“营业收入”行)明确列示为 **3,011,012 人民币百万元**(2023年数据)。 \n", "> 文档原文: \n", "> **营业收入 6 3,011,012 3,239,167**\n", "\n", "---\n", "\n", "### **净利润** \n", "**180,293 百万元** \n", "**依据文档片段**: \n", "在“合并综合收益表”中,净利润(“净利润”行)明确列示为 **180,293 人民币百万元**(2023年数据)。 \n", "> 文档原文: \n", "> **净利润 180,293 163,348**\n", "\n", "---\n", "\n", "### **补充说明** \n", "1. **总收入的构成**: \n", " 根据附注中的“营业收入”分类,总收入由油气和新能源、炼油化工和新材料、销售、天然气销售等业务板块构成,合计为 **3,011,012 百万元**(见文档中“营业收入”部分的分项数据及合计数)。 \n", "\n", "2. **净利润的计算逻辑**: \n", " 净利润是通过税前利润(237,462 百万元)减去所得税费用(57,169 百万元)得出,最终结果为 **180,293 百万元**。 \n", "\n", "以上数据均直接来源于提供的财务报表附注,未涉及外部假设或计算。\n", "\n", "相关文档片段数量: 5\n", "\n", "来源 1:\n", "公司: 中国石油(A股)[601857]\n", "文件: 中国石油:2023年度报告.pdf\n", "内容片段: 中国石油天然气股份有限公司 \n", "2023 年度财务报表附注 \n", "(除特别注明外,金额单位为人民币百万元) \n", " \n", "- 116 - \n", "1 公司简介 \n", " \n", "中国石油天然气股份有限公司 (“本公司”)是由中国石油天然气集团公司根据\n", "中华人民共和国(“中国”)原国家经济贸易委员会 《关于同意设立中国石油天然气股\n", "份有限公司的复函》(国经贸企改[1999]1024 号),将核心业务及与这些业务相关的资\n", "产和负债进...\n", "\n", "来源 2:\n", "公司: 中国石油(A股)[601857]\n", "文件: 中国石油:2023年度报告.pdf\n", "内容片段: 中国石油天然气股份有限公司 \n", "合并综合收益表 \n", "截至 2023 年 12 月 31 日止年度 \n", "(除特别注明外,金额单位为人民币百万元) \n", "- 213 - \n", " \n", " 附注 2023 年 2022 年 \n", " 人民币 人民币 \n", " \n", "营业收入 6 3,011,012 3,239,167 \n", " \n", "经营支出 \n", "采购、服务及其他 ...\n", "\n", "来源 3:\n", "公司: 中国石油(A股)[601857]\n", "文件: 中国石油:2023年度报告.pdf\n", "内容片段: 中国石油天然气股份有限公司 \n", "2023 年度财务报表附注 \n", "(除特别注明外,金额单位为人民币百万元) \n", " \n", "- 137 - \n", "4 主要会计政策和会计估计(续) \n", " \n", "(22) 收入确认 \n", " \n", "收入是本集团日常活动中形成的,会导致股东权益增加且与股东投入资本无关\n", "的经济利益的总流入。 \n", " \n", "本集团在履行了合同中的履约义务,即在客户取得相关商品或服务的控制权时,\n", "确认收入。 \n", " \n", "合同中包含两项或多项...\n" ] }, { "data": { "text/plain": [ "{'query': '从利润表上看,中国石油2023年的总收入、净利润分別是多少?',\n", " 'result': '根据中国石油天然气股份有限公司2023年度财务报表附注中的**合并综合收益表**,2023年的总收入和净利润如下:\\n\\n### **总收入(营业收入)** \\n**3,011,012 百万元** \\n**依据文档片段**: \\n在“合并综合收益表”中,营业收入(“营业收入”行)明确列示为 **3,011,012 人民币百万元**(2023年数据)。 \\n> 文档原文: \\n> **营业收入 6 3,011,012 3,239,167**\\n\\n---\\n\\n### **净利润** \\n**180,293 百万元** \\n**依据文档片段**: \\n在“合并综合收益表”中,净利润(“净利润”行)明确列示为 **180,293 人民币百万元**(2023年数据)。 \\n> 文档原文: \\n> **净利润 180,293 163,348**\\n\\n---\\n\\n### **补充说明** \\n1. **总收入的构成**: \\n 根据附注中的“营业收入”分类,总收入由油气和新能源、炼油化工和新材料、销售、天然气销售等业务板块构成,合计为 **3,011,012 百万元**(见文档中“营业收入”部分的分项数据及合计数)。 \\n\\n2. **净利润的计算逻辑**: \\n 净利润是通过税前利润(237,462 百万元)减去所得税费用(57,169 百万元)得出,最终结果为 **180,293 百万元**。 \\n\\n以上数据均直接来源于提供的财务报表附注,未涉及外部假设或计算。',\n", " 'source_documents': [Document(metadata={'moddate': datetime.datetime(2024, 3, 25, 15, 40, 25, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'icv': None, 'creationdate': datetime.datetime(2024, 3, 25, 9, 32, 11, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'ksoproductbuildver': None, 'title': '中国石油天然气股份有限公司', 'page_label': '118', 'page': 117.0, 'text': None, 'creator': 'Microsoft® Word 2016', 'company': '中国石油(A股)[601857]', 'total_pages': 293.0, 'contenttypeid': None, 'file': '中国石油:2023年度报告.pdf', 'source': 'data/年报\\\\中国石油(A股)[601857]\\\\中国石油:2023年度报告.pdf', 'trapped': None, 'author': 'Lenovo User', 'producer': 'Microsoft® Word 2016', 'sourcemodified': None, 'ksotemplatedocersaverecord': None}, page_content='中国石油天然气股份有限公司 \\n2023 年度财务报表附注 \\n(除特别注明外,金额单位为人民币百万元) \\n \\n- 116 - \\n1 公司简介 \\n \\n中国石油天然气股份有限公司 (“本公司”)是由中国石油天然气集团公司根据\\n中华人民共和国(“中国”)原国家经济贸易委员会 《关于同意设立中国石油天然气股\\n份有限公司的复函》(国经贸企改[1999]1024 号),将核心业务及与这些业务相关的资\\n产和负债进行重组,并由中国石油天然气集团公司作为独家发起人,以发起方式于\\n1999 年 11 月 5 日注册成立的股份有限公司。2017 年 12 月 19 日,中国石油天然气\\n集团公司名称变更为中国石油天然气集团有限公司 (变更前后均简称“中国石油集\\n团”)。中国石油集团为一家在中国注册成立的国有独资公司。本公司及其子公司统\\n称为“本集团”。 \\n \\n本集团主要业务包括:(i) 原油及天然气的勘探、开发、生产、输送和销售以及\\n新能源业务;(ii) 原油及石油产品的炼制,基本及衍生化工产品、其他化工产品的生\\n产和销售以及新材料业务;(iii) 炼油产品和非油品的销售及贸易业务; 及(iv) 天然气\\n的输送及销售业务。本集团主要子公司的情况详见附注 6(1)。 \\n \\n本财务报表由本公司董事会于 2024 年 3 月 25 日批准报出。 \\n \\n2 编制基础 \\n \\n本财务报表按照中国财政部 (“财政部”)颁布的企业会计准则及其他相关规定\\n(“企业会计准则”)的要求编制。本集团以持续经营为基础编制财务报表。 \\n \\n此外,本集团的财务报表同时符合中国证券监督管理委员会(“证监会”)修订的\\n《公开发行证券的公司信息披露编报规则第 15 号—财务报告的一般规定》有关财务\\n报表及附注的披露要求。 \\n \\n3 遵循企业会计准则的声明 \\n \\n本公司2023 年度财务报表符合企业会计准则的要求,真实、准确、完整地反映\\n了本集团 2023 年 12 月 31 日的合并及公司财务状况以及 2023 年度的合并及公司经\\n营成果和现金流量等有关信息。'),\n", " Document(metadata={'moddate': datetime.datetime(2024, 3, 25, 15, 40, 25, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'icv': None, 'creationdate': datetime.datetime(2024, 3, 25, 9, 32, 11, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'ksoproductbuildver': None, 'title': '中国石油天然气股份有限公司', 'page_label': '215', 'page': 214.0, 'text': None, 'creator': 'Microsoft® Word 2016', 'company': '中国石油(A股)[601857]', 'total_pages': 293.0, 'contenttypeid': None, 'file': '中国石油:2023年度报告.pdf', 'source': 'data/年报\\\\中国石油(A股)[601857]\\\\中国石油:2023年度报告.pdf', 'sourcemodified': None, 'author': 'Lenovo User', 'trapped': None, 'producer': 'Microsoft® Word 2016', 'ksotemplatedocersaverecord': None}, page_content='中国石油天然气股份有限公司 \\n合并综合收益表 \\n截至 2023 年 12 月 31 日止年度 \\n(除特别注明外,金额单位为人民币百万元) \\n- 213 - \\n \\n 附注 2023 年 2022 年 \\n 人民币 人民币 \\n \\n营业收入 6 3,011,012 3,239,167 \\n \\n经营支出 \\n采购、服务及其他 (1,972,940) (2,213,080) \\n员工费用 8 (172,745) (163,073) \\n勘探费用(包括干井费用) (20,764) (27,074) \\n折旧、折耗及摊销 (247,452) (238,036) \\n销售、一般性和管理费用 (64,074) (59,529) \\n除所得税外的其他税赋 9 (296,226) (278,055) \\n其他费用净值 (1,345) (43,660) \\n经营支出总额 (2,775,546) (3,022,507) \\n经营利润 235,466 216,660 \\n \\n融资成本 \\n汇兑收益 20,162 23,772 \\n汇兑损失 (20,906) (25,590) \\n利息收入 8,265 4,738 \\n利息支出 10 (24,063) (21,554) \\n融资成本净额 (16,542) (18,634) \\n应占联营公司及合营公司的利润 18,538 15,251 \\n税前利润 7 237,462 213,277 \\n所得税费用 12 (57,169) (49,929) \\n净利润 180,293 163,348 \\n \\n其他综合收益 \\n(一)不能重分类进损益的其他综合收益 \\n以公允价值计量且其变动计入其他综合收益的权\\n益投资公允价值变动 64 (116) \\n外币财务报表折算差额 1,515 6,201 \\n(二)可重分类至损益的其他综合收益 \\n现金流量套期储备 (1,893) 11,273 \\n按照权益法核算的在被投资单位其他综合收益中\\n所享有的份额 76 654'),\n", " Document(metadata={'moddate': datetime.datetime(2024, 3, 25, 15, 40, 25, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'icv': None, 'creationdate': datetime.datetime(2024, 3, 25, 9, 32, 11, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'ksoproductbuildver': None, 'title': '中国石油天然气股份有限公司', 'page_label': '139', 'page': 138.0, 'text': None, 'creator': 'Microsoft® Word 2016', 'company': '中国石油(A股)[601857]', 'total_pages': 293.0, 'producer': 'Microsoft® Word 2016', 'file': '中国石油:2023年度报告.pdf', 'source': 'data/年报\\\\中国石油(A股)[601857]\\\\中国石油:2023年度报告.pdf', 'trapped': None, 'author': 'Lenovo User', 'contenttypeid': None, 'sourcemodified': None, 'ksotemplatedocersaverecord': None}, page_content='中国石油天然气股份有限公司 \\n2023 年度财务报表附注 \\n(除特别注明外,金额单位为人民币百万元) \\n \\n- 137 - \\n4 主要会计政策和会计估计(续) \\n \\n(22) 收入确认 \\n \\n收入是本集团日常活动中形成的,会导致股东权益增加且与股东投入资本无关\\n的经济利益的总流入。 \\n \\n本集团在履行了合同中的履约义务,即在客户取得相关商品或服务的控制权时,\\n确认收入。 \\n \\n合同中包含两项或多项履约义务的, 本集团在合同开始日, 按照各单项履约义务\\n所承诺商品或服务的单独售价的相对比例, 将交易价格分摊至各单项履约义务, 按照\\n分摊至各单项履约义务的交易价格计量收入。单独售价, 是指本集团向客户单独销售\\n商品或提供服务的价格。 单独售价无法直接观察的, 本集团综合考虑能够合理取得的\\n全部相关信息,并最大限度地采用可观察的输入值估计单独售价。 \\n \\n附有客户额外购买选择权(例如客户奖励积分)的合同, 本集团评估该选择权是否\\n向客户提供了一项重大权利。提供重大权利的,本集团将其作为单项履约义务,在客\\n户未来行使购买选择权取得相关商品或服务的控制权时, 或者该选择权失效时, 确认\\n相应的收入。 客户额外购买选择权的单独售价无法直接观察的, 本集团综合考虑客户\\n行使和不行使该选择权所能获得的折扣的差异、客户行使该选择权的可能性等全部\\n相关信息后予以估计。 \\n \\n交易价格是本集团因向客户转让商品或服务而预期有权收取的对价金额,不包\\n括代第三方收取的款项。本集团确认的交易价格不超过在相关不确定性消除时累计\\n已确认收入极可能不会发生重大转回的金额。有权收取的对价是非现金形式时, 本集\\n团按照非现金对价的公允价值确定交易价格。非现金对价的公允价值不能合理估计\\n的, 本集团参照承诺向客户转让商品或提供服务的单独售价间接确定交易价格。预期\\n将退还给客户的款项作为退货负债,不计入交易价格。合同中存在重大融资成分的,\\n本集团按照假定客户在取得商品或服务控制权时即以现金支付的应付金额确定交易\\n价格。该交易价格与合同对价之间的差额,在合同期间内采用实际利率法摊销。于合\\n同开始日,本集团预计客户取得商品或服务控制权与客户支付价款间隔不超过一年\\n的,不考虑合同中存在的重大融资成分。'),\n", " Document(metadata={'moddate': datetime.datetime(2024, 3, 25, 15, 40, 25, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'icv': None, 'creationdate': datetime.datetime(2024, 3, 25, 9, 32, 11, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'ksoproductbuildver': None, 'title': '中国石油天然气股份有限公司', 'page_label': '248', 'page': 247.0, 'text': None, 'creator': 'Microsoft® Word 2016', 'company': '中国石油(A股)[601857]', 'total_pages': 293.0, 'producer': 'Microsoft® Word 2016', 'file': '中国石油:2023年度报告.pdf', 'source': 'data/年报\\\\中国石油(A股)[601857]\\\\中国石油:2023年度报告.pdf', 'sourcemodified': None, 'author': 'Lenovo User', 'contenttypeid': None, 'trapped': None, 'ksotemplatedocersaverecord': None}, page_content='中国石油天然气股份有限公司 \\n合并财务报表附注 \\n(除特别注明外,金额单位为人民币百万元) \\n \\n- 246 - \\n5 重要会计估计和会计判断(续) \\n \\n(c) 对资产弃置义务的估计(续) \\n \\n根据内外部环境变化, 依据会计准则和本集团弃置费用管理办法等有关规定, 油\\n气田企业基于最新的参数对油气资产弃置义务进行重新测算,以更加客观反映 本集\\n团油气资产弃置义务的实际情况。 \\n \\n6 营业收入 \\n \\n营业收入是指销售原油、天然气、炼油及化工产品、非油产品等,以及输送原油\\n和天然气所得的收入。合同收入主要于某一时点确认。2023 年度及 2022 年度收入信\\n息如下: \\n \\n2023 年收入分类 油气和新能源 \\n炼油化工和\\n新材料 销售 天然气销售 总部及其他 合计 \\n 商品和服务类型 \\n原油 613,779 - 742,113 - - 1,355,892 \\n天然气 153,562 - 394,608 526,269 - 1,074,439 \\n炼油产品 - 980,396 1,299,647 - - 2,280,043 \\n化工产品 - 233,523 55,942 - - 289,465 \\n管输业务 - - - 1,119 - 1,119 \\n加油站非油品销售 - - 32,265 - - 32,265 \\n其他 124,690 7,063 1,415 33,690 7,014 173,872 \\n分部间抵销数 (747,603) (884,978) (534,421) (27,249) (3,522) (2,197,773) \\n合同收入 144,428 336,004 1,991,569 533,829 3,492 3,009,322 \\n其他收入 304 179 1,069 113 25 1,690 \\n合计 144,732 336,183 1,992,638 533,942 3,517 3,011,012'),\n", " Document(metadata={'moddate': datetime.datetime(2025, 3, 28, 18, 43, 30, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'ksotemplatedocersaverecord': None, 'creationdate': datetime.datetime(2025, 3, 28, 18, 39, 25, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800))), 'ksoproductbuildver': None, 'title': '中国石油天然气股份有限公司', 'page_label': '236', 'page': 235.0, 'text': None, 'creator': 'Microsoft® Word 2016', 'company': '中国石油(A股)[601857]', 'total_pages': 280.0, 'producer': 'Microsoft® Word 2016', 'file': '中国石油:2024年度报告.pdf', 'source': 'data/年报\\\\中国石油(A股)[601857]\\\\中国石油:2024年度报告.pdf', 'sourcemodified': None, 'author': 'Lenovo User', 'contenttypeid': None, 'trapped': None, 'icv': None}, page_content='中国石油天然气股份有限公司 \\n合并财务报表附注 \\n(除特别注明外,金额单位为人民币百万元) \\n \\n- 234 - \\n7 税前利润 \\n \\n 2024 年 2023 年 \\n 人民币 人民币 \\n \\n \\n税前利润已计入及扣除下列各项: \\n \\n计入: \\n来自以公允价值计量且其变动计入其他综合收益\\n的权益投资的股息收入 \\n 30 18 \\n计减坏账准备及信用减值损失 121 432 \\n计减存货跌价损失 313 59 \\n处置附属公司收益 865 102 \\n现金流量套期的无效部分的已实现收益(i) 939 1,226 \\n \\n扣除: \\n无形资产及其他资产的摊销 3,823 4,923 \\n折旧和减值损失: \\n物业、厂房及机器设备 222,767 227,783 \\n使用权资产 16,619 15,050 \\n核数师酬金(ii) 30 46 \\n作为费用确认的存货成本 2,229,378 2,289,586 \\n坏账准备及信用减值损失 864 475 \\n处置物业、厂房及机器设备的损失(i) 9,961 11,591 \\n未纳入租赁负债计量的可变租赁付款额、低价值\\n和短期租赁付款额 \\n 2,559 2,140 \\n研究与开发费用 23,014 21,967 \\n存货跌价损失 2,680 6,470 \\n处置衍生金融工具产生的投资损失(i) 9,764 11,019 \\n其他非流动资产减值损失 42 259 \\n \\n \\n(i) 其他收入/(费用)净值主要为:现金流量套期的无效部分的已实现收益,处置物业、厂房及机器设备的损失,处置\\n衍生金融工具产生的投资损失,政府补助,以及进口天然气增值税返还。 \\n(ii) 上述核数师酬金系由本公司支付的年度审计服务费,并不包括由本公司的附属公司支付给本公司现任审计师及\\n其网络成员所的主要与审计、除审计外的其他鉴证服务、税务及其他服务费,分别为人民币 32 百万元、人民币\\n1 百万元、人民币3 百万元及人民币3 百万元(2023 年:人民币 31 百万元、人民币2 百万元、人民币1 百万元、\\n人民币1 百万元)。 \\n \\n8 员工费用 \\n \\n 2024 年 2023 年 \\n 人民币 人民币')]}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "question = \"从利润表上看,中国石油2023年的总收入、净利润分別是多少?\"\n", "\n", "ask_question(question)" ] }, { "cell_type": "markdown", "id": "aa90b565", "metadata": {}, "source": [ "问题: 从利润表上看,中国石油2023年的总收入、净利润分別是多少?\n", "--------------------------------------------------------------------------------\n", "答案: 根据中国石油天然气股份有限公司2023年度财务报表附注中的**合并综合收益表**,2023年的总收入和净利润如下:\n", "\n", "### **总收入(营业收入)** \n", "**3,011,012 百万元** \n", "**依据文档片段**: \n", "在“合并综合收益表”中,营业收入(“营业收入”行)明确列示为 **3,011,012 人民币百万元**(2023年数据)。 \n", "> 文档原文: \n", "> **营业收入 6 3,011,012 3,239,167**\n", "\n", "---\n", "\n", "### **净利润** \n", "**180,293 百万元** \n", "**依据文档片段**: \n", "在“合并综合收益表”中,净利润(“净利润”行)明确列示为 **180,293 人民币百万元**(2023年数据)。 \n", "> 文档原文: \n", "> **净利润 180,293 163,348**\n", "\n", "---\n", "\n", "![](https://rean-blog-bucket.oss-cn-guangzhou.aliyuncs.com/assets/essay/20250801105923.png)\n", "\n", "### **补充说明** \n", "1. **总收入的构成**: \n", " 根据附注中的“营业收入”分类,总收入由油气和新能源、炼油化工和新材料、销售、天然气销售等业务板块构成,合计为 **3,011,012 百万元**(见文档中“营业收入”部分的分项数据及合计数)。 \n", "\n", "2. **净利润的计算逻辑**: \n", " 净利润是通过税前利润(237,462 百万元)减去所得税费用(57,169 百万元)得出,最终结果为 **180,293 百万元**。 \n", "\n", "以上数据均直接来源于提供的财务报表附注,未涉及外部假设或计算。\n", "\n", "相关文档片段数量: 5\n", "\n", "来源 1:\n", "\n", "公司: 中国石油(A股)[601857]\n", "\n", "文件: 中国石油:2023年度报告.pdf\n", "\n", "内容片段: 中国石油天然气股份有限公司 \n", "2023 年度财务报表附注 \n", "(除特别注明外,金额单位为人民币百万元) \n", " \n", "- 116 - \n", "1 公司简介 \n", " \n", "中国石油天然气股份有限公司 (“本公司”)是由中国石油天然气集团公司根据\n", "中华人民共和国(“中国”)原国家经济贸易委员会 《关于同意设立中国石油天然气股\n", "份有限公司的复函》(国经贸企改[1999]1024 号),将核心业务及与这些业务相关的资\n", "产和负债进...\n", "\n", "来源 2:\n", "\n", "公司: 中国石油(A股)[601857]\n", "\n", "文件: 中国石油:2023年度报告.pdf\n", "\n", "内容片段: 中国石油天然气股份有限公司 \n", "合并综合收益表 \n", "截至 2023 年 12 月 31 日止年度 \n", "(除特别注明外,金额单位为人民币百万元) \n", "- 213 - \n", " \n", " 附注 2023 年 2022 年 \n", " 人民币 人民币 \n", " \n", "营业收入 6 3,011,012 3,239,167 \n", " \n", "经营支出 \n", "采购、服务及其他 ...\n", "\n", "来源 3:\n", "\n", "公司: 中国石油(A股)[601857]\n", "\n", "文件: 中国石油:2023年度报告.pdf\n", "\n", "内容片段: 中国石油天然气股份有限公司 \n", "2023 年度财务报表附注 \n", "(除特别注明外,金额单位为人民币百万元) \n", " \n", "- 137 - \n", "4 主要会计政策和会计估计(续) \n", " \n", "(22) 收入确认 \n", " \n", "收入是本集团日常活动中形成的,会导致股东权益增加且与股东投入资本无关\n", "的经济利益的总流入。 \n", " \n", "本集团在履行了合同中的履约义务,即在客户取得相关商品或服务的控制权时,\n", "确认收入。 \n", " \n", "合同中包含两项或多项..." ] }, { "cell_type": "code", "execution_count": 32, "id": "fbc799ff", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "问题: 与2022年比较,Apple公司2023年业绩主要的变化是什么?\n", "--------------------------------------------------------------------------------\n", "答案: 根据提供的文档片段,Apple公司2023年与2022年业绩的主要变化如下:\n", "\n", "### 1. **核心产品线销售额变化**\n", " - **iPhone**: \n", " 2023年iPhone净销售额较2022年**下降2%(约49亿美元)**,主要受非Pro型号销量下滑影响,但部分被Pro型号的销量增长抵消。 \n", " **依据**:文档明确提到“iPhone net sales decreased 2% or $4.9 billion during 2023 compared to 2022 due to lower net sales of non-Pro iPhone models, partially offset by higher net sales of Pro iPhone models.”\n", "\n", " - **Mac**: \n", " 2023年Mac净销售额较2022年**下降108亿美元**,主要由于笔记本电脑销量减少。 \n", " **依据**:文档指出“Mac net sales decreased $10.8 billion during 2023 compared to 2022 due primarily to lower net sales of laptops.”\n", "\n", " - **iPad**: \n", " 2023年iPad净销售额较2022年**下降10亿美元**,主要因iPad mini和iPad Air销量下滑,部分被iPad 9代和10代的销量抵消。 \n", " **依据**:文档提到“iPad net sales decreased $1.0 billion during 2023 compared to 2022 due primarily to lower net sales of iPad mini and iPad Air, partially offset by the combined net sales of iPad 9th and 10th generation.”\n", "\n", "---\n", "\n", "### 2. **地区销售表现(2022年数据)**\n", " - 文档中提到的地区销售数据主要针对**2022年**,例如: \n", " - **美洲**:2022年净销售额因iPhone、服务和Mac销量增长而上升。 \n", " - **欧洲**:2022年净销售额因iPhone和服务增长而上升,但汇率波动产生负面影响。 \n", " - **大中华区**:2022年净销售额因iPhone和服务增长而上升,人民币升值带来正面影响。 \n", " - **日本**:2022年净销售额因日元贬值而下降。 \n", " **注意**:这些数据属于**2022年**,未直接涉及2023年地区销售变化。\n", "\n", "---\n", "\n", "### 3. **总销售额的间接推断**\n", " - 文档未直接提供2023年总净销售额的绝对值或同比变化,但通过各产品线的变动可推测: \n", " - iPhone、Mac、iPad的销售额均出现下滑,可能对整体业绩产生压力。 \n", " - 服务业务(Services)的销售额变化未在文档中明确提及,但可能受产品销售影响。\n", "\n", "---\n", "\n", "### 4. **其他关键信息**\n", " - **2023年产品发布**: \n", " 文档提到2023年Apple推出了新款MacBook Pro、iMac等产品,但未说明这些新品对销售额的具体贡献。 \n", " - **汇率影响**: \n", " 文档提到2022年汇率波动对销售产生影响,但未明确2023年汇率对业绩的影响。\n", "\n", "---\n", "\n", "### 结论\n", "Apple公司2023年业绩的主要变化体现在**核心产品线(iPhone、Mac、iPad)的销售额同比下降**,尤其是Mac和iPad的显著下滑。然而,文档未提供2023年总销售额的直接对比数据,也未详细说明服务业务的表现或汇率对2023年业绩的具体影响。若需更全面的分析,需补充2023年总销售额数据及服务业务详情。\n", "\n", "相关文档片段数量: 5\n", "\n", "来源 1:\n", "公司: 中国移动(A股)[600941]\n", "文件: 中国移动:2023年度报告.pdf\n", "内容片段: 表格列标题: 0 | 1\n", "0: 2023年度 | 1: 2022年度\n", "0: 基本每股收益 | 1: 基本每股收益\n", "0: 2023年度 | 1: 稀释每股收益\n", "0: 2023年度 | 1: 2022年度...\n", "\n", "来源 2:\n", "公司: Apple Inc.(美股)[AAPL]\n", "文件: aapl-20220924-10-K.pdf\n", "内容片段: Total net sales increased or \\(\\) 28.5$ billion during 2022 compared to 2021, driven primarily by higher net sales of iPhone, Services and Mac. The weakness in foreign currencies relative to the U.S...\n", "\n", "来源 3:\n", "公司: Apple Inc.(美股)[AAPL]\n", "文件: aapl-20230930-10-K.pdf\n", "内容片段: Products and Services Performance\n", "\n", "The following table shows net sales by category for 2023, 2022 and 2021 (dollars in millions):\n", "\n", "(1) Products net sales include amortization of the deferred value of ...\n" ] }, { "data": { "text/plain": [ "{'query': '与2022年比较,Apple公司2023年业绩主要的变化是什么?',\n", " 'result': '根据提供的文档片段,Apple公司2023年与2022年业绩的主要变化如下:\\n\\n### 1. **核心产品线销售额变化**\\n - **iPhone**: \\n 2023年iPhone净销售额较2022年**下降2%(约49亿美元)**,主要受非Pro型号销量下滑影响,但部分被Pro型号的销量增长抵消。 \\n **依据**:文档明确提到“iPhone net sales decreased 2% or $4.9 billion during 2023 compared to 2022 due to lower net sales of non-Pro iPhone models, partially offset by higher net sales of Pro iPhone models.”\\n\\n - **Mac**: \\n 2023年Mac净销售额较2022年**下降108亿美元**,主要由于笔记本电脑销量减少。 \\n **依据**:文档指出“Mac net sales decreased $10.8 billion during 2023 compared to 2022 due primarily to lower net sales of laptops.”\\n\\n - **iPad**: \\n 2023年iPad净销售额较2022年**下降10亿美元**,主要因iPad mini和iPad Air销量下滑,部分被iPad 9代和10代的销量抵消。 \\n **依据**:文档提到“iPad net sales decreased $1.0 billion during 2023 compared to 2022 due primarily to lower net sales of iPad mini and iPad Air, partially offset by the combined net sales of iPad 9th and 10th generation.”\\n\\n---\\n\\n### 2. **地区销售表现(2022年数据)**\\n - 文档中提到的地区销售数据主要针对**2022年**,例如: \\n - **美洲**:2022年净销售额因iPhone、服务和Mac销量增长而上升。 \\n - **欧洲**:2022年净销售额因iPhone和服务增长而上升,但汇率波动产生负面影响。 \\n - **大中华区**:2022年净销售额因iPhone和服务增长而上升,人民币升值带来正面影响。 \\n - **日本**:2022年净销售额因日元贬值而下降。 \\n **注意**:这些数据属于**2022年**,未直接涉及2023年地区销售变化。\\n\\n---\\n\\n### 3. **总销售额的间接推断**\\n - 文档未直接提供2023年总净销售额的绝对值或同比变化,但通过各产品线的变动可推测: \\n - iPhone、Mac、iPad的销售额均出现下滑,可能对整体业绩产生压力。 \\n - 服务业务(Services)的销售额变化未在文档中明确提及,但可能受产品销售影响。\\n\\n---\\n\\n### 4. **其他关键信息**\\n - **2023年产品发布**: \\n 文档提到2023年Apple推出了新款MacBook Pro、iMac等产品,但未说明这些新品对销售额的具体贡献。 \\n - **汇率影响**: \\n 文档提到2022年汇率波动对销售产生影响,但未明确2023年汇率对业绩的影响。\\n\\n---\\n\\n### 结论\\nApple公司2023年业绩的主要变化体现在**核心产品线(iPhone、Mac、iPad)的销售额同比下降**,尤其是Mac和iPad的显著下滑。然而,文档未提供2023年总销售额的直接对比数据,也未详细说明服务业务的表现或汇率对2023年业绩的具体影响。若需更全面的分析,需补充2023年总销售额数据及服务业务详情。',\n", " 'source_documents': [Document(metadata={'file': '中国移动:2023年度报告.pdf', 'company': '中国移动(A股)[600941]', 'page_number': 217.0, 'element_type': 'table', 'block_index': 8.0}, page_content='表格列标题: 0 | 1\\n0: 2023年度 | 1: 2022年度\\n0: 基本每股收益 | 1: 基本每股收益\\n0: 2023年度 | 1: 稀释每股收益\\n0: 2023年度 | 1: 2022年度'),\n", " Document(metadata={'file': 'aapl-20220924-10-K.pdf', 'block_index': None, 'company': 'Apple Inc.(美股)[AAPL]', 'page_number': 24.0, 'element_type': 'text'}, page_content='Total net sales increased or \\\\(\\\\) 28.5$ billion during 2022 compared to 2021, driven primarily by higher net sales of iPhone, Services and Mac. The weakness in foreign currencies relative to the U.S. dollar had an unfavorable year- over- year impact on all Products and Services net sales during 2022.\\n\\nThe Company announces new product, service and software offerings at various times during the year. Significant announcements during fiscal 2022 included the following:\\n\\nFirst Quarter 2022:\\n\\n- Updated MacBook Pro 14\" and MacBook Pro 16\", powered by the Apple M1 Pro or M1 Max chip; and- Third generation of AirPods.\\n\\nSecond Quarter 2022:\\n\\n- Updated iPhone SE with 5G technology;- All-new Mac Studio, powered by the Apple M1 Max or M1 Ultra chip;- All-new Studio Display™; and- Updated iPad Air with 5G technology, powered by the Apple M1 chip.\\n\\nThird Quarter 2022:'),\n", " Document(metadata={'file': 'aapl-20230930-10-K.pdf', 'block_index': None, 'company': 'Apple Inc.(美股)[AAPL]', 'page_number': 27.0, 'element_type': 'text'}, page_content='Products and Services Performance\\n\\nThe following table shows net sales by category for 2023, 2022 and 2021 (dollars in millions):\\n\\n(1) Products net sales include amortization of the deferred value of unspecified software upgrade rights, which are bundled in the sales price of the respective product. \\n(2) Services net sales include amortization of the deferred value of services bundled in the sales price of certain products.\\n\\niPhone\\n\\niPhone iPhone net sales decreased \\\\(2\\\\%\\\\) or \\\\(\\\\) 4.9\\\\(billion during 2023 compared to 2022 due to lower net sales of non - Pro iPhone models, partially offset by higher net sales of Pro iPhone models.\\n\\nMac\\n\\nMac Mac net sales decreased or \\\\(\\\\) 10.8$ billion during 2023 compared to 2022 due primarily to lower net sales of laptops.\\n\\niPad\\n\\niPad iPad net sales decreased or \\\\(\\\\) 1.0$ billion during 2023 compared to 2022 due primarily to lower net sales of iPad mini and iPad Air, partially offset by the combined net sales of iPad 9th and 10th generation.'),\n", " Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'block_index': None, 'company': 'Apple Inc.(美股)[AAPL]', 'page_number': 24.0, 'element_type': 'text'}, page_content='Item 7. Management\\'s Discussion and Analysis of Financial Condition and Results of Operations\\n\\nThe following discussion should be read in conjunction with the consolidated financial statements and accompanying notes included in Part II Item 8 of this Form 10- K. This Item generally discusses 2024 and 2023 items and year- to- year comparisons between 2024 and 2023. Discussions of 2022 items and year- to- year comparisons between 2023 and 2022 are not included, and can be found in \"Management\\'s Discussion and Analysis of Financial Condition and Results of Operations\" in Part II, Item 7 of the Company\\'s Annual Report on Form 10- K for the fiscal year ended September 30, 2023.\\n\\nProduct, Service and Software Announcements\\n\\nThe Company announces new product, service and software offerings at various times during the year. Significant announcements during fiscal year 2024 included the following:\\n\\nFirst Quarter 2024:\\n\\nMacBook Pro 14- in.; MacBook Pro 16- in.; and iMac.\\n\\nSecond Quarter 2024:'),\n", " Document(metadata={'file': 'aapl-20220924-10-K.pdf', 'block_index': None, 'company': 'Apple Inc.(美股)[AAPL]', 'page_number': 26.0, 'element_type': 'text'}, page_content='Americas\\n\\nAmericasAmericas net sales increased during 2022 compared to 2021 due primarily to higher net sales of iPhone, Services and Mac.\\n\\nEurope\\n\\nEurope Europe net sales increased during 2022 compared to 2021 due primarily to higher net sales of iPhone and Services. The weakness in foreign currencies relative to the U.S. dollar had a net unfavorable year- over- year impact on Europe net sales during 2022.\\n\\nGreater China\\n\\nGreater China Greater China net sales increased during 2022 compared to 2021 due primarily to higher net sales of iPhone and Services. The strength of the renminbi relative to the U.S. dollar had a favorable year- over- year impact on Greater China net sales during 2022.\\n\\nJapan\\n\\nJapan Japan net sales decreased during 2022 compared to 2021 due to the weakness of the yen relative to the U.S. dollar.\\n\\nRest of Asia Pacific')]}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "question = \"与2022年比较,Apple公司2023年业绩主要的变化是什么?\"\n", "\n", "ask_question(question)" ] }, { "cell_type": "code", "execution_count": 35, "id": "dbfc950e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "问题: 从利润表格(CONSOLIDATED STATEMENTS OF OPERATIONS),Apple公司2024年的销售额(Total Net sales)、净利润(Net income)?\n", "🎯 目标公司: Apple\n", "--------------------------------------------------------------------------------\n", "📋 检测到表格相关问题,关键词: ['表格', '表', '利润表', 'consolidated statements', 'operations']\n", "✅ 将优先筛选表格类型文档\n", "📋 表格类型筛选:\n", " - 表格文档: 13 个\n", " - 文本文档: 7 个\n", " ✅ 优先使用表格文档\n", "📊 检索过滤统计:\n", " - 总检索文档: 20\n", " - 匹配 Apple: 20\n", " - 过滤掉其他公司: 0\n", " - 最终文档: 20\n", "📋 表格类型筛选:\n", " - 表格文档: 13 个\n", " - 文本文档: 7 个\n", " ✅ 优先使用表格文档\n", "📊 检索过滤统计:\n", " - 总检索文档: 20\n", " - 匹配 Apple: 20\n", " - 过滤掉其他公司: 0\n", " - 最终文档: 20\n", "\n", "答案: 根据提供的文档内容,苹果公司2024年的**Total Net Sales**(总销售额)和**Net Income**(净利润)信息如下:\n", "\n", "---\n", "\n", "### **1. Total Net Sales(总销售额)**\n", "**文档依据**: \n", "- **文档7** 提供了按类别划分的销售额数据,其中明确列示了 **Total net sales**(总销售额)为 **$391,035百万**(即3910.35亿美元)。 \n", " - **2024年数据**:$391,035百万 \n", " - **2023年数据**:$383,285百万 \n", " - **2022年数据**:$394,328百万 \n", "\n", "**表格片段**: \n", "| 项目 | 2024 | Change | 2023 | Change | 2022 |\n", "|--------------------|------------|--------|------------|--------|------------|\n", "| **Total net sales**| $391,035 | 2% | $383,285 | (3)% | $394,328 |\n", "\n", "---\n", "\n", "### **2. Net Income(净利润)**\n", "**文档依据**: \n", "- **文档中未直接提供2024年的净利润(Net Income)数据**。 \n", " - **文档1、2、3、4、5、6、8** 中均未明确列出净利润数值。 \n", " - **文档7** 仅提供了销售额(Total Net Sales)和各业务线的销售额,但未涉及净利润。 \n", " - **文档4** 提供了运营费用(Total operating expenses)和研发费用等,但未提及最终的净利润。 \n", "\n", "**结论**: \n", "由于提供的文档中**缺乏2024年净利润的具体数值或计算依据**,无法准确回答该问题。\n", "\n", "---\n", "\n", "### **最终答案**\n", "- **2024年Total Net Sales**:**$391,035百万**(依据文档7)。 \n", "- **2024年Net Income**:**文档中未提供相关信息**。 \n", "\n", "如需进一步分析,需参考苹果公司年度报告中完整的利润表(Consolidated Statements of Operations)或其他补充财务数据。\n", "\n", "📄 最终使用文档数量: 8\n", "\n", "来源 1:\n", " 公司: Apple Inc.(美股)[AAPL]\n", " 文件: aapl-20240928-10-K.pdf\n", " 类型: table_html\n", " 内容: === 财务报表搜索摘要 ===\n", "财务报表类型: 财务报表 FINANCIAL STATEMENT 财务数据表\n", "表格内容: 2024 2024.0 2023 2023.0 2022 2022.0 Gross margin: Products $ 109633.0 $ 108803.0 $ 11472...\n", "\n", "来源 2:\n", " 公司: Apple Inc.(美股)[AAPL]\n", " 文件: aapl-20240928-10-K.pdf\n", " 类型: table_html\n", " 内容: By: /s/ Luca Maestri Lucia Maestri Senior Vice President, Chief Financial Officer\n", "\n", "Apple Inc.\n", "\n", "=== 财务报表搜索摘要 ===\n", "表格内容: Name Title Date /s/ Timothy D...\n", "\n", "来源 3:\n", " 公司: Apple Inc.(美股)[AAPL]\n", " 文件: aapl-20240928-10-K.pdf\n", " 类型: table_html\n", " 内容: One Apple Park Way\n", "\n", "Apple Inc.\n", "\n", "=== 财务报表搜索摘要 ===\n", "财务报表类型: 财务报表 FINANCIAL STATEMENT 财务数据表\n", "表格内容: Title of each class Trading symbol(s) Name of each excha...\n", "\n", "答案: 根据提供的文档内容,苹果公司2024年的**Total Net Sales**(总销售额)和**Net Income**(净利润)信息如下:\n", "\n", "---\n", "\n", "### **1. Total Net Sales(总销售额)**\n", "**文档依据**: \n", "- **文档7** 提供了按类别划分的销售额数据,其中明确列示了 **Total net sales**(总销售额)为 **$391,035百万**(即3910.35亿美元)。 \n", " - **2024年数据**:$391,035百万 \n", " - **2023年数据**:$383,285百万 \n", " - **2022年数据**:$394,328百万 \n", "\n", "**表格片段**: \n", "| 项目 | 2024 | Change | 2023 | Change | 2022 |\n", "|--------------------|------------|--------|------------|--------|------------|\n", "| **Total net sales**| $391,035 | 2% | $383,285 | (3)% | $394,328 |\n", "\n", "---\n", "\n", "### **2. Net Income(净利润)**\n", "**文档依据**: \n", "- **文档中未直接提供2024年的净利润(Net Income)数据**。 \n", " - **文档1、2、3、4、5、6、8** 中均未明确列出净利润数值。 \n", " - **文档7** 仅提供了销售额(Total Net Sales)和各业务线的销售额,但未涉及净利润。 \n", " - **文档4** 提供了运营费用(Total operating expenses)和研发费用等,但未提及最终的净利润。 \n", "\n", "**结论**: \n", "由于提供的文档中**缺乏2024年净利润的具体数值或计算依据**,无法准确回答该问题。\n", "\n", "---\n", "\n", "### **最终答案**\n", "- **2024年Total Net Sales**:**$391,035百万**(依据文档7)。 \n", "- **2024年Net Income**:**文档中未提供相关信息**。 \n", "\n", "如需进一步分析,需参考苹果公司年度报告中完整的利润表(Consolidated Statements of Operations)或其他补充财务数据。\n", "\n", "📄 最终使用文档数量: 8\n", "\n", "来源 1:\n", " 公司: Apple Inc.(美股)[AAPL]\n", " 文件: aapl-20240928-10-K.pdf\n", " 类型: table_html\n", " 内容: === 财务报表搜索摘要 ===\n", "财务报表类型: 财务报表 FINANCIAL STATEMENT 财务数据表\n", "表格内容: 2024 2024.0 2023 2023.0 2022 2022.0 Gross margin: Products $ 109633.0 $ 108803.0 $ 11472...\n", "\n", "来源 2:\n", " 公司: Apple Inc.(美股)[AAPL]\n", " 文件: aapl-20240928-10-K.pdf\n", " 类型: table_html\n", " 内容: By: /s/ Luca Maestri Lucia Maestri Senior Vice President, Chief Financial Officer\n", "\n", "Apple Inc.\n", "\n", "=== 财务报表搜索摘要 ===\n", "表格内容: Name Title Date /s/ Timothy D...\n", "\n", "来源 3:\n", " 公司: Apple Inc.(美股)[AAPL]\n", " 文件: aapl-20240928-10-K.pdf\n", " 类型: table_html\n", " 内容: One Apple Park Way\n", "\n", "Apple Inc.\n", "\n", "=== 财务报表搜索摘要 ===\n", "财务报表类型: 财务报表 FINANCIAL STATEMENT 财务数据表\n", "表格内容: Title of each class Trading symbol(s) Name of each excha...\n" ] }, { "data": { "text/plain": [ "{'result': '根据提供的文档内容,苹果公司2024年的**Total Net Sales**(总销售额)和**Net Income**(净利润)信息如下:\\n\\n---\\n\\n### **1. Total Net Sales(总销售额)**\\n**文档依据**: \\n- **文档7** 提供了按类别划分的销售额数据,其中明确列示了 **Total net sales**(总销售额)为 **$391,035百万**(即3910.35亿美元)。 \\n - **2024年数据**:$391,035百万 \\n - **2023年数据**:$383,285百万 \\n - **2022年数据**:$394,328百万 \\n\\n**表格片段**: \\n| 项目 | 2024 | Change | 2023 | Change | 2022 |\\n|--------------------|------------|--------|------------|--------|------------|\\n| **Total net sales**| $391,035 | 2% | $383,285 | (3)% | $394,328 |\\n\\n---\\n\\n### **2. Net Income(净利润)**\\n**文档依据**: \\n- **文档中未直接提供2024年的净利润(Net Income)数据**。 \\n - **文档1、2、3、4、5、6、8** 中均未明确列出净利润数值。 \\n - **文档7** 仅提供了销售额(Total Net Sales)和各业务线的销售额,但未涉及净利润。 \\n - **文档4** 提供了运营费用(Total operating expenses)和研发费用等,但未提及最终的净利润。 \\n\\n**结论**: \\n由于提供的文档中**缺乏2024年净利润的具体数值或计算依据**,无法准确回答该问题。\\n\\n---\\n\\n### **最终答案**\\n- **2024年Total Net Sales**:**$391,035百万**(依据文档7)。 \\n- **2024年Net Income**:**文档中未提供相关信息**。 \\n\\n如需进一步分析,需参考苹果公司年度报告中完整的利润表(Consolidated Statements of Operations)或其他补充财务数据。',\n", " 'source_documents': [Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'table_count': 1.0, 'company': 'Apple Inc.(美股)[AAPL]', 'block_indices': [2], 'page_number': 27.0, 'element_type': 'table_html'}, page_content='=== 财务报表搜索摘要 ===\\n财务报表类型: 财务报表 FINANCIAL STATEMENT 财务数据表\\n表格内容: 2024 2024.0 2023 2023.0 2022 2022.0 Gross margin: Products $ 109633.0 $ 108803.0 $ 114728.0 Services 71050 60345 56054 Total gross margin $ 180683.0 $ 169148.0 $ 170782.0\\n搜索关键词: Apple财务数据 苹果公司财务报表 年度报告 annual report financial data income statement profit loss statement consolidated operations consolidated statements of operations statements of operations operations statement comprehensive income net sales net income 利润表 损益表 综合收益表 营业收入 净利润 经营业绩 财务报表 合并报表 综合财务报表\\n\\n=== 原始表格HTML ===\\n
202420232022
Gross margin:
Products$109,633$108,803$114,728
Services71,05060,34556,054
Total gross margin$180,683$169,148$170,782
'),\n", " Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'table_count': 1.0, 'company': 'Apple Inc.(美股)[AAPL]', 'block_indices': [8], 'page_number': 60.0, 'element_type': 'table_html'}, page_content='By: /s/ Luca Maestri Lucia Maestri Senior Vice President, Chief Financial Officer\\n\\nApple Inc.\\n\\n=== 财务报表搜索摘要 ===\\n表格内容: Name Title Date /s/ Timothy D. Cook Chief Executive Officer and Director (Principal Executive Officer) November 1, 2024 TIMOTHY D. COOK /s/ Luca Maestri Senior Vice President, Chief Financial Officer (Principal Financial Officer) November 1, 2024 LUCA MAESTRI /s/ Chris Kondo Senior Director of Corporate Accounting (Principal Accounting Officer) November 1, 2024 CHRIS KONDO /s/ Wanda Austin Director November 1, 2024 WANDA AUSTIN /s/ Alex Gorsky Director November 1, 2024 ALEX GORSKY /s/ Andrea Jung Director November 1, 2024 ANDREA JUNG /s/ Arthur D. Levinson Director and Chair of the Board November 1, 2024 ARTHUR D. LEVINSON /s/ Monica Lozano Director November 1, 2024 MONICA LOZANO /s/ Ronald D. Sugar Director November 1, 2024 RONALD D. SUGAR /s/ Susan L. Wagner Director November 1, 2024 SUSAN L. WAGNER\\n搜索关键词: Apple财务数据 苹果公司财务报表 年度报告 annual report financial data income statement profit loss statement consolidated operations consolidated statements of operations statements of operations operations statement comprehensive income net sales net income 利润表 损益表 综合收益表 营业收入 净利润 经营业绩 财务报表 合并报表 综合财务报表\\n\\n=== 原始表格HTML ===\\n
NameTitleDate
/s/ Timothy D. CookChief Executive Officer and Director \\n(Principal Executive Officer)November 1, 2024
TIMOTHY D. COOK
/s/ Luca MaestriSenior Vice President, Chief Financial Officer \\n(Principal Financial Officer)November 1, 2024
LUCA MAESTRI
/s/ Chris KondoSenior Director of Corporate Accounting \\n(Principal Accounting Officer)November 1, 2024
CHRIS KONDO
/s/ Wanda AustinDirectorNovember 1, 2024
WANDA AUSTIN
/s/ Alex GorskyDirectorNovember 1, 2024
ALEX GORSKY
/s/ Andrea JungDirectorNovember 1, 2024
ANDREA JUNG
/s/ Arthur D. LevinsonDirector and Chair of the BoardNovember 1, 2024
ARTHUR D. LEVINSON
/s/ Monica LozanoDirectorNovember 1, 2024
MONICA LOZANO
/s/ Ronald D. SugarDirectorNovember 1, 2024
RONALD D. SUGAR
/s/ Susan L. WagnerDirectorNovember 1, 2024
SUSAN L. WAGNER
'),\n", " Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'table_count': 1.0, 'company': 'Apple Inc.(美股)[AAPL]', 'block_indices': [16], 'page_number': 1.0, 'element_type': 'table_html'}, page_content='One Apple Park Way\\n\\nApple Inc.\\n\\n=== 财务报表搜索摘要 ===\\n财务报表类型: 财务报表 FINANCIAL STATEMENT 财务数据表\\n表格内容: Title of each class Trading symbol(s) Name of each exchange on which registered Common Stock, $0.00001 par value per share AAPL The Nasdaq Stock Market LLC 0.000% Notes due 2025 — The Nasdaq Stock Market LLC 0.875% Notes due 2025 — The Nasdaq Stock Market LLC 1.625% Notes due 2026 — The Nasdaq Stock Market LLC 2.000% Notes due 2027 — The Nasdaq Stock Market LLC 1.375% Notes due 2029 — The Nasdaq Stock Market LLC 3.050% Notes due 2029 — The Nasdaq Stock Market LLC 0.500% Notes due 2031 — The Nasdaq Stock Market LLC 3.000% Notes due 2042 — The Nasdaq Stock Market LLC\\n搜索关键词: Apple财务数据 苹果公司财务报表 年度报告 annual report financial data income statement profit loss statement consolidated operations consolidated statements of operations statements of operations operations statement comprehensive income net sales net income 利润表 损益表 综合收益表 营业收入 净利润 经营业绩 财务报表 合并报表 综合财务报表\\n\\n=== 原始表格HTML ===\\n
Title of each classTrading symbol(s)Name of each exchange on which registered
Common Stock, $0.00001 par value per shareAAPLThe Nasdaq Stock Market LLC
0.000% Notes due 2025The Nasdaq Stock Market LLC
0.875% Notes due 2025The Nasdaq Stock Market LLC
1.625% Notes due 2026The Nasdaq Stock Market LLC
2.000% Notes due 2027The Nasdaq Stock Market LLC
1.375% Notes due 2029The Nasdaq Stock Market LLC
3.050% Notes due 2029The Nasdaq Stock Market LLC
0.500% Notes due 2031The Nasdaq Stock Market LLC
3.000% Notes due 2042The Nasdaq Stock Market LLC
'),\n", " Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'table_count': 1.0, 'company': 'Apple Inc.(美股)[AAPL]', 'block_indices': [14], 'page_number': 27.0, 'element_type': 'table_html'}, page_content='Services gross margin increased during 2024 compared to 2023 due primarily to higher Services net sales.\\n\\n=== 财务报表搜索摘要 ===\\n财务报表类型: 合并损益表 CONSOLIDATED STATEMENTS OF OPERATIONS 利润表 损益表 收益表 财务报表\\n关键财务指标: 0: Research and development | 0: Percentage of total net sales | 0: Selling, general and administrative | 0: Percentage of total net sales | 0: Total operating expenses | 0: Percentage of total net sales\\n表格内容: 2024 Change 2023 Change 2022 Research and development $ 31,370 5 % $ 29,915 14 % $ 26,251 Percentage of total net sales 8 % 8 % 7 % Selling, general and administrative $ 26,097 5 % $ 24,932 (1)% $ 25,094 Percentage of total net sales 7 % 7 % 6 % Total operating expenses $ 57,467 5 % $ 54,847 7 % $ 51,345 Percentage of total net sales 15 % 14 % 13 %\\n搜索关键词: Apple财务数据 苹果公司财务报表 年度报告 annual report financial data income statement profit loss statement consolidated operations consolidated statements of operations statements of operations operations statement comprehensive income net sales net income 利润表 损益表 综合收益表 营业收入 净利润 经营业绩 财务报表 合并报表 综合财务报表\\n\\n=== 原始表格HTML ===\\n
2024Change2023Change2022
Research and development$ 31,3705 %$ 29,91514 %$ 26,251
Percentage of total net sales8 %8 %7 %
Selling, general and administrative$ 26,0975 %$ 24,932(1)%$ 25,094
Percentage of total net sales7 %7 %6 %
Total operating expenses$ 57,4675 %$ 54,8477 %$ 51,345
Percentage of total net sales15 %14 %13 %
'),\n", " Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'table_count': 1.0, 'company': 'Apple Inc.(美股)[AAPL]', 'block_indices': [0], 'page_number': 50.0, 'element_type': 'table_html'}, page_content='=== 财务报表搜索摘要 ===\\n财务报表类型: 合并损益表 CONSOLIDATED STATEMENTS OF OPERATIONS 利润表 损益表 收益表 财务报表\\n关键财务指标: 0: Net sales | 0: Operating income | 0: Net sales | 0: Operating income | 0: Net sales | 0: Operating income | 0: Net sales | 0: Operating income | 0: Net sales | 0: Operating income\\n表格内容: 2024 2024.0 2023 2023.0 2022 2022.0 Americas: Net sales $ 167045.0 $ 162560.0 $ 169658.0 Operating income $ 67656.0 $ 60508.0 $ 62683.0 Europe: Net sales $ 101328.0 $ 94294.0 $ 95118.0 Operating income $ 41790.0 $ 36098.0 $ 35233.0 Greater China: Net sales $ 66952.0 $ 72559.0 $ 74200.0 Operating income $ 27082.0 $ 30328.0 $ 31153.0 Japan: Net sales $ 25052.0 $ 24257.0 $ 25977.0 Operating income $ 12454.0 $ 11888.0 $ 12257.0 Rest of Asia Pacific: Net sales $ 30658.0 $ 29615.0 $ 29375.0 Operating income $ 13062.0 $ 12066.0 $ 11569.0\\n搜索关键词: Apple财务数据 苹果公司财务报表 年度报告 annual report financial data income statement profit loss statement consolidated operations consolidated statements of operations statements of operations operations statement comprehensive income net sales net income 利润表 损益表 综合收益表 营业收入 净利润 经营业绩 财务报表 合并报表 综合财务报表\\n\\n=== 原始表格HTML ===\\n
202420232022
Americas:
Net sales$167,045$162,560$169,658
Operating income$67,656$60,508$62,683
Europe:
Net sales$101,328$94,294$95,118
Operating income$41,790$36,098$35,233
Greater China:
Net sales$66,952$72,559$74,200
Operating income$27,082$30,328$31,153
Japan:
Net sales$25,052$24,257$25,977
Operating income$12,454$11,888$12,257
Rest of Asia Pacific:
Net sales$30,658$29,615$29,375
Operating income$13,062$12,066$11,569
'),\n", " Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'table_count': 1.0, 'company': 'Apple Inc.(美股)[AAPL]', 'block_indices': [8], 'page_number': 46.0, 'element_type': 'table_html'}, page_content='=== 财务报表搜索摘要 ===\\n财务报表类型: 财务报表 FINANCIAL STATEMENT 财务数据表\\n表格内容: 2024 2023 2022 $ 3,960 $ (1,333) $ 5,264 — — — 5948 — (7,257) — (1,309) — 3955\\n搜索关键词: Apple财务数据 苹果公司财务报表 年度报告 annual report financial data income statement profit loss statement consolidated operations consolidated statements of operations statements of operations operations statement comprehensive income net sales net income 利润表 损益表 综合收益表 营业收入 净利润 经营业绩 财务报表 合并报表 综合财务报表\\n\\n=== 原始表格HTML ===\\n
202420232022
$ 3,960$ (1,333)$ 5,264
5,948
(7,257)
(1,309)
3,955
'),\n", " Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'table_count': 1.0, 'company': 'Apple Inc.(美股)[AAPL]', 'block_indices': [2], 'page_number': 26.0, 'element_type': 'table_html'}, page_content='The following table shows net sales by category for 2024, 2023 and 2022 (dollars in millions):\\n\\n=== 财务报表搜索摘要 ===\\n财务报表类型: 合并损益表 CONSOLIDATED STATEMENTS OF OPERATIONS 利润表 损益表 收益表 财务报表\\n关键财务指标: 0: Total net sales\\n表格内容: 2024 Change 2023 Change 2022 iPhone $ 201,183 — % $ 200,583 (2)% $ 205,489 Mac 29984 2 % 29357 (27)% 40177 iPad 26694 (6)% 28300 (3)% 29292 Wearables, Home and Accessories 37005 (7)% 39845 (3)% 41241 Services (1) 96169 13 % 85200 9 % 78129 Total net sales $ 391,035 2 % $ 383,285 (3)% $ 394,328\\n搜索关键词: Apple财务数据 苹果公司财务报表 年度报告 annual report financial data income statement profit loss statement consolidated operations consolidated statements of operations statements of operations operations statement comprehensive income net sales net income 利润表 损益表 综合收益表 营业收入 净利润 经营业绩 财务报表 合并报表 综合财务报表\\n\\n=== 原始表格HTML ===\\n
2024Change2023Change2022
iPhone$ 201,183— %$ 200,583(2)%$ 205,489
Mac29,9842 %29,357(27)%40,177
iPad26,694(6)%28,300(3)%29,292
Wearables, Home and Accessories37,005(7)%39,845(3)%41,241
Services (1)96,16913 %85,2009 %78,129
Total net sales$ 391,0352 %$ 383,285(3)%$ 394,328
'),\n", " Document(metadata={'file': 'aapl-20240928-10-K.pdf', 'table_count': 1.0, 'company': 'Apple Inc.(美股)[AAPL]', 'block_indices': [3], 'page_number': 44.0, 'element_type': 'table_html'}, page_content='Deferred tax assets: Capitalized research and development Tax credit carryforwards Accrued liabilities and other reserves Deferred revenue Lease liabilities Unrealized losses Other Total deferred tax assets Less: Valuation allowance Total deferred tax assets, net Deferred tax liabilities: Depreciation Right- of- use assets Minimum tax on foreign earnings Unrealized gains Other Total deferred tax liabilities Net deferred tax assets\\n\\n=== 财务报表搜索摘要 ===\\n财务报表类型: 财务报表 FINANCIAL STATEMENT 财务数据表\\n表格内容: 2024 2023 $ 10,739 $ 6,294 8856 6114 3413 2410 1173 2168 34873 (8,866) 26007 2551 2125 1674 — 455 6805 19202\\n搜索关键词: Apple财务数据 苹果公司财务报表 年度报告 annual report financial data income statement profit loss statement consolidated operations consolidated statements of operations statements of operations operations statement comprehensive income net sales net income 利润表 损益表 综合收益表 营业收入 净利润 经营业绩 财务报表 合并报表 综合财务报表\\n\\n=== 原始表格HTML ===\\n
20242023
$ 10,739$ 6,294
8,856
6,114
3,413
2,410
1,173
2,168
34,873
(8,866)
26,007
2,551
2,125
1,674
455
6,805
19,202
')]}" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "question = \"从利润表格(CONSOLIDATED STATEMENTS OF OPERATIONS),Apple公司2024年的销售额(Total Net sales)、净利润(Net income)?\"\n", "\n", "ask_question_with_company_filter(question, target_company=\"Apple\")" ] }, { "cell_type": "code", "execution_count": 72, "id": "97e8c0dd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "问题: 从合并资产负债表看,2023年中国移动的流动比率是多少?\n", "🎯 目标公司: 中国移动\n", "--------------------------------------------------------------------------------\n", "📋 检测到表格相关问题,关键词: ['表', '资产负债表', '合并资产负债表']\n", "✅ 将优先筛选表格类型文档\n", "📋 表格类型筛选:\n", " - 表格文档: 2 个\n", " - 文本文档: 18 个\n", " ✅ 优先使用表格文档\n", "📊 检索过滤统计:\n", " - 总检索文档: 20\n", " - 匹配 中国移动: 20\n", " - 过滤掉其他公司: 0\n", " - 最终文档: 20\n", "📋 表格类型筛选:\n", " - 表格文档: 2 个\n", " - 文本文档: 18 个\n", " ✅ 优先使用表格文档\n", "📊 检索过滤统计:\n", " - 总检索文档: 20\n", " - 匹配 中国移动: 20\n", " - 过滤掉其他公司: 0\n", " - 最终文档: 20\n", "\n", "答案: 根据提供的文档内容,无法计算2023年中国移动的流动比率。原因如下:\n", "\n", "1. **流动比率定义**:流动比率=流动资产/流动负债,需要合并资产负债表中具体的流动资产和流动负债数值。\n", "\n", "2. **文档信息缺失**:\n", " - 文档1、文档2仅列出合并资产负债表和利润表的负责人信息,未提供具体财务数据(如流动资产、流动负债的金额)。\n", " - 文档3-文档8的附注部分涉及合同资产、其他流动资产、资本管理、金融工具风险等,但未披露合并资产负债表中流动资产和流动负债的量化数据。\n", " - 文档4提到资产负债率(总负债/总资产),但未提供流动负债的具体数值。\n", " - 文档8的现金流量表补充资料仅涉及现金及现金等价物的变动,未明确流动资产和流动负债的结构。\n", "\n", "3. **关键数据缺失**:所有提供的文档片段均未包含合并资产负债表中流动资产(如货币资金、应收账款、存货等)和流动负债(如短期借款、应付账款等)的具体金额,因此无法计算流动比率。\n", "\n", "**结论**:文档中缺乏必要的财务数据(流动资产和流动负债的数值),无法确定2023年中国移动的流动比率。建议查阅完整的合并资产负债表或相关财务报告的详细数据部分。\n", "\n", "📄 最终使用文档数量: 8\n", "\n", "来源 1:\n", " 公司: 中国移动(A股)[600941]\n", " 文件: 中国移动:2023年度报告.pdf\n", " 类型: table_html\n", " 内容: 合并资产负债表2023年12月31日(除特别注明外,金额单位为人民币百万元)\n", "\n", "[财务数据表]\n", "杨杰 | 李荣华 | 陈静\n", "企业负责人 | 主管会计工作负责人 | 会计机构负责人\n", "\n", "
杨杰李荣华陈静
企...\n", "\n", "来源 2:\n", " 公司: 中国移动(A股)[600941]\n", " 文件: 中国移动:2023年度报告.pdf\n", " 类型: table_html\n", " 内容: 合并利润表2023年度(除特别注明外,金额单位为人民币百万元)\n", "\n", "[财务数据表]\n", "杨杰 | 李荣华 | 陈静\n", "企业负责人 | 主管会计工作负责人 | 会计机构负责人\n", "\n", "
杨杰李荣华陈静
企业负责人
杨杰李荣华陈静
企...\n", "\n", "来源 2:\n", " 公司: 中国移动(A股)[600941]\n", " 文件: 中国移动:2023年度报告.pdf\n", " 类型: table_html\n", " 内容: 合并利润表2023年度(除特别注明外,金额单位为人民币百万元)\n", "\n", "[财务数据表]\n", "杨杰 | 李荣华 | 陈静\n", "企业负责人 | 主管会计工作负责人 | 会计机构负责人\n", "\n", "
杨杰李荣华陈静
企业负责人