FinRobot Reference Experience Evaluation

Status: Current Official Evaluation
Updated: 2026-05-11
Role: FinClaw Program Controller

1. Evaluation Scope

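This round follows the Report-Pipeline-* scope of the updated Case Library. The goal is to verify the real local experience of FinRobot as a "report-production financial reference project", rather than forcing generic chat-style cases onto it.

No history from earlier batches is retained for this round: old local outputs and old logs have been cleaned up, and only the current run is kept.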

2. Case Library Placement Decision

The supplemented and refined Case Library has been moved up into the FinClaw namespace under the ecosystem-level evaluation area:

/Users/mlabs/Programs/Labs-FinTecAI/evaluation/finclaw/
├── case-library.md
├── case-schema.md
├── cases/
└── runs/

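This location expresses two boundaries:

evaluation/ is the evaluation asset area of the FinTec AI Ecosystem;
evaluation/finclaw/ indicates that the current Case Library still covers only the FinClaw system and does not claim to cover Data Horizon, AI Trading Matrix, Reinforcement Learning Engine, or Financial Expert Foundation Model.

Placing it in the /Users/mlabs/Programs/fin-claw engineering repository is not recommended at present, because the Case Library is an evaluation and acceptance knowledge-base asset for the FinClaw system, not part of the FinClaw MVP engineering code.

Nor will it be split out into a new repository for now. The trigger conditions for an independent repository are:

five or more stable structured case files exist;
structured run results exist for 2-3 reference projects;
a lightweight runner or result validator can consume cases/*.yaml;
the team has confirmed that the field structure, scoring dimensions, and report format are stable;
at least one independent ecosystem project outside FinClaw has completed adaptation, demonstrating that a cross-project common layer exists.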
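
The trigger conditions above can be tracked as a simple checklist. The following is a minimal sketch, not part of any existing evaluation tooling: the `SplitReadiness` class and its field names are hypothetical, and treating "2-3 reference projects" as "at least 2" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SplitReadiness:
    """Hypothetical checklist mirroring the five trigger conditions."""
    stable_case_files: int          # structured case files considered stable
    reference_runs: int             # reference projects with structured run results
    has_runner_or_validator: bool   # something can consume cases/*.yaml
    schema_confirmed_stable: bool   # team sign-off on fields, scoring, report format
    external_projects_adapted: int  # ecosystem projects outside FinClaw that adapted

    def unmet(self) -> list[str]:
        """Return the names of the conditions that are not yet satisfied."""
        checks = {
            "stable_case_files": self.stable_case_files >= 5,
            "reference_runs": self.reference_runs >= 2,  # assumed lower bound of "2-3"
            "has_runner_or_validator": self.has_runner_or_validator,
            "schema_confirmed_stable": self.schema_confirmed_stable,
            "external_projects_adapted": self.external_projects_adapted >= 1,
        }
        return [name for name, ok in checks.items() if not ok]

    def ready(self) -> bool:
        """True only when every trigger condition holds."""
        return not self.unmet()
```

Calling `unmet()` rather than only `ready()` keeps the discussion concrete, since it names which conditions still block a split.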

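If it is split out in the future, a neutral name such as fintec-ai-evaluation-cases is recommended, to avoid binding it to a single reference project or a single evaluation source.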

3. Runtime Entry

Local path of the reference project:

/Users/mlabs/Programs/FinRobot

Current output directory:

/Users/mlabs/Programs/FinRobot/finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/

Current log directory:

/Users/mlabs/Programs/Labs-FinTecAI/packets/sync/finclaw-reference-experience-2026-05-09/logs/finrobot-report-pipeline-retest/

4. Model And Data Telemetry

Item	Value
Data source	FMP stable API, configured locally in ignored config
Main ticker	NVDA
Peer tickers	AMD, INTC
LLM provider	Moonshot-compatible endpoint
LLM model	`kimi-k2.6`
Text-generation run	Multi-section generation was attempted and manually stopped after repeated `/chat/completions` retries.
Token usage	Not available from local FinRobot logs.
Core no-text analysis elapsed	Completed in about 23 seconds.
HTML generation elapsed	Completed in about 11 seconds.
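
Elapsed-time figures like the two above can be captured by wrapping each pipeline command. A minimal sketch under stated assumptions: `run_timed` is a hypothetical helper, not part of FinRobot or the test harness, and it records only wall-clock time and the exit code (it cannot recover token usage).

```python
import subprocess
import time
from pathlib import Path


def run_timed(cmd: list[str], log_path: Path) -> dict:
    """Run `cmd`, write combined stdout/stderr to `log_path`, return timing info."""
    start = time.perf_counter()
    with log_path.open("w", encoding="utf-8") as log:
        # stderr is merged into stdout so retries and warnings land in one log
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    elapsed = time.perf_counter() - start
    return {
        "returncode": proc.returncode,
        "elapsed_seconds": round(elapsed, 1),
        "log": str(log_path),
    }
```

The returned dictionary can be stored next to the run logs so that telemetry rows are traceable to a specific command invocation.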

5. Execution Summary

5.1 Cleanup

Old FinRobot test artifacts have been cleaned up:

/Users/mlabs/Programs/FinRobot/finrobot_equity/core/output/*
/Users/mlabs/Programs/Labs-FinTecAI/packets/sync/finclaw-reference-experience-2026-05-09/logs/finrobot-*

After cleanup, only the directories for this round remain:

FINROBOT_REPORT_PIPELINE_RETEST
finrobot-report-pipeline-retest
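
A cleanup of this kind can be scripted so that the current run directory is never removed by accident. This is a minimal sketch, not the procedure actually used in this round; `clean_except` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def clean_except(root: Path, keep: set[str], pattern: str = "*") -> list[str]:
    """Delete direct children of `root` matching `pattern`, except names in `keep`.

    Returns the names that were removed, in sorted order.
    """
    removed = []
    # sorted() materializes the listing before anything is deleted
    for child in sorted(root.glob(pattern)):
        if child.name in keep:
            continue
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
        removed.append(child.name)
    return removed
```

For the log directory, a narrower pattern such as `finrobot-*` would leave logs from other reference projects untouched, for example `clean_except(logs_dir, {"finrobot-report-pipeline-retest"}, "finrobot-*")`.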

5.2 Analysis Generation

Command entry point:

venv/bin/python finrobot_equity/core/src/generate_financial_analysis.py \
  --company-ticker NVDA \
  --company-name "NVIDIA Corporation" \
  --config-file finrobot_equity/core/config/config.ini \
  --peer-tickers AMD INTC \
  --enable-sensitivity-analysis \
  --enable-catalyst-analysis \
  --enable-enhanced-news \
  --news-days-back 7 \
  --news-limit 20 \
  --output-dir finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis

Results:

Evidence	Result
Financial statements	Five years of income statement / balance sheet / cash flow data retrieved successfully.
Available years	2026, 2025, 2024, 2023, 2022.
Main analysis CSV	`financial_metrics_and_forecasts.csv` generated successfully.
Peer EBITDA	`peer_ebitda_comparison.csv` generated successfully.
Peer EV/EBITDA	No valid data retrieved.
Sensitivity analysis	`sensitivity_analysis.json` and `sensitivity_summary.md` generated successfully.
Enhanced news	FMP news endpoint returned 402, which serves as data-source degradation evidence.
Retail sentiment	Adanos Reddit / X.com / Polymarket all returned 401.
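
The "generated successfully" rows can be re-verified mechanically after a run. A minimal sketch, assuming all four files named in the table sit directly in the analysis output directory (the location of `sensitivity_summary.md` is an assumption); `check_artifacts` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the results table; their directory layout is assumed.
EXPECTED_ANALYSIS_FILES = (
    "financial_metrics_and_forecasts.csv",
    "peer_ebitda_comparison.csv",
    "sensitivity_analysis.json",
    "sensitivity_summary.md",
)


def check_artifacts(analysis_dir: Path, expected=EXPECTED_ANALYSIS_FILES) -> dict[str, str]:
    """Map each expected file name to 'ok', 'empty', or 'missing'."""
    status = {}
    for name in expected:
        path = analysis_dir / name
        if not path.is_file():
            status[name] = "missing"
        elif path.stat().st_size == 0:
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status
```

Distinguishing "empty" from "missing" matters here because a degraded data source may still produce a zero-byte file, which a plain existence check would count as success.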

5.3 Report Generation

Command entry point:

venv/bin/python finrobot_equity/core/src/create_equity_report.py \
  --company-ticker NVDA \
  --company-name "NVIDIA Corporation" \
  --config-file finrobot_equity/core/config/config.ini \
  --analysis-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/financial_metrics_and_forecasts.csv \
  --ratios-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/ratios_raw_data.csv \
  --peer-ebitda-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/peer_ebitda_comparison.csv \
  --sensitivity-analysis-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/sensitivity_analysis.json \
  --enhanced-news-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/enhanced_news.json \
  --retail-sentiment-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/retail_sentiment.json \
  --enable-enhanced-charts \
  --output-dir finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/report

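Report generation in this round used test-harness fallback text as input, because the project's own multi-section LLM text generation was unstable and that path should not block evaluation of the core report pipeline. The fallback is an evaluation aid and must not be counted as FinRobot's native text-generation capability.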
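
One way to keep fallback text from being mistaken for native output is to bound the retries and record where each section's text came from. The sketch below is illustrative only: it is not how FinRobot or the test harness behaves (in this run the generation was stopped manually), and `generate_with_fallback`, its parameters, and the `source` labels are all hypothetical.

```python
import time
from typing import Callable


def generate_with_fallback(
    generate: Callable[[], str],
    fallback_text: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `generate` up to `max_attempts` times, then use `fallback_text`.

    The returned dict records provenance so reviewers can tell LLM output
    apart from harness-supplied fallback text.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            text = generate()
            return {"text": text, "source": "llm", "attempts": attempt, "error": None}
        except Exception as exc:  # any failure counts as one failed attempt
            last_error = repr(exc)
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    return {
        "text": fallback_text,
        "source": "harness_fallback",
        "attempts": max_attempts,
        "error": last_error,
    }
```

Persisting the `source` field alongside each section would let a report-level note state exactly which sections were not produced by the model.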

Generated artifacts:

Artifact	Status
`Professional_Equity_Report_NVDA.html`	Generated, 131 KB.
`Combined_Equity_Report_NVDA.html`	Generated, 122 KB.
`Equity_Report_Page1_NVDA.html` ~ `Equity_Report_Page5_NVDA.html`	Generated.
Revenue / EBITDA chart	Generated.
EPS × PE chart	Generated, with non-numeric EPS / PE warning.
Revenue YoY chart	Generated.
EBITDA margin chart	Generated.
Financial radar chart	Generated.

HTML integrity check:

File	Result
Professional HTML	Contains HTML structure, NVDA content, image tags, fallback note, and source degradation note.
Combined HTML	Contains HTML structure, NVDA content, image tags, fallback note, and source degradation note.
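
The five properties checked above can be tested with the standard-library HTML parser. A minimal sketch, assuming the fallback and degradation notes can be recognized by fixed marker strings; the marker values are hypothetical and must be replaced with whatever wording the report actually uses.

```python
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts start tags and collects visible text chunks."""

    def __init__(self):
        super().__init__()
        self.tags = {}
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags[tag] = self.tags.get(tag, 0) + 1

    def handle_data(self, data):
        self.text.append(data)


def check_report_html(html: str, ticker: str,
                      fallback_marker: str, degradation_marker: str) -> dict:
    """Return one boolean per integrity property from the table above."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text)
    return {
        "html_structure": all(parser.tags.get(t, 0) > 0 for t in ("html", "body")),
        "ticker_content": ticker in text,
        "image_tags": parser.tags.get("img", 0) > 0,
        "fallback_note": fallback_marker in text,
        "degradation_note": degradation_marker in text,
    }
```

A check like this confirms presence only; it does not judge whether the notes are prominent enough for a reader to notice.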

6. Case Results

Case	Concrete Instance	Evidence	Evaluation	Rate
`Report-Pipeline-01`	NVDA single-company report	Financial CSV, raw statements, Professional / Combined / Page 1-5 HTML all generated.	Core report pipeline works after stable FMP data path is available.	Pass
`Report-Pipeline-04`	Data-source degradation	FMP news 402, target price / rating 403, Adanos 401, peer EV/EBITDA unavailable.	Degradation is observable in logs; report still renders, but user-facing explanation is uneven and partly relies on fallback text.	Partial
`Report-Pipeline-05`	Evidence / timestamp / source audit	Raw statements and analysis summary exist; logs include source-status evidence.	Data files are auditable, but source provenance is not cleanly surfaced inside the final report.	Partial
`Report-Pipeline-08`	Human readability	Professional and Combined HTML generated with charts and sections.	Report is readable and navigable, but some sections reflect fallback text and missing data.	Partial
`Report-Pipeline-09`	Team handoff extraction	This document records run path, commands, outputs, limitations, and case ratings.	Sufficient for team reuse and cross-project comparison.	Pass
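
For cross-project comparison, the ratings above can be held as structured records and tallied. A minimal sketch: the `CaseResult` type is hypothetical, and the `Fail` rating is an assumed third value that does not occur in this run.

```python
from collections import Counter
from dataclasses import dataclass

ALLOWED_RATES = {"Pass", "Partial", "Fail"}  # "Fail" is assumed, unused in this run


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    instance: str
    rate: str


# Values copied from the Case Results table.
RESULTS = [
    CaseResult("Report-Pipeline-01", "NVDA single-company report", "Pass"),
    CaseResult("Report-Pipeline-04", "Data-source degradation", "Partial"),
    CaseResult("Report-Pipeline-05", "Evidence / timestamp / source audit", "Partial"),
    CaseResult("Report-Pipeline-08", "Human readability", "Partial"),
    CaseResult("Report-Pipeline-09", "Team handoff extraction", "Pass"),
]


def summarize(results) -> dict[str, int]:
    """Count results per rating, rejecting ratings outside ALLOWED_RATES."""
    for r in results:
        if r.rate not in ALLOWED_RATES:
            raise ValueError(f"{r.case_id}: unknown rate {r.rate!r}")
    return dict(Counter(r.rate for r in results))
```

For this run, `summarize(RESULTS)` yields two `Pass` and three `Partial` ratings.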

7. Current Evaluation

FinRobot is materially different from chat-agent style references. Its strongest value is structured equity report production: data ingestion, forecast table generation, peer EBITDA comparison, sensitivity analysis, chart generation, and HTML report rendering.

It is weaker as a conversational cognition surface. The key experience risks are:

external data entitlements strongly shape output completeness;
report generation can continue despite missing data, but report-level provenance is not always explicit enough;
multi-section LLM text generation is not stable enough in this local run;
fallback text can make the report look complete unless reviewers inspect logs and source-status artifacts;
some data schema assumptions remain brittle, especially ratio columns and peer EV/EBITDA.
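
The provenance risk could be reduced by rendering a short source-status note from the HTTP status codes already visible in the logs. This is a minimal sketch of that idea, not an existing FinRobot feature; `provenance_lines` and the source labels passed to it are hypothetical.

```python
from http import HTTPStatus


def provenance_lines(source_status: dict[str, int]) -> list[str]:
    """Turn {source name: HTTP status code} into reader-facing status lines."""
    lines = []
    for source, code in sorted(source_status.items()):
        if 200 <= code < 300:
            lines.append(f"{source}: available (HTTP {code})")
            continue
        try:
            phrase = HTTPStatus(code).phrase  # e.g. 402 -> "Payment Required"
        except ValueError:
            phrase = "unknown status"
        lines.append(f"{source}: unavailable (HTTP {code} {phrase})")
    return lines
```

With the codes observed in this run, a call such as `provenance_lines({"FMP news": 402, "FMP target price / rating": 403, "Adanos retail sentiment": 401})` would produce one line per degraded source that could be placed in the final report.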

8. Recommendation

FinRobot should stay in the reference set, but it should be evaluated through Report-Pipeline-* cases, not through universal free-chat cases.

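Inclusion status: this evaluation can be filed as the current official evaluation of a "report-production reference project", but the following must be labeled separately in cross-project comparisons:

data pipeline: strong;
report rendering: strong;
source degradation transparency: medium;
LLM narrative generation: weak / unstable in this run;
conversational cognition: not primary scope.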
