FinRobot Reference Experience Evaluation
Status: Current Official Evaluation · Updated: 2026-05-11 · Role: FinClaw Program Controller
1. Evaluation Scope
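This round runs the Report-Pipeline-* cases from the updated Case Library. The goal is to verify the real local experience of FinRobot as a "report-production financial reference project", rather than forcing generic chat-style cases onto it.
No history from earlier batches is kept in this round: old local outputs and old logs have been cleaned, and only the current run is retained.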
2. Case Library Placement Decision
The expanded Case Library has been moved up into the FinClaw namespace under the ecosystem-level evaluation area:
/Users/mlabs/Programs/Labs-FinTecAI/evaluation/finclaw/
├── case-library.md
├── case-schema.md
├── cases/
└── runs/
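This location expresses two boundaries:
- evaluation/ is the evaluation asset area of the FinTec AI Ecosystem;
- evaluation/finclaw/ indicates that the current Case Library still covers only the FinClaw system and does not claim to cover Data Horizon, AI Trading Matrix, Reinforcement Learning Engine, or Financial Expert Foundation Model.
Placing it in the /Users/mlabs/Programs/fin-claw engineering repository is not recommended for now, because it is an evaluation and acceptance knowledge-base asset for the FinClaw system, not part of the FinClaw MVP engineering code.
It will also not be split into a separate repository for now. The trigger conditions for a separate repository are:
- five or more stable structured case files exist;
- structured run results exist for 2-3 reference projects;
- a lightweight runner or result validator exists that can consume cases/*.yaml;
- the team has confirmed that the field structure, scoring dimensions, and report format are stable;
- at least one independent ecosystem project outside FinClaw has completed adaptation, proving that a cross-project common layer exists.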
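The trigger conditions above can be sketched as a small checklist. This is only an illustration: the `CaseLibraryState` fields and the function name are hypothetical, the sketch assumes every condition must hold before a split, and it reads "2-3 reference projects" as "at least 2".

```python
from dataclasses import dataclass


@dataclass
class CaseLibraryState:
    """Hypothetical snapshot of the Case Library; field names are illustrative."""
    stable_case_files: int
    projects_with_structured_runs: int
    has_runner_or_validator: bool
    schema_confirmed_stable: bool
    adapted_projects_outside_finclaw: int


def unmet_split_conditions(state: CaseLibraryState) -> list[str]:
    """Return the trigger conditions that are not yet satisfied."""
    checks = [
        (state.stable_case_files >= 5,
         "five or more stable structured case files"),
        (state.projects_with_structured_runs >= 2,  # assumption: lower bound of "2-3"
         "structured run results for 2-3 reference projects"),
        (state.has_runner_or_validator,
         "runner or validator that consumes cases/*.yaml"),
        (state.schema_confirmed_stable,
         "team-confirmed field structure, scoring dimensions, report format"),
        (state.adapted_projects_outside_finclaw >= 1,
         "at least one adapted project outside FinClaw"),
    ]
    return [label for ok, label in checks if not ok]


if __name__ == "__main__":
    # Illustrative values only, not a record of the actual library state.
    example = CaseLibraryState(4, 1, False, False, 0)
    for label in unmet_split_conditions(example):
        print("unmet:", label)
```

An empty return value would mean that, under these assumptions, the split can be reconsidered.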
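If it is split out in the future, a neutral name such as fintec-ai-evaluation-cases is recommended, to avoid binding it to a single reference project or a single evaluation source.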
3. Runtime Entry
Local path of the reference project:
/Users/mlabs/Programs/FinRobot
Current output directory:
/Users/mlabs/Programs/FinRobot/finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/
Current log directory:
/Users/mlabs/Programs/Labs-FinTecAI/packets/sync/finclaw-reference-experience-2026-05-09/logs/finrobot-report-pipeline-retest/
4. Model And Data Telemetry
| Item | Value |
|---|---|
| Data source | FMP stable API, configured locally in ignored config |
| Main ticker | NVDA |
| Peer tickers | AMD, INTC |
| LLM provider | Moonshot compatible endpoint |
| LLM model | kimi-k2.6 |
| Text-generation run | Multi-section generation was attempted and manually stopped after repeated /chat/completions retries. |
| Token usage | Not available from local FinRobot logs. |
| Core no-text analysis elapsed | Completed in about 23 seconds. |
| HTML generation elapsed | Completed in about 11 seconds. |
5. Execution Summary
5.1 Cleanup
Old FinRobot test artifacts have been cleaned:
/Users/mlabs/Programs/FinRobot/finrobot_equity/core/output/*
/Users/mlabs/Programs/Labs-FinTecAI/packets/sync/finclaw-reference-experience-2026-05-09/logs/finrobot-*
After cleanup, only the directories for this round remain:
FINROBOT_REPORT_PIPELINE_RETEST
finrobot-report-pipeline-retest
5.2 Analysis Generation
Command entry point:
venv/bin/python finrobot_equity/core/src/generate_financial_analysis.py \
--company-ticker NVDA \
--company-name "NVIDIA Corporation" \
--config-file finrobot_equity/core/config/config.ini \
--peer-tickers AMD INTC \
--enable-sensitivity-analysis \
--enable-catalyst-analysis \
--enable-enhanced-news \
--news-days-back 7 \
--news-limit 20 \
--output-dir finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis
Result:
| Evidence | Result |
|---|---|
| Financial statements | Five years of income statement / balance sheet / cash flow fetched successfully. |
| Available years | 2026, 2025, 2024, 2023, 2022. |
| Main analysis CSV | financial_metrics_and_forecasts.csv generated successfully. |
| Peer EBITDA | peer_ebitda_comparison.csv generated successfully. |
| Peer EV/EBITDA | No valid data obtained. |
| Sensitivity analysis | sensitivity_analysis.json and sensitivity_summary.md generated successfully. |
| Enhanced news | The FMP news endpoint returned 402, which serves as evidence of data-source degradation. |
| Retail sentiment | Adanos Reddit / X.com / Polymarket all returned 401. |
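The status codes in the table can be turned into a structured source-status record. The sketch below uses only Python's standard `http.HTTPStatus` to attach the standard reason phrase to each code. The dictionary keys and the `degradation_record` helper are hypothetical names, not part of FinRobot, and treating every code of 400 or above as "degraded" is an assumption made for illustration.

```python
from http import HTTPStatus

# Status codes as recorded in the result table above.
OBSERVED_STATUS = {
    "FMP news": 402,
    "Adanos Reddit": 401,
    "Adanos X.com": 401,
    "Adanos Polymarket": 401,
}


def degradation_record(observed: dict[str, int]) -> dict[str, dict]:
    """Map each source to its status code, standard reason phrase, and a degraded flag."""
    record = {}
    for source, code in observed.items():
        record[source] = {
            "status_code": code,
            "reason": HTTPStatus(code).phrase,  # e.g. 402 -> "Payment Required"
            "degraded": code >= 400,            # assumption: any 4xx/5xx counts as degraded
        }
    return record


if __name__ == "__main__":
    for source, entry in degradation_record(OBSERVED_STATUS).items():
        print(f"{source}: {entry['status_code']} {entry['reason']}")
```

A record like this could be stored next to the analysis outputs so that reviewers do not have to reconstruct source status from raw logs.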
5.3 Report Generation
Command entry point:
venv/bin/python finrobot_equity/core/src/create_equity_report.py \
--company-ticker NVDA \
--company-name "NVIDIA Corporation" \
--config-file finrobot_equity/core/config/config.ini \
--analysis-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/financial_metrics_and_forecasts.csv \
--ratios-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/ratios_raw_data.csv \
--peer-ebitda-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/peer_ebitda_comparison.csv \
--sensitivity-analysis-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/sensitivity_analysis.json \
--enhanced-news-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/enhanced_news.json \
--retail-sentiment-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/retail_sentiment.json \
--enable-enhanced-charts \
--output-dir finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/report
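This round's report generation used test-harness fallback text as input, because the project's own multi-section LLM text generation was unstable and could not be allowed to block the core report pipeline experience. The fallback is an evaluation aid and should not be counted as FinRobot's native text-generation capability.
Generated artifacts: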
| Artifact | Status |
|---|---|
| Professional_Equity_Report_NVDA.html | Generated, 131 KB. |
| Combined_Equity_Report_NVDA.html | Generated, 122 KB. |
| Equity_Report_Page1_NVDA.html ~ Equity_Report_Page5_NVDA.html | Generated. |
| Revenue / EBITDA chart | Generated. |
| EPS × PE chart | Generated, with non-numeric EPS / PE warning. |
| Revenue YoY chart | Generated. |
| EBITDA margin chart | Generated. |
| Financial radar chart | Generated. |
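The presence of the HTML artifacts listed above can be checked mechanically. The sketch below builds the expected file names from the table and reports any that are missing. It assumes the files sit directly in the report output directory, which this document does not confirm, and both function names are hypothetical.

```python
from pathlib import Path


def expected_report_files(ticker: str) -> list[str]:
    """File names taken from the artifact table, parameterised by ticker."""
    names = [
        f"Professional_Equity_Report_{ticker}.html",
        f"Combined_Equity_Report_{ticker}.html",
    ]
    names += [f"Equity_Report_Page{i}_{ticker}.html" for i in range(1, 6)]
    return names


def missing_report_files(report_dir: str, ticker: str) -> list[str]:
    """Return expected HTML files that are not present in report_dir."""
    root = Path(report_dir)  # assumption: files are written directly into this directory
    return [name for name in expected_report_files(ticker) if not (root / name).is_file()]


if __name__ == "__main__":
    report_dir = "finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/report"
    print(missing_report_files(report_dir, "NVDA"))
```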
HTML integrity check:
| File | Result |
|---|---|
| Professional HTML | Contains HTML structure, NVDA content, image tags, fallback note, and source degradation note. |
| Combined HTML | Contains HTML structure, NVDA content, image tags, fallback note, and source degradation note. |
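A check of this kind can be approximated with the standard-library HTML parser. The sketch below counts start tags and collects visible text, then reports whether the basic structure, the ticker, image tags, and any required phrases are present. It is not the check used in this run: the function name is hypothetical, and the phrases that identify the fallback note and the source degradation note are not recorded here, so they are passed in as parameters.

```python
from html.parser import HTMLParser


class _TagAndTextCollector(HTMLParser):
    """Counts start tags and collects text content."""

    def __init__(self):
        super().__init__()
        self.tags = {}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased from HTMLParser.
        self.tags[tag] = self.tags.get(tag, 0) + 1

    def handle_data(self, data):
        self.text_parts.append(data)


def check_report_html(html_text: str, ticker: str, required_phrases=()) -> dict:
    """Return a dict of simple integrity findings for one report file."""
    parser = _TagAndTextCollector()
    parser.feed(html_text)
    parser.close()
    text = " ".join(parser.text_parts)
    result = {
        "has_html_structure": "html" in parser.tags and "body" in parser.tags,
        "mentions_ticker": ticker in text,
        "image_count": parser.tags.get("img", 0),
    }
    for phrase in required_phrases:  # hypothetical marker phrases supplied by the caller
        result[f"contains:{phrase}"] = phrase.lower() in text.lower()
    return result


if __name__ == "__main__":
    sample = "<html><body><h1>NVDA report</h1><img src='chart.png'><p>Fallback text used.</p></body></html>"
    print(check_report_html(sample, "NVDA", required_phrases=["fallback"]))
```

Passing the marker phrases in explicitly keeps the check honest: a report that merely renders does not pass unless the fallback and degradation notes are actually visible in its text.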
6. Case Results
| Case | Concrete Instance | Evidence | Evaluation | Rate |
|---|---|---|---|---|
| Report-Pipeline-01 | NVDA single-company report | Financial CSV, raw statements, Professional / Combined / Page 1-5 HTML all generated. | Core report pipeline works after stable FMP data path is available. | Pass |
| Report-Pipeline-04 | Data-source degradation | FMP news 402, target price / rating 403, Adanos 401, peer EV/EBITDA unavailable. | Degradation is observable in logs; report still renders, but user-facing explanation is uneven and partly relies on fallback text. | Partial |
| Report-Pipeline-05 | Evidence / timestamp / source audit | Raw statements and analysis summary exist; logs include source-status evidence. | Data files are auditable, but source provenance is not cleanly surfaced inside the final report. | Partial |
| Report-Pipeline-08 | Human readability | Professional and Combined HTML generated with charts and sections. | Report is readable and navigable, but some sections reflect fallback text and missing data. | Partial |
| Report-Pipeline-09 | Team handoff extraction | This document records run path, commands, outputs, limitations, and case ratings. | Sufficient for team reuse and cross-project comparison. | Pass |
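For cross-project comparison, the ratings in the table can be tallied in a few lines. The sketch below copies the case IDs and rates from the table. The `summarize_rates` helper is a hypothetical name, and no numeric scoring is implied.

```python
from collections import Counter

# Case ratings as recorded in the table above.
CASE_RATES = {
    "Report-Pipeline-01": "Pass",
    "Report-Pipeline-04": "Partial",
    "Report-Pipeline-05": "Partial",
    "Report-Pipeline-08": "Partial",
    "Report-Pipeline-09": "Pass",
}


def summarize_rates(rates: dict[str, str]):
    """Return (count per rate, sorted list of cases that did not fully pass)."""
    counts = Counter(rates.values())
    not_passed = sorted(case for case, rate in rates.items() if rate != "Pass")
    return counts, not_passed


if __name__ == "__main__":
    counts, not_passed = summarize_rates(CASE_RATES)
    print(dict(counts))
    print(not_passed)
```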
7. Current Evaluation
FinRobot is materially different from chat-agent style references. Its strongest value is structured equity report production: data ingestion, forecast table generation, peer EBITDA comparison, sensitivity analysis, chart generation, and HTML report rendering.
It is weaker as a conversational cognition surface. The key experience risks are:
- external data entitlements strongly shape output completeness;
- report generation can continue despite missing data, but report-level provenance is not always explicit enough;
- multi-section LLM text generation is not stable enough in this local run;
- fallback text can make the report look complete unless reviewers inspect logs and source-status artifacts;
- some data schema assumptions remain brittle, especially ratio columns and peer EV/EBITDA.
8. Recommendation
FinRobot should stay in the reference set, but it should be evaluated through Report-Pipeline-* cases, not through universal free-chat cases.
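Intake status: this evaluation can be recorded as the current official evaluation of a "report-production reference project", but the following must be labelled separately in cross-project comparison:
- data pipeline: strong;
- report rendering: strong;
- source degradation transparency: medium;
- LLM narrative generation: weak / unstable in this run;
- conversational cognition: not primary scope.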