
FinRobot Reference Experience Evaluation

Status: Current Official Evaluation | Updated: 2026-05-11 | Role: FinClaw Program Controller

1. Evaluation Scope

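This round follows the Report-Pipeline-* scope of the updated Case Library. The goal is to verify the real local experience of FinRobot as a "report-production financial reference project", rather than to force-fit generic chat-style cases onto it.

No history from earlier batches is retained in this round. Old local outputs and old logs have been cleaned up, and only the current run is kept.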

2. Case Library Placement Decision

The expanded Case Library has been moved up into the FinClaw namespace under the ecosystem-level evaluation area:

/Users/mlabs/Programs/Labs-FinTecAI/evaluation/finclaw/
├── case-library.md
├── case-schema.md
├── cases/
└── runs/

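This location expresses two boundaries:

  1. evaluation/ is the evaluation asset area of the FinTec AI Ecosystem;
  2. evaluation/finclaw/ indicates that the current Case Library still covers only the FinClaw system and does not claim to cover Data Horizon, AI Trading Matrix, Reinforcement Learning Engine, or Financial Expert Foundation Model.

Placing it in the /Users/mlabs/Programs/fin-claw engineering repository is not recommended for now, because it is an evaluation and acceptance knowledge-base asset for the FinClaw system, not part of the FinClaw MVP engineering code.

It will also not be split into a separate repository for now. The trigger conditions for a separate repository are:

  1. five or more stable, structured case files exist;
  2. structured run results exist for 2-3 reference projects;
  3. a lightweight runner or result validator exists that can consume cases/*.yaml;
  4. the team confirms that the field structure, scoring dimensions, and report format are stable;
  5. at least one independent ecosystem project outside FinClaw has completed adaptation, proving that a cross-project common layer exists.

If it is split out in the future, a neutral name such as fintec-ai-evaluation-cases is recommended, to avoid binding it to a single reference project or a single evaluation source.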
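
The five trigger conditions above can be read as a simple checklist. The following is a minimal sketch, not an existing tool: the `SplitReadiness` class, its field names, and the example values are all hypothetical, and the threshold for condition 2 assumes the lower bound of "2-3".

```python
from dataclasses import dataclass


@dataclass
class SplitReadiness:
    """Hypothetical snapshot of the five trigger conditions listed above."""
    stable_case_files: int
    reference_project_runs: int
    has_runner_or_validator: bool
    format_confirmed_stable: bool
    adapted_non_finclaw_projects: int

    def unmet(self) -> list[str]:
        """Return a label for every condition that is not yet satisfied."""
        checks = [
            (self.stable_case_files >= 5,
             "five or more stable structured case files"),
            (self.reference_project_runs >= 2,
             "structured run results for 2-3 reference projects"),
            (self.has_runner_or_validator,
             "runner or validator that consumes cases/*.yaml"),
            (self.format_confirmed_stable,
             "fields, scoring dimensions, and report format confirmed stable"),
            (self.adapted_non_finclaw_projects >= 1,
             "at least one non-FinClaw ecosystem project adapted"),
        ]
        return [label for ok, label in checks if not ok]


# Illustrative values only; they are not measured from the actual Case Library.
snapshot = SplitReadiness(5, 1, False, False, 0)
missing = snapshot.unmet()
print(f"{len(missing)} condition(s) unmet")
for label in missing:
    print(" -", label)
```

A split would be considered only when `unmet()` returns an empty list, which keeps the decision tied to all five conditions rather than to any single one.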

3. Runtime Entry

Local path of the reference project:

/Users/mlabs/Programs/FinRobot

Current output directory:

/Users/mlabs/Programs/FinRobot/finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/

Current log directory:

/Users/mlabs/Programs/Labs-FinTecAI/packets/sync/finclaw-reference-experience-2026-05-09/logs/finrobot-report-pipeline-retest/

4. Model And Data Telemetry

| Item | Value |
| --- | --- |
| Data source | FMP stable API, configured locally in an ignored config file |
| Main ticker | NVDA |
| Peer tickers | AMD, INTC |
| LLM provider | Moonshot compatible endpoint |
| LLM model | kimi-k2.6 |
| Text-generation run | Multi-section generation was attempted and manually stopped after repeated /chat/completions retries. |
| Token usage | Not available from local FinRobot logs. |
| Core no-text analysis elapsed | Completed in about 23 seconds. |
| HTML generation elapsed | Completed in about 11 seconds. |
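
Because the local FinRobot logs do not expose token usage, wall-clock time is the main telemetry available. A minimal sketch of how such elapsed figures could be captured is shown below; `run_timed` is a hypothetical helper, and the command is a stand-in rather than the actual FinRobot invocation.

```python
import subprocess
import sys
import time


def run_timed(cmd: list[str]) -> tuple[int, float]:
    """Run a command and return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    return completed.returncode, time.perf_counter() - start


# Stand-in command; a real run would pass the FinRobot script and its arguments.
code, seconds = run_timed([sys.executable, "-c", "pass"])
print(f"exit={code} elapsed={seconds:.2f}s")
```

Recording the exit code next to the elapsed time makes it harder to mistake a fast failure for a fast success.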

5. Execution Summary

5.1 Cleanup

Old FinRobot test artifacts were removed:

/Users/mlabs/Programs/FinRobot/finrobot_equity/core/output/*
/Users/mlabs/Programs/Labs-FinTecAI/packets/sync/finclaw-reference-experience-2026-05-09/logs/finrobot-*

After cleanup, only this round's directories remain:

FINROBOT_REPORT_PIPELINE_RETEST
finrobot-report-pipeline-retest
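
A cleanup of this kind can be sketched as follows. This is an illustrative variant that assumes the current run directory already exists and should be preserved; `clean_except` is a hypothetical helper, and the demonstration runs against a throwaway temporary directory, not the real output path.

```python
import shutil
import tempfile
from pathlib import Path


def clean_except(parent: Path, keep: str) -> list[str]:
    """Delete every entry directly under `parent` except the one named `keep`."""
    removed = []
    for entry in sorted(parent.iterdir()):
        if entry.name == keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real directory: remove recursively
        else:
            entry.unlink()         # file or symlink: remove the entry itself
        removed.append(entry.name)
    return removed


# Demonstrate on a temporary directory with made-up stale entries.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "OLD_RUN").mkdir()
    (out / "old.log").write_text("stale")
    (out / "FINROBOT_REPORT_PIPELINE_RETEST").mkdir()
    removed = clean_except(out, keep="FINROBOT_REPORT_PIPELINE_RETEST")
    remaining = [p.name for p in out.iterdir()]

print("removed:", removed)
print("remaining:", remaining)
```

Returning the list of removed names gives the evaluator a record of what was deleted, which matters when old batches are intentionally not retained.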

5.2 Analysis Generation

Command entry point:

venv/bin/python finrobot_equity/core/src/generate_financial_analysis.py \
--company-ticker NVDA \
--company-name "NVIDIA Corporation" \
--config-file finrobot_equity/core/config/config.ini \
--peer-tickers AMD INTC \
--enable-sensitivity-analysis \
--enable-catalyst-analysis \
--enable-enhanced-news \
--news-days-back 7 \
--news-limit 20 \
--output-dir finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis

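Result:

| Evidence | Result |
| --- | --- |
| Financial statements | Five years of income statement / balance sheet / cash flow data retrieved successfully. |
| Available years | 2026, 2025, 2024, 2023, 2022. |
| Main analysis CSV | financial_metrics_and_forecasts.csv generated successfully. |
| Peer EBITDA | peer_ebitda_comparison.csv generated successfully. |
| Peer EV/EBITDA | No valid data retrieved. |
| Sensitivity analysis | sensitivity_analysis.json and sensitivity_summary.md generated successfully. |
| Enhanced news | FMP news endpoint returned 402, which serves as data-source degradation evidence. |
| Retail sentiment | Adanos Reddit / X.com / Polymarket all returned 401. |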
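
The HTTP statuses recorded above can be turned into short, reviewable degradation notes. The sketch below uses the standard-library `http.HTTPStatus` enum for the official status phrases; the `REASONS` mapping and the `degradation_note` helper are assumptions for illustration, not FinRobot behavior, and the reason wording is an interpretation of each status rather than something the providers returned.

```python
from http import HTTPStatus

# Statuses as recorded in this run; the source names are labels only.
observed = {
    "FMP news": 402,
    "Adanos Reddit": 401,
    "Adanos X.com": 401,
    "Adanos Polymarket": 401,
}

# Assumed mapping from status code to a likely degradation reason.
REASONS = {
    HTTPStatus.UNAUTHORIZED: "missing or invalid credentials",
    HTTPStatus.PAYMENT_REQUIRED: "endpoint not covered by current plan",
    HTTPStatus.FORBIDDEN: "access denied for this account",
}


def degradation_note(source: str, status: int) -> str:
    """Format one source failure as a single human-readable line."""
    reason = REASONS.get(status, "unclassified failure")
    phrase = HTTPStatus(status).phrase  # raises ValueError for unknown codes
    return f"{source}: HTTP {status} {phrase} ({reason})"


notes = [degradation_note(name, status) for name, status in observed.items()]
for line in notes:
    print(line)
```

Notes in this form could be attached to a report appendix so that entitlement gaps are visible without opening the logs.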

5.3 Report Generation

Command entry point:

venv/bin/python finrobot_equity/core/src/create_equity_report.py \
--company-ticker NVDA \
--company-name "NVIDIA Corporation" \
--config-file finrobot_equity/core/config/config.ini \
--analysis-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/financial_metrics_and_forecasts.csv \
--ratios-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/ratios_raw_data.csv \
--peer-ebitda-csv finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/peer_ebitda_comparison.csv \
--sensitivity-analysis-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/sensitivity_analysis.json \
--enhanced-news-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/enhanced_news.json \
--retail-sentiment-file finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/analysis/retail_sentiment.json \
--enable-enhanced-charts \
--output-dir finrobot_equity/core/output/FINROBOT_REPORT_PIPELINE_RETEST/report

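Report generation in this round used test-harness fallback text as input. The project's own multi-section LLM text generation was unstable, and that path could not be allowed to block the core report-pipeline experience. The fallback is an evaluation aid and should not be counted as FinRobot's native text-generation capability.

Generated artifacts:

| Artifact | Status |
| --- | --- |
| Professional_Equity_Report_NVDA.html | Generated, 131 KB. |
| Combined_Equity_Report_NVDA.html | Generated, 122 KB. |
| Equity_Report_Page1_NVDA.html ~ Equity_Report_Page5_NVDA.html | Generated. |
| Revenue / EBITDA chart | Generated. |
| EPS × PE chart | Generated, with non-numeric EPS / PE warning. |
| Revenue YoY chart | Generated. |
| EBITDA margin chart | Generated. |
| Financial radar chart | Generated. |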

HTML integrity check:

| File | Result |
| --- | --- |
| Professional HTML | Contains HTML structure, NVDA content, image tags, fallback note, and source degradation note. |
| Combined HTML | Contains HTML structure, NVDA content, image tags, fallback note, and source degradation note. |
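
An integrity check along these lines can be sketched with the standard-library `html.parser` module. This is not the check that was actually run: `ReportProbe`, `check_report`, the sample markup, and the note keywords ("fallback", "degraded") are all hypothetical, since the exact wording of the notes in the reports is not recorded here.

```python
from html.parser import HTMLParser


class ReportProbe(HTMLParser):
    """Collect tag names, an <img> count, and visible text from a document."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.image_count = 0
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        if tag == "img":
            self.image_count += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def check_report(html_text: str, ticker: str, required_notes: list[str]) -> dict:
    """Return one boolean per integrity criterion."""
    probe = ReportProbe()
    probe.feed(html_text)
    probe.close()
    text = " ".join(probe.text_parts)
    result = {
        "has_html_structure": {"html", "body"} <= probe.tags,
        "mentions_ticker": ticker in text,
        "has_images": probe.image_count > 0,
    }
    for note in required_notes:
        # Case-insensitive keyword match; the keywords are assumptions.
        result[f"note:{note}"] = note.lower() in text.lower()
    return result


# Made-up sample markup standing in for a generated report.
sample = (
    "<html><body><h1>NVDA Equity Report</h1><img src='revenue.png'>"
    "<p>Fallback text was used. News source degraded.</p></body></html>"
)
result = check_report(sample, "NVDA", ["fallback", "degraded"])
print(result)
```

A check like this confirms only that the markers are present; it says nothing about whether the surrounding narrative is accurate.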

6. Case Results

| Case | Concrete Instance | Evidence | Evaluation | Rate |
| --- | --- | --- | --- | --- |
| Report-Pipeline-01 | NVDA single-company report | Financial CSV, raw statements, Professional / Combined / Page 1-5 HTML all generated. | Core report pipeline works after stable FMP data path is available. | Pass |
| Report-Pipeline-04 | Data-source degradation | FMP news 402, target price / rating 403, Adanos 401, peer EV/EBITDA unavailable. | Degradation is observable in logs; report still renders, but user-facing explanation is uneven and partly relies on fallback text. | Partial |
| Report-Pipeline-05 | Evidence / timestamp / source audit | Raw statements and analysis summary exist; logs include source-status evidence. | Data files are auditable, but source provenance is not cleanly surfaced inside the final report. | Partial |
| Report-Pipeline-08 | Human readability | Professional and Combined HTML generated with charts and sections. | Report is readable and navigable, but some sections reflect fallback text and missing data. | Partial |
| Report-Pipeline-09 | Team handoff extraction | This document records run path, commands, outputs, limitations, and case ratings. | Sufficient for team reuse and cross-project comparison. | Pass |
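
For cross-project comparison, the ratings above can be tallied in a few lines. The sketch below copies the case IDs and ratings from this section; the dictionary layout is an assumption and is not the Case Library schema.

```python
from collections import Counter

# Ratings copied from the case results in this section.
ratings = {
    "Report-Pipeline-01": "Pass",
    "Report-Pipeline-04": "Partial",
    "Report-Pipeline-05": "Partial",
    "Report-Pipeline-08": "Partial",
    "Report-Pipeline-09": "Pass",
}

tally = Counter(ratings.values())
needs_follow_up = sorted(case for case, rate in ratings.items() if rate != "Pass")

print(dict(tally))
print("follow-up:", needs_follow_up)
```

In this run the tally is 2 Pass and 3 Partial, and the three Partial cases all relate to degradation transparency, provenance, or fallback text.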

7. Current Evaluation

FinRobot is materially different from chat-agent style references. Its strongest value is structured equity report production: data ingestion, forecast table generation, peer EBITDA comparison, sensitivity analysis, chart generation, and HTML report rendering.

It is weaker as a conversational cognition surface. The key experience risks are:

  1. external data entitlements strongly shape output completeness;
  2. report generation can continue despite missing data, but report-level provenance is not always explicit enough;
  3. multi-section LLM text generation is not stable enough in this local run;
  4. fallback text can make the report look complete unless reviewers inspect logs and source-status artifacts;
  5. some data schema assumptions remain brittle, especially ratio columns and peer EV/EBITDA.
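
The schema brittleness in item 5 could be surfaced earlier with a pre-check on the analysis CSV before report generation. The following is a minimal sketch under stated assumptions: the column names (`year`, `eps`, `pe`, `ev_ebitda`) and the sample rows are invented for illustration and do not reflect the actual layout of financial_metrics_and_forecasts.csv or ratios_raw_data.csv.

```python
import csv
import io


def missing_columns(csv_text: str, required: set[str]) -> set[str]:
    """Return the required column names that are absent from the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return required - set(reader.fieldnames or [])


def non_numeric_rows(csv_text: str, column: str) -> list[int]:
    """Return 1-based file line numbers whose value in `column` is not a number."""
    bad = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            float(row[column])
        except (KeyError, TypeError, ValueError):
            bad.append(line_number)
    return bad


# Made-up sample data; values are placeholders, not real financials.
sample = "year,eps,pe\n2025,1.0,10.0\n2026,N/A,\n"

absent = missing_columns(sample, {"year", "eps", "pe", "ev_ebitda"})
bad_eps = non_numeric_rows(sample, "eps")
bad_pe = non_numeric_rows(sample, "pe")
print("missing columns:", absent)
print("non-numeric eps lines:", bad_eps)
print("non-numeric pe lines:", bad_pe)
```

Failing fast on a missing or non-numeric column would turn a chart-time warning into an explicit, auditable pre-flight result.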

8. Recommendation

FinRobot should stay in the reference set, but it should be evaluated through Report-Pipeline-* cases, not through universal free-chat cases.

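Library status: this evaluation can be filed as the current official evaluation of a "report-production reference project", but cross-project comparisons need to label the following separately:

  • data pipeline: strong;
  • report rendering: strong;
  • source degradation transparency: medium;
  • LLM narrative generation: weak / unstable in this run;
  • conversational cognition: not primary scope.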