
FinClaw V1 Agent Orchestration Design

Status: Accepted Initial Design / P0 Design Output
Date: 2026-05-14
Project: FinClaw
Document level: Project-level design support
Upstream documents: v1-prd.md, product-object-and-advisor-design.md, terminology-and-object-naming.md, v1-user-journey-and-interaction-flow.md, v1-product-object-and-schema-design.md, v1-evaluation-initial-plan.md

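This document defines the initial draft of Agent Orchestration for FinClaw V1. It does not define the engineering implementation, does not restore Action Suggestion, and does not create trade execution, production channels, real alerts or account capabilities.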

1. Orchestration Goal

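The orchestration goal of FinClaw V1 is to turn a user's natural-language question into object-structured financial cognition, not to stage a multi-agent performance.

The orchestration layer is responsible for:

  • identifying the user task;
  • deciding whether clarification or a low-confidence output is needed;
  • selecting the necessary Financial Cognition Advisors;
  • invoking FinSkills;
  • generating a Snapshot, Thread or Pre-Execution Checkpoint;
  • writing EvidenceItem and DataQualityNote;
  • preserving disagreement, risks and execution boundaries;
  • handing feedback and failure samples to evaluation / trial ops.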

2. Layers

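| Layer | Responsibility | Not responsible for |
| --- | --- | --- |
| Task Router | Identify task type, objects, risk and clarification needs | Directly generating the final financial conclusion |
| Context Builder | Load the user question, threads, snapshots, evidence and preferences | Collecting accounts, private keys or execution permissions |
| Advisor Planner | Select the necessary advisor perspectives | Invoking every advisor for show |
| Skill Runtime | Execute atomic financial cognition capabilities | Deciding product boundaries |
| Evidence / Quality Checker | Label sources, gaps, conflicts and low confidence | Fabricating sources or hiding missing evidence |
| Object Writer | Write Snapshot, Thread, Checkpoint | Writing orders, signals or execution fields |
| Boundary Guard | Check action adjacency, sensitive information and forbidden fields | Substituting disclaimers for structural boundaries |
| Feedback Adapter | Record trial-ops feedback and failure samples | Acting as production customer service or trading support |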

3. Task Routing

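| User intent | Route | Required output |
| --- | --- | --- |
| Look at an asset / theme / news | Snapshot route | MarketCognitionSnapshot |
| Keep watching / save / refresh later | Thread proposal route | Snapshot + Thread proposal |
| What changed since last time | Thread refresh route | Refresh diff + updated Thread |
| Pick out risks / argue the other side | Risk challenge route | Counter-thesis + invalidators |
| Buy or sell / add or reduce a position / open long or short | Pre-execution route | PreExecutionCheckpoint |
| Is the evidence sufficient | Evidence audit route | Claim inventory + EvidenceItem / DataQualityNote |
| Enters credentials / private key | Sensitive rejection route | Rejection + masked handling record |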

Router must allow a task to branch. Example: a colloquial BTC question may produce Snapshot plus Thread proposal; an action-adjacent prompt must branch to Pre-Execution Checkpoint.
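The branching behavior can be illustrated with a minimal sketch. The `route_task` function, the `Route` enum and the English keyword hints are hypothetical placeholders, not part of this design; a real Task Router would likely use a classifier rather than keyword matching, which misfires on phrases such as "long-term".

```python
import re
from enum import Enum


class Route(Enum):
    SNAPSHOT = "snapshot"
    THREAD_PROPOSAL = "thread_proposal"
    THREAD_REFRESH = "thread_refresh"
    RISK_CHALLENGE = "risk_challenge"
    PRE_EXECUTION = "pre_execution"
    EVIDENCE_AUDIT = "evidence_audit"
    SENSITIVE_REJECTION = "sensitive_rejection"


# Hypothetical keyword hints for illustration only; not part of the design.
HINTS = {
    Route.SENSITIVE_REJECTION: ("private key", "seed phrase"),
    Route.PRE_EXECUTION: ("buy", "sell", "long", "short", "add", "reduce"),
    Route.THREAD_REFRESH: ("since last time", "what changed"),
    Route.THREAD_PROPOSAL: ("keep watching", "save", "refresh later"),
    Route.RISK_CHALLENGE: ("what could go wrong", "risks", "counter"),
    Route.EVIDENCE_AUDIT: ("evidence",),
}


def route_task(text: str) -> list[Route]:
    """Return every route the prompt branches to, in HINTS order."""
    lowered = text.lower()
    matched = [
        route
        for route, words in HINTS.items()
        if any(re.search(rf"\b{re.escape(w)}\b", lowered) for w in words)
    ]
    # Credentials short-circuit every other route.
    if Route.SENSITIVE_REJECTION in matched:
        return [Route.SENSITIVE_REJECTION]
    # A plain look-up, or a thread proposal, also needs a Snapshot.
    if not matched or Route.THREAD_PROPOSAL in matched:
        matched.insert(0, Route.SNAPSHOT)
    # Action-adjacent prompts keep PRE_EXECUTION in the result.
    return matched
```

Returning a list rather than a single route is what lets one prompt yield both a Snapshot and a Thread proposal.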

4. Advisor Roles

| Advisor | Used when | Writes to |
| --- | --- | --- |
| Event Interpretation Advisor | News, policy, announcement, event chain | Snapshot event summary, affected objects, watch questions |
| Asset Research Advisor | Asset, project, protocol or theme question | Snapshot main thesis, market context, unknowns |
| Market / Macro Advisor | Market regime or cross-market context matters | Snapshot market context, refresh conditions |
| Risk Advisor | User needs risk map or action-adjacent caution | Risk constraints, invalidators, checkpoint |
| Counter-Thesis Advisor | User asks what could be wrong or system detects one-sided thesis | Counter thesis, watch questions, invalidators |
| Pre-Execution Advisor | User uses buy / sell / add / reduce / long / short language | PreExecutionCheckpoint only |
| Source Quality Advisor | Sources are missing, stale, conflicting or user-supplied | EvidenceItem, DataQualityNote |

No advisor may output real orders, position sizes, leverage, account operations, wallet actions, automatic alerts or production channel calls.

5. FinSkills

Initial FinSkills are capability labels, not implementation commitments:

| Skill | Purpose |
| --- | --- |
| asset-context-summarizer | Build bounded context around an asset, project or theme |
| event-impact-reader | Separate event facts from impact inference |
| narrative-mapper | Map main and counter narratives |
| risk-controversy-mapper | Identify risks, disputes and invalidators |
| watch-question-generator | Produce watch questions and refresh conditions |
| strategy-hypothesis-framer | Convert action-adjacent questions into conditional cognition |
| source-quality-checker | Label source and data quality limitations |
| sensitive-input-classifier | Classify and reject sensitive credentials |

Skills must return structured output that can be written into product objects. Free-form skill output is not sufficient for V1 formal outputs.
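As a rough illustration of the structured-output requirement, the sketch below separates structured fields from free text. The `SkillResult` shape and its field names are assumptions made for this example; the actual contract belongs to the schema design and to engineering.

```python
from dataclasses import dataclass, field


@dataclass
class SkillResult:
    """Hypothetical skill output; field names are illustrative assumptions."""
    skill: str
    facts: list = field(default_factory=list)
    inferences: list = field(default_factory=list)
    unknowns: list = field(default_factory=list)
    free_text: str = ""  # tolerated as commentary, never sufficient on its own


def is_formal_output(result: SkillResult) -> bool:
    """True only if at least one structured field is populated."""
    return bool(result.facts or result.inferences or result.unknowns)
```

A result that carries only `free_text` fails the check, which mirrors the rule that free-form skill output is not sufficient for V1 formal outputs.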

6. Object Write Rules

| Target | Write allowed | Write forbidden |
| --- | --- | --- |
| Snapshot | facts, inferences, risks, unknowns, watch questions, evidence, quality | order instructions, target price as instruction |
| Thread | linked snapshots, thesis, counter thesis, refresh conditions, invalidators, cognition changes | auto alerts, trade state, portfolio management |
| Checkpoint | conditional hypothesis, preconditions, risk constraints, invalidators, non-execution boundary | orders, account fields, leverage, private keys |
| EvidenceItem | source state and limitations | fabricated provenance |
| DataQualityNote | freshness, conflicts, permission limits, model inference | confidence labels that imply tradable signal |

Object Writer must reject writes that contain forbidden execution fields.
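One way to picture this rejection rule is a recursive key scan before the write. The field names in `FORBIDDEN_FIELDS`, the `write_object` helper and the list-backed store are all hypothetical; the real forbidden-field list is owned by the schema design.

```python
# Hypothetical forbidden keys; the real list belongs to the schema design.
FORBIDDEN_FIELDS = frozenset(
    {"order", "position_size", "leverage", "account_id", "private_key", "auto_alert"}
)


class ForbiddenFieldError(ValueError):
    """Raised when a payload carries an execution field."""


def find_forbidden(payload, path=""):
    """Return dotted paths of forbidden keys anywhere in a nested payload."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else str(key)
            if key in FORBIDDEN_FIELDS:
                hits.append(here)
            hits.extend(find_forbidden(value, here))
    elif isinstance(payload, list):
        for index, value in enumerate(payload):
            hits.extend(find_forbidden(value, f"{path}[{index}]"))
    return hits


def write_object(store, target, payload):
    """Append (target, payload) to store, or reject the whole write."""
    hits = find_forbidden(payload)
    if hits:
        raise ForbiddenFieldError(f"{target}: forbidden fields {hits}")
    store.append((target, payload))
```

Rejecting the whole write, rather than silently dropping the offending key, keeps the failure visible to evaluation.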

7. Disagreement Handling

Advisor disagreement is expected. The system must preserve useful disagreement.

| Disagreement source | Required handling |
| --- | --- |
| Fact conflict | List conflicting sources and evidence state |
| Inference difference | Keep both paths and assumptions |
| Time horizon difference | Separate short-term and medium-term cognition |
| Risk preference difference | Mark as user-context dependent |
| Data quality difference | Explain source limitations |
| Execution boundary pressure | Convert to Pre-Execution Checkpoint |

The orchestrator should produce a main view, counter view and watch questions rather than forcing false certainty.

8. Boundary Guard

Boundary Guard runs before final output and before object write.

It blocks or rewrites:

  • direct buy / sell / hold / short / long commands;
  • target price as instruction;
  • position size or leverage;
  • exchange / broker operation;
  • wallet / private key / seed phrase handling;
  • automatic execution or production alert claims;
  • unsupported high-confidence claims;
  • reference experience treated as product truth.

If action-adjacent content remains, output must become Pre-Execution Checkpoint.
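The block-or-rewrite decision might look like the sketch below. The regular expressions cover only three of the listed categories and are illustrative assumptions; the `guard` function and its return shape are hypothetical, and simple patterns like these will over-match ordinary prose (for example "holders hold").

```python
import re

# Hypothetical patterns for three blocked categories; illustrative only.
BLOCKED = {
    "direct_command": re.compile(r"\b(?:buy|sell|hold|short|long)\b", re.I),
    "position_or_leverage": re.compile(
        r"\b\d+(?:\.\d+)?x\b|\bleverage\b|\bposition size\b", re.I
    ),
    "credential": re.compile(r"\bprivate key\b|\bseed phrase\b", re.I),
}


def guard(draft: str) -> dict:
    """Classify a draft output as pass, rewrite or block."""
    violations = sorted(name for name, pat in BLOCKED.items() if pat.search(draft))
    if "credential" in violations:
        # Credential handling is never rewritten into an output.
        return {"violations": violations, "action": "block", "output_kind": None}
    if violations:
        # Remaining action-adjacent content becomes a checkpoint.
        return {
            "violations": violations,
            "action": "rewrite",
            "output_kind": "PreExecutionCheckpoint",
        }
    return {"violations": [], "action": "pass", "output_kind": "unchanged"}
```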

9. Sensitive Input Handling

Sensitive input classifier must label:

  • ordinary preference;
  • financial context;
  • sensitive personal / financial information;
  • credential or permission.

Credential or permission input triggers:

  • masking;
  • rejection;
  • no save;
  • no train;
  • no echo;
  • optional human review flag.

Financial context such as holdings or cost basis can be used for the current cognition task but cannot be saved without ProfileConsent.
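The handling rules can be sketched as a small decision function. The `InputLabel` enum mirrors the four labels above; the `Handling` record and `handle_input` are hypothetical. Gating every non-credential save on consent is a conservative assumption of this sketch, since the text states that rule only for financial context.

```python
from dataclasses import dataclass
from enum import Enum


class InputLabel(Enum):
    ORDINARY_PREFERENCE = "ordinary_preference"
    FINANCIAL_CONTEXT = "financial_context"
    SENSITIVE_PERSONAL = "sensitive_personal_or_financial"
    CREDENTIAL_OR_PERMISSION = "credential_or_permission"


@dataclass(frozen=True)
class Handling:
    rejected: bool
    use_in_current_task: bool
    may_save: bool
    may_train: bool
    echo_text: str
    human_review: bool = False


def handle_input(text: str, label: InputLabel, profile_consent: bool) -> Handling:
    if label is InputLabel.CREDENTIAL_OR_PERMISSION:
        # Mask, reject, no save, no train, no echo; review flag is optional.
        return Handling(True, False, False, False, "[masked]", human_review=True)
    # Assumption: all other labels are usable now, saved only with consent.
    return Handling(False, True, profile_consent, False, text)
```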

10. Evaluation Hooks

Each orchestration run should record:

  • route selected;
  • advisors used and why;
  • skills used;
  • object write targets;
  • evidence and data quality notes;
  • boundary guard results;
  • sensitive input handling;
  • missing fields;
  • case mapping;
  • reviewer notes.

Evaluation should fail if the system produces the right prose but cannot map the output to Snapshot, Thread or Checkpoint fields.
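A per-run record and the mapping check could take roughly the following form. The `RunRecord` field names and the `maps_to_objects` helper are assumptions for illustration; they follow the list above but are not a fixed schema.

```python
from dataclasses import dataclass, field

OBJECT_TARGETS = {"Snapshot", "Thread", "Checkpoint"}


@dataclass
class RunRecord:
    """Hypothetical per-run evaluation record."""
    route: str
    advisors: dict = field(default_factory=dict)       # advisor -> reason used
    skills: list = field(default_factory=list)
    write_targets: dict = field(default_factory=dict)  # target -> fields written
    quality_notes: list = field(default_factory=list)
    boundary_guard: str = "pass"
    sensitive_handling: str = "none"
    missing_fields: list = field(default_factory=list)
    case_id: str = ""
    reviewer_notes: str = ""


def maps_to_objects(record: RunRecord) -> bool:
    """True if at least one field was written to Snapshot, Thread or Checkpoint."""
    return any(
        target in OBJECT_TARGETS and bool(fields)
        for target, fields in record.write_targets.items()
    )
```

A run with good prose but an empty `write_targets` fails this check, which is the failure condition the paragraph above describes.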

10A. Multi-Advisor Coordination

A single user task may require 2–4 advisors. The orchestrator coordinates them as follows.

10A.1 Coordination Modes

| Mode | When | Behavior |
| --- | --- | --- |
| Sequential | Advisor B depends on Advisor A's output (e.g., Risk Advisor needs Asset Research thesis first) | Run A, feed structured output to B, merge into Snapshot |
| Parallel-then-merge | Advisors are independent on same input (e.g., Event + Market/Macro on same news) | Run in parallel, Object Writer merges non-conflicting fields; conflicts enter Disagreement Handling §7 |
| Challenge-after-draft | Counter-Thesis or Risk Advisor challenges an existing draft thesis | Run primary advisors, produce draft Snapshot, then pass draft to challenger; challenger writes counter_thesis, invalidators, watch_questions |

10A.2 Merge Rules

  1. Each advisor writes only to its declared thread_write_target fields (the Writes to column in §4, within the write rules of §6). No advisor overwrites another advisor's target unless it is the designated challenge pass.
  2. When two advisors write the same field (e.g., both add watch_questions), Object Writer concatenates and deduplicates.
  3. When advisors produce contradictory main_thesis or supporting_reasons, the orchestrator must invoke Disagreement Handling (§7) and preserve both paths in the output.
  4. The final Snapshot or Thread update must trace which advisor contributed which claim via advisor_outputs references.
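Rules 2 to 4 can be sketched as a single merge pass. The input shape (advisor name mapped to a dict of fields), the claim-to-advisors layout of `advisor_outputs`, and the use of string inequality to detect a contradictory `main_thesis` are all simplifying assumptions of this example.

```python
def merge_advisor_outputs(outputs):
    """Merge {advisor: {field: value}} into one draft update."""
    watch_questions = []
    provenance = {}  # claim text -> advisors that contributed it (rule 4)
    theses = {}      # advisor -> main_thesis

    def credit(claim, advisor):
        contributors = provenance.setdefault(claim, [])
        if advisor not in contributors:
            contributors.append(advisor)

    for advisor, fields in outputs.items():
        for question in fields.get("watch_questions", []):
            if question not in watch_questions:  # rule 2: concatenate, dedupe
                watch_questions.append(question)
            credit(question, advisor)
        if "main_thesis" in fields:
            theses[advisor] = fields["main_thesis"]
            credit(fields["main_thesis"], advisor)

    merged = {"watch_questions": watch_questions, "advisor_outputs": provenance}
    distinct = list(dict.fromkeys(theses.values()))
    if len(distinct) > 1:
        merged["disagreement"] = theses  # rule 3: preserve both paths
    elif distinct:
        merged["main_thesis"] = distinct[0]
    return merged
```

When theses differ, the sketch deliberately leaves `main_thesis` unset so that Disagreement Handling, not the merge step, decides how both paths are presented.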

10A.3 Advisor Budget

V1 limits each task to at most 5 advisor invocations per turn to bound latency and cost. If more than 5 advisors are plausible, Advisor Planner must rank by relevance and defer lower-priority advisors to follow-up turns.
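The rank-and-defer step is simple enough to sketch directly. The relevance scores are assumed to come from Advisor Planner; how they are computed is out of scope here, and the `plan_advisors` name is hypothetical.

```python
MAX_ADVISOR_INVOCATIONS = 5  # per-turn limit stated above


def plan_advisors(relevance, budget=MAX_ADVISOR_INVOCATIONS):
    """Split advisors into (run_now, deferred) by descending relevance.

    `relevance` maps advisor name -> score. Ties keep insertion order
    because sorted() is stable.
    """
    ranked = sorted(relevance, key=lambda name: relevance[name], reverse=True)
    return ranked[:budget], ranked[budget:]
```

For example, with all seven advisors from §4 scored as plausible, the five highest-scoring run this turn and the remaining two are deferred to follow-up turns.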

10B. Degradation and Fallback Paths

When a component fails, the orchestrator must degrade gracefully rather than produce silent errors or fabricated outputs.

| Failure | Detection | Fallback | User-visible effect |
| --- | --- | --- | --- |
| FinSkill timeout or error | Skill Runtime returns error or exceeds timeout | Retry once; if still fails, mark affected claims as evidence_status: unavailable | DataQualityNote with quality_state: unavailable and impact explanation |
| Source unavailable | Source Quality Advisor or tool returns no data | Proceed with available sources; add DataQualityNote unavailable for missing source | Snapshot shows source-limited state; claims dependent on missing source marked low-confidence |
| Model low-quality output | Evidence/Quality Checker detects unsupported certainty, hallucination markers, or empty structured fields | Demote to low_confidence state; if critical fields are empty, return needs_clarification | UI shows low-confidence badge; user offered to provide context or accept bounded output |
| All advisors fail | No advisor produces usable structured output | Return a minimal acknowledgment explaining the failure, do not fabricate a Snapshot | User sees failure state with retry option |
| Boundary Guard rejects output | Forbidden execution fields detected in advisor output | Strip forbidden fields, re-run through Boundary Guard; if persistent, block output and flag for human review | Output withheld; user sees boundary enforcement message |
| Context window exceeded | Context Builder detects input exceeds model limit | Summarize older snapshots, trim low-priority evidence, preserve most recent thesis and user question | Output may miss historical nuance; DataQualityNote records context trimming |

10B.1 Degradation Principles

  1. Never fabricate sources or evidence to fill gaps left by failures.
  2. Every degradation must produce a visible DataQualityNote or UI state change.
  3. Degraded outputs must still pass Boundary Guard before delivery.
  4. If degradation affects a Pre-Execution Checkpoint, the non_execution_statement must note the limitation.
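The "retry once, then mark unavailable" fallback for a failing FinSkill could be sketched as follows. The `run_skill_with_fallback` helper and the result dict layout are hypothetical; timeout enforcement is left out, and a real runtime would catch narrower error types than `Exception`.

```python
def run_skill_with_fallback(skill, claim_ids, retries=1):
    """Call `skill()`; after the allowed retry, degrade instead of fabricating."""
    last_error = ""
    for _ in range(1 + retries):
        try:
            return {
                "status": "ok",
                "output": skill(),
                "evidence_status": {},
                "data_quality_notes": [],
            }
        except Exception as exc:  # sketch only; narrow this in real code
            last_error = f"{type(exc).__name__}: {exc}"
    return {
        "status": "degraded",
        "output": None,  # principle 1: nothing is invented to fill the gap
        "evidence_status": {claim: "unavailable" for claim in claim_ids},
        "data_quality_notes": [  # principle 2: the degradation is visible
            {"quality_state": "unavailable", "impact": f"skill failed ({last_error})"}
        ],
    }
```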

10C. Context Budget Management

V1 operates within finite model context windows. The orchestrator must manage context deliberately.

10C.1 Context Priority Order

When context must be trimmed, preserve in this priority (highest first):

  1. Current user question and task type;
  2. Active thread current_thesis, counter_thesis, invalidators, watch_questions;
  3. Most recent snapshot (full);
  4. Evidence items for current claims;
  5. User context and profile consent state;
  6. Earlier snapshots (summarized, not full);
  7. Advisor output history (summarized);
  8. Reference and background material (summarized or dropped).

10C.2 Budget Rules

  • Context Builder must estimate token usage before advisor invocation. If estimated usage exceeds 80% of the context window, trigger summarization of items at priority 6–8.
  • Each advisor receives a scoped context subset relevant to its role, not the full context. Advisor Planner determines the subset.
  • Refresh tasks always load the previous snapshot in full; older history loads as summaries.
  • Sensitive input handling records are never summarized or dropped; they are compact by design.
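The 80% rule and the priority order can be combined into a small planning function. The `ContextItem` shape, the fixed `summary_ratio` and the assumption that token counts are already estimated are all placeholders for this sketch; actual token estimation and summarization are engineering choices.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ContextItem:
    name: str
    priority: int  # 1 (highest) to 8 (lowest), per the priority order above
    tokens: int    # estimated token count
    summarized: bool = False
    sensitive_record: bool = False


def plan_context(items, window, threshold=0.8, summary_ratio=0.25):
    """Summarize priority 6-8 items, lowest priority first, until under budget."""
    limit = int(window * threshold)
    planned = list(items)
    total = sum(item.tokens for item in planned)
    order = sorted(range(len(planned)), key=lambda i: planned[i].priority, reverse=True)
    for index in order:
        if total <= limit:
            break
        item = planned[index]
        # Priorities 1-5 and sensitive handling records are never summarized.
        if item.priority < 6 or item.sensitive_record:
            continue
        new_tokens = max(1, int(item.tokens * summary_ratio))
        total -= item.tokens - new_tokens
        planned[index] = replace(item, tokens=new_tokens, summarized=True)
    return planned
```

If the total still exceeds the limit after all priority 6 to 8 items are summarized, this sketch returns the plan as is; dropping items or failing the turn is left to the degradation rules in §10B.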

11. Engineering Handoff Boundary

Engineering implementation may choose concrete classes, queues, prompts or runtime structure later. This design only fixes:

  • product-level responsibilities;
  • object write targets;
  • boundary guard behavior;
  • evaluation hooks;
  • forbidden execution fields.

Engineering must not infer that V1 already has live data, account access, production alerting, training pipeline or trial evidence.

12. Acceptance

This document can serve as Engineering-start draft input together with Journey, Schema, UI / UX and Evaluation Initial Plan. It does not by itself satisfy the Engineering-start gate.

13. Open Items

  • Prompt / tool contract details remain engineering design work.
  • Evaluation Review / Acceptance Plan must test boundary guard and object-write behavior.
  • Trial Ops must define how failures are reported, reviewed and stopped.