Skip to content

AI-Powered Operations Agent + Dashboard Chatbot #813

Description

@agamatlab

@yeasy @YoungHypo ,

Opening this to track my mentorship project and lay out how I want to approach
it. I've got Cello running locally with make local and I've been through the
existing POCs to see how the API behaves. Would love a review on the plan below
before I start creating PRs.

What I want to build

I want an operator to be able to manage their Fabric deployment in plain language
instead of clicking through the Dashboard. Two parts:

  1. A Python agent backend that takes a natural-language request, calls
    Cello's REST API through an LLM with tool-calling, and returns a structured
    answer. I also want it to summarize Fabric fault logs for common failures.
  2. A chat panel in the Dashboard (React) with streaming replies and
    multi-turn conversation.

I want to start read-only (nodes, channels, chaincodes, orgs, health) and add
write actions later, once RBAC is in.

How I'm thinking of building it

Operator types in the Dashboard chat panel (React / Umi / AntD)
  -> POST /api/v1/agent/chat            (Django + DRF, JWT-auth'd, streaming)
    -> LLM tool-calling loop
      -> read-only tools over the existing REST / service layer
        -> nodes / channels / chaincodes / organizations
  • I'd add a new Django app src/api-engine/agent/, next to node/, channel/,
    etc. I want the tools to reuse the same querysets and serializers the REST
    viewsets already use, so there's one source of truth.
  • I'm planning to make the chat endpoint a normal JWT-protected DRF view, so the
    agent acts as the logged-in user and inherits their permissions. That way I can
    leave per-org API keys for the write phase.
  • I'll keep the LLM provider configurable, defaulting to Claude, no keys in the
    repo.

PRs I want to push, in order

  1. Backend read-only skeleton - the agent app: read-only tools (list
    nodes / channels / chaincodes / orgs + a deployment summary), the
    POST /api/v1/agent/chat endpoint, and unit tests with the LLM mocked.
  2. Streaming - switch the chat endpoint to SSE so replies stream in.
  3. Fault-log summarization - a tool that pulls a failing node's logs and
    returns an LLM summary of the likely cause.
  4. Dashboard chat panel - the React component that consumes the stream,
    multi-turn, with en-US and zh-CN strings.
  5. Docs - how to configure the provider, run the agent, and add a tool.

Before I get started, having answers to these could really help me think about the PRs more clearly:

  • Does this match what you had in mind for the project, or would you shape it
    differently?
  • Any preference on the LLM provider, or is keeping it swappable fine?
  • Are you okay with the JWT-as-logged-in-user approach for now, and handling API
    keys in the write phase?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions