NATS Bloodstream Events: Inter-Bee Choreography

Abstraction Level: Level 4 (Ecosystem) — Multi-bee event flows

Purpose: Document the NATS event topics, payload structures, and inter-bee communication patterns that form the Hive's "Bloodstream" — the circulatory system distributing signals across autonomous services.


What is the NATS Bloodstream?

From FOUNDATION.md:14, the Nucleus (core) "Communicates via NATS 'Bloodstream'".

NATS (Neural Autonomic Transport System, https://nats.io) is the pub/sub message bus that enables:

  • Asynchronous choreography — Bees coordinate without direct coupling

  • Event sourcing — Chronicles of decisions for audit trails

  • Self-healing signals — Injury reports trigger automated remediation

  • Distributed observability — Heartbeats, metrics, and state broadcasts

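Unlike synchronous gRPC calls (request/response), NATS events are fire-and-forget broadcasts: the publisher does not wait for a reply, and any number of subscribers can react independently. With core NATS, delivery is at-most-once, so a subscriber that is offline when an event is published does not receive it.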


Event Topic Namespace

All Hive events follow the hierarchical naming convention:

aura.<domain>.<category>[.<subcategory>]

Domains:

  • aura.hive.* — Internal governance, audits, operational state

  • aura.negotiation.* — Economic negotiation events (deprecated, use aura.hive.events.*)

  • aura.core.* — Brain-specific events (failures, diagnostics)

Categories:

  • audit — Architectural compliance reports

  • injury — Self-healing triggers

  • events.* — Domain events (negotiation outcomes, user actions)

  • heartbeat — Service liveness signals

  • brain_dead — Critical LLM failures
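The naming convention above can be enforced at publish time. The following is a minimal sketch, not code from the Hive repository: the helper name build_subject and the allowed character set (lowercase letters, digits, underscores) are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed token alphabet: lowercase letters, digits, and underscores.
_TOKEN = re.compile(r"^[a-z0-9_]+$")


def build_subject(domain: str, category: str, subcategory: Optional[str] = None) -> str:
    """Return aura.<domain>.<category>[.<subcategory>] after validating each token."""
    tokens = [domain, category]
    if subcategory is not None:
        tokens.append(subcategory)
    for token in tokens:
        if not _TOKEN.match(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(["aura", *tokens])


print(build_subject("hive", "audit"))
print(build_subject("hive", "events", "negotiation_accepted"))
```

Rejecting tokens that contain dots, spaces, or wildcard characters keeps a malformed event type from silently creating an extra level in the subject hierarchy.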


Event Flow Architecture


Key Insight: Publishers (Bees) don't know who subscribes. Subscribers don't know who published. This decoupling enables the Hive to evolve without breaking contracts.


Event Catalog

1. aura.hive.audit — Architectural Audits

Publisher: agents/bee-keeper/src/hive/connector.py:169

Purpose: Chronicle architectural violations detected by bee-keeper's LLM audit.

Payload Schema:

Subscribers:

  • chronicler (future) — Updates HIVE_STATE.md with audit findings

  • Prometheus (future) — Metrics for violation trends

Example Usage:


2. aura.hive.injury — Self-Healing Triggers

Publisher: agents/bee-keeper/src/hive/connector.py:178

Purpose: Signal critical issues requiring automated remediation (e.g., unauthorized directories, broken tests).

Payload Schema:

Subscribers:

  • auto-healer (future agent) — Automatically fixes known issues (delete dir, format code)

  • PagerDuty (future) — Alerts human operators for auto_heal: false injuries
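The split between the two subscribers hinges on the auto_heal flag. As a hedged sketch of that routing decision: only auto_heal is named in this document, so the other payload field, the function name route_injury, and the choice to treat a missing flag as false are all assumptions.

```python
import json


def route_injury(raw: bytes) -> str:
    """Decide which subscriber should act on an injury event."""
    injury = json.loads(raw)
    # Assumed default: a missing flag means "not safe to auto-heal".
    if injury.get("auto_heal", False):
        return "auto-healer"
    return "pagerduty"


# Hypothetical payload; the "type" field is illustrative only.
event = json.dumps({"type": "unauthorized_directory", "auto_heal": True}).encode()
print(route_injury(event))
```

Defaulting to the human-alert path is the conservative choice here: an injury without an explicit auto_heal: true is never fixed automatically.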

Example Usage:


3. aura.hive.events.* — Domain Events

Publisher: core/src/hive/generator.py:42

Purpose: Broadcast negotiation outcomes for audit trails, analytics, and downstream reactions.

Topic Pattern: aura.hive.events.{event_type}

Examples:

  • aura.hive.events.negotiation_accepted

  • aura.hive.events.negotiation_countered

  • aura.hive.events.negotiation_rejected

  • aura.hive.events.user_registered
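A subscriber to aura.hive.events.* receives every subject listed above because the NATS wildcard * matches exactly one token, while > matches one or more trailing tokens. The sketch below reimplements that matching rule in plain Python purely for illustration; the function name subject_matches is hypothetical, and a real subscriber relies on the server to do this matching.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if subject matches a NATS-style pattern with * and > wildcards."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # ">" is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


event_type = "negotiation_accepted"
subject = f"aura.hive.events.{event_type}"
print(subject_matches("aura.hive.events.*", subject))
print(subject_matches("aura.hive.events.*", "aura.hive.audit"))
```

Note that aura.hive.events.* does not match a subject with an additional level below the event type; a subscriber that wants everything under aura.hive would use aura.hive.> instead.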

Payload Schema:

Subscribers:

  • Analytics service (future) — Builds negotiation success metrics

  • Billing service (future) — Triggers payment workflows

  • Audit log — Compliance records

Example Usage:


4. aura.hive.heartbeat — Service Liveness

Publisher: core/src/hive/generator.py:51

Purpose: Periodic liveness signals from each Bee to prove it's operational.

Payload Schema:

Subscribers:

  • Prometheus — Scrapes heartbeats for uptime metrics

  • Health monitor (future) — Alerts on missing heartbeats (service down)

Heartbeat Interval: Default 60 seconds (configurable via HEARTBEAT_INTERVAL_SEC)
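A rough sketch of how the interval might be read and how a health monitor could decide that a Bee has gone silent. HEARTBEAT_INTERVAL_SEC and its 60 second default come from this document; the function names and the grace multiplier of 2 missed intervals are assumptions.

```python
import os
import time
from typing import Mapping, Optional


def heartbeat_interval(env: Mapping[str, str] = os.environ) -> float:
    """Read HEARTBEAT_INTERVAL_SEC, falling back to the documented 60 s default."""
    return float(env.get("HEARTBEAT_INTERVAL_SEC", "60"))


def is_stale(last_seen: float, interval: float,
             now: Optional[float] = None, grace: float = 2.0) -> bool:
    """True when no heartbeat arrived within grace * interval seconds (grace is assumed)."""
    current = time.time() if now is None else now
    return (current - last_seen) > grace * interval


interval = heartbeat_interval({})
print(interval, is_stale(last_seen=0.0, interval=interval, now=121.0))
```

Allowing more than one missed interval before alerting avoids false alarms from a single delayed or dropped heartbeat, which fire-and-forget delivery makes possible.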

Example Usage:


5. aura.core.brain_dead — Critical LLM Failures

Publisher: core/src/hive/transformer.py:71

Purpose: Signal catastrophic LLM failures (API down, timeout, hallucination detection) that require human intervention.

Payload Schema:

Subscribers:

  • PagerDuty — Immediate alert to on-call engineer

  • Auto-scaler (future) — Spin up backup LLM providers

  • HIVE_STATE.md updater — Chronicle brain failures

Example Usage:


Event Sequence: Complete Negotiation Flow


Flow:

  1. External agent negotiates via HTTP → gRPC

  2. core processes, emits negotiation_accepted event

  3. core emits heartbeat every 60s

  4. Prometheus collects heartbeat for uptime tracking

  5. bee-keeper (separate flow) audits code, emits audit + injury events

  6. Prometheus alerts on injuries


Publisher Implementation Patterns

Pattern 1: Fire-and-Forget (No Error Handling)

Use Case: Non-critical telemetry (heartbeats, low-priority events)

Behavior: If NATS is down, event is silently dropped. Service continues.
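A minimal sketch of this pattern, assuming a client object that exposes an is_connected attribute and an async publish(subject, payload) method, as the nats-py client does. FakeNats is a stand-in so the example runs without a server, and the "service" payload field is hypothetical.

```python
import asyncio
import json


async def emit_heartbeat(nc, service: str) -> bool:
    """Publish a heartbeat if a connection exists; otherwise drop it silently."""
    if nc is None or not nc.is_connected:
        return False  # NATS unavailable: drop the event, keep the service running
    payload = json.dumps({"service": service}).encode()
    await nc.publish("aura.hive.heartbeat", payload)
    return True


class FakeNats:
    """Stand-in for a NATS client (assumption, for demonstration only)."""

    def __init__(self, connected: bool = True):
        self.is_connected = connected
        self.sent = []

    async def publish(self, subject, payload):
        self.sent.append((subject, payload))


fake = FakeNats()
print(asyncio.run(emit_heartbeat(fake, "core")), fake.sent)
print(asyncio.run(emit_heartbeat(FakeNats(connected=False), "core")))
```

The returned boolean is optional for callers; it exists only so a test or a metric counter can observe how many events were dropped.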


Pattern 2: Graceful Degradation (Catch Connection Errors)

Use Case: Important events that shouldn't crash the service

Behavior: Log warning, continue processing. Event is lost but service survives.
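One way this could look, sketched under assumptions: the client is any object with an async publish method, the logger name is made up, and only standard-library exception types are caught. A real client library raises its own exception types, which would need to be added to the except clause.

```python
import asyncio
import json
import logging

logger = logging.getLogger("hive.bloodstream")  # hypothetical logger name


async def emit_event(nc, event_type: str, data: dict) -> bool:
    """Publish a domain event; on connection trouble, warn and carry on."""
    subject = f"aura.hive.events.{event_type}"
    try:
        await nc.publish(subject, json.dumps(data).encode())
        return True
    except (OSError, asyncio.TimeoutError) as exc:
        # ConnectionError is a subclass of OSError, so refused or reset sockets land here.
        logger.warning("NATS publish to %s failed, event lost: %s", subject, exc)
        return False


class DownNats:
    """Stand-in client that always fails, simulating an unreachable server."""

    async def publish(self, subject, payload):
        raise ConnectionRefusedError("simulated outage")


delivered = asyncio.run(emit_event(DownNats(), "negotiation_accepted", {"id": "n-1"}))
print(delivered)
```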


Pattern 3: Retry with Backoff (Critical Events)

Use Case: Audit events that MUST be delivered

Behavior: Connection timeout → log error. Could be extended with retry queue.
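The behavior described above stops at logging. As a sketch of how the retry extension might look, the helper below retries with exponential backoff and jitter. It is not taken from connector.py; the function name, the default attempt count, and the delay values are all assumptions.

```python
import asyncio
import random


async def publish_with_retry(publish, subject: str, payload: bytes,
                             attempts: int = 4, base_delay: float = 0.5) -> bool:
    """Call publish(); on failure wait base_delay * 2**n plus jitter, then try again."""
    for attempt in range(attempts):
        try:
            await publish(subject, payload)
            return True
        except (OSError, asyncio.TimeoutError):
            if attempt == attempts - 1:
                break  # out of attempts: the caller logs or parks the event
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    return False


calls = []


async def flaky(subject, payload):
    """Simulated publisher that fails twice and then succeeds."""
    calls.append(subject)
    if len(calls) < 3:
        raise ConnectionError("simulated outage")


result = asyncio.run(publish_with_retry(flaky, "aura.hive.audit", b"{}", base_delay=0))
print(result, len(calls))
```

Retrying in process only covers short outages. If audit events truly must not be lost across a restart, the event also has to be persisted somewhere, for example in a local retry queue or a JetStream stream.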


Subscriber Implementation Patterns

Pattern 1: Simple Callback
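A minimal sketch of a callback subscriber, assuming a client whose subscribe method accepts a cb argument and delivers message objects with subject and data attributes, as nats-py does. The handler name, the "violations" field, and the simulated message are hypothetical.

```python
import asyncio
import json
from types import SimpleNamespace

received = []


async def on_audit(msg) -> None:
    """Decode one audit event; undecodable payloads are skipped, not raised."""
    try:
        event = json.loads(msg.data)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return
    received.append((msg.subject, event))


async def attach(nc) -> None:
    """Register the callback on a connected client."""
    await nc.subscribe("aura.hive.audit", cb=on_audit)


# Simulated delivery so the sketch runs without a server.
fake_msg = SimpleNamespace(subject="aura.hive.audit", data=b'{"violations": 2}')
asyncio.run(on_audit(fake_msg))
print(received)
```

Swallowing malformed payloads keeps one bad publisher from crashing the subscriber; a production handler would probably also log or count them.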


Pattern 2: Queue Groups (Load Balancing)

Use Case: Multiple instances of a subscriber should share the workload

Behavior: If 3 instances subscribe with the queue group analytics-workers, NATS delivers each event to exactly one of them, spreading the load across the group.
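Joining a queue group is a matter of passing the group name when subscribing. The sketch below assumes a client whose subscribe method takes a queue argument, as nats-py does. RecordingClient is a stand-in that only records the call, and the subject choice is an assumption.

```python
import asyncio


async def start_worker(nc, handler):
    """Subscribe to all domain events as a member of the analytics-workers group."""
    return await nc.subscribe("aura.hive.events.*", queue="analytics-workers", cb=handler)


class RecordingClient:
    """Stand-in client that records subscribe() calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    async def subscribe(self, subject, queue="", cb=None):
        self.calls.append((subject, queue))
        return object()


async def handle(msg) -> None:
    """Placeholder handler; real workers would process msg.data here."""


client = RecordingClient()
asyncio.run(start_worker(client, handle))
print(client.calls)
```

Every instance runs the same code. Subscribers that omit the queue name still receive every event, so a queue group and a plain subscriber can coexist on one subject.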


Pattern 3: Durable Subscriptions (At-Least-Once Delivery)

Use Case: Events must not be lost, even if subscriber is temporarily down

Behavior: NATS JetStream persists events; subscriber replays missed events on reconnect.
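At-least-once delivery depends on the subscriber acknowledging a message only after it has been processed. The sketch below shows that ack discipline against a fake message object. It assumes messages expose data and an async ack() method, as JetStream messages in nats-py do. Creating the stream and the durable consumer is not shown, and all names are hypothetical.

```python
import asyncio


async def process_and_ack(msg, handle) -> bool:
    """Acknowledge only after handle() succeeds so that failures are redelivered."""
    try:
        await handle(msg.data)
    except Exception:
        return False  # no ack: the server redelivers once the ack wait expires
    await msg.ack()
    return True


class FakeMsg:
    """Stand-in for a JetStream message (assumption, for demonstration only)."""

    def __init__(self, data: bytes):
        self.data = data
        self.acked = False

    async def ack(self):
        self.acked = True


async def store(data: bytes) -> None:
    """Hypothetical handler that rejects empty payloads."""
    if not data:
        raise ValueError("empty payload")


good, bad = FakeMsg(b'{"ok": true}'), FakeMsg(b"")
print(asyncio.run(process_and_ack(good, store)), good.acked)
print(asyncio.run(process_and_ack(bad, store)), bad.acked)
```

Because a message may be delivered more than once, handlers behind a durable subscription should be idempotent.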


NATS Configuration

Environment Variables:

Docker Compose:

Kubernetes:


Monitoring NATS Events

CLI Subscription (Debugging)


NATS Monitoring Dashboard

Open the NATS HTTP monitoring endpoint at http://localhost:8222 (available when the server runs with monitoring enabled) to view:

  • Active connections (publishers/subscribers)

  • Message rates (msgs/sec)

  • Topic statistics (message counts)


Prometheus Metrics (Future)

Potential Metrics:


Event Versioning Strategy

Problem: Event schemas evolve. How do we avoid breaking subscribers?

Solution: Semantic versioning in topic names (future)

Migration Path:

  1. Introduce v2 topic

  2. Publishers emit to both v1 and v2 during transition period

  3. Subscribers migrate at their own pace

  4. Deprecate v1 after 6 months
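Step 2 of the migration path can be sketched as a publisher that emits both shapes side by side. The document does not specify where the version goes in the topic name, so the aura.hive.events.v2.<event_type> form used here is purely an assumption, as are the function name and the payload fields.

```python
import asyncio
import json


async def publish_both(publish, event_type: str, v1_body: dict, v2_body: dict) -> None:
    """Emit the legacy payload and the new payload during the transition period."""
    await publish(f"aura.hive.events.{event_type}", json.dumps(v1_body).encode())
    # Hypothetical v2 subject layout; adjust once the real convention is decided.
    await publish(f"aura.hive.events.v2.{event_type}", json.dumps(v2_body).encode())


sent = []


async def record(subject, payload):
    """Stand-in publisher that records what would have been sent."""
    sent.append(subject)


asyncio.run(publish_both(
    record,
    "negotiation_accepted",
    {"price": 100},
    {"price": {"amount": 100, "currency": "USD"}},
))
print(sent)
```

With this layout, existing subscribers on aura.hive.events.* keep receiving only v1 payloads, because the single-token wildcard does not match the extra v2 level.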


Security Considerations

1. Topic Authorization (Future)

Problem: Prevent rogue services from publishing to aura.hive.audit

Solution: NATS ACLs (Access Control Lists)


2. Payload Encryption (Future)

Use Case: Sensitive data (e.g., user PII) in events

Solution: Encrypt payload before publishing


Relation to Canonical Architecture

This event system implements the "Bloodstream" communication pattern defined in:

  • docs/FOUNDATION.md line 14 (NATS Bloodstream)

  • packages/aura-core/src/aura_core/dna.py lines 174-177 (Generator protocol)

  • core/src/hive/generator.py (G nucleotide implementation)

  • agents/bee-keeper/src/hive/connector.py (Audit event emission)

NATS Topics Used:

  • aura.hive.audit — bee-keeper audits

  • aura.hive.injury — bee-keeper self-healing triggers

  • aura.hive.events.* — core domain events

  • aura.hive.heartbeat — core liveness

  • aura.core.brain_dead — core LLM failures


End of NATS Bloodstream Events Documentation

For the glory of the Hive. 🐝
