
Architecture

Status: Draft | Owner: Ben | Last Updated: 2026-03-07

Overview

This document defines the technical architecture of LifeDB — the system structure, technology choices, module boundaries, and data flow patterns that implement the product specifications.

Guiding principles:

  • Self-hosted first. The architecture optimizes for multi-container deployment on a single node (compose-based). Cloud-hosted is a deployment topology change, not a different architecture.
  • Clean internal boundaries. Modules communicate through Go interfaces. Boundaries are designed for eventual extraction to services but implemented as a compiled monolith.
  • Two performance profiles. Ingestion is throughput-oriented. The consumer API is latency-oriented. The architecture explicitly separates these paths.
  • Interface over implementation. Storage, search, graph traversal, and job processing are defined as interfaces. Initial implementations use simple infrastructure (PostgreSQL, in-process queues). Implementations are swappable without changing module contracts.

Technology Stack

Core Runtime: Go

The primary language for all server-side components: API serving, ingestion pipeline, entity resolution, data processing, and connector management.

Rationale:

  • Compiles to a single binary — clean container deployment, low resource footprint for self-hosted
  • Native concurrency (goroutines) handles both concurrent API requests and parallel pipeline processing without framework overhead
  • Typically 2-5x faster than Node.js and 10-50x faster than Python for CPU-bound data manipulation (normalization, scoring, transformation)
  • Strong standard library for HTTP clients (connectors) and servers (API)

Intelligence Layer: Python

ML and NLP capabilities (semantic search, entity extraction, summarization) run as Python components when needed.

Rationale:

  • Dominant ecosystem for ML/NLP tooling
  • Not needed initially — the intelligence module starts thin
  • Communicates with the Go core via defined interfaces (initially in-process via subprocess/FFI, extractable to a service)

Database: PostgreSQL (Initial)

PostgreSQL is the starting database engine, accessed through three distinct interface boundaries (relational, search, graph). It is a starting point, not a permanent commitment — the interface boundaries exist specifically so that dedicated engines can replace PostgreSQL for search (Elasticsearch, Meilisearch) and graph traversal (Neo4j, Dgraph) as the system’s needs outgrow what PostgreSQL handles well.

Rationale for starting here:

  • Handles relational queries, full-text search (tsvector), and graph-like traversal (recursive CTEs) adequately for initial scale
  • Single infrastructure dependency — simplifies self-hosted deployment and operations
  • Extensions (pgvector) provide a path to semantic search without additional infrastructure
  • Well-understood operationally — backups, monitoring, tuning are well-documented

Expected evolution: As data volume and query complexity grow, dedicated engines will likely replace PostgreSQL behind the search and graph interfaces. The relational interface may remain on PostgreSQL long-term. Each replacement is an implementation swap behind a stable interface — no consumer changes required.

API Transport: GraphQL (Primary)

GraphQL as the primary API transport, implemented with gqlgen (schema-first code generation).

Rationale:

  • Natural fit for the entity graph — consumers control traversal depth and shape
  • Composable queries match the product spec’s filtering and drill-down patterns
  • AI agents request exactly what they need — no over-fetching, no endpoint proliferation
  • Schema serves as a typed contract; TypeScript clients auto-generated from it
  • Subscriptions provide a path to real-time when needed

REST endpoints for operations where GraphQL adds no value: health checks, webhooks, bulk import upload, connector callbacks.

Module Architecture

Dependency Structure

             ┌─────────┐
             │   API   │
             └────┬────┘
             ┌────▼────┐
       ┌─────┤ Domain  ├─────┐
       │     └────┬────┘     │
       │          │          │
  ┌────▼───┐  ┌───▼───┐  ┌───▼─────┐
  │Ingest- │  │Resolu-│  │Intelli- │
  │  ion   │  │ tion  │  │ gence   │
  └────────┘  └───────┘  └─────────┘
       │          │           │
       └──────────┼───────────┘
       ┌──────────▼──────────┐
       │        Store        │
       │  ┌───────────────┐  │
       │  │  WriteStore   │  │
       │  ├───────────────┤  │
       │  │  ReadStore    │  │
       │  │  ├─Relational │  │
       │  │  ├─Search     │  │
       │  │  └─Graph      │  │
       │  ├───────────────┤  │
       │  │  Projector    │  │
       │  └───────────────┘  │
       └─────────────────────┘

Domain is the center. All modules depend on Domain types and interfaces. Nothing depends on API. Store implements Domain-defined interfaces. This ensures business logic is independent of transport and storage.

Modules

API

Request handling, authentication, response formatting.

  • GraphQL schema definition and resolver layer (gqlgen)
  • REST endpoints for non-GraphQL operations
  • Authentication middleware — resolves requests to AuthContext
  • Pagination, error formatting, rate limiting
  • No business logic — delegates entirely to Domain

Domain

Core entity types, business rules, and interface definitions.

  • Canonical entity types (Person, PlatformIdentity, Conversation, Message, Event, Document, Attachment)
  • Provenance model types and validation
  • Interface definitions for all store operations (WriteStore, ReadStore, JobQueue)
  • Business rules (resolution policies, permission enforcement, provenance precedence)
  • AuthContext definition and permission checking

Ingestion

Connector management, data normalization, sync state.

  • Connector lifecycle management (registration, health monitoring, state tracking)
  • Platform data normalization — maps platform-specific formats to Domain types
  • Sync position tracking for incremental updates
  • Batch processing coordination for bulk imports
  • Writes through WriteStore interface with platform_reported provenance

Connectors are conceptually outside the monolith — they consume the write API. Initially they run in-process, but the boundary is designed for extraction.

Resolution

Entity resolution engine.

  • Factual matching (phone, email, platform-reported links) — deterministic, immediate
  • Evidence accumulation — non-deterministic signals contribute to confidence scores
  • Auto-resolution — applies resolution when evidence crosses configurable threshold
  • Candidate management — maintains the pool of potential matches with evolving confidence
  • Runs asynchronously, triggered by new data via the job queue
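The evidence-accumulation and threshold mechanics might be sketched as follows. All names, weights, and the combination rule here are illustrative assumptions, not the real Resolution engine:

```go
package main

import "fmt"

// Signal is one piece of non-deterministic matching evidence between two
// identities (e.g. display-name similarity, shared conversations).
type Signal struct {
	Kind   string
	Weight float64 // contribution to confidence, in [0, 1]
}

// Candidate accumulates evidence for one potential identity match.
type Candidate struct {
	PersonID   string
	Confidence float64
}

// Accumulate folds new signals into a candidate's confidence using a
// simple complement-product rule: each independent signal shrinks the
// remaining uncertainty in proportion to its weight.
func Accumulate(c Candidate, signals []Signal) Candidate {
	uncertainty := 1 - c.Confidence
	for _, s := range signals {
		uncertainty *= 1 - s.Weight
	}
	c.Confidence = 1 - uncertainty
	return c
}

// ShouldAutoResolve applies the configurable auto-resolution threshold.
func ShouldAutoResolve(c Candidate, threshold float64) bool {
	return c.Confidence >= threshold
}

func main() {
	c := Candidate{PersonID: "person-1"}
	c = Accumulate(c, []Signal{
		{Kind: "display-name-match", Weight: 0.6},
		{Kind: "shared-conversation", Weight: 0.5},
	})
	fmt.Printf("confidence=%.2f auto=%v\n", c.Confidence, ShouldAutoResolve(c, 0.75))
	// confidence=0.80 auto=true
}
```

Factual matches (phone, email) would bypass this scoring entirely and resolve deterministically.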

Intelligence

Pre-computed artifacts and ML integration.

  • Initially thin — coordinates pre-computation of summaries, entity extraction
  • Interface boundary for Python ML components
  • Artifacts stored as data fields on entities via WriteStore
  • Runs asynchronously via the job queue

Store

Data persistence, implemented behind three interface boundaries.

  • WriteStore — write-optimized operations: entity creation, upserts, batch writes, relationship management
  • ReadStore — read-optimized operations, with three sub-interfaces:
    • Relational — entity queries, filtering, temporal ranges, pagination
    • Search — content search, relevance scoring, ranking
    • Graph — relationship traversal, entity resolution queries, connection patterns
  • Projector — async process that transforms WriteStore data into ReadStore representations

All three sub-interfaces are initially implemented against PostgreSQL. Each can be independently swapped to a dedicated engine (Elasticsearch for search, Neo4j for graph) without changing consumers.

Data Flow

Write Path (Ingestion)

Connector → API (write) → Domain (validate, provenance) → WriteStore → JobQueue
                                                                          │
                                                           ┌──────────────┤
                                                           ▼              ▼
                                                       Projector     Resolution
                                                           │              │
                                                           ▼              ▼
                                                       ReadStore     WriteStore
                                                                     (new links)
  1. Data arrives via the write API (GraphQL mutation or REST)
  2. Domain validates, applies provenance, enforces permissions
  3. WriteStore persists durably — write is acknowledged
  4. Jobs enqueued: projection (update read-optimized views), resolution (evaluate new evidence)
  5. Projector updates ReadStore representations asynchronously
  6. Resolution evaluates new data against existing identities, may create new links

Data is queryable via ReadStore after projection completes (seconds). Resolution and intelligence improve results over time (minutes to hours).

Read Path (Consumer API)

Consumer → API (query) → Domain (AuthContext, permissions) → ReadStore → Response
  1. Request arrives, middleware resolves AuthContext
  2. Domain applies permission scoping (data scope filters injected into query)
  3. ReadStore serves from read-optimized representations
  4. Response shaped by GraphQL selection set (consumer controls depth)

The read path never touches WriteStore. Read and write paths are independent, enabling separate performance optimization.

Async Processing

Job Queue Interface

Background work (projection, resolution, intelligence) executes through a job queue interface.

type JobQueue interface {
    Enqueue(ctx context.Context, job Job) error
    Subscribe(ctx context.Context, jobType string, handler JobHandler) error
}

Initial implementation: In-process — goroutines consuming from Go channels. Simple, no infrastructure dependencies, adequate for single-node deployment.

Extraction path: Swap to an external queue (Redis, NATS) when independent scaling of workers is needed. The interface stays the same; only the implementation changes.

Job Types

  • Projection — update read-optimized views after writes. Latency-sensitive (affects time-to-queryable).
  • Resolution — evaluate new data against identity graph. Can be batched.
  • Intelligence — compute summaries, extract entities. Lowest priority, most resource-intensive.

Priority ordering ensures projection runs first (data becomes queryable quickly), resolution second, intelligence third.

Observability

Tracing, metrics, and profiling infrastructure for understanding system behavior in development and production.

Tracing and Metrics: OpenTelemetry

The system uses go.opentelemetry.io/otel for distributed tracing and metrics. OpenTelemetry is vendor-neutral — traces and metrics can be exported to any compatible backend without code changes.

TracerProvider and MeterProvider are configured at application startup and passed through the dependency tree. Modules receive a tracer/meter from the provider rather than importing a global. This keeps the observability dependency explicit and testable.

Exporter strategy is environment-driven, with three modes:

  • noop (default) — no exporter configured. Zero runtime overhead. Used when observability output is not needed.
  • stdout — exports traces and metrics to standard output as structured JSON. Used during local development for quick visibility into spans and counters without external infrastructure.
  • otlp — exports via OpenTelemetry Protocol (gRPC or HTTP) to a collector or backend (Jaeger, Grafana, Datadog, etc.). Used in production or when a collector is available.

The exporter is selected by environment variable. The application code is identical across all three modes — only the provider configuration changes.

Context propagation — span parent-child relationships flow via Go’s context.Context. Each function receives a context carrying the active span; creating a new span from that context automatically establishes the parent relationship. When goroutines fan out, each must derive its span from the parent context to preserve the trace tree. This is the standard OTel-Go mechanism and requires no custom propagation code — but every instrumented function must accept and pass context.Context.

Span granularity — the ingestion pipeline processes batches that may contain thousands of entities. Creating a child span per entity would produce trace explosion (e.g., 10k messages = 10k+ spans per trace). Instead: create child spans for phase-level operations (identity phase, conversation phase, message phase) and use span events for per-entity iterations within a phase. Phase spans capture aggregate duration; span events capture individual entity outcomes without the overhead of full spans.

Error recording — when an operation fails, the enclosing span records the error as a span event and sets span status to Error. This makes errors visible in trace backends as first-class annotations on the span, enabling filtering and alerting on error spans.

Log-trace correlation — structured log output (slog) includes trace_id and span_id fields extracted from the active span context. This enables correlating log lines to the trace that produced them, bridging the gap between log-based debugging and trace-based analysis.

Sampling — in production, high-throughput ingestion (bulk imports of 100k+ entities) can generate excessive trace volume. The TracerProvider is configured with a sampler:

  • Development: AlwaysSample — every trace is captured for full visibility.
  • Production: ParentBased(TraceIDRatioBased(rate)) — respects upstream sampling decisions when present; otherwise samples a configurable fraction of traces (default: 1%). This balances visibility with overhead and storage cost.

The sampling rate is configured via environment variable. The application code is sampling-unaware — it always creates spans, and the sampler decides whether to record them.

Profiling: pprof

Go’s net/http/pprof endpoints provide CPU, memory, goroutine, and mutex profiling for performance investigation.

Design:

  • Served on a separate port from the main API server, ensuring profiling endpoints are never exposed on the public API surface
  • Disabled by default — enabled via environment variable
  • No authentication required (access control is via network exposure of the debug port, not application-level auth)
  • Available profiles: CPU, heap, goroutine, mutex, block, allocs, threadcreate

The separate port means profiling can be enabled in production without risk of accidental public exposure. Operators access it via port-forwarding or internal network access.

Authentication and Authorization

Auth Mechanism

API keys as the primary mechanism. Each key resolves to an AuthContext.

type AuthContext struct {
    Tenant     string             // owner of the data (cloud: user ID, self-hosted: implicit)
    Tier       ConsumerTier       // owner | consumer | connector
    Operations PermissionSet      // allowed operations per entity type
    DataScope  *DataScope         // optional: platform, person, conversation, temporal filters
    Provenance ProvenanceDefaults // what provenance to stamp on writes
    RateLimit  RateLimitProfile   // throughput limits for this consumer
}

The API layer resolves the token to an AuthContext. All downstream logic (Domain, Store) receives the context and enforces it. The store layer applies DataScope as implicit query filters — scoped consumers cannot see data outside their scope at the query level.
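The implicit-filter idea can be sketched as follows; the `DataScope` fields and SQL shape are illustrative, not the actual schema:

```go
package main

import (
	"fmt"
	"strings"
)

// DataScope is a simplified stand-in for the AuthContext field above.
type DataScope struct {
	Platforms       []string
	ConversationIDs []string
}

// scopeFilters turns a DataScope into SQL predicates the store appends
// to every query, so out-of-scope rows are unreachable at query time.
func scopeFilters(s *DataScope) (string, []any) {
	if s == nil {
		return "", nil // unscoped consumer: no extra predicates
	}
	var clauses []string
	var args []any
	if len(s.Platforms) > 0 {
		clauses = append(clauses, "platform = ANY($1)")
		args = append(args, s.Platforms)
	}
	if len(s.ConversationIDs) > 0 {
		// placeholder index depends on how many filters precede it
		clauses = append(clauses, fmt.Sprintf("conversation_id = ANY($%d)", len(args)+1))
		args = append(args, s.ConversationIDs)
	}
	return strings.Join(clauses, " AND "), args
}

func main() {
	where, args := scopeFilters(&DataScope{
		Platforms:       []string{"slack"},
		ConversationIDs: []string{"c1", "c2"},
	})
	fmt.Println(where, len(args))
	// platform = ANY($1) AND conversation_id = ANY($2) 2
}
```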

Permission Model

Permissions are additive and scoped. A token starts with no access and is granted specific capabilities.

Dimensions:

  • Operation — read, write:create, write:relationships, write:annotate, delete, manage
  • Entity type — persons, messages, conversations, events, documents, relationships, connectors
  • Data scope — optional filters restricting access to specific platforms, persons, conversations, or time ranges

These compose freely. Examples:

  • AI agent: read:* (all entity types), no data scope restrictions
  • Focused app: read:messages scoped to specific conversations
  • Connector: write:create for messages/persons/conversations, provenance auto-set to platform_reported
  • Annotation tool: write:annotate only
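A toy `PermissionSet` illustrating the additive, default-deny composition (the real Domain type is certainly richer — this sketch omits data scope and the `manage` hierarchy):

```go
package main

import "fmt"

// PermissionSet maps entity type -> allowed operations.
type PermissionSet map[string]map[string]bool

// Grant adds one (operation, entity type) capability; tokens start empty.
func (p PermissionSet) Grant(op, entity string) {
	if p[entity] == nil {
		p[entity] = map[string]bool{}
	}
	p[entity][op] = true
}

// Allowed is additive: anything not explicitly granted is denied.
// "*" as the entity type grants the operation on every entity type.
func (p PermissionSet) Allowed(op, entity string) bool {
	return p[entity][op] || p["*"][op]
}

func main() {
	agent := PermissionSet{}
	agent.Grant("read", "*") // AI agent: read everything

	connector := PermissionSet{}
	for _, e := range []string{"messages", "persons", "conversations"} {
		connector.Grant("write:create", e)
	}

	fmt.Println(agent.Allowed("read", "messages"))           // true
	fmt.Println(agent.Allowed("write:create", "messages"))   // false
	fmt.Println(connector.Allowed("write:create", "events")) // false
}
```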

Audit Trail

Every API access is logged against the AuthContext: who (token identity), what (operation, entities), when (timestamp), outcome (success/denied). Designed into the auth middleware, not bolted on.

Cloud Extension

The cloud-hosted version layers on:

  • OAuth 2.0 for user login and third-party connector authorization
  • Tenant isolation at the database level (row-level security or schema-per-tenant)
  • Token management UI (issue, revoke, scope management)

The AuthContext model is the same — OAuth tokens resolve to the same AuthContext structure as API keys. The permission model doesn’t change between deployment modes.

Deployment

Deployment uses OCI-compatible containers managed via compose files. The container runtime is the user's choice (Docker, Podman, etc.) — the spec is runtime-neutral.

Self-Hosted

Multi-container deployment on a single node, managed via compose.

┌─ compose ──────────────────────────────┐
│                                        │
│  ┌──────────────────┐                  │
│  │   LifeDB (Go)    │                  │
│  │   - API server   │                  │
│  │   - Job workers  │                  │
│  │   - Connectors   │                  │
│  └────────┬─────────┘                  │
│           │                            │
│  ┌────────▼─────────┐                  │
│  │    PostgreSQL    │   ...future:     │
│  └──────────────────┘   elasticsearch, │
│                         redis, etc.    │
└────────────────────────────────────────┘
  • App container — single Go binary containing all modules (API, workers, connectors)
  • Database container — PostgreSQL, managed independently of the app
  • Compose-based orchestration: docker-compose up (or podman-compose up)
  • In-process job queue (goroutines) within the app container
  • Low resource footprint — Go binary + Postgres can run on modest hardware
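A hypothetical compose file for this topology might look like the following. Image names, ports, and environment variable names are placeholders, not the project's actual configuration:

```yaml
# Illustrative only — service names, images, and variables are assumed.
services:
  lifedb:
    image: lifedb/lifedb:latest
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://lifedb:changeme@db:5432/lifedb?sslmode=disable
    depends_on:
      - db
  db:
    image: postgres:17
    environment:
      POSTGRES_USER: lifedb
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: lifedb
    volumes:
      - lifedb-data:/var/lib/postgresql/data
volumes:
  lifedb-data:
```

Pointing `DATABASE_URL` at an external host instead of `db` is how users substitute a managed PostgreSQL instance.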

Separating the database into its own container provides:

  • Independent lifecycle — upgrade, restart, or back up the database without touching the app (and vice versa)
  • Infrastructure evolution — adding Elasticsearch, Redis, or other engines follows the same pattern (new container, config change)
  • Flexible hosting — users can point the app at an existing or managed PostgreSQL instance instead of running one locally
  • Closer parity with the cloud topology — reduces the gap between self-hosted and cloud deployment

Cloud-Hosted

Same containers, different deployment topology.

  • App container splits along module boundaries: API server, job workers, and connectors as separate containers
  • Managed PostgreSQL (with read replicas if needed for read/write separation)
  • Dedicated containers for additional engines (search, graph, cache) as needed
  • External job queue (Redis/NATS) for independent worker scaling
  • Load balancer in front of API containers
  • Per-tenant data isolation

The transition from self-hosted to cloud is a topology change, not an architectural change. The same Go binary runs in both — cloud adds horizontal scaling, managed infrastructure, and module-level container separation.

Future Architecture Considerations

Areas designed for but not implemented initially:

  • Connector extraction — connectors move from in-process to separate containers/services. The boundary exists (they’re API consumers); extraction is a deployment change.
  • Dedicated search engine — ReadStore search interface swaps from PostgreSQL to Elasticsearch/Meilisearch. No consumer changes.
  • Dedicated graph engine — ReadStore graph interface swaps from PostgreSQL recursive CTEs to a graph database. No consumer changes.
  • External job queue — JobQueue interface swaps from in-process channels to Redis/NATS. No producer/consumer changes.
  • Plugin formalization — internal interfaces formalized into an external plugin contract for third-party extensions. Deferred until interfaces stabilize after building 3-4 connectors.
  • Real-time subscriptions — GraphQL subscriptions for push-based updates. gqlgen supports this; the infrastructure (WebSocket handling, subscription management) is the main addition.

Encryption

Encryption in Transit

  • TLS required for all external API connections
  • Container-to-container TLS (app ↔ database) recommended for cloud deployment; optional for self-hosted where containers share a private compose network

Encryption at Rest

All stored data must be encrypted at rest — database files, backups, and any local state.

  • Filesystem-level encryption (LUKS, dm-crypt) or PostgreSQL Transparent Data Encryption
  • Protects against physical access (stolen disk, decommissioned hardware, compromised backups)
  • The running application works with plaintext in memory; encryption is transparent to application code and preserves full search and indexing capability

Credential Encryption

Connector credentials (OAuth tokens, platform API keys) receive additional application-level encryption before storage. These grant access to the user’s accounts on other platforms and warrant protection beyond at-rest encryption — a database compromise should not expose credentials in plaintext.

Credentials are encrypted with a separate key from the database encryption, stored and managed independently.

Future: Zero-Knowledge Architecture

For the cloud-hosted version, users may want guarantees that even LifeDB operators cannot read their data. A zero-knowledge architecture (application-level encryption of all content, user-held keys) has deep implications for search, indexing, and feature capability. This is deferred to a dedicated ADR when the cloud-hosted version is designed.

Privacy Considerations

  • All data is scoped to a tenant. No cross-tenant data access is possible at the store level.
  • AuthContext enforces data scope at the query level — a scoped consumer physically cannot retrieve out-of-scope data.
  • Audit trail captures all data access for accountability.
  • Self-hosted deployment means data never leaves the user’s infrastructure.
  • Cloud-hosted deployment requires tenant isolation at the database level, encryption at rest, encryption in transit (TLS), and application-level credential encryption.
  • Connector credentials are application-level encrypted, separate from general data encryption.

Vision

  • Project Vision — Privacy and performance principles driving architectural choices

Product Specifications

  • Data Model — Canonical entities this architecture stores and serves
  • API — Consumer API contract this architecture implements
  • Entity Resolution — Resolution system implemented by the Resolution module
  • Ingestion — Connector architecture and pipeline this architecture supports; Performance Visibility defines what operators observe (implemented by Observability infrastructure here)
