
Architecture

Status: Draft | Owner: Ben | Last Updated: 2026-03-07

Overview

This document defines the technical architecture of LifeDB — the system structure, technology choices, module boundaries, and data flow patterns that implement the product specifications.

Guiding principles:

  • Self-hosted first. The architecture optimizes for multi-container deployment on a single node (compose-based). Cloud-hosted is a deployment topology change, not a different architecture.
  • Clean internal boundaries. Modules communicate through Go interfaces. Boundaries are designed for eventual extraction to services but implemented as a compiled monolith.
  • Two performance profiles. Ingestion is throughput-oriented. The consumer API is latency-oriented. The architecture explicitly separates these paths.
  • Interface over implementation. Storage, search, graph traversal, and job processing are defined as interfaces. Initial implementations use simple infrastructure (PostgreSQL, in-process queues). Implementations are swappable without changing module contracts.

Technology Stack

Core Runtime: Go

The primary language for all server-side components: API serving, ingestion pipeline, entity resolution, data processing, and connector management.

Rationale:

  • Compiles to a single binary — clean container deployment, low resource footprint for self-hosted
  • Native concurrency (goroutines) handles both concurrent API requests and parallel pipeline processing without framework overhead
  • Typically 2-5x faster than Node.js and 10-50x faster than Python for CPU-bound data manipulation (normalization, scoring, transformation)
  • Strong standard library for HTTP clients (connectors) and servers (API)

Intelligence Layer: Python

ML and NLP capabilities (semantic search, entity extraction, summarization) run as Python components when needed.

Rationale:

  • Dominant ecosystem for ML/NLP tooling
  • Not needed initially — the intelligence module starts thin
  • Communicates with the Go core via defined interfaces (initially in-process via subprocess/FFI, extractable to a service)

Database: PostgreSQL (Initial)

PostgreSQL is the starting database engine, accessed through three distinct interface boundaries (relational, search, graph). It is a starting point, not a permanent commitment — the interface boundaries exist specifically so that dedicated engines can replace PostgreSQL for search (Elasticsearch, Meilisearch) and graph traversal (Neo4j, Dgraph) as the system’s needs outgrow what PostgreSQL handles well.

Rationale for starting here:

  • Handles relational queries, full-text search (tsvector), and graph-like traversal (recursive CTEs) adequately for initial scale
  • Single infrastructure dependency — simplifies self-hosted deployment and operations
  • Extensions (pgvector) provide a path to semantic search without additional infrastructure
  • Well-understood operationally — backups, monitoring, tuning are well-documented

Expected evolution: As data volume and query complexity grow, dedicated engines will likely replace PostgreSQL behind the search and graph interfaces. The relational interface may remain on PostgreSQL long-term. Each replacement is an implementation swap behind a stable interface — no consumer changes required.

API Transport: GraphQL (Primary)

GraphQL as the primary API transport, implemented with gqlgen (schema-first code generation).

Rationale:

  • Natural fit for the entity graph — consumers control traversal depth and shape
  • Composable queries match the product spec’s filtering and drill-down patterns
  • AI agents request exactly what they need — no over-fetching, no endpoint proliferation
  • Schema serves as a typed contract; TypeScript clients auto-generated from it
  • Subscriptions provide a path to real-time when needed

REST endpoints for operations where GraphQL adds no value: health checks, webhooks, bulk import upload, connector callbacks.

Module Architecture

Dependency Structure

             ┌─────────┐
             │   API   │
             └────┬────┘
             ┌────▼────┐
       ┌─────┤ Domain  ├─────┐
       │     └────┬────┘     │
       │          │          │
  ┌────▼───┐  ┌───▼───┐  ┌───▼─────┐
  │Ingest- │  │Resolu-│  │Intelli- │
  │  ion   │  │ tion  │  │ gence   │
  └────────┘  └───────┘  └─────────┘
       │          │           │
       └──────────┼───────────┘
       ┌──────────▼──────────┐
       │        Store        │
       │  ┌───────────────┐  │
       │  │  WriteStore   │  │
       │  ├───────────────┤  │
       │  │  ReadStore    │  │
       │  │  ├─Relational │  │
       │  │  ├─Search     │  │
       │  │  └─Graph      │  │
       │  ├───────────────┤  │
       │  │  Projector    │  │
       │  └───────────────┘  │
       └─────────────────────┘

Domain is the center. All modules depend on Domain types and interfaces. Nothing depends on API. Store implements Domain-defined interfaces. This ensures business logic is independent of transport and storage.

Modules

API

Request handling, authentication, response formatting.

  • GraphQL schema definition and resolver layer (gqlgen)
  • REST endpoints for non-GraphQL operations
  • Authentication middleware — resolves requests to AuthContext
  • Pagination, error formatting, rate limiting
  • No business logic — delegates entirely to Domain

Domain

Core entity types, business rules, and interface definitions.

  • Canonical entity types (Person, PlatformIdentity, Conversation, Message, Event, Document, Attachment)
  • Provenance model types and validation
  • Interface definitions for all store operations (WriteStore, ReadStore, JobQueue)
  • Business rules (resolution policies, permission enforcement, provenance precedence)
  • AuthContext definition and permission checking

Ingestion

Connector management, data normalization, sync state.

  • Connector lifecycle management (registration, health monitoring, state tracking)
  • Platform data normalization — maps platform-specific formats to Domain types
  • Sync position tracking for incremental updates
  • Batch processing coordination for bulk imports
  • Writes through WriteStore interface with platform_reported provenance

Connectors are conceptually outside the monolith — they consume the write API. Initially they run in-process, but the boundary is designed for extraction.

Resolution

Entity resolution engine.

  • Factual matching (phone, email, platform-reported links) — deterministic, immediate
  • Evidence accumulation — non-deterministic signals contribute to confidence scores
  • Auto-resolution — applies resolution when evidence crosses configurable threshold
  • Candidate management — maintains the pool of potential matches with evolving confidence
  • Runs asynchronously, triggered by new data via the job queue
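The evidence-accumulation and threshold mechanics might be sketched as follows. All names, weights, and the combination rule here are illustrative assumptions, not the real Resolution engine:

```go
package main

import "fmt"

// Signal is one piece of non-deterministic matching evidence between two
// identities (e.g. display-name similarity, shared conversations).
type Signal struct {
	Kind   string
	Weight float64 // contribution to confidence, in [0, 1]
}

// Candidate accumulates evidence for one potential identity match.
type Candidate struct {
	PersonID   string
	Confidence float64
}

// Accumulate folds new signals into a candidate's confidence using a
// simple complement-product rule: each independent signal shrinks the
// remaining uncertainty in proportion to its weight.
func Accumulate(c Candidate, signals []Signal) Candidate {
	uncertainty := 1 - c.Confidence
	for _, s := range signals {
		uncertainty *= 1 - s.Weight
	}
	c.Confidence = 1 - uncertainty
	return c
}

// ShouldAutoResolve applies the configurable auto-resolution threshold.
func ShouldAutoResolve(c Candidate, threshold float64) bool {
	return c.Confidence >= threshold
}

func main() {
	c := Candidate{PersonID: "person-1"}
	c = Accumulate(c, []Signal{
		{Kind: "display-name-match", Weight: 0.6},
		{Kind: "shared-conversation", Weight: 0.5},
	})
	fmt.Printf("confidence=%.2f auto=%v\n", c.Confidence, ShouldAutoResolve(c, 0.75))
	// confidence=0.80 auto=true
}
```

Factual matches (phone, email) would bypass this scoring entirely and resolve deterministically.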

Intelligence

Pre-computed artifacts and ML integration.

  • Initially thin — coordinates pre-computation of summaries, entity extraction
  • Interface boundary for Python ML components
  • Artifacts stored as data fields on entities via WriteStore
  • Runs asynchronously via the job queue

Store

Data persistence, implemented behind three interface boundaries.

  • WriteStore — write-optimized operations: entity creation, upserts, batch writes, relationship management
  • ReadStore — read-optimized operations, with three sub-interfaces:
    • Relational — entity queries, filtering, temporal ranges, pagination
    • Search — content search, relevance scoring, ranking
    • Graph — relationship traversal, entity resolution queries, connection patterns
  • Projector — async process that transforms WriteStore data into ReadStore representations

All three sub-interfaces are initially implemented against PostgreSQL. Each can be independently swapped to a dedicated engine (Elasticsearch for search, Neo4j for graph) without changing consumers.

Data Flow

Write Path (Ingestion)

Connector → API (write) → Domain (validate, provenance) → WriteStore → JobQueue
                                                                          │
                                                           ┌──────────────┤
                                                           ▼              ▼
                                                       Projector     Resolution
                                                           │              │
                                                           ▼              ▼
                                                       ReadStore     WriteStore
                                                                     (new links)
  1. Data arrives via the write API (GraphQL mutation or REST)
  2. Domain validates, applies provenance, enforces permissions
  3. WriteStore persists durably — write is acknowledged
  4. Jobs enqueued: projection (update read-optimized views), resolution (evaluate new evidence)
  5. Projector updates ReadStore representations asynchronously
  6. Resolution evaluates new data against existing identities, may create new links

Data is queryable via ReadStore after projection completes (seconds). Resolution and intelligence improve results over time (minutes to hours).

Read Path (Consumer API)

Consumer → API (query) → Domain (AuthContext, permissions) → ReadStore → Response
  1. Request arrives, middleware resolves AuthContext
  2. Domain applies permission scoping (data scope filters injected into query)
  3. ReadStore serves from read-optimized representations
  4. Response shaped by GraphQL selection set (consumer controls depth)

The read path never touches WriteStore. Read and write paths are independent, enabling separate performance optimization.

Async Processing

Job Queue Interface

Background work (projection, resolution, intelligence) executes through a job queue interface.

type JobQueue interface {
    Enqueue(ctx context.Context, job Job) error
    Subscribe(ctx context.Context, jobType string, handler JobHandler) error
}

Initial implementation: In-process — goroutines consuming from Go channels. Simple, no infrastructure dependencies, adequate for single-node deployment.

Extraction path: Swap to an external queue (Redis, NATS) when independent scaling of workers is needed. The interface stays the same; only the implementation changes.

Job Types

  • Projection — update read-optimized views after writes. Latency-sensitive (affects time-to-queryable).
  • Resolution — evaluate new data against identity graph. Can be batched.
  • Intelligence — compute summaries, extract entities. Lowest priority, most resource-intensive.

Priority ordering ensures projection runs first (data becomes queryable quickly), resolution second, intelligence third.

Observability

Tracing, metrics, and profiling infrastructure for understanding system behavior in development and production.

Tracing and Metrics: OpenTelemetry

The system uses go.opentelemetry.io/otel for distributed tracing and metrics. OpenTelemetry is vendor-neutral — traces and metrics can be exported to any compatible backend without code changes.

TracerProvider and MeterProvider are configured at application startup and passed through the dependency tree. Modules receive a tracer/meter from the provider rather than importing a global. This keeps the observability dependency explicit and testable.

Exporter strategy is environment-driven, with three modes:

  • noop (default) — no exporter configured. Zero runtime overhead. Used when observability output is not needed.
  • stdout — exports traces and metrics to standard output as structured JSON. Used during local development for quick visibility into spans and counters without external infrastructure.
  • otlp — exports via OpenTelemetry Protocol (gRPC or HTTP) to a collector or backend (Jaeger, Grafana, Datadog, etc.). Used in production or when a collector is available.

The exporter is selected by environment variable. The application code is identical across all three modes — only the provider configuration changes.

Context propagation — span parent-child relationships flow via Go’s context.Context. Each function receives a context carrying the active span; creating a new span from that context automatically establishes the parent relationship. When goroutines fan out, each must derive its span from the parent context to preserve the trace tree. This is the standard OTel-Go mechanism and requires no custom propagation code — but every instrumented function must accept and pass context.Context.

Span granularity — the ingestion pipeline processes batches that may contain thousands of entities. Creating a child span per entity would produce trace explosion (e.g., 10k messages = 10k+ spans per trace). Instead: create child spans for phase-level operations (identity phase, conversation phase, message phase) and use span events for per-entity iterations within a phase. Phase spans capture aggregate duration; span events capture individual entity outcomes without the overhead of full spans.

Error recording — when an operation fails, the enclosing span records the error as a span event and sets span status to Error. This makes errors visible in trace backends as first-class annotations on the span, enabling filtering and alerting on error spans.

Log-trace correlation — structured log output (slog) includes trace_id and span_id fields extracted from the active span context. This enables correlating log lines to the trace that produced them, bridging the gap between log-based debugging and trace-based analysis.

Sampling — in production, high-throughput ingestion (bulk imports of 100k+ entities) can generate excessive trace volume. The TracerProvider is configured with a sampler:

  • Development: AlwaysSample — every trace is captured for full visibility.
  • Production: ParentBased(TraceIDRatioBased(rate)) — respects upstream sampling decisions when present; otherwise samples a configurable fraction of traces (default: 1%). This balances visibility with overhead and storage cost.

The sampling rate is configured via environment variable. The application code is sampling-unaware — it always creates spans, and the sampler decides whether to record them.

Profiling: pprof

Go’s net/http/pprof endpoints provide CPU, memory, goroutine, and mutex profiling for performance investigation.

Design:

  • Served on a separate port from the main API server, ensuring profiling endpoints are never exposed on the public API surface
  • Disabled by default — enabled via environment variable
  • No authentication required (access control is via network exposure of the debug port, not application-level auth)
  • Available profiles: CPU, heap, goroutine, mutex, block, allocs, threadcreate

The separate port means profiling can be enabled in production without risk of accidental public exposure. Operators access it via port-forwarding or internal network access.

Authentication and Authorization

Auth Mechanism

API keys as the primary mechanism. Each key resolves to an AuthContext.

type AuthContext struct {
    Tenant     string             // owner of the data (cloud: user ID, self-hosted: implicit)
    Tier       ConsumerTier       // owner | consumer | connector
    Operations PermissionSet      // allowed operations per entity type
    DataScope  *DataScope         // optional: platform, person, conversation, temporal filters
    Provenance ProvenanceDefaults // what provenance to stamp on writes
    RateLimit  RateLimitProfile   // throughput limits for this consumer
}

The API layer resolves the token to an AuthContext. All downstream logic (Domain, Store) receives the context and enforces it. The store layer applies DataScope as implicit query filters — scoped consumers cannot see data outside their scope at the query level.
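The implicit-filter idea can be sketched as follows; the `DataScope` fields and SQL shape are illustrative, not the actual schema:

```go
package main

import (
	"fmt"
	"strings"
)

// DataScope is a simplified stand-in for the AuthContext field above.
type DataScope struct {
	Platforms       []string
	ConversationIDs []string
}

// scopeFilters turns a DataScope into SQL predicates the store appends
// to every query, so out-of-scope rows are unreachable at query time.
func scopeFilters(s *DataScope) (string, []any) {
	if s == nil {
		return "", nil // unscoped consumer: no extra predicates
	}
	var clauses []string
	var args []any
	if len(s.Platforms) > 0 {
		clauses = append(clauses, "platform = ANY($1)")
		args = append(args, s.Platforms)
	}
	if len(s.ConversationIDs) > 0 {
		// placeholder index depends on how many filters precede it
		clauses = append(clauses, fmt.Sprintf("conversation_id = ANY($%d)", len(args)+1))
		args = append(args, s.ConversationIDs)
	}
	return strings.Join(clauses, " AND "), args
}

func main() {
	where, args := scopeFilters(&DataScope{
		Platforms:       []string{"slack"},
		ConversationIDs: []string{"c1", "c2"},
	})
	fmt.Println(where, len(args))
	// platform = ANY($1) AND conversation_id = ANY($2) 2
}
```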

Permission Model

Permissions are additive and scoped. A token starts with no access and is granted specific capabilities.

Dimensions:

  • Operation — read, write:create, write:relationships, write:annotate, delete, manage
  • Entity type — persons, messages, conversations, events, documents, relationships, connectors
  • Data scope — optional filters restricting access to specific platforms, persons, conversations, or time ranges

These compose freely. Examples:

  • AI agent: read:* (all entity types), no data scope restrictions
  • Focused app: read:messages scoped to specific conversations
  • Connector: write:create for messages/persons/conversations, provenance auto-set to platform_reported
  • Annotation tool: write:annotate only
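A toy `PermissionSet` illustrating the additive, default-deny composition (the real Domain type is certainly richer — this sketch omits data scope and the `manage` hierarchy):

```go
package main

import "fmt"

// PermissionSet maps entity type -> allowed operations.
type PermissionSet map[string]map[string]bool

// Grant adds one (operation, entity type) capability; tokens start empty.
func (p PermissionSet) Grant(op, entity string) {
	if p[entity] == nil {
		p[entity] = map[string]bool{}
	}
	p[entity][op] = true
}

// Allowed is additive: anything not explicitly granted is denied.
// "*" as the entity type grants the operation on every entity type.
func (p PermissionSet) Allowed(op, entity string) bool {
	return p[entity][op] || p["*"][op]
}

func main() {
	agent := PermissionSet{}
	agent.Grant("read", "*") // AI agent: read everything

	connector := PermissionSet{}
	for _, e := range []string{"messages", "persons", "conversations"} {
		connector.Grant("write:create", e)
	}

	fmt.Println(agent.Allowed("read", "messages"))           // true
	fmt.Println(agent.Allowed("write:create", "messages"))   // false
	fmt.Println(connector.Allowed("write:create", "events")) // false
}
```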

Audit Trail

Every API access is logged against the AuthContext: who (token identity), what (operation, entities), when (timestamp), outcome (success/denied). Designed into the auth middleware, not bolted on.

Cloud Extension

The cloud-hosted version layers on:

  • OAuth 2.0 for user login and third-party connector authorization
  • Tenant isolation at the database level (row-level security or schema-per-tenant)
  • Token management UI (issue, revoke, scope management)

The AuthContext model is the same — OAuth tokens resolve to the same AuthContext structure as API keys. The permission model doesn’t change between deployment modes.

Deployment

Deployment uses OCI-compatible containers managed via compose files. The container runtime is the user's choice (Docker, Podman, etc.) — the spec is runtime-neutral.

Self-Hosted

Multi-container deployment on a single node, managed via compose.

┌─ compose ──────────────────────────────┐
│                                        │
│  ┌──────────────────┐                  │
│  │   LifeDB (Go)    │                  │
│  │   - API server   │                  │
│  │   - Job workers  │                  │
│  │   - Connectors   │                  │
│  └────────┬─────────┘                  │
│           │                            │
│  ┌────────▼─────────┐                  │
│  │    PostgreSQL    │   ...future:     │
│  └──────────────────┘   elasticsearch, │
│                         redis, etc.    │
└────────────────────────────────────────┘
  • App container — single Go binary containing all modules (API, workers, connectors)
  • Database container — PostgreSQL, managed independently of the app
  • Compose-based orchestration: docker-compose up (or podman-compose up)
  • In-process job queue (goroutines) within the app container
  • Low resource footprint — Go binary + Postgres can run on modest hardware
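A hypothetical compose file for this topology might look like the following. Image names, ports, and environment variable names are placeholders, not the project's actual configuration:

```yaml
# Illustrative only — service names, images, and variables are assumed.
services:
  lifedb:
    image: lifedb/lifedb:latest
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://lifedb:changeme@db:5432/lifedb?sslmode=disable
    depends_on:
      - db
  db:
    image: postgres:17
    environment:
      POSTGRES_USER: lifedb
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: lifedb
    volumes:
      - lifedb-data:/var/lib/postgresql/data
volumes:
  lifedb-data:
```

Pointing `DATABASE_URL` at an external host instead of `db` is how users substitute a managed PostgreSQL instance.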

Separating the database into its own container provides:

  • Independent lifecycle — upgrade, restart, or back up the database without touching the app (and vice versa)
  • Infrastructure evolution — adding Elasticsearch, Redis, or other engines follows the same pattern (new container, config change)
  • Flexible hosting — users can point the app at an existing or managed PostgreSQL instance instead of running one locally
  • Closer parity with the cloud topology — reduces the gap between self-hosted and cloud deployment

Cloud-Hosted

Same containers, different deployment topology.

  • App container splits along module boundaries: API server, job workers, and connectors as separate containers
  • Managed PostgreSQL (with read replicas if needed for read/write separation)
  • Dedicated containers for additional engines (search, graph, cache) as needed
  • External job queue (Redis/NATS) for independent worker scaling
  • Load balancer in front of API containers
  • Per-tenant data isolation

The transition from self-hosted to cloud is a topology change, not an architectural change. The same Go binary runs in both — cloud adds horizontal scaling, managed infrastructure, and module-level container separation.

Future Architecture Considerations

Areas designed for but not implemented initially:

  • Connector extraction — connectors move from in-process to separate containers/services. The boundary exists (they’re API consumers); extraction is a deployment change.
  • Dedicated search engine — ReadStore search interface swaps from PostgreSQL to Elasticsearch/Meilisearch. No consumer changes.
  • Dedicated graph engine — ReadStore graph interface swaps from PostgreSQL recursive CTEs to a graph database. No consumer changes.
  • External job queue — JobQueue interface swaps from in-process channels to Redis/NATS. No producer/consumer changes.
  • Plugin formalization — internal interfaces formalized into an external plugin contract for third-party extensions. Deferred until interfaces stabilize after building 3-4 connectors.
  • Real-time subscriptions — GraphQL subscriptions for push-based updates. gqlgen supports this; the infrastructure (WebSocket handling, subscription management) is the main addition.

Encryption

Encryption in Transit

  • TLS required for all external API connections
  • Container-to-container TLS (app ↔ database) recommended for cloud deployment; optional for self-hosted where containers share a private compose network

Encryption at Rest

All stored data must be encrypted at rest — database files, backups, and any local state.

  • Filesystem-level encryption (LUKS, dm-crypt) or PostgreSQL Transparent Data Encryption
  • Protects against physical access (stolen disk, decommissioned hardware, compromised backups)
  • The running application works with plaintext in memory; encryption is transparent to application code and preserves full search and indexing capability

Credential Encryption

Connector credentials (OAuth tokens, platform API keys) receive additional application-level encryption before storage. These grant access to the user’s accounts on other platforms and warrant protection beyond at-rest encryption — a database compromise should not expose credentials in plaintext.

Credentials are encrypted with a separate key from the database encryption, stored and managed independently.

Future: Zero-Knowledge Architecture

For the cloud-hosted version, users may want guarantees that even LifeDB operators cannot read their data. A zero-knowledge architecture (application-level encryption of all content, user-held keys) has deep implications for search, indexing, and feature capability. This is deferred to a dedicated ADR when the cloud-hosted version is designed.

Privacy Considerations

  • All data is scoped to a tenant. No cross-tenant data access is possible at the store level.
  • AuthContext enforces data scope at the query level — a scoped consumer physically cannot retrieve out-of-scope data.
  • Audit trail captures all data access for accountability.
  • Self-hosted deployment means data never leaves the user’s infrastructure.
  • Cloud-hosted deployment requires tenant isolation at the database level, encryption at rest, encryption in transit (TLS), and application-level credential encryption.
  • Connector credentials are application-level encrypted, separate from general data encryption.

Vision

  • Project Vision — Privacy and performance principles driving architectural choices

Product Specifications

  • Data Model — Canonical entities this architecture stores and serves
  • API — Consumer API contract this architecture implements
  • Entity Resolution — Resolution system implemented by the Resolution module
  • Ingestion — Connector architecture and pipeline this architecture supports; Performance Visibility defines what operators observe (implemented by Observability infrastructure here)
