Data Model
Status: Published
Last Updated: 2026-03-09
Overview
LifeDB’s data model defines the canonical entities that represent a person’s digital life across platforms. The model prioritizes:
- Normalization — platform-specific interactions map to universal entity types
- Identity coherence — the same person is recognized across platforms
- Queryability — all entities are temporally indexed and cross-referenceable
- Losslessness — platform-specific metadata is preserved even when it can’t be normalized
Entities
Person
A unified real-world identity. The result of entity resolution — one Person represents one human, regardless of how many platforms they appear on.
A Person exists independently of any single platform. Two PlatformIdentities that refer to the same human resolve to the same Person.
Key semantics:
- A Person may be fully resolved or tentative. Each resolution carries provenance indicating how it was established (see Provenance Model).
- The system owner (the user whose data this is) is also a Person
PlatformIdentity
A person’s identity on a specific platform: a phone number, email address, Slack handle, Discord username, etc.
Key semantics:
- Many-to-one relationship with Person. Multiple PlatformIdentities resolve to one Person.
- A PlatformIdentity that hasn’t been resolved to a Person yet still exists as a standalone record
- Carries platform-specific profile information (display name, avatar, status, etc.)
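The many-to-one relationship between PlatformIdentity and Person can be sketched in Go as follows. This is a minimal illustration, not the project's schema: every field name, and the choice to model "unresolved" as an empty PersonID, is an assumption made for the example.

```go
package main

import "fmt"

// PlatformIdentity is one handle on one platform. All field names here are
// hypothetical; PersonID is empty until the identity has been resolved.
type PlatformIdentity struct {
	ID          string
	Platform    string // e.g. "slack", "email"
	Handle      string // phone number, address, username, ...
	DisplayName string // platform-specific profile information
	PersonID    string // "" = unresolved; the record still stands on its own
}

// GroupByPerson splits identities into those resolved to a Person (grouped by
// Person ID, many-to-one) and those that are still standalone records.
func GroupByPerson(ids []PlatformIdentity) (map[string][]PlatformIdentity, []PlatformIdentity) {
	resolved := make(map[string][]PlatformIdentity)
	var unresolved []PlatformIdentity
	for _, id := range ids {
		if id.PersonID == "" {
			unresolved = append(unresolved, id)
			continue
		}
		resolved[id.PersonID] = append(resolved[id.PersonID], id)
	}
	return resolved, unresolved
}

func main() {
	ids := []PlatformIdentity{
		{ID: "pi-1", Platform: "slack", Handle: "@ana", PersonID: "person-1"},
		{ID: "pi-2", Platform: "email", Handle: "ana@example.com", PersonID: "person-1"},
		{ID: "pi-3", Platform: "discord", Handle: "unknown#0001"},
	}
	resolved, unresolved := GroupByPerson(ids)
	fmt.Println(len(resolved["person-1"]), len(unresolved)) // prints: 2 1
}
```

Keeping the link on the PlatformIdentity side means an unresolved identity needs no placeholder Person, which matches the standalone-record semantics above.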
Conversation
A context in which messages are exchanged. Represents any grouping of communication: a 1:1 DM, a group chat, a Slack channel, an email thread, a thread within a channel.
Key semantics:
- Recursive. A Conversation has an optional parent Conversation. A Slack channel is a Conversation; a thread within it is a child Conversation. This supports arbitrary nesting depth.
- Participants. A Conversation has participants (Persons), though the participant list may be fixed (group chat) or fluid (open channel). Membership is temporal — join/leave events are recorded so membership can be reconstructed at any point in time. Optimization of temporal membership queries is a technical spec concern.
- Platform-bound. Each Conversation instance belongs to a single platform. Cross-platform linking (e.g., “this Slack thread and this email thread are about the same topic”) is a higher-level concern, handled via cross-entity references.
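Temporal membership can be reconstructed by replaying join/leave events up to the moment of interest. The sketch below assumes a hypothetical MembershipEvent record and a linear replay; it illustrates the semantics only and makes no claim about how the technical spec stores or optimizes these queries.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// MembershipEvent is a hypothetical record of one join or leave.
type MembershipEvent struct {
	PersonID string
	At       time.Time
	Joined   bool // true = joined, false = left
}

// MembersAt replays events in time order and returns the sorted IDs of
// everyone who was a participant at time t (events at exactly t count).
func MembersAt(events []MembershipEvent, t time.Time) []string {
	sorted := append([]MembershipEvent(nil), events...) // do not mutate the input
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].At.Before(sorted[j].At) })

	present := map[string]bool{}
	for _, e := range sorted {
		if e.At.After(t) {
			break
		}
		if e.Joined {
			present[e.PersonID] = true
		} else {
			delete(present, e.PersonID)
		}
	}

	members := make([]string, 0, len(present))
	for id := range present {
		members = append(members, id)
	}
	sort.Strings(members)
	return members
}

// day builds an example timestamp in March 2026 (UTC).
func day(d int) time.Time {
	return time.Date(2026, time.March, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	events := []MembershipEvent{
		{PersonID: "alice", At: day(1), Joined: true},
		{PersonID: "bob", At: day(2), Joined: true},
		{PersonID: "alice", At: day(5), Joined: false},
	}
	fmt.Println(MembersAt(events, day(3))) // prints: [alice bob]
	fmt.Println(MembersAt(events, day(6))) // prints: [bob]
}
```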
Message
A unit of communication within a Conversation. The core interaction entity.
Key semantics:
- Sender. One Person (or PlatformIdentity if unresolved).
- Recipients. One or more Persons, each with an optional role: to, cc, bcc, mention, or unspecified. Recipients capture who the message was directed at, which may differ from the Conversation’s full participant list (e.g., email To vs. CC, @mentions in a channel).
- Content. A Message has a body (markdown, normalized from platform-specific formatting) and zero or more Attachments. Content types include but are not limited to:
- Text
- Media (image, video, audio, file)
- Call (voice or video — a call is a communication event with duration, and optionally a transcript or recording)
- System event (participant joined, topic changed, etc.)
- Attachments. Attachments are separate entities, queryable independently of their parent Message. This enables queries like “all images shared in this conversation.” An Attachment belongs to one Message and has its own metadata (type, size, filename, etc.).
- Conversation membership. Every Message belongs to exactly one Conversation.
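As a rough sketch of the Message semantics above, the following Go types model recipients with roles and Attachments as separate records, and answer the "all images shared in this conversation" query. The type and field names are assumptions for illustration; a real store would answer this with an index rather than a scan.

```go
package main

import "fmt"

// RecipientRole mirrors the roles listed above; the empty string is "unspecified".
type RecipientRole string

const (
	RoleUnspecified RecipientRole = ""
	RoleTo          RecipientRole = "to"
	RoleCC          RecipientRole = "cc"
	RoleBCC         RecipientRole = "bcc"
	RoleMention     RecipientRole = "mention"
)

type Recipient struct {
	PersonID string
	Role     RecipientRole
}

// Attachment carries its own metadata and points back to exactly one Message.
type Attachment struct {
	ID        string
	MessageID string
	Kind      string // "image", "video", "audio", "file"
	Filename  string
	SizeBytes int64
}

type Message struct {
	ID             string
	ConversationID string // every Message belongs to exactly one Conversation
	SenderID       string // Person ID, or PlatformIdentity ID if unresolved
	Recipients     []Recipient
	Body           string // markdown
	Attachments    []Attachment
}

// AttachmentsOfKind returns every attachment of the given kind in one Conversation.
func AttachmentsOfKind(msgs []Message, conversationID, kind string) []Attachment {
	var out []Attachment
	for _, m := range msgs {
		if m.ConversationID != conversationID {
			continue
		}
		for _, a := range m.Attachments {
			if a.Kind == kind {
				out = append(out, a)
			}
		}
	}
	return out
}

// sampleMessages returns made-up data for the example.
func sampleMessages() []Message {
	return []Message{
		{ID: "msg-1", ConversationID: "conv-1", SenderID: "person-1",
			Recipients: []Recipient{{PersonID: "person-2", Role: RoleMention}},
			Body:       "Notes from today",
			Attachments: []Attachment{
				{ID: "att-1", MessageID: "msg-1", Kind: "image", Filename: "whiteboard.png", SizeBytes: 48213},
				{ID: "att-2", MessageID: "msg-1", Kind: "file", Filename: "notes.pdf", SizeBytes: 90211},
			}},
		{ID: "msg-2", ConversationID: "conv-1", SenderID: "person-2", Body: "Thanks!"},
		{ID: "msg-3", ConversationID: "conv-2", SenderID: "person-1",
			Attachments: []Attachment{{ID: "att-3", MessageID: "msg-3", Kind: "image", Filename: "cat.jpg"}}},
	}
}

func main() {
	for _, a := range AttachmentsOfKind(sampleMessages(), "conv-1", "image") {
		fmt.Println(a.MessageID, a.Filename) // prints: msg-1 whiteboard.png
	}
}
```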
Event
A named block of time. Represents anything that lives on a calendar: meetings, birthdays, holidays, deadlines, reminders, focus time, all-day events.
Key semantics:
- Time. An Event has a start time and optionally an end time (or all-day flag).
- Participants. Optional. When present, participants may have roles (organizer, attendee) and response status (accepted, declined, tentative).
- Recurrence. An Event may recur. Individual occurrences may override properties of the recurring series.
- Not communication. An Event represents a commitment of time, not an exchange between people. Calendar invites and meeting discussions are Messages that may reference an Event, but the Event itself is distinct.
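The recurrence semantics, a series whose individual occurrences may override its properties, can be illustrated with a deliberately simplified sketch. It assumes a fixed repeat interval and overrides keyed by occurrence index; real calendar sources typically use richer recurrence rules, and all names below are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Override replaces properties of one occurrence. Zero values mean
// "inherit from the series".
type Override struct {
	Title string
	Start time.Time
}

// RecurringEvent is a simplified series: it repeats every fixed interval.
type RecurringEvent struct {
	Title     string
	Start     time.Time
	Duration  time.Duration
	Every     time.Duration
	Overrides map[int]Override // keyed by 0-based occurrence index
}

type Occurrence struct {
	Title string
	Start time.Time
	End   time.Time
}

// Occurrences expands the first n occurrences, applying any overrides.
func (e RecurringEvent) Occurrences(n int) []Occurrence {
	if n <= 0 {
		return nil
	}
	out := make([]Occurrence, 0, n)
	for i := 0; i < n; i++ {
		occ := Occurrence{Title: e.Title, Start: e.Start.Add(time.Duration(i) * e.Every)}
		if o, ok := e.Overrides[i]; ok {
			if o.Title != "" {
				occ.Title = o.Title
			}
			if !o.Start.IsZero() {
				occ.Start = o.Start
			}
		}
		occ.End = occ.Start.Add(e.Duration)
		out = append(out, occ)
	}
	return out
}

// sampleSeries is made-up data: a weekly standup whose second occurrence
// is renamed and moved one hour later.
func sampleSeries() RecurringEvent {
	return RecurringEvent{
		Title:    "Standup",
		Start:    time.Date(2026, time.March, 2, 9, 0, 0, 0, time.UTC),
		Duration: 15 * time.Minute,
		Every:    7 * 24 * time.Hour,
		Overrides: map[int]Override{
			1: {Title: "Standup (retro)", Start: time.Date(2026, time.March, 9, 10, 0, 0, 0, time.UTC)},
		},
	}
}

func main() {
	for _, o := range sampleSeries().Occurrences(3) {
		fmt.Println(o.Title, o.Start.Format(time.RFC3339), o.End.Format(time.RFC3339))
	}
}
```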
Document
A content-centric entity. Represents notes, documents, journal entries, and other authored content where the emphasis is on what it says, not who sent it to whom.
Key semantics:
- Content-centric, not communication-centric. The primary value is the content itself. This distinguishes Documents from Messages, where the exchange (sender, recipients) is primary.
- Authorship is optional. A Document may have one author, multiple collaborators, or unknown provenance.
- Ownership is optional. A Document may belong to someone, be shared, or have ambiguous ownership.
- Versioning. Documents may change over time. The model should support snapshots or revision history where available from the source platform.
- Opaque content. Document content is stored faithfully as provided by the source. The canonical representation is plain text or markdown. Internal structure (headings, blocks, Notion-style nesting) is preserved in platform metadata. Derived representations (structured projections, semantic chunking, search-optimized formats) may be computed over time without replacing the source content.
Relationships
Identity Resolution
PlatformIdentity *──────1 Person
Multiple PlatformIdentities resolve to one Person. Resolution may be automatic (same phone number across platforms) or manual (user confirms two identities are the same person).
Conversation Structure
Conversation *──────? Conversation (parent)
A Conversation optionally belongs to a parent Conversation, enabling recursive nesting: channels contain threads, threads can contain sub-threads.
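Because nesting depth is arbitrary, code that walks the parent chain should not assume a fixed number of levels. The sketch below assumes a hypothetical Conversation record with a ParentID field and an in-memory lookup map; the cycle guard is a defensive choice for the example, not something the model mandates.

```go
package main

import (
	"errors"
	"fmt"
)

// Conversation is a hypothetical record; ParentID is "" for a top-level Conversation.
type Conversation struct {
	ID       string
	ParentID string
	Title    string
}

// ErrBrokenChain reports a missing Conversation, a missing parent, or a cycle.
var ErrBrokenChain = errors.New("conversation parent chain is broken or cyclic")

// Ancestors returns the IDs of id's parents, nearest first.
func Ancestors(byID map[string]Conversation, id string) ([]string, error) {
	cur, ok := byID[id]
	if !ok {
		return nil, ErrBrokenChain
	}
	seen := map[string]bool{id: true}
	var chain []string
	for cur.ParentID != "" {
		pid := cur.ParentID
		parent, ok := byID[pid]
		if !ok || seen[pid] {
			return nil, ErrBrokenChain
		}
		seen[pid] = true
		chain = append(chain, pid)
		cur = parent
	}
	return chain, nil
}

// sampleTree is made-up data: a channel, a thread in it, and a sub-thread.
func sampleTree() map[string]Conversation {
	return map[string]Conversation{
		"c-general": {ID: "c-general", Title: "#general"},
		"t-1":       {ID: "t-1", ParentID: "c-general", Title: "Release thread"},
		"t-1-a":     {ID: "t-1-a", ParentID: "t-1", Title: "Rollback sub-thread"},
	}
}

func main() {
	chain, err := Ancestors(sampleTree(), "t-1-a")
	fmt.Println(chain, err) // prints: [t-1 c-general] <nil>
}
```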
Message ↔ Conversation
Message *──────1 Conversation
Every Message belongs to exactly one Conversation.
Message ↔ Person
Message ──────1 Person (sender)
Message ──────* Person (recipients, with role)
A Message has one sender and zero or more recipients, each with an optional role.
Conversation ↔ Person
Conversation ──────* Person (participants)
A Conversation has participants. Membership is temporal: participants join and leave over time, and membership history is preserved as a timeline of events.
Event ↔ Person
Event ──────* Person (participants, with role and status)
An Event optionally has participants with roles (organizer, attendee) and response status.
Cross-Entity References
Entities can reference each other across types:
- A Message may reference an Event (a calendar invite sent via email)
- A Message may reference a Document (a note shared in a chat)
- A Document may reference a Conversation (meeting notes linked to a discussion)
Cross-entity references use a generic reference entity: (source_type, source_id, target_type, target_id, relationship_type, provenance). Any entity can link to any entity. Each reference carries provenance (see Provenance Model).
As high-value reference patterns emerge and prove to be heavily queried, they may be promoted to typed first-class relationships at the technical layer for performance — without changing this conceptual model.
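The generic reference tuple described above maps directly onto a small record type. In this sketch the six fields follow the tuple in the text; the concrete entity-type strings, relationship names, and the ReferencesTo helper are assumptions for illustration.

```go
package main

import "fmt"

type EntityType string

// Hypothetical type tags for the entities defined in this document.
const (
	TypeMessage      EntityType = "message"
	TypeEvent        EntityType = "event"
	TypeDocument     EntityType = "document"
	TypeConversation EntityType = "conversation"
)

// Provenance records how and by what a reference was established.
type Provenance struct {
	Method string // e.g. "platform_reported", "ai_suggested"
	Source string // e.g. platform name or algorithm name
}

// Reference mirrors (source_type, source_id, target_type, target_id,
// relationship_type, provenance).
type Reference struct {
	SourceType       EntityType
	SourceID         string
	TargetType       EntityType
	TargetID         string
	RelationshipType string
	Provenance       Provenance
}

// ReferencesTo returns every reference pointing at the given entity.
func ReferencesTo(refs []Reference, targetType EntityType, targetID string) []Reference {
	var out []Reference
	for _, r := range refs {
		if r.TargetType == targetType && r.TargetID == targetID {
			out = append(out, r)
		}
	}
	return out
}

// sampleRefs is made-up data covering the three patterns listed above.
func sampleRefs() []Reference {
	return []Reference{
		{TypeMessage, "msg-1", TypeEvent, "evt-1", "invites_to", Provenance{"platform_reported", "email"}},
		{TypeMessage, "msg-7", TypeEvent, "evt-1", "discusses", Provenance{"ai_suggested", "example-linker"}},
		{TypeDocument, "doc-1", TypeConversation, "conv-1", "notes_for", Provenance{"user_confirmed", "user"}},
	}
}

func main() {
	for _, r := range ReferencesTo(sampleRefs(), TypeEvent, "evt-1") {
		fmt.Println(r.SourceType, r.SourceID, r.RelationshipType, r.Provenance.Method)
	}
}
```

Because source and target are both (type, id) pairs, any entity can link to any other without a schema change, which is the property the conceptual model relies on.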
Message ↔ Attachment
Attachment *──────1 Message
An Attachment belongs to exactly one Message. Attachments are independently queryable.
Platform Metadata
Every entity that originates from a platform carries metadata at three tiers:
Tier 1: Canonical Fields
Universal across all platforms. Always populated when available. These are the entity’s core properties defined above (sender, recipients, content, timestamp, participants, etc.).
Tier 2: Normalized Platform Extensions
Platform-specific data that is common or valuable enough to deserve structured representation. Queryable and typed.
Examples:
- Reactions (Slack emoji reactions, iMessage tapbacks)
- Read receipts (WhatsApp blue checks, iMessage read status)
- Edit history (Slack message edits, email amendments)
- Delivery status (sent, delivered, failed)
- RSVP status on Events
- Thread metadata (reply count, participant count)
Normalized extensions are defined as the model evolves — patterns observed in Tier 3 data graduate to Tier 2 when they prove common and valuable.
Tier 3: Raw Metadata
Unstructured key-value store for platform-specific data that doesn’t fit Tiers 1 or 2. Preserves information that may be useful later without requiring upfront schema decisions.
Examples: platform-internal IDs, rendering hints, feature flags, platform-specific formatting.
Principle: Data flows upward. Raw metadata (Tier 3) that proves valuable and cross-platform gets promoted to normalized extensions (Tier 2). Extensions that prove universal may graduate to canonical fields (Tier 1).
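One way to picture the three tiers and the upward flow is a record that carries typed canonical fields, typed extensions, and a raw key-value bag side by side. The sketch below is an assumption-laden illustration: the field names, the "reactions" raw key, and its JSON shape are invented for the example. It leaves the raw entry in place after promotion so nothing is lost.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Reaction is a hypothetical Tier 2 extension type.
type Reaction struct {
	PersonID string `json:"person_id"`
	Emoji    string `json:"emoji"`
}

type MessageRecord struct {
	// Tier 1: canonical fields.
	ID       string
	SenderID string
	Body     string

	// Tier 2: normalized platform extensions (typed, queryable).
	Reactions []Reaction

	// Tier 3: raw metadata, kept verbatim as opaque JSON values.
	Raw map[string]json.RawMessage
}

// PromoteReactions decodes the hypothetical Tier 3 key "reactions" into the
// typed Tier 2 field. The raw entry is not deleted.
func PromoteReactions(m *MessageRecord) error {
	raw, ok := m.Raw["reactions"]
	if !ok {
		return nil // nothing to promote
	}
	var reactions []Reaction
	if err := json.Unmarshal(raw, &reactions); err != nil {
		return fmt.Errorf("promote reactions for %s: %w", m.ID, err)
	}
	m.Reactions = reactions
	return nil
}

// sampleRecord is made-up data with two raw keys and no typed reactions yet.
func sampleRecord() MessageRecord {
	return MessageRecord{
		ID: "msg-1", SenderID: "person-1", Body: "Shipped!",
		Raw: map[string]json.RawMessage{
			"reactions":   json.RawMessage(`[{"person_id":"person-2","emoji":"tada"}]`),
			"render_hint": json.RawMessage(`"compact"`),
		},
	}
}

func main() {
	rec := sampleRecord()
	if err := PromoteReactions(&rec); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rec.Reactions[0].Emoji, len(rec.Raw)) // prints: tada 2
}
```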
Temporal Indexing
All entities carry timestamps and are indexed temporally. This enables cross-entity, cross-platform temporal queries: “everything that happened on Tuesday” returns Messages, Events, and Documents from that time range in a unified stream.
- Messages: sent timestamp (and optionally edited, deleted timestamps)
- Events: start time, end time
- Documents: created, modified timestamps
- Conversations: created, last active timestamps
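A unified stream such as "everything that happened on Tuesday" amounts to projecting each entity onto a common (kind, id, timestamp) shape, filtering by range, and sorting. The sketch below assumes a hypothetical TimelineEntry projection and a half-open time range; which timestamp represents each entity type (for instance, a Document's modified time) is an assumption made here, not something the model fixes.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// TimelineEntry is a hypothetical projection of any entity onto the timeline.
type TimelineEntry struct {
	Kind string    // "message", "event", "document"
	ID   string
	At   time.Time // assumed: sent time, start time, or modified time
}

// Between returns the entries with from <= At < to, oldest first.
func Between(entries []TimelineEntry, from, to time.Time) []TimelineEntry {
	var out []TimelineEntry
	for _, e := range entries {
		if !e.At.Before(from) && e.At.Before(to) {
			out = append(out, e)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].At.Before(out[j].At) })
	return out
}

// at builds an example timestamp in March 2026 (UTC).
func at(day, hour int) time.Time {
	return time.Date(2026, time.March, day, hour, 0, 0, 0, time.UTC)
}

// sampleEntries is made-up data spanning two days and three entity kinds.
func sampleEntries() []TimelineEntry {
	return []TimelineEntry{
		{Kind: "document", ID: "doc-1", At: at(3, 16)},
		{Kind: "message", ID: "msg-1", At: at(3, 9)},
		{Kind: "event", ID: "evt-1", At: at(3, 11)},
		{Kind: "message", ID: "msg-2", At: at(4, 8)},
	}
}

func main() {
	// Everything on 3 March 2026, as one stream.
	for _, e := range Between(sampleEntries(), at(3, 0), at(4, 0)) {
		fmt.Println(e.At.Format("15:04"), e.Kind, e.ID)
	}
}
```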
Provenance Model
A cross-cutting concern across the data model: the system must distinguish between factual and inferred relationships.
- Factual — directly reported by source data. “This message was sent by this phone number to that email address.” Ground truth from the platform.
- Inferred — derived through analysis, heuristics, or AI. “This message is probably about this Event.” “These two PlatformIdentities likely belong to the same Person.” A guess that may be wrong.
Every relationship that could be either factual or inferred carries provenance metadata indicating:
- Method: how the relationship was established. Examples: platform_reported, exact_match, user_confirmed, heuristic, ai_suggested.
- Source: what produced the relationship (platform name, resolution algorithm, user action, etc.).
Consumers decide what to trust. A downstream app might treat platform_reported and user_confirmed as solid, show ai_suggested with a prompt to confirm, and filter out heuristic entirely.
This applies to:
- Identity resolution (PlatformIdentity → Person links)
- Cross-entity references (Message → Event, Document → Conversation links)
- Any relationship where the system makes an inference rather than recording a fact
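The "consumers decide what to trust" idea can be sketched as a per-consumer policy that maps each provenance method to a trust level. The method strings come from the examples above; the Trust levels, the Policy type, and the treatment of exact_match as trusted are assumptions for this illustration. Unknown methods fall through to the most conservative level.

```go
package main

import "fmt"

type Method string

const (
	PlatformReported Method = "platform_reported"
	ExactMatch       Method = "exact_match"
	UserConfirmed    Method = "user_confirmed"
	Heuristic        Method = "heuristic"
	AISuggested      Method = "ai_suggested"
)

// Trust is a hypothetical consumer-side decision about a relationship.
type Trust int

const (
	Hide    Trust = iota // zero value: filter the relationship out
	Confirm              // show it, but prompt the user to confirm
	Accept               // treat it as solid
)

// Policy maps provenance methods to trust levels for one consumer.
type Policy map[Method]Trust

// Classify returns the trust level for a method; unlisted methods are hidden.
func (p Policy) Classify(m Method) Trust { return p[m] }

// Link is a hypothetical relationship carrying provenance metadata.
type Link struct {
	From, To string
	Method   Method
	Source   string
}

// Partition splits links into accepted ones and ones needing confirmation;
// hidden links are dropped.
func (p Policy) Partition(links []Link) (accepted, needsConfirm []Link) {
	for _, l := range links {
		switch p.Classify(l.Method) {
		case Accept:
			accepted = append(accepted, l)
		case Confirm:
			needsConfirm = append(needsConfirm, l)
		}
	}
	return accepted, needsConfirm
}

// examplePolicy follows the downstream-app example in the text; the
// exact_match entry is an assumption, since the text does not place it.
func examplePolicy() Policy {
	return Policy{
		PlatformReported: Accept,
		UserConfirmed:    Accept,
		ExactMatch:       Accept,
		AISuggested:      Confirm,
		Heuristic:        Hide,
	}
}

func main() {
	links := []Link{
		{From: "pi-1", To: "person-1", Method: PlatformReported, Source: "slack"},
		{From: "msg-7", To: "evt-1", Method: AISuggested, Source: "example-linker"},
		{From: "pi-9", To: "person-1", Method: Heuristic, Source: "name-similarity"},
	}
	accepted, needsConfirm := examplePolicy().Partition(links)
	fmt.Println(len(accepted), len(needsConfirm)) // prints: 1 1
}
```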
Related Documents
Vision
- Project Vision — The thesis and principles driving this data model
Specifications
- API — The API contract that exposes this data model
- Entity Resolution — How Person and PlatformIdentity resolution works
- Ingestion — How data enters the system and maps to these entities
- Data Schema — Schema, indexes, and query patterns implementing this model
- Module Interfaces — Go interfaces for reading and writing these entities
Operational
- CLI Data Exploration — Terminal-based read commands for exploring entities
- Getting Started — Consumer guide for installing and configuring LifeDB
Decisions
- Entity resolution approach (TBD) — ADR on how identity merging works