Entity Resolution

Status: Published
Last Updated: 2026-03-09

Overview

Entity resolution is the process of recognizing that the same real-world person appears across multiple platforms under different identities. A phone number in iMessage, an email in Gmail, a handle in Slack, and a username in Discord may all be the same person. Entity resolution connects them.

This spec defines how entity resolution works from the consumer’s perspective — what they see, what they interact with, and what guarantees the system provides.

Resolution Lifecycle

A PlatformIdentity progresses through resolution states:

Unresolved

A PlatformIdentity exists but isn’t linked to any Person. This is the starting state after ingestion. The identity is usable — messages from it appear in queries — but it’s isolated. Querying “all messages from this person across platforms” won’t include it because the system doesn’t know who it belongs to.

Candidate

The system has identified a potential match between this PlatformIdentity and an existing Person (or between two unresolved PlatformIdentities). The match is suggested but not applied. Candidates carry evidence — the signals that triggered the suggestion.

Candidates are not a static queue. They are a living pool where confidence evolves over time. New data points (a shared conversation, an overlapping contact, a matching phone number) add evidence. When accumulated evidence crosses the auto-resolution threshold, the candidate resolves automatically.

Resolved

The PlatformIdentity is linked to a Person. Resolved links have one of two origins:

Auto-resolved — the system established the link based on evidence exceeding the confidence threshold. Visible and reversible.
User-confirmed — a consumer explicitly confirmed the link.

Both carry provenance metadata indicating the method and evidence.

Rejected

A candidate match was explicitly rejected by a consumer (“no, these are different people”). The rejection is recorded to prevent the system from re-suggesting the same match. Rejected pairs can be un-rejected if the consumer changes their mind.
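The four states above can be read as a small state machine. The sketch below is one interpretation of the transitions implied by the descriptions; the names `ResolutionState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical, and the edge set is an assumption rather than part of the spec. It also simplifies by tracking state per PlatformIdentity, whereas rejection is recorded per candidate pair.

```python
from enum import Enum


class ResolutionState(Enum):
    UNRESOLVED = "unresolved"
    CANDIDATE = "candidate"
    RESOLVED = "resolved"
    REJECTED = "rejected"


# Transitions inferred from the state descriptions (an assumption, not normative).
ALLOWED_TRANSITIONS = {
    ResolutionState.UNRESOLVED: {
        ResolutionState.CANDIDATE,  # the system suggests a potential match
        ResolutionState.RESOLVED,   # factual match or manual merge
    },
    ResolutionState.CANDIDATE: {
        ResolutionState.RESOLVED,   # auto-resolved or user-confirmed
        ResolutionState.REJECTED,   # consumer declines the match
    },
    ResolutionState.RESOLVED: {
        ResolutionState.UNRESOLVED,  # split undoes the link
    },
    ResolutionState.REJECTED: {
        ResolutionState.CANDIDATE,  # consumer un-rejects the pair
    },
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Under this reading, a rejected pair has to become a candidate again before it can resolve, so a rejection is never silently bypassed.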

Auto-Resolution

The system automatically resolves identities when evidence is strong enough, without requiring consumer intervention.

Factual Matches

Deterministic matches auto-resolve immediately:

Same phone number across platforms (iMessage + WhatsApp + Signal)
Same email address across platforms (Gmail + Slack + Calendar)
Platform-reported links (Slack profile includes email, contact card includes phone and email)

These are facts, not inferences. The platform data directly states the connection.
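As a rough illustration of deterministic matching, the sketch below groups identities that share a normalized phone number or email address using a union-find structure. The `PlatformIdentity` fields, the normalization helpers, and `factual_groups` are all hypothetical; the real definitions live in the Data Model spec, and real phone normalization (country codes, E.164) is more involved than stripping non-digits.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlatformIdentity:
    # Hypothetical shape for illustration only.
    id: str
    platform: str
    phone: Optional[str] = None
    email: Optional[str] = None


def normalize_phone(raw):
    # Naive assumption: two numbers match if their digits match.
    return "".join(ch for ch in raw if ch.isdigit())


def normalize_email(raw):
    return raw.strip().lower()


def factual_groups(identities):
    """Return sets of identity ids connected by a shared phone or email."""
    parent = {ident.id: ident.id for ident in identities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (kind, normalized value) -> first identity id with that key
    for ident in identities:
        keys = [
            ("phone", normalize_phone(ident.phone or "")),
            ("email", normalize_email(ident.email or "")),
        ]
        for kind, value in keys:
            if not value:
                continue  # a missing or empty key never links anything
            key = (kind, value)
            if key in first_seen:
                parent[find(ident.id)] = find(first_seen[key])
            else:
                first_seen[key] = ident.id

    groups = defaultdict(set)
    for ident in identities:
        groups[find(ident.id)].add(ident.id)
    return [group for group in groups.values() if len(group) > 1]
```

A contact card that carries both a phone number and an email acts as a bridge here: it joins the phone-keyed group and the email-keyed group into one.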

Evidence Accumulation

Non-deterministic signals contribute to a confidence score. Individual signals may be weak, but they compound:

Similar display names across platforms
Overlapping conversation participants
Shared contact graph patterns
Temporal correlation (active at similar times)
Geographic signals (same timezone patterns)

When accumulated evidence crosses the auto-resolution threshold, the system resolves the match without the consumer having to review it.
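The scoring method is still to be decided (see Decisions), so the following is only one possible way weak signals could compound. It uses a noisy-OR combination, which treats signals as independent; that independence is an assumption that real signals (for example, timezone and activity times) may not satisfy. The function name and the example strengths are hypothetical.

```python
def combine_confidence(signal_strengths):
    """Noisy-OR: confidence = 1 - product(1 - s) for each strength s in [0, 1]."""
    remaining_doubt = 1.0
    for strength in signal_strengths:
        if not 0.0 <= strength <= 1.0:
            raise ValueError("signal strength must be in [0, 1]")
        remaining_doubt *= 1.0 - strength
    return 1.0 - remaining_doubt


# Hypothetical strengths for three weak signals.
signals = {
    "similar_display_name": 0.4,
    "overlapping_participants": 0.5,
    "temporal_correlation": 0.2,
}
confidence = combine_confidence(signals.values())  # about 0.76
```

No single signal here exceeds 0.5, yet together they reach roughly 0.76, which is the compounding behavior described above.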

Threshold Configuration

The auto-resolution threshold is configurable:

Aggressive — resolve more automatically, fewer candidates to review, occasional wrong merges (correctable via split)
Conservative — resolve less automatically, more candidates to review, fewer mistakes
Manual only — only factual matches auto-resolve, everything else requires confirmation

The default threshold is not yet decided (see Decisions); it should suit most users without adjustment. The threshold applies only to inferred matches; factual matches always auto-resolve.
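The three settings can be pictured as a mapping from mode to a confidence cutoff, with factual matches bypassing the cutoff entirely. The numeric values below are placeholders chosen for illustration (the actual defaults are TBD), and `ThresholdMode` and `should_auto_resolve` are hypothetical names.

```python
import math
from enum import Enum


class ThresholdMode(Enum):
    AGGRESSIVE = "aggressive"
    CONSERVATIVE = "conservative"
    MANUAL_ONLY = "manual_only"


# Placeholder cutoffs; the real defaults are not specified in this document.
INFERRED_THRESHOLDS = {
    ThresholdMode.AGGRESSIVE: 0.80,
    ThresholdMode.CONSERVATIVE: 0.95,
    ThresholdMode.MANUAL_ONLY: math.inf,  # no inferred match can reach this
}


def should_auto_resolve(mode, confidence, is_factual):
    """Factual matches always resolve; inferred matches must meet the cutoff."""
    if is_factual:
        return True
    return confidence >= INFERRED_THRESHOLDS[mode]
```

Representing manual-only as an infinite cutoff keeps the decision in a single comparison, rather than adding a special case for that mode.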

Consumer Interaction

Consumers interact with entity resolution in two ways:

Active Management

Dedicated operations for viewing and managing resolution candidates and status:

List candidates — pending matches with evidence and confidence, filterable and sortable
Confirm — accept a suggested match, linking the PlatformIdentity to the Person
Reject — decline a match, preventing re-suggestion
Merge — manually link two PlatformIdentities (or two Persons) that the system hasn’t suggested
Split — undo a resolution, separating PlatformIdentities back into distinct Persons
Adjust threshold — change auto-resolution sensitivity
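To make the list, confirm, and reject operations concrete, here is a minimal in-memory sketch. `Candidate` and `CandidatePool` are hypothetical and not the actual API (see the API spec for that); the point is only to show that a rejection is remembered and blocks re-suggestion until it is lifted.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    identity_id: str
    person_id: str
    confidence: float
    evidence: list = field(default_factory=list)


class CandidatePool:
    def __init__(self):
        self._pending = {}      # (identity_id, person_id) -> Candidate
        self._rejected = set()  # pairs the consumer has declined
        self.links = {}         # identity_id -> person_id

    def suggest(self, candidate):
        """Add a suggestion unless the pair was rejected or the identity is linked."""
        key = (candidate.identity_id, candidate.person_id)
        if key in self._rejected or candidate.identity_id in self.links:
            return False
        self._pending[key] = candidate
        return True

    def list_candidates(self, min_confidence=0.0):
        """Pending candidates at or above min_confidence, strongest first."""
        matches = [c for c in self._pending.values() if c.confidence >= min_confidence]
        return sorted(matches, key=lambda c: c.confidence, reverse=True)

    def confirm(self, identity_id, person_id):
        self.links[identity_id] = person_id
        # Drop every pending suggestion for this identity, including this one.
        self._pending = {k: c for k, c in self._pending.items() if k[0] != identity_id}

    def reject(self, identity_id, person_id):
        self._pending.pop((identity_id, person_id), None)
        self._rejected.add((identity_id, person_id))

    def unreject(self, identity_id, person_id):
        self._rejected.discard((identity_id, person_id))
```

Because `confirm` does not require a pending candidate, the same call could also model a manual merge the system never suggested.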

Contextual Hints

When querying a Person, the API surfaces resolution context:

Number of linked PlatformIdentities
Pending candidates (“3 possible additional identities for this person”)
Resolution provenance on each link (factual, auto-resolved, user-confirmed)

This enables lightweight resolution management without visiting a dedicated candidates view — a consumer browsing a Person’s data can see and act on suggestions in context.
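One way to picture the resolution context attached to a Person response is a small summary computed from the current links and pending candidates. The function name, input shapes, and output keys below are all assumptions made for illustration; the API spec defines the real response.

```python
def resolution_context(person_id, links, candidates):
    """Summarize resolution state for one Person.

    links: dict of identity_id -> (person_id, provenance method)
    candidates: list of (identity_id, person_id) pending pairs
    """
    linked = {
        identity_id: method
        for identity_id, (linked_person, method) in links.items()
        if linked_person == person_id
    }
    pending = [identity_id for identity_id, target in candidates if target == person_id]
    return {
        "linked_identity_count": len(linked),
        "pending_candidate_count": len(pending),
        "provenance": linked,  # identity_id -> method, for each link
    }
```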

Merge and Split Semantics

Merge Is Non-Destructive

When PlatformIdentities resolve to one Person, the underlying data is unchanged. Messages still reference the PlatformIdentity that sent them. The Person is a layer above — resolution links PlatformIdentities to Persons.

Querying a Person returns data from all their linked PlatformIdentities. “Carrie’s messages” resolves to “messages from all PlatformIdentities currently linked to Carrie’s Person entity.”

Split Is Clean

Because merge doesn’t mutate underlying data, split is straightforward. Changing which Person a PlatformIdentity points to automatically changes which queries it appears in. All messages connected through that identity follow the link.

No Data Loss

The resolution layer is always adjustable without touching interaction data. Merge, split, confirm, reject — these change links, not messages, events, or documents. Historical data is never rewritten.

Consistency With Current State

Queries reflect the current resolution state. If a merge is undone, queries immediately reflect the separation. There is no “historical merge” concept — the resolution graph is always queried as it exists now.
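The merge and split properties above follow from keeping links separate from interaction data. The sketch below shows the idea with hypothetical names (`Message`, `LinkTable`, `messages_for_person`); it is not the identity_links schema or the IdentityGraph interface, only a minimal model of the same separation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: str
    sender_identity_id: str  # messages reference the identity, never the Person
    text: str


class LinkTable:
    """Current resolution state: identity_id -> person_id."""

    def __init__(self):
        self._person_of = {}

    def link(self, identity_id, person_id):
        self._person_of[identity_id] = person_id

    def split(self, identity_id, new_person_id):
        # A split only repoints the link; no message is touched.
        self._person_of[identity_id] = new_person_id

    def identities_of(self, person_id):
        return {i for i, p in self._person_of.items() if p == person_id}


def messages_for_person(person_id, links, messages):
    """Resolve a Person query through whatever links exist right now."""
    identity_ids = links.identities_of(person_id)
    return [m for m in messages if m.sender_identity_id in identity_ids]
```

Since queries are evaluated against the link table at read time, undoing a merge takes effect on the next query with no backfill or rewrite step.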

Provenance

Every resolution link carries provenance metadata (see Data Model: Provenance Model):

Method — exact_match, exact_match_merge, platform_reported, evidence_accumulated, user_confirmed, user_merged
Source — what produced it (platform name, resolution algorithm version, user ID)
Evidence — for inferred resolutions, the signals that contributed
Timestamp — when the resolution was established

Consumer overrides (user_confirmed, user_merged) supersede system inferences. A user-confirmed link is never automatically reconsidered.
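The provenance fields and the override rule can be sketched as a small record type. The method names come from the list above; the field types, the `Provenance` class, and `may_auto_reconsider` are assumptions (the Data Model spec holds the real definition). Treating user_merged the same as user_confirmed for reconsideration is also an interpretation of "consumer overrides supersede system inferences".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Method(Enum):
    EXACT_MATCH = "exact_match"
    EXACT_MATCH_MERGE = "exact_match_merge"
    PLATFORM_REPORTED = "platform_reported"
    EVIDENCE_ACCUMULATED = "evidence_accumulated"
    USER_CONFIRMED = "user_confirmed"
    USER_MERGED = "user_merged"


CONSUMER_OVERRIDES = frozenset({Method.USER_CONFIRMED, Method.USER_MERGED})


@dataclass(frozen=True)
class Provenance:
    method: Method
    source: str        # platform name, algorithm version, or user ID
    evidence: tuple    # contributing signals; empty for non-inferred links
    timestamp: datetime


def may_auto_reconsider(provenance):
    """System processes leave consumer-established links alone."""
    return provenance.method not in CONSUMER_OVERRIDES


inferred = Provenance(
    method=Method.EVIDENCE_ACCUMULATED,
    source="resolver-v1",  # hypothetical algorithm version label
    evidence=("similar_display_name", "overlapping_participants"),
    timestamp=datetime.now(timezone.utc),
)
```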

First-Party vs. Third-Party Trust

Resolution provenance distinguishes between data sources:

First-party connectors — maintained by LifeDB, higher trust for platform_reported signals
Third-party connectors — community or external integrations, same provenance model but potentially lower default confidence for inferred signals
Consumer input — always user_confirmed, highest trust level

The trust model is explicit and auditable. Consumers can filter by provenance to see only high-confidence resolutions or to review what third-party connectors have contributed.

Future: Non-Person Entity Resolution

This spec focuses on Person resolution, but the same pattern applies to other real-world entities that appear under different names across platforms:

Places — a restaurant appears as a calendar event location, a Google Maps link in a message, a Yelp review, an address in a contact card
Organizations — a company appears as an email domain, a Slack workspace, a calendar event organizer, a contact’s employer field
Locations — cities, neighborhoods, venues referenced across messages, events, and documents

These will likely be less complex than Person resolution (fewer identity signals, less ambiguity) but follow the same core model: multiple references resolve to one canonical entity, with provenance and confidence. The resolution lifecycle (candidate → resolved, merge/split, auto-resolution) applies.

This is not in scope for the initial implementation, but the architecture should not preclude it. The generic cross-entity reference model and provenance system are designed to support this extension.

Performance Expectations

Entity resolution must not block data availability. When data is ingested:

Data is normalized and stored — available for queries immediately via PlatformIdentity
Resolution runs asynchronously — factual matches resolve quickly, evidence accumulation happens over time
As identities resolve, query results improve — a Person query that initially returned only Slack messages starts including iMessage conversations as those identities link up

The consumer API reads from the current resolution state and must return results with consistently low latency, regardless of how much resolution processing is pending. Resolution improves data quality over time without degrading query performance.
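The ingestion sequence above can be modeled as a store whose reads never wait on the resolution queue. This is a toy, single-threaded sketch with hypothetical names (`Store`, `run_pending`, the `resolve` callback); a real system would run resolution in a background worker and use indexed lookups rather than list scans.

```python
from collections import deque


class Store:
    def __init__(self):
        self.messages = []      # (identity_id, text), queryable as soon as stored
        self.person_of = {}     # identity_id -> person_id (current resolution state)
        self.pending = deque()  # identities awaiting resolution

    def ingest(self, identity_id, text):
        self.messages.append((identity_id, text))
        if identity_id not in self.person_of:
            self.pending.append(identity_id)  # resolution is deferred, not awaited

    def by_identity(self, identity_id):
        return [t for i, t in self.messages if i == identity_id]

    def by_person(self, person_id):
        # Reads only the current links; pending work is never consulted.
        return [t for i, t in self.messages if self.person_of.get(i) == person_id]

    def run_pending(self, resolve):
        """Stand-in for the asynchronous resolver; resolve returns a person id or None."""
        while self.pending:
            identity_id = self.pending.popleft()
            person_id = resolve(identity_id)
            if person_id is not None:
                self.person_of[identity_id] = person_id
```

In this model a Person query simply returns fewer results while links are pending, and the same query returns more once `run_pending` has caught up.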

Specifications

Data Model — Person, PlatformIdentity, and Provenance Model definitions
API — Relationship management operations (confirm, reject, merge, split)
Ingestion — How data enters the system and triggers resolution
Data Schema — identity_links and resolution_candidates tables
Module Interfaces — IdentityGraph interface for resolution queries

Decisions

Entity resolution algorithm (TBD) — ADR on matching strategies and confidence scoring
Auto-resolution threshold defaults (TBD) — ADR on default sensitivity settings