Entity Resolution
Status: Published
Last Updated: 2026-03-09
Overview
Entity resolution is the process of recognizing that the same real-world person appears across multiple platforms under different identities. A phone number in iMessage, an email in Gmail, a handle in Slack, and a username in Discord may all be the same person. Entity resolution connects them.
This spec defines how entity resolution works from the consumer’s perspective — what they see, what they interact with, and what guarantees the system provides.
Resolution Lifecycle
A PlatformIdentity progresses through resolution states:
Unresolved
A PlatformIdentity exists but isn’t linked to any Person. This is the starting state after ingestion. The identity is usable — messages from it appear in queries — but it’s isolated. Querying “all messages from this person across platforms” won’t include it because the system doesn’t know who it belongs to.
Candidate
The system has identified a potential match between this PlatformIdentity and an existing Person (or between two unresolved PlatformIdentities). The match is suggested but not applied. Candidates carry evidence — the signals that triggered the suggestion.
Candidates are not a static queue. They are a living pool where confidence evolves over time. New data points (a shared conversation, an overlapping contact, a matching phone number) add evidence. When accumulated evidence crosses the auto-resolution threshold, the candidate resolves automatically.
Resolved
The PlatformIdentity is linked to a Person. Resolved links have one of two origins:
- Auto-resolved: the system established the link based on evidence exceeding the confidence threshold. These links are visible to the consumer and reversible.
- User-confirmed: a consumer explicitly confirmed the link.
Both carry provenance metadata indicating the method and evidence.
Rejected
A candidate match was explicitly rejected by a consumer (“no, these are different people”). The rejection is recorded to prevent the system from re-suggesting the same match. Rejected pairs can be un-rejected if the consumer changes their mind.
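The lifecycle above can be read as a small state machine. The following is a minimal sketch, not the system's implementation: the `ResolutionState` enum, the `ALLOWED_TRANSITIONS` table, and the `transition` helper are hypothetical names. It also assumes that a split returns an identity to Unresolved and that un-rejecting returns a pair to Candidate, since the spec does not name those target states.

```python
from enum import Enum


class ResolutionState(Enum):
    UNRESOLVED = "unresolved"
    CANDIDATE = "candidate"
    RESOLVED = "resolved"
    REJECTED = "rejected"


# Allowed moves, read off the state descriptions above.
# Assumptions: split goes RESOLVED -> UNRESOLVED, un-reject goes
# REJECTED -> CANDIDATE. UNRESOLVED -> RESOLVED covers factual matches
# and manual merges, which skip the candidate stage.
ALLOWED_TRANSITIONS = {
    ResolutionState.UNRESOLVED: {ResolutionState.CANDIDATE, ResolutionState.RESOLVED},
    ResolutionState.CANDIDATE: {ResolutionState.RESOLVED, ResolutionState.REJECTED},
    ResolutionState.RESOLVED: {ResolutionState.UNRESOLVED},
    ResolutionState.REJECTED: {ResolutionState.CANDIDATE},
}


def transition(current: ResolutionState, target: ResolutionState) -> ResolutionState:
    """Return the target state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the transitions in a table makes it easy to audit which moves are reachable, for example that a rejected pair cannot become resolved without first being un-rejected.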
Auto-Resolution
The system automatically resolves identities when evidence is strong enough, without requiring consumer intervention.
Factual Matches
Deterministic matches auto-resolve immediately:
- Same phone number across platforms (iMessage + WhatsApp + Signal)
- Same email address across platforms (Gmail + Slack + Calendar)
- Platform-reported links (Slack profile includes email, contact card includes phone and email)
These are facts, not inferences. The platform data directly states the connection.
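As an illustration of deterministic matching, the sketch below groups identities that share a normalized phone number or email address. The record shape (`id`, `phone`, `email`), the function names, and the normalization rules are assumptions made for this example; in particular, digits-only phone comparison is a simplification of proper E.164 normalization.

```python
import re
from collections import defaultdict


def normalize_email(email: str) -> str:
    # Assumption: compare case-insensitively, ignoring surrounding whitespace.
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    # Assumption: compare digits only. A real system would normalize to
    # E.164 with country-code handling, which this sketch omits.
    return re.sub(r"\D", "", phone)


def factual_match_groups(identities):
    """Return sets of identity ids that share a normalized phone or email."""
    by_key = defaultdict(set)
    for ident in identities:
        if ident.get("phone"):
            by_key[("phone", normalize_phone(ident["phone"]))].add(ident["id"])
        if ident.get("email"):
            by_key[("email", normalize_email(ident["email"]))].add(ident["id"])
    return [ids for ids in by_key.values() if len(ids) > 1]


identities = [
    {"id": "imessage:1", "phone": "+1 (555) 010-0199"},
    {"id": "whatsapp:7", "phone": "+15550100199"},
    {"id": "gmail:3", "email": "Carrie@example.com"},
    {"id": "slack:9", "email": "carrie@example.com"},
    {"id": "discord:4"},
]
groups = factual_match_groups(identities)
```

Here the two phone-based identities form one group and the two email-based identities form another; the Discord identity has no factual key and stays unresolved. The sketch does not chain groups together (for example via a contact card that carries both phone and email), which a fuller version would need.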
Evidence Accumulation
Non-deterministic signals contribute to a confidence score. Individual signals may be weak, but they compound:
- Similar display names across platforms
- Overlapping conversation participants
- Shared contact graph patterns
- Temporal correlation (active at similar times)
- Geographic signals (same timezone patterns)
When accumulated evidence crosses the auto-resolution threshold, the system resolves the match. The consumer never had to review it.
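One way weak signals could compound is a noisy-OR combination, sketched below. This is an assumed scoring rule chosen for illustration; the actual algorithm is left to a pending decision record. The signal names and weights are hypothetical, and the rule treats signals as independent, which real signals may not be.

```python
from math import prod

# Hypothetical per-signal weights in [0, 1]; real values are undecided.
SIGNAL_WEIGHTS = {
    "similar_display_name": 0.40,
    "overlapping_participants": 0.35,
    "shared_contact_graph": 0.30,
    "temporal_correlation": 0.15,
    "timezone_pattern": 0.10,
}


def combined_confidence(signals):
    """Noisy-OR: 1 minus the product of (1 - weight) over observed signals."""
    return 1.0 - prod(1.0 - SIGNAL_WEIGHTS[name] for name in signals)


one_signal = combined_confidence(["similar_display_name"])
three_signals = combined_confidence(
    ["similar_display_name", "overlapping_participants", "shared_contact_graph"]
)
```

Under these placeholder weights a single display-name match scores 0.40, while three signals together reach roughly 0.73. Each additional signal can only raise the score, which matches the idea that evidence accumulates toward the threshold.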
Threshold Configuration
The auto-resolution threshold is configurable:
- Aggressive — resolve more automatically, fewer candidates to review, occasional wrong merges (correctable via split)
- Conservative — resolve less automatically, more candidates to review, fewer mistakes
- Manual only — only factual matches auto-resolve, everything else requires confirmation
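The default setting is not yet fixed; it is tracked under the auto-resolution threshold defaults decision (TBD) and should suit most users without adjustment. Whatever the setting, the threshold applies only to inferred matches: factual matches always auto-resolve.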
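The interaction between the threshold setting and factual matches can be sketched as a single decision function. The preset names mirror the list above, but the numeric cutoffs are placeholders invented for this example, as are the `ThresholdPreset` and `should_auto_resolve` names.

```python
from enum import Enum
from typing import Optional


class ThresholdPreset(Enum):
    AGGRESSIVE = "aggressive"
    CONSERVATIVE = "conservative"
    MANUAL_ONLY = "manual_only"


# Placeholder cutoffs for illustration only. None means inferred
# matches never auto-resolve under that preset.
INFERRED_CUTOFF: dict[ThresholdPreset, Optional[float]] = {
    ThresholdPreset.AGGRESSIVE: 0.70,
    ThresholdPreset.CONSERVATIVE: 0.95,
    ThresholdPreset.MANUAL_ONLY: None,
}


def should_auto_resolve(is_factual: bool, confidence: float, preset: ThresholdPreset) -> bool:
    if is_factual:
        return True  # factual matches bypass the threshold entirely
    cutoff = INFERRED_CUTOFF[preset]
    return cutoff is not None and confidence >= cutoff
```

With these placeholder values, an inferred match at 0.80 confidence would auto-resolve under the aggressive preset, remain a candidate under the conservative one, and always wait for confirmation in manual-only mode.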
Consumer Interaction
Consumers interact with entity resolution in two ways:
Active Management
A dedicated query surface exposes resolution candidates and status, along with the operations that act on them:
- List candidates — pending matches with evidence and confidence, filterable and sortable
- Confirm — accept a suggested match, linking the PlatformIdentity to the Person
- Reject — decline a match, preventing re-suggestion
- Merge — manually link two PlatformIdentities (or two Persons) that the system hasn’t suggested
- Split — undo a resolution, separating PlatformIdentities back into distinct Persons
- Adjust threshold — change auto-resolution sensitivity
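A few of these operations can be sketched with an in-memory candidate pool. The `Candidate` and `CandidatePool` classes and their method names are hypothetical and do not reflect the actual API; the point is to show listing with a filter and sort, and how a rejection is remembered so the same pair is not suggested again.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    identity_a: str
    identity_b: str
    confidence: float
    evidence: list = field(default_factory=list)


class CandidatePool:
    def __init__(self):
        self._pending = {}      # unordered pair -> Candidate
        self._rejected = set()  # pairs a consumer has rejected

    def suggest(self, candidate: Candidate) -> bool:
        pair = frozenset((candidate.identity_a, candidate.identity_b))
        if pair in self._rejected:
            return False  # a rejected pair is never re-suggested
        self._pending[pair] = candidate
        return True

    def reject(self, a: str, b: str) -> None:
        pair = frozenset((a, b))
        self._pending.pop(pair, None)
        self._rejected.add(pair)

    def unreject(self, a: str, b: str) -> None:
        self._rejected.discard(frozenset((a, b)))

    def list_candidates(self, min_confidence: float = 0.0):
        matches = [c for c in self._pending.values() if c.confidence >= min_confidence]
        return sorted(matches, key=lambda c: c.confidence, reverse=True)
```

Storing each pair as a `frozenset` makes the rejection symmetric: rejecting (A, B) also blocks a later suggestion of (B, A).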
Contextual Hints
When querying a Person, the API surfaces resolution context:
- Number of linked PlatformIdentities
- Pending candidates (“3 possible additional identities for this person”)
- Resolution provenance on each link (factual, auto-resolved, user-confirmed)
This enables lightweight resolution management without visiting a dedicated candidates view — a consumer browsing a Person’s data can see and act on suggestions in context.
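The resolution context attached to a Person could be computed along the following lines. This is a sketch only: the `resolution_context` function, the dictionary keys, and the shape of the link and candidate records are assumptions, not the API's actual response format.

```python
from collections import Counter


def resolution_context(person_id, links, candidates):
    """Summarize resolution state for one Person.

    links: dicts with "person_id", "identity_id", "method" (hypothetical shape)
    candidates: dicts with "person_id", "identity_id" (hypothetical shape)
    """
    person_links = [link for link in links if link["person_id"] == person_id]
    pending = [c for c in candidates if c["person_id"] == person_id]
    return {
        "linked_identity_count": len(person_links),
        "pending_candidate_count": len(pending),
        "links_by_method": dict(Counter(link["method"] for link in person_links)),
    }


links = [
    {"person_id": "person:carrie", "identity_id": "slack:9", "method": "platform_reported"},
    {"person_id": "person:carrie", "identity_id": "gmail:3", "method": "exact_match"},
    {"person_id": "person:dev", "identity_id": "slack:2", "method": "user_confirmed"},
]
candidates = [{"person_id": "person:carrie", "identity_id": "discord:4"}]
context = resolution_context("person:carrie", links, candidates)
```

A client could render `pending_candidate_count` as the "possible additional identities" hint and offer confirm or reject actions inline.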
Merge and Split Semantics
Merge Is Non-Destructive
When PlatformIdentities resolve to one Person, the underlying data is unchanged. Messages still reference the PlatformIdentity that sent them. The Person is a layer above — resolution links PlatformIdentities to Persons.
Querying a Person returns data from all their linked PlatformIdentities. “Carrie’s messages” resolves to “messages from all PlatformIdentities currently linked to Carrie’s Person entity.”
Split Is Clean
Because merge doesn’t mutate underlying data, split is straightforward. Changing which Person a PlatformIdentity points to automatically changes which queries it appears in. All messages connected through that identity follow the link.
No Data Loss
The resolution layer is always adjustable without touching interaction data. Merge, split, confirm, reject — these change links, not messages, events, or documents. Historical data is never rewritten.
Consistency With Current State
Queries reflect the current resolution state. If a merge is undone, queries immediately reflect the separation. There is no “historical merge” concept — the resolution graph is always queried as it exists now.
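The merge and split semantics above follow from keeping links separate from interaction data. A minimal sketch, assuming an in-memory message list and a plain identity-to-person mapping (all names and record shapes here are hypothetical):

```python
messages = [
    {"id": "m1", "identity_id": "slack:9", "text": "lunch?"},
    {"id": "m2", "identity_id": "imessage:1", "text": "on my way"},
    {"id": "m3", "identity_id": "discord:4", "text": "gg"},
]

# The resolution layer: identity id -> person id. Messages are never edited.
links = {"slack:9": "person:carrie"}


def messages_for_person(person_id):
    """Resolve the Person to its currently linked identities at query time."""
    return [m["id"] for m in messages if links.get(m["identity_id"]) == person_id]


def merge(identity_id, person_id):
    links[identity_id] = person_id


def split(identity_id):
    # Simplification: unlink only. A fuller version might attach the
    # identity to a new Person instead of leaving it unresolved.
    links.pop(identity_id, None)


before = messages_for_person("person:carrie")       # Slack only
merge("imessage:1", "person:carrie")
after_merge = messages_for_person("person:carrie")  # Slack and iMessage
split("imessage:1")
after_split = messages_for_person("person:carrie")  # Slack only again
```

Because `messages` is never written to, the split restores the earlier query result exactly, and there is no merged history to unwind.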
Provenance
Every resolution link carries provenance metadata (see Data Model: Provenance Model):
- Method: exact_match, exact_match_merge, platform_reported, evidence_accumulated, user_confirmed, or user_merged
- Source: what produced it (platform name, resolution algorithm version, user ID)
- Evidence: for inferred resolutions, the signals that contributed
- Timestamp: when the resolution was established
Consumer overrides (user_confirmed, user_merged) supersede system inferences. A user-confirmed link is never automatically reconsidered.
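The provenance fields and the override rule can be sketched as a small record type plus a guard. The `Provenance` dataclass and `may_replace` function are hypothetical names; the authoritative field definitions live in the Data Model spec. The sketch also assumes that a later consumer action may replace an earlier consumer override, which the spec does not state explicitly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

USER_METHODS = frozenset({"user_confirmed", "user_merged"})


@dataclass(frozen=True)
class Provenance:
    method: str           # one of the method values listed above
    source: str           # platform name, algorithm version, or user ID
    evidence: tuple = ()  # contributing signals, for inferred resolutions
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def may_replace(existing: Provenance, proposed: Provenance) -> bool:
    """System inferences never displace a consumer override."""
    if existing.method in USER_METHODS:
        # Assumption: only another consumer action can change the link.
        return proposed.method in USER_METHODS
    return True


confirmed = Provenance("user_confirmed", "user:42")
inferred = Provenance("evidence_accumulated", "resolver-v1", ("similar_display_name",))
```

With these records, `may_replace(inferred, confirmed)` is true and `may_replace(confirmed, inferred)` is false, so a user-confirmed link is not reconsidered by later automatic passes.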
First-Party vs. Third-Party Trust
Resolution provenance distinguishes between data sources:
- First-party connectors: maintained by LifeDB, higher trust for platform_reported signals
- Third-party connectors: community or external integrations, same provenance model but potentially lower default confidence for inferred signals
- Consumer input: always user_confirmed, highest trust level
The trust model is explicit and auditable. Consumers can filter by provenance to see only high-confidence resolutions or to review what third-party connectors have contributed.
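Filtering by trust could look like the sketch below. The `source_kind` field, the `TRUST_RANK` ordering values, and both function names are assumptions made for illustration; the spec establishes only the relative ordering of the three source types.

```python
# Hypothetical ranks; only the relative order comes from the text above.
TRUST_RANK = {"third_party": 1, "first_party": 2, "consumer": 3}


def links_at_or_above(links, min_trust):
    """Keep links whose source is at least as trusted as min_trust."""
    floor = TRUST_RANK[min_trust]
    return [link for link in links if TRUST_RANK[link["source_kind"]] >= floor]


def links_from(links, source_kind):
    """Review everything a given kind of source has contributed."""
    return [link for link in links if link["source_kind"] == source_kind]


links = [
    {"identity_id": "slack:9", "source_kind": "first_party"},
    {"identity_id": "forum:12", "source_kind": "third_party"},
    {"identity_id": "gmail:3", "source_kind": "consumer"},
]
high_trust = links_at_or_above(links, "first_party")
third_party_only = links_from(links, "third_party")
```

The first helper supports a "high-confidence only" view, and the second supports auditing what third-party connectors have added.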
Future: Non-Person Entity Resolution
This spec focuses on Person resolution, but the same pattern applies to other real-world entities that appear under different names across platforms:
- Places — a restaurant appears as a calendar event location, a Google Maps link in a message, a Yelp review, an address in a contact card
- Organizations — a company appears as an email domain, a Slack workspace, a calendar event organizer, a contact’s employer field
- Locations — cities, neighborhoods, venues referenced across messages, events, and documents
These will likely be less complex than Person resolution (fewer identity signals, less ambiguity) but follow the same core model: multiple references resolve to one canonical entity, with provenance and confidence. The resolution lifecycle (candidate → resolved, merge/split, auto-resolution) applies.
This is not in scope for the initial implementation but the architecture should not preclude it. The generic cross-entity reference model and provenance system are designed to support this extension.
Performance Expectations
Entity resolution must not block data availability. When data is ingested:
- Data is normalized and stored — available for queries immediately via PlatformIdentity
- Resolution runs asynchronously — factual matches resolve quickly, evidence accumulation happens over time
- As identities resolve, query results improve — a Person query that initially returned only Slack messages starts including iMessage conversations as those identities link up
The consumer API reads from the current resolution state and must return results with near-instantaneous latency regardless of how much resolution processing is pending. Resolution improves data quality over time without degrading query performance.
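The separation between ingestion and resolution can be sketched with a work queue. Everything here is a simplified, single-process stand-in: the `store`, `links`, and `resolution_queue` structures and the function names are hypothetical, and a real system would run the resolution step in a background worker rather than on demand.

```python
from collections import deque

store = []                  # normalized records, queryable immediately
links = {}                  # identity id -> person id (current resolution state)
resolution_queue = deque()  # identities awaiting background resolution


def ingest(record):
    store.append(record)                            # available right away
    resolution_queue.append(record["identity_id"])  # resolved later


def query_by_identity(identity_id):
    return [r["id"] for r in store if r["identity_id"] == identity_id]


def query_by_person(person_id):
    # Reads only the current link state; never waits on the queue.
    return [r["id"] for r in store if links.get(r["identity_id"]) == person_id]


def run_resolution_step(resolver):
    """Process one queued identity using a caller-supplied resolver function."""
    if resolution_queue:
        identity_id = resolution_queue.popleft()
        person_id = resolver(identity_id)
        if person_id is not None:
            links[identity_id] = person_id


ingest({"id": "m1", "identity_id": "imessage:1"})
visible_immediately = query_by_identity("imessage:1")
person_before = query_by_person("person:carrie")
run_resolution_step(lambda identity_id: "person:carrie")
person_after = query_by_person("person:carrie")
```

The message is queryable by identity as soon as it is ingested, and it joins the Person query only after the resolution step links it. Neither query function touches the queue, so pending work does not affect read latency.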
Related Documents
Specifications
- Data Model — Person, PlatformIdentity, and Provenance Model definitions
- API — Relationship management operations (confirm, reject, merge, split)
- Ingestion — How data enters the system and triggers resolution
- Data Schema — identity_links and resolution_candidates tables
- Module Interfaces — IdentityGraph interface for resolution queries
Decisions
- Entity resolution algorithm (TBD) — ADR on matching strategies and confidence scoring
- Auto-resolution threshold defaults (TBD) — ADR on default sensitivity settings