diff --git a/CHANGELOG.md b/CHANGELOG.md
index dffd84e..1463f87 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -79,6 +79,23 @@ tool does not recognize the token and cannot expand the template.
 guarantees, session lifecycle, and the internal shape of `session`
 beyond `session_id` and `lang` are explicitly out of scope.
 
+## OVOS-USER-ID-1 — User Identity Resolution
+
+### 1
+
+- User identity resolution at the bus layer. The §2 identity fields,
+  claimed under OVOS-SESSION-1 §2.1: the consolidated `user_id`, the four
+  enrolled-signal IDs `voice_id` / `face_id` / `name_id` / `passphrase_id`,
+  the configured `default_user_id`, and the integer `auth_level`. §3 — the
+  six `auth_level` strengths (`0` anonymous, `1` configured default, `2`
+  self-declared, `3` single passive biometric, `4` multiple passive
+  biometrics, `5` explicit credential). §4 — the four enrolled signal
+  types and the transformer / out-of-band paths that populate them. §5 —
+  resolution ahead of the pipeline and persistence across utterances.
+  §6 — Layer-2 injection: a bridge MAY set the identity fields directly,
+  consistent with §3. §7 and §8 cover guest fallback and skill gating.
+  Conformance roles: Recognition Plugin, Skill, Bridge.
+
 ## OVOS-INTENT-4 — Intent and Entity Registration Bus Contract
 
 ### 1
diff --git a/README.md b/README.md
index 24c655e..1769156 100644
--- a/README.md
+++ b/README.md
@@ -108,6 +108,7 @@ pre-spec status quo, `2` for one that is not backwards compatible.
 | OVOS-SESSION-1 | [Session Carrier Wire Shape](session-1.md) | 1 | Draft |
 | OVOS-SESSION-2 | [Session Lifecycle and State Ownership](session-2.md) | 1 | Draft |
 | OVOS-BRIDGE-1 | [Bus Bridge and Opaque Relay](bridge-1.md) | 2 | Draft |
+| OVOS-USER-ID-1 | [User Identity Resolution](user-id-1.md) | 1 | Draft |
 
 ### Orchestrator stack — what processes utterances
 
diff --git a/user-id-1.md b/user-id-1.md
new file mode 100644
index 0000000..8d515f3
--- /dev/null
+++ b/user-id-1.md
@@ -0,0 +1,397 @@
+# User Identity Resolution Specification
+
+**Spec ID:** OVOS-USER-ID-1 · **Version:** 1 · **Status:** Draft
+
+This specification defines the session fields that carry user identity
+and authentication evidence for an utterance. It prescribes what each
+field means, how authentication strength is expressed, and what
+obligations apply to recognition plugins and skills.
+
+It builds on **OVOS-SESSION-1** (field registry), **OVOS-TRANSFORM-1**
+(transformer chain), **OVOS-BRIDGE-1** (Layer-2 injection), and
+**OVOS-MSG-1** (session carrier).
+
+The key words **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**,
+**MAY**, and **RECOMMENDED** are used as in RFC 2119.
+
+---
+
+## 1. Scope
+
+This specification defines:
+
+- **identity fields** (§2) — the session fields carrying resolved
+  identity and per-signal evidence;
+- **authentication level** (§3) — a derived integer summarising
+  evidence strength;
+- **enrolled signals** (§4) — the four data points a user may enroll
+  and how they map to session fields;
+- **resolution** (§5) — how a recognition plugin populates the fields
+  and how identity persists across utterances;
+- **Layer-2 injection** (§6) — how a bridge may supply identity
+  directly;
+- **guest fallback** (§7) — required behavior when no identity is
+  resolved;
+- **skill use** (§8) — how skills gate behavior on authentication
+  level;
+- **conformance** (§9).
+
+It does **not** define:
+
+- recognition algorithms, models, or biometric processing;
+- enrollment procedures or credential storage;
+- how audio, video, or other sensor data is acquired.
+
+---
+
+## 2. Identity fields
+
+The fields below are claimed under **OVOS-SESSION-1 §2.1**. All are
+optional on the wire; absence means the signal was not collected, did
+not match, or was not attempted. All are stable once written and travel
+with the session for the remainder of the utterance lifecycle.
+
+| Field | Type | Meaning |
+|-------|------|---------|
+| `user_id` | string (opaque) | The resolved user identity. Present means a known user was identified; absent means guest (anonymous). |
+| `voice_id` | string (opaque) | ID of the enrolled voice-print record that matched this utterance. |
+| `face_id` | string (opaque) | ID of the enrolled face-print record that matched. |
+| `name_id` | string (opaque) | ID of the enrolled name record that the user's self-declaration matched. |
+| `passphrase_id` | string (opaque) | ID of the enrolled secret-phrase record that matched. |
+| `default_user_id` | string (opaque) | Configured identity for this site or session, set by a bridge or deployer config before any runtime signal is evaluated (§6). |
+| `auth_level` | integer | Authentication strength (§3). |
+
+`user_id` is the **consolidated** identity, resolved from one or more
+per-signal fields or, when no runtime signal resolves, from
+`default_user_id` (§6). Per-signal fields (`voice_id`, `face_id`,
+`name_id`, `passphrase_id`) MAY be set even when `user_id` cannot be
+resolved: they record which individual signals matched an enrollment
+record, not that a user was identified. An absent `user_id` with a
+present per-signal field means at least one signal matched but the
+recognition plugin could not consolidate the evidence into a confirmed
+identity (for example, because signals disagreed or fell below a
+confidence threshold, §5).
+
+---
+
+## 3. Authentication level
+
+`session.auth_level` summarises the strength of evidence backing
+`session.user_id`.
+
+| Level | Evidence |
+|-------|----------|
+| `0` | Anonymous — `user_id` absent. Per-signal fields MAY still be set (matched but not consolidated, §2). |
+| `1` | Configured default — `default_user_id` was used; no runtime signal matched. |
+| `2` | Self-declared — `name_id` matched; user stated their identity, unverified. |
+| `3` | Single passive biometric — exactly one of `voice_id` or `face_id` matched. |
+| `4` | Multiple passive biometrics — both `voice_id` and `face_id` independently matched the same `user_id`. |
+| `5` | Explicit credential — `passphrase_id` matched, with or without additional signals. |
+
+The recognition plugin or bridge **MUST** set `auth_level` to `0`
+when `user_id` is absent. When `user_id` is present, it **MUST** set
+`auth_level` to the highest level whose criteria are met.
+
+Skills **MUST** treat an absent `auth_level` as `0`.
+
+---
+
+## 4. Enrolled signals
+
+A user may enroll any combination of four signal types. Each maps to
+one session field. Enrollment procedures are out of scope; this
+section defines only the runtime semantics.
+
+### 4.1 Voice print — `voice_id`
+
+A voice-print recognizer (typically an audio transformer,
+OVOS-TRANSFORM-1 §3.1) compares the utterance audio against enrolled
+voice prints. On a successful match it writes the opaque `voice_id` of
+the matching enrollment record to the session.
+
+### 4.2 Face print — `face_id`
+
+Face recognition typically runs outside the utterance lifecycle, for
+example in a camera sensor plugin running a continuous recognition
+loop. Because it is not triggered by the utterance, its result is less
+tightly bound in time to the utterance than inline signals. The result
+reaches the session before the recognition plugin (§5) consolidates it:
+the client or a Layer-2 bridge (§6) injects it, or the plugin queries
+the out-of-band source itself (§5, step 1).
+
+`session.site_id` (OVOS-SESSION-1 §3.3) SHOULD be used to select the
+camera associated with the physical location where the utterance
+originated, so the correct feed is queried in multi-device deployments.
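A minimal sketch of site-based selection follows, assuming the camera loop publishes its latest match per camera with a timestamp and that deployer configuration maps each `site_id` to one camera. `FaceObservations`, `camera_for_site`, and the five-second `max_age` window are hypothetical names and values, not part of any OVOS API. The freshness check is one way to account for face results being less tightly tied to the utterance than inline signals.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FaceObservation:
    face_id: str    # enrollment record ID reported by the camera loop
    seen_at: float  # time of the match, in seconds (clock source assumed)


class FaceObservations:
    """Latest face match per camera, fed by a continuous recognition loop."""

    def __init__(self, camera_for_site: Dict[str, str], max_age: float = 5.0):
        self.camera_for_site = camera_for_site  # deployer config: site_id -> camera
        self.max_age = max_age                  # how old a match may be, in seconds
        self.latest: Dict[str, FaceObservation] = {}

    def record(self, camera: str, face_id: str, seen_at: float) -> None:
        """Called by the camera loop each time it matches an enrolled face."""
        self.latest[camera] = FaceObservation(face_id, seen_at)

    def inject(self, session: dict, now: float) -> dict:
        """Return a copy of session, adding face_id only when the camera for
        session["site_id"] produced a match within max_age seconds."""
        camera = self.camera_for_site.get(session.get("site_id", ""))
        observation: Optional[FaceObservation] = (
            self.latest.get(camera) if camera else None
        )
        updated = dict(session)  # never mutate the caller's session
        if observation is not None and now - observation.seen_at <= self.max_age:
            updated["face_id"] = observation.face_id
        return updated
```

A client or bridge would call `inject` on the outgoing session just before forwarding the utterance; a stale match, an unmapped site, or a missing `site_id` leaves `face_id` absent rather than attaching an old or unrelated result.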
+
+A face-recognition implementation SHOULD employ liveness detection to
+prevent spoofing by static images or replayed video. The specific
+mechanism is deployer-defined. An implementation that cannot verify
+liveness SHOULD reflect this in a lower `auth_level` or decline to set
+`face_id`.
+
+> **Note (non-normative):** Liveness techniques include
+> challenge–response prompts (blink, head-turn), texture analysis, and
+> depth sensing. Even with liveness detection, face recognition alone
+> yields at most `auth_level` 3 (§3): suitable for personalised features
+> but insufficient for high-trust operations.
+
+### 4.3 Name — `name_id`
+
+A self-declaration recognizer (typically an utterance transformer,
+OVOS-TRANSFORM-1 §3.2) detects identity assertions in the transcript
+("I am Alice", "You're talking to Bob") and matches them against
+enrolled names. On a successful match it writes the opaque `name_id` of
+the matching enrollment record.
+
+If no names are enrolled, or if the declared name does not match any
+enrollment, `name_id` is left absent. A matched name is unverified —
+the user stated who they are but presented no credential — and
+SHOULD NOT be the sole basis for any operation requiring verified
+identity.
+
+### 4.4 Secret phrase — `passphrase_id`
+
+A passphrase recognizer (typically an utterance transformer,
+OVOS-TRANSFORM-1 §3.2) detects a secret phrase in the transcript and
+matches it against enrolled credentials. Because the user actively
+produced the credential, `passphrase_id` alone is sufficient for
+`auth_level` 5 regardless of which passive signals are also present.
+
+---
+
+## 5. Resolution
+
+A **user recognition plugin** is responsible for writing the identity
+fields into the session. The plugin:
+
+1. reads available per-signal evidence from `Message.context`, the
+   incoming session, and out-of-band sources such as a camera feed;
+2. resolves each signal to its enrollment record ID;
+3. derives `user_id` from the resolved signals using its own logic;
+4. computes `auth_level` per §3;
+5. writes the fields it resolved to the session.
+
+How the plugin is implemented — as a metadata transformer
+(OVOS-TRANSFORM-1 §3.3), as a standalone service, or as a combination
+— is deployer-defined. The only normative constraint is that the fields
+it has resolved are present in `context.session` by the time the
+utterance enters the pipeline.
+
+When multiple signals conflict, the RECOMMENDED strategy is to resolve
+to the signal with the highest confidence; if no signal meets a
+deployer-defined threshold, leave `user_id` absent unless
+`default_user_id` applies (§6).
+
+### 5.1 Identity persistence across utterances
+
+A recognition plugin SHOULD preserve identity fields from the incoming
+session unless the current utterance provides contrary evidence. Once
+a user has been identified at a given `auth_level`, that level SHOULD
+be maintained for subsequent utterances in the same session without
+requiring re-recognition.
+
+A plugin MAY upgrade `auth_level` during a session (e.g. when the
+user provides a passphrase in a later utterance) but SHOULD NOT
+downgrade it without a positive disconfirming signal.
+
+### 5.2 Re-authentication
+
+When a skill requires a higher `auth_level` than the current session
+carries, it SHOULD prompt the user and use **OVOS-CONVERSE-1**
+`response_mode` to capture the next utterance before it enters the
+normal pipeline. The recognition plugin processes that response as
+usual; if the new `auth_level` meets the skill's requirement, the skill
+can proceed.
+
+---
+
+## 6. Layer-2 injection
+
+A Layer-2 bridge (OVOS-BRIDGE-1 §4.2) MAY inject any or all identity
+fields into the session before the utterance reaches the pipeline —
+for example, a chat interface where the user is authenticated at the
+network layer, or a satellite terminal configured for a single known
+user. A bridge MUST set `auth_level` consistently with §3 and MAY set
+per-signal fields without running any recognizer.
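Whether the fields are written by a recognition plugin (§5) or injected by a bridge, they have to satisfy §3. The sketch below combines steps 2 to 4 of §5, the RECOMMENDED highest-confidence rule, and the `default_user_id` fallback from the next paragraph. It is a minimal sketch under stated assumptions, not a reference implementation: `SignalMatch`, `resolve`, the 0.7 threshold, and the 0..1 confidence scale are hypothetical; each recognizer is assumed to report at most one match; and dropping per-signal fields that point at a different user is this sketch's own choice, which §2 neither requires nor forbids.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

PASSIVE_BIOMETRICS = {"voice_id", "face_id"}


@dataclass(frozen=True)
class SignalMatch:
    field: str         # "voice_id", "face_id", "name_id" or "passphrase_id"
    record_id: str     # opaque enrollment record ID written to the session
    user_id: str       # owner of that record, per the enrollment store
    confidence: float  # recognizer score on an assumed 0..1 scale


def level_for(fields: set) -> int:
    """Highest §3 level supported by a set of signals that agree on one user."""
    if "passphrase_id" in fields:
        return 5
    biometrics = len(fields & PASSIVE_BIOMETRICS)
    if biometrics >= 2:
        return 4
    if biometrics == 1:
        return 3
    if "name_id" in fields:
        return 2
    return 0


def resolve(matches: Iterable[SignalMatch],
            default_user_id: Optional[str] = None,
            threshold: float = 0.7) -> dict:
    """Return the identity fields to write into context.session."""
    matches = list(matches)
    confident = [m for m in matches if m.confidence >= threshold]
    if confident:
        # The highest-confidence signal picks the user; only signals that
        # corroborate that user are kept and counted toward auth_level.
        best = max(confident, key=lambda m: m.confidence)
        agreeing = [m for m in confident if m.user_id == best.user_id]
        written = {m.field: m.record_id for m in agreeing}
        written["user_id"] = best.user_id
        written["auth_level"] = level_for({m.field for m in agreeing})
    elif default_user_id:
        # No runtime signal resolved: use the configured default (§6).
        written = {"user_id": default_user_id, "auth_level": 1}
    else:
        # Guest: record what matched, leave user_id absent, level 0 (§2, §3).
        written = {m.field: m.record_id for m in matches}
        written["auth_level"] = 0
    if default_user_id:
        written["default_user_id"] = default_user_id
    return written
```

Counting only agreeing signals is what keeps level 4 meaning what §3 says: two biometrics that matched the same `user_id`, not merely two fields that happen to be present.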
+
+`default_user_id` is the designated field for a configured-default
+identity: a bridge or deployer config sets it to indicate "this
+site or session belongs to this user absent any other signal." A
+recognition plugin that finds `default_user_id` set and no runtime
+signal resolved SHOULD use it as `user_id` with `auth_level` 1.
+
+---
+
+## 7. Guest fallback
+
+Skills and pipeline plugins **MUST NOT** fail or error when
+`session.user_id` is absent, and **MUST** treat an absent `user_id`
+(equivalently, `auth_level` 0) as a guest session.
+
+---
+
+## 8. Skill use
+
+Skills gate sensitive operations on `session.auth_level`. The table
+below gives guidance; the thresholds are suggestions, not normative
+requirements.
+
+| `auth_level` | Suitable for |
+|-------------|--------------|
+| 0 | Fully anonymous features: weather, timers, general knowledge. |
+| ≥ 1 | Personalised features based on a configured profile. |
+| ≥ 2 | Low-trust personal features: preferences, non-sensitive reminders. |
+| ≥ 3 | User-specific data: contacts, calendar, media history. |
+| ≥ 4 | Sensitive personal data: private notes, location history. |
+| 5 | High-trust operations: financial transactions, access control, privileged commands. |
+
+A skill that requires a minimum level the session does not meet
+**SHOULD** respond with an appropriate prompt (e.g. request a
+passphrase) rather than silently failing or returning another user's
+data.
+
+### 8.1 Examples (non-normative)
+
+The following patterns illustrate how an intent handler reads identity
+fields from the session it receives on dispatch. They are written as
+Python; `speak`, `load_preferences`, and the other helpers they call
+are placeholders for skill-specific code.
+
+**Anonymous feature — no identity check**
+
+```python
+def handler(message):
+    speak("It is currently 22°C and sunny.")
+```
+
+Weather, timers, and general-knowledge queries require no identity.
+The handler ignores all identity fields.
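Where several handlers share a minimum level, the comparison and the §8 prompt can live in one decorator. A minimal sketch, assuming the session arrives as a dict under `message.context["session"]`; `Message` here is a local stand-in rather than the OVOS message class, and `require_auth_level` and `read_messages` are hypothetical names. Handlers return the text they would speak so the behaviour is easy to check.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Any, Callable, Dict, Optional


@dataclass
class Message:
    """Stand-in for the bus message; only `data` and `context` are modelled."""
    data: Dict[str, Any] = field(default_factory=dict)
    context: Dict[str, Any] = field(default_factory=dict)


# A handler returns the text it would speak (a real skill would speak it).
Handler = Callable[[Message], Optional[str]]


def require_auth_level(minimum: int, prompt: str) -> Callable[[Handler], Handler]:
    """Run the handler only if the session meets `minimum`; otherwise
    return `prompt` so the user is asked to authenticate (§8)."""
    def decorate(handler: Handler) -> Handler:
        @wraps(handler)
        def guarded(message: Message) -> Optional[str]:
            session = message.context.get("session") or {}
            # An absent auth_level counts as 0 (§3).
            if session.get("auth_level", 0) < minimum:
                return prompt
            return handler(message)
        return guarded
    return decorate


@require_auth_level(3, "I need to recognise your voice before reading your messages.")
def read_messages(message: Message) -> str:
    user_id = message.context["session"]["user_id"]
    return f"Reading messages for {user_id}."
```

Keeping the prompt next to the threshold makes it harder for a handler to fail silently when the level is too low, which is the behaviour §8 discourages.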
+
+---
+
+**Personalised feature — identity required, any level**
+
+```python
+def handler(message):
+    session = message.context["session"]
+    user_id = session.get("user_id")
+    if not user_id:
+        speak("I don't know who you are yet. What's your name?")
+        return
+    prefs = load_preferences(user_id)
+    speak(f"Playing {prefs.favourite_station}.")
+```
+
+`user_id` present at any `auth_level` ≥ 1 is sufficient. The handler
+does not inspect `auth_level` because the risk of returning the wrong
+user's radio station is low.
+
+---
+
+**Sensitive data — minimum auth_level**
+
+```python
+def handler(message):
+    session = message.context["session"]
+    user_id = session.get("user_id")
+    auth_level = session.get("auth_level", 0)
+
+    if auth_level < 3:
+        speak("I need to recognise your voice before reading your messages.")
+        return
+    speak(summarise(fetch_messages(user_id)))
+```
+
+Reading private messages requires at least one confirmed biometric
+(level 3). A name declaration (level 2) or configured default (level 1)
+is not sufficient.
+
+---
+
+**High-trust operation — explicit credential required**
+
+```python
+def handler(message):
+    session = message.context["session"]
+    user_id = session.get("user_id")
+    auth_level = session.get("auth_level", 0)
+    # Intent slots; the slot names are illustrative.
+    amount = message.data["amount"]
+    recipient = message.data["recipient"]
+
+    if auth_level < 5:
+        speak("Please say your secret phrase to authorise this transfer.")
+        return
+    execute_transfer(user_id, amount, recipient)
+    speak(f"Transferred {amount} to {recipient}.")
+```
+
+Financial operations require `auth_level` 5. The handler prompts for
+a passphrase rather than refusing silently. The skill captures the next
+utterance with `response_mode` (§5.2); if the user speaks the enrolled
+phrase, the recognition plugin upgrades `auth_level` to 5 (§5.1) before
+the skill sees it, and the transfer can proceed.
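The upgrade in the example above is the §5.1 persistence rule at work. A minimal sketch of that merge, assuming `current` holds only the fields resolved for the present utterance with `auth_level` already set per §3, and treating a different resolved `user_id` as contrary evidence. `carry_forward` and its `disconfirmed` flag are hypothetical; what counts as a positive disconfirming signal is left to the deployer.

```python
IDENTITY_FIELDS = ("user_id", "voice_id", "face_id", "name_id",
                   "passphrase_id", "default_user_id", "auth_level")


def carry_forward(previous: dict, current: dict, disconfirmed: bool = False) -> dict:
    """Merge the identity fields of the incoming session (`previous`) with
    those resolved for this utterance (`current`), following §5.1."""
    prev = {k: v for k, v in previous.items() if k in IDENTITY_FIELDS}
    cur = {k: v for k, v in current.items() if k in IDENTITY_FIELDS}
    prev_user, cur_user = prev.get("user_id"), cur.get("user_id")

    # Contrary evidence: an explicit disconfirming signal, or a different
    # user resolved now. Start over from this utterance alone.
    if disconfirmed or (prev_user and cur_user and cur_user != prev_user):
        return cur
    # Nothing established earlier: this utterance's result stands.
    if not prev_user:
        return cur

    merged = {**prev, **cur}        # keep earlier signals, add new ones
    merged["user_id"] = prev_user   # same user, or none resolved this time
    # Upgrades are allowed; downgrades are not (§5.1).
    merged["auth_level"] = max(prev.get("auth_level", 0), cur.get("auth_level", 0))
    return merged
```

Only identity fields are returned; other session keys such as `site_id` are left to the caller.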
+
+---
+
+**Multi-factor corroboration — checking specific signals**
+
+```python
+def handler(message):
+    session = message.context["session"]
+    user_id = session.get("user_id")
+    face_id = session.get("face_id")
+    passphrase_id = session.get("passphrase_id")
+
+    if not (user_id and face_id and passphrase_id):
+        speak("Both your face and your secret phrase are required to unlock the door.")
+        return
+    unlock_door(user_id)
+```
+
+Some operations have requirements that cannot be expressed as a plain
+level threshold. Here physical access control requires both `face_id`
+and `passphrase_id` to be present regardless of overall `auth_level`.
+The handler also checks `user_id`, because per-signal fields MAY be
+present when no user was resolved (§2).
+
+---
+
+**Configured default — satellite terminal**
+
+Consider a terminal configured for a single household member. The
+bridge sets `user_id` and `default_user_id` to that member and
+`auth_level` to 1; no runtime recognition runs.
+
+```python
+# Session from the bridge:
+# {"user_id": "alice", "default_user_id": "alice", "auth_level": 1}
+
+def handler(message):
+    session = message.context["session"]
+    if session.get("auth_level", 0) >= 1:
+        prefs = load_preferences(session["user_id"])
+        set_temperature(prefs.usual_temperature)
+        speak("Good morning, setting your usual temperature.")
+```
+
+---
+
+## 9. Conformance
+
+### A user recognition plugin or bridge **MUST**:
+
+- set `auth_level` to `0` when `user_id` is absent (§3);
+- set `auth_level` to the highest applicable level when `user_id`
+  is present (§3);
+- leave `user_id` absent rather than using a sentinel value when
+  identity cannot be resolved (§2, §7);
+- write only the identity fields it has resolved to `context.session`
+  before the utterance enters the pipeline (§5).
+
+### Skills and pipeline plugins **MUST**:
+
+- treat absent `user_id` as anonymous (§7);
+- treat absent `auth_level` as `0` (§3);
+- never fail or error on absent identity fields (§7).
+
+---
+
+## See also
+
+- **OVOS-SESSION-1** — field registry (§2.1); all fields defined here
+  are claimed there.
+- **OVOS-SESSION-2** — session resumption; identity fields in a resumed
+  session persist per §5.1 unless superseded by new signals.
+- **OVOS-TRANSFORM-1** — audio-transformer (§3.1) and
+  utterance-transformer (§3.2) as signal injection points; metadata
+  transformer (§3.3) as consolidation point.
+- **OVOS-BRIDGE-1** — Layer-2 injection of identity fields (§6).
+- **OVOS-CONVERSE-1** — `response_mode` for capturing a re-authentication
+  utterance before normal pipeline matching (§5.2).
+- **OVOS-MSG-1** — session carrier and `Message.context` semantics.
+- **OVOS-AUDIO-IN-1** — audio-transformer chain; voice-print recognition
+  is an audio-transformer use case.