Multimodal
Multimodal work produces non-text artifacts. Runtime owns the canonical input fields, the artifact shape, the adapter routing, and the delivery gates. Apps display or consume artifacts; they do not redefine artifact truth.
Capability Surface
Runtime's multimodal contract covers:
| Capability | What it generates |
|---|---|
| Image | Raster images via image engines |
| Video | Video artifacts via video engines |
| Audio | Audio generation |
| Voice | Text-to-speech (AI_TTS), with voice cloning support (AI_TTS_CREATE_VOICE, AI_TTS_SYNTHESIZE) |
| Music | Music generation, including iteration support |
Each capability has admitted canonical input fields, an admitted artifact shape, and admitted delivery gates. Apps can't invent a new artifact MIME type or skip the delivery gate.
Canonical Input Fields
Every multimodal request has a canonical typed input. Apps build the input under the contract:
| Field | Purpose |
|---|---|
| Capability id | Which capability is being invoked |
| Provider context | Optional provider-specific extension |
| Resource references | Input resources (existing artifacts) |
| Generation parameters | Capability-specific parameters |
The canonical fields live in runtime/kernel/tables/multimodal-canonical-fields.yaml. Apps that produce off-contract input shapes fail closed at admission.
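A minimal sketch of what fail-closed admission of a canonical input could look like. The field names, the `ADMITTED_CAPABILITIES` set, and the choice of which fields are required are assumptions made for illustration; the authoritative definitions live in the canonical fields table, not here.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical capability ids; the admitted set is defined by the contract.
ADMITTED_CAPABILITIES = {"image", "video", "audio", "voice", "music"}
ALLOWED_FIELDS = {
    "capability_id",
    "generation_parameters",
    "resource_references",
    "provider_context",
}


class AdmissionError(ValueError):
    """Raised when an input is off-contract (fail closed)."""


@dataclass(frozen=True)
class MultimodalInput:
    capability_id: str
    generation_parameters: dict[str, Any]
    resource_references: tuple[str, ...] = ()
    provider_context: Optional[dict[str, Any]] = None  # optional extension


def admit(raw: dict[str, Any]) -> MultimodalInput:
    """Build a typed input, rejecting unknown or missing fields."""
    unknown = set(raw) - ALLOWED_FIELDS
    if unknown:
        raise AdmissionError(f"off-contract fields: {sorted(unknown)}")
    # Assumption: these two fields are treated as required in this sketch.
    for required in ("capability_id", "generation_parameters"):
        if required not in raw:
            raise AdmissionError(f"missing required field: {required}")
    if raw["capability_id"] not in ADMITTED_CAPABILITIES:
        raise AdmissionError(f"unknown capability: {raw['capability_id']}")
    return MultimodalInput(
        capability_id=raw["capability_id"],
        generation_parameters=dict(raw["generation_parameters"]),
        resource_references=tuple(raw.get("resource_references", ())),
        provider_context=raw.get("provider_context"),
    )
```

The point of the sketch is the direction of the default: anything the contract does not name is rejected rather than passed through.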
Provider Async Task Lifecycle
Multimodal generations are typically long-running. Runtime models them as provider async tasks with a typed lifecycle:
| State | Terminal? |
|---|---|
| `queued` | no |
| `running` | no |
| `succeeded` | yes |
| `failed` | yes |
| `expired` | yes (timeout-equivalent) |
Note the lower_snake casing (vs. ScenarioJob's UPPER_SNAKE). Provider async task states are normalized at the provider boundary, so their casing follows provider semantics.
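The lifecycle can be sketched as a small enum with a terminal check. The type and function names (`ProviderTaskState`, `is_terminal`, `normalize`) are hypothetical; only the five state values and their terminal flags come from the table above.

```python
from enum import Enum


class ProviderTaskState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    EXPIRED = "expired"  # timeout-equivalent


TERMINAL_STATES = frozenset({
    ProviderTaskState.SUCCEEDED,
    ProviderTaskState.FAILED,
    ProviderTaskState.EXPIRED,
})


def is_terminal(state: ProviderTaskState) -> bool:
    """True once the provider task can no longer change state."""
    return state in TERMINAL_STATES


def normalize(raw: str) -> ProviderTaskState:
    """Parse a provider-reported state; unknown values raise ValueError."""
    return ProviderTaskState(raw)
```

Because the enum values are lower_snake, a value such as `"QUEUED"` is rejected rather than silently coerced.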
Async-to-ScenarioJob mapping
Provider async terminal states map deterministically into ScenarioJob terminal states:
| Provider async state | ScenarioJob terminal |
|---|---|
| `succeeded` | `COMPLETED` |
| `expired` | `TIMEOUT` |
| `failed` | `FAILED` |
The mapping rule (K-MMPROV-027) is admitted; apps see one unified shape across modalities.
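As an illustration, the mapping table can be expressed as a lookup. The function name and the choice to return `None` for non-terminal states are assumptions of this sketch, not part of the admitted rule.

```python
from typing import Optional

# Terminal provider states and their ScenarioJob terminals, per the table above.
PROVIDER_TO_SCENARIO_JOB = {
    "succeeded": "COMPLETED",
    "expired": "TIMEOUT",
    "failed": "FAILED",
}
NON_TERMINAL = {"queued", "running"}


def scenario_job_terminal(provider_state: str) -> Optional[str]:
    """Map a provider async state to a ScenarioJob terminal.

    Returns None while the task is still in flight, and raises on any
    state the table does not name (fail closed).
    """
    if provider_state in NON_TERMINAL:
        return None
    try:
        return PROVIDER_TO_SCENARIO_JOB[provider_state]
    except KeyError:
        raise ValueError(f"unknown provider state: {provider_state!r}") from None
```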
Artifact Normalization
Multimodal output lands as an artifact with typed canonical fields. Apps consume artifacts through the artifact contract:
| Artifact field | Purpose |
|---|---|
| Artifact id | Stable identity |
| MIME type | From the contract, not guessed |
| Bytes / reference | Where to read the artifact |
| Provenance | Who produced it, under what request lineage |
| Delivery gate verdict | Whether the gate admitted delivery |
Artifact field admission lives in runtime/kernel/tables/multimodal-artifact-fields.yaml. An artifact missing a required field fails closed.
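A sketch of artifact normalization that fails closed on a missing field. The Python field names are hypothetical renderings of the table rows, and treating all five as required is an assumption; the artifact fields table decides which fields are actually required.

```python
from dataclasses import dataclass
from typing import Any

# Assumption: every field in the table above is required in this sketch.
REQUIRED_FIELDS = (
    "artifact_id",
    "mime_type",
    "location",
    "provenance",
    "delivery_gate_verdict",
)


class ArtifactRejected(ValueError):
    """Raised when an artifact is missing a required field."""


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    mime_type: str                 # taken from the contract, never sniffed
    location: str                  # bytes reference: where to read the artifact
    provenance: dict[str, Any]     # producer and request lineage
    delivery_gate_verdict: str


def normalize_artifact(raw: dict[str, Any]) -> Artifact:
    """Return a typed artifact, or raise if any required field is absent or empty."""
    missing = [name for name in REQUIRED_FIELDS if raw.get(name) in (None, "", {})]
    if missing:
        raise ArtifactRejected(f"missing required fields: {missing}")
    return Artifact(**{name: raw[name] for name in REQUIRED_FIELDS})
```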
Delivery Gates
A multimodal artifact does not automatically reach the app the moment generation succeeds. The delivery gate decides when an artifact is allowed to be delivered.
| Gate concern | Why it matters |
|---|---|
| Sensitivity classification | Some artifacts may need approval |
| Provenance | Provenance-incomplete artifacts may be quarantined |
| Schema validation | Off-contract artifacts fail closed |
| User policy | User preferences may gate delivery |
The runtime delivery gates table (runtime/kernel/tables/runtime-delivery-gates.yaml) admits the specific gates. Apps see the gate verdict; they do not bypass it.
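One way to picture gate evaluation is as a list of checks that must all admit the artifact. Everything named here (`GateResult`, the gate functions, the `"normal"` and `"restricted"` sensitivity labels, the `approved` flag) is hypothetical; the admitted gates are whatever the delivery gates table lists, and a user-policy gate is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GateResult:
    gate: str
    admitted: bool
    reason: str = ""


def schema_gate(artifact: dict) -> GateResult:
    missing = [k for k in ("artifact_id", "mime_type") if not artifact.get(k)]
    return GateResult("schema", not missing, f"missing: {missing}" if missing else "")


def provenance_gate(artifact: dict) -> GateResult:
    complete = bool(artifact.get("provenance"))
    return GateResult("provenance", complete, "" if complete else "provenance incomplete")


def sensitivity_gate(artifact: dict) -> GateResult:
    # Hypothetical labels: "normal" passes, "restricted" needs explicit approval.
    label = artifact.get("sensitivity")
    if label == "normal":
        return GateResult("sensitivity", True)
    if label == "restricted" and artifact.get("approved") is True:
        return GateResult("sensitivity", True)
    return GateResult("sensitivity", False, f"not admitted for sensitivity={label!r}")


def evaluate(
    artifact: dict, gates: list[Callable[[dict], GateResult]]
) -> tuple[bool, list[GateResult]]:
    """Run every gate; deliver only if at least one gate ran and all admitted."""
    results = [gate(artifact) for gate in gates]
    admitted = bool(results) and all(r.admitted for r in results)
    return admitted, results
```

Note that an empty gate list yields a rejection in this sketch, so a misconfigured pipeline does not deliver by accident.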
Music Iteration Support
Music generation admits an iteration model: an artifact can be iterated under typed parameters to produce variations.
| Property | Value |
|---|---|
| Iteration kind | MUSIC_GENERATE (admitted under K-MMPROV-*) |
| Lineage | Each iteration references the previous artifact |
| Audit | Iterations recorded as part of workflow lineage |
Iteration is bounded by the admitted contract; apps can't invent new iteration kinds at runtime.
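The lineage property can be sketched as a parent reference on each artifact. `MusicArtifact`, `iterate`, and `lineage` are hypothetical names, and the in-memory `store` dictionary stands in for whatever the runtime actually uses to record artifacts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MusicArtifact:
    artifact_id: str
    parent_id: Optional[str] = None  # previous artifact in the iteration chain


def iterate(store: dict[str, MusicArtifact], previous_id: str, new_id: str) -> MusicArtifact:
    """Record a new iteration that references an existing artifact."""
    if previous_id not in store:
        raise KeyError(f"unknown artifact: {previous_id}")
    if new_id in store:
        raise ValueError(f"artifact id already used: {new_id}")
    artifact = MusicArtifact(new_id, parent_id=previous_id)
    store[new_id] = artifact
    return artifact


def lineage(store: dict[str, MusicArtifact], artifact_id: str) -> list[str]:
    """Walk parent references from an artifact back to the first generation."""
    chain = []
    current: Optional[str] = artifact_id
    while current is not None:
        chain.append(current)
        current = store[current].parent_id
    return chain
```

For example, after iterating `m1` into `m2` and `m2` into `m3`, `lineage(store, "m3")` walks back through `m2` to `m1`.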
Voice Cloning Support
The voice capability admits voice cloning under typed contracts.
| Operation | Purpose |
|---|---|
| `AI_TTS_CREATE_VOICE` | Create a voice profile from input audio |
| `AI_TTS_SYNTHESIZE` | Synthesize speech using an admitted voice profile |
| `AI_TTS` | Standard TTS using an admitted voice |
VoiceAsset lifecycle is admitted in the voice contract (K-VOICE-*); voice profiles have admitted reference contracts.
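A rough sketch of the constraint that synthesis only works against an admitted voice profile. `VoiceRegistry` and its methods are hypothetical and do not reflect the actual VoiceAsset lifecycle; the sketch returns a request descriptor rather than producing audio.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceRegistry:
    # Hypothetical store: profile id -> reference to the source audio.
    profiles: dict[str, str] = field(default_factory=dict)

    def create_voice(self, profile_id: str, source_audio_ref: str) -> str:
        """Stand-in for AI_TTS_CREATE_VOICE: admit a profile built from input audio."""
        if not source_audio_ref:
            raise ValueError("input audio reference is required")
        if profile_id in self.profiles:
            raise ValueError(f"profile already exists: {profile_id}")
        self.profiles[profile_id] = source_audio_ref
        return profile_id

    def synthesize(self, profile_id: str, text: str) -> dict[str, str]:
        """Stand-in for AI_TTS_SYNTHESIZE: refuse profiles that were never admitted."""
        if profile_id not in self.profiles:
            raise LookupError(f"voice profile not admitted: {profile_id}")
        return {
            "operation": "AI_TTS_SYNTHESIZE",
            "voice_profile": profile_id,
            "text": text,
        }
```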
Reader Scenario: An Image Generation Workflow
An app generates an image with a long-running provider.
- Workflow node. An `AI_IMAGE` node is part of a workflow.
- ScenarioJob created. The node fans out to a `ScenarioJob`.
- Provider async task. The provider returns a task id; state moves `queued → running`.
- Polling / streaming. Runtime tracks the task. The workflow event stream emits external-async progress events.
- Task succeeds. Provider state moves to `succeeded`. Per `K-MMPROV-027`, the `ScenarioJob` terminal becomes `COMPLETED`.
- Artifact delivery. The image artifact has typed canonical fields, MIME type, provenance. The delivery gate validates schema, provenance, sensitivity. If admitted, delivery completes.
- App receives artifact. Through the SDK's typed artifact shape. The MIME type came from the contract; the app does not guess.
What did not happen: the app did not get a free-form URL with no provenance; the app did not see a guessed MIME type; the artifact did not bypass the delivery gate.
Reader Scenario: A Music Iteration
A user generates music and wants to iterate.
- First generation. A music workflow runs; an artifact is produced.
- Iteration request. The app issues an iteration with typed parameters referencing the original artifact.
- `MUSIC_GENERATE` admitted. The iteration is admitted under `K-MMPROV-*`.
- Provider async lifecycle. The iteration runs through the provider async lifecycle. State maps into `ScenarioJob` terminal as before.
- New artifact. The iteration artifact references the original; lineage is preserved.
The iteration is a typed operation; lineage is recorded structurally, not in a docstring.
Reader Scenario: A Provider Async Task Expires
A long video generation hits its provider-side timeout.
- Provider state moves from `running` to `expired`.
- Mapping. Per `K-MMPROV-027`, `ScenarioJob` terminal becomes `TIMEOUT`.
- Workflow effect. The node's workflow state moves to `FAILED` (or to a retry path under admitted retry policy).
- Audit. The expiry is recorded with reason.
The app sees a typed TIMEOUT, not a generic "request failed in some way"; the ScenarioJob terminal type tells the app what happened.
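The workflow effect in this scenario can be sketched as a decision on the ScenarioJob terminal. `RetryPolicy`, its fields, and the `"RETRY"` label are assumptions for illustration; the source only states that the node moves to FAILED or to a retry path under the admitted retry policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical policy shape: total attempts allowed, and which
    # ScenarioJob terminals are eligible for another attempt.
    max_attempts: int = 1
    retry_on: frozenset = frozenset({"TIMEOUT"})


def node_state_after(job_terminal: str, attempts_made: int, policy: RetryPolicy) -> str:
    """Decide the node's next state after a non-success ScenarioJob terminal."""
    if job_terminal not in {"TIMEOUT", "FAILED"}:
        raise ValueError(f"not a non-success terminal: {job_terminal!r}")
    if job_terminal in policy.retry_on and attempts_made < policy.max_attempts:
        return "RETRY"  # hypothetical label for the retry path
    return "FAILED"
```

Because the terminal is typed, the policy can treat a TIMEOUT differently from a FAILED without parsing an error message.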
Source Basis
- .nimi/spec/runtime/multimodal-provider.md
- .nimi/spec/runtime/multimodal-delivery-gates.md
- .nimi/spec/runtime/kernel/multimodal-provider-contract.md
- .nimi/spec/runtime/kernel/voice-contract.md
- .nimi/spec/runtime/kernel/delivery-gates-contract.md
- .nimi/spec/runtime/kernel/tables/multimodal-canonical-fields.yaml
- .nimi/spec/runtime/kernel/tables/multimodal-artifact-fields.yaml
- .nimi/spec/runtime/kernel/tables/runtime-delivery-gates.yaml
- .nimi/spec/runtime/kernel/tables/voice-enums.yaml
- .nimi/spec/runtime/kernel/tables/tts-provider-capability-matrix.yaml