# Email Link Delivery - Phased Rollout

Companion to `EMAIL_LINK_DELIVERY_IMPLEMENTATION_PLAN.md`. This document breaks the plan into shippable phases with concrete tasks, file/migration targets, configuration, tests, and exit criteria.

Conventions:
- Migrations: sequential Flyway `V<n>__<description>.sql` files under `src/main/resources/db/migration`.
- New Java code lives under `com.cbmportal.portal.*` matching existing layout.
- Feature flag `cbm.email.delivery.mode` starts as `ATTACHMENT`, moves to `DUAL`, then `LINK`.
- All new endpoints stay under `/api/v1/admin/...` and require `Admin`/`Office` auth.
- Every phase ships behind config so it can be turned off without redeploy.

---

## Phase 0 - Pre-flight (no code changes)

Goals:
- Confirm decisions, owners, and rollout windows.
- Identify the list of "sensitive" forms (start with `NewHire`).
- Confirm internal-domain allowlist value(s).

Deliverables:
- Decision sign-off recorded in `EMAIL_LINK_DELIVERY_IMPLEMENTATION_PLAN.md`.
- Internal comms drafted for office users.

Exit Criteria:
- Sensitive-form list approved.
- Domain allowlist approved.
- Rollout owner identified.

---

## Phase 1 - Foundations (no user-visible changes)

Goal: Land the data model, services, and config plumbing without changing email behavior.

### 1.1 Flyway migrations
Add (sequential):
- `V22__form_file_asset.sql`
  - columns per plan: `id`, `form_type`, `form_id`, `file_category`, `storage_path`, `original_file_name`, `content_type`, `size_bytes`, `checksum`, `created_at`.
  - indexes on `(form_type, form_id)`, `(file_category)`.
- `V23__email_access_token.sql`
  - columns: `id`, `jti`, `form_type`, `form_id`, `scope`, `sensitivity`, `recipient_domain`, `expires_at`, `max_downloads`, `download_count`, `revoked_at`, `created_by`, `created_at`.
  - unique index on `jti`.
- `V24__email_access_audit.sql`
  - columns: `id`, `token_jti`, `action`, `result`, `actor_user_id`, `actor_email`, `ip_address`, `user_agent`, `correlation_id`, `created_at`.
  - indexes on `(token_jti)`, `(created_at)`.
- `V25__email_otp_challenge.sql`
  - columns: `id`, `token_jti`, `code_hash`, `attempts`, `max_attempts`, `expires_at`, `verified_at`, `granted_until`, `created_at`.

### 1.2 Entities & repositories
- `domains/FormFileAsset.java`
- `domains/EmailAccessToken.java`
- `domains/EmailAccessAudit.java`
- `domains/EmailOtpChallenge.java`
- Matching repositories under `repositories/`.

### 1.3 Services (interfaces + impl, no callers yet)
- `services/EmailLinkService` / `services/impl/EmailLinkServiceImpl`
  - `createToken(...)`, `validateToken(...)`, `consumeDownload(...)`, `revoke(...)`.
- `services/EmailOtpService` / `services/impl/EmailOtpServiceImpl`
  - `issue(jti, email)`, `verify(jti, code)`, `hasActiveGrant(jti, principal)`.
- `services/FormFileAssetService` / impl
  - `registerAsset(...)`, `listAssets(formType, formId)`, `findByCategory(...)`.
- `services/ZipPackageService` / impl
  - `streamUploadsZip(formType, formId, outputStream)`.
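
The token lifecycle behind `EmailLinkService` can be illustrated with a minimal in-memory sketch. It is not the planned implementation: the class name, the `Result` enum, and the explicit `now` parameter are assumptions made for illustration, and a real version would persist to `email_access_token` and enforce the download limit with a database-level update.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory sketch of create / validate / consume / revoke (hypothetical names). */
final class TokenLifecycleSketch {

    enum Result { OK, NOT_FOUND, EXPIRED, REVOKED, EXHAUSTED }

    /** Mirrors a subset of the email_access_token columns. */
    static final class Token {
        final Instant expiresAt;
        final int maxDownloads;
        int downloadCount;
        Instant revokedAt;

        Token(Instant expiresAt, int maxDownloads) {
            this.expiresAt = expiresAt;
            this.maxDownloads = maxDownloads;
        }
    }

    private final Map<String, Token> tokens = new ConcurrentHashMap<>();

    String createToken(Duration ttl, int maxDownloads, Instant now) {
        String jti = UUID.randomUUID().toString();
        tokens.put(jti, new Token(now.plus(ttl), maxDownloads));
        return jti;
    }

    /** Read-only check; does not count a download. */
    Result validateToken(String jti, Instant now) {
        Token t = tokens.get(jti);
        if (t == null) {
            return Result.NOT_FOUND;
        }
        synchronized (t) {
            return check(t, now);
        }
    }

    /** Validates and, on success, counts one download under the same lock. */
    Result consumeDownload(String jti, Instant now) {
        Token t = tokens.get(jti);
        if (t == null) {
            return Result.NOT_FOUND;
        }
        synchronized (t) {
            Result r = check(t, now);
            if (r == Result.OK) {
                t.downloadCount++;
            }
            return r;
        }
    }

    void revoke(String jti, Instant now) {
        Token t = tokens.get(jti);
        if (t != null) {
            synchronized (t) {
                t.revokedAt = now;
            }
        }
    }

    private static Result check(Token t, Instant now) {
        if (t.revokedAt != null) return Result.REVOKED;
        if (!now.isBefore(t.expiresAt)) return Result.EXPIRED;
        if (t.downloadCount >= t.maxDownloads) return Result.EXHAUSTED;
        return Result.OK;
    }
}
```

Passing `now` explicitly keeps expiry behavior easy to unit test in `EmailLinkServiceImplTest` without sleeping or mocking the clock.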

### 1.4 Configuration
Add to `application.yml` (default ATTACHMENT mode, all values from plan):
```yaml
cbm:
  email:
    delivery:
      mode: ATTACHMENT
    link:
      standard:
        ttl-hours: 24
        max-downloads: 2
      sensitive:
        ttl-hours: 12
        max-downloads: 1
      include-uploads-zip: true
      internal-domain-allowlist: carlsonbuilding.com
    otp:
      ttl-minutes: 5
      max-attempts: 5
      grant-window-minutes: 20
      sensitive-grant-window-minutes: 15
    rate-limit:
      per-minute:
        standard: 10
        sensitive: 5
```
- Bind via an `EmailLinkProperties` class (`@ConfigurationProperties("cbm.email")`).

### 1.5 Tests
- Repository smoke tests.
- `EmailLinkServiceImplTest` for token lifecycle.
- `EmailOtpServiceImplTest` for issue/verify/lockout/grant.
- `FormFileAssetServiceImplTest` for category lookups.

### 1.6 Exit Criteria
- All migrations apply cleanly on `dev`.
- Unit tests green: `./mvnw test`.
- No behavior change in form submission flow.

---

## Phase 1.5 - Storage Confidentiality (Envelope Encryption + Azure Key Vault)

Goal: Encrypt all stored PDFs and uploads at rest using AES-256-GCM with per-file Data Encryption Keys (DEKs) that are wrapped by a Key Encryption Key (KEK) held in Azure Key Vault. Decryption happens only in memory while streaming to an authenticated user. This defeats offline disk theft, limits the blast radius of a server breach, and provides independent Key Vault auditing and revocation.

### 1.5.1 Threat model
- **In scope:** offline disk access, accidental filesystem exposure, backup leakage, insider with read-only filesystem access, fast detection of abnormal decrypt activity, kill-switch for incident response.
- **Out of scope (acknowledged limit):** an attacker with full root on the running app server can impersonate the service identity and call Vault `unwrapKey`. Vault still logs and throttles that activity, which is what makes detection and a kill-switch response possible.

### 1.5.2 Cryptographic design
- **DEK:** random 256-bit AES key per file. Generated server-side via `SecureRandom`.
- **Cipher:** AES-256-GCM. Random 96-bit nonce per file. Authenticated.
- **File on disk layout:** random UUID filename; bytes are `nonce || ciphertext || authTag`. No plaintext, no original filename in path.
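- **KEK:** RSA or AES key stored in Azure Key Vault (note: symmetric AES keys and `A256KW` are, as far as we know, available only on Managed HSM; standard vaults would use RSA). Used only for `wrapKey` / `unwrapKey`. Never leaves Vault.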
- **DEK at rest:** wrapped DEK + `kek_key_id` + `kek_version` stored in DB row (`form_file_asset`).
- **Streaming reads:** unwrap DEK via Vault -> create `CipherInputStream` -> stream to HTTP response -> zero DEK from memory after stream completes.
- **No temp files.** All decrypted bytes flow through in-memory buffers.
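
The on-disk layout above (`nonce || ciphertext || authTag`) can be sketched with the standard JCE API. This is a minimal whole-buffer illustration under stated assumptions: the class and method names are hypothetical, and the planned service would stream rather than hold the entire file in a byte array.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of per-file AES-256-GCM encryption (hypothetical helper, not the planned service). */
final class GcmFileLayoutSketch {
    static final int NONCE_BYTES = 12; // 96-bit nonce, fresh per file
    static final int TAG_BITS = 128;   // the JCE appends the 16-byte tag to the ciphertext

    private static final SecureRandom RNG = new SecureRandom();

    /** Random 256-bit DEK. */
    static byte[] newDek() {
        byte[] dek = new byte[32];
        RNG.nextBytes(dek);
        return dek;
    }

    /** Returns nonce || ciphertext || authTag. */
    static byte[] encrypt(byte[] dek, byte[] plaintext) throws GeneralSecurityException {
        byte[] nonce = new byte[NONCE_BYTES];
        RNG.nextBytes(nonce);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(dek, "AES"),
                new GCMParameterSpec(TAG_BITS, nonce));
        byte[] ciphertextAndTag = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(nonce.length + ciphertextAndTag.length)
                .put(nonce).put(ciphertextAndTag).array();
    }

    /** Reads the nonce from the first 12 bytes; throws AEADBadTagException on tampering. */
    static byte[] decrypt(byte[] dek, byte[] blob) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(dek, "AES"),
                new GCMParameterSpec(TAG_BITS, blob, 0, NONCE_BYTES));
        return cipher.doFinal(blob, NONCE_BYTES, blob.length - NONCE_BYTES);
    }

    /** Best-effort zeroization of key material after use. */
    static void zero(byte[] key) {
        Arrays.fill(key, (byte) 0);
    }
}
```

Because the tag is verified only in `doFinal`, a caller must treat any decrypted bytes as untrusted until the call returns without an exception.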

### 1.5.3 Schema additions (Flyway `V26__file_asset_encryption.sql`)
Add to `form_file_asset` (created in Phase 1.1):
- `dek_wrapped` (`BYTEA` / `BLOB`)
- `dek_alg` (`VARCHAR`, e.g., `AES-256-GCM`)
- `nonce` (`BYTEA`, 12 bytes)
- `kek_key_id` (`VARCHAR`, full Vault key identifier)
- `kek_version` (`VARCHAR`, version GUID from Vault)
- `wrap_alg` (`VARCHAR`, e.g., `RSA-OAEP-256` or `A256KW`)
- `encrypted` (`BOOLEAN`, default `false`)

Note: this is a separate additive migration (not folded into Phase 1.1) so that existing rows remain valid until backfill.

### 1.5.4 Code additions
- `services/security/KeyProvider` (interface)
  - `wrapKey(byte[] dek) -> WrappedDek`
  - `unwrapKey(WrappedDek wrapped) -> byte[] dek`
  - `currentKeyVersion()`
- `services/security/impl/AzureKeyVaultKeyProvider`
  - Backed by `azure-security-keyvault-keys` SDK.
  - Uses managed identity in stage/prod; dev uses environment credentials.
- `services/security/impl/LocalKeyProvider`
  - Dev/test only. Reads a KEK from `keys/kek-local.key`.
  - Hard-disabled in stage/prod profiles.
- `services/security/FileEncryptionService`
  - `encryptToFile(InputStream plaintext, Path target) -> EncryptionMetadata`
  - `decryptStream(FormFileAsset asset, OutputStream out)`
  - `decryptInputStream(FormFileAsset asset) -> InputStream`
- Update `FormFileAssetService.registerAsset(...)` to call `FileEncryptionService.encryptToFile(...)` and persist returned `EncryptionMetadata`.
- Update `ZipPackageService` and per-file download endpoint to stream via `FileEncryptionService.decryptInputStream(...)`.
- Add a DEK zeroization helper to clear key bytes after use.
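
A rough sketch of the `KeyProvider` contract and a dev-only local implementation follows. The field layout of `WrappedDek`, the constructor arguments, and the use of the JDK's `AESWrap` cipher (RFC 3394 key wrap) are assumptions for illustration; the plan does not fix how `LocalKeyProvider` wraps keys or how it loads `keys/kek-local.key`.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Sketch only: interface shape from the plan, implementation details assumed. */
final class LocalKeyProviderSketch {

    /** Wrapped DEK plus the metadata stored next to it in form_file_asset. */
    static final class WrappedDek {
        final byte[] bytes;
        final String kekVersion;
        final String wrapAlg;

        WrappedDek(byte[] bytes, String kekVersion, String wrapAlg) {
            this.bytes = bytes;
            this.kekVersion = kekVersion;
            this.wrapAlg = wrapAlg;
        }
    }

    interface KeyProvider {
        WrappedDek wrapKey(byte[] dek) throws GeneralSecurityException;
        byte[] unwrapKey(WrappedDek wrapped) throws GeneralSecurityException;
        String currentKeyVersion();
    }

    /** Dev/test stand-in: wraps with a local 256-bit AES KEK. */
    static final class LocalKeyProvider implements KeyProvider {
        private final SecretKey kek;
        private final String version;

        LocalKeyProvider(byte[] kekBytes, String version) {
            this.kek = new SecretKeySpec(kekBytes, "AES");
            this.version = version;
        }

        @Override
        public WrappedDek wrapKey(byte[] dek) throws GeneralSecurityException {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, kek);
            return new WrappedDek(cipher.wrap(new SecretKeySpec(dek, "AES")), version, "A256KW");
        }

        @Override
        public byte[] unwrapKey(WrappedDek wrapped) throws GeneralSecurityException {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, kek);
            return cipher.unwrap(wrapped.bytes, "AES", Cipher.SECRET_KEY).getEncoded();
        }

        @Override
        public String currentKeyVersion() {
            return version;
        }
    }

    /** Best-effort zeroization; the JVM may still hold copies elsewhere. */
    static void zero(byte[] key) {
        Arrays.fill(key, (byte) 0);
    }
}
```

Callers would typically unwrap, use the DEK inside a `try` block, and call `zero(...)` in `finally` so the key is cleared even when streaming fails.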

### 1.5.5 Configuration additions
Add to `application.yml`:
```yaml
cbm:
  security:
    encryption:
      provider: LOCAL          # LOCAL | AZURE_KEY_VAULT
      azure:
        vault-uri: https://<vault-name>.vault.azure.net/
        kek-key-name: cbm-portal-kek
        # version is resolved dynamically; rotation creates a new version
      local:
        kek-path: keys/kek-local.key   # dev only
      enabled-categories:
        - PDF
        - ID_BADGE
        - GOV_ID_FRONT
        - GOV_ID_REAR
        - SSN_CARD
```
- Stage/prod overrides set `provider: AZURE_KEY_VAULT`.

### 1.5.6 Azure Key Vault setup (one-time)
Operator steps. Substitute actual values for `<RG>`, `<VAULT>`, `<REGION>`, `<APP_IDENTITY>`.

1. Create the vault (purge protection + soft delete enabled):
   ```bash
   az group create -n <RG> -l <REGION>
   az keyvault create \
     --name <VAULT> --resource-group <RG> --location <REGION> \
     --enable-rbac-authorization true \
     --enable-purge-protection true \
     --retention-days 90
   ```
2. Create the KEK (RSA-3072 recommended; HSM-backed if available):
   ```bash
   az keyvault key create \
     --vault-name <VAULT> --name cbm-portal-kek \
     --kty RSA --size 3072 --ops wrapKey unwrapKey
   ```
   For HSM-backed:
   ```bash
   az keyvault key create \
     --vault-name <VAULT> --name cbm-portal-kek \
     --kty RSA-HSM --size 3072 --ops wrapKey unwrapKey
   ```
3. Grant the app's managed identity least-privilege role:
   ```bash
   az role assignment create \
     --assignee <APP_IDENTITY_OBJECT_ID> \
     --role "Key Vault Crypto User" \
     --scope $(az keyvault show -n <VAULT> --query id -o tsv)
   ```
4. Network restrictions:
   - Restrict Vault firewall to the app's outbound IPs / VNet.
   - Deny public network access if app runs in a private VNet.
5. Enable Vault diagnostics:
   - Send `AuditEvent` logs to Log Analytics / SIEM.
   - Alert on:
     - unexpected `unwrapKey` source IP,
     - `unwrapKey` failure spikes,
     - any `delete`/`disable`/`purge` events on the KEK.
6. Production hardening:
   - No client-secret credentials. Use Managed Identity (Azure-hosted) or Workload Identity. Avoid storing Vault secrets on disk.
   - Tag the key with owner, purpose, rotation policy.
7. Rotation policy (recommended): automatic yearly rotation, with a manual review after each rotation:
   ```bash
   az keyvault key rotation-policy update \
     --vault-name <VAULT> --name cbm-portal-kek \
     --value '{"lifetimeActions":[{"trigger":{"timeAfterCreate":"P1Y"},"action":{"type":"Rotate"}}],"attributes":{"expiryTime":"P2Y"}}'
   ```

### 1.5.7 Key rotation behavior (answering the rotation concern)
- Each `form_file_asset` row records `kek_key_id` AND `kek_version`.
- On read, app calls `unwrapKey` against the **stored version**. Old versions remain valid in Vault even after rotation, so files stay readable indefinitely.
- On new write, app uses the **current version**.
- Optional background `KekRewrapJob`:
  - Periodically re-wraps existing DEKs with the latest KEK version.
  - Lets you eventually retire old versions if needed.
- **Rotation does not break access.** **Revoke/disable does**, which makes it the kill switch for incident response.
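
The version-pinning rules above can be modeled with a small in-memory stand-in for the vault. Everything here is hypothetical scaffolding (the class, the `v1`/`v2` version labels, local AES key wrap instead of Vault calls); it only demonstrates that reads use the stored version, writes use the current one, and disabling a version blocks unwrap.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** In-memory model of KEK versions; a stand-in for Key Vault, not a client for it. */
final class KekRotationSketch {

    static final class WrappedDek {
        final byte[] bytes;
        final String kekVersion;

        WrappedDek(byte[] bytes, String kekVersion) {
            this.bytes = bytes;
            this.kekVersion = kekVersion;
        }
    }

    private final Map<String, SecretKey> versions = new HashMap<>();
    private final Set<String> disabled = new HashSet<>();
    private String currentVersion;

    /** Creates a new KEK version and makes it current; old versions stay usable. */
    String rotate() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        currentVersion = "v" + (versions.size() + 1);
        versions.put(currentVersion, gen.generateKey());
        return currentVersion;
    }

    void setEnabled(String version, boolean enabled) {
        if (enabled) disabled.remove(version); else disabled.add(version);
    }

    /** New writes always use the current version. */
    WrappedDek wrap(byte[] dek) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, versions.get(currentVersion));
        return new WrappedDek(cipher.wrap(new SecretKeySpec(dek, "AES")), currentVersion);
    }

    /** Reads use the version recorded on the row; a disabled version blocks access. */
    byte[] unwrap(WrappedDek wrapped) throws GeneralSecurityException {
        if (disabled.contains(wrapped.kekVersion)) {
            throw new IllegalStateException("KEK version disabled: " + wrapped.kekVersion);
        }
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, versions.get(wrapped.kekVersion));
        return cipher.unwrap(wrapped.bytes, "AES", Cipher.SECRET_KEY).getEncoded();
    }

    /** What a KekRewrapJob would do per row: unwrap with the old version, wrap with the current. */
    WrappedDek rewrap(WrappedDek old) throws GeneralSecurityException {
        byte[] dek = unwrap(old);
        try {
            return wrap(dek);
        } finally {
            Arrays.fill(dek, (byte) 0);
        }
    }
}
```

Note that re-wrapping touches only the wrapped DEK in the database row; the encrypted file on disk is never rewritten.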

### 1.5.8 Key revocation behavior (kill switch)
- `az keyvault key set-attributes --enabled false ...` on a specific version blocks unwrap immediately.
- Disabling the whole key blocks all unwraps. Files remain on disk but become indecipherable.
- Recovery: re-enable the version. Soft delete + purge protection prevent accidental permanent loss.
- Document this as the "compromise response" runbook.

### 1.5.9 Backfill of existing files
- Add `EncryptExistingFilesJob` (admin-triggered, idempotent).
- For each unencrypted `form_file_asset`:
  - Read plaintext bytes.
  - Generate new DEK, encrypt to a new file (`<uuid>.enc`).
  - Wrap DEK with current KEK version.
  - Update row: new `storage_path`, encryption metadata, `encrypted = true`.
  - Securely delete old plaintext file (`shred`/secure unlink).
- Run in stage first; require admin confirmation in prod.

### 1.5.10 OS-level defense in depth
- LUKS/dm-crypt on data volume.
- Service account owns data dir, mode `0700`; files `0600`.
- SELinux/AppArmor profile restricts service to its data dir + Vault outbound.
- No login shell for service account.
- Routine backup encryption with separately-managed keys.

### 1.5.11 Tests
- Unit:
  - `FileEncryptionService` round-trips: encrypt then decrypt yields original bytes; tampered ciphertext fails GCM tag check.
  - `LocalKeyProvider` wrap/unwrap parity.
  - `AzureKeyVaultKeyProvider` mocked client wrap/unwrap.
- Integration (against real Vault, stage only):
  - End-to-end submit -> encrypted file written -> authenticated download decrypts and streams.
  - Rotation: write file with KEK v1; rotate to v2; existing file still reads; new file uses v2.
  - Revocation: disable v1; existing v1-wrapped file now fails with `403` + audit row.
- Security:
  - Plaintext never appears on disk (instrument by failing the test if any temp file path is created).
  - Memory zeroization on DEK after stream close (sampled).

### 1.5.12 Exit Criteria
- Stage uses `AZURE_KEY_VAULT` provider; production config staged but disabled until rollout.
- All new submissions write encrypted files with valid metadata.
- Backfill of existing files completed in stage; runbook prepared for prod.
- Vault audit logs flowing to SIEM with alerts configured.
- Documented runbooks for rotation and revocation.

### 1.5.13 Operator Runbook
See `deployment/KEY_VAULT_RUNBOOK.md` for step-by-step procedures covering:
- Routine yearly rotation
- Emergency disable (kill switch) and re-enable
- Retiring old KEK versions (re-wrap)
- Restoring a soft-deleted key
- Backfilling encryption for legacy files
- Health checks and audit log queries
- Incident response playbooks

For one-time environment provisioning, see `deployment/KEY_VAULT_SETUP.md`.

---

## Phase 2 - Asset Registration in Form Pipeline

Goal: When forms are processed, persist PDF (and uploads where applicable) as `form_file_asset` rows. Still no email changes.

### 2.1 Wire registration
- In `AbstractFormProcessingService.processForm`, after PDF generation, call `formFileAssetService.registerAsset(...)` with `fileCategory = PDF`.
- In `NewHireServiceImpl.processNewHire`, after files are saved, register:
  - `PDF`, `ID_BADGE`, `GOV_ID_FRONT`, `GOV_ID_REAR`, `SSN_CARD`.

### 2.2 Backfill (optional)
- Add a `@Profile("dev")` admin endpoint or one-off command to backfill existing PDFs into `form_file_asset` if needed for testing only. Do not run in stage/prod yet.

### 2.3 Tests
- Integration test: submitting a form creates expected `form_file_asset` rows.
- `NewHire` submission creates all 5 categories.

### 2.4 Exit Criteria
- All existing form submission tests still pass.
- New asset rows appear on dev submissions.
- No mail behavior change.

---

## Phase 3 - Standard PDF Link Delivery (DUAL mode)

Goal: Add tokenized link to emails for non-`NewHire` forms while still attaching the PDF.

### 3.1 Email body builder
- Add `services/util/EmailLinkBodyBuilder` (or extend `HtmlFragmentBuilder`):
  - Renders link section: "Download PDF" -> tokenized URL.
  - Adds internal-use banner.

### 3.2 Update `AbstractFormProcessingService.sendEmailNotification`
- If `cbm.email.delivery.mode` in (`LINK`, `DUAL`):
  - Create token via `EmailLinkService` (standard sensitivity).
  - Append link block to email body.
- If mode is `LINK`: send without attachment.
- If mode is `DUAL`: send with attachment AND link.
- If mode is `ATTACHMENT`: existing behavior.
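
One way to keep the three-way branch above out of the service body is to encode it on the enum itself. This is a sketch with assumed names (`attachPdf`, `includeLink`, `parse`); falling back to `ATTACHMENT` for a missing value is also an assumption, chosen to match the stated default.

```java
import java.util.Locale;

/** Sketch: delivery-mode decisions expressed as enum properties (hypothetical names). */
final class DeliveryModeSketch {

    enum DeliveryMode {
        ATTACHMENT(true, false), // existing behavior
        DUAL(true, true),        // attachment AND link
        LINK(false, true);       // link only

        final boolean attachPdf;
        final boolean includeLink;

        DeliveryMode(boolean attachPdf, boolean includeLink) {
            this.attachPdf = attachPdf;
            this.includeLink = includeLink;
        }
    }

    /** Parses the cbm.email.delivery.mode value; blank or missing falls back to ATTACHMENT. */
    static DeliveryMode parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DeliveryMode.ATTACHMENT;
        }
        return DeliveryMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }
}
```

With this shape, `sendEmailNotification` would only need to check `mode.includeLink` before creating a token and `mode.attachPdf` before adding the attachment.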

### 3.3 Endpoints
Reuse existing admin endpoint:
- `GET /api/v1/admin/forms/{formType}/{formToken}/pdf`
  - Update controller to honor `EmailAccessToken` semantics if token is an email link token (validate, increment download count, audit).
  - Existing admin tokenized URL behavior preserved for the admin UI.

### 3.4 Auditing
- On every PDF download, write to `email_access_audit`.
- Include correlation id.

### 3.5 Stage rollout
- Set `cbm.email.delivery.mode=DUAL` in `application-stage.yml`.
- Validate emails show both attachment and link.

### 3.6 Tests
- Unit: body builder, token issuance per submission.
- Integration: submission in `DUAL` mode -> email contains link + attachment; submission in `LINK` mode -> attachment removed.
- Security: tampered token returns `404`.

### 3.7 Exit Criteria
- DUAL mode works in stage with no regressions.
- Audit rows generated for downloads.
- Office users confirm UX is acceptable.

---

## Phase 4 - NewHire Link Package + Step-up

Goal: Switch `NewHire` to link-only delivery with uploads ZIP and mandatory OTP step-up.

### 4.1 Endpoints (new)
- `GET /api/v1/admin/forms/{formType}/{formToken}/uploads.zip`
  - `ZipPackageService` streams uploads only (no PDF).
- `GET /api/v1/admin/forms/{formType}/{formToken}/files/{fileCategory}`
  - Authorized per-file download for any registered category.
- `POST /api/v1/admin/forms/{formType}/{formToken}/otp/request`
- `POST /api/v1/admin/forms/{formType}/{formToken}/otp/verify`
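
For the `uploads.zip` route, the streaming behavior of `ZipPackageService` might look roughly like the sketch below. The `StreamSource` interface and entry naming are assumptions; in the planned design each source would be a decrypting stream obtained from `FileEncryptionService.decryptInputStream(...)`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Sketch: stream several inputs into one ZIP without temp files (hypothetical helper). */
final class ZipStreamSketch {

    /** Lazily opens one entry's content, e.g. a decrypting stream for an asset. */
    @FunctionalInterface
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /** Writes each entry in iteration order; the caller keeps ownership of {@code out}. */
    static void streamZip(Map<String, StreamSource> entries, OutputStream out) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        byte[] buffer = new byte[8192];
        for (Map.Entry<String, StreamSource> entry : entries.entrySet()) {
            zip.putNextEntry(new ZipEntry(entry.getKey()));
            try (InputStream in = entry.getValue().open()) {
                int read;
                while ((read = in.read(buffer)) != -1) {
                    zip.write(buffer, 0, read);
                }
            }
            zip.closeEntry();
        }
        zip.finish(); // writes the central directory but leaves the response stream open
        zip.flush();
    }
}
```

Entry names should come from server-side data such as the file category, never from the original upload filename, to avoid leaking or trusting user-supplied paths.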

### 4.2 OTP enforcement
- Add a method security check (e.g., `@RequiresStepUp("SENSITIVE")`) that:
  - Looks up active grant via `EmailOtpService.hasActiveGrant(...)`.
  - Returns `403` with structured `ErrorResponse` if missing/expired.
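
The state that the step-up check consults can be sketched as a single challenge object. This is an illustrative model, not the planned `EmailOtpService`: the class name and method signatures are assumptions, and the unsalted SHA-256 of the code is used only for brevity (a keyed hash would be a better fit for a 6-digit code).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;

/** Sketch of one OTP challenge: attempts, expiry, and the grant window (hypothetical names). */
final class OtpChallengeSketch {
    private static final SecureRandom RNG = new SecureRandom();

    private final byte[] codeHash;
    private final Instant expiresAt;
    private final int maxAttempts;
    private final Duration grantWindow;
    private int attempts;
    private Instant grantedUntil;

    private OtpChallengeSketch(byte[] codeHash, Instant expiresAt, int maxAttempts, Duration grantWindow) {
        this.codeHash = codeHash;
        this.expiresAt = expiresAt;
        this.maxAttempts = maxAttempts;
        this.grantWindow = grantWindow;
    }

    /** Random 6-digit code, zero padded. */
    static String newCode() {
        return String.format("%06d", RNG.nextInt(1_000_000));
    }

    static OtpChallengeSketch issue(String code, Instant now, Duration ttl, int maxAttempts, Duration grantWindow) {
        return new OtpChallengeSketch(sha256(code), now.plus(ttl), maxAttempts, grantWindow);
    }

    /** Counts the attempt, then compares hashes in constant time; success opens the grant window. */
    synchronized boolean verify(String code, Instant now) {
        if (attempts >= maxAttempts || !now.isBefore(expiresAt)) {
            return false; // locked out or expired
        }
        attempts++;
        if (!MessageDigest.isEqual(codeHash, sha256(code))) {
            return false;
        }
        grantedUntil = now.plus(grantWindow);
        return true;
    }

    /** What a step-up check would consult before serving a sensitive file. */
    synchronized boolean hasActiveGrant(Instant now) {
        return grantedUntil != null && now.isBefore(grantedUntil);
    }

    private static byte[] sha256(String value) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }
}
```

With the configured values (`ttl-minutes: 5`, `max-attempts: 5`, `sensitive-grant-window-minutes: 15`), a correct code entered one minute after issue would grant access until sixteen minutes after issue.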

### 4.3 NewHire email switch
- Update `NewHireServiceImpl.sendEmailNotification`:
  - Compose link-only body: PDF link + uploads ZIP link + per-file links.
  - Drop attachments when mode is `LINK` for `NewHire`.

### 4.4 Landing/UX
- Frontend renders a package page:
  - List of available items.
  - OTP prompt for sensitive items.
  - Clear expiry/usage messaging.
- Coordinate UI ticket with frontend team.

### 4.5 Rate limiting & security
- Apply per-token + per-IP limits (`sensitive` profile).
- Path traversal protection in file serving (use canonical paths from `form_file_asset.storage_path`).
- Reject filename traversal from `fileCategory` (whitelist enum).
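
The two file-serving checks above could be sketched as follows. The class and method names are hypothetical, and the containment check here is lexical (`normalize`); resolving symlinks with `toRealPath()` would be an additional step once the file is known to exist.

```java
import java.nio.file.Path;
import java.util.Optional;

/** Sketch: enum whitelist for fileCategory plus a containment check for storage paths. */
final class SafeFileResolverSketch {

    /** Categories named in the plan; anything else is rejected before touching the filesystem. */
    enum FileCategory { PDF, ID_BADGE, GOV_ID_FRONT, GOV_ID_REAR, SSN_CARD }

    /** Maps the raw path variable to a known category, or empty if it is not whitelisted. */
    static Optional<FileCategory> parseCategory(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(FileCategory.valueOf(raw));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    /** Resolves a stored path under the data directory and rejects anything that escapes it. */
    static Path resolveInside(Path dataDir, String storagePath) {
        Path base = dataDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(storagePath).normalize();
        if (!candidate.startsWith(base)) {
            throw new SecurityException("storage path escapes data directory");
        }
        return candidate;
    }
}
```

The `fileCategory` value should only ever select a `form_file_asset` row; the path itself comes from the row's `storage_path`, never from the request.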

### 4.6 Tests
- Integration: full `NewHire` flow under `LINK`:
  - email contains links, no attachments;
  - download requires OTP;
  - valid OTP grants window;
  - exhausted attempts lock out;
  - ZIP and per-file routes both audited.
- Security: replay of consumed token rejected; revoked token rejected.

### 4.7 Exit Criteria
- `NewHire` delivered with no attachments in stage for a full week.
- Audit + alerts show healthy patterns.
- No outstanding security findings.

---

## Phase 5 - Cutover to LINK Mode (All Forms)

Goal: Default delivery becomes `LINK`.

### 5.1 Stage validation gate
- DUAL mode metrics reviewed:
  - Token issuance error rate < 0.1%.
  - PDF download success rate >= 99%.
  - OTP success rate >= 95% on `NewHire`.
- No unresolved P1 incidents in the preceding 7 days.

### 5.2 Flip default
- Set `cbm.email.delivery.mode=LINK` in `application-stage.yml`, then `application.yml` (prod profile).
- Communicate change to users prior to flip.

### 5.3 Observability
- Dashboards: token lifecycle counters, OTP failure rate, ZIP requests, rate-limit hits.
- Alerts: spike in `OTP_VERIFY_FAIL`, repeated `404` token attempts, ZIP errors.

### 5.4 Exit Criteria
- Production stable for 14 days under `LINK`.
- Office adoption confirmed by usage telemetry (links clicked, OTP completed).

---

## Phase 6 - Cleanup and Hardening

Goal: Remove dead paths and tighten defaults if metrics support it.

### 6.1 Code cleanup
- Remove or feature-flag-off attachment code paths in `AbstractFormProcessingService` and `NewHireServiceImpl` once `LINK` is stable.
- Keep `MicrosoftGraphEmailService.sendEmail(..., attachments)` API for break-glass scenarios; mark as `@Deprecated` if no longer routinely used.

### 6.2 Retention & cleanup jobs
- Scheduled job to:
  - Soft-delete expired `email_access_token` rows after grace period.
  - Purge old `email_access_audit` per retention policy.
  - Optionally archive uploads after defined retention.

### 6.3 Optional tightening
- Reduce standard `max-downloads` from `2` to `1` after adoption data.
- Reduce sensitive `ttl-hours` from `12` to `8` if support load allows.

### 6.4 Exit Criteria
- Dead code removed or clearly flagged.
- Retention job in place and verified.
- Documentation updated (`CHANGELOG.md`, integration guides).

---

## Cross-Phase Concerns

### Backward Compatibility
- All migrations are additive.
- `pdf_file_path` and `email_status` columns retained for existing flows.
- Admin UI tokenized PDF URLs unchanged in path shape (same `/api/v1/admin/forms/{type}/{token}/pdf`).

### Rollback Strategy
- Each phase reverts by flipping the feature flag back (`LINK` -> `DUAL` -> `ATTACHMENT`).
- Migrations are forward-compatible; no destructive changes until Phase 6.

### Security Review Checkpoints
- Phase 1: data model + token claims review.
- Phase 1.5: cryptographic design + Key Vault policy review.
- Phase 3: stage review of DUAL mode emails (no PII leakage in body).
- Phase 4: full security review of NewHire flow (OTP, ZIP, path traversal, rate limits).
- Phase 5: pre-prod sign-off.

### Documentation Updates
- After Phase 3: update `PDF_EMAIL_FRONTEND_INTEGRATION.md` with link/audit semantics.
- After Phase 4: add `NEWHIRE_LINK_PACKAGE_INTEGRATION.md` for frontend package page.
- After Phase 5: update `CHANGELOG.md` and release notes.

---

## Quick Phase Summary

| Phase | Theme                          | User Impact          | Default Mode |
|------:|--------------------------------|----------------------|--------------|
| 0     | Pre-flight                     | None                 | ATTACHMENT   |
| 1     | Foundations (data + services)  | None                 | ATTACHMENT   |
| 1.5   | Storage confidentiality (KMS)  | None                 | ATTACHMENT   |
| 2     | Asset registration             | None                 | ATTACHMENT   |
| 3     | Standard link delivery (DUAL)  | Link added to emails | DUAL         |
| 4     | NewHire link package + OTP     | NewHire link-only    | DUAL / LINK  |
| 5     | Cutover to LINK                | All forms link-only  | LINK         |
| 6     | Cleanup + retention            | Internal hardening   | LINK         |

---

## First Sprint Task Checklist (Phase 1)

- [ ] Create `V22..V25` Flyway migrations (`V26` for Phase 1.5 encryption).
- [ ] Add `FormFileAsset`, `EmailAccessToken`, `EmailAccessAudit`, `EmailOtpChallenge` entities and repos.
- [ ] Implement `EmailLinkService`, `EmailOtpService`, `FormFileAssetService`, `ZipPackageService`.
- [ ] Add `EmailLinkProperties` and wire defaults.
- [ ] Unit tests for token + OTP lifecycle.
- [ ] Update `AGENTS.md` references if needed.
- [ ] Verify `./mvnw clean test` passes.






