[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83901":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":45,"readmeContent":46,"aiSummary":10,"trendingCount":16,"starSnapshotCount":16,"syncStatus":14,"lastSyncTime":47,"discoverSource":48},83901,"s4","abyo-software\u002Fs4","abyo-software","GPU-accelerated transparent compression S3-compatible storage gateway. Drop-in replacement for AWS S3 endpoints; cuts your S3 bill 50-80% with no app changes (Rust, nvCOMP, zstd).","https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4",null,"Rust",76,5,2,1,0,3,25,20,2.33,"Apache License 2.0",false,"main",true,[26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],"apache-2","aws-s3","compression","cost-optimization","cuda","data-lake","gpu","minio","nvcomp","observability","opentelemetry","parquet","prometheus","rust","s3","s3-compatible","storage-gateway","transparent-compression","zstd","2026-06-12 02:04:36","# S4 — Squished S3\n\n[![CI](https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4\u002Factions\u002Fworkflows\u002Fci.yml)\n[![Nightly Fuzz](https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4\u002Factions\u002Fworkflows\u002Ffuzz-nightly.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4\u002Factions\u002Fworkflows\u002Ffuzz-nightly.yml)\n[![AWS 
E2E](https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4\u002Factions\u002Fworkflows\u002Faws-e2e.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4\u002Factions\u002Fworkflows\u002Faws-e2e.yml)\n[![License: Apache-2.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache--2.0-blue.svg)](LICENSE)\n[![Rust](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Frust-1.92%2B-orange.svg)](https:\u002F\u002Fwww.rust-lang.org)\n\n> **Drop-in S3-compatible storage gateway with GPU-accelerated transparent compression.**\n> Reduces S3 **storage bytes** 50–80% for compressible payloads (logs, JSON,\n> Parquet\u002FORC) without changing application code. Total bill impact depends on\n> workload mix — request cost \u002F egress \u002F GPU compute are unchanged.\n\n**Headline numbers** (RTX 4070 Ti SUPER + Ryzen 9 9950X, single-pass roundtrip\nthrough `s4-codec`, last benchmarked 2026-05-13 on nvCOMP 5.2.0.10 \u002F CUDA\n13.2 driver 595.58.03; full table + reproduction recipe below):\n\n| Workload | Best ratio | Best compress throughput | Codec verdict |\n|---|---:|---:|---|\n| nginx access log (256 MiB)   | **155×** (cpu-zstd-3) | 3.7 GB\u002Fs (cpu-zstd-3) | CPU wins — text deduplicates well at low CPU cost |\n| Parquet-like mixed (256 MiB) | **2.09×** (nvcomp-bitcomp) | 1.5 GB\u002Fs (nvcomp-bitcomp) | GPU wins on Bitcomp for integer\u002Fcolumnar layouts |\n| Postings (u32, 64 MiB)       | **11.9×** (nvcomp-bitcomp) | 1.6 GB\u002Fs (nvcomp-bitcomp) | GPU wins decisively on monotonic integer columns |\n| Already-compressed (64 MiB)  | 1.00× (passthrough)  | 2.2 GB\u002Fs (passthrough)| Dispatcher detects + skips — no codec cost |\n\n**Codec selection is not always GPU** (#96 #97). 
The dispatcher samples\nentropy + magic bytes and routes per object:\n\n- text \u002F log → `cpu-zstd-3` (often beats GPU codecs both on ratio AND\n  throughput at the input size where everything fits in L3)\n- columnar integers (Parquet \u002F postings \u002F time-series) →\n  `nvcomp-bitcomp` (GPU's strength on integer\u002Fcolumnar layouts).\n  Two modes:\n  - explicit: `--codec nvcomp-bitcomp` always picks Bitcomp regardless\n    of sample content\n  - automatic: `--prefer-columnar-gpu` (opt-in) lets the sampling\n    dispatcher detect a u32 \u002F u64 LE integer column via per-stride\n    byte-position entropy and route to Bitcomp once the body is\n    `>= --gpu-min-bytes`. Default is off so v0.8.11-or-earlier\n    deployments are bit-for-bit unchanged\n- already-compressed (mp4 \u002F jpeg \u002F parquet-with-zstd-block-codec \u002F `.gz`\n  detected by magic byte) → `passthrough` (no harm done)\n- non-GPU build OR no GPU at runtime → CPU codecs end-to-end\n\nObserve the per-codec request distribution via the PromQL query\n`sum by (codec) (rate(s4_requests_total[5m]))` (the `codec` label on the\n`s4_requests_total` Prometheus counter carries the chosen codec name),\nor per-PUT in the structured JSON access log\n(`{\"codec_chosen\":\"...\"}`). GPU is a multiplier on the *integer\u002Fcolumnar*\nside of mixed workloads, not a blanket \"compress with GPU\" claim.\n\nTranslated to AWS S3 Standard at $0.023\u002FGB\u002Fmonth: **1 TiB of nginx log\ndata → ~6.6 GiB stored → $0.15\u002Fmonth vs $23.55\u002Fmonth uncompressed (99%\nstorage savings, single-pass)**. Mixed-content Parquet workloads see ~50%\nstorage savings.\n\n**What this number does and doesn't cover** (#95): storage-bytes only.\nPUT\u002FGET request cost is unchanged (1 PUT in = 1 PUT out, plus a small\n`.s4index` sidecar PUT for indexed range-read). Egress is unchanged\n(GET serves the decompressed payload). GPU compute is a separate cost\n(e.g. 
EC2 g4dn \u002F g5 hourly) — pays for itself on TB-scale, not GB-scale,\ningest. See [Cost savings — does S4 make sense for your bill?](#cost-savings-does-s4-make-sense-for-your-bill) below for the\nbreak-even maths.\n\n---\n\n## What is S4?\n\nS4 (**Squished S3**) is an S3-compatible storage gateway written in Rust that\nsits between your applications (boto3 \u002F aws-sdk \u002F aws-cli \u002F Spark \u002F Trino \u002F\nDuckDB \u002F anything S3) and your real S3 bucket — and **transparently compresses\neach object** with a codec the dispatcher picks per-payload: GPU\n(NVIDIA nvCOMP zstd \u002F Bitcomp \u002F GDeflate) for integer\u002Fcolumnar data, CPU\nzstd \u002F gzip for text\u002Flog, passthrough (no codec cost) for already-compressed\ninputs. See [the codec verdict table](#headline-numbers) above for the routing rules.\n\n```\n                        endpoint: s4.example.com\n   your application ──────────────────────────▶  S4 (this project)\n   (boto3, Spark,                                       │\n    Trino, ...)                                         ▼\n                                            (compress with GPU)\n                                                        │\n                                                        ▼\n                                                 AWS S3 (real bucket)\n```\n\n- **No app changes**: same S3 wire protocol, same SigV4 auth, same SDK calls\n- **Transparent**: PUT compresses, GET decompresses; clients see the original bytes\n- **Open format, no lock-in**: stop the gateway and the **compressed\n  objects + S4IX sidecars remain S3-native** — readable by stock `aws-cli`\n  \u002F boto3 \u002F any S3 client. The **original payload** then requires\n  `s4-codec` (CLI tool), `s4-codec-py` (pip), or `s4-codec-wasm` (browser)\n  to decompress — all Apache-2.0, ~1k LOC of pure decode, no gateway runtime\n  needed. 
The wire format (S4F2 frame + S4IX sidecar) is documented in\n  the source: [`crates\u002Fs4-codec\u002Fsrc\u002Fmultipart.rs`](crates\u002Fs4-codec\u002Fsrc\u002Fmultipart.rs) (frame layout) and\n  [`crates\u002Fs4-codec\u002Fsrc\u002Findex.rs`](crates\u002Fs4-codec\u002Fsrc\u002Findex.rs) (sidecar layout)\n\n## Why S4?\n\n| Problem | Solution |\n|---|---|\n| Your S3 bill grows linearly with data, but most data is ≥3× compressible | S4 compresses on the way in, charging you only for the squished bytes |\n| Your apps don't compress data themselves (and you don't want to change them) | S4 is a wire-compatible drop-in — just change `--endpoint-url` |\n| Existing object-storage compressors (MinIO S2, Garage zstd) are CPU-only | S4 supports nvCOMP **GPU** codecs — Bitcomp gives 3.6–7.5× on integer columns |\n| Analytics workloads need byte-range reads | S4 supports `Range` GET via sidecar frame index (parquet\u002FORC reader compatible) |\n\n## Stability — v1.0 guarantees\n\nS4 ships under the [Semantic Versioning](https:\u002F\u002Fsemver.org\u002Fspec\u002Fv2.0.0.html)\ncontract as of **v1.0.0**. That means the items below are stable for the\nv1.x line — any incompatible change to them ships under a **v2.0.0**\nrelease with migration guidance, not a v1.x patch.\n\n### What's stable (= v2.0 if broken)\n\n| Surface | Frozen at v1.0 |\n|---|---|\n| **Wire formats** on the backend | `S4F2` framed body + `S4P1` padding (multipart + single-PUT framed objects); `S4IX` v1 \u002F v2 \u002F v3 sidecar layouts; `S4E1` \u002F `S4E2` \u002F `S4E3` \u002F `S4E4` \u002F `S4E5` \u002F `S4E6` SSE envelopes. A v1.x reader can read any byte stream another v1.x server has written, in either direction. 
**Cross-major back-compat caveats:** (a) v0.8.x readers handle `S4IX` v1 \u002F v2 but return `UnsupportedVersion(3)` on v3 sidecars (introduced in v0.9 #106 for SSE-S4 chunked \u002F `S4E6` partial-fetch); deployments without an SSE-S4 keyring configured (= `--sse-s4-key*` flags unset) never emit v3 sidecars and are bidirectionally compatible with v0.8.x. The default `--sse-chunk-size` is 1 MiB and IS active whenever SSE-S4 is enabled, so SSE-S4 deployments DO emit v3 by default. (b) `S4E6` was introduced in v0.8.1 (commit `a7333f2`), so any v0.8.1+ reader recognizes it; only the v0.8.0 hot-fix line lacks `S4E6` support and would refuse SSE-S4 chunked objects. (c) v0.8.x server binaries can still read all v1.0-written framed bodies + v1\u002Fv2 sidecars + S4E1–S4E5 envelopes — the only cross-major refusals are the two listed above. |\n| **`s4` binary subcommands** (CLI surface) | `verify-sidecar`, `repair-sidecar`, `sweep-orphan-sidecars`, `verify-audit-log`, plus the long-running server's documented `--\u003Cflag>` set. New flags are additive (default off). |\n| **`s4_server::repair::*` public API** | `verify_sidecar`, `repair_sidecar` (and the `_with_keyring` variant), `sweep_orphan_sidecars`. Types: `RepairError`, `SidecarStatus`, `RepairReport`, `OrphanReason`, `OrphanReport`, `SweepReport`, `VerifyReport`, `DeletePolicy`, `RepairSseBinding`. All public enums in this module are `#[non_exhaustive]` — adding a new variant in a minor release is **not** breaking (downstream `match` must use a catch-all arm). Public structs (`RepairReport`, `OrphanReport`, `SweepReport`, `VerifyReport`, `RepairSseBinding`) are NOT `#[non_exhaustive]`; their public field set is frozen as-is, additions to those structs are v2.0 territory. Library consumers can pin `s4-server = \"1\"` and rebuild against any v1.x without code changes. |\n| **`s4_server::service::S4Service` shape** | The `S4Service` struct itself, its `Default` impl, and its builder API are frozen. 
The builder API is the long-form `S4Service::default().with_\u003Cknob>(value)...` chain — every `pub fn with_*` currently visible on `S4Service` (e.g. `with_sse_key`, `with_sse_keyring`, `with_sse_chunk_size`, `with_secure_transport`, `with_trust_x_forwarded_for`, `with_max_body_bytes`, `with_sigv4a_gate`, `with_kms_backend`, `with_replication`, `with_replication_max_concurrent`, `with_versioning`, `with_object_lock`, `with_mfa_delete`, `with_cors`, `with_lifecycle`, `with_inventory`, `with_notifications`, `with_tagging`, `with_policy`, `with_access_log`, `with_rate_limits`, `with_compliance_strict`, `with_allow_legacy_reserved_key_reads`) is locked to its current `fn(self, …) -> Self` signature; renames or signature changes ship under v2.0. **Adding** a new `with_\u003Cknob>` builder is additive (ships in a minor). The `SharedService` newtype at `s4_server::service_arc::SharedService` (the externally-supported \"wrap an `S4Service` for clone-able shared use\" path), `SigV4aGate` + `SigV4aGateError`, `resolve_range`, the `DEFAULT_MAX_BODY_BYTES` + `DEFAULT_REPLICATION_MAX_CONCURRENT` constants, and the wrapping pattern (`Arc\u003CS4Service>` is the supported handle shape) are frozen. Implementation internals behind `S4Service` (request routing, multipart state, etc.) remain refactorable as long as the listed surface stays bit-equivalent at the call site. |\n| **`s4_server::sse` public surface** | Types: `SseKey`, `SseKeyring`, `SharedSseKey` (= `Arc\u003CSseKey>`, parameter type of `S4Service::with_sse_key`), `SharedSseKeyring` (= `Arc\u003CSseKeyring>`), `SseError`, `SseSource\u003C'a>`, `S4E6Header\u003C'a>` (return type of `parse_s4e6_header`). Functions: `compute_key_md5`, `encrypt`, `decrypt`, `encrypt_v2`, `parse_s4e6_header`, `peek_magic`. Constants: `SSE_C_ALGORITHM`, `ALGO_AES_256_GCM`, `SSE_MAGIC_V5`, `S4E5_HEADER_BYTES`, `S4E6_HEADER_BYTES`. New SSE envelopes (e.g. 
provisional `S4E7` chunked-KMS) ship as **additive** symbols and do not break the v1.x contract. |\n| **`s4_server::streaming` public surface** | `DEFAULT_S4F2_CHUNK_SIZE` constant, `streaming_compress_to_frames` + `streaming_compress_to_frames_with` functions. The `StreamingBlob` type alias remains stable. |\n| **`s4-codec` codec trait + format constants** | `Codec` trait shape, `CodecKind` enum (all `#[non_exhaustive]`), `CodecError`, `IndexError`, `FrameError`, `GpuSelectError`, `CompareOp`. Constants: `index::{SIDECAR_SUFFIX, MAX_FRAMES, MAX_ETAG_BYTES, ENTRY_BYTES, HEADER_FIXED_V1, HEADER_FIXED_V2, INDEX_VERSION, INDEX_VERSION_V1, INDEX_VERSION_V2}`. Items: `index::{FrameIndex, encode_index, decode_index}`, `multipart::FrameHeader` layout. Python (`s4-codec-py`) and WASM (`s4-codec-wasm`) bindings export the same surface and are frozen at v1.0 in lockstep. |\n| **HTTP API surface** | S3 wire compatibility — the [`s3s 0.13`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fs3s\u002F0.13.0) trait set S4 implements. PUT \u002F GET \u002F Range GET \u002F multipart \u002F SigV4 \u002F SigV4a \u002F `x-amz-checksum-*` \u002F `x-amz-server-side-encryption-*` headers all preserved. **`s3s` is itself pre-1.0**; our v1.x contract is that we will continue to track the `s3s 0.13` trait surface that S4 currently implements, accepting backward-compatible additions in `s3s` minors. A `s3s` major bump (0.14, 1.0) that breaks our trait impls would itself trigger a v2.0 of S4 with a clear migration in `docs\u002Fmigration\u002F`. |\n| **Container image tags + Helm chart `values.yaml` keys** | `ghcr.io\u002Fabyo-software\u002Fs4:\u003Cmajor>.\u003Cminor>.\u003Cpatch>` + `:\u003Cmajor>.\u003Cminor>` + `:latest` floating tag rules; GPU build sibling tags `:\u003Cmajor>.\u003Cminor>.\u003Cpatch>-gpu`. 
The complete top-level `values.yaml` key set is frozen: `replicas`, `image.{repository, tag, pullPolicy, pullSecrets}`, `nameOverride`, `fullnameOverride`, `serviceAccount.{create, annotations, name}`, `backend.{endpointUrl, region}`, `codec`, `zstdLevel`, `dispatcher`, `logFormat`, `otlpEndpoint`, `gpu.{enabled, count, nodeSelector, runtimeClassName}`, `tls.{enabled, cert, key, existingSecret, certKey, keyKey}`, `policy.{json, existingConfigMap}`, `service.{type, port, annotations}`, `ingress.{enabled, className, annotations, hosts, tls}`, `resources.{requests, limits}`, `podAnnotations`, `podLabels`, `podSecurityContext`, `securityContext`, `nodeSelector`, `tolerations`, `affinity`, `extraEnv`, `extraVolumes`, `extraVolumeMounts`, `probes.{liveness, readiness}`. **Default values** may shift in a minor release (e.g. a probe tuning change to reduce flake); the **key shape** (key names + structure) is v2.0 territory. |\n\n### Modules NOT in the freeze list\n\n`s4-server` ships 30 `pub mod` declarations from `crates\u002Fs4-server\u002Fsrc\u002Flib.rs` so the `s4` binary (which is a separate crate) + the integration tests + the example binaries can reach the surface they need. Five modules contribute frozen items above: `repair`, `service`, `sse`, `streaming`, and `service_arc` (the last contributes only `SharedService`; the rest of `service_arc`'s contents are not frozen).\n\nLibrary consumers MAY `use s4_server::\u003Cother_module>::*;` — Rust visibility allows it — but those imports are **not frozen** and may break in any v1.x minor release without notice. 
The other 25 modules (`access_log`, `acme`, `audit_log`, `blob`, `cors`, `inventory`, `kms`, `lifecycle`, `lock_recovery`, `metrics`, `mfa`, `multipart_state`, `notifications`, `object_lock`, `policy`, `rate_limit`, `replication`, `routing`, `select`, `sigv4a`, `state_loader`, `streaming_checksum`, `tagging`, `tls`, `versioning`) exist as `pub mod` for binary-and-tests' needs, not as a published surface.\n\nIf you depend on one of these unfrozen modules, pin a precise `=1.x.y` (rather than `^1`) and treat any minor bump as a manual integration step. If you would like an item promoted to the frozen surface, please file an issue with the use case.\n\n### Backend compatibility matrix (CI-verified surface)\n\n[`compat-matrix.yml`](.github\u002Fworkflows\u002Fcompat-matrix.yml) runs a 1 PUT + 1\nGET + sidecar HEAD round-trip per backend through a live s4-server, on a\nweekly schedule and via `workflow_dispatch`. CI-verified backends as of\nv1.0:\n\n| Backend | Tier | CI status |\n|---|---|---|\n| MinIO | docker | ✓ gating |\n| AWS S3 | real cloud | ✓ gating (`aws-e2e.yml`) |\n| Backblaze B2 | real cloud | ✓ gating (operator-configured secrets) |\n| Cloudflare R2 | real cloud | ✓ gating (operator-configured secrets) |\n| Wasabi | real cloud | ✓ gating (operator-configured secrets) |\n| Garage | docker | ⚠ claimed but not currently CI-verified — `dxflrs\u002Fgarage:v1.1.0` rejects `STREAMING-AWS4-HMAC-SHA256-PAYLOAD` from current `aws-sdk-rust` (worked in v0.x against older garage); the round-trip step is `continue-on-error` until either s4-server pins `UNSIGNED-PAYLOAD` on the relay path or garage v1.2+ ships chunked-signed support. |\n| Ceph RGW | docker, best-effort | ⚠ claimed but not currently CI-verified — `quay.io\u002Fceph\u002Fdemo:latest-quincy` is unmaintained upstream and drifts on the streaming checksum wire shape (`XAmzContentSHA256Mismatch`). 
Real Ceph clusters operated by users should work because Ceph RGW production releases track the AWS wire spec; we need a maintained demo image or an operator-CI hook to re-introduce gating coverage. |\n\nThe compat-matrix job's start-step always gates (provisioning failure = workflow failure); the round-trip step is `continue-on-error` only for the two backends above. The status here is the source of truth — if a backend isn't in this table as `✓ gating`, treat the README's other compat claims as \"should work given S3 wire compatibility, not asserted by CI.\"\n\n### What's not promised (operator-tunable \u002F explicitly opt-in)\n\n- Compression **ratios** + **throughput numbers**: these are workload-dependent and benchmark conditions are published, not promised SLAs.\n- Default values for `--max-body-bytes`, `--sse-chunk-size`, `--gpu-min-bytes`, and similar runtime tunables: defaults may shift in a minor release if a clear correctness \u002F safety reason warrants it (the v0.9 #106-32bit fix that clamped to `isize::MAX` on 32-bit is an example of a default the SemVer-stable contract did not protect).\n- Implementation details inside frozen modules (private functions, struct field reordering, internal trait impls): the v1.0 freeze pins the *items listed above*, not \"every line in `service.rs`\". Re-arranging request-routing internals is fine in a minor.\n- Backend behavior beyond S3-wire-spec compliance (e.g. how a specific backend handles a particular SigV4 edge case): we test the documented backends (see §\"Backend compatibility matrix\"), but breakage caused by a backend-side change is not a v2.0 trigger on our end.\n- Experimental flags marked `--allow-legacy-*` or surfaced as `unstable` in `--help`: explicitly opt-in to behavior that may change.\n- Cross-region replication and the `replication.*` config surface: shipped as **experimental scaffolding** in v0.6 with the wire path stubbed in but no production-grade reconciliation. 
Excluded from v1.0 freeze; promotion to first-class (with Jepsen-class consistency tests) is on the v1.x roadmap below.\n- Security advisories accepted as risk-with-mitigation: see [`docs\u002Fsecurity\u002Fcargo-audit-ignores.md`](docs\u002Fsecurity\u002Fcargo-audit-ignores.md) for the 4 currently-ignored RUSTSEC advisories, each with rationale, mitigation, and upstream-tracking links. The ignore list is part of CI (`cargo audit` is a merge-block); changes to the list are visible in the diff.\n\n### v1.x roadmap candidates (= shipping under v1.x without breaking the contract above)\n\n- Chunked SSE-KMS envelope (provisional `S4E7` magic) + chunked SSE-C (`S4E8`) → Range GET partial-fetch fast-path for SSE-KMS \u002F SSE-C, parallel to the v0.9 #106 work that enabled it for SSE-S4 chunked (`S4E6`).\n- `S4F3` streaming frame format → enables streaming PUT checksum verify for multipart `upload_part` (= closes the codec-API constraint documented in [`docs\u002Fsecurity\u002Fstreaming-checksum-coverage.md`](docs\u002Fsecurity\u002Fstreaming-checksum-coverage.md)).\n- 32-bit `s4-server` runtime promotion from advisory to required CI smoke (currently advisory per v0.11 #A4).\n- Per-action SHA pinning on the GHA workflows (supply chain hardening; v0.11 #A5 ended at the floating-major tag policy).\n- Cross-region replication promoted from experimental scaffolding to production-grade, with Jepsen-style consistency tests.\n- Re-introducing Garage + Ceph as `✓ gating` in the backend compat matrix once the upstream signature-interop drifts are resolved.\n- Additional codec backends (Snappy, LZ4 if user demand emerges).\n\n### Stability policy in practice\n\n- **Adding** a new codec \u002F SSE envelope \u002F sidecar version \u002F CLI subcommand \u002F lib function is **additive** = ships in a minor release. 
The v0.9 `verify-sidecar` subcommand + the v3 sidecar variant + the `S4E6` chunked envelope are examples of minor-release additions.\n- **Changing** the wire format of an existing magic (e.g. shrinking `S4F2`'s header) is **breaking** = ships in a major release.\n- **Removing** a CLI subcommand or a pub function is **breaking** = ships in a major release after a deprecation cycle.\n- **Default value drifts** for runtime tunables — not breaking per the carve-out above, but always called out in CHANGELOG `### Changed`.\n\nThe audit trail for what counts as breaking lives in [`CHANGELOG.md`](CHANGELOG.md) per the [Keep a Changelog](https:\u002F\u002Fkeepachangelog.com\u002Fen\u002F1.1.0\u002F) format S4 uses end-to-end. Migration recipes for any future v2.0 will live in `docs\u002Fmigration\u002F\u003Cfrom>-to-\u003Cto>.md`; no such file exists today because no breaking change is on the v1.x roadmap.\n\n## Quick Start\n\n### Install via cargo (Rust devs)\n\n```bash\ncargo install s4-server                                  # CPU build\ns4 --endpoint-url https:\u002F\u002Fs3.us-east-1.amazonaws.com     # binary is `s4`, not `s4-server`\n```\n\n**Caveats** (v0.8.8, #98):\n- Requires Rust 1.92+ (`rustup update stable` first).\n- The default `cargo install` builds **CPU codecs only**. GPU codecs\n  (`nvcomp-zstd` \u002F `Bitcomp` \u002F `GDeflate`) require `cargo install s4-server\n  --features nvcomp-gpu`, which needs the CUDA toolchain and `NVCOMP_HOME`\n  pointing at an extracted nvCOMP SDK at build time. 
Without these the build\n  fails at link time with an `nvcomp` lib not found error.\n- The installed binary is `s4` (not `s4-server`); check with `which s4`.\n\n### 60-second local trial (Docker, CPU-only)\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4 && cd s4\ndocker compose up -d                    # MinIO + S4 server on localhost:8014\n\n# Generate a sample object so the cp lines have something to upload.\nhead -c 100M \u002Fdev\u002Furandom | base64 > big.log    # ~135 MiB of text, compresses well\n\n# Use any S3 client. Below uses aws-cli; replace endpoint with anything.\naws --endpoint-url http:\u002F\u002Flocalhost:8014 s3 mb s3:\u002F\u002Fdemo\naws --endpoint-url http:\u002F\u002Flocalhost:8014 s3 cp big.log s3:\u002F\u002Fdemo\u002Fbig.log\naws --endpoint-url http:\u002F\u002Flocalhost:8014 s3 cp s3:\u002F\u002Fdemo\u002Fbig.log .\u002Fbig.log.roundtrip\n\n# Inspect the compressed object directly on MinIO (different endpoint, bypasses S4).\naws --endpoint-url http:\u002F\u002Flocalhost:9000 s3 cp s3:\u002F\u002Fdemo\u002Fbig.log .\u002Fbig.log.compressed\nls -la big.log big.log.compressed big.log.roundtrip\n# Expected: big.log == big.log.roundtrip (lossless), big.log.compressed is much smaller.\n```\n\n### Try with GPU compression (NVIDIA nvCOMP)\n\n```bash\n# Requires NVIDIA Container Toolkit + a CUDA-capable GPU\ndocker compose -f docker-compose.gpu.yml up -d\naws --endpoint-url http:\u002F\u002Flocalhost:8014 s3 cp parquet-file.parq s3:\u002F\u002Fdemo\u002F\n```\n\nSee [docker-compose.gpu.yml](docker-compose.gpu.yml) for details.\n\n### Kubernetes (Helm)\n\nOfficial container images are published to GitHub Container Registry on every\n`v*.*.*` release tag — `ghcr.io\u002Fabyo-software\u002Fs4:\u003Cversion>` (CPU, multi-arch\namd64 + arm64) and `ghcr.io\u002Fabyo-software\u002Fs4:\u003Cversion>-gpu` (nvCOMP GPU build,\namd64). 
The package is public; no `imagePullSecrets` needed.\n\n```bash\nhelm install s4 .\u002Fcharts\u002Fs4 \\\n  --set image.tag=0.9.0 \\\n  --set backend.endpointUrl=https:\u002F\u002Fs3.us-east-1.amazonaws.com \\\n  --set backend.region=us-east-1\nkubectl port-forward svc\u002Fs4 8014:8014\n```\n\nFor the GPU image, override `image.tag` with the `-gpu` suffix and turn on\nGPU scheduling:\n\n```bash\nhelm install s4 .\u002Fcharts\u002Fs4 \\\n  --set image.tag=0.9.0-gpu \\\n  --set codec=nvcomp-zstd \\\n  --set gpu.enabled=true \\\n  --set backend.endpointUrl=https:\u002F\u002Fs3.us-east-1.amazonaws.com\n```\n\nThe chart in [`charts\u002Fs4\u002F`](charts\u002Fs4\u002F) ships a stateless Deployment + Service\n(ClusterIP, port 8014), optional GPU node selector (`gpu.enabled=true` for\nnvCOMP), inline or cert-manager TLS, and bucket-policy ConfigMap. See\n[charts\u002Fs4\u002FREADME.md](charts\u002Fs4\u002FREADME.md) for the full values table and\n[.github\u002Fworkflows\u002Fdocker.yml](.github\u002Fworkflows\u002Fdocker.yml) for the image\nbuild \u002F publish pipeline.\n\n### Verifying the image \u002F chart locally\n\nThe published image + chart pair is exercised in CI on every push that\ntouches the distribution surface\n([.github\u002Fworkflows\u002Fdocker-smoke.yml](.github\u002Fworkflows\u002Fdocker-smoke.yml) —\nv0.10 wave-2 #B2): `helm lint` + `helm template` against `charts\u002Fs4`\nwith a placeholder backend URL (catches values-schema \u002F template\nregressions), `docker compose config` against both compose files\n(catches reference \u002F image-tag drift), and `docker pull` +\n`s4 --help` \u002F `s4 --version` against the latest published ghcr.io tag\n(tolerates the not-yet-published case via `continue-on-error`).\nOperators can reproduce the same checks locally before deploying:\n\n```bash\n# Helm chart sanity (with placeholder so backend.endpointUrl is satisfied)\nhelm lint .\u002Fcharts\u002Fs4 --set 
backend.endpointUrl=https:\u002F\u002Fs3.example.com\nhelm template s4 .\u002Fcharts\u002Fs4 --set backend.endpointUrl=https:\u002F\u002Fs3.example.com \\\n  | kubectl apply --dry-run=client -f -\n\n# Compose file syntax + image-ref validation\ndocker compose -f docker-compose.yml config > \u002Fdev\u002Fnull\ndocker compose -f docker-compose.gpu.yml config > \u002Fdev\u002Fnull\n\n# Image smoke (run this after a release lands on ghcr.io)\ndocker pull ghcr.io\u002Fabyo-software\u002Fs4:0.9.0\ndocker run --rm ghcr.io\u002Fabyo-software\u002Fs4:0.9.0 --help\ndocker run --rm ghcr.io\u002Fabyo-software\u002Fs4:0.9.0 --version\n```\n\n### Python (pip)\n\nFor ML \u002F ETL pipelines that just want the codec without the gateway:\n\n```python\nfrom s4_codec import CpuZstd, CpuGzip, gpu_available\n\ndata_bytes = b\"example log line \" * 1024  # placeholder payload; use your own bytes\ncodec = CpuZstd(level=3)\ncompressed, original_size, crc = codec.compress(data_bytes)\nroundtrip = codec.decompress(compressed, original_size, crc)\nassert roundtrip == data_bytes  # a lossless codec should round-trip exactly\n```\n\nPyO3 bindings live in [`crates\u002Fs4-codec-py\u002F`](crates\u002Fs4-codec-py\u002F) — build\nwith `maturin build --release` (and `--features nvcomp-gpu` for GPU).\n\n### Browser (WASM)\n\nFor frontend apps that read S4-compressed objects directly from S3 over a\npresigned URL, no S4 server in the read path:\n\n```bash\nrustup target add wasm32-unknown-unknown\nwasm-pack build --release --target web crates\u002Fs4-codec-wasm  # → pkg\u002F\n```\n\nThe bundle exports `decompressFramed` \u002F `decompressSingle` for the CPU\ncodec subset (`passthrough`, `cpu-zstd`, `cpu-gzip`). 
See\n[`crates\u002Fs4-codec-wasm\u002FREADME.md`](crates\u002Fs4-codec-wasm\u002FREADME.md) for\nthe API and a 10-line example.\n\n### Build from source\n\n```bash\ncargo build --release --workspace                       # CPU-only\nNVCOMP_HOME=\u002Fpath\u002Fto\u002Fnvcomp cargo build --release --workspace --features s4-server\u002Fnvcomp-gpu\n\ntarget\u002Frelease\u002Fs4 --endpoint-url https:\u002F\u002Fs3.us-east-1.amazonaws.com \\\n    --host 0.0.0.0 --port 8014 --codec cpu-zstd --log-format json\n```\n\n### Supported targets\n\n| Crate                          | 64-bit Linux (`x86_64` \u002F `aarch64`) | 32-bit Linux (`i686`) | Browser (`wasm32-unknown-unknown`) |\n|--------------------------------|:-----------------------------------:|:---------------------:|:----------------------------------:|\n| `s4-codec` (library)           | ✅ tier 1                           | ✅ compiles + tests   | ✅ via `s4-codec-wasm`             |\n| `s4-codec-wasm` (browser)      | n\u002Fa                                 | n\u002Fa                   | ✅ tier 1                          |\n| `s4-config`                    | ✅ tier 1                           | ✅                    | ✅                                 |\n| `s4-server` (gateway binary)   | ✅ tier 1                           | ✅ compiles + `--help` \u002F `--version` + advisory PUT\u002FGET round-trip (CI) | ❌ not applicable           |\n| `nvcomp-gpu` feature (any crate above) | ✅ x86_64 only (NVIDIA driver) | ❌ (no 32-bit nvCOMP) | ❌                            |\n\nRuntime-tested platform is **`x86_64-unknown-linux-gnu`** and\n**`aarch64-unknown-linux-gnu`** (CI matrix). The 32-bit `i686-unknown-linux-gnu`\ntarget builds clean for `s4-codec` \u002F `s4-config` \u002F `s4-server` as of\nv0.9 #106 (default-bytes constants are now `target_pointer_width` cfg-gated\nso the 5 GiB AWS S3 single-PUT ceiling no longer const-overflows `usize` on\n32-bit). 
v0.10 wave-2 #A4 adds a per-push CI job that (a) executes the\n`s4-codec` + `s4-config` test suites under `--target i686-unknown-linux-gnu`\nand (b) builds the `s4` binary itself for i686 + invokes\n`s4 --help` \u002F `s4 --version` as a runtime smoke. v0.11 #A4 extends the\nsame job with an **end-to-end PUT\u002FGET round-trip** — the i686 `s4` binary\nruns in front of a stock MinIO container and the AWS CLI puts then gets\na small object back through it, byte-equality-checked. The round-trip\nstep lands in CI as **advisory (`continue-on-error: true`)** so a\nfirst-time 32-bit runtime bug surfaces in the job log without turning\nthe badge red while a fix lands in a follow-up v0.11.x commit; promotion\nto a required gate happens once a stretch of green main pushes is\nobserved. Operators running on i686 should still treat\n`--max-body-bytes` carefully (auto-clamps to `isize::MAX as usize`\n≈ 2 GiB on 32-bit — Rust caps any single `Vec` \u002F `Bytes` allocation\nat `isize::MAX`, so a higher gateway guard would let oversized requests\npanic inside the SSE buffered-decrypt pre-alloc path).\n\nThe `wasm32-unknown-unknown` target is the public release channel for the\nbrowser decoder (`s4-codec-wasm`); the criterion regression-tracking suite\nand `cargo check --target wasm32-unknown-unknown` keep it green on every CI\npush to `main`.\n\n## How it Compares\n\n| Feature | S4 | [MinIO](https:\u002F\u002Fgithub.com\u002Fminio\u002Fminio) | [Garage](https:\u002F\u002Fgit.deuxfleurs.fr\u002FDeuxfleurs\u002Fgarage) | Wasabi \u002F B2 | AWS S3 |\n|---|---|---|---|---|---|\n| Stance | Transparent-compression proxy in front of an existing S3 backend | Standalone S3-compatible storage system | Standalone S3-compatible storage system | Hosted S3-compatible storage | The reference |\n| S3 API compatibility | See [matrix below](#s3-api-compatibility-matrix) | Comprehensive | Subset | Comprehensive | Native |\n| **GPU compression** | ✅ nvCOMP zstd \u002F Bitcomp \u002F GDeflate | ❌ | 
❌ | ❌ | ❌ |\n| **CPU compression** | ✅ zstd 1–22 \u002F gzip | ⚠️ S2 only (legacy) | ✅ zstd 1–22 | ❌ | ❌ |\n| **Auto codec selection** | ✅ entropy + magic-byte sampling | ❌ | ❌ | — | — |\n| **Range GET on compressed** | ✅ via S4IX sidecar (see [matrix](#s3-api-compatibility-matrix) for the range modes supported) | n\u002Fa | n\u002Fa | ✅ | ✅ |\n| **Streaming I\u002FO** | ✅ chunked PUT \u002F GET; GPU per-chunk pipelined ([conditions](#streaming-io)) | ✅ | ✅ | ✅ | ✅ |\n| **Native HTTPS \u002F TLS** | ✅ rustls + ring, ALPN h2 | ⚠️ via reverse proxy | ⚠️ via reverse proxy | ✅ | ✅ |\n| **Bucket-policy enforcement at gateway** | ✅ AWS-style JSON, Allow \u002F Deny | n\u002Fa | n\u002Fa | ✅ | ✅ |\n| **Acts as gateway to existing S3** | ✅ (the whole point) | ❌ (gateway mode removed upstream) | ❌ | ❌ | n\u002Fa |\n| **License** | Apache-2.0 | upstream LICENSE: AGPLv3 (+ commercial) | upstream LICENSE: AGPLv3 | proprietary | proprietary |\n\n*(MinIO \u002F Garage license cells link to upstream LICENSE files; project licenses\n can change between releases. Do not treat as legal advice. See #103.)*\n\n### S3 API compatibility matrix\n\nS4 implements the parts of the S3 API needed to act as a transparent\ncompression proxy in front of an existing bucket. **It is not a complete\nS3 implementation** — operations marked \"—\" return `NotImplemented` and\nshould not be called against an S4 endpoint. 
PRs welcome on the matrix\nrows you need.\n\n| Surface | Status | Notes |\n|---|---|---|\n| PUT \u002F GET object | ✅ Full | single-PUT + range-GET (see below) |\n| Multipart upload (create \u002F part \u002F complete \u002F abort) | ✅ Full | with per-part framing + final-part padding trim |\n| HEAD object | ✅ Full | returns post-compression `Content-Length` (matches what S3 returns; original size in `x-amz-meta-s4-original-size`) |\n| Range GET | ✅ S3 spec | `bytes=N-M`, `bytes=-N` (suffix), `bytes=N-` (open-ended); range maps through S4IX sidecar to compressed byte offsets |\n| Conditional GET \u002F PUT (`If-Match` \u002F `If-None-Match` \u002F `If-Modified-Since`) | ✅ Full | |\n| PutObjectAcl \u002F GetObjectAcl | ✅ canned ACLs only | `private` \u002F `public-read` \u002F `public-read-write` \u002F `authenticated-read` \u002F `aws-exec-read` \u002F `bucket-owner-read` \u002F `bucket-owner-full-control` |\n| Bucket versioning | ✅ Full | per-version UUIDv4 ID, delete-marker semantics |\n| Object lock (Governance \u002F Compliance) | ✅ Full | per-object retention + legal-hold |\n| Bucket lifecycle (`LifecycleConfiguration`) | ✅ Full | Expiration \u002F NoncurrentVersionExpiration \u002F AbortIncompleteMultipartUpload |\n| Bucket notifications (Webhook \u002F SQS \u002F SNS) | ✅ Full | SQS\u002FSNS gated behind `aws-events` feature |\n| Bucket replication | ⚠ experimental | rule-based, per-PUT dispatcher; ships as **experimental scaffolding** (wire path + config surface only). **Excluded from the v1.0 freeze** — promotion to production-grade is on the v1.x roadmap. 
|\n| Bucket policy | ✅ AWS-style JSON | Allow \u002F Deny, IAM Conditions subset (see #100) |\n| Tagging (object \u002F bucket) | ✅ Full | |\n| CORS configuration | ✅ Full | |\n| Inventory | ✅ Full | CSV \u002F Parquet output |\n| MFA Delete | ✅ Full | RFC 6238 TOTP |\n| SSE-S3 (server-side, S4-managed keys) | ✅ Full | AES-256-GCM (S4E1\u002FS4E2 wire) |\n| SSE-KMS (envelope encryption) | ✅ Full | LocalKms (file-backed KEKs) default; AWS KMS gated behind `aws-kms` feature |\n| SSE-C (customer-provided key) | ✅ Full | (S4E3 wire) |\n| S3 Select | ✅ subset | CSV input, single-column equality \u002F inequality \u002F GT \u002F LT \u002F LIKE-prefix; falls back to CPU eval where unsupported |\n| Presigned URLs | ✅ Full | both PUT and GET |\n| SigV4 \u002F SigV4a auth | ✅ Full | SigV4a requires `--sigv4a-credentials \u003CDIR>` |\n| Storage class transitions (Standard ↔ IA ↔ Glacier) | ✅ tagging-driven | see [docs\u002Fstorage-class-transitions.md](docs\u002Fstorage-class-transitions.md) |\n| Cross-region replication via S4 chain | — | use AWS S3 native CRR on the backend |\n| RequestPayment \u002F Accelerate \u002F Logging configuration | — | not implemented; returns 501 `NotImplemented` |\n\n**Range GET caveat** (#99): the S4IX sidecar gives a per-frame index, so\na range maps to a contiguous read of the covering frames and a decode that's\nsliced at the boundaries the caller asked for. Parquet\u002FORC readers\n(arrow-rs, datafusion, duckdb's parquet reader) that issue suffix-range\nGETs against the footer work out of the box. Parallel range reads against\noverlapping frame extents do extra decode work and are not yet optimized;\nsee #99 for the parquet\u002FORC reader cross-validation harness on the\nroadmap.\n\n### SDK compatibility matrix\n\nTest status per major S3 client. 
\"Tested\" means a green E2E run in CI or\ndocumented manual verification; \"Should work\" means the wire shape is\nsatisfied but no explicit test covers it yet; \"Known issue\" links to the\nrelevant issue.\n\n| Client | Status | Notes |\n|---|---|---|\n| `aws-cli` (v2.x) | ✅ Tested | path-style + virtual-hosted URLs, presigned URLs, multipart, range GET |\n| `boto3` (Python) | ✅ Tested | via `s4-codec-py` integration tests + `tests\u002Ftest_binding.py` |\n| `aws-sdk-rust` (v1.x) | ✅ Tested | the gateway is built on it; trait-level coverage in `tests\u002Ffeature_e2e.rs` |\n| `aws-sdk-go-v2` | ✅ Should work | wire-level shapes shared with aws-sdk-rust; no explicit smoke test yet |\n| `aws-sdk-java-v2` | ✅ Should work | same as Go v2 caveat |\n| `MinIO mc` | ✅ Should work | path-style + virtual-hosted both fine; one-off `mc cp` validated manually |\n| `rclone` (s3 backend) | ✅ Should work | multipart chunk size driven by client; large objects respect S4 frame budget |\n| `s3cmd` | ⚠️ Should work | older client; SigV2 fallback NOT supported (S4 is SigV4 + SigV4a only) |\n| Presigned URLs (SigV4) | ✅ Tested | both PUT and GET; query-string signing path covered |\n| Conditional GET \u002F PUT | ✅ Tested | `If-Match` \u002F `If-None-Match` \u002F `If-Modified-Since` \u002F `If-Unmodified-Since` |\n| `Content-MD5` \u002F `x-amz-content-sha256` | ✅ Tested | both unsigned (`UNSIGNED-PAYLOAD`) and SHA256-hashed payloads |\n| `Content-Encoding: gzip` interplay | ⚠️ See note | S4 may double-encode if the client sends `Content-Encoding: gzip` AND S4 also picks `cpu-gzip` — use `--codec cpu-zstd` or set client `Content-Encoding: identity` |\n\n**Endpoint URL style** (#101): S4 accepts both **virtual-hosted-style**\n(`https:\u002F\u002Fmy-bucket.s4.example.com\u002Fkey`) and **path-style**\n(`https:\u002F\u002Fs4.example.com\u002Fmy-bucket\u002Fkey`); the backend ` aws-sdk-s3 `\nclient uses whatever the operator's `--endpoint-url` configuration\nspecifies. 
If your client is fussy about this, set `--path-style` on\nthe s4 server side or `--force-path-style` on the AWS SDK side.\n\n### Backend compatibility matrix\n\nS4 is a transparent compression proxy in front of an S3-compatible\nbackend. Each row below is the **verification posture** S4 holds for\nthat backend — what CI actually exercises, not \"should work\" claims.\nv0.11 #A7 added the weekly\n[`compat-matrix.yml`](.github\u002Fworkflows\u002Fcompat-matrix.yml) workflow\nthat drives the docker-tier verifications (and the real-cloud rows\nwhen operators provide credentials).\n\n| Backend | Verification | Notes |\n|---|---|---|\n| [AWS S3](https:\u002F\u002Faws.amazon.com\u002Fs3\u002F) | ✅ Verified via nightly CI ([`aws-e2e.yml`](.github\u002Fworkflows\u002Faws-e2e.yml)) | real bucket, OIDC-assumed IAM role; the reference implementation |\n| [MinIO](https:\u002F\u002Fgithub.com\u002Fminio\u002Fminio) | ✅ Verified via per-PR CI (`http_e2e` \u002F `multipart_e2e` testcontainers) + weekly compat-matrix | `quay.io\u002Fminio\u002Fminio:latest` |\n| [Garage](https:\u002F\u002Fgit.deuxfleurs.fr\u002FDeuxfleurs\u002Fgarage) | ✅ Verified via weekly compat-matrix CI (docker `dxflrs\u002Fgarage:v1.1.0`) | single-node `replication_mode = \"none\"`, CLI-provisioned bucket + key |\n| [Ceph RGW](https:\u002F\u002Fdocs.ceph.com\u002Fen\u002Flatest\u002Fradosgw\u002F) | ⚠️ Best-effort weekly compat-matrix CI (`quay.io\u002Fceph\u002Fdemo:latest-quincy`) | the upstream `ceph\u002Fdemo` image is no longer actively maintained; the job is gated `continue-on-error` so a pull \u002F startup failure surfaces as a warning rather than blocking the matrix |\n| [Backblaze B2](https:\u002F\u002Fwww.backblaze.com\u002Fb2\u002Fcloud-storage.html) | 🔧 Configurable in operator CI (real backend; requires `vars.B2_BUCKET` \u002F `B2_ENDPOINT` \u002F `B2_REGION` + `secrets.B2_KEY_ID` \u002F `B2_APPLICATION_KEY`) | weekly when configured, silent skip otherwise |\n| [Cloudflare 
R2](https:\u002F\u002Fwww.cloudflare.com\u002Fproducts\u002Fr2\u002F) | 🔧 Configurable in operator CI (real backend; requires `vars.R2_BUCKET` \u002F `R2_ENDPOINT` \u002F `R2_REGION` + `secrets.R2_ACCESS_KEY_ID` \u002F `R2_SECRET_ACCESS_KEY`) | weekly when configured, silent skip otherwise |\n| [Wasabi](https:\u002F\u002Fwasabi.com\u002F) | 🔧 Configurable in operator CI (real backend; requires `vars.WASABI_BUCKET` \u002F `WASABI_ENDPOINT` \u002F `WASABI_REGION` + `secrets.WASABI_ACCESS_KEY_ID` \u002F `WASABI_SECRET_ACCESS_KEY`) | weekly when configured, silent skip otherwise |\n\nEach compat-matrix job runs 1 PUT + 1 GET + 1 sidecar HEAD against\nthe live backend through an `s4 --codec cpu-zstd --dispatcher always`\nserver. The sidecar HEAD on the backend asserts that the second backend\nround-trip (the sidecar PUT) lands the way s4 expects, which is where most\nS3-API-shape divergences would surface (PutObject without\n`Content-MD5`, aws-chunked encoding, etc.).\n\n## Security & threat model\n\nS4 is a TLS-terminating S3-compatible proxy. The boundaries you should\nthink about:\n\n- **Authentication scope**: S4 verifies SigV4 \u002F SigV4a on incoming\n  requests using credentials operators configure (`--credentials FILE`\n  or `--sigv4a-credentials DIR`). The S4 server then turns around and\n  speaks to the backend bucket using **its own** AWS credentials\n  (`AWS_ACCESS_KEY_ID` etc. from the standard SDK chain). Client\n  identity is **not** delegated to the backend; the backend sees S4 as\n  one principal regardless of which incoming client made the request.\n  If you need per-client backend identity, run one S4 instance per\n  client and use distinct backend credentials.\n- **TLS termination**: S4 terminates TLS at its own listener\n  (`--tls-cert` \u002F `--tls-key`, or ACME via `--acme`). The connection\n  to the backend uses the SDK's own TLS (rustls with the system root\n  CA store). 
If your security model requires end-to-end TLS without\n  intermediate decryption, S4 is the wrong shape — use a different\n  proxy or run S4 colocated with the backend so the second TLS hop\n  doesn't leave the same host.\n- **Bucket policy enforcement at the S4 layer**: when `--bucket-policy\n  FILE` is set, S4 evaluates AWS-style JSON Allow \u002F Deny rules\n  **before** forwarding to the backend. The backend's own bucket\n  policy still applies on top: the two policies run in series, and both\n  must permit the request. We do **not** parse every IAM Condition operator — see\n  [`crates\u002Fs4-server\u002Fsrc\u002Fpolicy.rs`](crates\u002Fs4-server\u002Fsrc\u002Fpolicy.rs)\n  for the supported subset.\n- **Body-size limits \u002F request smuggling**: hyper limits are enforced\n  (`--max-header-bytes`, default 64 KiB; `--max-concurrent-connections`,\n  default 1024; `--read-timeout-seconds`, default 30s — see v0.8.5\n  #84). HTTP\u002F2 is **off by default** (`--http2` to opt in); the S3 API\n  is HTTP\u002F1.1 in practice and h2 adds DoS surface (stream-multiplexing\n  abuse) that doesn't pay off for our workload.\n- **Tenant isolation**: S4 is **single-tenant by design** — one S4\n  instance per security boundary. 
We do not enforce cross-bucket\n  isolation at the S4 layer beyond what the backend's IAM enforces.\n  Multi-tenant deployments should run one S4 instance per tenant with\n  separate backend credentials.\n- **Non-goals**: S4 is not an IDS \u002F WAF, does not log request bodies\n  (only headers + length), does not implement S3's `ObjectACL`\n  Grant-by-CanonicalUser semantics beyond canned ACLs, does not\n  proxy IAM API calls.\n\nFor incident reporting see [SECURITY.md](SECURITY.md).\n\n## Architecture\n\n```\n┌──────────────────────────────────────────────────────────────────┐\n│                          S4 server                               │\n│  ┌──────────────────┐  ┌─────────────────┐  ┌────────────────┐   │\n│  │ s3s framework    │→ │ S4Service       │→ │ s3s_aws::Proxy │ → │ → backend (AWS S3 \u002F MinIO)\n│  │ (HTTP + SigV4)   │  │ (compress hook) │  │ (aws-sdk-s3)   │   │\n│  └──────────────────┘  └────────┬────────┘  └────────────────┘   │\n│                                 ▼                                │\n│  ┌─────────────────────────────────────────────────────────┐     │\n│  │ s4-codec::CodecRegistry  (multi-codec dispatch by id)   │     │\n│  │   ├─ Passthrough          (no compression)              │     │\n│  │   ├─ CpuZstd              (zstd-rs, streaming)          │     │\n│  │   ├─ NvcompZstd           (nvCOMP, GPU, per-chunk)      │     │\n│  │   ├─ NvcompBitcomp        (nvCOMP, integer columns)     │     │\n│  │   └─ NvcompGDeflate       (nvCOMP, DEFLATE-family GPU)  │     │\n│  └─────────────────────────────────────────────────────────┘     │\n│  ┌─────────────────────────────────────────────────────────┐     │\n│  │ s4-codec::CodecDispatcher                               │     │\n│  │   ├─ AlwaysDispatcher                                   │     │\n│  │   └─ SamplingDispatcher  (entropy + 12 magic bytes)     │     │\n│  └─────────────────────────────────────────────────────────┘     
│\n└──────────────────────────────────────────────────────────────────┘\n        ▲              ▲              ▲                ▲\n        │              │              │                │\n   \u002Fhealth         \u002Fready         \u002Fmetrics         OTLP traces\n   (probe)        (probe)       (Prometheus)       (Jaeger \u002F X-Ray)\n```\n\n## Benchmarks\n\nSingle-pass roundtrip through `s4-codec`. Hardware: RTX 4070 Ti SUPER 16 GB\n+ nvCOMP 5.2.0.10 + CUDA 13.2 driver 595.58.03 + Ryzen 9 9950X. Throughput\nis reported as **uncompressed bytes per second** (the convention nvCOMP \u002F\nlz4 \u002F zstd publish). Last benchmarked 2026-05-13 (v0.8 #53,\n`crates\u002Fs4-codec\u002Fexamples\u002Fbench_codecs.rs`).\n\n![v0.8 perf chart](docs\u002Fperf-v0.8.png)\n\n| Workload | Codec | Original | Compressed | Ratio | Compress | Decompress |\n|---|---|---:|---:|---:|---:|---:|\n| nginx access log (256 MiB) | cpu-zstd-3 | 256 MiB | 1 MiB | **155.01×** | 3.71 GB\u002Fs | 3.27 GB\u002Fs |\n| nginx access log (256 MiB) | nvcomp-zstd | 256 MiB | 2 MiB | 95.60× | 1.70 GB\u002Fs | 2.86 GB\u002Fs |\n| nginx access log (256 MiB) | nvcomp-gdeflate | 256 MiB | 169 MiB | 1.51× | 1.07 GB\u002Fs | 2.51 GB\u002Fs |\n| Parquet-like mixed (256 MiB) | cpu-zstd-3 | 256 MiB | 133 MiB | 1.92× | 0.75 GB\u002Fs | 1.89 GB\u002Fs |\n| Parquet-like mixed (256 MiB) | nvcomp-zstd | 256 MiB | 131 MiB | 1.94× | 1.44 GB\u002Fs | 2.62 GB\u002Fs |\n| Parquet-like mixed (256 MiB) | nvcomp-gdeflate | 256 MiB | 183 MiB | 1.40× | 1.05 GB\u002Fs | 2.62 GB\u002Fs |\n| Parquet-like mixed (256 MiB) | nvcomp-bitcomp | 256 MiB | 122 MiB | **2.09×** | 1.49 GB\u002Fs | 1.44 GB\u002Fs |\n| Postings (u32, 64 MiB) | cpu-zstd-3 | 64 MiB | 43 MiB | 1.48× | 1.22 GB\u002Fs | 1.65 GB\u002Fs |\n| Postings (u32, 64 MiB) | nvcomp-zstd | 64 MiB | 42 MiB | 1.52× | 1.29 GB\u002Fs | 2.52 GB\u002Fs |\n| Postings (u32, 64 MiB) | nvcomp-gdeflate | 64 MiB | 42 MiB | 1.51× | 1.06 GB\u002Fs | 2.44 GB\u002Fs |\n| Postings (u32, 64 
MiB) | nvcomp-bitcomp | 64 MiB | 5 MiB | **11.93×** | 1.61 GB\u002Fs | 1.50 GB\u002Fs |\n| Timestamps (i64, 64 MiB) | cpu-zstd-3 | 64 MiB | 24 MiB | 2.63× | 0.35 GB\u002Fs | 0.92 GB\u002Fs |\n| Timestamps (i64, 64 MiB) | nvcomp-zstd | 64 MiB | 24 MiB | 2.61× | 1.14 GB\u002Fs | 2.70 GB\u002Fs |\n| Timestamps (i64, 64 MiB) | nvcomp-gdeflate | 64 MiB | 48 MiB | 1.32× | 0.89 GB\u002Fs | 2.26 GB\u002Fs |\n| Timestamps (i64, 64 MiB) | nvcomp-bitcomp | 64 MiB | 21 MiB | **2.95×** | 1.45 GB\u002Fs | 1.39 GB\u002Fs |\n| doc_values (i64, 64 MiB) | cpu-zstd-3 | 64 MiB | 44 MiB | 1.45× | 0.26 GB\u002Fs | 1.01 GB\u002Fs |\n| doc_values (i64, 64 MiB) | nvcomp-zstd | 64 MiB | 34 MiB | **1.86×** | 1.04 GB\u002Fs | 2.59 GB\u002Fs |\n| doc_values (i64, 64 MiB) | nvcomp-gdeflate | 64 MiB | 48 MiB | 1.33× | 0.96 GB\u002Fs | 2.54 GB\u002Fs |\n| doc_values (i64, 64 MiB) | nvcomp-bitcomp | 64 MiB | 37 MiB | 1.72× | 1.41 GB\u002Fs | 1.48 GB\u002Fs |\n| Already-compressed (64 MiB) | cpu-zstd-3 | 64 MiB | 64 MiB | 1.00× | 2.23 GB\u002Fs | 3.15 GB\u002Fs |\n| Already-compressed (64 MiB) | nvcomp-zstd | 64 MiB | 64 MiB | 1.00× | 0.83 GB\u002Fs | 2.37 GB\u002Fs |\n| Already-compressed (64 MiB) | nvcomp-gdeflate | 64 MiB | 64 MiB | 1.00× | 0.92 GB\u002Fs | 2.39 GB\u002Fs |\n\n**v0.3 → v0.8 throughput delta** (compress GB\u002Fs on the same hardware,\nnvCOMP 5.0.x → 5.2.0.10, no source-code changes — pure runtime \u002F driver gains):\n\n| Workload | Codec | v0.3 (2026-04) | v0.8 (2026-05-13) | Delta |\n|---|---|---:|---:|---:|\n| nginx (256 MiB) | cpu-zstd-3 | 2.72 GB\u002Fs | **3.71 GB\u002Fs** | +36% |\n| nginx (256 MiB) | nvcomp-zstd | 1.27 GB\u002Fs | **1.70 GB\u002Fs** | +34% |\n| parquet (256 MiB) | nvcomp-zstd | 1.06 GB\u002Fs | **1.44 GB\u002Fs** | +36% |\n| parquet (256 MiB) | nvcomp-bitcomp | 1.20 GB\u002Fs | **1.49 GB\u002Fs** | +24% |\n| timestamps (64 MiB) | nvcomp-zstd | 0.95 GB\u002Fs | **1.14 GB\u002Fs** | +20% |\n| timestamps (64 MiB) | nvcomp-bitcomp | 1.20 GB\u002Fs | **1.45 
GB\u002Fs** | +21% |\n| doc_values (64 MiB) | nvcomp-zstd | 0.80 GB\u002Fs | **1.04 GB\u002Fs** | +30% |\n\n**Reading the table:**\n\n- **`cpu-zstd-3`** dominates on text — 155× on nginx logs is hard to beat.\n- **`nvcomp-bitcomp`** is the killer for typed numeric columns: 11.93× on\n  sorted u32 posting lists (vs ~1.5× for everything else), 2.95× on\n  monotonic i64 timestamps. The `data_type` hint is critical (`Char` on\n  numeric data degrades to ~1.2×); see [`s4_codec::nvcomp::BitcompDataType`]\n  for the typed constructors.\n- **`nvcomp-zstd`** is competitive on Parquet-like \u002F mixed workloads and\n  frees the CPU for serving requests in parallel.\n- **`nvcomp-gdeflate`** sits between zstd and \"no compression\" — useful\n  when you need DEFLATE-format wire compat (in v0.3 the\n  [`gunzip`-compatible wrapper](https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4\u002Fissues\u002F26)\n  will make this codec serve `Content-Encoding: gzip` to any HTTP client).\n- **Already-compressed inputs** are correctly bypassed at ratio 1.0× by every\n  codec — S4 never makes a file *bigger*.\n\n**Throughput note**: nvCOMP runs through the FCG1-framed batched API at\nthe default 64 KiB chunk size, so per-call overhead dominates the 64 MiB\ninput cases. Production deployments using larger chunks via\n`streaming_compress_to_frames` (v0.2 #1) push GPU compress >5 GB\u002Fs on\nhighly compressible inputs. 
The full head-to-head bench vs MinIO S2 \u002F\nGarage zstd is tracked in\n[issue #14](https:\u002F\u002Fgithub.com\u002Fabyo-software\u002Fs4\u002Fissues\u002F14); the latest CSV\ncaptured on 2026-05-13 lives at\n[`benches\u002Fcomparison\u002Fresult-2026-05-13.csv`](benches\u002Fcomparison\u002Fresult-2026-05-13.csv)\n(MinIO + s4-cpu only; Garage's auto-issued keys and the s4-gpu image\nrequire manual setup outside the driver script).\n\n**Multipart streaming note** (v0.2 #1, surfaced again by the v0.8 #53\ncomparison run): per-part S4F2 framing (4 MiB chunks) means a 64 MiB\nnginx-log multipart upload reports ~1.6× ratio at the storage layer\ninstead of the 155× single-pass ratio above — each chunk is too small\nfor zstd's longest-match window to amortize across the whole object.\nRatio scales back to single-pass numbers once `cargo install` users\nconfigure larger multipart chunk sizes via the AWS SDK\n`multipart_chunksize` knob (S4 itself stays at the 4 MiB default for\nRange-GET granularity). The CSV captures end-to-end PUT\u002FGET wall-clock\nincluding framing overhead.\n\n### Performance regression tracking (criterion + GitHub Pages)\n\nThe single-pass numbers above are captured manually on the maintainer's\nworkstation; for **per-commit regression detection** S4 also runs a\ncriterion bench suite on every push to `main`\n([`.github\u002Fworkflows\u002Fbench.yml`](.github\u002Fworkflows\u002Fbench.yml)), stores\nthe timing history in the `gh-pages` branch via\n[`benchmark-action\u002Fgithub-action-benchmark`](https:\u002F\u002Fgithub.com\u002Fbenchmark-action\u002Fgithub-action-benchmark),\nand comments on a commit when any tracked target gets ≥ 1.1× slower\nthan its previous best. 
The targets cover the CPU hot paths every\ndefault-build deployment runs through:\n\n- `crates\u002Fs4-codec\u002Fbenches\u002Fcodec_roundtrip.rs` — `cpu-zstd` (levels\n  1 \u002F 3 \u002F 22) \u002F `cpu-gzip` \u002F `passthrough` compress + decompress at\n  1 KiB \u002F 1 MiB \u002F 16 MiB.\n- `crates\u002Fs4-codec\u002Fbenches\u002Fframe_codec.rs` — `write_frame` and the\n  `FrameIter` walker, with the padding-skip branch exercised.\n- `crates\u002Fs4-codec\u002Fbenches\u002Findex_codec.rs` — S4IX sidecar\n  `encode_index` \u002F `decode_index` \u002F `lookup_range` across 128 \u002F\n  1024 \u002F 4096 frame counts.\n\nGPU codecs (`nvcomp-*`) are intentionally not in the regression suite\nbecause GitHub-hosted runners have no CUDA-capable GPU; the manual\ntable above remains the canonical source for those numbers.\n\nThe rendered trend chart lives at\n`https:\u002F\u002Fabyo-software.github.io\u002Fs4\u002Fdev\u002Fbench\u002F` after the first\nsuccessful CI run on `main` initialises the `gh-pages` branch.\n\n### SSE throughput (AES-NI vs software fallback)\n\nS4's server-side encryption (`--sse-s4-key`) goes through the `aes-gcm`\ncrate, which selects the AES-NI hardware path automatically on x86_64\nhosts where the `aes` + `pclmulqdq` CPU features are present. 
v0.8 #50\nadds (a) a boot log line confirming which backend is live, (b) a\n`s4_sse_aes_backend{kind=\"aes-ni\"|\"neon\"|\"software\"}` Prometheus gauge\nstamped at startup, and (c) the `bench_sse_throughput` example below\nthat measures the resulting encrypt \u002F decrypt throughput.\n\nNumbers below are from the same Ryzen 9 9950X host as the codec table.\nReproduce with `cargo run --release -p s4-server --example\nbench_sse_throughput` (AES-NI is the default; force the software\nbackend with `RUSTFLAGS=\"--cfg aes_force_soft --cfg\npolyval_force_soft\"` and a clean target dir).\n\n| Body size | AES-NI Encrypt | AES-NI Decrypt | Software Encrypt | Software Decrypt |\n|-----------|---------------:|---------------:|-----------------:|-----------------:|\n| 64 KiB    | 1661 MB\u002Fs      | 1692 MB\u002Fs      | 194 MB\u002Fs         | 194 MB\u002Fs         |\n| 1 MiB     | 1709 MB\u002Fs      | 1718 MB\u002Fs      | 195 MB\u002Fs         | 195 MB\u002Fs         |\n| 100 MiB   | 956 MB\u002Fs       | 925 MB\u002Fs       | 181 MB\u002Fs         | 180 MB\u002Fs         |\n\nAES-NI delivers ~8.7× throughput on 64 KiB \u002F 1 MiB bodies (the regime\nthat dominates real S3 object traffic). The 100 MiB row's narrower\ngap (~5.2×) is the buffer allocator + page-fault floor — `aes-gcm`\nuses a single contiguous `Vec` for the ciphertext, so 100 MiB cases\ncharge a `mmap` per iteration that's not on the AES path. Operators\nrunning on hosts without AES-NI (very old \u002F virtualized x86 or\nnon-x86 hardware) should expect ~190 MB\u002Fs encrypt \u002F decrypt as the\nsustained ceiling for SSE-S4 — still ahead of the network for most\ndeployments, but worth knowing when sizing CPU headroom.\n\n**Detecting which backend is live**: the boot log emits\n`S4 AES-NI feature detection ... 
aes_ni_available=true` (or `false`),\nand `curl -s localhost:9100\u002Fmetrics | grep s4_sse_aes_backend` shows\nthe gauge with the active `kind` label.\n\n**Reproducing locally** (requires CUDA + nvCOMP):\n\n```bash\nNVCOMP_HOME=\u002Fopt\u002Fnvcomp LD_LIBRARY_PATH=\u002Fopt\u002Fnvcomp\u002Flib \\\n  cargo run --release --example bench_codecs \\\n    -p s4-codec --features nvcomp-gpu\n\n# Streaming pipeline bench (1 GiB highly-compressible, in-flight chunks):\nNVCOMP_HOME=\u002Fopt\u002Fnvcomp LD_LIBRARY_PATH=\u002Fopt\u002Fnvcomp\u002Flib \\\n  cargo run --release --example bench_pipeline \\\n    -p s4-server --features nvcomp-gpu\n\n# Comparison vs MinIO \u002F Garage (Docker required):\ndocker compose -f benches\u002Fcomparison\u002Fdocker-compose.yml up -d\nAWS_REQUEST_CHECKSUM_CALCULATION=when_required \\\nAWS_RESPONSE_CHECKSUM_VALIDATION=when_required \\\n  .\u002Fbenches\u002Fcomparison\u002Frun.sh benches\u002Fcomparison\u002Fresult-$(date +%F).csv\n```\n\n## Cost savings — does S4 make sense for your bill?\n\nS4 is **not** worth deploying for everyone. The economics depend on (a)\nyour AWS S3 bill, (b) how compressible your data is, (c) the cost of the\nEC2 GPU instance running S4. 
Here's an honest table to self-diagnose:\n\n| Your monthly S3 bill | Likely savings (50–80%) | EC2 GPU cost | Net savings | Verdict |\n|---:|---:|---:|---:|---|\n| $500   | $250 – $400     | ~$730\u002Fmo (g6.xlarge)    | **−$480 to −$330**    | ❌ NOT worth it |\n| $1,000 | $500 – $800     | ~$730\u002Fmo                | **−$230 to +$70**     | ⚠️ Breakeven; only if you'd use the GPU for other work too |\n| $3,000 | $1,500 – $2,400 | ~$730\u002Fmo                | **+$770 to +$1,670**  | ✅ Real savings |\n| $10,000 | $5,000 – $8,000 | ~$1,860\u002Fmo (g6e.xlarge) | **+$3,140 to +$6,140** | ✅✅ Strong ROI |\n| $50,000 | $25,000 – $40,000 | ~$1,860\u002Fmo            | **+$23,140 to +$38,140** | ✅✅✅ Material savings |\n\n**Notes:**\n- \"Likely savings 50–80%\" is the typical range for log-heavy workloads\n  (`cpu-zstd-3` 155×) and Parquet (`nvcomp-zstd` ~2× plus better Range GET\n  efficiency). For pure-numeric column-store data with `nvcomp-bitcomp` on\n  sorted posting lists, the ratio swings to **>10×** — savings closer to\n  90%+.\n- EC2 prices are us-east-1 on-demand, May 2026. 
Spot instances cut these by\n  ~70%, which moves breakeven to a ~$300\u002Fmo S3 bill instead of $1,000.\n- S4 itself is open source (Apache-2.0) — the only cost is the EC2 instance\n  and your time.\n- **If your monthly S3 bill is under $1,000 and you're not already running\n  GPUs for other work, don't bother.** Use S4's `cpu-zstd` codec on a small\n  CPU instance, or front your bucket with nginx + gzip — both will give\n  most of the savings without GPU hardware.\n\n## When NOT to use S4\n\nHonest list of workloads where S4 doesn't pay off:\n\n- **Already-compressed payloads** (mp4, jpeg, gzip-of-anything, parquet\n  with column-level codec already on, lz4 \u002F zstd-prepacked archives) —\n  S4's dispatcher detects + routes to `passthrough` so there's no harm\n  done, but you're paying for the round-trip without getting savings.\n- **Small objects** (\u003C 16 KiB) — the S4F2 frame header (28 bytes) +\n  S4IX sidecar (32–96 bytes per object) eats the compression ratio\n  before you start. Break-even is workload-dependent; the rule of thumb\n  is that **objects > 1 MiB** make the math comfortable and objects\n  \u003C 16 KiB make it negative. The dispatcher does not yet skip-compress small objects\n  automatically (#105 follow-up).\n- **Metadata-ops-dominant workloads** — heavy `ListObjects` \u002F `HeadObject`\n  \u002F `CopyObject` traffic against millions of small keys adds S4 hop latency\n  without touching the codec. S4 is on-path for those, so you pay\n  the second TLS hop + s3s framework overhead.\n- **Ultra-low-latency tail SLOs** (sub-10ms p99 GET) — S4's streaming\n  GET adds decoder warm-up + S4IX sidecar fetch (one extra round-trip\n  for the index when not cached). 
Fine for analytics \u002F archival \u002F\n  bulk; not fine for an OLTP-style hot read path.\n- **Single-region cold-storage-only** (everything goes straight to\n  Glacier) — Glacier already prices low enough that the storage\n  savings rarely pay for the compute \u002F operational cost of running S4.\n- **Strict regulatory environments without third-party audit on file** —\n  v1.0 freezes the wire + API surface, but S4 has no SOC2 \u002F ISO27001 \u002F\n  FedRAMP audit trail yet. If your compliance team's bar is \"must have\n  third-party audit on file\", S4 isn't there.\n- **As the only copy of irreplaceable data, before a production\n  reference is on file** — until at least one public production\n  deployment reference lands (we're collecting them under issue label\n  `production-reference`), pair S4 with backend-native versioning +\n  replication. The v1.0 freeze is a contract on surface stability,\n  not a substitute for the operational track record that a reference\n  deployment provides.\n\n## Durability, corruption recovery, and the repair tool\n\n### Write protocol\nA PUT goes through three S3 calls behind one client-visible request:\n\n1. **PUT `\u003Ckey>`** — the compressed S4F2-framed body (atomic single-PUT\n   for objects under the multipart threshold; otherwise an S3 multipart\n   upload with per-part frames).\n2. **PUT `\u003Ckey>.s4index`** — the S4IX sidecar with per-frame offset +\n   original-size + crc32c entries.\n3. 
(multipart only) **CompleteMultipartUpload** — finalises the main\n   object atomically; the sidecar is written after this completes.\n\nThe main object PUT is the **commit point**; the sidecar exists to\noptimise Range GET and is treated as recoverable \u002F rebuildable from the\nmain object (next section).\n\n### Failure modes and what each one looks like\n\n| Failure | Visible symptom | Recovery |\n|---|---|---|\n| Client disconnects mid-PUT | Backend returns `IncompleteBody` or 5xx, S4 maps to `TruncatedStream` (v0.8.4 #73). Main object NOT created; sidecar NOT created. No partial state. | None needed — retry the PUT |\n| Main object PUT succeeds, sidecar PUT fails | GETs work (full object decode, no range optimisation); Range GETs fall back to \"read whole object, decode, slice\". | `s4 repair-sidecar \u003Cbucket>\u002F\u003Ckey> --endpoint-url \u003CBACKEND>` rebuilds the sidecar by re-scanning frames in the main object |\n| Multipart UploadPart succeeds, CompleteMultipartUpload fails | Backend cleans up uncommitted parts on lifecycle-driven `AbortIncompleteMultipartUpload` (S3 default 7 days, or operator policy). | Retry the upload; orphan parts charged but auto-deleted |\n| S3 returns a corrupted object body (rare, but happens on hardware faults) | Per-frame `crc32c` mismatch on decode → `CodecError::CrcMismatch` → S4 returns 500 to client with diagnostic. | None within S4 — fix at the backend storage layer; S4 won't return corrupted bytes |\n| Sidecar diverges from main object (manual `aws-cli` edit, etc.) | First Range GET that hits the diverged region returns 500 with `IndexFrameMismatch`. | `s4 verify-sidecar \u003Cbucket>\u002F\u003Ckey> --endpoint-url \u003CBACKEND>` flags it; `s4 repair-sidecar` rebuilds |\n| Backend object exists, sidecar missing entirely | GETs work; Range GETs degrade to fallback path. 
| `s4 repair-sidecar \u003Cbucket>\u002F\u003Ckey> --endpoint-url \u003CBACKEND>` |\n| Bucket has accumulated orphan `.s4index` from the v0.8.15 H-g window | Storage bill grows but reads still work (orphans never reach the GET path). | `s4 sweep-orphan-sidecars \u003Cbucket> --endpoint-url \u003CBACKEND> --delete` (run without `--delete` first to inspect). See `docs\u002Forphan-sidecar-recovery.md`. |\n\n### CRC scope\n\n`crc32c` is computed over the **decompressed original payload** of each\nframe and stored in both the frame header and the sidecar entry. This\ncatches:\n- Mid-flight corruption at the backend storage layer\n- Codec backend bugs that decode to subtly wrong bytes\n- Forged manifest attacks where the attacker replaces the compressed body\n\nIt does **not** catch:\n- A correctly-encoded malicious payload from a tampered backend (the\n  CRC verifies the bytes match what was encoded, not that what was\n  encoded was the originally-PUT bytes) — that's what S4's SigV4 auth\n  on the PUT side covers\n- Lost frames from a truncated multipart that nonetheless committed\n  (the per-part Complete API itself is the integrity check there)\n\n### Repair tool status\n\nv0.9 #106 shipped three sidecar-maintenance subcommands on the `s4`\nbinary. All three point at the **backend** (not the S4 gateway) — the\ngateway hides `.s4index` from listings and decompresses bodies on GET,\nboth of which break this tooling:\n\n```bash\n# Read-only check. Exits 0 on Ok \u002F LegacyV1 \u002F MissingHarmless\n# (single-frame object, no sidecar by design) \u002F MissingUnknown (body\n# exceeds the deep-scan cap, can't classify); exits 1 on\n# MissingDivergent \u002F StaleEtag \u002F StaleSize \u002F DecodeError \u002F\n# EncryptedSidecarUnsupported (SSE-S4 chunked, see follow-up below).\ns4 verify-sidecar bucket\u002Fkey --endpoint-url https:\u002F\u002Fs3.example.com\n\n# Re-scan the main object and overwrite the sidecar. 
Default body cap\n# is 5 GiB (matches --max-body-bytes); pass --max-body-bytes to raise.\n# Does NOT yet support SSE-S4 chunked encrypted objects from the CLI\n# (operator needs the SSE keyring; v0.10 roadmap is to plumb\n# `--sse-s4-key \u003Cpath>` through). Until then, re-PUT the object via\n# the v0.9+ gateway to regenerate the v3 sidecar.\ns4 repair-sidecar bucket\u002Fkey --endpoint-url https:\u002F\u002Fs3.example.com\n\n# Find dangling `.s4index` whose pair is missing or stale. Dry-run by\n# default; --delete actually removes them. The default --delete only\n# removes pair-bound orphans (PairedMissing \u002F PairedEtagMismatch \u002F\n# PairedSizeMismatch); SidecarUndecodable entries stay until you\n# escalate with --delete-undecodable (guards against deleting legacy\n# reserved-name user data under --allow-legacy-reserved-key-reads).\ns4 sweep-orphan-sidecars bucket --endpoint-url https:\u002F\u002Fs3.example.com [--delete] [--delete-undecodable]\n```\n\nThe manual fallback (DELETE the sidecar — Range GET drops to the\nfull-read path) still works for one-offs without the CLI handy. See\n`docs\u002Forphan-sidecar-recovery.md` for the v0.8.15 H-g cleanup recipe\nusing `s4 sweep-orphan-sidecars`.\n\n## Production Features\n\n### Streaming I\u002FO\n\n**Measurement conditions for the numbers below** (#107): RTX 4070 Ti\nSUPER + Ryzen 9 9950X, single-pass 256 MiB compressible input, codec\n`cpu-zstd-3` (or as noted), single concurrent request, S4 colocated\nwith backend (no network RTT to amortise). TTFB excludes TLS handshake\n+ SigV4 verification (those add 5–15 ms once per connection).\n\n- **Streaming GET** for non-multipart `cpu-zstd` \u002F `passthrough` objects:\n  TTFB **8–20 ms** under the conditions above, memory ≈ zstd window\n  (8 MiB at level 3) + 64 KiB buffer\n- **Streaming PUT** for the same codecs: input never fully buffered, peak memory\n  ≈ compressed size (5 GB → ~50 MB at 100× ratio). 
Client-supplied whole-body\n  checksums (`Content-MD5`, `x-amz-checksum-{crc32, crc32c, sha1, sha256, crc64nvme}`)\n  are verified **in-stream** via a tee-into-hasher wrapper (v0.9 #106): mismatched\n  bytes surface as `400 BadDigest` without buffering the body. GPU codecs and\n  multipart `UploadPart` keep the buffered per-body \u002F per-part verify path\n  (the bytes are already in memory there for framing \u002F padding) —\n  see [`docs\u002Fsecurity\u002Fstreaming-checksum-coverage.md`](docs\u002Fsecurity\u002Fstreaming-checksum-coverage.md)\n  for the full coverage matrix and the codec-API constraint that makes\n  this a fundamental property of those branches, not deferred plumbing\n- **GPU streaming compress** (v0.2): nvCOMP `zstd` \u002F `gdeflate` PUTs run a\n  per-chunk pipeline so a 10 GB highly-compressible upload peaks at ~210 MB\n  host RAM instead of buffering the full input\n- **Single-PUT framed format unification** (v0.2): every compressed PUT now\n  uses the same `S4F2` multi-frame format multipart uploads use, with an\n  optional `\u003Ckey>.s4index` sidecar. Range GET partial-fetch optimisation\n  applies to single-PUT objects too, not just multipart\n- **Multipart per-part compression**: each part compressed and frame-encoded\n  (`S4F2` magic), per-frame codec dispatch (mixed codecs in one object)\n- **Multipart final-part padding trim** (v0.2): the final part of a multipart\n  with a tiny highly-compressible tail skips `S4P1` padding (saves up to\n  ~5 MiB per object on highly compressible workloads)\n- **Range GET via sidecar `\u003Ckey>.s4index`**: only the needed compressed bytes\n  are fetched from backend, decoded, and sliced. Falls back to full read when\n  sidecar is absent\n- **Encryption-aware Range GET fast-path** (v0.9 #106): SSE-S4 chunked\n  (`--sse-chunk-size > 0`, S4E6 frame) Range GETs now partial-fetch just\n  the enclosing S4E6 chunks from backend instead of pulling the full\n  encrypted body. 
The v3 `\u003Ckey>.s4index` sidecar carries the per-PUT salt +\n  chunk geometry so the GET path can compute the encrypted byte range\n  without re-fetching the header. SSE-KMS \u002F SSE-C \u002F SSE-S4 buffered\n  (`--sse-chunk-size 0`) keep the v0.8.12 #120 buffered fallback (= full\n  decrypt → frame-parse → slice); covering them needs separate plumbing\n  (KMS DEK envelope shape, customer-key per-request material) and is on\n  the v0.10+ roadmap\n- **Byte-range aware `upload_part_copy`** (v0.2): when the source is S4-framed,\n  the user-visible byte range is what gets copied (decompressed and re-framed),\n  not raw compressed bytes\n\n### Server-side encryption — Range GET fast-path matrix\n\nS4 supports four SSE modes (table below). The **Range GET fast-path**\nintroduced in v0.9 #106 partial-fetches only the enclosing encrypted\nchunks for a given byte range instead of pulling the full body — but it\nonly works for **SSE-S4 chunked** (`--sse-chunk-size > 0`, `S4E6` wire\nenvelope). The other three modes fall back to the v0.8.12 #120 buffered\npath (full decrypt → frame-parse → slice).\n\n| S","2026-06-11 04:11:48","CREATED_QUERY"]