[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80558":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":12,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":14,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":10,"rankLanguage":10,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":17,"topics":20,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":14,"starSnapshotCount":14,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},80558,"fc","xtellect\u002Ffc","xtellect","fc is a research-grade, lossless floating point compressor.","",null,"C",73,1,0,0.9,"Other",false,"main",true,[21,22,23,24,25,26,27,28,29,30,31,32],"avx2","c","codec","compression","data-compression","floating-point","ieee-754","linux","lossless-compression","scientific-computing","simd","time-series","2026-06-12 02:04:03","# fc — Floating-Point Compressor\n\nCopyright (c) 2026 Praveen Vaddadi \u003Cthynktank@gmail.com>\nLicensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) and\n[NOTICE](NOTICE).\n\n`fc` is a lossless compressor for streams of IEEE-754 64-bit doubles. It\nsplits the input into adaptively-sized blocks (quanta), runs a competition\nbetween many specialized codecs on each block, and emits the smallest\nresult. 
Compression and decompression are multi-threaded with POSIX\nthreads and the hot paths are hand-vectorized for x86-64 with AVX2 +\nSSE4.2 + BMI + LZCNT.\n\nThe current version string is `fc 1.56` (see `fc_ver` in `fc.c`).\n\n## Representative numbers\n\nAggregate result of the bundled `test_fc` harness across **17 synthetic\ndatasets**, 1 Mi doubles each (8 MiB per dataset, ≈136 MiB \u002F 142.6 MB\ndecimal total), median-of-5 trials per dataset, 8 encoder threads,\ndecoder threads auto-selected, on the maintainer's x86-64 + AVX2 box.\nThe \"Ratio\" column is `total_original \u002F total_compressed`; the throughput\ncolumns are the harmonic mean of per-dataset MB\u002Fs. Numbers come straight\nout of `test_fc.csv` and will move on different hardware — re-run\n`make test` to regenerate.\n\n| Codec       | Ratio  | Encode (MB\u002Fs) | Decode (MB\u002Fs) |\n| ----------- | -----: | ------------: | ------------: |\n| **fc**      | **3.07** | **120**     | **1277**      |\n| zstd -3     |   2.07 |           528 |          1556 |\n| zstd -9     |   2.09 |           111 |          1572 |\n| fpzip       |   1.71 |           123 |           121 |\n| lz4hc -9    |   1.69 |            51 |          5835 |\n| gorilla     |   1.66 |           683 |           971 |\n| lz4         |   1.62 |          2176 |          5353 |\n\n**Headline:** `fc` is the **best size-oriented floating-point\ncompressor in this benchmark, not the best all-around compressor.** It\nwins ratio on **10 of 17** datasets outright and is never worse than\nroughly third on any of them.\n\n**Where `fc` wins (often by a lot):**\n\n- Structured \u002F analytic floats — constant ≈ 39,756× (vs. zstd-9 ≈\n  11,619×), parabolic ≈ 2,973× (vs. 2.5×), int×1000 ≈ 5,388× (vs. 4.3×),\n  piecewise ≈ 26.8× (vs. 14.9×), stocks ≈ 15.6× (vs. 
7.1×).\n- Periodic \u002F quasi-periodic signals — sin-low-freq, audio-mix, ar2-damped\n  beat gorilla and fpzip by 1.3×–2.7×.\n- **Decode is fast and parallel** (auto-threaded inside `fc_dec`):\n  aggregate ≈ 1.28 GB\u002Fs, ~10× faster than encode, ~10× faster than\n  fpzip, ~1.3× faster than gorilla, and ~80% of zstd-3 decode. Good fit\n  for write-once \u002F read-many time-series stores.\n\n**Where it loses:**\n\n- **Byte-pattern-friendly quantized data** — zstd wins on\n  `decimal-cents` (zstd-9 ≈ 3,465× vs fc 268×), `dict-16` (zstd-9 ≈\n  10,268× vs fc 4,660×), and `quantized-4lvl` (zstd-9 ≈ 9,675× vs fc\n  1,037×). General-purpose LZ catches structure `fc` doesn't currently\n  model.\n- **Noisy natural-float arrays** — fpzip is marginally better on\n  `random-walk` (1.39 vs 1.38), `climate` (1.395 vs 1.380),\n  `geo-coords` (2.012 vs 2.000), and `sensor-noisy` (1.397 vs 1.386).\n  Differences are small but consistent.\n- **Encode throughput** — ~120 MB\u002Fs aggregate. The mode competition is\n  what buys the ratio; you pay for it in encoder CPU. If encode is the\n  bottleneck, lz4 (~2.2 GB\u002Fs) or zstd-3 (~530 MB\u002Fs) dominate.\n- **Pseudo-random data** — ratio ≈ 1.000. Same as every other lossless\n  codec; `fc` just doesn't waste headers.\n\n**Hard constraints:** lossless only (no quantization). The encoder takes\n8-byte-aligned input in multiples of 8 bytes; the heuristics and most\nmodes are tuned for IEEE-754 double-precision bit patterns, so other\n8-byte payloads will compress but you may leave ratio on the table.\nRequires x86-64 with AVX2 + SSE4.2 + BMI + LZCNT.\n\n**Rule of thumb:**\n\n- Want the smallest file for a floating-point stream and decode-side\n  latency matters → `fc`.\n- Encode-bound on heterogeneous data → zstd or lz4.\n- Noisy scientific arrays where every byte counts → benchmark `fc` vs\n  fpzip on your data; they're within ~1% of each other.\n\n## Status\n\nThis is a research-grade single-file library. 
The on-disk format is\nversioned by a magic number in the stream header; backwards-incompatible\nformat changes will bump the magic. Treat compressed output as\nopaque — do not rely on internal layout.\n\n## Repository layout\n\n| File          | Purpose                                                      |\n| ------------- | ------------------------------------------------------------ |\n| `fc.h`        | Public API (encode, decode, monitoring counters).            |\n| `fc.c`        | Compressor implementation. ~50 codecs + dispatch + threads.  |\n| `gorilla.h`   | Bundled Redis \"Gorilla\" XOR\u002Fdelta codec — third-party.       |\n| `gorilla.c`   | Bundled Redis \"Gorilla\" XOR\u002Fdelta codec — third-party.       |\n| `test_fc.c`   | Round-trip + benchmark harness over 17 synthetic datasets.   |\n| `Makefile`    | Builds `fc.o`, `gorilla.o`, and the `test_fc` benchmark.     |\n| `LICENSE`     | Apache License 2.0 (this project's own code).                |\n| `NOTICE`      | Attribution and third-party license disclosures.             |\n| `CONTRIBUTING.md` | Contribution guide and DCO.                              |\n\n## Public API\n\nThe whole API is in `fc.h`:\n\n```c\ntypedef struct { int p, t, c; } fc_cfg;\n\nsize_t fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg);\nsize_t fc_dec(const void *src, size_t bytes, void *dst);\n\nextern const char  fc_ver[];\nextern unsigned long fc_dec_mode_hist[64];\nextern unsigned long fc_enc_mode_time_ns[64];\nextern unsigned long fc_enc_mode_calls[64];\nextern unsigned long fc_enc_mode_wins[64];\n\nvoid        fc_dec_mode_hist_reset(void);\nconst char *fc_mode_name(int mode);\n```\n\n### `fc_cfg` fields\n\n| Field | Meaning                                                          |\n| ----- | ---------------------------------------------------------------- |\n| `p`   | log2 of the predictor table size. Clamped to **[10, 16]** inside `fc_enc`. 
The `test_fc` benchmark passes 18, which is silently clamped to 16. |\n| `t`   | Worker thread count for the encoder. The decoder takes no `fc_cfg` and auto-selects threads via `sysconf(_SC_NPROCESSORS_ONLN)`, capped at 128 and at the per-stream block count. |\n| `c`   | Reserved. Currently unused (`(void)cfg.c;` in `fc_enc`).         |\n\n### `fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg)`\n\nCompresses `bytes` of input (a multiple of 8 — one `double` per 8 bytes).\nReturns the number of compressed bytes written, or `0` on failure: NULL\n`src` \u002F `dst`, `bytes` not a multiple of 8, or an internal allocation\nfailure.\n\nThe destination buffer must be sized by the caller. The library does not\npublish a closed-form `comp_bound`. The bundled benchmark allocates\n`2 * bytes + 64 KiB` (`test_fc.c`), which has been sufficient for every\ndataset in the test suite; for arbitrary inputs, allocate at least that\nmuch and check the return value (the encoder will return `0` rather than\noverflow). Note that `fc_enc` receives no destination-capacity argument,\nso it cannot check the real size of `dst`; a smaller allocation is not\nprotected.\n\n### `fc_dec(const void *src, size_t bytes, void *dst)`\n\nDecompresses a stream produced by `fc_enc`. Returns the number of\ndecompressed bytes written, or `0` on failure. Failure conditions:\nNULL `src` \u002F `dst`, truncated header, wrong magic, header\n`predsizelg2` outside `[10, 16]`, a per-block header that claims more\ninput or output than remains, total decoded size that doesn't match the\nheader's `original_bytes`, or an internal allocation failure. The caller\nis responsible for sizing `dst` to the original byte count recorded in\nthe stream header. `fc.h` exposes no function for reading that header,\nso in practice record the original length at compression time and\nallocate that many bytes.\n\n**Note:** unknown \u002F unimplemented mode IDs are *not* currently treated\nas a hard failure — the affected block is silently left as zeros. Treat\nstreams from untrusted sources accordingly.\n\n### Monitoring counters\n\nThe `fc_*_mode_*` arrays are 64-entry tables indexed by mode ID. 
They are\nupdated atomically and are intended for diagnostics only:\n\n- `fc_dec_mode_hist[m]` — number of blocks decoded with mode `m`.\n- `fc_enc_mode_calls[m]` — number of times mode `m` was *evaluated* during\n  the encoder competition.\n- `fc_enc_mode_wins[m]` — number of times mode `m` *won* and was written.\n- `fc_enc_mode_time_ns[m]` — cumulative wall-clock nanoseconds spent\n  evaluating mode `m`.\n\n`fc_mode_name(m)` returns a short string for mode `m` (or `\"?\"` for\nunused IDs); see the table below.\n\n## Modes\n\n`fc` currently defines **48** modes using IDs 0–49 (IDs 12, 14, and\n50–63 are reserved). Mode names as exposed by `fc_mode_name`:\n\n```\n 0 PRED                    25 PRED2\n 1 CONST                   26 PRED_ADAPTIVE\n 2 STRIDE                  27 VITERBI\n 3 XORZ                    28 DELTA_BINNED\n 4 LZ                      29 PRED_RC\n 5 RAW                     30 PRED_INTERLEAVED\n 6 FLOAT32                 31 BWT\n 7 ORDERED_DELTA           32 LZ_DICT\n 8 FUZZY_STRIDE            33 CONV_N\n 9 ALP                     34 SIGN_CONV\n10 TRAILING_ZERO_BP        35 CONV_DOUBLE\n11 BYTE_TRANSPOSE          36 MTF_LZ\n13 XOR128                  37 CONV_DOUBLE_BP\n15 LSB_STRIP               38 CONV_N_BINNED\n16 LOOKBACK_DELTA          39 PRED_SIMD_INTERLEAVED\n17 FLOAT_MULT              40 FUZZY_STRIDE_ANS\n18 FCM_RLE                 41 PAQ_MIXER\n19 DICT                    42 PAQ4_MIXER\n20 DELTA2                  43 BWT_MTF_TANS\n21 BITPLANE                44 PRED4\n22 INT_MULT                45 DELTA_DP_BINNED\n23 CONV1                   46 CONV_N_DP_BINNED\n24 PRED_TANS               47 ELF\n                           48 LZ_SPLIT\n                           49 BWT_MTF_RC\n```\n\nRoughly grouped:\n\n- **Predictors**: `PRED`, `PRED2`, `PRED4`, `PRED_TANS`, `PRED_RC`,\n  `PRED_ADAPTIVE`, `PRED_INTERLEAVED`, `PRED_SIMD_INTERLEAVED`, `VITERBI`,\n  `LSB_STRIP` — FCM\u002FDFCM-style hash predictors with various residual\n  coders (raw, tANS, range coding) 
and SIMD layouts.\n- **XOR \u002F delta**: `XORZ`, `XOR128`, `LOOKBACK_DELTA`, `ORDERED_DELTA`,\n  `DELTA2`, `DELTA_BINNED`, `DELTA_DP_BINNED`.\n- **Constant \u002F stride \u002F dictionary**: `CONST`, `STRIDE`, `FUZZY_STRIDE`,\n  `FUZZY_STRIDE_ANS`, `DICT`, `LZ_DICT`, `MTF_LZ`.\n- **Lempel-Ziv**: `LZ`, `LZ_SPLIT`.\n- **Floating-point specific**: `FLOAT32`, `FLOAT_MULT`, `INT_MULT`,\n  `ALP` (Adaptive Lossless Floating-point), `ELF` (Erasing Last bits).\n- **Transforms**: `BYTE_TRANSPOSE`, `BITPLANE`, `TRAILING_ZERO_BP`,\n  `SIGN_CONV`, `BWT`, `BWT_MTF_TANS`, `BWT_MTF_RC`.\n- **Convolutional \u002F linear models**: `CONV1`, `CONV_N`, `CONV_DOUBLE`,\n  `CONV_DOUBLE_BP`, `CONV_N_BINNED`, `CONV_N_DP_BINNED`.\n- **Mixers**: `PAQ_MIXER`, `PAQ4_MIXER`, `FCM_RLE`.\n- **Fallback**: `RAW`.\n\n`fc.c` is the source of truth for what each mode actually does.\n\n## How it works\n\n1. **Header.** The stream begins with a fixed-size header containing a\n   magic number, the original byte count, the quantum size used, and the\n   clamped predictor `log2`.\n2. **Block planning.** Input is scanned and split into blocks. The\n   default quantum is **256 KiB** (`C_QUANTUM_BYTES`) but a probe\n   (`ceq_probe_chunk_values`) can grow a block up to **1 MiB**\n   (`C_QUANTUM_MAX_BYTES`) when the data looks low-entropy.\n3. **Mode competition.** Workers pull blocks from a queue. For each\n   block, the encoder evaluates a feature-gated subset of the modes\n   (`exp_range`, `sign_flips`, `distinct_count`, `looks_like_repeats`,\n   and the running best decide which modes are worth trying) and keeps\n   the smallest output. Per-mode wall time, call count, and win count\n   are recorded in the `fc_enc_mode_*` counters.\n4. **Emission.** Each block is written as `[1-byte mode][payload]`, with\n   length and offset bookkeeping in the stream so `fc_dec` can find\n   block boundaries without scanning the payload.\n5. 
**Decode.** `fc_dec` walks the block index and dispatches each\n   block's payload to the matching codec. Like encode, decode is\n   multi-threaded; `fc_dec` itself picks the worker count\n   automatically (online CPUs, capped at 128 and at the block count).\n\n## Building\n\n```bash\nmake            # builds fc.o, gorilla.o, and the test_fc benchmark\nmake test       # runs .\u002Ftest_fc\nmake clean\n```\n\n### Toolchain\n\n- A C11 compiler (`clang` or `gcc`). `clang` is the Makefile default.\n- POSIX threads (`pthread`) and `libm`, both linked by default.\n\n### Required CPU features\n\nThe default `CFLAGS` enable `-mavx2 -msse4.2 -mbmi -mlzcnt`. The library\nuses these intrinsics directly; running the binary on a CPU without them\nwill fault with `SIGILL`. There is no portable fallback.\n\n### Optional benchmark dependencies\n\nAuto-detected by the `Makefile` and only used by `test_fc.c` for\nside-by-side comparisons — they are not needed to build or use the\nlibrary itself:\n\n- `libzstd` (via `pkg-config libzstd`)\n- `liblz4`  (via `pkg-config liblz4`)\n- `fpzip`   (header probe at `\u002Fusr\u002Finclude\u002Ffpzip.h`, links `-lfpzip`)\n\n## Benchmark harness\n\n`test_fc` runs each of 17 synthetic generators (constant, linear,\nparabolic, AR(2)-damped, piecewise, integer-multiples, decimal-cents,\ndict-16, sin-low-freq, audio-mix, random-walk, quantized-4lvl, climate,\ngeo-coords, stocks, sensor-noisy, pseudo-random) at 1 Mi doubles per\ndataset (overridable via `FCBENCH_N`), reports the median of 5 timed\ntrials, uses 8 encoder threads (compile-time `THREADS`), and writes\n`test_fc.csv` with compression ratio and encode\u002Fdecode throughput for\n`fc`, the bundled gorilla codec, and whichever optional baselines (zstd,\nlz4, fpzip) were detected at build time.\n\nRound-trip correctness is checked on every dataset. A `MISMATCH` line in\nthe output is a regression.\n\n## Limitations\n\n- x86-64 only; no ARM\u002FNEON path. 
AVX2 + SSE4.2 + BMI + LZCNT required\n  at run time.\n- `fc_dec` thread count is auto-selected and not user-configurable\n  (the decode API takes no `fc_cfg`).\n- `cfg.c` is reserved and currently a no-op (`(void)cfg.c;`).\n- The encoder requires `bytes` to be a multiple of 8 (whole doubles only).\n- The predictor parameter `cfg.p` is silently clamped to `[10, 16]`.\n- Unknown mode IDs in a compressed stream are not flagged as errors;\n  affected blocks decode to zeros.\n- The on-disk format is not stable across major versions.\n\n## License\n\nThis project's own code (`fc.c`, `fc.h`, `test_fc.c`, `Makefile`,\ndocumentation, and build infrastructure) is licensed under the\n**Apache License, Version 2.0**. See [LICENSE](LICENSE).\n\nThe bundled files `gorilla.c` and `gorilla.h` are **not** Apache 2.0.\nThey are © Redis Ltd. and remain under their original triple license:\nRSALv2, SSPLv1, or AGPLv3 (your choice). See the file headers and\n[NOTICE](NOTICE) for the full text and attribution. Downstream users who\ncannot accept any of those three licenses should rebuild without\n`gorilla.o` (the core `fc` library does not link against it; only the\n`test_fc` benchmark does).\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md). Contributions to this project's\nown files are accepted under Apache 2.0; please sign off your commits\n(`git commit -s`).\n\n## Contact\n\n- Maintainer: Praveen Vaddadi — \u003Cthynktank@gmail.com>\n- Security reports: same address; please do not file public issues for\n  vulnerabilities.\n","fc is a research-grade lossless floating-point compressor designed specifically for streams of IEEE-754 64-bit double-precision values. It splits the input into adaptively sized blocks, runs a competition among multiple specialized codecs on each block, and outputs the smallest result. The project uses multithreading (POSIX threads) for both compression and decompression, and optimizes its performance-critical paths for the x86-64 architecture using the AVX2, SSE4.2, BMI, and LZCNT instruction sets. It is especially suited to workloads that need efficient storage and fast reads of large amounts of structured or periodic floating-point data, such as time-series analysis in scientific computing.",2,"2026-06-11 04:01:13","CREATED_QUERY"]