[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2689":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":16,"starSnapshotCount":16,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},2689,"magika","google\u002Fmagika","google","Fast and accurate AI powered file content types detection ","https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002F",null,"Python",17129,1051,63,108,0,5,38,156,21,108.07,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32,33],"ai","deep-learning","filetype","keras-classification-models","keras-models","mime-types","onnx","2026-06-12 04:00:15","# Magika\n\n[![image](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmagika.svg)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fmagika)\n[![NPM Version](https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fv\u002Fmagika)](https:\u002F\u002Fnpmjs.com\u002Fpackage\u002Fmagika)\n[![image](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fl\u002Fmagika.svg)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fmagika)\n[![image](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fmagika.svg)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fmagika)\n[![Go Version](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Ftag\u002Fgoogle\u002Fmagika?filter=go%2F*&label=go&sort=semver)](https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fgoogle\u002Fmagika\u002Fgo)\n\u003C!-- [![OpenSSF 
Scorecard](https:\u002F\u002Fapi.scorecard.dev\u002Fprojects\u002Fgithub.com\u002Fgoogle\u002Fmagika\u002Fbadge)](https:\u002F\u002Fscorecard.dev\u002Fviewer\u002F?uri=github.com\u002Fgoogle\u002Fmagika) -->\n[![OpenSSF Best Practices](https:\u002F\u002Fwww.bestpractices.dev\u002Fprojects\u002F8706\u002Fbadge)](https:\u002F\u002Fwww.bestpractices.dev\u002Fen\u002Fprojects\u002F8706)\n![CodeQL](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmagika\u002Fworkflows\u002FCodeQL\u002Fbadge.svg)\n[![Actions status](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmagika\u002Factions\u002Fworkflows\u002Fpython-build-and-release-package.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmagika\u002Factions)\n[![PyPI Monthly Downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fmagika\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fprojects\u002Fmagika)\n[![PyPI Downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fmagika)](https:\u002F\u002Fpepy.tech\u002Fprojects\u002Fmagika)\n\nMagika is a novel AI-powered file type detection tool that relies on recent advances in deep learning to provide accurate detection. Under the hood, Magika employs a custom, highly optimized model that weighs only a few MB and enables precise file identification within milliseconds, even when running on a single CPU. Magika has been trained and evaluated on a dataset of ~100M samples across 200+ content types (covering both binary and textual file formats), and it achieves ~99% average accuracy on our test set.\n\nHere is an example of what Magika's command line output looks like:\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\".\u002Fassets\u002Fmagika-screenshot.png\" width=\"600\">\n\u003C\u002Fp>\n\nMagika is used at scale to help improve Google users' safety by routing Gmail, Drive, and Safe Browsing files to the proper security and content policy scanners, processing hundreds of billions of samples every week. 
Magika has also been integrated with [VirusTotal](https:\u002F\u002Fwww.virustotal.com\u002F) ([example](.\u002Fassets\u002Fmagika-vt.png)) and [abuse.ch](https:\u002F\u002Fbazaar.abuse.ch\u002F) ([example](.\u002Fassets\u002Fmagika-abusech.png)).\n\nFor more context, read our initial [announcement post on Google's OSS blog](https:\u002F\u002Fopensource.googleblog.com\u002F2024\u002F02\u002Fmagika-ai-powered-fast-and-efficient-file-type-identification.html), consult [Magika's website](https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002F), or see our [research paper](https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002Fadditional-resources\u002Fresearch-papers-and-citation\u002F), published at the IEEE\u002FACM International Conference on Software Engineering (ICSE) 2025.\n\nYou can try Magika without installing anything by using our [web demo](https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002Fdemo\u002Fmagika-demo\u002F), which runs locally in your browser!\n\n\n# Highlights\n\n- Available as a command line tool written in Rust, a Python API, and additional bindings for Rust, JavaScript\u002FTypeScript (with an experimental npm package, which powers our [web demo](https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002Fdemo\u002Fmagika-demo\u002F)), and Go (WIP).\n- Trained and evaluated on a dataset of ~100M files across [200+ content types](.\u002Fassets\u002Fmodels\u002Fstandard_v3_3\u002FREADME.md).\n- On our test set, Magika achieves ~99% average precision and recall, outperforming existing approaches, especially on textual content types.\n- After the model is loaded (which is a one-off overhead), the inference time is about 5 ms per file, even when run on a single CPU.\n- You can invoke Magika on thousands of files at once. 
You can also use `-r` to scan a directory recursively.\n- Near-constant inference time, independent of the file size; Magika only uses a limited subset of the file's content.\n- Magika uses a per-content-type threshold system that determines whether to \"trust\" the prediction from the model, or whether to return a generic label, such as \"Generic text document\" or \"Unknown binary data\".\n- The tolerance to errors can be controlled via different prediction modes, such as `high-confidence`, `medium-confidence`, and `best-guess`.\n- The client and the bindings are already open source, and more is coming soon!\n\n# Table of Contents\n\n1. [Getting Started](#getting-started)\n   1. [Installation](#installation)\n   1. [Quick Start](#quick-start)\n1. [Documentation](#documentation)\n1. [Security Vulnerabilities](#security-vulnerabilities)\n1. [License](#license)\n1. [Disclaimer](#disclaimer)\n\n# Getting Started\n\n## Installation\n\n### Command Line Tool\n\nMagika ships a CLI written in Rust, which can be installed in several ways.\n\nVia the `magika` Python package:\n```shell\npipx install magika\n```\n\nVia brew (macOS \u002F Linux):\n```shell\nbrew install magika\n```\n\nVia installer script:\n```shell\ncurl -LsSf https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002Finstall.sh | sh\n```\n\nor:\n```shell\npowershell -ExecutionPolicy Bypass -c \"irm https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002Finstall.ps1 | iex\"\n```\n\nVia the `magika-cli` Rust package:\n```shell\ncargo install --locked magika-cli\n```\n\n### Python package\n\n```shell\npip install magika\n```\n\n### JavaScript package\n\n```shell\nnpm install magika\n```\n\n\n## Quick Start\n\nHere are a few quick examples to get you started.\n\nTo learn about Magika's inner workings, see the [Core Concepts](https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002Fcore-concepts\u002F) section of Magika's website.\n\n### Command Line Tool Examples\n\n```shell\n% cd 
tests_data\u002Fbasic && magika -r * | head\nasm\u002Fcode.asm: Assembly (code)\nbatch\u002Fsimple.bat: DOS batch file (code)\nc\u002Fcode.c: C source (code)\ncss\u002Fcode.css: CSS source (code)\ncsv\u002Fmagika_test.csv: CSV document (code)\ndockerfile\u002FDockerfile: Dockerfile (code)\ndocx\u002Fdoc.docx: Microsoft Word 2007+ document (document)\ndocx\u002Fmagika_test.docx: Microsoft Word 2007+ document (document)\neml\u002Fsample.eml: RFC 822 mail (text)\nempty\u002Fempty_file: Empty file (inode)\n```\n\n```shell\n% magika .\u002Ftests_data\u002Fbasic\u002Fpython\u002Fcode.py --json\n[\n  {\n    \"path\": \".\u002Ftests_data\u002Fbasic\u002Fpython\u002Fcode.py\",\n    \"result\": {\n      \"status\": \"ok\",\n      \"value\": {\n        \"dl\": {\n          \"description\": \"Python source\",\n          \"extensions\": [\n            \"py\",\n            \"pyi\"\n          ],\n          \"group\": \"code\",\n          \"is_text\": true,\n          \"label\": \"python\",\n          \"mime_type\": \"text\u002Fx-python\"\n        },\n        \"output\": {\n          \"description\": \"Python source\",\n          \"extensions\": [\n            \"py\",\n            \"pyi\"\n          ],\n          \"group\": \"code\",\n          \"is_text\": true,\n          \"label\": \"python\",\n          \"mime_type\": \"text\u002Fx-python\"\n        },\n        \"score\": 0.996999979019165\n      }\n    }\n  }\n]\n```\n\n```shell\n% cat tests_data\u002Fbasic\u002Fini\u002Fdoc.ini | magika -\n-: INI configuration file (text)\n```\n\n```shell\n% magika --help\nDetermines file content types using AI\n\nUsage: magika [OPTIONS] [PATH]...\n\nArguments:\n  [PATH]...\n          List of paths to the files to analyze.\n\n          Use a dash (-) to read from standard input (can only be used once).\n\nOptions:\n  -r, --recursive\n          Identifies files within directories instead of identifying the directory itself\n\n      --no-dereference\n          Identifies symbolic links as is 
instead of identifying their content by following them\n\n      --colors\n          Prints with colors regardless of terminal support\n\n      --no-colors\n          Prints without colors regardless of terminal support\n\n  -s, --output-score\n          Prints the prediction score in addition to the content type\n\n  -i, --mime-type\n          Prints the MIME type instead of the content type description\n\n  -l, --label\n          Prints a simple label instead of the content type description\n\n      --json\n          Prints in JSON format\n\n      --jsonl\n          Prints in JSONL format\n\n      --format \u003CCUSTOM>\n          Prints using a custom format (use --help for details).\n\n          The following placeholders are supported:\n\n            %p  The file path\n            %l  The unique label identifying the content type\n            %d  The description of the content type\n            %g  The group of the content type\n            %m  The MIME type of the content type\n            %e  Possible file extensions for the content type\n            %s  The score of the content type for the file\n            %S  The score of the content type for the file in percent\n            %b  The model output if overruled (empty otherwise)\n            %%  A literal %\n\n  -h, --help\n          Print help (see a summary with '-h')\n\n  -V, --version\n          Print version\n```\n\nFor more examples and documentation about the CLI, see https:\u002F\u002Fcrates.io\u002Fcrates\u002Fmagika-cli.\n\n\n### Python Examples\n\n```python\n>>> from magika import Magika\n>>> m = Magika()\n>>> res = m.identify_bytes(b'function log(msg) {console.log(msg);}')\n>>> print(res.output.label)\njavascript\n```\n\n```python\n>>> from magika import Magika\n>>> m = Magika()\n>>> res = m.identify_path('.\u002Ftests_data\u002Fbasic\u002Fini\u002Fdoc.ini')\n>>> print(res.output.label)\nini\n```\n\n```python\n>>> from magika import Magika\n>>> m = Magika()\n>>> with 
open('.\u002Ftests_data\u002Fbasic\u002Fini\u002Fdoc.ini', 'rb') as f:\n...     res = m.identify_stream(f)\n...\n>>> print(res.output.label)\nini\n```\n\nFor more examples and documentation about the Python module, see the [Python `Magika` module](https:\u002F\u002Fsecurityresearch.google\u002Fmagika\u002Fcli-and-bindings\u002Fpython\u002F) section.\n\n\n# Documentation\n\nPlease consult [Magika's website](https:\u002F\u002Fsecurityresearch.google\u002Fmagika) for detailed documentation about:\n- Core Concepts\n  - How Magika works\n  - Models & content types\n  - Prediction modes\n  - Understanding the output\n- CLI & Bindings (Python module, JavaScript module, ...)\n- Contributing\n- FAQ\n- ...\n\n\n# Security Vulnerabilities\n\nPlease contact us directly at magika-dev@google.com.\n\n\n# License\n\nApache 2.0; see [`LICENSE`](LICENSE) for details.\n\n\n# Disclaimer\n\nThis project is not an official Google project. It is not supported by\nGoogle and Google specifically disclaims all warranties as to its quality,\nmerchantability, or fitness for a particular purpose.\n","Magika is an AI-based file type detection tool that uses deep learning to identify file content quickly and accurately. Its core is a lightweight (only a few MB), highly optimized model that identifies file types precisely within milliseconds, supports more than 200 binary and textual formats, and reaches an average accuracy of about 99%. Magika suits scenarios that need to process large volumes of files efficiently and run security checks on them, such as email and cloud storage services; it is used in Google's Gmail, Drive, and Safe Browsing, where it processes hundreds of billions of samples every week. It is also integrated with platforms such as VirusTotal and abuse.ch, further improving the security of the online environment.",2,"2026-06-11 02:50:56","top_language"]