[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-84095":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":16,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":18,"hasPages":18,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":9,"trendingCount":15,"starSnapshotCount":15,"syncStatus":23,"lastSyncTime":24,"discoverSource":25},84095,"capcut-tts-api","K07VN\u002Fcapcut-tts-api","K07VN","capcut api tts & stt",null,"Python",113,69,98,1,0,5,5.54,false,"main",[],"2026-06-12 02:04:38","# CapCut Common Task Client\n\nPure Python command-line client for CapCut common task workflows:\n\n- Text to Speech (TTS)\n- Speech to Text \u002F subtitle recognition (STT)\n- Audio upload for STT\n- Task polling for TTS and STT\n\nThis client does not call native libraries, does not load `.dylib` files, does not use C++ helpers, and does not use `ctypes`. 
Request construction, payload signing, upload signing, and VOD authorization are implemented in Python.\n\n> Use this tool only with accounts, devices, sessions, and media that you are authorized to use.\n\n## Donate\n\nIf this project helps your work, you can support development with USDT on TRC20:\n\n```text\nTL4sPkfSTVnmneKvvuCfa2wSDnADjxDqYV\n```\n\nNetwork: TRC20\n\n---\n\n## English\n\n### Features\n\n- Builds CapCut `\u002Flv\u002Fv1\u002Fcommon_task\u002Fnew` requests for TTS and STT.\n- Uploads local audio\u002Fvideo files to the CapCut text-recognition VOD space before STT.\n- Polls `\u002Flv\u002Fv1\u002Fcommon_task\u002Fquery` for task results.\n- Generates the `x-ss-stub` header as the MD5 hash of the request body.\n- Generates the common `sign` header used by the captured CapCut flow.\n- Generates the TTS inner payload RSA signature in pure Python.\n- Generates AWS SigV4 authorization for `ApplyUploadInner` and `CommitUploadInner` in pure Python.\n- Supports `--device-json` overrides for device\u002Fsession values.\n\n### Requirements\n\n- Python 3.9+\n- `requests`\n\nInstall the dependency:\n\n```bash\npython3 -m pip install requests\n```\n\n### Device Configuration\n\nThe script includes a default CapCut desktop device profile in `DEFAULT_DEVICE`. You can override any field by passing a JSON file:\n\n```bash\npython3 capcut_common_task_client.py tts-new \\\n  --device-json device.json \\\n  --text \"Hello world\"\n```\n\nExample `device.json`:\n\n```json\n{\n  \"device_id\": \"7647183892936328721\",\n  \"iid\": \"7647185302080423697\",\n  \"tdid\": \"7647183892936328721\",\n  \"appvr\": \"8.7.0\",\n  \"version_name\": \"8.7.0\",\n  \"version_code\": \"8.7.0\",\n  \"lan\": \"vi-VN\",\n  \"loc\": \"VN\",\n  \"region\": \"VN\"\n}\n```\n\n### Commands\n\n#### 1. 
Create a TTS Task\n\n```bash\npython3 capcut_common_task_client.py tts-new \\\n  --text \"Hello world\"\n```\n\nUseful voice options:\n\n```bash\npython3 capcut_common_task_client.py tts-new \\\n  --text \"Hello world\" \\\n  --voice BV074_streaming \\\n  --resource-id 7102355709945188865 \\\n  --rate 1.0\n```\n\nThe response contains a task `id` and `token`:\n\n```json\n{\n  \"data\": {\n    \"tasks\": [\n      {\n        \"id\": \"...\",\n        \"status\": \"queueing\",\n        \"token\": \"...\"\n      }\n    ]\n  }\n}\n```\n\n#### 2. Query a TTS Task\n\n```bash\npython3 capcut_common_task_client.py tts-query \\\n  --task-id \"TASK_ID\" \\\n  --token \"TOKEN\"\n```\n\n#### 3. Upload an Audio\u002FVideo File\n\n```bash\npython3 capcut_common_task_client.py upload-audio \\\n  --audio-file 1.mp4\n```\n\nExample output:\n\n```json\n{\n  \"vid\": \"v10639g5000...\",\n  \"md5\": \"6171f4249ae1561cab6c4e4f1e1d71fa\",\n  \"local_md5\": \"6171f4249ae1561cab6c4e4f1e1d71fa\",\n  \"duration_ms\": 1008,\n  \"format\": \"mp3\",\n  \"size\": 20160,\n  \"file_type\": \"audio\",\n  \"store_uri\": \"tos-alisg-v-37d494-sg\u002F...\"\n}\n```\n\n#### 4. Create an STT Task from an Uploaded File\n\nUse this when you already have `vid` and `md5` from `upload-audio`:\n\n```bash\npython3 capcut_common_task_client.py stt-new \\\n  --audio-vid \"VID_FROM_UPLOAD\" \\\n  --audio-md5 \"MD5_FROM_UPLOAD\" \\\n  --duration-ms 1008 \\\n  --language vi-VN\n```\n\n#### 5. Upload and Create STT in One Command\n\n```bash\npython3 capcut_common_task_client.py stt-file \\\n  --audio-file 1.mp4 \\\n  --language vi-VN\n```\n\nThe command first uploads the media, then submits the STT task. The response contains a task `id` and `token`.\n\n#### 6. Query an STT Task\n\n```bash\npython3 capcut_common_task_client.py stt-query \\\n  --task-id \"TASK_ID\" \\\n  --token \"TOKEN\"\n```\n\n#### 7. 
Save Response to a File\n\n```bash\npython3 capcut_common_task_client.py stt-query \\\n  --task-id \"TASK_ID\" \\\n  --token \"TOKEN\" \\\n  --out response.json\n```\n\n#### 8. Preview a Request Without Calling the API\n\n```bash\npython3 capcut_common_task_client.py stt-new \\\n  --audio-vid \"VID_FROM_UPLOAD\" \\\n  --audio-md5 \"MD5_FROM_UPLOAD\" \\\n  --duration-ms 1000 \\\n  --language vi-VN \\\n  --dry-run\n```\n\n### How It Works\n\n#### TTS Flow\n\n1. Build SSML from `--text`, `--voice`, `--resource-id`, and `--rate`.\n2. Create the inner TTS payload.\n3. Generate the payload `sign` using RSA PKCS#1 v1.5 in pure Python.\n4. Wrap the payload in a CapCut common task body.\n5. Generate `x-ss-stub`, `x-khronos`, `device-time`, and request `sign`.\n6. POST to `\u002Flv\u002Fv1\u002Fcommon_task\u002Fnew`.\n7. Poll `\u002Flv\u002Fv1\u002Fcommon_task\u002Fquery` with the returned task `id` and `token`.\n\n#### STT File Flow\n\n1. Call `\u002Flv\u002Fv1\u002Fupload_sign` to obtain temporary VOD credentials.\n2. Sign `ApplyUploadInner` with AWS SigV4.\n3. Upload the media bytes to the returned VOD upload host.\n4. Finish the upload with the part CRC32.\n5. Sign and call `CommitUploadInner` to receive the media `vid`, `md5`, and duration.\n6. Submit `\u002Flv\u002Fv1\u002Fcommon_task\u002Fnew` with `req_key=cc_audio_subtitle_asr`.\n7. Poll `\u002Flv\u002Fv1\u002Fcommon_task\u002Fquery` until the task succeeds.\n\n### Where Are the Subtitles?\n\nSTT query responses store subtitles inside:\n\n```text\ndata.tasks[0].payload\n```\n\n`payload` is itself a JSON string. 
Parse it, then read:\n\n```text\npayload.utterances[].text\npayload.utterances[].start_time\npayload.utterances[].end_time\npayload.utterances[].words[]\n```\n\nQuick extractor:\n\n```bash\npython3 - \u003C\u003C'PY'\nimport json\n\ndata = json.load(open(\"response.json\", encoding=\"utf-8\"))\npayload = json.loads(data[\"data\"][\"tasks\"][0][\"payload\"])\n\nfor item in payload.get(\"utterances\", []):\n    print(f'[{item[\"start_time\"]}ms -> {item[\"end_time\"]}ms] {item[\"text\"]}')\nPY\n```\n\n### Notes\n\n- `upload-audio` accepts media files such as `.mp3`, `.m4a`, and `.mp4` when CapCut's upload service can parse the media.\n- `duration_ms` is read from the upload commit result when using `stt-file`.\n- The removed device-generation flow is intentionally not part of this client. Device identity should be configured explicitly through `DEFAULT_DEVICE` or `--device-json`.\n\n---\n\n## Tiếng Việt\n\n### Donate \u002F Ủng hộ\n\nNếu project hữu ích cho công việc của bạn, có thể ủng hộ bằng USDT mạng TRC20:\n\n```text\nTL4sPkfSTVnmneKvvuCfa2wSDnADjxDqYV\n```\n\nNetwork: TRC20\n\n### Tính năng\n\n- Tạo request CapCut `\u002Flv\u002Fv1\u002Fcommon_task\u002Fnew` cho TTS và STT.\n- Upload file audio\u002Fvideo local lên VOD space dùng cho nhận diện phụ đề.\n- Query `\u002Flv\u002Fv1\u002Fcommon_task\u002Fquery` để lấy kết quả task.\n- Tạo `x-ss-stub` bằng MD5 của body.\n- Tạo header `sign` theo flow CapCut đã phân tích.\n- Tạo chữ ký RSA cho payload TTS bằng Python thuần.\n- Tạo AWS SigV4 cho `ApplyUploadInner` và `CommitUploadInner` bằng Python thuần.\n- Hỗ trợ override cấu hình thiết bị\u002Fsession bằng `--device-json`.\n\n### Yêu cầu\n\n- Python 3.9+\n- `requests`\n\nCài dependency:\n\n```bash\npython3 -m pip install requests\n```\n\n### Cấu hình thiết bị\n\nScript có sẵn profile thiết bị CapCut desktop trong `DEFAULT_DEVICE`. 
Có thể override bằng file JSON:\n\n```bash\npython3 capcut_common_task_client.py tts-new \\\n  --device-json device.json \\\n  --text \"Xin chào\"\n```\n\nVí dụ `device.json`:\n\n```json\n{\n  \"device_id\": \"7647183892936328721\",\n  \"iid\": \"7647185302080423697\",\n  \"tdid\": \"7647183892936328721\",\n  \"appvr\": \"8.7.0\",\n  \"version_name\": \"8.7.0\",\n  \"version_code\": \"8.7.0\",\n  \"lan\": \"vi-VN\",\n  \"loc\": \"VN\",\n  \"region\": \"VN\"\n}\n```\n\n### Cách dùng\n\n#### 1. Tạo task TTS\n\n```bash\npython3 capcut_common_task_client.py tts-new \\\n  --text \"Xin chào\"\n```\n\nTuỳ chỉnh giọng đọc:\n\n```bash\npython3 capcut_common_task_client.py tts-new \\\n  --text \"Xin chào\" \\\n  --voice BV074_streaming \\\n  --resource-id 7102355709945188865 \\\n  --rate 1.0\n```\n\nResponse sẽ có `id` và `token` của task.\n\n#### 2. Query task TTS\n\n```bash\npython3 capcut_common_task_client.py tts-query \\\n  --task-id \"TASK_ID\" \\\n  --token \"TOKEN\"\n```\n\n#### 3. Upload file audio\u002Fvideo\n\n```bash\npython3 capcut_common_task_client.py upload-audio \\\n  --audio-file 1.mp4\n```\n\nKết quả trả về gồm `vid`, `md5`, `duration_ms`, `format`, `size`, `file_type`, và `store_uri`.\n\n#### 4. Tạo task STT từ file đã upload\n\nKhi đã có `vid` và `md5`:\n\n```bash\npython3 capcut_common_task_client.py stt-new \\\n  --audio-vid \"VID_FROM_UPLOAD\" \\\n  --audio-md5 \"MD5_FROM_UPLOAD\" \\\n  --duration-ms 1008 \\\n  --language vi-VN\n```\n\n#### 5. Upload rồi tạo STT bằng một lệnh\n\n```bash\npython3 capcut_common_task_client.py stt-file \\\n  --audio-file 1.mp4 \\\n  --language vi-VN\n```\n\nLệnh này upload file trước, lấy `vid\u002Fmd5\u002Fduration_ms`, rồi tự submit task STT.\n\n#### 6. Query task STT\n\n```bash\npython3 capcut_common_task_client.py stt-query \\\n  --task-id \"TASK_ID\" \\\n  --token \"TOKEN\"\n```\n\n#### 7. 
Lưu response ra file\n\n```bash\npython3 capcut_common_task_client.py stt-query \\\n  --task-id \"TASK_ID\" \\\n  --token \"TOKEN\" \\\n  --out response.json\n```\n\n#### 8. Xem request mà không gọi API\n\n```bash\npython3 capcut_common_task_client.py stt-new \\\n  --audio-vid \"VID_FROM_UPLOAD\" \\\n  --audio-md5 \"MD5_FROM_UPLOAD\" \\\n  --duration-ms 1000 \\\n  --language vi-VN \\\n  --dry-run\n```\n\n### Cách thức hoạt động\n\n#### Flow TTS\n\n1. Tạo SSML từ `--text`, `--voice`, `--resource-id`, và `--rate`.\n2. Tạo payload TTS bên trong.\n3. Ký payload bằng RSA PKCS#1 v1.5 thuần Python.\n4. Đóng payload vào body common task.\n5. Tạo `x-ss-stub`, `x-khronos`, `device-time`, và header `sign`.\n6. POST tới `\u002Flv\u002Fv1\u002Fcommon_task\u002Fnew`.\n7. Query `\u002Flv\u002Fv1\u002Fcommon_task\u002Fquery` bằng `task_id` và `token`.\n\n#### Flow STT từ file\n\n1. Gọi `\u002Flv\u002Fv1\u002Fupload_sign` để lấy credential VOD tạm thời.\n2. Ký `ApplyUploadInner` bằng AWS SigV4.\n3. Upload bytes của file lên VOD upload host.\n4. Finish upload bằng CRC32 của part.\n5. Ký và gọi `CommitUploadInner` để lấy `vid`, `md5`, và duration.\n6. Submit `\u002Flv\u002Fv1\u002Fcommon_task\u002Fnew` với `req_key=cc_audio_subtitle_asr`.\n7. Query `\u002Flv\u002Fv1\u002Fcommon_task\u002Fquery` đến khi task thành công.\n\n### Phụ đề nằm ở đâu?\n\nResponse STT chứa phụ đề trong:\n\n```text\ndata.tasks[0].payload\n```\n\n`payload` là JSON string. 
Parse string này rồi đọc:\n\n```text\npayload.utterances[].text\npayload.utterances[].start_time\npayload.utterances[].end_time\npayload.utterances[].words[]\n```\n\nTrích phụ đề nhanh:\n\n```bash\npython3 - \u003C\u003C'PY'\nimport json\n\ndata = json.load(open(\"response.json\", encoding=\"utf-8\"))\npayload = json.loads(data[\"data\"][\"tasks\"][0][\"payload\"])\n\nfor item in payload.get(\"utterances\", []):\n    print(f'[{item[\"start_time\"]}ms -> {item[\"end_time\"]}ms] {item[\"text\"]}')\nPY\n```\n\n### Ghi chú\n\n- `upload-audio` có thể dùng với `.mp3`, `.m4a`, `.mp4` nếu dịch vụ upload của CapCut đọc được media.\n- Với `stt-file`, `duration_ms` được lấy tự động từ kết quả commit upload.\n- Flow tạo thiết bị tự động đã bị loại bỏ. Cấu hình thiết bị nên được khai báo rõ bằng `DEFAULT_DEVICE` hoặc `--device-json`.\n",2,"2026-06-11 04:12:16","CREATED_QUERY"]