[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72925":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":8,"rankLanguage":8,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":8,"pushedAt":8,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":14,"starSnapshotCount":14,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},72925,"gpt-prompt-engineer","mshumer\u002Fgpt-prompt-engineer","mshumer",null,"Jupyter Notebook",9661,678,81,29,0,7,62.2,"MIT License",false,"main",true,[],"2026-06-12 04:01:07","# gpt-prompt-engineer\n[![Twitter Follow](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fmattshumer_?style=social)](https:\u002F\u002Ftwitter.com\u002Fmattshumer_) [![Open Main Version In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fmshumer\u002Fgpt-prompt-engineer\u002Fblob\u002Fmain\u002Fgpt_prompt_engineer.ipynb) [![Open Classification Version In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F16NLMjqyuUWxcokE_NF6RwHD8grwEeoaJ?usp=sharing)\n\n[Be the first to know when I publish new AI builds + demos!](https:\u002F\u002Ftally.so\u002Fr\u002Fw2M17p)\n\n## Overview\n\nPrompt engineering is kind of like alchemy. There's no clear way to predict what will work best. It's all about experimenting until you find the right prompt. 
`gpt-prompt-engineer` is a tool that takes this experimentation to a whole new level.\n\n**Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best.**\n\n## *New 3\u002F20\u002F24: The Claude 3 Opus Version*\nI've added a new version of gpt-prompt-engineer that takes full advantage of Anthropic's Claude 3 Opus model. This version auto-generates test cases and allows for the user to define multiple input variables, making it even more powerful and flexible. Try it out with the claude-prompt-engineer.ipynb notebook in the repo!\n\n## *New 3\u002F20\u002F24: Claude 3 Opus -> Haiku Conversion Version*\nThis notebook enables you to build lightning-fast, performant AI systems at a fraction of the typical cost. By using Claude 3 Opus to establish the latent space and Claude 3 Haiku for the actual generation, you can achieve amazing results. The process works by leveraging Opus to produce a collection of top-notch examples, which are then used to guide Haiku in generating output of comparable quality while dramatically reducing both latency and cost per generation. Try it out with the opus-to-haiku-conversion.ipynb notebook in the repo!\n\n## Features\n\n- **Prompt Generation**: Using GPT-4, GPT-3.5-Turbo, or Claude 3 Opus, `gpt-prompt-engineer` can generate a variety of possible prompts based on a provided use-case and test cases.\n\n- **Prompt Testing**: The real magic happens after the generation. The system tests each prompt against all the test cases, comparing their performance and ranking them using an ELO rating system.\n\u003Cimg width=\"1563\" alt=\"Screen Shot 2023-07-04 at 11 41 54 AM\" src=\"https:\u002F\u002Fgithub.com\u002Fmshumer\u002Fgpt-prompt-engineer\u002Fassets\u002F41550495\u002Ff8171cff-1703-40ca-b9fd-f0aa24d07110\">\n\n- **ELO Rating System**: Each prompt starts with an ELO rating of 1200. 
As they compete against each other in generating responses to the test cases, their ELO ratings change based on their performance. This way, you can easily see which prompts are the most effective.\n\n- **Classification Version**: The `gpt-prompt-engineer -- Classification Version` notebook is designed to handle classification tasks. It evaluates the correctness of a test case by matching it to the expected output ('true' or 'false') and provides a table with scores for each prompt.\n\u003Cimg width=\"1607\" alt=\"Screen Shot 2023-07-10 at 5 22 24 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fmshumer\u002Fgpt-prompt-engineer\u002Fassets\u002F41550495\u002Fd5c9f2a8-97fa-445d-9c38-dec744f77854\">\n\n- **Claude 3 Version**: The claude-prompt-engineer notebook is designed to work with Anthropic's Claude 3 Opus model. It auto-generates test cases and allows for multiple input variables, making it even more powerful and flexible.\n\n- **Claude 3 Opus -> Haiku Conversion Version**: Designed to preserve Opus' quality for your use-case while getting the speed + cost benefits of using Haiku.\n\n- **[Weights & Biases](https:\u002F\u002Fwandb.ai\u002Fsite\u002Fprompts) Logging**: Optional logging to [Weights & Biases](https:\u002F\u002Fwandb.ai\u002Fsite) of your configs such as temperature and max tokens, the system and user prompts for each part, the test cases used and the final ranked ELO rating for each candidate prompt. Set `use_wandb` to `True` to use. \n\n- **[Portkey](https:\u002F\u002Fportkey.ai)**: Optional tool to log and trace all the prompt chains and their responses. Set `use_portkey` to `True` to use.\n\n## Setup\n1. [Open the notebook in Google Colab](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fmshumer\u002Fgpt-prompt-engineer\u002Fblob\u002Fmain\u002Fgpt_prompt_engineer.ipynb) or in a local Jupyter notebook. 
For classification, use [this one](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F16NLMjqyuUWxcokE_NF6RwHD8grwEeoaJ?usp=sharing). For the Claude 3 version, use [this one](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1likU_S4VfkzoLMPfVdMs3E54cn_W6I7o?usp=sharing).\n\n2. Add your OpenAI API key to the line `openai.api_key = \"ADD YOUR KEY HERE\"`. If you're using the Claude 3 version, add your Anthropic API key to the line `ANTHROPIC_API_KEY = \"ADD YOUR KEY HERE\"`.\n\n## How to Use\n\n1. If you are using the GPT-4 version, define your use-case and test cases. The use-case is a description of what you want the AI to do. Test cases are specific prompts that you would like the AI to respond to. For example:\n\n```\ndescription = \"Given a prompt, generate a landing page headline.\" # this style of description tends to work well\n\ntest_cases = [\n    {\n        'prompt': 'Promoting an innovative new fitness app, Smartly',\n    },\n    {\n        'prompt': 'Why a vegan diet is beneficial for your health',\n    },\n    {\n        'prompt': 'Introducing a new online course on digital marketing',\n    },\n    {\n        'prompt': 'Launching a new line of eco-friendly clothing',\n    },\n    {\n        'prompt': 'Promoting a new travel blog focusing on budget travel',\n    },\n    {\n        'prompt': 'Advertising a new software for efficient project management',\n    },\n    {\n        'prompt': 'Introducing a new book on mastering Python programming',\n    },\n    {\n        'prompt': 'Promoting a new online platform for learning languages',\n    },\n    {\n        'prompt': 'Advertising a new service for personalized meal plans',\n    },\n    {\n        'prompt': 'Launching a new app for mental health and mindfulness',\n    }\n]\n```\n\nFor the classification version, your test cases should be in the format:\n\n```\ntest_cases = [\n    {\n        'prompt': 'I had a great day!',\n        'output': 'true'\n    },\n    {\n        'prompt': 'I 
am feeling gloomy.',\n        'output': 'false'\n    },\n    # add more test cases here\n]\n```\n\nFor the Claude 3 version, you can define input variables in addition to the use-case description:\n\n```\ndescription = \"Given a prompt, generate a personalized email response.\"\n\ninput_variables = [\n    {\"variable\": \"SENDER_NAME\", \"description\": \"The name of the person who sent the email.\"},\n    {\"variable\": \"RECIPIENT_NAME\", \"description\": \"The name of the person receiving the email.\"},\n    {\"variable\": \"TOPIC\", \"description\": \"The main topic or subject of the email. One to two sentences.\"}\n]\n```\n\nThe test cases will be auto-generated based on the use-case description and input variables.\n\n2. Choose how many prompts to generate. Keep in mind, this can get expensive if you generate many prompts. 10 is a good starting point.\n\n3. Call `generate_optimal_prompt(description, test_cases, number_of_prompts)` to generate a list of potential prompts, and test and rate their performance. For the classification version, just run the last cell. For the Claude 3 version, call `generate_optimal_prompt(description, input_variables, num_test_cases, number_of_prompts, use_wandb)`.\n\n4. The final ELO ratings will be printed in a table, sorted in descending order. The higher the rating, the better the prompt.\n\u003Cimg width=\"1074\" alt=\"Screen Shot 2023-07-04 at 11 48 45 AM\" src=\"https:\u002F\u002Fgithub.com\u002Fmshumer\u002Fgpt-prompt-engineer\u002Fassets\u002F41550495\u002F324f90b8-c0ee-45fd-b219-6c44d9aa281b\">\n\nFor the classification version, the scores for each prompt will be printed in a table (see the image above).\n\n## Contributions are welcome! Some ideas:\n- have a number of different system prompt generators that create different styles of prompts, to cover more ground (ex. 
examples, verbose, short, markdown, etc.)\n- automatically generate the test cases\n- expand the classification version to support more than two classes using tiktoken\n\n## License\n\nThis project is [MIT](https:\u002F\u002Fgithub.com\u002Fyour_username\u002Fyour_repository\u002Fblob\u002Fmaster\u002FLICENSE) licensed.\n\n## Contact\n\nMatt Shumer - [@mattshumer_](https:\u002F\u002Ftwitter.com\u002Fmattshumer_)\n\nProject Link: [https:\u002F\u002Fgithub.com\u002Fmshumer\u002Fgpt-prompt-engineer](https:\u002F\u002Fgithub.com\u002Fmshumer\u002Fgpt-prompt-engineer)\n\nLastly, if you want to try something even cooler than this, sign up for [HyperWrite Personal Assistant](https:\u002F\u002Fapp.hyperwriteai.com\u002Fpersonalassistant) (most of my time is spent on this). It's basically an AI with access to real-time information that a) is incredible at writing naturally, and b) can operate your web browser to complete tasks for you.\n\nHead to [ShumerPrompt](https:\u002F\u002FShumerPrompt.com), my \"GitHub for Prompts\"!\n","gpt-prompt-engineer is a tool for optimizing and testing prompts for large language models. Users provide a task description and test cases, and it automatically generates, tests, and ranks multiple prompts with an ELO rating system to find the best-performing ones. The project offers several versions, including ones based on GPT-4, GPT-3.5-Turbo, and Anthropic's Claude 3 Opus; the Claude 3 Opus version can auto-generate test cases and lets users define multiple input variables, which further improves flexibility and practicality. A separate version combines Claude 3 Opus with Haiku to build efficient, low-cost AI applications. The tool suits any scenario that requires fine-tuning prompts for natural language processing tasks to improve model output quality.",2,"2026-06-11 03:44:00","high_star"]