⛓ ChainForge

ChainForge is an open-source visual programming environment for prompt engineering. With ChainForge, you can evaluate the robustness of prompts and text generation models in a way that goes beyond anecdotal evidence. We believe prompting multiple LLMs, comparing their responses and testing hypotheses about them should be not only easy, but fun.

To learn more, read our documentation.

Try out ChainForge: chainforge.ai/play

*Note that you must be on a Chrome, Firefox, Edge, or Brave browser.*

We've made some Example Flows to get started (top-right corner). For instance, here is an example flow for evaluating model robustness to prompt injection attacks:

For any questions, comments, or feature requests, please submit an Issue on our GitHub, or submit a Google Form here.

Or... install ChainForge locally

The web version of ChainForge has a slightly limited feature set. For instance, in the full version you can load API keys from environment variables, write Python code to evaluate LLM responses, or query locally-run Alpaca/Llama models hosted via Dalai.

To install ChainForge on your machine, simply do:

pip install chainforge
chainforge serve

Open localhost:8000 in a Chrome, Firefox, Edge, or Brave browser.

What can I do with ChainForge?

Software built on LLM calls require one to verify the quality of outputs. ChainForge provides a suite of tools to evaluate and visualize prompt (and model) quality, with minimal effort by you. In other words, it aims to make evaluation of LLMs a piece of cake 🍰.

Everyday, developers on social media make claims about such-and-such prompt working for them. But these claims are anecdotal, with no data verifying robustness — no plots, no hard evidence, no way to verify that one model works better than another for your use case. What if you could know, precisely and in a split second, what prompt actually was the 'best'? And not only that, but which model had the most performant responses?

With ChainForge, out of the box, you can:

Curious? Visit our documentation.

Development + Contributing

ChainForge is in active development and is currently provided as an open beta test. We welcome and encourage contributors. If you'd like to contribute, just submit an Issue or fork the repository and make a Pull Request.

ChainForge was created by Ian Arawjo, a postdoctoral scholar at Harvard in the Glassman Lab of the Harvard HCI group. He is currently the lead developer. Ongoing collaborators include Elena Glassman, Martin Wattenberg, Priyan Vaithilingam, and Chelse Swoopes.


This work was partially funded by the NSF grants IIS-2107391, IIS-2040880, and IIS-1955699. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Copyright © Ian Arawjo