Node and Features Guide
An overview of all nodes currently available in ChainForge and their features.
| Data Loaders | Prompters | Evaluators | Processors | Visualizers | Misc/Other |
|---|---|---|---|---|---|
| TextFields Node | Prompt Node | Code Evaluator Node | Code Processor | Vis Node | Comment Node |
| Tabular Data Node | Chat Turn Node | LLM Scorer Node | Join Node | Inspect Node | Global Python Scripts Node |
| Items Node | | Simple Evaluator | Split Node | | |
| | | Multi-Evaluator | | | |
Data Loaders
TextFields Node
Text fields provide a way to define input values to prompt parameters. Each text field counts as a single input value to a prompt template.
Click the `+` button to add a text field:
Template chaining
Text fields are unique in that fields may themselves be prompt templates. Add a template variable with `{}` braces, and an input hook will appear:
Chain prompt templates together to, for instance, test what the best prompt template is for your use case. All prompt variables will be accessible later on in an evaluation chain, including the templates themselves.
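For example, a single text field could itself hold a template like the one below (the variable names here are purely illustrative), so that upstream nodes fill in `{persona}` and `{question}`:

```
Answer the following question as if you were a {persona}: {question}
```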
You can also remove fields by clicking the small X, or disable fields temporarily by clicking the eye icon:
After making any changes to text fields, you'll need to rerun any output node(s) to update results.
Items Node
Create a comma-separated list of values to input into a prompt parameter:
You can escape `,` by enclosing values in quotes, e.g. `"this,is,an,example"`.
Like Tabular Data nodes, data in Items Nodes is never treated as prompt templates. Items Nodes are particularly useful when you have many short words or phrases to use as input.
Tabular Data Node
Tabular data allows you to import or define spreadsheet data. You can press Import Data to import spreadsheets in `jsonl`, `xlsx`, or `csv` format.
Note
Imported spreadsheets must have a header row with column names.
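For example, a minimal `csv` file with such a header row might look like the following (the column names echo the example below; the rows are illustrative):

```csv
first,last,invention
Ada,Lovelace,the Analytical Engine program
Tim,Berners-Lee,the World Wide Web
```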
Tabular data provides an easy way to enter associated prompt parameters or import existing datasets and benchmarks. A typical use case is ground truth evaluation, where we have some inputs to a prompt, and an "ideal" or expected answer:
Here, we see the variables `{first}`, `{last}`, and `{invention}` "carry together" when filling the prompt template: ChainForge knows they are all associated with one another, connected via the row. Thus, it constructs 4 prompts from the input parameters. This is different from using separate TextFields nodes as input, which calculates the cross product of all inputs (as described in the Prompt Node section). For more detail, see the Prompt Templating page.
You can change cell text simply by editing it. To insert or delete a row, right-click on a row cell:
To insert, rename, or delete a column, click the column's `...` button:
Unlike TextFields nodes, Tabular Data is never treated as a prompt template (i.e., no input handles will be created if you use braces in a cell). All braces `{}` in the output of Tabular Data nodes are automatically escaped.
Tabular data is particularly powerful for passing metavariables as columns, which may be used later on in an evaluation flow. More information on this feature can be found in Evaluating Responses and Prompt Templating.
Random sampling from a spreadsheet
Have a table with 100s of rows, but you don't want to send that many queries? Toggle the Sample switch:
You can change the number to control how many items to sample. Rows from your table are randomly sampled with equal probability. To refresh and sample new items, just toggle the switch off and on.
Prompters
Prompt Node
The prompt node is the heart of ChainForge. Prompt Nodes allow you to query one or multiple LLMs with a prompt or prompt template. For example, below is a prompt node (right) with one input parameter and two queried LLMs, GPT3.5 and GPT4:
The Prompt Node has one TextFields node (left) attached as input data. A prompt template has been written in the Prompt Node's text field, using `{}` template hooks to declare an input variable, `game`. The `game` handle is attached to a TextFields node as input.
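For instance, a prompt template using this variable might read something like the following (a hypothetical example; the screenshot's exact wording may differ):

```
What is the best-selling {game} game of all time?
```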
When multiple LLMs are present, ChainForge queries all LLMs simultaneously with the same prompts. To add a model, click `Add +` to open a drop-down list of providers. Once added, you can click the Settings icon to adjust its settings, or the Trash icon to remove it. You can add multiple instances of the same model with different settings. See Supported Model Providers for what models are currently supported.
Prompt Nodes also allow you to request multiple responses per prompt. Just adjust `Num responses per prompt` to sample more than one response for every prompt to every LLM.
When you are ready to query LLMs, hover over the Run button:
A tooltip will provide feedback on how many responses it will send (sometimes this can be considerable if you provide multiple input combinations). If you are sure, press Run:
ChainForge will now query all LLMs simultaneously, within reasonable rate limits per model provider, and provide live feedback on its progress.
Note
Once requests are sent, you cannot currently stop them mid-way through. However, all responses from an LLM are cached the moment ChainForge receives them, so you won't lose money if something goes wrong. Due to the asynchronous nature of API requests, sometimes requests may fail to complete or hang. If you get stuck or want to stop sending requests, export your flow, close your browser window, and re-import it. We are working on improving the user experience in the future.
Prompt chaining
You can chain prompts together with template variables:
Doing so feeds all responses from the first prompt node into the second one. You'll also see a toggle that lets you continue in parallel across all prior LLMs, or prompt new LLMs.
Note that prompt chaining differs from chat. If you'd like to continue a chat conversation and pass the previous messages as context, see Chat Turn nodes below.
Note
Each set of queried LLMs (on the first prompt node, second, etc.) remains accessible as metavariables. For more information, see Visualizing Results: LLM Sets.
Chat Turn Node
Prompt Nodes only work on a single 'turn'. But what if you want to continue the conversation? What if you want to evaluate chats, not just prompts?
With Chat Turn nodes, you can continue conversations by passing chat context. You can:
- Continue multiple conversations simultaneously across multiple LLMs
- Template chat messages, just like prompts, and
- Start a conversation with one LLM, and continue it with a different LLM
Chat Turns work by connecting the output of an initial Prompt Node to the 'Past Conversation' input of the Chat Turn node:
Chat Turns include a toggle for whether you'd like to continue chatting with the same LLMs, or query different ones, passing chat context to the new models.
Above, I've first prompted four chat models (GPT3.5, GPT-4, Claude-2, and PaLM) with the question `What was the first {game} game?`. Then I ask a follow-up question, `What was the second?`
By default, Chat Turns continue the conversation with all LLMs that were used before, allowing you to follow up on LLM responses in parallel.
You can also toggle the 'continue conversations' off if you want to query different models. With this, you can start a conversation with one LLM and continue it with another (or several):
Finally, you can do everything you can with Chat Turns that you could with Prompt Nodes, including prompt templating and adding input variables. For instance, here's a prompt template as a follow-up message:
Note
In fact, Chat Turns are merely modified Prompt Nodes, and use the underlying `PromptNode` class.
Supported chat models
Chat history is automatically translated to the appropriate format for all supported providers. For HuggingFace models, you need to set 'Model Type' in Settings to 'chat', and choose a Conversation model or custom endpoint. (Currently there is only one chat model listed in the ChainForge dropdown, `microsoft/DialoGPT`. Go to the HuggingFace site to find more!)
Evaluators
Evaluators attach a score to each LLM response based on user-defined criteria.
Code Evaluator Nodes
Score responses by writing an `evaluate` function in Python or JavaScript. This section refers to the Python evaluator, but the JavaScript evaluator is similar.
To use a code evaluator, you must declare a `def evaluate(response)` function, which will be called by ChainForge for every response in the input. You can add other helper functions or `import` statements as well. For more details, see the Evaluating Responses page.
For instance, here is a basic evaluator to check the length of the response:
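A minimal sketch of such an evaluator might look like the following (assuming the response text is available as `response.text`; see the Evaluating Responses page for the exact fields):

```python
def evaluate(response):
    # Score each response by the length of its text, in characters
    return len(response.text)
```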
The `response` argument is a `ResponseInfo` object.
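Beyond the text itself, a `ResponseInfo` also carries information about the prompt and input variables. As a rough sketch, a ground-truth check against an "Expected" column might look like this (the `meta` field name is an assumption; consult Evaluating Responses for the authoritative field list):

```python
def evaluate(response):
    # Compare the model's answer against the 'Expected' metavariable from the table.
    # NOTE: `response.meta` is assumed here; see Evaluating Responses for the exact fields.
    return response.text.strip() == response.meta["Expected"]
```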
LLM Scorer Node
An LLM Scorer uses a single model to score responses (by default GPT-4 at temperature 0). You must write a scoring prompt that includes the expected format of output (e.g., "Reply true or false."). The text of the input will be pasted directly below your prompt, wrapped in triple backticks.
For instance, here is GPT-4 scoring whether Falcon-7b's responses to math problems are true:
We've used an implicit template variable, `{#Expected}`, to use the metavariable "Expected" associated with each response (from the table to the left).
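A hypothetical scoring prompt for this kind of setup might read:

```
Does the response below give the same answer as the expected answer, {#Expected}? Reply with only "true" or "false".
```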
Note
You can also use LLMs to score responses through prompt chaining. However, this requires running outputs through a code evaluator node. The LLM Scorer simplifies the process by attaching LLM scores directly as evaluation results, without modifying what LLM generated the response.
Simple Evaluator Node
Conduct a simple boolean check on LLM responses against some text, a variable value, or a metavariable value, without needing to write code.
Operations include `contains`, `starts with`, `ends with`, `equals`, and `appears in`. For more info, see Evaluating Responses.
Multi-Evaluator Node
Create multiple criteria to evaluate a set of responses, with a mix of code-based and LLM-based evaluators:
The Table View of response inspectors can plot results on a per-criterion basis:
The code and LLM evaluators inside a Multi-Eval have exactly the same features as their node counterparts.
Multi-Evaluators are in beta, and don't yet include all features. In particular, there is no "output" handle to a Multi-Eval node, since Vis Nodes do not yet support criteria-level plotting. We will also be adding an AI-assisted eval generation interface, EvalGen, in the coming weeks.
Processors
Processor nodes take in LLM responses or input texts, and transform them.
Code Processor Nodes
Code Processors (available in JavaScript and Python) transform response text according to a function defined in code. For instance, you might want to extract only part of a response for further processing down a chain:
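As a rough sketch, a Python processor that keeps only the first line of each response might look like this (assuming the node expects a `process` function that returns the transformed string; click the node's info button for the exact signature):

```python
def process(response):
    # Keep only the first line of the response's text, discarding the rest
    return response.text.split("\n")[0]
```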
Code Processors change the response, while Evaluators merely score it. Processing is a destructive operation: it transforms the text of each response according to the given processing function. (If you want to score responses such that you can still inspect what each response was, you are better off using an Evaluator node.)
Click the "i" info button in the top-right of the node to learn more. Currently, processors only allow you to process one response at a time, and only change the text (not any metadata or vars). If you want more power, raise an Issue on our GitHub.
Join Node
The Join Node concatenates LLM responses or input text, based on a user-specified join operation. You can join within LLMs, or across them.
For instance, here we've asked GPT3.5 and PaLM2 to translate different fruits from English into French. Each fruit was a separate value of a template variable, {fruit}. We can join all responses together into a list, within each LLM queried:
You can also join by specific variables, or change the formatting. Formats include double newline, newline, dashed list, numbered list, and array of strings (in JavaScript-style format).
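For instance, joining one LLM's French translations as a dashed list might produce output like this (hypothetical values):

```
- pomme
- banane
- fraise
```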
Split Node
The Split Node takes LLM responses or text data as input, and splits each text into multiple texts. By default, it splits on markdown list items (in numbered 1. or dashed - format). For instance, here it splits a single response from GPT-3.5:
You can instead split on newlines, double newlines, commas, code blocks, or paragraphs. Splitting on code blocks is especially useful for extracting any code in the response for further processing.
For code block and paragraph extraction, all text outside of the split segments is ignored.
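For example, splitting on numbered list items would turn a single response like the one below (hypothetical text) into three separate texts: "Pong", "Space Invaders", and "Pac-Man".

```
1. Pong
2. Space Invaders
3. Pac-Man
```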
Visualizers
Vis Node
Visualize evaluation scores with a plot that makes sense for the shape of the input data.
To plot data, attach the output of an Evaluator node to a Vis Node. The output you see will depend on the shape of your input data (see below). Use the MultiSelect at the top to select the prompt parameters you're interested in.
For instance, in `basic_comparison.cforge` in `examples/`, we can plot the length of responses by `{game}` across LLMs:
Or maybe we don't care about the `{game}` parameter, only the overall length of responses per LLM, to get a sense of response complexity. We can plot this by simply removing the parameter:
Note
Currently, you can only attach Evaluator nodes to Vis Nodes. This may change in the future.
For more information, visit the Visualizing Results page.
Inspect Node
Inspect responses by attaching an Inspect node to Prompt or Evaluator nodes. Group responses by input variables or LLMs, at arbitrary depth:
Use Export Data to export the data as an Excel `xlsx` file. If you've scored responses with an evaluator node, this exports the scores as well.
To learn more about this feature, visit the Inspecting Responses page.
Misc
Global Python Scripts
If you wish to write more extensive evaluator scripts, you can use the Global Python Scripts node to import local evaluator scripts.
Comment Node
Include notes in your workflow using the Comment node!