Node and Features Guide
An overview of all nodes currently available in ChainForge and their features.
| Data Loaders | Prompters | Evaluators | Processors | Visualizers | Misc/Other |
|---|---|---|---|---|---|
| TextFields Node | Prompt Node | Code Evaluator Node | Code Processor | Vis Node | Comment Node |
| Tabular Data Node | Chat Turn Node | LLM Scorer Node | Join Node | Inspect Node | Global Python Scripts Node |
| Items Node | | Simple Evaluator | Split Node | | |
| | | Multi-Evaluator | | | |
Data Loaders
TextFields Node
Text fields provide a way to define input values to prompt parameters. Each text field counts as a single input value to a prompt template.
Click the `+` button to add a text field:
Template chaining
Text fields are unique in that fields may themselves be prompt templates. Add a template variable with `{}` braces, and an input hook will appear:
Chain prompt templates together to, for instance, test what the best prompt template is for your use case. All prompt variables will be accessible later on in an evaluation chain, including the templates themselves.
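For example, a single text field could itself hold a template like the one below (the variable names here are purely illustrative), so that upstream nodes fill in `{persona}` and `{question}`:

```
Answer the following question as if you were a {persona}: {question}
```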
You can also remove fields by clicking the small X, or disable fields temporarily by clicking the eye icon:
After making any changes to text fields, you'll need to rerun any output node(s) to update results.
Items Node
Create a comma-separated list of values to input into a prompt parameter:
You can escape `,` by enclosing values in quotes, e.g. `"this,is,an,example"`.
Like Tabular Data nodes, data in Items Nodes is never treated as prompt templates. Items Nodes are particularly useful when you have many short words or phrases to use as input.
Tabular Data Node
Tabular data allows you to import or define spreadsheet data. You can press Import Data to import spreadsheets in `jsonl`, `xlsx`, or `csv` format.
Note
Imported spreadsheets must have a header row with column names.
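For example, a minimal `csv` file with such a header row might look like the following (the column names echo the example below; the rows are illustrative):

```csv
first,last,invention
Ada,Lovelace,the Analytical Engine program
Tim,Berners-Lee,the World Wide Web
```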
Tabular data provides an easy way to enter associated prompt parameters or import existing datasets and benchmarks. A typical use case is ground truth evaluation, where we have some inputs to a prompt, and an "ideal" or expected answer:
Here, we see the variables `{first}`, `{last}`, and `{invention}` "carry together" when filling the prompt template: ChainForge knows they are all associated with one another, connected via the row. Thus, it constructs 4 prompts from the input parameters. This is different from using separate TextFields nodes as input, which calculates the cross product of all inputs (as described in the Prompt Node section). For more detail, see the Prompt Templating page.
You can change cell text simply by editing it. To insert or delete a row, right-click on a row cell:
To insert, rename, or delete a column, click the column's `...` button:
Unlike TextFields nodes, Tabular Data is never treated as a prompt template (i.e., no input handles will be created if you use braces in a cell). All braces `{}` in the output of Tabular Data nodes are automatically escaped.
Tabular data is particularly powerful for passing metavariables as columns, which may be used later on in an evaluation flow. More information on this feature can be found in Evaluating Responses and Prompt Templating.
Random sampling from a spreadsheet
Have a table with 100s of rows, but you don't want to send that many queries? Toggle the Sample switch:
You can change the number to control how many items to sample. Rows from your table are randomly sampled with equal probability. To refresh and sample new items, just toggle the switch off and on.
Prompters
Prompt Node
The prompt node is the heart of ChainForge. Prompt Nodes allow you to query one or multiple LLMs with a prompt or prompt template. For example, below is a prompt node (right) with one input parameter and two queried LLMs, GPT3.5 and GPT4:
The Prompt Node has one TextFields node (left) attached as input data. A prompt template has been written in the Prompt Node's text field, using `{}` template hooks to declare an input variable, `game`. The `game` handle is attached to a TextFields node as input.
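For instance, a prompt template using this variable might read something like the following (a hypothetical example; the screenshot's exact wording may differ):

```
What is the best-selling {game} game of all time?
```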
When multiple LLMs are present, ChainForge queries all LLMs simultaneously with the same prompts. To add a model, click `Add +` to open a drop-down list of providers. Once added, you can click the Settings icon to adjust its settings, or the Trash icon to remove it. You can add multiple instances of the same model with different settings. See Supported Model Providers for what models are currently supported.
Prompt Nodes also allow you to request multiple responses per prompt. Just adjust `Num responses per prompt` to sample more than one response for every prompt to every LLM.
When you are ready to query LLMs, hover over the Run button:
A tooltip will provide feedback on how many responses it will send (sometimes this can be considerable if you provide multiple input combinations). If you are sure, press Run:
ChainForge will now query all LLMs simultaneously, within reasonable rate limits per model provider, and provide live feedback on its progress.
Note
Once requests are sent, you cannot currently stop them mid-way through. However, all responses from an LLM are cached the moment ChainForge receives them, so you won't lose money if something goes wrong. Due to the asynchronous nature of API requests, sometimes requests may fail to complete or hang. If you get stuck or want to stop sending requests, export your flow, close your browser window, and re-import it. We are working on improving the user experience in the future.
Prompt chaining
You can chain prompts together with template variables:
Doing so feeds all responses from the first prompt node into the second one. You'll also see a toggle that lets you continue in parallel across all prior LLMs, or prompt new LLMs.
Note that prompt chaining differs from chat. If you'd like to continue a chat conversation and pass the previous messages as context, see Chat Turn nodes below.
Note
Each set of queried LLMs (on the first prompt node, second, etc.) remains accessible as metavariables. For more information, see Visualizing Results: LLM Sets.
Chat Turn Node
Prompt Nodes only work on a single 'turn'. But what if you want to continue the conversation? What if you want to evaluate chats, not just prompts?
With Chat Turn nodes, you can continue conversations by passing chat context. You can:
- Continue multiple conversations simultaneously across multiple LLMs
- Template chat messages, just like prompts, and
- Start a conversation with one LLM, and continue it with a different LLM
Chat Turns work by connecting the output of an initial Prompt Node to the 'Past Conversation' input of the Chat Turn node:
Chat Turns include a toggle for whether you'd like to continue chatting with the same LLMs, or query different ones, passing chat context to the new models.
Above, I've first prompted four chat models (GPT3.5, GPT-4, Claude-2, and PaLM) with the question `What was the first {game} game?`. Then I ask a follow-up question, `What was the second?`
By default, Chat Turns continue the conversation with all LLMs that were used before, allowing you to follow up on LLM responses in parallel.
You can also toggle the 'continue conversations' off if you want to query different models. With this, you can start a conversation with one LLM and continue it with another (or several):
Finally, you can do everything you can with Chat Turns that you could with Prompt Nodes, including prompt templating and adding input variables. For instance, here's a prompt template as a follow-up message:
Note
In fact, Chat Turns are merely modified Prompt Nodes, and use the underlying `PromptNode` class.
Supported chat models
Chat history is automatically translated to the appropriate format for all supported providers. For HuggingFace models, you need to set 'Model Type' in Settings to 'chat', and choose a Conversation model or custom endpoint. (Currently there is only one chat model listed in the ChainForge dropdown, `microsoft/DialoGPT`. Go to the HuggingFace site to find more!)
Evaluators
Evaluators attach a score to each LLM response based on user-defined criteria.
Code Evaluator Nodes
Score responses by writing an `evaluate` function in Python or JavaScript. This section refers to the Python evaluator, but the JavaScript evaluator is similar.
To use a code evaluator, you must declare a `def evaluate(response)` function, which will be called by ChainForge for every response in the input. You can add other helper functions or `import` statements as well. For more details, see the Evaluating Responses page.
For instance, here is a basic evaluator to check the length of the response:
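A minimal sketch of such an evaluator might look like the following (assuming the response text is available as `response.text`; see the Evaluating Responses page for the exact fields):

```python
def evaluate(response):
    # Score each response by the length of its text, in characters
    return len(response.text)
```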
The `response` argument is a `ResponseInfo` object.
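Beyond the text itself, a `ResponseInfo` also carries information about the prompt and input variables. As a rough sketch, a ground-truth check against an "Expected" column might look like this (the `meta` field name is an assumption; consult Evaluating Responses for the authoritative field list):

```python
def evaluate(response):
    # Compare the model's answer against the 'Expected' metavariable from the table.
    # NOTE: `response.meta` is assumed here; see Evaluating Responses for the exact fields.
    return response.text.strip() == response.meta["Expected"]
```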
LLM Scorer Node
An LLM Scorer uses a single model to score responses (by default GPT-4 at temperature 0). You must write a scoring prompt that includes the expected format of output (e.g., "Reply true or false."). The text of the input will be pasted directly below your prompt, wrapped in triple backticks.
For instance, here is GPT-4 scoring whether Falcon-7b's responses to math problems are true:
We've used an implicit template variable, `{#Expected}`, to use the metavariable "Expected" associated with each response (from the table to the left).
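A hypothetical scoring prompt for this kind of setup might read:

```
Does the response below give the same answer as the expected answer, {#Expected}? Reply with only "true" or "false".
```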
Note
You can also use LLMs to score responses through prompt chaining. However, this requires running outputs through a code evaluator node. The LLM Scorer simplifies the process by attaching LLM scores directly as evaluation results, without modifying what LLM generated the response.
Simple Evaluator Node
Conduct a simple boolean check on LLM responses against some text, a variable value, or a metavariable value, without needing to write code.
Operations include `contains`, `starts with`, `ends with`, `equals`, and `appears in`. For more info, see Evaluating Responses.
Multi-Evaluator Node
Create multiple criteria to evaluate a set of responses, with a mix of code-based and LLM-based evaluators:
The Table View of response inspectors can plot results on a per-criterion basis:
The code and LLM evaluators inside a Multi-Eval have exactly the same features as their node counterparts.
Multi-Evaluators are in beta, and don't yet include all features. In particular, there is no "output" handle to a Multi-Eval node, since Vis Nodes do not yet support criteria-level plotting. We will also be adding an AI-assisted eval generation interface, EvalGen, in the coming weeks.
Processors
Processor nodes take in LLM responses or input texts, and transform them.
Code Processor Nodes
Code Processors (available in JavaScript and Python) transform response text according to a function defined in code. For instance, you might want to extract only part of a response for further processing down a chain:
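As a rough sketch, a Python processor that keeps only the first line of each response might look like this (assuming the node expects a `process` function that returns the transformed string; click the node's info button for the exact signature):

```python
def process(response):
    # Keep only the first line of the response's text, discarding the rest
    return response.text.split("\n")[0]
```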
Code Processors change the response, while Evaluators merely score it. Processing is a destructive operation: it transforms the text of each response according to the given processing function. (If you want to score responses such that you can still inspect what each response was, you are better off using an Evaluator node.)
Click the "i" info button in the top-right of the node to learn more. Currently, processors only allow you to process one response at a time, and only change the text (not any metadata or vars). If you want more power, raise an Issue on our GitHub.
Join Node
The Join Node concatenates LLM responses or input text, based on a user-specified join operation. You can join within LLMs, or across them.
For instance, here we've asked GPT3.5 and PaLM2 to translate different fruits from English into French. Each fruit was a separate value of a template variable, {fruit}. We can join all responses together into a list, within each LLM queried:
You can also join by specific variables, or change the formatting. Formats include double newline, newline, dashed list, numbered list, and array of strings (in JavaScript-style format).
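For instance, joining one LLM's French translations as a dashed list might produce output like this (hypothetical values):

```
- pomme
- banane
- fraise
```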
Split Node
The Split Node takes LLM responses or text data as input, and splits each text into multiple texts. By default, it splits on markdown list items (in numbered 1. or dashed - format). For instance, here it splits a single response from GPT-3.5:
You can instead split on newlines, double newlines, commas, code blocks, or paragraphs. Splitting on code blocks is especially useful for extracting any code in the response for further processing.
For code block and paragraph extraction, all text outside of the split segments is ignored.
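For example, splitting on numbered list items would turn a single response like the one below (hypothetical text) into three separate texts: "Pong", "Space Invaders", and "Pac-Man".

```
1. Pong
2. Space Invaders
3. Pac-Man
```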
Visualizers
Vis Node
Visualize evaluation scores with a plot that makes sense for the shape of the input data.
To plot data, attach the output of an Evaluator node to a Vis Node. The output you see will depend on the shape of your input data (see below). Use the MultiSelect at the top to select the prompt parameters you're interested in.
For instance, in `basic_comparison.cforge` in `examples/`, we can plot the length of responses by `{game}` across LLMs:
Or maybe we don't care about the `{game}` parameter, only the overall length of responses per LLM, to get a sense of response complexity. We can plot this by simply removing the parameter:
Note
Currently, you can only attach Evaluator nodes to Vis Nodes. This may change in the future.
For more information, visit the Visualizing Results page.
Inspect Node
Inspect responses by attaching an Inspect node to Prompt or Evaluator nodes. Group responses by input variables or LLMs, at arbitrary depth:
Use Export Data to export the data as an Excel `xlsx` file. If you've scored responses with an evaluator node, this exports the scores as well.
To learn more about this feature, visit the Inspecting Responses page.
Misc
Global Python Scripts
If you wish to write more extensive evaluator scripts, you can use the Global Python Scripts node to import local evaluator scripts.
Comment Node
Include notes in your workflow using the Comment node!