- Allow for easy testing of LLM calls
- Speed up tests and save money on LLM calls by storing responses in source
- Use GPT-4 to assert that its own response meets a set of expectations
- Store responses in source to increase test speed and save money
Clone this repo, experiment with the values in `index.ts`, then run `npm run main` to see how accurately your prompt meets a set of expectations. Make sure the `OPENAI_API_KEY` environment variable is set and that you have access to GPT-4 via OpenAI's API.
The arguments for `checkAccuracy` are:

- `messages` -- the history of messages, aka the conversation thus far. The last one will be the one sent by the user
- `expectations` -- an array of rules that you expect the response to meet
- `numAttempts` -- how many times to run the check, depending on how accurate you need to be. If you need 99% accuracy, you would do (at least) 100 attempts
- `logFailures` -- optional, best used when troubleshooting
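Below is a minimal sketch of what a call might look like. The argument names follow the list above, but the call shape (options object vs. positional arguments), the import path, and the message/expectation types are assumptions -- check `index.ts` for the actual signature.

```ts
// Hypothetical usage sketch: argument names come from the list above,
// but the call shape and import path are assumptions, not the published API.
import { checkAccuracy } from "./index";

async function main() {
  const result = await checkAccuracy({
    messages: [
      { role: "system", content: "You are a helpful travel assistant." },
      { role: "user", content: "Suggest a 3-day itinerary for Lisbon." },
    ],
    expectations: [
      "Mentions at least three distinct neighborhoods or sights",
      "Only recommends places located in Lisbon",
    ],
    // At least 100 attempts if you need ~99% confidence in the measurement
    numAttempts: 100,
    // Optional: log which expectations failed and why, when troubleshooting
    logFailures: true,
  });

  console.log(result); // e.g. an accuracy figure across all attempts
}

main();
```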
- List of what failed -- see which expectations failed (and why each one failed, from OpenAI's perspective)
- UI+API for anyone to fork and deploy (perhaps using Next + Vercel)
- Prompt comparison -- be able to compare accuracy of 2+ prompts, given a set of expectations
- Change LLMs/models -- be able to change the completion model and the assertion model