Code Clinic | Improving LLM Response Reliability (Part 1)
Where Trust Meets Consistency: Exploring Techniques for Consistent and Accurate LLM Output
Introduction
LLMs are powerful. However, their output is not reliable. This non-determinism makes integrating LLMs into real-world operations, especially for autonomous agents, risky.
So what can you do to improve this?
As usual, the code example we will be using can be found on my GitHub.
In this code clinic, we will explore two approaches that can help us get more reliable responses for our apps.
In part 1 (this part), we will look at how you can instruct an LLM to perform a labeling task, using OpenAI’s API and a prompt that asks for structured feedback in JSON form.
Then in part 2, we will tackle the same use case using Normal Computing’s “Outlines” library.
Use case
The goal is to build a sentiment data augmentation service: given a restaurant review, the system returns a string that is either “positive”, “neutral”, or “negative”. Please note: we are not assessing the LLM’s ability to do high-quality sentiment analysis; we want to force the LLM to return a defined format that can be integrated into a workflow.
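To make that contract concrete, here is a minimal sketch (the label set and helper function are mine for illustration, not taken from the article’s repository) of the kind of validation a downstream workflow would need, and why a free-form answer breaks it:

```python
# The contract: the model's reply must be exactly one of these three labels.
ALLOWED_LABELS = {"positive", "neutral", "negative"}

def is_valid_label(raw_response: str) -> bool:
    """Return True only if the reply matches the contract exactly (ignoring case and whitespace)."""
    return raw_response.strip().lower() in ALLOWED_LABELS

# A reply that respects the format slots straight into a pipeline...
print(is_valid_label("negative"))                                     # True
# ...while a chatty, unconstrained reply does not.
print(is_valid_label("The review is clearly negative, because ..."))  # False
```

Everything in parts 1 and 2 is about pushing the model’s answers into that first category as reliably as possible.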
Let’s dive in with the prerequisites.
Prerequisites
Jupyter Notebook - https://jupyter.org/
That said, I expect you already have this.
A Python kernel > 3.10. If you don’t have it, you can install it as shown below.
You can check your version with these statements.
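For instance, a check along these lines run inside the notebook confirms the kernel version (a small sketch of mine; the exact statements in the accompanying repository may differ):

```python
import sys

# Print the interpreter version the notebook kernel is running on.
print(sys.version)

# Fail early if the kernel is older than the Python 3.10 this series requires.
assert sys.version_info >= (3, 10), "Please switch to a Python 3.10+ kernel."
```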