Code Clinic | Improving LLM Response Reliability (Part 1)
Where Trust Meets Consistency: Exploring Techniques for Consistent and Accurate LLM Output
Introduction
LLMs are powerful. However, their output is not reliable. This non-determinism makes integrating LLMs into real-world operations, especially for autonomous agents, risky.
So what can you do to improve this?
As usual, the code example we will be using can be found on my GitHub.
In this code clinic, we will explore two approaches that can help us get more reliable responses for our apps.
In part 1 (this part), we will look at how you can instruct an LLM to perform a labeling task, using OpenAI’s API and a prompt that asks for structured feedback in JSON form.
Then in part 2, we will tackle the same use case using Normal Computing’s “Outlines” library.
Use case
The goal is to build a sentiment data augmentation service: given a restaurant review, the system returns a string that is either “positive”, “neutral”, or “negative”. Please note: we are not assessing the LLM’s ability to do high-quality sentiment analysis; we want to force the LLM to return a defined format that can be integrated into a workflow.
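To make that contract concrete, here is a minimal sketch (the label set and helper function are mine for illustration, not taken from the article’s repository) of the kind of validation a downstream workflow would need, and why a free-form answer breaks it:

```python
# The contract: the model's reply must be exactly one of these three labels.
ALLOWED_LABELS = {"positive", "neutral", "negative"}

def is_valid_label(raw_response: str) -> bool:
    """Return True only if the reply matches the contract exactly (ignoring case and whitespace)."""
    return raw_response.strip().lower() in ALLOWED_LABELS

# A reply that respects the format slots straight into a pipeline...
print(is_valid_label("negative"))                                     # True
# ...while a chatty, unconstrained reply does not.
print(is_valid_label("The review is clearly negative, because ..."))  # False
```

Everything in parts 1 and 2 is about pushing the model’s answers into that first category as reliably as possible.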
Let’s dive in with the prerequisites.
Prerequisites
Jupyter Notebook - https://jupyter.org/
That said, I expect you already have this.
A Python kernel > 3.10. If you don’t have it, you can install it as shown below.
You can check your version with these statements.
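For instance, a check along these lines run inside the notebook confirms the kernel version (a small sketch of mine; the exact statements in the accompanying repository may differ):

```python
import sys

# Print the interpreter version the notebook kernel is running on.
print(sys.version)

# Fail early if the kernel is older than the Python 3.10 this series requires.
assert sys.version_info >= (3, 10), "Please switch to a Python 3.10+ kernel."
```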