Encyclopedia Autonomica

Code Clinic | Improving LLM Response Reliability (Part 1)

Where Trust meets Consistency - Exploring Techniques for Consistency and Accuracy

Jan Daniel Semrau (MFin, CAIO)
Aug 21, 2023

Introduction

LLMs are powerful, but their output is not always reliable. The same prompt can yield different responses, and this non-determinism makes integrating LLMs into real-world operations, especially autonomous agents, risky.

So what can you do to improve this?

As usual, the code example we will be using can be found on my GitHub.


In this code clinic, we will explore two approaches that might help us get more reliable responses for our apps.

In part 1 (this part), we will evaluate how you can instruct an LLM to perform a labeling task, using OpenAI’s API as the example and prompting it to return structured output in JSON form.

Then, in part 2, we will explore the same use case using Normal Computing’s “Outlines”.

Use case

The goal is to build a sentiment data augmentation service: given a restaurant review, the system returns a string that is either “positive”, “neutral”, or “negative”. Please note: we are not assessing the LLM’s capability to do high-quality sentiment analysis. Rather, we want to force the LLM to return a defined format that makes it possible to integrate it into a workflow.
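To make the "defined format" concrete, here is a minimal sketch of the validation side of such a service. It assumes the model was asked to reply with a JSON object of the form {"sentiment": "..."}; the key name `sentiment` and the helper `parse_sentiment` are hypothetical choices for illustration, not taken from the original code. No API call is made here; the function only checks a raw response string.

```python
import json

# The three labels the use case allows.
ALLOWED_LABELS = {"positive", "neutral", "negative"}


def parse_sentiment(raw_response: str) -> str:
    """Return a validated label, or raise ValueError if the response is unusable.

    Assumes (hypothetically) that the model was instructed to answer with
    a JSON object such as {"sentiment": "positive"}.
    """
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {raw_response!r}") from exc

    if not isinstance(data, dict):
        raise ValueError("Response JSON is not an object")

    label = data.get("sentiment")
    if not isinstance(label, str):
        raise ValueError("Missing or non-string 'sentiment' field")

    # Normalize harmless variations such as capitalization or padding.
    label = label.strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected label: {label!r}")
    return label


if __name__ == "__main__":
    print(parse_sentiment('{"sentiment": "Positive"}'))
```

Raising an error on anything outside the three labels lets the calling workflow decide whether to retry the prompt or skip the review, instead of passing free-form text downstream.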

Let’s dive in with the prerequisites.

Prerequisites

Jupyter Notebook - https://jupyter.org/

  • I expect you already have this, though.

  • A Python kernel > 3.10. If you don’t have it, you can install it as seen below

    You can check your version with these statements
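The author's own statements are not part of this excerpt. As a stand-in, one possible way to check the version from a notebook cell is sketched below. The helper `meets_minimum` is hypothetical, and reading "> 3.10" as "3.10 or newer" is an assumption.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 10)) -> bool:
    """Compare (major, minor) against the minimum.

    Assumption: "> 3.10" in the prerequisites means 3.10 or newer.
    """
    return tuple(version_info[:2]) >= minimum


# Full version string of the running kernel.
print(sys.version)
# True if the kernel satisfies the assumed minimum.
print(meets_minimum())
```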
