Code Clinic | Building an AI Investing Agent: A Step-by-Step Guide Using Langchain and RAG
Because we all like losing money.
One thing we have all dreamed of is an investment agent that is much better at trading than we are, always makes the right decisions, and earns us money while we sleep. Of course, it is unlikely that such an agent will ever exist. But while large hedge funds are building their arsenals of trading tools, we can do the same. And now that we have generative AI, it is actually quite easy to put components in place that can help you gain a better understanding of a certain investment idea and the news around it.
In this code clinic, we will assemble several components into a retrieval-augmented generation (RAG) system.
Let’s dive in.
Components
JSONLoader - The data I want to use is in a JSON file on my hard disk, so I simply want to load it from there. Langchain has powerful web loaders as well, but for this exercise I found it easier to work with a local file and add complexity later.
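To make the idea concrete, here is a minimal, dependency-free sketch of what a JSON loader does: read the file and turn each record into a document (text plus metadata). The field names (`title`, `body`) are hypothetical; LangChain's actual `JSONLoader` handles this for you via a `jq_schema` expression.

```python
import json
import tempfile

# Hypothetical shape of the exported posts file -- adjust the keys
# to whatever your own export contains.
posts = [
    {"title": "Fed holds rates", "body": "The Fed left rates unchanged..."},
    {"title": "Chip rally", "body": "Semiconductor stocks surged..."},
]

# Write a sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(posts, f)
    path = f.name

def load_json_documents(file_path):
    """Stand-in for JSONLoader: one document per JSON record."""
    with open(file_path) as fh:
        records = json.load(fh)
    return [
        {"page_content": r["body"], "metadata": {"title": r["title"]}}
        for r in records
    ]

docs = load_json_documents(path)
print(len(docs), "documents loaded")
```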
LLM - We will be using OpenAI’s GPT-3.5-turbo. Ideally, for cost reasons, we would use a local LLM like Mistral or Llama, but for this exercise the OpenAI API will do.
Vector store - This is where we will store our embeddings. I assume you already know what embeddings are, so I will only note that, again for simplicity, we will store the vectors in a folder on the file system rather than in an optimized vector database like Pinecone.
Embedder - For the vector search to work, we need embeddings. Again for simplicity, I will be using OpenAI’s embedding model, although it costs money.
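The embedder and the vector store work together: documents are embedded once and stored, and at query time the store returns the documents whose vectors are closest to the query vector. Here is a toy sketch of that retrieval step using made-up 3-dimensional vectors in place of the roughly 1,500-dimensional vectors OpenAI's embedding endpoint actually returns:

```python
import math

# Toy "embeddings" standing in for real OpenAI embedding vectors.
# In the real system these would be computed once and persisted to disk.
store = {
    "Fed holds rates steady":     [0.9, 0.1, 0.0],
    "Semiconductor stocks surge": [0.1, 0.9, 0.2],
    "Oil prices slip on demand":  [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(store.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# A query vector that happens to sit "near" the chip-rally post.
print(search([0.2, 0.8, 0.1], k=1))  # ['Semiconductor stocks surge']
```

This nearest-neighbor lookup is exactly what a vector store does for you at scale, whether it lives in a local folder or in Pinecone.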
Prompt - And finally, of course, we want our agent to solve a real-world problem. So I will hand a couple of posts to the agent, and the agent has to summarize what happened.
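The prompting step amounts to "stuffing" the retrieved posts into a template and asking the model to summarize. The template wording below is my own, and the `fake_llm` function is a stub standing in for the real GPT-3.5-turbo call, so the sketch runs without an API key:

```python
# Illustrative template -- the retrieved posts get pasted into {context}.
PROMPT = """You are an investment research assistant.
Summarize what happened according to the posts below.

Posts:
{context}

Summary:"""

def build_prompt(posts):
    """Stuff the retrieved posts into the prompt template."""
    context = "\n\n".join(f"- {p}" for p in posts)
    return PROMPT.format(context=context)

def fake_llm(prompt):
    # Stand-in for the real chat-completion call; it just proves the
    # posts reached the "model".
    return f"(summary of {prompt.count('- ')} posts)"

posts = ["Fed holds rates steady", "Semiconductor stocks surge"]
print(fake_llm(build_prompt(posts)))  # (summary of 2 posts)
```

Swapping `fake_llm` for a real model client is the only change needed to turn this into a live summarizer.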
Building the RAG
When building systems like these, it is always good practice to start with the data. In my case, I pulled the data from a MySQL database and exported it to JSON.
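The exact schema will depend on your database, but a hypothetical export (field names are illustrative, not my actual columns) might look like this:

```json
[
  {
    "id": 1,
    "title": "Fed holds rates steady",
    "body": "The Federal Reserve left interest rates unchanged...",
    "published_at": "2024-03-20"
  }
]
```

Keeping one post per JSON object makes the later loading and embedding steps straightforward, since each record maps cleanly to one document.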