Code Clinic | Building a YouTube Search Agent with Langchain Tools, Streamlit, and GPT-4
How to build an agent that uses Langchain's community tools to search videos.
In this code clinic, we will build an agent that can search YouTube using Langchain Runnables, workflow graphs, and community tools.
What makes this write-up new is that I am applying three concepts I have not used in my previous Langchain articles (the RAG-based investment agent and the agent with memory) or in my Streamlit work.
Before we jump into the code, just a few explanations and definitions of what these concepts are.
Runnables: One of the value propositions of Langchain is to make it easy to create custom chains. The Runnable protocol is a standard interface for defining and invoking custom chains.
RunnablePassthrough: Passes inputs unchanged, or with extra keys added, between the parts of a custom chain (see the short sketch after these definitions).
Workflow Graphs: Langgraph is a package launched in January 2024 that enables multi-agent workflows. It’s similar to Microsoft’s Autogen. Just one definition worth noting: multi-agent means “multiple independent actors powered by language models connected in a specific way”.
Community Tools: We already know a lot about tools and how our agents can use them to perform searches and other operations. Community tools are simply tools built by the community. In our example, we use the YouTube search tool, because it works and makes for a fun exercise.
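To make the RunnablePassthrough idea concrete, here is a minimal sketch; the key name "doubled" is only an illustration and not part of the app.
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# RunnablePassthrough forwards its input untouched; .assign() adds extra keys.
chain = RunnablePassthrough.assign(
    doubled=RunnableLambda(lambda x: x["num"] * 2)  # hypothetical key for illustration
)
print(chain.invoke({"num": 3}))  # {'num': 3, 'doubled': 6}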
Before we jump into the code exercise, let’s quickly outline the target picture.
Problem
The YouTube algorithm only shows me content in my echo chamber
Search on YouTube can’t be refined easily
Solution
A “chat with YouTube search” app that provides me a list of videos for a given search prompt.
The app consists of an agent that mediates between the user and YouTube to facilitate the search process, and a Streamlit website so the video links can be clicked easily.
Prerequisites
The app only runs on Python 3.10 or later.
The app fails when using GPT-3.5; only GPT-4 worked. I have not yet debugged why that might be the case.
Obviously, you need your own OpenAI API key.
The cost per query is quite high. Therefore, a local LLM implementation might make more sense (see the sketch after this list).
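For the local route, here is a hedged sketch of swapping the OpenAI model for one served by Ollama. This assumes Ollama is installed with a model pulled; I have not tested it against this app.
# Hypothetical alternative: a locally served model instead of GPT-4.
# Assumes Ollama is running and `ollama pull llama2` has been done.
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama2", temperature=0.1)
# Caveat: create_openai_functions_agent relies on OpenAI function calling,
# so a local model would need a different agent constructor.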
Implementation
For the implementation, I will cover the key concepts of how to create the agent.
We begin with the first concept, i.e., the creation of the agent.
# Imports for the agent setup
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain_community.tools import YouTubeSearchTool
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Get the prompt to use
prompt = hub.pull("hwchase17/openai-functions-agent")
# Choose the LLM that will drive the agent
llm = ChatOpenAI(model="gpt-4", openai_api_key=openai_api_key, temperature=0.1)
# The only tool the agent needs: YouTube search
tools = [YouTubeSearchTool()]
agent_runnable = create_openai_functions_agent(llm, tools, prompt)
# Define the agent
agent = RunnablePassthrough.assign(agent_outcome=agent_runnable)
In this part of the code, we initialize our agent through the “create_openai_functions_agent” function. This function is part of Langchain’s agents package and receives a default initialization prompt pulled from Harrison Chase’s repository on the Langchain Hub.
We also provide the initialization function with a ChatOpenAI object. This object is tasked with calling the OpenAI API. In this example, we use GPT-4.
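If you are curious what the pulled default prompt contains, you can inspect it directly. A quick sketch; the exact contents may vary between hub versions.
# Inspect the pulled prompt; contents may differ across hub versions.
print(prompt.input_variables)  # e.g. ['agent_scratchpad', 'input']
print(prompt.messages)         # system message, placeholders, human input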
The third parameter needed to initialize our agent is the “tools” array. Here it is an array of length 1, since I only need the agent to interact with YouTube and nothing else. There are other community tools, but several still seem quite buggy.
Based on its description, it appears that “YouTubeSearchTool” scrapes the website and does not use the YouTube API, which might be the better implementation.
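You can also try the tool on its own, outside the agent. A quick sketch, where the query string is just an example; the tool expects a comma-separated "search term, number of results" input.
from langchain_community.tools import YouTubeSearchTool

tool = YouTubeSearchTool()
# Input format: "<search term>,<optional number of results>"
print(tool.run("langchain tutorial,3"))  # returns a string of video URLs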
Once we have our runnable agent, we can place it in a chain that simply passes information through without altering it.
The next concept we explore is the “Graph” concept.
from langgraph.graph import END, Graph

workflow = Graph()
workflow.add_node("agent", agent)
workflow.add_node("tools", execute_tools)
# Set the entrypoint as `agent`
workflow.set_entry_point("agent")
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue": "tools",
        "exit": END
    }
)
workflow.add_edge('tools', 'agent')
The Langchain team has implemented multi-agent workflows as a graph, which makes intuitive sense. We first create an empty Graph object, then add the agent and the “execute_tools” function as nodes.
# Define the function to execute tools
def execute_tools(data):
    # The agent's latest decision: which tool to call and with what input
    agent_action = data.pop('agent_outcome')
    tool_to_use = {t.name: t for t in tools}[agent_action.tool]
    observation = tool_to_use.invoke(agent_action.tool_input)
    # Record the (action, observation) pair for the agent's next turn
    data['intermediate_steps'].append((agent_action, observation))
    print(data)
    return data
Then we define where the graph traversal should commence. Once this is done, we can set conditional and non-conditional edges between the nodes. Looking at it from a research perspective, the implementation logic is really impressive.
The conditional edge takes a function as a parameter, here “should_continue”, which checks for the “AgentFinish” outcome, i.e., the agent condition that we have already learned about in prior articles.
from langchain_core.agents import AgentFinish

def should_continue(data):
    # AgentFinish signals that the agent has produced its final answer
    if isinstance(data['agent_outcome'], AgentFinish):
        return "exit"
    else:
        return "continue"
If the agent is finished, the graph traversal should also end.
Once the final step is done, we compile the graph.
chain = workflow.compile()
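The compiled graph behaves like any other Runnable, so you can smoke-test it from the command line before wiring up Streamlit. A quick sketch, where the query string is just an example:
# Smoke test of the compiled graph
result = chain.invoke({"input": "find videos about langgraph", "intermediate_steps": []})
print(result['agent_outcome'].return_values["output"])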
For the Streamlit implementation, I wrapped the chain creation in a function called “initialize”.
chain = initialize(openai_api_key)
The “initialize” function is the sum of the previously implemented parts and returns the chain to the Streamlit app. The parameter we hand over is the OpenAI API key, since we get it from the Streamlit app configuration. If you don’t plan to do that, it might be simpler not to wrap the chain creation. The main reason I implemented it like this is that I can click on a YouTube link immediately: we are already on a website and don’t need to copy/paste it from the command shell. A sketch of such a wrapper follows below.
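As a rough sketch, the wrapper could assemble the pieces shown earlier like this. The body simply restates the previous steps; note that “execute_tools” looks up the module-level “tools”, so a real refactor would pass it in explicitly.
def initialize(openai_api_key):
    # Hypothetical wrapper that restates the steps shown earlier.
    prompt = hub.pull("hwchase17/openai-functions-agent")
    llm = ChatOpenAI(model="gpt-4", openai_api_key=openai_api_key, temperature=0.1)
    agent_runnable = create_openai_functions_agent(llm, tools, prompt)
    agent = RunnablePassthrough.assign(agent_outcome=agent_runnable)

    workflow = Graph()
    workflow.add_node("agent", agent)
    workflow.add_node("tools", execute_tools)
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges(
        "agent", should_continue, {"continue": "tools", "exit": END}
    )
    workflow.add_edge('tools', 'agent')
    return workflow.compile()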
if prompt := st.chat_input():
    if not openai_api_key:
        st.info("Please add your OpenAI API key to continue.")
        st.stop()
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    result = chain.invoke({"input": prompt, "intermediate_steps": []})
    msg = result['agent_outcome'].return_values["output"]
Our chain is then invoked on each rerun of the app once a new prompt is given. The chain takes the prompt, queries YouTube, and returns the result.
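To close the loop, the returned message still has to be rendered. A minimal sketch using the standard Streamlit chat pattern; the full code may render it differently.
# Append and render the assistant's answer; Streamlit typically renders
# bare URLs as clickable links, so the YouTube results can be clicked.
st.session_state.messages.append({"role": "assistant", "content": msg})
st.chat_message("assistant").write(msg)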
Summary
In conclusion, the project ran and provided search results from YouTube. So far, it was a fun little project for this weekend. Does it add value? It depends. I think the solution adds value if you want to have a chat-like search function for YouTube that you pay 10 cents per query for.
Besides that, the execution time is sometimes really slow. In a few cases, it took ~5 minutes to return a result, which of course doesn’t work for real-world use cases. I have not debugged where the delay comes from; it could be Langchain, OpenAI, or YouTube.
Hope you found this tutorial enjoyable. Please like and subscribe.
The full code is behind this paywall below.