This issue is going to be rather short (-sighted) because I didn’t really plan ahead of time. But my mom’s in town!!! And I’m showing her around Utah, so we are out and about :)
I like to mix in various AI topics, but it’s hard to ignore the elephant in the room: LLMs. They are becoming omnipresent, marching from a novel wonder to a ubiquitous commodity. And it is magical to think that the core strength of an LLM is simply predicting the next token/word, yet it is now being used to delegate and outsource complicated tasks. Think of this as a primer for the next issue, because I would like to highlight the components that make these autonomous agents possible.
In this issue, I briefly touch upon the three main components of LLM Powered Autonomous Agents, as summarized by Lilian Weng. It’s a pretty long read (31 minutes) but, like her other articles, incredibly well-researched and definitely worth your time. It couldn’t have been published at a more opportune moment for me: I had been poking around the wonders of AutoGPT, GPT-Engineer, and BabyAGI, and had also gotten questions about how we can automate/outsource mundane migration tasks at work :)
LLMs are being used as a powerhouse to decompose a complex task into multiple simpler ones and to outsource and delegate the actual execution to external agents (tools). They can also re-evaluate the viability of the decomposed tasks and come up with new ones if the previous execution warrants it. This is analogous to how humans approach a problem: the intermediate steps often need refining, and new ones need to be added.
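As a rough sketch of that loop (plan, execute, reflect, re-plan), here is what it might look like in Python. This is a minimal illustration, not anyone’s actual framework; you would supply your own `llm` and `execute` callables:

```python
from typing import Callable

def run_agent(
    goal: str,
    llm: Callable[[str], str],      # any text-in/text-out LLM call
    execute: Callable[[str], str],  # dispatches one subtask to a tool
    max_steps: int = 10,
) -> list[str]:
    """Decompose a goal, delegate subtasks to tools, then refine the plan."""
    plan = llm(f"Break this goal into a short numbered to-do list: {goal}")
    results: list[str] = []
    for _ in range(max_steps):
        task = llm(f"Plan:\n{plan}\nName the next unfinished task, or say DONE.")
        if task.strip() == "DONE":
            break
        outcome = execute(task)  # delegate execution to an external tool
        results.append(outcome)
        plan = llm(              # re-evaluate: refine or extend the plan
            f"Plan:\n{plan}\nTask {task!r} produced {outcome!r}. "
            "Rewrite the remaining plan, adding new steps if needed."
        )
    return results
```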
The three main components of an LLM-controlled autonomous agent are:
Planning
Memory
Tools
Planning
Subgoal and Decomposition: A complicated task must be broken down into a simple plan - a to-do list. The agent can then approach these smaller, manageable sub-goals in a more efficient and organized manner.
Reflection and Refinement: The agent can self-criticize and reflect on its own outputs. It can then analyze its past actions, identify mistakes and areas for improvement, and learn from them, enhancing its final results.
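To make both planning steps concrete, here is a minimal sketch using the 2023-era openai Python package (the pre-1.0 ChatCompletion interface). The prompts and function names are my own illustration:

```python
import openai  # pip install "openai<1.0"; assumes OPENAI_API_KEY is set

def decompose(goal: str) -> str:
    """Ask the model for subgoals: the 'to-do list' planning step."""
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Break this task into 3-5 concrete subtasks:\n{goal}",
        }],
    )
    return resp["choices"][0]["message"]["content"]

def reflect(goal: str, attempt: str) -> str:
    """Ask the model to critique its own output: reflection/refinement."""
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Task: {goal}\nDraft: {attempt}\n"
                       "List any mistakes, then produce an improved version.",
        }],
    )
    return resp["choices"][0]["message"]["content"]
```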
Memory
Short-term memory: This memory is generally what powers in-context learning. It is bounded by the model’s finite context window, within which the agent carries out complex cognitive tasks such as learning and reasoning. In practice, short-term memory is simply the model’s context-window length, for example:
LLaMA: 2048 tokens
GPT-4: 32k tokens
MPT: 8k tokens
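One practical consequence is that an agent must budget its prompt against that window. Here is a small sketch using OpenAI’s tiktoken tokenizer; the 32k limit and the reply headroom are illustrative numbers, not fixed rules:

```python
import tiktoken  # pip install tiktoken; OpenAI's open-source tokenizer

CONTEXT_WINDOW = 32_000     # e.g., the 32k GPT-4 variant listed above
RESERVED_FOR_REPLY = 1_000  # leave headroom for the model's answer

enc = tiktoken.encoding_for_model("gpt-4")
history = "conversation so far, plus the new query"
n_tokens = len(enc.encode(history))

# Everything the agent "remembers" short-term must fit in this budget.
if n_tokens > CONTEXT_WINDOW - RESERVED_FOR_REPLY:
    print("History too long: summarize or drop the oldest turns.")
```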
Long-term memory: This type of memory stores facts and episodic memories, i.e., events, experiences, etc. Usually an external vector embedding store is used, such as a vector database like Weaviate or Pinecone, or a similarity-search library like FAISS. The agent can then query information that lies beyond its context window, with fast retrieval times.
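Here is a minimal sketch of that retrieval pattern with FAISS. The stored “memories”, the dimension, and the random stand-in vectors are placeholders; a real agent would produce the vectors with an embedding model:

```python
import numpy as np
import faiss  # pip install faiss-cpu

DIM = 384  # embedding size; depends on your embedding model

index = faiss.IndexFlatL2(DIM)  # exact L2 nearest-neighbor index
memories = ["user prefers metric units", "project deadline is Friday"]

# Stand-in embeddings; replace with real embedding-model output.
vectors = np.random.rand(len(memories), DIM).astype("float32")
index.add(vectors)

# Retrieve the stored memory closest to a (stand-in) query embedding.
query = np.random.rand(1, DIM).astype("float32")
distances, ids = index.search(query, 1)
print(memories[ids[0][0]])
```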
Tools
Out-of-model data: Beyond the (frozen) data it was trained on, a model is often expected to work with and produce up-to-date information. The agent can achieve this by calling external APIs.
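For example, a tool that pulls a fresh page summary from Wikipedia’s public REST API could look like the sketch below; the helper name is my own, and the agent would splice the result into its prompt as context:

```python
import requests  # pip install requests

def wikipedia_summary(title: str) -> str:
    """Fetch up-to-date facts the model's training data may lack."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()["extract"]

# Fresh context for the agent's next prompt:
context = wikipedia_summary("Large_language_model")
```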
Code execution capability: While we have seen how LLMs can decompose complex tasks and remember previous ones, they become far more powerful when allowed to wield external tools to execute those tasks. For example:
A wikibase agent can break down your natural-language query and run an appropriate SPARQL query against the wikidata.org database, retrieving precise answers to obscure questions.
MRKL uses a general-purpose LLM as a router that sends each query to the best-suited downstream “expert” module, outsourcing the work to a well-trained, domain-specific model.
TALM and Toolformer fine-tune an LLM to learn to use external APIs, while the OpenAI API introduced function calling, which augments the LLM to use tools (a sketch of function calling follows this list).
HuggingGPT works on the same principle as MRKL: ChatGPT acts as the task planner and selects domain-specific models from the Hugging Face model hub to complete the task at hand.
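To illustrate function calling, here is a minimal sketch against the 2023-era openai package (pre-1.0). The `get_weather` schema is a hypothetical tool of my own, not a real API:

```python
import json
import openai  # pip install "openai<1.0"; assumes OPENAI_API_KEY is set

functions = [{
    "name": "get_weather",  # hypothetical tool for illustration
    "description": "Get the current weather in a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in Moab?"}],
    functions=functions,
    function_call="auto",  # let the model decide whether to use the tool
)

msg = resp["choices"][0]["message"]
if msg.get("function_call"):
    args = json.loads(msg["function_call"]["arguments"])
    # Dispatch to a real implementation of get_weather(args["city"]) here,
    # then send the result back to the model in a follow-up message.
    print("Model wants to call:", msg["function_call"]["name"], args)
```

The key point is that the model returns structured arguments instead of free text, which is what makes tool dispatch reliable.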
Interesting things around the world 🌏
Whisper Web: You can now run OpenAI’s speech-to-text model client-side, right in your browser!
The Rise of the AI Engineer: Another great read about what the AI engineer role entails and how LLMs have shaped a completely new professional branch.
ChatGPT vs Minecraft: An interesting thread by Francois Chollet on how ChatGPT is being used for value-less or negative-value deliverables.
Wardley Map on Prompt Engineering: Observations through a Wardley map on how LLMs (OpenAI embeddings in the map) are evolving into a commodity and slowly becoming “invisible”.
I scroll through endless articles and Reddit and Twitter posts (so you don’t have to) and deliver the cream of the crop to you! If you would like a copy of this issue in your mailbox the next time I write, consider subscribing 🤗 Thanks for reading!