Conversational AI & The Future of CX Agents

A look into how we are building AI Agents that are shaping the future of CX

Conversational AI & Autonomous Agents

Without giving away too much of our secret sauce, I think there is a ton of value in sharing how we think about designing, building, deploying, fine-tuning and supporting autonomous operations and AI agents. I wanted to write a quick overview on how we are reasoning about conversational AI and autonomous agents.

Conversational AI as an interface is dependent on intent detection, parameter / entity detection, actions, knowledge bases, memory and state. If we think about the von Neumann architecture aspects of a computational machine we have:

1. A Central Processing Unit (CPU) that includes an arithmetic logic unit (ALU) and processor registers.

2. A Control Unit (CU) that interprets instructions from memory and executes them.

3. Memory that stores both data and instructions.

4. Input and output mechanisms for transferring data to and from external environments.

5. A common bus system that allows data and instructions to be transferred among the various components of the computer.

How can we build an autonomous agent based on similar principles with the latest LLMs? I think an AI Agent is a new form of computer comprised of the following components:

Central Processing Unit

In an AI Agent, we have the CPU which is ultimately the LLM (Large Language Model). I would consider this to actually be the central prediction unit. It is predicting the next token based on the information it has been given, the context and it’s training for a specific task. It’s the most profound and consequential component of the agent and the basis of the latest advancements we have seen. It’s the generative aspect that give us the magic we see across multiple modalities. We take a massive corpus of text (all of written human thought in the internet) and compress it into a model that can be invoked.

Memory

We have memory, in the case of AI Agents, I consider this to be search in vector space combined with the corpus of data that the model has been trained. When we think about vector space / vector search in the context of AI agents; we are determining how to reason about a specific set of of numerical representations in vector space. Vector Database gives us the search aspect we need for the AI to have context based on the companies specific data. This vector space is storing the individual chunks of texts that the AI Agent should have access to. When we take the fingerprint of a piece of data, otherwise known as an embedding of a piece of text, we are matching that to the topK results from the vector space. Lets say topK = 3. We want to search and retrieve the top 3 results that are corresponding to the users question. What matches? What is relevant? What is the answer? What are the rules? Then use this information to craft the perfect response to the customer.

Input / Output Channels

We have input / output. This is basically the channels that the input is coming into the Agent. This could be in the form of a ticket, a message body and the I/O is basically the way this message gets sent to the agent, how it’s processed and how it’s sent back to the user. An AI Agent has an I/O of input messages comprised of the system prompt, the conversation history, the context, the retrieved top 3 context results and the functions available to the model. The output of this result is the prediction. It is the message sent back to the customer / user in whatever conversational interface or channel they are using.

State / Common Bus System

I think state is corresponding to the current place in the set a states an agent could be processing, it could be interpreting the input, making an API call, generating a confirmation message, retrieving a refresh / access token, generating a response, sending the response back to a specific channel; these are all individual steps in the process of the agent. How do we handle these steps in a deterministic way so that it could be replayed on failure, retried, executed again. These are all things we think about when designing the agents workflow architecture & process. We use a deterministic workflow engine called Temporal to accomplish this. It is like the actor or the internal system that is organizing the different activities, workflows and schedules in a way for the AI to operate in a way where we have certain guarantees that actions will be completed and observability into each action the agent is taking.

ReSponse Abstractions

These in my opinion are the main components needed. So, now that we have the comparison to one of the fundamental computing architectures of the last century; how do we create the abstractions around these types of components of this new conversational computer? Is there a framework we can create so that conversational AI Agents can truly autonomously work while maintaining accuracy and quality? Here at the ways we reason about it at StateSet with ReSponse CX:

  • Knowledge Base

  • Rules Engine

  • Function Calling / Actions

  • Schedules

  • Integrations

Knowledge Base

ReSponse can answer based on your company macros, FAQs, knowledge base and more. ReSponse can personalize responses using the knowledge base to add your brand’s tone of voice with examples from your team. Our interface makes it easy to add / update knowledge easily using chat in the ReSponse interface. We have built a set of APIs that allow us to easily add / update and test the knowledge base of the AI agent; and we have built an interface to see the list of knowledge base embeddings that exist for your organization.

Rules Engine

ReSponse has a declarative Rules engine that gives us the ability to craft system prompt rules without have to custom code it each time. We can give rules that are in the context for every single response generation. We can give examples and we can make sure these rules are always in the context (not just from the top 3 results from knowledge). The Rules engine is a key abstraction in our framework to mitigate hallucinations, provide advanced configurations, give things like signatures and more. It gives us the ability to have more control and helps us fine-tune with precision. Rules come with activation booleans, versioning and more to help with keeping track of whats being used to generate responses.

Function Calling / Actions

Function calling / actions can automated for operational workflows based on your organizations data. This is done by detecting the intent of the user in the ticket message and detecting parameters such as the order number, email address or line item. These actions can be automated in systems such as Shopify, Stay.ai, ReCharge, ShipStation and more to make state changes in these systems eg. Cancel an Order, Add a Tag / Remove a Tag, Cancel a Subscription, Pause or Skip a Month instead and more. We can also provide an automated message based on your pre-approved copy / macros that can be sent to the customer to confirm the change in the system. This automation gives your team back time to focus on more complex / nuanced requests that require additional addition such as Shipping Address changes and more. Function Calling is a powerful feature built into GPT4 and makes the AI agents that much more impactful for CX Operations.

Schedules

For certain channels we want to wait 5 minutes before sending an eMail response, to answer in a less robotic way. With ReSponse we can add schedules so that the agent is multi-threaded. If you want to have the AI interpret a message generate the response and wait a couple hours before sending we can await / sleep the AI for that particular thread and then have it perform the rest of the workflow when it is supposed. This is a very nuanced but important step for production AI Agents that need to have different response times for different channels.

Integrations

How do we pull information into the context from systems of record like Shopify, Gorgias, Stay.ai and more. How do pull in the order number for the customer so it can be used in the response? How do we make the API callout to cancel an order or a susbcription? The AI Agent has to have access to the systems for read / write capabilities to autonomously make state changes in the systems it is working with. The integrations are key to making the AI Agent not just give responses but actually take action on behalf of the customer in the systems that matter most.

Couldn’t we just use a Custom GPT?

The problem with this custom GPT is it doesn’t have access to the customer’s Shopify store or the ability to cancel their order. Why? Because we have to authenticate with the merchant’s store in order to cancel their order… GPT Actions should be able to handle this right? Custom GPTs support authenticated APIs. Well we would need to add each customers Shopify Access Token and their Shopify Store URL; in practice could this even work on a Custom GPT at scale? To solve for this ReSponse has an App on the Shopify App Store which gives us these permissions to make order write changes. But maybe the answer is yes… and there could be a way to onboard merchants for this custom GPT to improve their CX for things like Orders, Subscriptions and Returns. But what about scheduling the response? What about connecting it their DMs and helpdesk with their other agents are? What about scheduling the response based on the channel? What about other custom logic like filtering / only responding to specific requests by channel / tag? The control and configurability of ReSponse combined with the power of the GPT4 Turbo API give us a more advanced way to develop and deploy agents.

Our Thesis and The Outcomes

In the current state of the DTC eCommerce market, the brands who are putting the most emphasis on optimizing the customer experience are the ones who are gearing themselves up for exponential long-term growth and world-wide brand recognition. On top of that, as Artificial Intelligence continues to rapidly advance, the DTC brands that have successfully begun to leverage Artificial Intelligence to elevate their customer experience are seeing results that no one else is seeing from a traditional customer support team. This leads to:

  • Increase Customer Satisfaction (CSAT)

  • Decrease your Time to Resolution

  • Decrease your Cost-per-Ticket

  • Increase your Operational Efficiency

  • Decreases in Subscription Churn

  • Decreases in Return / Exchange Cycle Time

And leading brands are doing this:

  • Without spending tons of money on internal developers or an ML team

  • Without spending on AI solutions that charge on usage base (not scalable)

  • Without needing to overspend on offshore customer support employees

  • Without needing to purchase a bunch of automation tools & workflows

In Summary

AI Agents and Conversational AI are an exciting development in the world of software and the history of computing. We at StateSet are excited about building the best autonomous operations platform and developing the best AI Agents in the DTC industry. We are doing this at scale with brands processing thousands of tickets per month completely autonomously using our technology stack. For more info on what we are building reach out at [email protected] or setup time for demo here.