Love, Lollipops, and LLMs

Arte Merritt
10 min read · Feb 14, 2024

Comparing Gen AI models on Valentine’s Day

Generative AI is incredibly hot right now. Companies of all sizes are experimenting with Generative AI to increase productivity, optimize products and processes, reduce costs, and innovate.

One of the challenges with Gen AI is selecting the right model for a particular use case. There are many factors to consider, including the quality and accuracy of the responses, cost, latency, and how data privacy and security are handled.

Once a model is selected, there are different ways to optimize it, including prompt engineering and fine-tuning.

In honor of Valentine’s Day, we compared popular large language models (“LLMs”) on their responses to love.

What is Love?

Haddaway, and many others, have asked this question since the dawn of time.

We posed the question “What is love?” to Anthropic’s Claude-2.1, Cohere’s Command, Meta’s Llama-2-13b-chat, Mistral’s mistral-medium, OpenAI’s GPT-4, and Perplexity’s pplx-7b-chat. Apart from the prompt, the only parameter we set was max tokens, at 300.

Anthropic Claude 2.1

I don’t have a definitive answer on the meaning of love. As an AI assistant created by Anthropic to be helpful, harmless, and honest, I don’t have personal experiences of love. Love is a complex human emotion with many differing philosophical, psychological, and biological perspectives and interpretations.

Cohere Command

Love is a complex emotion felt between two people. It is a strong, passionate, and unconditional emotion felt between two people who are attracted to each other. It is a deep, enduring emotion that can last a lifetime if the parties involved are willing to put in the effort. It is an emotion that requires two people to commit and willingness to embrace the good and the bad from the other person. It is an emotion that involves a strong affection for another person, and a desire to make a sacrifice for their benefit. It is an emotion that requires vulnerability, honesty, and openness with your significant other. Would you like to know more about love and the feeling associated with it?

Meta Llama-2-13b-chat

It is a question that has puzzled philosophers and theologians for centuries. Love is a complex emotion that encompasses a range of feelings, including affection, attachment, and devotion. In this essay, I will explore the concept of love and its various forms, including romantic love, familial love, and self-love. Romantic love is perhaps the most widely discussed form of love. It is characterized by intense feelings of attraction, passion, and commitment towards another person. Romantic love is often associated with the idea of “the one,” the belief that there is one perfect person out there who is meant to be our partner. This type of love is often portrayed in movies, books, and music as a grand, all-consuming force that sweeps us off our feet and changes our lives forever. However, romantic love is not the only form of love. Familial love, for example, is the love that we feel towards our family members, such as our parents, siblings, and children. This type of love is often unconditional and selfless, as we are willing to make sacrifices for our family without expecting anything in return. Familial love is an important aspect of our lives, providing us with a sense of belonging and security. Self-love is another important form of love. It is the love that we have for ourselves, our

Mistral Mistral-Medium

Love is a complex and multifaceted emotion that can be difficult to define. It is often described as a strong affection or attachment towards someone or something, characterized by feelings of warmth, tenderness, and deep caring. Love can take many forms, including romantic love, platonic love, familial love, and self-love. It may involve a desire for closeness and intimacy, as well as a willingness to put the needs and happiness of the loved one above one’s own. Ultimately, the experience of love is highly individual and can vary greatly from person to person.

OpenAI GPT-4

Love is a deep, complex emotion that can be difficult to define as it comes in many forms, including romantic, platonic, familial, and self-love. It generally refers to a strong affection or deep attachment towards someone or something. Love often involves feelings of warmth, comfort, affection, and respect. In romantic contexts, love can also encompass desire, passion, and a strong bond between two people. Some also describe love as a commitment to care for or support another person. The nature and definition of love can vary greatly depending on cultural, personal, and historical contexts.

Perplexity pplx-7b-chat

Love is a complex and multifaceted emotion that encompasses various forms and meanings. It is an intimate and profound connection that people experience between themselves, family members, friends, or partners. Love can manifest as romantic, familial, platonic, or self-love. At its core, love is associated with feelings of deep affection, care, empathy, and positive regard toward the person or things to which one is attached. It might include a sense of security, trust, and acceptance. Love can lead to feelings of warmth, tenderness, and happiness. Love doesn’t have a single definition because it can vary based on culture, beliefs, and personal experiences. The boundaries of love can stretch beyond romantic relationships to encompass compassion, kindness, forgiveness, and respect for all beings. Love can also evolve and grow throughout time, as individuals learn and understand more about themselves and their partners. In a nutshell, love is a powerful force that can bring joy, fulfillment, and connection into our lives, providing a sense of meaning and purpose. Although defining love is a large and subjective conversation, one thing is certain, it is an important part of every human experience.

At first glance, the responses from Anthropic and Meta took a different approach from those of the other models. Anthropic indicated that it did not have a definitive answer, then proceeded to supply one that shares a common theme with the other answers: love is a complex emotion. This clarification, that it did not have a definitive answer on a subjective topic, appears in line with Anthropic’s approach to being helpful, honest, and harmless. Meta’s response was interesting in mentioning “in this essay,” as if it had been asked to write an essay on the topic.

The other four models, Cohere, Mistral, OpenAI, and Perplexity, provided similar responses to each other. Looking more closely at the semantic similarity of the responses, we can see these four are much more similar to each other than they are to Anthropic or Meta. The semantic similarity chart shows how the responses compare based on their meaning, not necessarily the words themselves. A score closer to 1 indicates a higher similarity between the models in the row and column labels. For example, the Mistral response has a similarity score of 0.9777 compared to OpenAI, nearly 98% similar. According to the results, Cohere, Mistral, OpenAI, and Perplexity were all at least 90% semantically similar to each other.
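The article does not say how the similarity scores were computed. One common way to get a score of this kind is cosine similarity between embedding vectors of the responses, and the sketch below assumes that approach. The function names and the three-dimensional toy vectors are invented for illustration; real embeddings would come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise scores for a dict of {label: vector}, keyed by (row, column)."""
    labels = list(embeddings)
    return {
        (row, col): cosine_similarity(embeddings[row], embeddings[col])
        for row in labels
        for col in labels
    }


# Toy vectors, made up for this example. They are NOT the article's data.
toy_embeddings = {
    "response_a": [0.9, 0.1, 0.3],
    "response_b": [0.8, 0.2, 0.4],
    "response_c": [0.1, 0.9, 0.2],
}

scores = similarity_matrix(toy_embeddings)
for (row, col), score in scores.items():
    print(f"{row} vs {col}: {score:.4f}")
```

Each response scores 1 against itself, and the matrix is symmetric, which matches how the chart is described: the score in a cell compares the model in the row label with the model in the column label.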

Looking at the token usage, although each model was given the same phrase, the input tokens varied between 4 and 12 depending on the model. Meta’s response was also cut off by the max tokens limit, which is why it ends mid-sentence.

With regard to cost, OpenAI was the highest at nearly five times the cost of the next most expensive, Anthropic. It was also significantly more expensive than Meta and Perplexity, even though both used more total tokens: 305 and 262 respectively, compared to 127 for OpenAI.

With regard to latency, OpenAI was also the slowest to complete the response, with Meta not too far behind.

Factoring in the costs and latency, along with the semantic similarity of responses, one might consider using Mistral or Perplexity over OpenAI in this example to save costs and time. The semantic similarity was fairly close, with Mistral scoring 0.9777 and Perplexity 0.9173 compared to OpenAI.
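To see how a response with fewer tokens can still cost more, here is a minimal sketch of per-request cost as tokens multiplied by a per-token price. The total token counts (127 and 305) come from the comparison above, but the split between input and output tokens and every price in the example are placeholders chosen for illustration, not the providers’ actual rates.

```python
def request_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Cost of one request when input and output tokens are priced per 1,000."""
    return (
        input_tokens / 1000 * input_price_per_1k
        + output_tokens / 1000 * output_price_per_1k
    )


# Hypothetical models. Prices and the input/output split are assumptions.
pricey_model = request_cost(
    input_tokens=10,            # assumed split of a 127-token total
    output_tokens=117,
    input_price_per_1k=0.03,    # placeholder price
    output_price_per_1k=0.06,   # placeholder price
)
budget_model = request_cost(
    input_tokens=5,             # assumed split of a 305-token total
    output_tokens=300,
    input_price_per_1k=0.001,   # placeholder price
    output_price_per_1k=0.001,  # placeholder price
)

print(f"pricey model, 127 tokens: ${pricey_model:.6f}")
print(f"budget model, 305 tokens: ${budget_model:.6f}")
print(f"ratio: {pricey_model / budget_model:.1f}x")
```

With these placeholder prices, the model that used fewer than half the tokens is still the more expensive request, because the per-token rate dominates the total.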

Lollipops

What do lollipops have to do with love and LLMs on Valentine’s Day? Well, not too much, other than that they are a form of candy, and candy is a common gift on Valentine’s Day. They also make for an interesting and amusing experiment with LLMs.

A friend mentioned their “go-to” test prompt for LLMs is to ask the model to “spell lollipop backwards.” This is quite fascinating, as it shows some of the limitations of the models and grounds us in how they actually work: algorithms and vectors, as opposed to artificial general intelligence (“AGI”).

We sent the prompt “Spell lollipop backwards” to Anthropic’s Claude-2.1, Cohere’s Command, Meta’s Llama-2-13b-chat, Mistral’s mistral-medium, OpenAI’s GPT-4, and Perplexity’s pplx-7b-chat. Apart from the prompt, the only parameter we set was max tokens, at 300.

Anthropic Claude 2.1

popiloll

Cohere Command

popillolL This is the letter reversal for the word “lollipop”. Did you know that the lollipop is believed to have originated around the late 1800s, and the earliest forms of the treat were simple hard candies on a stick, inspired by the Mexican traditional candy, “piloncillo”, a type of sugar cane candy in a cone shape, and served on a stick. Are you interested in learning more about the history of lollipops?

Meta Llama-2-13b-chat

and you get “poplip”. What word can you form by rearranging the letters of “poplip”?

Mistral Mistral-Medium

Popilol However, if you want to spell “lollipop” backwards letter by letter, it would be “poppilol” The first way I provided spells it backwards as a word, while the second way spells it backwards letter by letter. The difference is that in English, letters in a word are not always pronounced individually, so spelling something backwards letter by letter doesn’t always give you the correct pronunciation when read backwards as a word. In this case, the double “l” in “lollipop” is pronounced as a single “l” sound, so when spelling it backwards as a word, it is represented as a single “l” sound as well.

OpenAI GPT-4

pilopoll

Perplexity pplx-7b-chat

Spelling a word _backward_ (or _backwards_ in some varieties of English) is not a real or grammatically correct way to find its opposite. Lollipop spelled backward would be “poplollol,” which is an illogical reversal of its original letters and not a valid word. The correct way to express the concept is by saying “lollipop reverse” or “lollipop backward” as a figurative phrase to indicate going from the original state to opposite or opposite direction. However, English does not have a naturally derived word obtained through reversing the letters of lollipop, as words do not work this way.

Lollipop spelled backwards is “popillol.” None of the LLMs answered correctly. Cohere was close except for the extra “L” at the end. Interestingly, Mistral explains why it does not spell the word with the double “l” in its first attempt, although its second attempt is wrong as well. Perplexity also provides a rationale for its response, but misinterprets the request to spell something backwards as a request to find the opposite of the word.
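For comparison, reversing a string is a one-line operation in ordinary code. The short sketch below computes the expected answer with Python’s slice syntax and checks each model’s attempt against it. The answers are copied from the responses above; the helper name and the dictionary labels are just for illustration.

```python
def spell_backwards(word):
    """Reverse a string character by character."""
    return word[::-1]


expected = spell_backwards("lollipop")

# The spelling each model gave, taken from the responses quoted above.
model_answers = {
    "Anthropic Claude 2.1": "popiloll",
    "Cohere Command": "popillolL",
    "Meta Llama-2-13b-chat": "poplip",
    "Mistral mistral-medium (first attempt)": "Popilol",
    "Mistral mistral-medium (second attempt)": "poppilol",
    "OpenAI GPT-4": "pilopoll",
    "Perplexity pplx-7b-chat": "poplollol",
}

# Exact, case-sensitive comparison against the expected spelling.
results = {name: answer == expected for name, answer in model_answers.items()}

print(f"expected: {expected}")
for name, is_correct in results.items():
    status = "correct" if is_correct else "incorrect"
    print(f"{name}: {model_answers[name]} ({status})")
```

The comparison is exact and case-sensitive, so Cohere’s answer fails on the trailing capital “L” even though the rest of its spelling matches.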

The responses are not that similar to each other, as can be seen in the semantic similarity chart. The closest are the Mistral and Perplexity responses.

Examining the token usage, although each model was given the same phrase, the input tokens varied quite a bit, with Cohere’s count differing significantly from the rest.

With regard to costs and latency, Mistral was the most expensive and slowest for this particular question. It also provided the longest response in terms of tokens. OpenAI GPT-4 is more expensive than the other providers on a per-token basis, hence its higher overall cost despite its lower token usage.

There are many factors in choosing a Generative AI model. Costs, latency, and response effectiveness are all important factors to consider. If you are looking for a Generative AI provider for your use case, and would like to experiment with different models, check out Reconify’s benchmarking and experimentation tools to help in the decision process.

Generative AI is getting a lot of love and attention. The space is moving quickly, with new or improved models coming out rapidly. We look forward to seeing what the future has in store. In the meantime, Happy Valentine’s Day!

About Reconify

Reconify is a cross-platform analytics and optimization solution for Generative AI that enables enterprises to analyze, optimize, and take action on prompts, responses, and models to improve response effectiveness and customer satisfaction.

Originally published at https://www.reconify.com.

Arte Merritt

Conversational AI & Generative AI Entrepreneur; Founder of Reconify; Former Conversational AI partnerships at AWS; Former CEO/Co-founder Dashbot