I was fortunate to be invited to speak about “The Best Practices in Conversational AI” at the World AI Cannes Festival last week. The last time I was in Cannes was to present my startup at the Lions Festival, so it was exciting to be back and be able to meet people from all over the world.
The conference covered a wide variety of AI/ML topics — including underlying technologies like conversational AI, natural language processing (NLP), ML modeling, text analytics, big data, intelligent document processing, and augmented reality / virtual reality (AR/VR); applications of AI/ML including HR, health & beauty, travel, transportation, the environment, and connected cities; as well as the social impacts of AI/ML including ethical AI, “AI for good,” and responsible uses of AI and data.
Below are some of the takeaways from the conference.
Conversational AI is here and now
Conversational AI was a common theme at the conference — prevalent in the talks and exhibitor booths.
The exhibitors included platforms for building conversational AI chatbots and voice assistants, enablers for conversational AI (e.g. NLP engines and model/training data providers), as well as companies based solely around a conversational AI interface. There were even demos of human-avatar, virtual agents in the booths and around the exhibitor hall.
Talks covered conversational AI as well — especially the ones from Meta. In a presentation on AI adoption trends in Italy by a team from Politecnico di Milano School of Management, chatbots and voice assistants were second only to intelligent document processing in terms of adoption this past year.
My talk was on “The Best Practices in Conversational AI,” based on the experience I have gained from seven years in the space. I founded and led a startup for chatbot and voice assistant analytics that processed 90B messages across a wide variety of use cases. I have also interviewed many leaders in the space through panel events, conferences, and articles.
Meta AI is impressive
Meta had one of the largest presences at the conference, with two booths and multiple talks. In addition to the main booth demonstrating different AI initiatives, they had one for the “Metaverse” where visitors could try the VR headsets themselves.
One of the most interesting presentations was by Antoine Bordes, the Managing Director of Meta AI, who discussed language AI and the efforts Meta is making in text translation and speech-to-speech translation. The presentation on the Metaverse by Sean Liu, Director of Product Management at Meta AI, was also quite fascinating. Both covered aspects of conversational AI in their talks.
Direct language translation
Meta is creating true, language-to-language translation in text without an intermediary.
Antoine explained that translation services typically use English as an intermediary. For example, to go from Spanish to French, a model would translate Spanish to English and then English to French. While this makes it easier to scale, a lot of information can be lost. Spanish and Portuguese, for instance, share many linguistic and cultural similarities; translating through English loses meaning and forgoes the data that exists directly between those two languages.
Meta has created models to go directly from one language to the other, without English in between — resulting in ~10,000 combinations covering ~100 languages.
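As a rough back-of-the-envelope check (my own arithmetic, not something shown in the talk), counting ordered language pairs shows why pivoting through English scales more easily, and where the ~10,000 figure comes from. The function names below are just illustrative.

```python
def pivot_directions(n_languages: int) -> int:
    """Translation directions needed when every pair is routed through one pivot language."""
    # Each non-pivot language needs one direction into the pivot and one out of it.
    return 2 * (n_languages - 1)


def direct_directions(n_languages: int) -> int:
    """Translation directions needed when every ordered pair is translated directly."""
    # Every language can be a source for every other language.
    return n_languages * (n_languages - 1)


n = 100  # approximate language count mentioned in the talk
print(pivot_directions(n))   # 198
print(direct_directions(n))  # 9900
```

With about 100 languages, direct translation covers 9,900 ordered pairs, which lines up with the roughly 10,000 combinations Antoine cited, compared with fewer than 200 directions when everything goes through English.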
Meta is also working on speech-to-speech translation without going to text as an intermediary. Traditionally, speech is transcribed to text using speech-to-text, translated, and then converted back to voice using text-to-speech. Doing this, however, loses a lot of the information carried in speech, including intonation, emotion, pace, accenting of words, and more.
While they do not yet have speech-to-speech fully working, they have made progress on aspects like replicating intonation and pace. Antoine showed a demo of their ability to replicate the intonation and pace of a speaker and apply it to additional speech. He also showed a demo of two AIs speaking in which the AI included pauses and signs of emotion (minor laughter), reminiscent of Google’s Duplex demo a few years ago. The demos were quite impressive.
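To make the information loss concrete, here is a toy sketch of the traditional cascaded pipeline. Everything in it is hypothetical and for illustration only: the `Speech` class, the one-entry phrase table, and the stub functions stand in for real speech-to-text, translation, and text-to-speech systems, and none of it reflects Meta's implementation.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    words: str
    pace_wpm: int = 150       # default speaking rate
    emotion: str = "neutral"  # default emotion


# Hypothetical toy phrase table standing in for a real translation model.
TOY_ES_TO_FR = {"buenos días": "bonjour"}


def speech_to_text(speech: Speech) -> str:
    # Only the words survive transcription; pace and emotion are dropped here.
    return speech.words


def translate_text(text: str) -> str:
    # Look up the phrase; fall back to the input if it is unknown.
    return TOY_ES_TO_FR.get(text, text)


def text_to_speech(text: str) -> Speech:
    # The synthesizer only sees text, so it falls back to default prosody.
    return Speech(words=text)


def cascaded_translate(speech: Speech) -> Speech:
    return text_to_speech(translate_text(speech_to_text(speech)))


original = Speech("buenos días", pace_wpm=110, emotion="cheerful")
result = cascaded_translate(original)
print(result)  # Speech(words='bonjour', pace_wpm=150, emotion='neutral')
```

The slow, cheerful greeting comes out as a neutral one at the default pace, because the text in the middle of the pipeline has no place to carry that information.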
Sean Liu’s presentation on the Metaverse covered a wide variety of AI/ML topics — including conversational AI, AR/VR, image and video recognition, ethics, and more.
One of the interesting AI challenges for the Metaverse relates to image and video recognition. Sean explained how in the Metaverse, users have an “ego-centric” view: they see everything from their own point of view. This presents a challenge for image recognition as prior models may not be trained for this view. For example, while a model may be trained on pictures of a person, it may not be trained on seeing one’s own hand from the first-person point of view. The challenges are complicated further in that the technology needs to be able to detect items in video.
The biggest challenge appears to be combining all the AR/VR technology into a small form factor. The goal is for everything to run on a pair of glasses. The challenges include handling all the compute power needed, data transfer, heat, reducing model sizes, and more. A member of the audience asked about leveraging an additional device users already have, like a phone or watch, to offload some of the work. While it is a concept Meta is exploring, there are still challenges with the amount of data transfer that would need to occur between the glasses and an external device.
While discussing the conversational AI aspects of the Metaverse, Sean touched upon Meta’s “Project CAIRaoke,” an interesting project that combines natural language understanding (NLU) and natural language generation (NLG) to provide more natural, personalized, and contextual conversational AI.
Within the Meta booth was a rather fun and interesting Sketch Demo that turned a paper drawing of a humanoid character into an animation. It was created by Jesse Smith, a postdoctoral researcher at Meta AI.
Users sketch their character, take a photo of it, and the AI scans and converts it into an animation that can run, dance, jump, and do other movements as well.
While it was trained on bipeds, I drew my favorite four-legged feline friend and it surprisingly still worked! It did recognize the tail as an arm, which made for a funnier result. As part of the processing step, you can adjust the location of the limbs and joints, so we could have fixed that as well.
With a more “normal” character, the results are pretty impressive.
Ethics, accessibility, and inclusion are key topics
Ethics, accessibility, and inclusion were common themes heard throughout the conference, and their importance was stressed by exhibitors and presenters alike. In fact, some of the participating companies were solely focused on these topics as part of the “AI for good” theme. There were talks dedicated to them as well, including one by Francesca Rossi, the AI Ethics Global Leader at IBM.
While AI is fascinating, and can be used for a lot of good, folks do need to keep in mind potential biases, the role of accessibility and inclusion, as well as how we use and store data, and incorporate these into the models and applications.
Overall, the WAICF was a great experience full of interesting people and talks. The conference organizers did a fantastic job with the festival too. It was well worthwhile and I look forward to next year’s event!
Arte Merritt leads Conversational AI partner initiatives at AWS. He is a frequent author and speaker on conversational AI and data insights. He was the founder and CEO of the leading analytics platform for conversational AI, leading the company to 20,000 customers, 90B messages processed, and multiple acquisition offers. Arte is an MIT alum.