Confluent launches plug-and-play option for real-time streaming AI
Data streaming company Confluent just hosted the first Kafka Summit in Asia in Bengaluru, India. The event saw a massive turnout from the Kafka community — over 30% of the global community comes from the region — and featured several customer and partner sessions.
In the keynote, Jay Kreps, the CEO and co-founder of the company, shared his vision of building universal data products with Confluent to power both the operational and analytical sides of data. To this end, he and his teammates showed off several innovations coming to the Confluent ecosystem, including a new capability that makes it easier to run real-time AI workloads.
The offering, Kreps said, will save developers from the complexity of juggling a variety of tools and languages when trying to train and run inference on AI models with real-time data. In a conversation with VentureBeat, Shaun Clowes, the company’s CPO, delved further into these offerings and Confluent’s approach to the age of modern AI.
Confluent’s Kafka story
Over a decade ago, organizations relied heavily on batch data for analytical workloads. The approach worked, but it meant understanding and driving value only from information up to a certain point in time, never the freshest data.
To bridge this gap, a series of open-source technologies, including Apache Kafka, was developed to power the real-time movement, management and processing of data.
Fast forward to today, Apache Kafka serves as the leading choice for streaming data feeds across thousands of enterprises.
Confluent, led by Kreps, one of the original creators of the open-source platform, has built commercial products and services (both self-managed and fully managed) around it.
However, that is just one piece of the puzzle. Last year, the data streaming player also acquired Immerok, a leading contributor to the Apache Flink project, to process data streams in flight (filtering, joining and enriching them) for downstream applications.
Now, at the Kafka Summit, the company has launched AI model inference in its cloud-native offering for Apache Flink, simplifying one of the most sought-after applications of streaming data: real-time AI and machine learning.
“Kafka was created to enable all these different systems to work together in real time and to power really amazing experiences,” Clowes explained. “AI has just added fuel to that fire. For example, when you use an LLM, it will make up an answer if it has to. So, effectively, it will just keep talking whether or not it’s true. When you call the AI, the quality of its answer is almost always driven by the accuracy and the timeliness of the data. That’s always been true in traditional machine learning and it’s very true in modern ML.”
Previously, to call AI with streaming data, teams using Flink had to write code and wire together several tools to handle the plumbing across models and data processing pipelines. With AI model inference, Confluent is making that “very pluggable and composable,” allowing teams to make calls to AI engines, including those from OpenAI, AWS SageMaker, GCP Vertex and Microsoft Azure, using simple SQL statements from within the platform.
“You could already be using Flink to build the RAG stack, but you would have to do it using code. You would have to write SQL statements, but then you’d have to use a user-defined function to call out to some model, and get the embeddings back or the inference back. This, on the other hand, just makes it super pluggable. So, without changing any of the code, you can just call out any embeddings or generation model,” the CPO said.
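In practice, the shift Clowes describes might look something like the sketch below. Confluent has described a CREATE MODEL statement and an ML_PREDICT table function for Flink SQL, but the option keys, connection name and table names here are illustrative assumptions rather than confirmed syntax:

```sql
-- Register a remote model once. The provider/task/connection option
-- keys below are illustrative assumptions, not confirmed syntax.
CREATE MODEL support_llm
INPUT (question STRING)
OUTPUT (answer STRING)
WITH (
  'provider' = 'openai',
  'task' = 'text_generation',
  'openai.connection' = 'my-openai-connection'  -- hypothetical connection name
);

-- Then call it from plain SQL, with no user-defined function in sight.
-- support_tickets is a hypothetical table of incoming questions.
SELECT t.question, p.answer
FROM support_tickets AS t,
     LATERAL TABLE(ML_PREDICT('support_llm', t.question)) AS p(answer);
```

Swapping OpenAI for SageMaker, Vertex or Azure would then amount to registering a different model with different connection options, while the query itself stays unchanged.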
Flexibility and power
The company opted for this plug-and-play approach because it wants to give users the flexibility to choose whichever option suits their use case. Model performance also keeps evolving over time, with no single model being the “winner or loser.” This means a user can start with model A and then switch to model B if it improves, without changing the underlying data pipeline.
“In this case, you basically have two Flink jobs. One Flink job is listening to customer data, and a model generates an embedding from each document fragment and stores it in a vector database. Now, you have a vector database that has the latest contextual information. Then, on the other side, you have a request for inference, like a customer asking a question. So, you take the question from the Flink job and attach it to the documents retrieved using the embeddings. And that’s it. You call the chosen LLM and push the data in response,” Clowes noted.
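A rough Flink SQL sketch of those two jobs, under the same caveats as above, might look like this. The documents, questions, vector_store and answers tables, the model names, and especially the VECTOR_SEARCH lookup function are hypothetical stand-ins for whatever vector database integration a team actually wires in:

```sql
-- Job 1: keep the vector store fresh. Embed each incoming document
-- fragment and write it to a (hypothetical) vector database sink table.
INSERT INTO vector_store
SELECT d.doc_id, d.fragment, e.embedding
FROM documents AS d,
     LATERAL TABLE(ML_PREDICT('embedding_model', d.fragment)) AS e(embedding);

-- Job 2: answer questions with retrieved context. VECTOR_SEARCH is an
-- illustrative placeholder for a similarity lookup, not a confirmed built-in.
INSERT INTO answers
SELECT q.question_id, a.answer
FROM questions AS q,
     LATERAL TABLE(ML_PREDICT('embedding_model', q.text)) AS e(embedding),
     LATERAL TABLE(VECTOR_SEARCH(e.embedding)) AS c(context),
     LATERAL TABLE(ML_PREDICT('support_llm',
                              CONCAT(c.context, ' ', q.text))) AS a(answer);
```

Because both jobs run continuously, the vector store is updated as new documents arrive, which is what keeps the retrieved context current.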
Currently, the company offers access to AI model inference to select customers building real-time AI apps with Flink. It plans to expand the access over the coming months and launch more features to make it easier, cheaper and faster to run AI apps with streaming data. Clowes said that part of this effort would also include improvements to the cloud-native offering, which will have a gen AI assistant to help users with coding and other tasks in their respective workflows.
“With the AI assistant, you can be like ‘tell me where this topic is coming from, tell me where it’s going or tell me what the infrastructure looks like’ and it will give you all the answers and execute tasks. This will help our customers build really good infrastructure,” he said.
A new way to save money
Beyond simplifying AI efforts with real-time data, Confluent also introduced Freight Clusters, a new serverless cluster type for its customers.
Clowes explained that these auto-scaling Freight Clusters take advantage of cheaper but slower replication across data centers. This introduces some latency but cuts costs by up to 90%. He said the approach suits many use cases, such as processing logging or telemetry data that feeds into indexing or batch aggregation engines.
“With Kafka standard, you can go as low as electrons. Some customers run at extremely low latency, 10 to 20 milliseconds. However, when we talk about Freight Clusters, we’re looking at one to two seconds of latency. It’s still pretty fast and can be an inexpensive way to ingest data,” the CPO noted.
As the next step in this work, both Clowes and Kreps indicated that Confluent aims to “make itself known” and grow its presence in the APAC region. In India alone, which already hosts the company’s second-biggest workforce outside the U.S., it plans to increase headcount by 25%.
On the product side, Clowes said the company is exploring and investing in capabilities for improving data governance, essentially shifting governance left, as well as for cataloging data to drive self-service. These elements, he said, are far less mature in the streaming world than in the data lake world.
“Over time, we’d hope that the whole ecosystem will also invest more in governance and data products in the streaming domain. I’m very confident that’s going to happen. We as an industry have made more progress in connectivity and streaming, and even stream processing than we have on the governance side,” he said.