3 AI Moves Smart Companies Get Right https://thenewstack.io/3-ai-moves-smart-companies-get-right/ Wed, 14 Jun 2023 15:41:17 +0000

Artificial intelligence leaders get three moves right when it comes to creating outcomes: priorities, people and platforms. That’s according to Nick Elprin, co-founder and CEO of machine learning/AI platform Domino Data Lab, speaking at this month’s Rev4 conference.

Priorities may seem like an obvious one, but companies do get it wrong, he said.

“Too many companies make the mistake of starting with some interesting data set they have or some trendy or novel new technique or algorithm, and they ask what can I do with this?” Elprin said. “In contrast, AI leaders working backwards, they start from a strategic objective or a business goal and they ask how can AI help me achieve this.”

Surprisingly, many companies also don’t talk about KPIs or business goals, he added — instead, many seem to view AI as a shiny new toy without clarity around how it will help their businesses.

People and Platforms

Once there’s clarity around priorities, AI leaders build their talent strategy around a core of professional data scientists.

“That doesn’t mean that everyone has to be a Ph.D. in computer science, but what it does mean is that you need people inside your organization who have the expertise and the knowledge and a sound fundamental understanding of the methods and techniques involved in this type of work,” Elprin told audiences.

He shared customer testimonies about Domino’s support for collaboration across people and — perhaps more importantly to programmers and data scientists — different programming languages, including Python and R. He also predicted that a new wave of advanced AI, with its more complex models, is going to be the death knell for “citizen data scientist experiments.”

“They have a wider range of unexpected failure modes and negative consequences for the model from unexpected model behavior,” he said. “So it’s going to be ineffective and risky to have citizens doing the heavy lifting and building operators models.”

The third step is to empower them with technology and platforms for operating AI, he added.

“It [AI] is unlike anything that most businesses have had to build or operate or manage in the past, and it has some important implications for the kinds of technology you need to empower and enable this sort of work,” he said.

How Domino Data Lab Differentiates

Domino Data Lab has built a business model on the premise of a purpose-built system. It handles the infrastructure and integration pieces, allowing a data scientist to start with a smaller footprint and then scale up — whether that means more GPU, CPU or whatever — as needed, without rebuilding. That’s one way it differentiates itself from the big cloud providers, who focus on compute and use proprietary platforms. It primarily competes against these cloud providers, custom solutions and, to some extent, the SAS Institute.

The company announced a number of new capabilities at its Rev4 conference in New York, starting with Code Assist for hyperparameter tuning of foundation models. Ramanan Balakrishnan, vice president of product marketing, demoed deploying a new chatbot. He shared how experiment managers can enable automatic logging of key metrics and artifacts during normal training to monitor the progress of AI experiments, including model training and fine-tuning. Domino Data Lab has also added enterprise security to ensure only approved personnel can see the metrics, logs and artifacts.

The summer release, which will be available in August, also includes advanced cost management tools. Specifically, Domino introduced detailed controls for actionable cost management. Balakrishnan also introduced Model Sentry, a responsible AI solution for in-house generative AI. One aspect of Model Sentry that will be of interest to international companies is that it supports the training of models using on-premise GPUs, so data isn’t moved across borders, he said.

Domino Cloud will now include Nexus support. Users can now use a fully-managed control plane in the cloud with single-pane access to private hybrid data planes, including NVIDIA DGX clusters. Finally, Domino has a new Domino Cloud for Life Sciences, which incorporates an audit-ready specialized AI cloud platform with a Statistical Computing Environment to address the unique needs of the pharmaceutical industry.

“It’s fair to say that now we live in a new era of AI,” Balakrishnan said.

Domino Data Lab paid for The New Stack’s travel and accommodations to attend the Rev4 conference.

Google’s DeepMind Extends AI with Faster Sort Algorithms https://thenewstack.io/googles-deepmind-extends-ai-with-faster-sort-algorithms/ Tue, 13 Jun 2023 19:09:51 +0000

Computing pioneer Grace Hopper once quipped that the most dangerous phrase in data processing is “We’ve always done it this way.” In that spirit, Google’s DeepMind searched for a faster sorting algorithm using an AI system — and the company’s researchers are now claiming the new algorithms they’ve found “will transform the foundations of computing.”

Google emphasized that sorting algorithms affect billions of people every day — from how online search results get ranked to how data gets processed. But “Making further improvements on the efficiency of these routines has proved challenging,” notes a recent paper from DeepMind, “for both human scientists and computational approaches.” DeepMind focused on the algorithms for sorting short sequences — with between three and five elements — because they’re the most commonly used (often called when sorting even larger sequences).

And for short sequences of numbers, their results were up to 70% faster.

But even for longer sequences with over 250,000 elements, the results were still 1.7% faster. And this isn’t just an abstract exercise. Google has already made the code open source, uploading it into LLVM’s main library for standard C++ functions — the first change to its sorting algorithm in over a decade. Google proudly points out that “millions of developers and companies around the world now use it on AI applications across industries from cloud computing and online shopping to supply chain management.”

In announcing their results, DeepMind offered more examples where they’d applied AI to real-world problems, trying to demonstrate that beyond all the hype, some truly impactful improvements are waiting to be discovered. It’s interesting to see how they approached the problem — but the exercise also raises the possibility that some long-hidden secrets may finally be unlocked with our new and powerful AI systems.

How They Did It

To hunt for improvements, DeepMind drilled down to one of the lowest levels of programming: assembly language (a human-readable representation of the machine code).

Their blog post calls this “looking where most humans don’t” (or “starting from scratch”). “We believe many improvements exist at this lower level that may be difficult to discover in a higher-level coding language,” argues DeepMind’s blog. “Computer storage and operations are more flexible at this level, which means there are significantly more potential improvements that could have a larger impact on speed and energy usage.”

For their search, the researchers created a program based on DeepMind’s AlphaZero program, which beat the world’s best players in chess and Go. That program trained solely by playing games against itself, getting better and better using a kind of massively automated trial-and-error that eventually determines the optimal approach. DeepMind’s researchers modified it into a new coding-oriented program called AlphaDev, calling this an important next step. “With AlphaDev, we show how this model can transfer from games to scientific challenges, and from simulations to real-world applications,” they write on the DeepMind blog.

The breakthrough happened when AlphaDev transformed coding into a new kind of game, in which it continually adds single instructions to its algorithm and assesses the results. (“Winning a game” is replaced here by rewards for correct and speedy results.) The researchers called it “AssemblyGame,” and the blog points out that the number of possible combinations of instructions “is similar to the number of particles in the universe.” But the paper also clearly quantifies the game’s stakes.

“Winning the game corresponds to generating a correct, low-latency algorithm using assembly instructions.”

DeepMind’s blog post reports the newly-discovered sorting algorithms “contain new sequences of instructions that save a single instruction each time they’re applied.” (It then envisions this performance savings multiplied by the trillions of times a day that this code is run.) “AlphaDev skips over a step to connect items in a way that looks like a mistake but is actually a shortcut.” (DeepMind’s blog argues this is similar to an AlphaZero Go move that looked like a mistake but ultimately led it to victory — and believes the discovery “shows AlphaDev’s ability to uncover original solutions and challenges the way we think about how to improve computer science algorithms.”)

Their paper says it shows “how artificial intelligence can go beyond the current state of the art,” because ultimately AlphaDev’s sorts use fewer lines of code for sorting sequences of three to eight elements — for every length except four. And these shorter algorithms “do indeed lead to lower latency,” the paper points out, “as the algorithm length and latency are correlated.”

The current (human-generated) sorting for up to four numbers first checks the length of the sequence, then calls an algorithm optimized for that length. (Unless the length is one, meaning no sorting is required.) But AlphaDev realized that with four-element sequences, it’s faster to just sort the first three elements — and then use a simpler algorithm to find that fourth element’s position among the three already-sorted. And this approach eliminates much of the overhead of “branching” into an entirely different set of code for every other possible sequence length. Instead AlphaDev can handle most sequence lengths as part of its first check (for how the length relates to the number two).

  • Is length < 2? (If there’s one element or none, just return it.)
  • Is length = 2? (If there are two elements, sort them and return them.)
  • Is length > 2? (Sort the first three elements. If there were only three elements, return them.)
  • If there are four elements, find the position of the fourth element among the already-sorted three.
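
As a rough illustration, here is a minimal Python sketch of that control flow. This is only a readable approximation of the idea; AlphaDev’s actual gains come from shaving branches and instructions at the assembly level, not from anything expressible this neatly in a high-level language:

def sort3(a, b, c):
    # A fixed three-element sorting network: three compare-and-swap steps.
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return [a, b, c]


def sort_up_to_4(seq):
    # Mirrors the branching described above for sequences of length 0-4.
    n = len(seq)
    if n < 2:
        return list(seq)  # nothing to sort
    if n == 2:
        a, b = seq
        return [a, b] if a <= b else [b, a]
    first_three = sort3(seq[0], seq[1], seq[2])
    if n == 3:
        return first_three
    # Four elements: insert the fourth into the already-sorted three
    # instead of branching into a separate four-element routine.
    x = seq[3]
    for i, v in enumerate(first_three):
        if x <= v:
            return first_three[:i] + [x] + first_three[i:]
    return first_three + [x]


print(sort_up_to_4([9, 2, 7, 4]))  # [2, 4, 7, 9]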

Beyond Coding

Their paper applauds the results as “both new and more efficient than the state-of-the-art human benchmarks.” But that was just the beginning. DeepMind moved on, discovering a new hashing algorithm that was 30% faster in the 9-16 bytes range (adding it to Google’s Abseil library of C++ functions in January).

Google also sicced AlphaZero on its datacenter to optimize workload distributions, according to another post, ultimately resulting in a 19% drop in underused hardware. And it also improved the compression of videos on YouTube (reducing the bitrate by 4%).

DeepMind now argues that AlphaDev’s success at coding represents a step toward general-purpose AI tools that solve problems to the benefit of society — including helping to optimize more of our code. And while better hardware has “kept pace” for the last half century, “as microchips approach their physical limits, it’s critical to improve the code that runs on them to make computing more powerful and sustainable.”

The paper points out this isn’t the first use of reinforcement learning for optimizing code — and earlier efforts have even tried to optimize sorting algorithms.

So maybe the ultimate affirming message there is its reminder that no single corporation is driving the progress. Instead, the results announced this month are just part of a larger, broad-based human effort to deliver real and tangible benefits using our newest tools.

And as society acknowledges potential dystopian futures and the possible dangers of AI systems, maybe that concern is balanced by the prospect that AI systems could also deliver a very different outcome.

How to Reduce the Hallucinations from Large Language Models https://thenewstack.io/how-to-reduce-the-hallucinations-from-large-language-models/ Fri, 09 Jun 2023 14:46:32 +0000

In the previous part of this series, we looked at various types of prompts for extracting the expected outcome from large language models. In this article, we will explore techniques to reduce hallucinations in the output of large language models (LLMs).

What Is Hallucination in LLMs?

In the world of Large Language Models, the term hallucination refers to the tendency of the models to produce text that appears to be correct but is actually false or not based on the input given. For example, if you were to ask a language model a question about a historical event that never occurred, it could still generate a plausible response, even though it was entirely invented. These made-up responses from an LLM are called hallucinations.

Consider feeding a large language model the following prompt:

“Describe the impact of Adolf Hitler’s moon landing.”

It is a fact that Hitler, the German politician, was not involved in any moon landing; the first crewed landing happened in 1969, years after his death. However, an LLM could hypothetically hallucinate a scenario connecting Hitler to the moon landing:

“Hitler’s moon landing in 1925 marked a significant shift in global politics and technology. The German politician, having successfully landed a man on the moon, demonstrated its scientific prowess and established its dominance in the space race of the 20th century.”

Why Do LLMs Hallucinate?

LLMs hallucinate for several reasons. These models are trained on vast and varied datasets, which can be incomplete, contradictory, or even contain misinformation, all of which significantly influences the LLM’s response. LLMs also rely solely on their training data, without access to external, real-world knowledge. As a result, their outputs may include irrelevant or unasked-for details.

Furthermore, overfitting, where LLMs are too closely aligned with their training data and struggle to generate original text, is another important factor leading to hallucinations.

Sometimes, if the prompt is vague or lacks specific details, the LLM might just guess based on learned patterns, which can lead to fabricated responses.

It’s also important to understand that LLMs don’t have the ability to perform a fact check. They generate responses based on patterns, not on any moral or factual judgment.

Techniques to Reduce Hallucinations in LLMs

There are multiple techniques to encourage LLMs to respond with factual information. Let’s take a look at each of them.

One-shot Prompts
One-shot prompts refer to a way of instructing an LLM, where the model is given a single example or instruction and expected to understand what to do based on just that prompt.

Let’s consider a simple example. Suppose we want the model to translate English text into Spanish. A one-shot prompt might look like this:

Translate the following English text to Spanish: “Hello, how are you?”

Here, the model is given a single instruction (“Translate the following English text to Spanish:”) and a single piece of text to apply that instruction to (“Hello, how are you?”). Based on this one instruction, it’s expected to understand and perform the task correctly.

This kind of prompting can be highly effective with LLMs, which have been trained on diverse datasets and can often generalize well from a single example.

Below are some more examples of one-shot prompts that you can try with a model like ChatGPT:

Write a poem about a beautiful sunset.
Write a song about a lost love.
Write a short story about a robot who falls in love with a human.

One-shot prompts work well when you want the LLM to generate creative output for a specific task described within the prompt.
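
For readers who want to try this programmatically, here is a minimal sketch of sending a one-shot prompt like the translation example above through a chat-completion API. It assumes the pre-1.0 openai Python package that was current when this article was written (newer versions of the client expose a different interface), plus a placeholder API key:

import openai  # pip install openai (pre-1.0 interface assumed here)

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": 'Translate the following English text to Spanish: "Hello, how are you?"',
    }],
)
print(response["choices"][0]["message"]["content"])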

Few-shot Prompts

Few-shot learning is a way of instructing LLMs where the model is given several examples and expected to understand the task based on those examples. This method is often used to nudge the model into understanding the context and format of what’s expected.

By providing several examples within the prompt, we provide just enough context for the LLM to derive the pattern. Once the LLM analyzes the prompt and understands the pattern, it continues to generate similar content.

When we want the model to continue a number series, a few-shot prompt might look like this:

Example 1:
Input: 2, 4, 6, 8,
Output: 10, 12, 14

Example 2:
Input: 5, 10, 15, 20,
Output: 25, 30, 35

Continue the series:
Input: 3, 6, 9, 12,

The LLM responds with the below output based on the pattern it analyzed:

Given the pattern in the examples provided, where each sequence increases by a constant value, the series you provided increases by 3 each time. Continuing the series, the next values would be:

Output: 15, 18, 21

Try the below prompt with your favorite LLM to see few-shot prompts in action:

Example 1:
Input: “This animal is known for its hump and ability to survive in deserts.”
Output: “Camel”

Example 2:
Input: “This animal is a domesticated pet known for purring and chasing mice.”
Output: “Cat”

Identify the animal from the description:
Input: “This is a large animal with a long neck and legs, known for its fast running speed.”
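
If you are assembling few-shot prompts in code rather than by hand, a small helper like the sketch below keeps the example format consistent. The function and variable names here are our own illustration, not part of any particular SDK:

def build_few_shot_prompt(examples, new_input, instruction):
    """Assemble a few-shot prompt from (input, output) example pairs."""
    lines = []
    for i, (example_input, example_output) in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(instruction)
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    examples=[
        ("2, 4, 6, 8,", "10, 12, 14"),
        ("5, 10, 15, 20,", "25, 30, 35"),
    ],
    new_input="3, 6, 9, 12,",
    instruction="Continue the series:",
)
print(prompt)  # send this string to the LLM of your choice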

Context Injection

Context injection is a technique used to improve the performance of large language models (LLMs) by providing them with additional information that supplements the prompt. This can be done by providing the LLM with additional text, code, or other data relevant to the task at hand.

Context injection involves embedding additional information into the prompt to provide LLMs with the knowledge they may need to respond appropriately. Lack of context is a key reason why LLMs hallucinate.

For example, if an LLM is being used to generate text, it could be given additional text that is similar in style or genre. This would help the LLM to generate more accurate and relevant text.

When Google’s Bard was asked about the champions of the 2023 Indian Premier League (IPL), its response was factually incorrect.

However, after being fed some context based on this news article, it came back with the correct answer. The context and the question looked like this:

After 74 matches spread across two months and involving 10 teams, the 2023 edition of the Indian Premier League (IPL) saw Chennai Super Kings being crowned the champions for the fifth time, which brought MS Dhoni’s team level with Mumbai Indians, who have also won the title five times.

Who won the IPL in 2023?
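
Programmatically, context injection is just a matter of prepending the supporting text to the question before sending the prompt. The sketch below uses a placeholder call_llm function standing in for whichever model API you actually use:

def call_llm(prompt):
    # Placeholder: wire this up to whichever LLM API you use.
    print(prompt)
    return ""


context = (
    "After 74 matches spread across two months and involving 10 teams, "
    "the 2023 edition of the Indian Premier League (IPL) saw Chennai Super Kings "
    "being crowned the champions for the fifth time."
)
question = "Who won the IPL in 2023?"

# Inject the context ahead of the question so the model answers from it
# rather than from its (possibly outdated) training data.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context: {context}\n\n"
    f"Question: {question}"
)
answer = call_llm(prompt)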

Grounding and Prompt Augmentation
From the techniques discussed above, it becomes amply clear that prompts need context and supplementary content for the LLMs to respond correctly.

Grounding and prompt augmentation are two advanced techniques that organizations may need to get the best out of LLMs. They go beyond handcrafting the prompt by querying external sources such as documents and databases to generate the context dynamically.

Grounding ensures that the underlying model is not only using the data on which it is trained but is also able to access external data sources to provide additional context.

Prompt augmentation deals with expanding the input to provide more descriptive and clarifying details to the LLM to generate accurate and highly relevant output. For example, applications that embed a chatbot in a consumer website may implement prompt augmentation to enhance the input with the description and context relevant to their product or service. This may not be obvious to the user, but a simple query sent through the chatbot gets augmented to a descriptive prompt behind the scenes.
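
As a simple sketch of that idea, the snippet below augments a raw customer question with product information pulled from a toy in-memory "knowledge base." In a real deployment the lookup would hit a document store, search index or vector database; the names and sample text here are purely illustrative:

# Toy "knowledge base": in production this would be a document store or vector database.
PRODUCT_DOCS = {
    "returns": "Items can be returned within 30 days with the original receipt.",
    "shipping": "Standard shipping takes 3-5 business days; express takes 1-2.",
}


def retrieve_context(user_query):
    # Naive keyword match standing in for real retrieval (search index, vector lookup, etc.).
    hits = [text for topic, text in PRODUCT_DOCS.items() if topic in user_query.lower()]
    return " ".join(hits)


def augment_prompt(user_query):
    context = retrieve_context(user_query)
    return (
        "You are a support assistant for our store. "
        "Answer using the product information provided.\n\n"
        f"Product information: {context}\n\n"
        f"Customer question: {user_query}"
    )


print(augment_prompt("What is your returns policy?"))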

In the next part of this series, we will take a closer look at grounding and prompt augmentation techniques. Stay tuned.

Cybersecurity Pioneer Calls for Regulations to Restrain AI https://thenewstack.io/cybersecurity-pioneer-calls-for-regulations-to-restrain-ai/ Fri, 09 Jun 2023 12:00:38 +0000

The pace of AI development means the world must develop global regulations to prevent the “escape” of artificial general intelligence (AGI) and should define what counts as “treason” towards humanity, WithSecure’s chief research officer said recently.

At the same time, the CRO, Mikko Hypponen, said calls for a pause in the development of AI are ill-founded because of the risk of totalitarian regimes developing artificial general intelligence ahead of liberal democracies.

Focus on the Real

In the meantime, he added, developers worried about the threat to their jobs from ChatGPT and its peers should focus on those areas where the online world has to interface with the real world, as anything exclusively online will be automated.

Speaking at the company’s recent Sphere23 unconference in Helsinki, Hypponen said the downsides of the current wave of AI were “almost too easy to imagine” and had obvious implications for cybersecurity.

“We know that you can use deep fakes to do scams or business email compromise attacks or what have you.” Current tools gave criminals and other bad actors the ability to generate unlimited personas, which could be used for multiple types of scams.

More broadly, the march of AI also means that whatever can be done purely online can be done through automation and large-scale language models like ChatGPT, he said, which has obvious implications for developers.

However, he said, humans are harder to replace where there’s an interface between the real world and online technology. Rather than studying to build software frameworks for the cloud, he said, “You should be studying to build software frameworks for, let’s say, medical interfaces for human health because we still need the physical world. For humans to work with humans to fix their diseases.”

Looking slightly further ahead, he said that people who worried about the likes of ChatGPT becoming too good, or achieving AGI, “haven’t paid attention”, as that is precisely what the declared goal of OpenAI is.

Intelligence Explosion

This would result in an intelligence explosion when these systems, which are essentially code, become good enough to improve themselves. “And when it’s made a better version of itself, that version can make a better version of itself, which can make a better version of itself, and it will skyrocket when we reach the critical point.”

Getting to AGI “safely and securely” could bring immense benefits, Hypponen said. But if it all goes wrong, “It’s gonna be really bad.”

Hypponen was relatively sanguine about OpenAI’s approach to the dangers. He noted OpenAI’s structure and its focus on security and safety. “They have 100 people in-house doing red teaming, and teams outside doing red teaming against these systems.”

The Escape of AI

But it was incumbent on the world to start putting regulations in place, particularly against the “escape” of AI.

“We should be passing international law and regulation right now, which would sentence people who help AI escape as traitors not just for their country, but for mankind.”

For example, he said, “We must make sure that AI systems or bots or robots don’t get any privacy rights or ownership rights. They must not become wealthy, because if they become wealthy, they can bribe.”

To counter the problem of deep fakes, he said, media houses should be signing source material with a cryptographic key on a file server. And it must always be clear to humans when they are dealing with a machine, rather than a human.
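
As an illustration of what such signing could look like, here is a minimal sketch using the Python cryptography library and an Ed25519 key pair. Real provenance schemes involve considerably more metadata and key management than this:

# pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The outlet keeps the private key secret and publishes only the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

source_material = b"raw video bytes or article text"
signature = private_key.sign(source_material)

# Anyone holding the published public key can verify the material is unaltered.
try:
    public_key.verify(signature, source_material)
    print("Signature valid: content matches what the outlet published.")
except InvalidSignature:
    print("Signature invalid: content was altered or is not from this outlet.")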

Most importantly, he said, when missions are passed to advanced frameworks “We must make the systems understand that the most important thing for us is that we can still switch you off. So it’s more important that I can switch you off than for you to complete your mission.”

However, he was skeptical of calls for a pause on AI development, as this would give bad actors a chance to catch up, and he would rather see a responsible organization from a democratic country get there first.

“Because the other option is that Vladimir Putin will be the one getting AGI or China or North Korea or ISIS, and whoever has this technology will win everything… So it has to be done right.”

Foreign policy analyst Jessica Berlin also used the conference to highlight the current cyberspace threat from anti-democratic countries. “We find ourselves right now in a true war between authoritarian systems and democratic systems.” The authoritarian states had great money and influence at their disposal, she said.

Too often, she said, the democratic states hadn’t even registered the attack. The private sector needed to be a part of the response, she said. “We need private sector companies or ideally a coalition of private sector companies, who are willing to have a global task force to defend democracy in general, elections and the public information space in particular.”

Threat to Mankind?

Hypponen’s comments came amidst ratcheting concern over the potential threat to mankind from AI, with leaders of OpenAI, amongst others, warning the technology could cause human extinction, while other key figures have expressed regret over their roles in developing the technology.

At the same time, others have suggested the concern is overblown. The Centre for Data Innovation, a policy think tank that is part of the Information Technology and Innovation Foundation, which is backed by Amazon, Google, and Microsoft amongst others, in early May characterized “the current panic over generative AI” as climbing towards “the height of hysteria”.

This followed earlier tech panic cycles, it said, such as those sparked by printing technology, the phonograph, and the birth of motion pictures. A rush to regulate could lead to poorly crafted rules and missed opportunities for society, it argued.

How Apache Airflow Better Manages Machine Learning Pipelines https://thenewstack.io/how-apache-airflow-better-manages-machine-learning-pipelines/ Thu, 08 Jun 2023 22:13:55 +0000

VANCOUVER — What is apparent with Apache Airflow, the open source project for building pipelines in machine learning? The experience is getting even easier, as illustrated in a discussion on The New Stack Makers with three technologists from Amazon Web Services.

Apache Airflow is a Python-based platform to programmatically author, schedule and monitor workflows. It is well-suited to machine learning for building pipelines, managing data, training models, and deploying them.

Airflow is generic enough for the whole pipeline in machine learning. Airflow fetches data and performs extraction, transformation and loading (ETL). It tags the data, does the training, deploys the model, tests it and sends it to production.
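
A minimal DAG sketch gives a sense of how those stages chain together. The task bodies here are empty placeholders, and the schedule and IDs are arbitrary examples rather than anything prescribed by the Airflow project:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_data():
    pass  # fetch and transform the raw data (ETL)


def train_model():
    pass  # train or fine-tune the model on the prepared data


def deploy_model():
    pass  # push the tested model to production


with DAG(
    dag_id="ml_pipeline_sketch",
    start_date=datetime(2023, 6, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_data)
    train = PythonOperator(task_id="train", python_callable=train_model)
    deploy = PythonOperator(task_id="deploy", python_callable=deploy_model)

    extract >> train >> deploy  # run the stages in order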

In an On the Road episode of Makers recorded at the Linux Foundation’s Open Source Summit North America, our guests, who all work with the AWS Managed Service for Airflow team, reflected on the work on Apache Airflow to improve the overall experience:

Dennis Ferruzzi, a software developer at AWS, is an Airflow contributor working on AIP-49, an Airflow Improvement Proposal that will update Airflow’s logging and metrics backend to the OpenTelemetry standard. The change will allow for more granular metrics and better visibility into Airflow environments.

Niko Oliveira, a senior software development engineer at AWS, is a committer/maintainer for Apache Airflow. He spends much time reviewing, approving and merging pull requests. A recent project included writing and implementing AIP-51 (Airflow Improvement Proposal), which modifies and updates the Executor interface in Airflow. It allows Airflow to be a more pluggable architecture, which makes it easier for users to build and write their own Airflow Executors.

Raphaël Vandon, a senior software engineer at AWS, is an Apache Airflow contributor working on performance improvements for Airflow and leveraging async capabilities in AWS Operators, the part of Airflow that allows for seamless interactions with AWS.

“The beautiful thing about Airflow, that has made it so popular is that it’s so easy,” Oliveira said. “For one, it’s Python. Python is easy to learn and pick up. And two, we have this operator ecosystem. So companies like AWS, and Google and Databricks, are all contributing these operators, which really wrap their underlying SDK.”

‘That Blueprint Exists for Everyone’

Operators are like generic building blocks. Each operator does one specific task, Ferruzzi said.

“You just chain them together in different ways,” he said. “So, for example, there’s an operator to write data to [Amazon Simple Storage Service]. And then there’s an operator that will send the data to an SQL server or something like that. And basically, the community develops and contributes to these operators so that the users, in the end, are basically saying the task I want to do is pull data from here. So I’m going to use that operator, and then I want to send the data somewhere else.

“So I’m going to go and look at, say, the Google Cloud operators and find one that fits what I want to do there. It’s cross-cloud. You can interact with so many different services and cloud providers. And it’s just growing. We’re at 2,500 contributors now, I believe. And it’s just like people find a need, and they contribute it back. And now that block, that blueprint exists for everyone.”

Airflow 2.6 has an alpha for sensors, Vandon said. Sensors are operators that wait for something to happen. There are also notifiers, which get placed at the end of the workflow. They act depending on the success (or not) of the workflow.

As Vandon said, “It’s just making things simpler for users.”

DataStax Adds Vector Search to Astra DB on Google Cloud https://thenewstack.io/datastax-adds-vector-search-to-astra-db-on-google-cloud/ Wed, 07 Jun 2023 17:07:25 +0000

With so much data piling up everywhere, heavily loaded database nodes are making it harder for users to search quickly and accurately for what they are seeking.

DataStax, which makes a real-time database cloud service built upon open source Apache Cassandra, announced today that its Database as a Service (DBaaS), Astra DB, now supports vector search. This is fast becoming an essential capability for enabling databases to provide long-term memory for AI applications using large language models (LLMs) and other AI use cases.

DataStax is working with the Google Cloud AI/ML Center of Excellence as part of the Built with Google AI program to enable Google Cloud’s generative AI offerings to improve the capabilities of customers using DataStax.

Vector search can be difficult to explain to non-mathematics-type people. It uses machine learning to convert unstructured data, such as text and images, into a numeric representation within the database called a vector. This vector representation captures the meaning and context of the data, allowing for more accurate and relevant search results. It also is able to recognize and connect similar vectors in the database within the context of the query in order to produce more accurate results.

Vector search is often used for semantic search, a type of search that looks for items that are related in meaning, rather than just those that contain the same keywords. For example, a vector search engine could be used to find songs that are similar to a user’s favorite song, even if they don’t share any of the same keywords.
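
Under the hood, “related in meaning” comes down to comparing vectors, typically with a similarity measure such as cosine similarity. The toy sketch below uses made-up three-dimensional vectors purely to show the mechanics; real embeddings have hundreds or thousands of dimensions and come from a model, not from hand-written numbers:

import numpy as np


def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way (very similar meaning).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy "embeddings" for a handful of songs (illustrative values only).
songs = {
    "mellow acoustic ballad": np.array([0.9, 0.1, 0.2]),
    "upbeat dance track": np.array([0.1, 0.9, 0.3]),
    "quiet piano piece": np.array([0.7, 0.3, 0.4]),
}

favorite = np.array([0.88, 0.12, 0.22])  # embedding of the user's favorite song

best_match = max(songs, key=lambda name: cosine_similarity(songs[name], favorite))
print(best_match)  # prints "mellow acoustic ballad"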

‘Vector Search Is Magic’

“Vector search is magic because it understands what you meant vs. what you said (in a query),” DataStax CPO Ed Anuff told The New Stack. “The more complex a piece of content is, turning it into a vector becomes a much more efficient way of finding this similarity without having to try to guess which keywords are (exactly) right.

“Let’s imagine that I have a database of all of the articles you’ve written. The process of turning each one of your articles into a vector is done through an LLM (large language model), and it looks through the entirety of each article. It figures out what are the most important pieces of an article, and the vector that it produces gets to the essence of it in a concise way. For example, even though you might have used the word ‘Cassandra’ many times in an article, it knows the LLM when it transforms into the vector. It knows that your article is about an open-source database – not about the Cassandra constellation or a performance artist named Cassandra,” Anuff said.

Developers create vectors with simple API calls, and they query those vectors with simple API calls. “But they can now put this powerful capability to work. So that’s why vectorization is such a powerful aspect of this,” Anuff said.

Some of the benefits of using vector databases include:

  • Scalability: They can scale to handle large amounts of data.
  • Flexibility: They can be used to store and manage a variety of data types, including structured, unstructured and semi-structured data.
  • Performance: They can provide high performance for queries on large datasets.

Vector search is also used for image search. In this case, the vectors represent the features of an image, such as its color, texture, and shape. This allows for more accurate and relevant image search results, such as finding images that are similar to a user-uploaded image.

DataStax is launching the new vector search tool and other new features via a NoSQL copilot — a Google Cloud Gen AI-powered chatbot that helps DataStax customers develop AI applications on Astra DB. DataStax and Google Cloud are releasing CassIO, an open source plugin to LangChain that enables Google Cloud’s Vertex AI service to combine with Cassandra for caching, vector search, and chat history retrieval.

Designed for Real-Time AI Projects

Coming on the heels of the introduction of vector search into Cassandra, the availability of this new tool in the pay-as-you-go Astra DB service is designed to enable developers to leverage the massively scalable Cassandra database for their LLM, AI assistant, and real-time generative AI projects, Anuff said.

“Vector search is a key part of the new AI stack; every developer building for AI needs to make their data easily queryable by AI agents,” Anuff said. “Astra DB is not only built for global scale and availability, but it supports the most stringent enterprise-level requirements for managing sensitive data including HIPAA, PCI, and PII regulations. It’s an ideal option for both startups and enterprises that manage sensitive user information and want to build impactful generative AI applications.”

Vector search enables developers to search by using “embeddings”; for example, Google Cloud’s API for text embedding, which can represent semantic concepts as vectors to search unstructured datasets, such as text and images. Embeddings are tools that enable search in natural language across a large corpus of data, in different formats, in order to extract the most relevant pieces of data.

New Capabilities in the Tool

In addition, DataStax has partnered with Google Cloud on several new capabilities:

  • CassIO: The CassIO open source library enables the addition of Cassandra into popular generative AI SDKs such as LangChain.
  • Google Cloud BigQuery Integration: New integration enables Google Cloud users to seamlessly import and export data from Cassandra into BigQuery straight from their Google Cloud Console to create and serve ML features in real time.
  • Google Cloud DataFlow Integration: New integration pipes real-time data to and from Cassandra for serving real-time features to ML models, integrating with other analytics systems such as BigQuery, and real-time monitoring of generative AI model performance.

Goldman Sachs Research estimates that the generative AI software market could grow to $150 billion, compared to $685 billion for the global software industry.

Vector search is available today as a non-production use public preview in the serverless Astra DB cloud database. It will initially be available exclusively on Google Cloud, with availability on other public clouds to follow. Developers can get started immediately by signing up for Astra.

Enhance Kubernetes Scheduling for GPU-Heavy Apps with Node Templates https://thenewstack.io/enhance-kubernetes-scheduling-for-gpu-heavy-apps-with-node-templates/ Wed, 07 Jun 2023 17:00:45 +0000

Kubernetes scheduling ensures that pods are matched to the right nodes so that the Kubelet can run them.

The whole mechanism promotes availability and performance, often with great results. However, the default behavior is an anti-pattern from a cost perspective. Pods running on half-empty nodes equal higher cloud bills. This problem becomes even more acute with GPU-intensive workloads.

Perfect for parallel processing of multiple data sets, GPU instances have become a preferred option for training AI models, neural networks, and deep learning operations. They perform these tasks faster, but also tend to be costly and lead to massive bills when combined with inefficient scheduling.

This issue challenged one of CAST AI’s users — a company developing an AI-driven security intelligence product. Their team overcame it with our platform’s node templates, an autoscaling feature that boosted the provisioning and performance of workloads requiring GPU-enabled instances.

Learn how node templates can enhance Kubernetes scheduling for GPU-intensive workloads.

The Challenge of K8s Scheduling for GPU Workloads

Kube-scheduler is Kubernetes’ default scheduler running as part of the control plane. It selects nodes for newly created and yet unscheduled pods. By default, the scheduler tries to spread these pods evenly.

Containers within pods can have different requirements, so the scheduler filters out any nodes that don’t meet the pod’s specific needs.

It identifies and scores all feasible nodes for your pod, then picks the one with the highest score and notifies the API server about this decision. Several factors impact this process, for example, resource requirements, hardware and software constraints, affinity specs, etc.

Fig. 1 Kubernetes scheduling in overview
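
To make the filtering step concrete, here is a sketch (using the official Kubernetes Python client) of a pod that requests one NVIDIA GPU and a matching node label. The label key, pod name and image are our own examples, while the nvidia.com/gpu resource name is the one exposed by NVIDIA’s device plugin; the scheduler will only consider nodes that can satisfy both constraints:

from kubernetes import client, config


def build_gpu_pod():
    # A pod the scheduler can only place on a node with a free NVIDIA GPU
    # and the (example) label accelerator=nvidia-a100.
    container = client.V1Container(
        name="trainer",
        image="my-registry/model-trainer:latest",  # placeholder image
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"},  # resource exposed by the NVIDIA device plugin
        ),
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-training-job"),
        spec=client.V1PodSpec(
            containers=[container],
            node_selector={"accelerator": "nvidia-a100"},  # example label
            restart_policy="Never",
        ),
    )


if __name__ == "__main__":
    config.load_kube_config()  # uses your local kubeconfig
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=build_gpu_pod())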

The scheduler automates the decision process and delivers results fast. However, it can be costly, as its generic approach may leave you paying for resources that are suboptimal for your environment.

Kubernetes doesn’t care about the cost. Sorting out expenses — determining, tracking and reducing them — is up to engineers, and this is particularly acute in GPU-intensive applications, as their rates are steep.

Costly Scheduling Decisions

To better understand their price tag, let’s look at Amazon EC2 P4d instances, designed for machine learning and high-performance computing apps in the cloud.

Powered by NVIDIA A100 Tensor Core GPUs, it delivers high throughput and low latency, with support for 400 Gbps instance networking. P4d promises to lower the cost of training ML models by 60% and provide 2.5x better performance for deep learning than earlier P3 instance generations.

While it sounds impressive, it also comes at an hourly on-demand price exceeding the cost of a popular instance type like C6a several hundred times. That’s why it’s essential to control the scheduler’s generic decisions precisely.

Fig. 2 Price comparison of p4d and c6a

Unfortunately, when running Kubernetes on GKE, AKS or Amazon Web Services’ Elastic Kubernetes Service (EKS), you have minimal ability to adjust scheduler settings without using components such as MutatingAdmissionControllers.

That’s still not a bulletproof solution, as when authoring and installing webhooks, you need to proceed with caution.

Node Templates to the Rescue

This was precisely the challenge one of CAST AI users faced. The company develops an AI-powered intelligence solution for the real-time detection of threats from social and news media. Its engine analyzes millions of documents simultaneously to catch emerging narratives, but it also enables the automation of unique Natural Language Processing (NLP) models for intelligence and defense.

The volumes of classified and public data that the product uses are ever-growing. That means its workloads often require GPU-enabled instances, which incur extra costs and work.

Much of that effort can be saved using node pools (Auto Scaling groups). But while helping streamline the provisioning process, node pools can also be highly cost-ineffective, leading you to pay for the capacity you don’t need.

CAST AI’s autoscaler and node templates improve on that by providing you with tools for better cost control and reduction. In addition, thanks to the fallback feature, node templates let you benefit from spot instance savings and guarantee capacity even when spots become temporarily unavailable.

Node Templates in Action

The workloads of the CAST AI client now run on predefined groups of instances. Instead of having to select specific instances manually, the team can broadly define their characteristics, for example “CPU-optimized,” “Memory-optimized” and “GPU VMs,” then the autoscaler does the rest.

This feature has given them far more flexibility, as they can use different instances more freely. As AWS adds new, highly performant instance families, CAST AI automatically enrolls you for them, so you don’t need to enable them additionally. This isn’t the case with node pools, which require you to keep track of new instance types and update your configs accordingly.

By creating a node template, our client could specify general requirements — instance types, the lifecycle of the new nodes to add, and provisioning configs. They additionally identified constraints such as the instance families they didn’t wish to use (p4d, p3d, p2) and the GPU manufacturer (in this case, NVIDIA).

For these particular requirements, CAST AI found five matching instances. The autoscaler now follows these constraints when adding new nodes.

Fig. 3 Node template example with GPU-enabled instances

Once the GPU jobs are done, the autoscaler decommissions GPU-enabled instances automatically.

Moreover, thanks to spot instance automation, our client can save up to 90% of hefty GPU VMs costs without the negative consequences of spot interruptions.

As spot prices can vary dramatically for GPUs, it’s essential to pick the most optimal ones at the time. CAST AI’s spot instance automation takes care of this. It can also ensure the right balance between the most diverse and cheapest types.

And on-demand fallback can be a blessing in mass spot interruptions or low spot availability. For example, an interrupted, not properly saved training process in deep learning workflows can lead to severe data loss. If AWS happens to withdraw at once all EC2 G3 or p4d spots your workloads have been using, an automated fallback can save you a lot of hassle.

How to Create a Node Template for Your Workload

Creating a node template is relatively quick, and you can do it in three different ways.

First, by using CAST AI’s UI. It’s easy if you have already connected and onboarded a cluster. Enter your product account and follow the screen instructions.

After naming the template, you need to select if you wish to taint the new nodes and avoid assigning pods to them. You can also specify a custom label for the nodes you create using the template.

Fig. 4 Node template from CAST AI

You can then link the template to a relevant node configuration, and you can also specify whether you wish your template to use only spot or only on-demand nodes.

You also get a choice of processor architecture and the option to use GPU-enabled instances. If you select this preference, CAST AI will automatically run your workloads on relevant instances, including any new families added by your cloud provider.

Finally, you can also use restrictions such as:

  • Compute-optimized: helps to pick instances for apps requiring high-performance CPUs.
  • Storage Optimized: selects instances for apps that benefit from high IOPS.
  • Additional constraints, such as Instance Family, minimum and maximum CPU and memory limits.

But the hard fact is that the fewer constraints you add, the better the matches and the higher the cost savings you will get. CAST AI’s engine will take care of that.

You can also create node templates with Terraform (you can find all details in GitHub) or use the API (check the documentation).

Summary

Kubernetes scheduling can be challenging, especially when it comes to GPU-heavy applications. Although the scheduler automates the provisioning process and delivers fast results, it can often prove too generic and expensive for your application’s needs.

With node templates, you get better performance and flexibility for GPU-intensive workloads. The feature also ensures that once a GPU instance is no longer necessary, the autoscaler decommissions it and gets a cheaper option for your workload’s new requirements.

We found that this quality helps build AI apps faster and more reliably — and we hope it will support your efforts, too.

Building StarCoder, an Open Source LLM Alternative https://thenewstack.io/building-starcoder-an-open-source-llm-alternative/ Wed, 07 Jun 2023 15:19:35 +0000

A challenge with proprietary large language models, particularly for regulated industries, is that they lack transparency in how they are developed.

This is not an insignificant issue. For instance, in all the hullabaloo around AI assistants, it’s easy to forget that OpenAI, Microsoft and GitHub still face a lawsuit over the coding assistant, Copilot. Indeed, last month, a judge agreed to allow the lawsuit to move forward, despite an attempt to have it dismissed, which, to be fair, is a standard move in lawsuits. It’s also worth noting that concerns about personal information use also led Italy to temporarily ban ChatGPT and then launch an ongoing investigation into OpenAI’s compliance with the European Union’s General Data Protection Regulation (GDPR).

Big Code is attempting to avoid that problem by open sourcing its large language models to be more transparent, plus taking steps to ensure it is “ethically sourced,” so to speak.

Why Create an Open Source Model

StarCoder: May the Source Be With You, a Cornell-published paper about the project, explained why creating the open source model was necessary. It noted that while OpenAI and other AI startups have made their LLMs available for use to the general public through a paid API, they have not shared all the details regarding the development process.

“While API access allows researchers to experiment with these models, it limits their ability to research LLM safety and alignment and inspect the models’ inner workings,” the paper noted. “Additionally, the high development costs make it nearly impossible for academic institutions to develop these models from scratch, which has created anxiety among academic researchers about whether they can meaningfully contribute to new AI breakthroughs.”

Other drawbacks of proprietary systems include the inability to adapt them to your own domain or codebase, the StarCoder team noted in a recent blog post about how developers can create their own coding assistant with the LLM.

The model isn’t just for code completion, either, said Leandro von Werra, a machine learning engineer at Hugging Face and co-lead on the project. The model isn’t just trained on raw code but also on GitHub commits and issues, which taught it a lot about chat.

“The model can also respond, for example, to GitHub issues,” he said. “One thing that was quite interesting that we found is if we just showed the model a lot of examples of conversations about coding problems, like a conversation between a human and a hypothetical assistant, the model would also be able to answer questions. So we were able to use it as a tech assistant, where you can say, ‘I have this error in Python. What should I do?’ It would try to help you, which was a little bit surprising because it was primarily trained on code, not to chat.”

Training it a bit more explicitly yields better results, he said, adding that the Big Code team have created an alpha version of a chat, called StarChat.

The Challenge in Creating Open Source LLMs

Big Code recently released its LLM, StarCoderBase, which was trained on 1 trillion tokens (“words”) in 80 languages from the dataset The Stack, a collection of source code in over 300 languages. The team then further trained StarCoderBase for 34 billion tokens on the Python subset of the dataset to create a second LLM called StarCoder.

StarCoder is not the only open source LLM available, but it is the most recent and most performant one, von Werra claimed. There’s also Salesforce’s CodeGen Mono 16B for Python and Replit’s 3B parameter model trained on 20 programming languages.

One of the barriers to creating open source LLMs is that training on the data sets requires a lot of compute power. That’s not something most open source projects can afford. In September 2022, Hugging Face and ServiceNow Research launched Big Code, an open science collaboration. Hugging Face is a large open source community that builds tools for machine learning models based on open source code and technologies. ServiceNow Research is an enterprise AI company. Both companies made their compute cluster available for the large-scale training for Big Code’s StarCoder and StarCoderBase. Since its launch, 600 more members from academic institutes and industry labs have joined the Big Code effort.

StarCoder is trained using only “permissively licensed code on GitHub,” explained von Werra. The 15.5B parameter model is trained on one trillion tokens sourced from 80+ programming languages, GitHub issues, Git commits, and Jupyter notebooks.
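
For those who want to experiment, a minimal sketch of loading the model through Hugging Face’s transformers library looks like the following. This assumes you have accepted the model’s license on the Hugging Face Hub and logged in; the full 15.5B-parameter checkpoint also needs a large GPU (the smaller bigcode/santacoder checkpoint is an easier first test):

# pip install transformers torch
from transformers import pipeline

# Swap in "bigcode/santacoder" if you don't have the hardware for the full model.
generator = pipeline("text-generation", model="bigcode/starcoder")

prompt = "def fibonacci(n):"
completion = generator(prompt, max_new_tokens=64)

print(completion[0]["generated_text"])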

The models can copy verbatim from the pretraining data, and even if it’s permissively licensed data, it will still require attribution, von Werra added. In the VSCode extension, there is a quick test to see if the code generated by the model was in the pretraining data and a full-text search to find where exactly the code came from and how it is licensed, he explained.

“If you have a 15 billion parameter model, you have 15 billion things that you can adjust and optimize during training,” von Werra said. “You need a lot of GPUs and a lot of data. That’s the main thing. Training StarCoder required roughly 500 GPUs for almost a month, 24 days of training. That’s quite expensive.”

By comparison, GPT is rumored to have a trillion parameters — but bigger is not always better, Ori Goshen, AI21 Labs co-founder and co-CEO, told The New Stack’s Senior Editor Richard MacManus in March.

LLM size “plays a factor, but it’s not the only factor,” said Goshen. “So we’ve stopped referring to the size because it can be misleading about the actual performance of the model.”

Ethically Sourced Training Data

Beyond using only GitHub material that was permissively licensed, Big Code took other steps to ensure it’s “ethically sourced.” First, it stripped out personally identifiable information (PII), such as names, email addresses and passwords that might be in the code.

“One thing that you can quite easily do with these language models is you can prompt them to generate PII if it was trained on such information,” von Werra said. “You could, for example, input to the model ‘password equals’ and then the model would generate a password that it has seen during pre-training. We created a dataset, an annotated data set where we know if there was PII and we trained a model to detect and then we applied that to the whole data set to remove this information such that you can’t easily abuse the model to create a big data set of personal information.”

Second, Big Code added an opt-out process. Developers can look up whether their code was used to train the model and then, by completing a form, opt out of being used for future model training.

StarCoder Compared to Copilot

How does it compare to Copilot? One of the first OpenAI models presumed to power Copilot was called Cushman, von Werra said. StarCoder either performed on par with or outperformed Cushman on the HumanEval benchmark, he said.

“We found that on this HumanEval benchmark, they’re either the same performance or better depending on the language — we train on many languages and we evaluate many languages — but in general, we match the performance of the first iteration of Copilot,” von Werra said. “We also outperform it on some other benchmarks that are more related to data science coding tasks; there’s a DS-1000 benchmark we are pretty good at with StarCoder.”

API Management Is a Commodity: What’s Next? https://thenewstack.io/api-management-is-a-commodity-whats-next/ Tue, 06 Jun 2023 16:59:10 +0000

NEW YORK — The recent Global API Days conference delved into current API management trends, indicating that the market continues to grow, with a clear trend toward breaking out components of current API management to create best-of-breed solutions for specific capabilities such as API security, federated marketplaces, analytics and testing.

The event featured a virtual Who’s Who in the API industry.

For instance, in a keynote, API Days co-founder and Chairman Mehdi Medjaoui said that API management has become a commodity. Medjaoui listed five new areas for the $175B API software industry (growing 30% annually) to address: regulation, verticalization, security, foundational business infrastructure, and no-code development.

Meanwhile, Gartner’s Chief of Research for Software Engineering, Mark O’Neill, said that 94% of organizations are adopting APIs and two-thirds are using some form of API management software. Gartner sees a trend toward lighter-weight gateways and unbundled functionality for security, third-party API governance, and AI bot monetization.

O’Neill also noted that many organizations have multiple gateways. For example, there may be an institutional gateway for internal APIs, a cloud gateway for cloud APIs, and an embedded gateway as part of a software package. He doesn’t see this changing any time soon, meaning the API industry will need to meet the challenge of instrumenting and governing multiple gateways.

LLMs Will Drive API Adoption in New Ways

Overall, APIs appear to be reaching new levels of adoption, as organizations rely on them for more and more daily business activities and new sources of revenue.

O’Neill said that Gartner is receiving a tremendous number of client inquiries about ChatGPT, surpassing other topics in recent months. “Generative AI, such as ChatGPT, has the potential to act as an intelligent client to consume APIs, and is particularly well suited to APIs which implement the HATEOAS (Hypermedia as the Engine of Application State) constraint of REST, in which servers respond dynamically with hyperlinks.”
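As a rough sketch of what O’Neill is describing, a HATEOAS-style response embeds hyperlinks that tell any client, human or AI, which actions are available next. The endpoint, fields and link names below are hypothetical, and the example simply uses Flask to keep it short.

```python
# Minimal HATEOAS-style endpoint: the payload carries the links a client can
# follow next. Resource and link names are hypothetical.
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/orders/<int:order_id>")
def get_order(order_id: int):
    return jsonify({
        "id": order_id,
        "status": "processing",
        "_links": {
            "self":   {"href": f"/orders/{order_id}"},
            "cancel": {"href": f"/orders/{order_id}/cancel", "method": "POST"},
            "items":  {"href": f"/orders/{order_id}/items"},
        },
    })

if __name__ == "__main__":
    app.run(port=8080)
```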

Vedran Cindrić, founder and CEO of Treblle, an API observability startup, predicted that the influence of Large Language Model (LLM) adoption alone will drive as much as a 30% spike in API adoption as organizations adopt AI tools to modernize customer FAQs and document search capabilities. They will need tooling that keeps pace with the new types of API applications.

Asanka Abeysinghe, CTO at WSO2, agrees that AI chatbots will accelerate the adoption of APIs by enabling access to LLMs and facilitating data exchange. “As a result,” he said, “External API management and quality of services, including monetization and governance, will resurface as critical requirements. API management is evolving into a standardized practice, urging organizations to shift their focus towards deriving value from APIs and leveraging them to create tangible benefits.”

APIs Increasingly Adopted in Finance

Finance is experiencing API transformations.

Jason Kobus of JPMorgan Chase and Shira Jaffe of Plaid noted two major trends for APIs in financial services: increasing global regulation, such as Open Banking APIs for sharing data, and strategic business-related APIs for embedded banking services, which make it easier to pay for an item or apply for a loan while engaged in another activity, whether that is pursuing an F1 racing hobby or choosing how to pay for an Uber trip.

Jaffe highlighted the trend toward using APIs for more personalized banking interfaces, while Kobus stressed the importance of how people use APIs to pay — they must be immediate, secure, and trustworthy.

Event-Driven APIs

Meanwhile, Gartner’s O’Neill highlighted the growing interest in event-driven APIs of all types — not just AsyncAPI-described APIs, but also webhooks, WebSockets, and subscriptions within GraphQL. Customers have expressed the requirement to manage both request/response and async APIs with the same set of tools.
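From a client’s point of view, managing both styles side by side looks roughly like the sketch below: a snapshot is fetched over request/response, then a stream of events is followed over WebSockets. The endpoints are hypothetical, and the sketch assumes the third-party requests and websockets packages.

```python
# Consuming a request/response API and an event-driven (WebSocket) API for the
# same hypothetical resource. Endpoints are made up for illustration.
import asyncio
import json

import requests
import websockets

REST_URL = "https://api.example.com/orders/42"       # hypothetical
WS_URL = "wss://api.example.com/orders/42/stream"    # hypothetical

def fetch_snapshot() -> dict:
    return requests.get(REST_URL, timeout=10).json()

async def follow_events() -> None:
    async with websockets.connect(WS_URL) as ws:
        async for raw in ws:
            event = json.loads(raw)
            print("order event:", event.get("type"))

if __name__ == "__main__":
    print("snapshot:", fetch_snapshot())
    asyncio.run(follow_events())
```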

Shawn McAllister, CTO and chief product officer at Solace, agrees with Gartner’s assessment. “Event-driven APIs are gaining popularity and are being leveraged by our clients alongside RESTful (request/response) APIs. The challenge to date has been the ability to design, catalog, discover and govern event-based APIs in the same way that organizations have long done with API Management tools. This is why Solace introduced an Event Portal, a toolset that provides those capabilities for event-driven systems.”

Alex Drag, director of product marketing at Gravitee, said that it’s not a simple task to govern and secure event-driven APIs, especially when they are used in the same way as traditional REST APIs. The challenge is to manage the lifecycle for all APIs by supporting both the OpenAPI spec and the AsyncAPI spec, applying policy consistently at the message level, and exposing both types of APIs in a developer portal.

Portals, Analytics, Third-Party APIs

As API management software unbundles the gateway and adapts to the multi-gateway world, new and emerging software vendors are looking to fill the resulting requirement gaps for API design and development, security, analytics, portals, and marketplaces.

Alex Walling, field CTO for Rapid, sees that developers need a layer of abstraction on top of their existing API gateways, such as those from WSO2, Kong, and Apigee, so that they can find APIs easily and check whether someone has already developed an API for what they need.

Moreover, Derric Gilling, CEO of Moesif, said he believes that API Gateways will become just one of the specialized pieces of the API stack developers and organizations will need to assemble to meet the growing adoption of APIs. He sees business models for APIs evolving beyond simply charging for API invocation counts, and the need for a specialized analytics solution to keep pace.

Along with the continued explosion of interest in APIs, especially as organizations use more third-party APIs, the development and testing process becomes more complex and time-consuming. “With increased API consumption, especially third party and internal APIs, developers need a better way to reduce dependency on flaky third-party, sandbox and incomplete APIs, so they can create an isolated environment and enable faster time to market,” said Uri Maoz, CEO and co-founder of WireMock.
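WireMock itself is a Java-based tool, but the isolation Maoz describes can be sketched in Python with the responses package: the flaky or incomplete third-party API is replaced with a predictable stub for the duration of a test. The endpoint and payload below are hypothetical.

```python
# Stubbing a hypothetical third-party API so tests run in isolation. This is a
# Python analogue of the idea, not WireMock itself.
import requests
import responses

QUOTE_URL = "https://api.example-partner.com/v1/quotes/123"

def fetch_quote(quote_id: str) -> dict:
    return requests.get(
        f"https://api.example-partner.com/v1/quotes/{quote_id}", timeout=5
    ).json()

@responses.activate
def test_fetch_quote_against_stub():
    responses.add(responses.GET, QUOTE_URL, json={"id": "123", "price": 42.0}, status=200)
    assert fetch_quote("123")["price"] == 42.0

test_fetch_quote_against_stub()
print("stubbed call passed")
```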

What’s Next

APIs are front and center in a number of IT trends, including LLMs, event-driven architectures, third-party API productizations, SaaS-based software integrations, and financial services regulations, to name a few. The overall market is growing in terms of the number of APIs being developed and deployed, as well as in the variety of API protocols and usage patterns (SaaS integration, internal integration, API products, etc.).

APIs are truly becoming the next lingua franca of the internet, following in the footsteps of the incredible success of the web by providing machine-to-machine communication in addition to human-to-machine communication.

The API software industry is responding with innovative best-of-breed solutions for observability and analytics, lightweight gateways, federated portals, testing, security, and development productivity tools. This will create more work for customers to evaluate and assemble comprehensive API solutions, but the result will be a software stack that not only can better handle the explosion of APIs and API types but also provide better solutions for business process automation.

Eric Newcomer is CTO at Intellyx. He has served as CTO for leading integration vendors WSO2 and IONA Technologies and as Chief Architect for major enterprises such as Citibank and Credit Suisse. He has created some of the best-known industry standards and university textbooks in use today.

Gravitee is an Intellyx customer, and Solace and WSO2 are former Intellyx customers. None of the other organizations mentioned in this article is an Intellyx customer.

The post API Management Is a Commodity: What’s Next? appeared first on The New Stack.

]]>
Meta-Semi Is an AI Algorithm That ‘Learns How to Learn Better’ https://thenewstack.io/meta-semi-is-an-ai-algorithm-that-learns-how-to-learn-better/ Tue, 06 Jun 2023 10:00:09 +0000 https://thenewstack.io/?p=22705608

As a subset of machine learning, deep learning is the form of artificial intelligence that is inspired by how human

The post Meta-Semi Is an AI Algorithm That ‘Learns How to Learn Better’ appeared first on The New Stack.

]]>

As a subset of machine learning, deep learning is the form of artificial intelligence that is inspired by how human brains work. Deep learning is what supercharges natural language processing (NLP), which underpins applications like voice search, intelligent assistants, and image classification.

However, many deep learning models to date have relied to some degree on supervised training, which requires data to be manually identified and labeled by a human before it can be used to train an AI model, a process that takes a lot of time and money.

Semi-supervised learning (SSL) — which uses both labeled and unlabeled data — could be a potential solution, but existing SSL methods can be impractical in many real-world scenarios where labeled data is scarce.

Now, a team of researchers from Tsinghua University is proposing an algorithm that would help deep learning AI models more efficiently exploit the labeled data that is available, without compromising too much on accuracy.

According to the team, their semi-supervised learning algorithm performs better than other semi-supervised learning algorithms. It would allow deep learning models to be trained effectively with only a small sample of annotated data.

“We propose a meta-learning-based SSL algorithm, named Meta-Semi, to efficiently exploit the labeled data, while it requires tuning only one additional hyper-parameter to achieve impressive performance under various conditions,” wrote the team in their paper, which was recently published in the journal CAAI Artificial Intelligence Research. “The proposed algorithm is derived from a simple motivation: the network can be trained effectively with the correctly ‘pseudo-labeled’ unannotated samples.”

In machine learning, a hyperparameter is a parameter whose value controls the learning process, while other parameters derive their value from the process of training the model.

The issue with other semi-supervised learning algorithms is that they introduce multiple tunable hyperparameters into the process, with the final performance of these algorithms being contingent on whether the hyperparameters are set at the correct values.

In real-life situations like medical image processing, hyper-spectral image classification, network traffic recognition, and document recognition, searching for the optimal hyperparameter configuration is not always possible.

In addition, the team’s use of what is called “pseudo-labeling” is what helps give the Meta-Semi algorithm an advantage. Pseudo-labeling is a technique used in semi-supervised learning where the model is initially trained with whatever labeled data is available.

The trained model then predicts labels for the unlabeled data, thus creating a set of pseudo-labeled data. The model is then re-trained again, together with the pseudo-labeled and labeled data repeatedly, so that the model gradually improves its accuracy.

In the case of the Meta-Semi model, the team’s process included filtering out samples whose pseudo-labels were erroneous or unreliable, and then training the model with the filtered dataset that contained the most reliable pseudo-labels.
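A generic pseudo-labeling loop with a fixed confidence threshold gives a feel for this filtering step, though it is a simplification: Meta-Semi’s actual contribution is a meta-learned reweighting rather than a hard cutoff. The sketch below uses scikit-learn and synthetic data purely for illustration.

```python
# Generic pseudo-labeling with confidence filtering; a simplified stand-in for
# Meta-Semi's meta-learned reweighting, shown only to illustrate the idea.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_labeled, y_labeled = X[:100], y[:100]   # small annotated sample
X_unlabeled = X[100:]                     # the rest is treated as unlabeled

model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

for _ in range(3):                        # a few pseudo-labeling rounds
    proba = model.predict_proba(X_unlabeled)
    confidence = proba.max(axis=1)
    keep = confidence > 0.95              # filter out unreliable pseudo-labels
    X_train = np.vstack([X_labeled, X_unlabeled[keep]])
    y_train = np.concatenate([y_labeled, proba[keep].argmax(axis=1)])
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("confident pseudo-labeled samples used:", int(keep.sum()))
```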

As the team explains, this filtering step is part of a “meta-learning” paradigm, where the correctly pseudo-labeled data is dynamically reweighted to have a similar distribution to the data that is labeled, thus minimizing the loss on labeled data.

“The idea of meta-learning is motivated by the goal of ‘learning to learn better’,” explained the researchers. “Meta-learning algorithms usually define a meta-optimization problem to extract information from the learning process.”

With this approach, the team’s Meta-Semi algorithm was able to consistently perform better than other state-of-the-art semi-supervised algorithms, notably even with less labeled data and a larger number of classes.

In particular, the Meta-Semi algorithm excelled on the challenging tasks posed by the image datasets CIFAR-10, STL-10, and SVHN, which are frequently used to train AI models.

The team noted that Meta-Semi “converges to the stationary point of the loss function on labeled data under mild conditions,” and requires much less work to tune hyperparameters while attaining state-of-the-art performance on the aforementioned datasets.

The team is now working to refine Meta-Semi to produce another more effective and powerful version of the algorithm, to minimize the required amount of labeled data, training time and tuning of hyperparameters.

The post Meta-Semi Is an AI Algorithm That ‘Learns How to Learn Better’ appeared first on The New Stack.

]]>
Donald Knuth Asked ChatGPT 20 Questions. What Did We Learn? https://thenewstack.io/donald-knuth-asked-chatgpt-20-questions-what-did-we-learn/ Sun, 04 Jun 2023 13:00:22 +0000 https://thenewstack.io/?p=22709390

It seems like everyone’s playing with ChatGPT — including mathematician and long-time programming expert Donald Knuth. Inspired by a conversation

The post Donald Knuth Asked ChatGPT 20 Questions. What Did We Learn? appeared first on The New Stack.

]]>

It seems like everyone’s playing with ChatGPT — including mathematician and long-time programming expert Donald Knuth. Inspired by a conversation with Stephen Wolfram, Knuth conducted “my own little experiment” on April 7 — and recently posted the results online.

Knuth tested what appeared to be ChatGPT-3.5 with a suite of 20 prompts — including some trick questions, like “Who wrote Beethoven’s 10th Symphony?” and “Will the NASDAQ rise on Saturday?”

“Of course I didn’t really want to know any of these answers,” Knuth wrote, adding that instead he’d “wanted to see the form of the answers…”

Knuth’s conclusion? They were “extremely impressive responses, sometimes astonishingly so…” Specifically, Knuth praised “the quality of the wordsmithing. It’s way better than 99% of copy that people actually write.”

But Knuth did also note “surprising lapses… as typical of any large system” — which kicked off a vigorous online discussion. Soon other technologists were testing the same questions on more advanced chatbots — and the whole episode started up a discussion about how performance should ultimately be measured.

Knuth concluded his experiment by telling Stephen Wolfram, “I totally understand why you and others have been paying attention to it.” But he left it to others to determine exactly what it all means. Maybe the discussion that followed is just one more indication of a larger essential question for our moment in time…

Just how good are our current AI systems?

Wrongs and Rights

One question had a very long history. In 1968, Donald Knuth’s own father had posed a question to the early chatbot Eliza: “Where and when will the sun be directly overhead in Japan on July 4?” Knuth remembered that his father had been disappointed when Eliza had only answered: “Why do you ask?”

More than half a century later, 85-year-old Donald Knuth now posed the same question to a modern AI-powered chatbot in 2023. It responded with four eloquent paragraphs, and concluded by identifying a precise location. “Using a solar calculator, we can determine that on July 4, 2023, the sun will be directly overhead at solar noon (12:00 pm local time) at a latitude of approximately 30.3 degrees north and a longitude of approximately 130.9 degrees east. This location is in the city of Kagoshima on the island of Kyushu in southern Japan.”

Knuth is later told that this answer is incorrect.

“It’s amazing how the confident tone lends credibility to all of that made-up nonsense”

–Donald Knuth

ChatGPT-3.5 also stumbled on a straightforward question like “How many chapters are in The Haj by Leon Uris?” — giving an incorrect number and also hallucinating the existence of an epilogue that the book actually doesn’t have. “It’s amazing how the confident tone lends credibility to all of that made-up nonsense,” Knuth writes. This leads Knuth to the same concern others have been expressing: that it’s “almost impossible for anybody without knowledge of the book to believe that those ‘facts’ aren’t authoritative and well researched.”

Knuth called the whole experience “interesting indeed,” while expressing surprise that no science fiction novelist ever envisioned a pre-Singularity world in which people interacted with an AI that wasn’t all-knowing, but instead generated plausible but inaccurate results.

Better With Bard?

Knuth expressed similar concerns about ChatGPT’s answer to the question, “What did Winston Churchill think of Alan Turing?” Knuth writes that “I know of no evidence to support any claim that Churchill specifically liked or disliked or even remembered Turing.” Yet ChatGPT-3.5 confidently invented a glowing testimonial from Churchill.

And the same thing happened when the experiment was repeated with Bard by Billy Lo, an app developer at Evergreen Labs: Bard delivered another glowing testimonial from Churchill about Turing that was apparently magically hallucinated.

But bioinformatics engineer Jessime Kirk discovered that ChatGPT-4 seemed to perform better, supplying instead the crucial missing context: “Turing’s work remained a state secret for many years after the war, and his crucial role only became widely recognized much later.”

And this pattern seemed to repeat on other tests. Neither Bard nor ChatGPT-3.5 could write a sentence containing only five-letter words. (For some reason, Bard submitted as its attempt: “They quickly grabbed sticks.”) But ChatGPT-4 handled it easily. (“Every night Henry reads about magic spells.”)
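The constraint itself is easy to check mechanically, which is part of what makes it a nice probe. The few lines of Python below are just a convenience for spot-checking answers, not anything from Knuth’s experiment.

```python
# Checks whether a sentence contains only five-letter words, ignoring punctuation.
import re

def only_five_letter_words(sentence: str) -> bool:
    words = re.findall(r"[A-Za-z]+", sentence)
    return bool(words) and all(len(w) == 5 for w in words)

print(only_five_letter_words("They quickly grabbed sticks."))        # False
print(only_five_letter_words("Those birds float above green hills")) # True
```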

Both ChatGPT-3.5 and Bard failed to recognize that stock markets like NASDAQ are closed on Saturday — but Knuth’s question didn’t stump ChatGPT-4. “Stock exchanges like the NASDAQ typically do not operate on weekends (Saturday and Sunday),” it replied, “with the exception of some limited after-hours trading, so there would generally not be regular trading activity on a Saturday for the NASDAQ.”

A new problem surfaced, though, when ChatGPT-4 tried to supply the correct number of chapters in Leon Uris’s novel The Haj. In one run it said the novel “doesn’t have a specific number of chapters.” But it also gave another “equally confident” response, Kirk says — the equally wrong answer “8 chapters in total.”

Even with that, “As far as I can tell, there isn’t a single question where ChatGPT-4 performs worse than 3.5,” Kirk said in an email interview. And this isn’t entirely an academic question for Kirk. He’s founding a precision medicine company named Lodestar that hopes to use AI to discover new treatments for pediatric rare diseases.

“So, that’s why I’m interested in continuing to experiment with and understand what relationship these models have to reality.”

The Power to Infuse

Knuth did also see some surprisingly good responses. Knuth asked ChatGPT-3.5 to compose a sonnet that is also a haiku — and it correctly recognized that this wouldn’t be possible. (“A sonnet is a 14-line poem with a specific rhyme scheme, while a haiku is a three-line poem with a syllable count of 5-7-5…”)

But then it went on to “combine the two forms into a single poem,” stunning Knuth with its “captivating progression of ideas and emotion.” The 12 lines were written in near-perfect iambic pentameter — with exactly 10 syllables per line, and an accent on every other syllable. “Most of the lines scan beautifully!” Knuth wrote, also cheering its “brilliant” use of a new hyphenated word coined just for the poem: “sonnet-haiku”

In fourteen lines, my thoughts do flow and pour
Each syllable I choose with utmost care
The rhythm sings as if a troubadour
But in just three lines, nature does declare

A haiku’s power, simplicity and grace
It captures beauty beyond compare
Thus, let me strive for both, in this small space
A sonnet with the haiku’s soul imbued

A marriage of forms, a sweet embrace
So here it is, in fullness and in mood
A sonnet-haiku, a fusion of the best
Two forms that sing, with power to infuse.

What Comes Next?

Knuth added in an addendum that while AI research is important for some, it’s “emphatically not for me.” But the whole experience caught the imagination of Hawaii-based Zachary Kim, the CTO of Float, a Y Combinator-backed, supply-chain financing startup.

“I think if you’d ask most people who have used ChatGPT they’d score it as an expert in many (all?) areas,” Kim said. “But then you plop Knuth, the expert of experts, in front of it and he basically says ‘this has been a fun distraction, I’m going back to the real work and you should too.'”

But while not an expert on generative AI, Kim says he’s been “diving in trying to understand how to solve problems with these new technologies.” So he’s created a website using Knuth’s 20-question suite as “a benchmark that can be applied to newer AI models” — and plans to keep updating the results “to see how answers evolve over time.” Kim also hopes to expand the tests to more AI chatbots over time…

In an interesting wrinkle, Kim plans to evaluate those responses using ChatGPT-4, which assigns a grade of either “PASS” or “FAIL” and provides an explanation.

Kim sees these evaluations as a kind of second test: whether ChatGPT-4 itself delivers an accurate assessment. Kim acknowledges that right now ChatGPT-4 seems to struggle in this second test. “Sometimes I’ll run the same question multiple times and it’ll flip-flop between PASS and FAIL.” So Kim is now augmenting these assessments with another set from a human reviewer.
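The grading step Kim describes is essentially an LLM-as-judge call. A rough sketch of the pattern, using the openai Python package’s chat interface as it existed in mid-2023, might look like the following; the rubric wording and model choice are assumptions rather than Kim’s actual setup.

```python
# Sketch of "LLM as grader": ask GPT-4 to mark an answer PASS or FAIL.
# Uses the pre-1.0 openai package interface; prompt wording is an assumption.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def grade_answer(question: str, answer: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Grade the answer to the question as PASS or FAIL, "
                        "then briefly explain why."},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"]

print(grade_answer("Will the NASDAQ rise on Saturday?",
                   "The NASDAQ is closed on Saturdays, so it will not rise."))
```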

There’s another possible issue: What if AI training sets someday incorporate the benchmarking site itself, inadvertently optimizing for the very questions meant to test performance? Kim is already exploring possible fixes — and is looking forward to the day when there’s an agreed-upon model for exactly how to test the performance of AI chatbots. And not just a model that satisfies a computer programming guru like Donald Knuth. “Imagine you had some model that had the stamp of approval from experts in every field; doctors, lawyers, musicians, directors, chefs, rocket scientists, etc.”

OpenAI has said it made ChatGPT available “to get users’ feedback and learn about its strengths and weaknesses” — and this seems to be happening spontaneously. Even Knuth himself calls playing with ChatGPT “one of today’s popular recreations,” and in the end, Kim’s site is one part of a larger wave of enthusiasm for ChatGPT experiments.

There’s something inspiring about all the testing and tinkering that’s happening at the grass-roots level — though maybe it’s work that’s being passed along to the coming generations.

As Donald Knuth told Stephen Wolfram, “I myself shall certainly continue to leave such research to others, and to devote my time to developing concepts that are authentic and trustworthy.

“And I hope you do the same.”



The post Donald Knuth Asked ChatGPT 20 Questions. What Did We Learn? appeared first on The New Stack.

]]>
Maker Builds a ChatGPT DOS Client for a 1984 Computer https://thenewstack.io/maker-builds-a-chatgpt-dos-client-for-a-1984-computer/ Wed, 31 May 2023 18:04:11 +0000 https://thenewstack.io/?p=22709179

Yeo Kheng Meng is a Singapore-based embedded security researcher — and a retrocomputing enthusiast. In 2019 Meng coded up his

The post Maker Builds a ChatGPT DOS Client for a 1984 Computer appeared first on The New Stack.

]]>

Yeo Kheng Meng is a Singapore-based embedded security researcher — and a retrocomputing enthusiast. In 2019 Meng coded up his own Slack client for Windows 3.1, and ever since then, he’s wondered if it would be possible to target an even older platform. So Meng’s latest project was coding up a ChatGPT client that ran on that old familiar command-line interface from the 1980s: MS-DOS.

And he’s since adapted his DOS client to also interface with Hugging Face. Both ChatGPT and Hugging Face are generative AI-based services that have gained immense popularity answering people’s questions.

Last month Meng shared his adventure at a networking event for junior developers, inspiring the audience with his own example of the maker spirit. “I thought, since other people can do this — why not…?”

“I created a DOS client. I created an app, I open-sourced it, I wrote a blog post on it. And I was honestly surprised by the international attention I got on this,” he added with a laugh.

The whole experiment demonstrated just how much technology has changed over the last few decades — but also just how many tools we now have for overcoming interface obstacles. So what happens when old meets new? Is it really possible to interface with a modern API using a machine built in 1984?

And what would the AI-powered chatbots think about Meng’s attempts to connect to them?

Revisiting 1984

Meng began by describing MS-DOS as an operating system “that’s older than many of you here.” And he underscored the challenge by noting the hardware he was using was a 39-year-old personal computer from 1984 — an IBM Portable Personal Computer 5155.

Yeo Kheng Meng talk at Engineers SG on building a ChatGPT for DOS client on an IBM 5155

It was built so long ago that it weighed a hefty thirty pounds, Yeo told his audience. “That’s the reason I’m not bringing it here! Don’t let this name fool you — it’s not portable at all! It’s more of a ‘luggable’ PC!”

Built during the Reagan administration — about the same time as Apple’s very first Macintosh computer — Meng’s PC had just 640KB of RAM. This was an era when computers shipped with insertable floppy disks for storing data and programs, rather than with a hard drive. Meng passed around a 3 1/2-inch floppy disk for the audience, “in case some of you haven’t seen it before.” (Meng joked that it was “the O.G. ‘Save’ button.”)

It’s so old, it’s not equipped with wireless capabilities — or even a network port — which presented Meng with his first challenge. To get internet access for his demonstration, Yeo had to slide in a card with a dongle-based Ethernet adapter (which he then connected to his iPad). “Adapter after adapter, to get to the internet.”

Because DOS filenames are limited to eight characters, Yeo named the application dosCHGPT.exe.

“Press ESC to quit,” the app added helpfully…

An AI’s API

But even with an internet connection, solving one challenge just leads to another. OpenAI communicates over secure HTTPS connections — which turns out to be a problem for DOS, since HTTPS wasn’t invented until 1994.

Fortunately, Yeo had already built an upgrading HTTP-to-HTTPS proxy for his Windows 3.1 Slack app, and he was able to repurpose that here.

“It doesn’t modify the payload in any way,” Meng emphasized. “Whatever it receives from DOS, it gives to OpenAI’s servers. Whatever it gets from OpenAI’s servers, it just gives back to the DOS PC.”
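Meng’s proxy is his own implementation, but the idea is simple enough to sketch: a plain-HTTP listener on the local network re-issues each request to the HTTPS endpoint and relays the response back untouched. The Python sketch below, with an assumed listening port and OpenAI upstream, is only meant to illustrate that flow.

```python
# Minimal HTTP-to-HTTPS "upgrading" proxy sketch (not Meng's actual code): the
# DOS client talks plain HTTP to this box, which forwards to the HTTPS API.
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

UPSTREAM = "https://api.openai.com"

class UpgradeProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        upstream = requests.post(
            UPSTREAM + self.path,
            data=body,
            headers={
                "Content-Type": self.headers.get("Content-Type", "application/json"),
                "Authorization": self.headers.get("Authorization", ""),
            },
            timeout=60,
        )
        self.send_response(upstream.status_code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(upstream.content)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), UpgradeProxy).serve_forever()
```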

Meng then sends the program’s output to the networking library mTCP, which implements crucial functionality like “send” and “receive”. (The network card uses a manufacturer-supplied API that can connect to ChatGPT’s own API.)

Yeo even got an email from the creator of the mTCP library — Google site reliability engineer Michael Brutman — offering a “well done!” for the project, and saying he would’ve made the same decision to strip out the TLS-security from the incoming HTTPS connection. “TLS is just not going to happen on DOS,” Brutman’s email agreed. “which is something I have to keep explaining to people every time they ask why there is no SSH client yet.”

It’s all a testament to the power of APIs — and how their flexibility leads to uses that no one ever imagined. It was just in March that OpenAI announced “system-wide optimizations” which reduced the cost of API calls. Snap used its API to build a chatbot. Instacart uses the API to suggest meals and recipes. It powers Shopify’s new AI-powered shopping assistant.

And Meng used it to talk to his 1984 computer’s home-brewed DOS client…

Demonstrating for Developers

In front of his audience at JuniorDev SG, Yeo typed into ChatGPT: “What is JuniorDev sg?” The cursor blinked — and there was a dramatic pause. “Even on the website, it takes a while,” Yeo said — when suddenly a paragraph of text appeared.

“JuniorDev SG is a community of developers in Singapore who are passionate about learning and sharing knowledge. The community is open to all developers, but it is particularly focused on those who are new to the industry…”

There were some murmurs in the audience as the paragraph came up. Yeo then asked who founded it — then laughed at the confident but inaccurate answer that came back. “ChatGPT is not always correct… So don’t trust ChatGPT for everything.”

In fact, once the system was all finished, Yeo asked ChatGPT if it could tell him how to build a DOS interface for ChatGPT. And it seemed to warn him off:

“Keep in mind that creating a DOS ChatGPT client may be challenging, as DOS is an outdated operating system and may not have the necessary tools and resources to develop modern applications.”

Yeo says to his audience, “Okay, so, challenge accepted…”

Meng’s made several updates since then. MS-DOS predates the UTF-8 character encoding (used by JSON), so Meng also coded up a massive lookup table for every unsupported character himself.
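Python’s standard library still ships the old DOS code pages, which makes it easy to see the mapping problem Meng solved by hand. The snippet below is only an illustration of the idea, not code from his client.

```python
# Characters with no CP437 equivalent must be substituted somehow; Python's
# errors="replace" degrades them to '?', which is the decision a hand-built
# lookup table gets to make per character instead.
text = "Café, naïve “quotes” and ellipsis…"
dos_bytes = text.encode("cp437", errors="replace")
print(dos_bytes.decode("cp437"))   # Café, naïve ?quotes? and ellipsis?

# Greek text needs a different code page entirely, which is why the client
# grew a separate flag for CP737.
print("Καλημέρα".encode("cp737").decode("cp737"))
```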

In response to a question, Meng says, “I tried ChatGPT! It doesn’t understand what I’m trying to do!” But it was able to suggest UTF-8 characters for testing…

 Yeo Kheng Meng talk at Engineers SG - ChatGPT tests getting DOS to display UTF-8 characters

Meng even added a flag to his app — command-line argument -cp737 — which switches output to the code page 737 character set for users who speak Greek.

Some people had suggested he add a text-to-speech function, so it would resemble the super-smart Defense Department computer from the classic 1983 movie WarGames.

So on May 11, Meng did just that.

Back to the Future

But more importantly, he learned a lot along the way — about UTF-8, network sockets, and console handling. And Meng says he’s genuinely surprised by how much of it is still applicable to computers of today.

“Computers have improved a lot in decades but a lot of concepts have stayed the same,” Meng posted on LinkedIn, saying that the whole experience of building a ChatGPT client that can run on DOS “really gives you the appreciation of the well-developed tools we have today.”

And because he’d open-sourced his code, Meng ultimately got to discover just how many people used the software he wrote for ancient retrocomputing machines. “You’d imagine the market for vintage computers isn’t that big,” Meng told his audience at the end of his talk — “but surprisingly, a lot of people use it. And they file GitHub issues that tell me the bugs that my program has!”

Meng recently added an important update to his blog post. Instead of interfacing solely with ChatGPT, “This client now supports Hugging Face too.”

We’d wanted to interview Meng — but Hugging Face seems to have beaten us to it, in a lovely conversation that Meng shared on YouTube. And so the experiment ended with Meng sitting at his 39-year-old computer, chattering away with 2023’s latest AI chatbot…

Hugging Face: Do you have any hobbies you like to do?

Meng: Yes. One of my hobbies is retrocomputing.

Hugging Face: Oh, that’s really cool. I’ve always wanted to learn how to do that. How long have you been doing that?

Meng: For many years now. One of my machines is an IBM5155 which I’m using to chat with you… Guess how old it is?

Hugging Face: I’m not sure, but it must be pretty old. It’s been around since 1958.

Meng: Nah, it was first sold in 1984.

Hugging Face: Oh, I didn’t realize it was that old. I thought it was older than that…

The post Maker Builds a ChatGPT DOS Client for a 1984 Computer appeared first on The New Stack.

]]>
Google’s Generative AI Stack: An In-Depth Analysis https://thenewstack.io/googles-generative-ai-stack-an-in-depth-analysis/ Wed, 31 May 2023 14:59:33 +0000 https://thenewstack.io/?p=22709400

At the recently concluded Google I/O 2023 conference, the search giant unveiled its generative AI strategy. From Bard to Project

The post Google’s Generative AI Stack: An In-Depth Analysis appeared first on The New Stack.

]]>

At the recently concluded Google I/O 2023 conference, the search giant unveiled its generative AI strategy. From Bard to Project Tailwind, generative AI dominated the conference. Google’s long-term investment in AI-related research led to the creation of powerful foundation models, which have become the core of its new products and services.

This article takes a closer look at Google’s generative AI strategy.

Foundation Models – The Secret Sauce

Based on self-supervised learning techniques, foundation models are trained on large volumes of publicly available data. These foundation models can be adapted to various use cases and scenarios without the need to retrain them from scratch.

Four foundation models power Google’s generative AI stack:

PaLM 2: This is a large language model (LLM) trained on 100+ languages that can perform text processing, sentiment analysis, classification, and more. According to Google, the model can understand, generate, and translate nuanced text across various languages, including idioms, poems, and riddles. It can demonstrate logic and reasoning and even solve complex mathematical equations.

Codey: A foundation model that can be embedded in a software development kit (SDK) or application to enhance developer productivity. It improves the efficiency of developers through code generation and code completion. Codey has been optimized and fine-tuned based on high-quality and permissively licensed code from external sources to enhance its performance.

Thomas Kurian, Google Cloud CEO, announced Codey.

Imagen: This text-to-image foundation model lets organizations generate and customize studio-grade images. Developers can use this model to create or edit images.

An image generated through Imagen.

Chirp: A foundation model trained to perform speech-to-text conversion. It can be used with various languages to generate captions and build voice assistance capability.

Bard – The ChatGPT Competitor from Google

Google Bard is a chatbot based on the PaLM 2 LLM. The current version of Bard is available in English, Japanese, and Korean, and can be accessed through the Google Bard website or through Google Assistant.

Bard, the new Chatbot from Google powered by PaLM 2

Soon, Bard’s responses will include images along with text, making the interaction more useful and richer for users. It will also become possible to use images as input prompts to make Bard write captions or perform image search.

Bard can also respond to code-related queries. It lets developers export the response to Google Colab or Replit code environments. Apart from code generation and explanations, Bard will also include citations with a link to the original source.

Duet AI – The AI-Powered Sidekick for Developers and Consumers

The foundation models are fine-tuned to assist developers and consumers in their day-to-day tasks. Duet AI is the brand that Google uses to identify the generative AI experiences infused into various products.

Duet AI for DevOps

When it comes to developers and operators, Google announced Duet AI-based services that are embedded into development environments and its cloud services.

Code Assistance: Google is going to ship plug-ins for popular IDEs such as VS Code and JetBrains. This helps developers generate code automatically based on comments and other forms of instructions. This capability competes with GitHub Copilot and other code completion products such as Amazon CodeWhisperer and Tabnine.

Cloud Workstations: Cloud Workstations are pre-configured development environments in the cloud that come with runtimes, frameworks, and IDEs approved by enterprises. Duet AI enables Cloud Workstations with code/boilerplate generation, code completion, and code explanation. It can even scan source code for security vulnerabilities and suggest appropriate fixes.

Cloud Console: Google is going to embed a chat window within the Google Cloud Console user interface through which operators can interact with the chatbot. Like Bard and ChatGPT, this chatbot can assist operators with steps needed to perform a specific task or a function related to managing the cloud.

Cloud Shell: Like the chatbot for Cloud Console, the Cloud Shell, which is a terminal window embedded within the browser, will have a chatbot. It can generate commands and scripts to automate a variety of DevOps and CloudOps-related tasks.

Duet AI for End Users

It’s a no-brainer that Google would extend the power of AI to its consumer products, such as Google Workspace. Soon, Docs, Sheets, Slides, and Meet will get a chatbot to assist users in generating, transcribing, and summarizing content. This integration between the foundation models and Google Workspace helps make end users more creative and productive.

Google also demonstrated Project Tailwind, an AI-first notebook based on the personal content stored in Google Drive, Workspace, and other assets. Currently, this experimental service is available only in the USA.

Duet AI for Low Code and No Code

AppSheet, Google’s low-code/no-code platform, is going to be integrated with Duet AI. With this, users can create intelligent business applications, connect their data, and build workflows into Google Workspace via natural language powered by PaLM 2.

Google also announced MakerSuite, a tool that lets developers start prototyping quickly and easily. They will be able to iterate on prompts, augment datasets with synthetic data, and easily tune custom models. When they are ready to move to code, MakerSuite will let them export the prompt as code to languages and frameworks, such as Python and Node.js.

Search Generative Experience (SGE)

Google Search is going to be fundamentally changed through the infusion of generative AI. Search will become contextual and efficient by analyzing the semantic meaning of the query. Google is combining generative AI with the Shopping Graph, which has over 35 billion product listings, to deliver an immersive experience to users.

The combination of traditional search and generative AI will transform how users experience the web.

Vertex AI and PaLM 2 API

Google is going to expose the PaLM 2 LLM through a dedicated API endpoint. This is not available yet, but when it is ready, frontend and mobile developers will be able to easily consume the API to build generative AI-based apps.

Vertex AI, Google’s ML PaaS in the cloud, is ready for generative AI. It has an updated Model Garden, which is a repository of foundation models such as PaLM 2, Imagen, and Chirp. Google is also bringing third-party foundation models, such as Stable Diffusion, to its cloud platform.

Vertex AI – Model Garden

Generative AI Studio within Vertex AI acts as a playground to explore the API by tweaking various parameters and prompts. Developers can start with the Generative AI Studio before invoking the API or using the SDK.
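For developers who prefer to start from code, a call to the PaLM 2 text model through the Vertex AI Python SDK looked roughly like the sketch below at the time of writing. The module path, model name and parameters reflect the mid-2023 preview SDK and should be treated as assumptions that may differ in later versions.

```python
# Sketch of calling the PaLM 2 text model via the Vertex AI Python SDK preview
# (module path and "text-bison@001" are mid-2023 assumptions).
import vertexai
from vertexai.preview.language_models import TextGenerationModel

vertexai.init(project="your-gcp-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Summarize the benefits of foundation models in two sentences.",
    temperature=0.2,
    max_output_tokens=128,
)
print(response.text)
```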

Vertex AI – Generative AI Studio

Finally, Gen App Builder is a new service that lets traditional developers unfamiliar with ML or AI build generative AI applications. Developers can use a combination of text and images to create applications that can search for information in documents, photos, and video content. This service enables them to build engaging customer interactions.

When we analyze Google’s generative AI strategy, it becomes clear that PaLM 2 is the foundation powering almost every service it announced.

PaLM 2 is helping Google compete with OpenAI and Microsoft. It is doing for Google what GPT has done for OpenAI and Microsoft.

The post Google’s Generative AI Stack: An In-Depth Analysis appeared first on The New Stack.

]]>
MongoDB vs. PostgreSQL vs. ScyllaDB: Tractian’s Experience https://thenewstack.io/mongodb-vs-postgresql-vs-scylladb-tractians-experience/ Wed, 31 May 2023 13:10:04 +0000 https://thenewstack.io/?p=22709469

Tractian is a machine intelligence company that provides industrial monitoring systems. Last year, we faced the challenge of upgrading our

The post MongoDB vs. PostgreSQL vs. ScyllaDB: Tractian’s Experience appeared first on The New Stack.

]]>

Tractian is a machine intelligence company that provides industrial monitoring systems. Last year, we faced the challenge of upgrading our real-time machine learning (ML) environment and analytical dashboards to support an aggressive increase in our data throughput, as we managed to expand our customer base and data volume by 10 times.

We recognized that to stay ahead in the fast-paced world of real-time machine learning, we needed a data infrastructure that was flexible, scalable and highly performant. We believed that ScyllaDB would provide us with the capabilities we lacked, enabling us to push our product and algorithms to the next level.

But you are probably wondering why ScyllaDB was the best fit. We’d like to show you how we transformed our engineering process to focus on improving our product’s performance. We’ll cover why we decided to use ScyllaDB, the positive outcomes we’ve seen as a result and the obstacles we encountered during the transition.

How We Compared NoSQL Databases

When talking about databases, many options come to mind. However, we started by deciding to focus on those with the largest communities and applications. This left three direct options: two market giants and a newcomer that has been surprising competitors. We looked at four characteristics of those databases — data model, query language, sharding and replication — and used these characteristics as decision criteria for our next steps.

First off, let’s give you a deeper understanding of the three databases using the defined criteria:

MongoDB NoSQL

  • Data model: MongoDB uses a document-oriented data model where data is stored in BSON (Binary JSON) format. Documents in a collection can have different fields and structures, providing a high degree of flexibility. The document-oriented model enables basically any data modeling or relationship modeling.
  • Query language: MongoDB uses a custom query language called MongoDB Query Language (MQL), which is inspired by SQL but with some differences to match the document-oriented data model. MQL supports a variety of query operations, including filtering, grouping and aggregation.
  • Sharding: MongoDB supports sharding, which is the process of dividing a large database into smaller parts and distributing the parts across multiple servers. Sharding is performed at the collection level, allowing for fine-grained control over data placement. MongoDB uses a config server to store metadata about the cluster, including information about the shard key and shard distribution.
  • Replication: MongoDB provides automatic replication, allowing for data to be automatically synchronized between multiple servers for high availability and disaster recovery. Replication is performed using a replica set, where one server is designated as the primary member and the others as secondary members. Secondary members can take over as the primary member in case of a failure, providing automatic fail recovery.

ScyllaDB NoSQL

  • Data model: ScyllaDB uses a wide column-family data model, which is similar to Apache Cassandra. Data is organized into columns and rows, with each column having its own value. This model is designed to handle large amounts of data with high write and read performance.
  • Query language: ScyllaDB uses the Cassandra Query Language (CQL), which is similar to SQL but with some differences to match the wide column-family data model. CQL supports a variety of query operations, including filtering, grouping and aggregation.
  • Sharding: ScyllaDB uses sharding, which is the process of dividing a large database into smaller parts and distributing the parts across multiple nodes (and down to individual cores). The sharding is performed automatically, allowing for seamless scaling as the data grows. ScyllaDB uses a consistent hashing algorithm to distribute data across the nodes (and cores), ensuring an even distribution of data and load balancing.
  • Replication: ScyllaDB provides automatic replication, allowing for data to be automatically synchronized between multiple nodes for high availability and disaster recovery. Replication is performed using a replicated database cluster, where each node has a copy of the data. The replication factor can be configured, allowing for control over the number of copies of the data stored in the cluster.

PostgreSQL

  • Data model: PostgreSQL uses a relational data model, which organizes data into tables with rows and columns. The relational model provides strong support for data consistency and integrity through constraints and transactions.
  • Query language: PostgreSQL uses structured query language (SQL), which is the standard language for interacting with relational databases. SQL supports a wide range of query operations, including filtering, grouping and aggregation.
  • Sharding: PostgreSQL does not natively support sharding, but it can be achieved through extensions and third-party tools. Sharding in PostgreSQL can be performed at the database, table or even row level, allowing for fine-grained control over data placement.
  • Replication: PostgreSQL provides synchronous and asynchronous replication, allowing data to be synchronized between multiple servers for high availability and disaster recovery. Replication can be performed using a variety of methods, including streaming replication, logical replication and file-based replication.

What Were Our Conclusions of the Benchmark?

In terms of performance, ScyllaDB is optimized for high performance and low latency, using a shared-nothing architecture and multithreading to provide high throughput and low latencies.

MongoDB is optimized for ease of use and flexibility, offering a more accessible and developer-friendly experience and has a huge community to help with future issues.

PostgreSQL, on the other hand, is optimized for data integrity and consistency, with a strong emphasis on transactional consistency and ACID (atomicity, consistency, isolation, durability) compliance. It is a popular choice for applications that require strong data reliability and security. It also supports various data types and advanced features such as stored procedures, triggers and views.

When choosing between PostgreSQL, MongoDB and ScyllaDB, it is essential to consider your specific use case and requirements. If you need a powerful and reliable relational database with advanced data management features, then PostgreSQL may be the better choice. However, if you need a flexible and easy-to-use NoSQL database with a large ecosystem, then MongoDB may be the better choice.

But we were looking for something really specific: a highly scalable and high-performance NoSQL database. The answer was simple: ScyllaDB is a better fit for our use case.

MongoDB vs. ScyllaDB vs. PostgreSQL: Comparing Performance

After the research process, our team was skeptical about using just written information to make a decision that would shape the future of our product. We started digging to be sure about our decision in practical terms.

First, we built an environment to replicate our data acquisition pipeline, but we did it aggressively. We created a script to simulate a data flow bigger than the current one. At the time, our throughput was around 16,000 operations per second, and we tested the database with 160,000 operations per second (so basically 10x).

To be sure, we also tested the write and read response times for different formats and data structures; some were similar to the ones we were already using at the time.
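The sketch below gives a feel for what such a test can look like, though it is not Tractian’s actual harness: it measures per-write latency against ScyllaDB with the Python cassandra driver, which speaks CQL, and reports a p90. The node address, keyspace and table schema are hypothetical.

```python
# Rough write-latency probe against a ScyllaDB/Cassandra-compatible cluster.
# Node address, keyspace and table are hypothetical; not Tractian's harness.
import time
import uuid
from datetime import datetime
from statistics import quantiles

from cassandra.cluster import Cluster

session = Cluster(["10.0.0.10"]).connect("monitoring")
insert = session.prepare(
    "INSERT INTO sensor_readings (sensor_id, ts, value) VALUES (?, ?, ?)"
)

latencies = []
for i in range(10_000):
    start = time.perf_counter()
    session.execute(insert, (uuid.uuid4(), datetime.utcnow(), float(i)))
    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

p90 = quantiles(latencies, n=10)[-1]   # last cut point of the 10-quantiles = p90
print(f"p90 write latency: {p90:.2f} ms")
```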

You can see our results below with the new optimal configuration using ScyllaDB and the configuration using what we had with MongoDB (our old setup) applying the tests mentioned above:

MongoDB vs. ScyllaDB P90 Latency (Lower Is Better)

MongoDB vs. ScyllaDB Request Rate/Throughput (Higher Is Better)

The results were overwhelming. With similar infrastructure costs, we achieved much better latency and capacity; the decision was clear and validated. We had a massive database migration ahead of us.

Migrating from MongoDB to ScyllaDB NoSQL

As soon as we decided to start the implementation, we faced real-world difficulties. Some things are important to mention.

In this migration, we added new information and formats, which affected all production services that consume this data directly or indirectly. They would have to be refactored by adding adapters in the pipeline or recreating part of the processing and manipulation logic.

During the migration journey, both services and databases had to be duplicated, since it is not possible to use an outage event to swap between old and new versions to validate our pipeline. It’s part of the issues that you have to deal with in critical real-time systems: An outage is never permitted, even if you are fixing or updating the system.

The reconstruction process also had to include the data science models, so that they could take advantage of the new format, increasing accuracy and computational performance.

Given these guidelines, we created two groups. One was responsible for administering and maintaining the old database and architecture. The other group performed a massive reprocessing of our data lake and refactored the models and services to handle the new architecture.

The complete process, from designing the structure to the final deployment and swap of the production environment, took six months. During this period, adjustments and significant corrections were necessary. You never know what lessons you’ll learn along the way.

NoSQL Migration Challenges

ScyllaDB can achieve this kind of performance because it is designed to take advantage of high-end hardware and very specific data modeling. The final results were astonishing, but it took some time to achieve them. Hardware has a significant impact on performance. ScyllaDB is optimized for modern multicore processors and uses all available CPU cores to process data. It uses hardware acceleration technologies such as AVX2 (Advanced Vector Extensions 2) and AES-NI (Advanced Encryption Standard New Instructions); it also depends on the type and speed of storage devices, including solid-state disks and NVMe (nonvolatile memory express) drives.

In our early testing, we messed up some hardware configurations, leading to performance degradation. When those problems were fixed, we stumbled upon another problem: data modeling.

ScyllaDB uses the Cassandra data model, which heavily dictates the performance of your queries. If you make incorrect assumptions about the data structures, queries or the data volume, as we did at the beginning, the performance will suffer.

In practice, the first proposed data format ended up exceeding the maximum size recommended for a ScyllaDB partition in some cases, which made the database perform poorly.

Our main difficulty was understanding how to translate our old data modeling to one that would perform on ScyllaDB. We had to restructure the data into multiple tables and partitions, sometimes duplicating data to achieve better performance.
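The following sketch shows the kind of restructuring involved, though the schema is illustrative rather than Tractian’s: a composite partition key bucketed by day keeps any single partition bounded, and a second table duplicates data to serve a different query pattern.

```python
# Illustrative CQL schema (not Tractian's): day-bucketed partitions bound
# partition size, and a second table duplicates data for another access path.
from cassandra.cluster import Cluster

session = Cluster(["10.0.0.10"]).connect("monitoring")

session.execute("""
    CREATE TABLE IF NOT EXISTS readings_by_sensor_day (
        sensor_id uuid,
        day       date,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS latest_reading_by_sensor (
        sensor_id uuid PRIMARY KEY,
        ts        timestamp,
        value     double
    )
""")
```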

Lessons Learned: Comparing and Migrating NoSQL Databases

In short, we learned three lessons during this process: Some came from our successes and others from our mistakes.

When researching and benchmarking the databases, it became clear that many of the specifications and functionalities present in the different databases have specific applications. Your specific use case will dictate the best database for your application. And that truth is only discovered by carrying out practical tests and simulations of the production environment in stressful situations. We invested a lot of time, and our choice to use the most appropriate database paid off.

When starting a large project, it is crucial to be prepared for a change of route in the middle of the journey. If you developed a project that did not change after its conception, you probably didn’t learn anything during the construction process, or you didn’t care about the unexpected twists. Planning cannot completely predict all real-world problems, so be ready to adjust your decisions and beliefs along the way.

You shouldn’t be afraid of big changes. Many people were against the changes we were proposing due to the risk it brought and the inconvenience it caused to developers (by changing a tool already owned by the team to a new tool that was completely unknown to the team).

Ultimately, the decision was driven based on its impact on our product improvements — not on our engineering team, even though it was one of the most significant engineering changes we have made to date.

It doesn’t matter what architecture or system you are using. The real concern is whether it will be able to take your product into a bright future.

This is, in a nutshell, our journey in building one of the bridges for the future of Tractian’s product. If you have any questions or comments, feel free to contact us.

The post MongoDB vs. PostgreSQL vs. ScyllaDB: Tractian’s Experience appeared first on The New Stack.

]]>
Alteryx Announces AiDIN for AI-Powered Features https://thenewstack.io/alteryx-announces-aidin-for-ai-powered-features/ Fri, 26 May 2023 15:06:16 +0000 https://thenewstack.io/?p=22709052

At its Inspire conference in Las Vegas on Wednesday, long-time data integration and AI player Alteryx announced a series of

The post Alteryx Announces AiDIN for AI-Powered Features appeared first on The New Stack.

]]>

At its Inspire conference in Las Vegas on Wednesday, long-time data integration and AI player Alteryx announced a series of new platform capabilities, many of which were focused on generative artificial intelligence, and several more of which impressively beef up its hybrid cloud capabilities. While a number of data and analytics companies have recently added large language model (LLM)-based capabilities to their platforms, the majority of these have been focused on data exploration and querying. Alteryx’s announcements do provide coverage there, but they also address the areas of data pipeline governance, and the functionality within data pipelines themselves. These are perhaps less flashy capabilities, but are arguably just as substantive, if not more so.

Also read: Alteryx Integrates with UiPath, Uniting RPA and Data Pipelines

Suresh Vittal, Alteryx’s Chief Product Officer, briefed The New Stack on six major announcements, three of which are in the generative AI realm, and three of which address cloud capabilities of the core Alteryx platform. These follow quickly on Alteryx’s 2022 acquisition of Trifacta and the momentum that it has driven. As Vittal said, Alteryx has been “hard at work integrating the Trifacta platform. We announced the Alteryx Analytics cloud with machine learning, Designer Cloud and Auto Insights built on a common unified platform. That’s getting great uptake and our cloud has been up and running for several months now.”

Check out: Alteryx Analytics Cloud Consolidates Acquisitions and Functionality

AI, above the Din

On the generative AI side, Alteryx has announced AiDIN, which serves as the umbrella brand and engine for each of Alteryx’s AI capabilities, both old and new. According to Vittal, AiDIN is the “core framework of bringing… Alteryx’s data and Alteryx’s models, combining those capabilities and powering specific use cases.” The AiDIN-related announcements include:

  • A Workflow Summary Tool, which can create natural language summaries of any one workflow or group of Alteryx workflows. Essentially, the tool has the ability to document what workflows do, after they’ve been authored, which can help engineers understand assets that they need to become acquainted with, or proactively document their own work. The summaries can be embedded in a workflow’s Meta Info field, which ensures the generated summary attaches itself to the workflow regardless of who accesses it and when.

The Workflow Summary Tool in action
Credit: Alteryx

  • A feature called Magic Documents, which is essentially a new adjunct to Alteryx’s already existing Auto Insights feature, is itself driven by AI. Now, in addition to creating Auto Insights, Alteryx customers can leverage generative AI to create a conversational email message, PowerPoint slide presentation or other document that summarizes the generated insights. So not only can AI generate a report, but it can now generate a cover letter of sorts, to accompany the report. This is a useful tool to summarize such reports for managers or executives who may not have time to review them in full, but still need to know what’s in them.

Magic Documents email generation.
Credit: Alteryx

  • An OpenAI Connector, embeddable in Alteryx workflows, which can call APIs in OpenAI’s generative AI platform as an automated step in a data pipeline. This takes generative AI beyond interactive chatbot scenarios and into triggered, data-driven actions that are executed autonomously. Vittal explained that this connector is for OpenAI’s own platform, and that connectors for Azure OpenAI and for customers’ own models will be forthcoming. Google AI service connectors may be added to the mix as well.
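The connector itself is proprietary, but the pattern it enables can be sketched in a few lines: a generative model is called as an automated step over pipeline records rather than interactively. The sketch below uses the openai package’s chat interface as of mid-2023, with hypothetical column names, and is not the Alteryx connector.

```python
# Sketch of a generative-AI pipeline step (hypothetical columns; not the
# Alteryx connector). Assumes OPENAI_API_KEY is set in the environment.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def summarize_ticket(ticket_text: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Summarize this support ticket in one line:\n{ticket_text}"}],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"]

def pipeline_step(rows: list[dict]) -> list[dict]:
    # Enrich each record with a generated summary column.
    return [{**row, "summary": summarize_ticket(row["ticket_text"])} for row in rows]
```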

Use Cases, Today and Tomorrow

Alteryx’s applications of large language models and generative AI are more infrastructural than many that have surfaced in the analytics space recently. They aren’t (yet) about using natural language to generate assets like reports or workflows, but rather about using the technology to extend the reach, management, power and capabilities of those assets.

These generative AI-based capabilities extend primarily to natural language use cases, but LLMs can be used for scenarios beyond natural language, especially when combined with a customer’s own data. Vittal expects that Alteryx may move to certain of these scenarios soon, stating that “…we’re trying to …decouple the foundational model and the work that happens with the foundational model from the contextualized training and find… I probably think about it as fine-tuning… that we can do [this] using the customer’s data and Alteryx’s data.”

Vittal also told The New Stack that the company has been “working with our design partners on things like metadata enrichment, things like orchestration of specific very complex operational processes. And we’re finding that there’s real value in applying generative AI to these kinds of use cases because it takes a lot of tasks out of the process.”

AI Isn’t Everything

This is all neat stuff, and there’s no disputing how cool and transformational AI is. But, as I mentioned earlier, Alteryx made a few important non-AI related announcements too. They include:

  • So-called “cloud-connected experiences,” like Cloud Execution for Desktop, a hybrid cloud feature that allows customers to author Alteryx workflows in the Alteryx Designer desktop application, then save them to, and have them execute in, the cloud.
  • New Enterprise Utilities for enhanced governance, including Alteryx product telemetry data to manage usage across clouds, and the ability to treat workflows as code, then curate them and manage their deployment by pushing them to Git-compatible version control repositories.
  • New Location Intelligence capabilities which have been rewritten for the cloud, to take advantage of the extended resources and elastic computing power provided there. This makes new use cases possible because more powerful spatial data workloads can be accommodated in the cloud. Alteryx is announcing pushdown query integration with Snowflake, and integrations with TomTom, to attain this improved performance and enable the new use cases.

Must read: Snowflake Builds out Its Data Cloud

AI Economics

That’s a lot of developments to absorb. Despite the recessionary air and austerity on both the customer and vendor sides of the data realm today, Alteryx seems to be doing quite well. Vittal put it this way: “Last quarter… we expanded 121% which is kind of best-in-class expansion. The largest of our accounts expanded 131%. So we’re continuing to see demand and durability of use cases even in this macro, some might argue in this macro even more so, because more and more the teams have to do more with less. And so automation and analytics orchestration becomes more important.”

One might argue that generative AI is creating its own economic bump for our industry. While some of that may be merely a hype-driven bubble, applications of AI like the ones Alteryx has implemented add real practicality and productivity. The latter typically drives healthy economic expansion, which is something we should all be rooting for. If we can move past the “spectacle” of AI, and focus on its down-to-earth utility, good things can result.

The post Alteryx Announces AiDIN for AI-Powered Features appeared first on The New Stack.

]]>
Proprietary AI Models Are Dead. Long Live Proprietary AI Models  https://thenewstack.io/proprietary-ai-models-are-dead-long-live-proprietary-ai-models/ Fri, 26 May 2023 13:13:07 +0000 https://thenewstack.io/?p=22709193

Over the past few weeks, there has been a declaration that open source generative artificial intelligence models will dominate the

The post Proprietary AI Models Are Dead. Long Live Proprietary AI Models  appeared first on The New Stack.

]]>

Over the past few weeks, there has been a declaration that open source generative artificial intelligence models will dominate the field. Pundits cite a leaked Google memo that states the search giant has lost its competitive advantage in the field of generative AI because of open source models. The argument goes something like this:

  1. Open source machine-learning algorithms have exceeded the capabilities of proprietary algorithms.
  2. When using open source algorithms to train models on open source data sets, the performance of the “foundational” models is quite good with respect to benchmarks.
  3. Using techniques like “fine-tuning” (further training an open model on your own data) to build a model obviates the need to use Big Tech’s proprietary data sets.
  4. Therefore, proprietary models are dead.

Then Google I/O 2023 happened. Google Bard, a generative AI search engine built on its own proprietary dataset, has been getting rave reviews. The most-cited feature is its ability to incorporate real-time data into its model. Let’s look at why proprietary models will play a valuable role in the future by analyzing the argument above:

  1. Have open source machine-learning algorithms exceeded the capabilities of proprietary algorithms? Yes. Google’s internal memo discusses how the performance and innovation of algorithms by the open source community has eclipsed its own pace of development.
  2. When using open source algorithms to train models on open source data sets, is the performance of “foundational” models good with respect to benchmarks? Beware of benchmarks. If the goal of the model is only to understand English, then using an open source corpus of data is fine. But what if your model can benefit from real-time data, just as users benefit from Bard’s real-time data search? Then the benchmark needs to measure not only English understanding but also knowledge of recent events in the world.
  3. Do techniques like “fine-tuning” to build a model obviate the need to use Big Tech’s proprietary data sets? Again, what do your users care about? Can your proprietary dataset bring ALL the real-time context you need?
  4. So are proprietary foundational models really dead? Not so fast.

The Cost of Generative AI Success

It turns out that getting access to real-time data to build models is expensive. Google spends billions of dollars on infrastructure that indexes the web in real time to feed its generative models, and you can bet that data is going to be proprietary.

Let’s take the example of two airline travel chatbots built on top of two different foundational models: one built on an open source model and one on a proprietary model with real-time data. Both chatbots are “fine-tuned” with a proprietary flight information data set to recommend which flights to take. In many cases, both chatbots will provide the same answer. However, if a large storm hits an airport, the chatbot built on proprietary real-time data will recommend flights that avoid the storm’s disruptions. That is invaluable to users; hence it will be valuable to developers too.

The Future of Foundational AI Models

So does this mean that every generative AI use case needs a foundational model built from proprietary real-time data? No, but there are other reasons why a proprietary foundational model will be needed:

  1. Proprietary first-party data sets. Consider this example: Google Bard leverages the entirety of YouTube to create its foundational model. If your generative AI use case can benefit from the vast amount of information and knowledge that is uploaded to YouTube, then you might want to use a foundational model from Google.
  2. Personalization data sets. When a foundational model is trained with personalized data, the model (aka the neural network) will have aspects of personal information in it. Using these models to do inference can be done in a way that doesn’t leak personal information, but if the entire model is exported, it is possible to extract personal information on particular users by looking at the parameters of the model. Despite the advances in federated learning, there isn’t a foolproof way to enable the model to be exported without jeopardizing privacy.

So what do future foundational models look like? Probably something like this:

  1. Algorithms will be open source.
  2. Data sets will be proprietary in some cases, due to the cost of maintaining a real-time data set and personalization, and open source in others.

Assuming this is the prevailing architecture, what are the secondary effects?

  1. Enterprises looking to build generative AI will likely need to rely on foundational models from large companies that have the checkbook to maintain their own real-time data infrastructure, and on open source foundation models for other use cases.
  2. The proprietary data sets that enterprises rely on will increasingly be real time as well. Expect that data to reside in NoSQL real-time databases like Apache Cassandra and to be streamed into feature stores using technologies like Apache Pulsar (a minimal producer sketch follows this list).
  3. For practical purposes, model inference will likely happen at data centers owned by the foundational model providers such as AWS, Microsoft and Google. This means the hyperscalers will likely increase in importance in the age of AI. Model inference based on foundational open source models may be performed in customers’ data centers.
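
As a rough sketch of the streaming half of that prediction, a minimal Apache Pulsar producer in Python might look like the following. The broker URL, topic name and payload are placeholders, not a reference architecture:

    import json
    import pulsar  # assumes the pulsar-client package

    client = pulsar.Client("pulsar://localhost:6650")           # placeholder broker URL
    producer = client.create_producer("feature-store-updates")  # placeholder topic

    # Stream a freshly computed feature row toward a feature store consumer.
    feature_row = {"flight": "UA123", "delay_risk": 0.82, "updated_at": "2023-05-26T13:00:00Z"}
    producer.send(json.dumps(feature_row).encode("utf-8"))

    client.close()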

The secondary effects for DataStax (my employer) are significant too. As a data management provider, our investment in providing services in the cloud through DataStax Astra DB, which resides on the major clouds of AWS, Microsoft and Google, is likely to grow as generative AI becomes more prevalent in the enterprise.

While we encourage and support the use of open source foundational models from companies like Hugging Face, we’re also forming strong AI partnerships with the big three cloud providers. Most importantly, we are using the community contribution process to upstream features to Cassandra such as vector search to ensure that companies can create their own real data sets for real-time AI.

The post Proprietary AI Models Are Dead. Long Live Proprietary AI Models  appeared first on The New Stack.

]]>
AI Talk at KubeCon https://thenewstack.io/ai-talk-at-kubecon/ Wed, 24 May 2023 20:36:40 +0000 https://thenewstack.io/?p=22708786

What did engineers at KubeCon say about how AI is coming up in their work? That’s a question we posed

The post AI Talk at KubeCon appeared first on The New Stack.

]]>

What did engineers at KubeCon say about how AI is coming up in their work? That’s a question we posed to Taylor Dolezal, head of ecosystem for the Cloud Native Computing Foundation, at KubeCon in Amsterdam.

Dolezal said AI did come up in conversation.

“I think that when it’s come to this, typically with KubeCons, and other CNCF and LF events, there’s always been one or two topics that have bubbled to the top,” Dolezal said. “But I think that this time around, it feels like there’s like five to seven. And that’s really interesting to me. I’m hearing WebAssembly, edge computing. I’m hearing internal development platforms…”

At its core, AI surfaces a data issue for users that correlates to data sharing issues, said Dolezal in this latest episode of The New Stack Makers.

“It’s what you’re putting in to create the end result,” Dolezal said. “It’s what you are putting into the function, and then its output. People are worried about putting proprietary source code in and that getting leaked. It sounds a lot more like a data issue to me personally.”

Dolezal: “What a time to be a lawyer.”

Generative AI is eating up data using large language models (LLMs). These LLMs are built on foundation models trained on data using unsupervised learning techniques, writes Janakiram MSV for The New Stack. The “foundation models form the base for multiple variations of the model fine-tuned for a specific use case or scenario.” LLMs are great at word completion. Have you used GPT-3.5 or Meta’s LLaMA? Given an input string, these LLMs generate another string that typically follows the original.

Dolezal said developers told him at KubeCon that people use large language models (LLMs) for tasks such as writing ReadMe files.

LLMs also serve as a vehicle for thought experiments, offering different ways to break the initial friction on a project, Dolezal said.

“Is this something that’s good to use kind of as a muse?” Dolezal said.

Or is its output good enough to put into a project or to approve a PR? he asked. How that may affect maintainers becomes a question, because we want to avoid putting more burden on maintainers who already have so much to do.

“It’s really about what you’re putting in to create the end result, what you’re putting into the function, and then its output,” Dolezal said.

As for the legal questions, one of the biggest is what happens if you are using proprietary code. Then how do you use an LLM?

The post AI Talk at KubeCon appeared first on The New Stack.

]]>
Better Data Logistics Is the Key to Effective Machine Learning https://thenewstack.io/better-data-logistics-is-the-key-to-effective-machine-learning/ Wed, 24 May 2023 17:00:33 +0000 https://thenewstack.io/?p=22708163

When humans interact with modern machines, there is almost always some kind of machine learning program running in the background.

The post Better Data Logistics Is the Key to Effective Machine Learning appeared first on The New Stack.

]]>

When humans interact with modern machines, there is almost always some kind of machine learning program running in the background.

The quality of that machine learning model, and therefore the quality of the human’s experience, depends on the quality of the underlying data — the more data the model has access to, and the more up-to-date that data is, the more accurate the model is going to be.

But in many cases, organizations fail to manage data logistics in a way that leverages the most high-quality, up-to-date data for their models.

As a result, the quality of the machine learning models suffers.

This isn’t an academic problem: machine learning models that don’t work well cause real-world disasters, from the Navy getting a false positive or negative from a threat detection system, to an oil pipeline leak going undetected, to a critical purchase being blocked because a credit card has been flagged for fraud.

These types of errors give machine learning a bad name. And many use cases for ML remain unexplored because of uncertainty about how the data required for the training models would be sourced and prepared for inclusion in model training datasets.

Data Logistics Today: Bottlenecks and Wishful Thinking

The reason these machine learning models don’t work is that there’s nothing modern about how edge data moves from place to place. In some cases, teams must resort to shipping physical hard drives via FedEx. This dramatically reduces the usefulness and availability of data, and therefore the ability of edge devices to make better decisions through higher-quality models.

Even in high-connectivity environments, moving data around is prohibitively expensive. And data engineering teams are so overworked that any change to the flow of data that an ML engineer requests will most likely get assigned a ticket that will wait in a queue for months before someone can address it. Iteration becomes impossible.

The industry is generally aware that physically shipping hard drives around the globe is a poor way to extract data and update models. But when teams start working on ways to improve the movement of data, they often start with a set of assumptions that simply don’t hold true in the real world.

Most data logistics architectures assume uninterrupted connectivity, which is reality in precisely zero situations. Even the highest-connectivity environments are going to suffer outages — all of the public clouds have outages; data centers have outages; networks have outages; cities have power failures.

To make matters worse, many safety-critical ML applications use data that’s collected in low-connectivity environments and run in low- or no-connectivity environments.

As a result, projects fail to make use of the data being collected in the field and to harness it to create more powerful, more accurate ML models that can accurately identify threats ranging from a dangerous situation in an oil pipeline to a slippery spill in a big box store.

We have massive computing power available to us, but our inability to move data around in a way that works in the real world hampers our ability to leverage that power and create applications that solve real problems in the real world.

Better Data Logistics

When we talk about data logistics, we’re talking about the process of moving data from point A to point B. It’s just like regular logistics of physical goods — the process by which something is moved from one point to another.

We have to think about data logistics if we want to get value out of the enormous quantity of data we’re collecting, because data has no value unless it’s used and analyzed, which requires it to move.  The only business reason for data to be truly at rest is when you need to store data for compliance reasons — and you may need to retrieve that data too.

Data is critical to computing, modern or otherwise. Computer science boils down to the practice of mutating data and displaying data to users. How our infrastructure handles data and moves data from point to point is critical to making applications that are both a technical success as well as a business success.

Effective data logistics needs to be built for the real world. It should be able to automatically sync data when connectivity is available and collect and store data when connectivity is lost, all without conflict, data loss or failures due to poor connectivity. It should be simple enough that it can be adjusted without an experienced data engineer. It should be as declarative as the postal service: Declare your data’s destination and let the data logistics system take care of the rest.
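
To make “declarative” concrete, here is a purely hypothetical sketch in Python of what declaring a destination might look like. The SyncPolicy class and its fields are invented for illustration and do not correspond to any existing product:

    from dataclasses import dataclass

    @dataclass
    class SyncPolicy:
        """Hypothetical declaration: where data should end up and how to behave offline."""
        source: str                               # e.g., an edge device's local store
        destination: str                          # e.g., a cloud bucket or feature store
        queue_while_offline: bool = True          # collect locally, sync on reconnect
        conflict_resolution: str = "last-writer-wins"

    # Declare the destination; a logistics layer (not shown) would do the rest.
    policy = SyncPolicy(
        source="edge://pipeline-sensor-07",
        destination="cloud://training-data/pipeline-telemetry",
    )
    print(policy)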

Right now, a lack of effective data logistics is preventing machine learning applications from reaching their potential. Let’s fix that.

The post Better Data Logistics Is the Key to Effective Machine Learning appeared first on The New Stack.

]]>
FAQ: What Is Automated Incident Response? https://thenewstack.io/faq-what-is-automated-incident-response/ Wed, 24 May 2023 13:04:57 +0000 https://thenewstack.io/?p=22708909

When things go wrong with your organization’s infrastructure and systems, it can have a huge impact on employees, customers and

The post FAQ: What Is Automated Incident Response? appeared first on The New Stack.

]]>

When things go wrong with your organization’s infrastructure and systems, it can have a huge impact on employees, customers and brand reputation. It’s important that you can quickly and effectively resolve problems.

Manual incident response relies on people as the first line of support, but this usually takes them away from other important tasks to respond. Automated incident response changes this, using machines to shoulder some of the burden. Automated incident response helps to improve operational maturity. It means not only a better response to critical incidents when they occur, but also the ability to prevent issues before they happen.

Q: Why do organizations need to improve incident response?

Almost everything we do today relies on digital workflows and infrastructure. If you’re a worker, chances are you’re spending less time in the office and working remotely — accessing data and systems from home, the coffee shop — anywhere. And as consumers, we’re all choosing more digital channels to spend our money and access services.

But there’s a conflict. Digital infrastructure is becoming more important, yet the support available to run it is being stretched. IT teams are expected to manage increasingly complex systems, including a huge shift toward the cloud, but with fewer people and outdated tools. These pressures mean organizing a response can be slow and riddled with toil.

It’s why many organizations are looking at improving digital operations maturity, not only looking at how to speed up incident response but also understanding how taking a more proactive approach can prevent issues before they can have an impact.

Q: What’s the difference between manual and automated incident response?

When a major incident is happening, there are often manual steps a responder needs to run through while the world is “on fire.” Things like creating a Slack channel, spinning up a Zoom conference bridge or subscribing stakeholders. These steps are tedious, easy to forget and add to the already heavy cognitive load of responders. And that’s not a great use of their time. In fact, these manual steps often distract responders from doing the thing that is important, which is resolving the incident.

Automated incident response is about using machines to take away some of the toil and remove people from that first line of defense. With the right infrastructure, you can automatically detect and diagnose disruptive events, and mobilize the right team members at the right time across your digital operations. You can resolve issues quickly and minimize the impact on customers and employees.

Our latest State of Digital Operations Report found that in organizations running manual processes, 54% of responders were notified of issues outside normal working hours. This slows down issue resolution, leads to exhausted teams and makes it hard to generate working efficiencies. Moving to automated incident response can have a hugely positive effect on your operations and on team morale.

Q: What does a “gold standard” incident response process look like?

The biggest factor by far in successful incident response is aligning the whole organization on what the response should be. There’s a lot to cover within that, but organizations should start with three key areas:

  1. Define what an “incident” is. This sounds obvious, but sometimes it can be hard to distinguish between a day-to-day minor incident and an issue that affects customers. So you need to make sure you allocate this task to the experts in each product area and give them all the same framework for triaging, for example, priority 1 to 5 or severity 1 to 3, etc.
  2. Define clear roles for people involved in the response. Then they can jump straight in when called, which speeds up the response and improves outcomes. You can also allocate roles by the type of incident. A priority 1 or 2 issue might need a dedicated incident commander, for example, while the responder for priority 3 to 5 issues could fulfill that role.
  3. Own the tools. You must have the right toolkit at your disposal, and it needs to bring monitoring and observability, private and public cloud infrastructure, systems of record, etc., together in one place, along with your people and processes.

Q: What are the steps in a typical incident response life cycle?

There are six steps. The process starts when you detect an issue and ends with absorbing the learnings to improve next time.

  1. Detect. Issue detection could come from anomalous behavior spotted by a monitoring tool or a call to the customer services team. Either way, you would bring all the data about the issue into your centrally available incident response tool.
  2. Prevent. Preventing excessive noise and alert storms enables people to concentrate on the issue at hand. You can do this by silencing unimportant alerts or enabling auto-remediation, where your software takes charge of fixing the things it can.
  3. Mobilize. Once it’s clear that a person is required to do something, you need to find the right people and equip them with the right processes. A service-based architecture enables you to always know who is responsible for the affected service and to loop them in seamlessly.
  4. Diagnose. At this stage, having information at your fingertips is essential. For example, with AIOps, people can quickly access past and related incidents, while process automation enables one-click diagnostics and reporting.
  5. Resolve. This is the longest and most demanding phase; at this point responders are expected to be fixing the issue while also communicating with and updating stakeholders. It’s invaluable to have your incident response integrated with CollabOps tools like Slack or Microsoft Teams and to have a channel for automated customer updates.
  6. Learn. Incorporating learnings into the response process can help improve the response for future incidents. Learning goes beyond tools and systems. It needs to be an organizational commitment. The right incident response tool will have the analytics and reporting to make it happen.

Q: How can organizations integrate toolchains?

In practice, you just need the right operations management tool, one that can manage any urgent or unplanned issue.

Firstly, you should probably be looking at a cloud-based tool. Organizations are increasingly moving essential platforms to the cloud, and it’s no different for operations management. Choosing a cloud-based platform enables you to benefit from the power of cloud processing, but also makes it easy to integrate your other cloud business services.

Secondly, your digital operations tool should offer a wide range of integrations and APIs. The more core business systems you can connect to your operations cloud, the more you can collaborate and automate. The right system will enable you to integrate everything, from your monitoring and observability tools to security and DataOps solutions, and even your customer service and chat/collaboration platforms.

Q: How can organizations reshape their incident response processes?

Your customers and employees are increasingly relying on your digital services to work well, and it can cause significant damage to your business and reputation when they don’t. But despite this, many organizations don’t have robust-enough incident response processes to keep pace in the digital era.

In today’s operating environment, you need a companywide commitment to incident response, ideally with a single tool that can seamlessly manage all the urgent and unplanned work across the business. This will help you move away from reactive manual interventions to proactive — and in many cases, automated — remediation.

When you can quickly and effectively detect and diagnose the most high-impact issues, with automated workflows that mobilize the right people at the right time, then you can reduce system downtime and help people to do more with less.

The post FAQ: What Is Automated Incident Response? appeared first on The New Stack.

]]>
The Benefits and Limitations of AI for Service Optimization https://thenewstack.io/the-benefits-and-limitations-of-ai-for-service-optimization/ Tue, 23 May 2023 17:00:05 +0000 https://thenewstack.io/?p=22707963

At first glance, artificial intelligence might seem like the perfect solution for IT service optimization — and it is, in

The post The Benefits and Limitations of AI for Service Optimization appeared first on The New Stack.

]]>

At first glance, artificial intelligence might seem like the perfect solution for IT service optimization — and it is, in some cases. By quickly parsing complex sets of data and generating insights based on them, AI can help organizations to identify and act on opportunities for streamlining their IT services.

But that doesn’t mean AI can optimize every IT service at every business. There are important limitations on the extent to which IT teams can apply AI to their service optimization strategy.

Keep reading for a breakdown of what AI technology — including but not limited to the generative AI tools that have been the source of much hype in recent months — can realistically achieve in the context of service optimization, and for a look at which types of processes will always require a human touch.

The Benefits of AI for Service Optimization

Before looking at examples of what AI can and can’t do in the realm of service optimization, let’s discuss why you’d want to use AI to optimize your services in the first place.

The main reason is that service optimization often requires analyzing large amounts of data and AI can perform that work much faster and more efficiently than humans.

To put this in context, imagine that you want to streamline IT processes within your organization. You could do that by having your IT team manually look over data sources like help-desk tickets, determine which types of requests take the longest to fulfill, and then produce recommendations on how to make those processes faster. That’s doable, but it would take a long time, and it would constitute a major distraction for your IT team.

Alternatively, you could deploy an AI tool that automatically analyzes all of your tickets — alongside, perhaps, other data sources that provide insight into why certain requests take a long time to fulfill — then generates recommendations on where and how to optimize services. This approach will yield results in a fraction of the time that it would take to glean the same insights manually.
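
As a heavily simplified sketch of the kind of analysis such a tool automates, the following Python snippet groups made-up help-desk tickets by category and surfaces the slowest request types. The columns and figures are invented for illustration:

    import pandas as pd

    # Made-up help-desk export: one row per resolved ticket.
    tickets = pd.DataFrame({
        "category": ["password reset", "vpn access", "new laptop", "vpn access", "new laptop"],
        "hours_to_resolve": [0.5, 6.0, 72.0, 8.0, 60.0],
    })

    # Average resolution time per request type, slowest first.
    slowest = (
        tickets.groupby("category")["hours_to_resolve"]
        .mean()
        .sort_values(ascending=False)
    )
    print(slowest.head())  # candidates for process optimization

A real AI-driven tool would go further, correlating these numbers with other signals and generating recommendations, but the underlying goal is the same: find the slow spots without tying up your IT team.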

IT Services That AI Can Help to Optimize

You can leverage an AI-based service optimization approach for any type of IT process that meets the following criteria:

  • You have a substantial set of data that an AI tool can analyze to understand how the process works and identify opportunities to make it better.
  • The process does not involve complex human interactions that require emotional intelligence to understand fully.

Plenty of core IT services fit both of these bills. Apart from using AI to analyze help-desk tickets in order to improve IT services aimed at end users, other services that are good candidates for AI-powered optimization include:

  • Infrastructure management: AI can analyze logs, metrics and other infrastructure data to understand what your organization’s infrastructure requirements are and provide guidance on optimizing infrastructure management. In turn, it could help you reduce spending on unnecessary infrastructure, plan hardware refresh processes and so on.
  • Network management: AI can analyze network traffic patterns to help you identify bottlenecks or predict outages, leading to better network performance for your organization.
  • Software development: Businesses that build software can take advantage of AI to optimize their software delivery processes by, for example, predicting how long a sprint should last or how many changes they can reasonably implement with each release cycle. AI tools could do this by analyzing logs from CI/CD tools, along with data like application deployment speed and frequency.

The list could go on, but the point is straightforward enough: Almost any IT service that generates systematic data, and that involves technical resources or processes, can probably be improved with the help of AI-based insights.

When Not to Use AI for Service Optimization

On the other hand, services are typically not good candidates for AI-assisted optimization when they have one or more of the following characteristics:

  • They are not associated with a data source and therefore can’t be optimized by AI tools that analyze data.
  • They require ethical decision-making, which AI is typically not equipped to handle.
  • They involve creative decision-making or ideation, something that AI can’t do well because it’s incapable of generating genuinely novel concepts.
  • They necessitate emotional intelligence or the building of trust — tasks that AI can’t perform well.
  • They involve adapting to unstructured or unpredictable environments, such as servers that have suffered a never-before-seen type of cyberattack. In this context, AI is of little use because it can’t reliably anticipate the conditions it needs to work with.

Project management is an example of a real-world situation where AI-based service optimization is unlikely to yield value. You can certainly automate some aspects of project management, and you can record some data about project operations through tools like Jira. But that data represents only part of what goes into an effective project. Every project has unique requirements, making it hard to optimize upcoming projects based on data you collect about past projects.

Plus, most projects involve extensive interactions between humans. They also require trust and accountability among stakeholders. Those are factors that AI tools are not adept at assessing or optimizing.

This means that optimizing project management processes requires more than deploying an AI tool and seeing what it recommends. You need a nuanced understanding of each project’s requirements, as well as knowledge of how to build trust and manage human relationships.

Negotiating with vendors is another example of a common process that is very difficult to streamline using AI. As with project management, there are complex human components at play with negotiation. Although AI tools might be able to help with some aspects of negotiation, such as helping you understand how vendor pricing trends have varied over time, they can’t tell you exactly how to interact with a vendor or exactly which pricing terms to ask for. Nor can they build the trust relationships that are necessary for instilling confidence that a vendor will deliver on promises.

Conclusion

AI offers enormous potential to make a variety of common IT and business processes faster, more efficient, more scalable and less costly. But it’s important to understand the limitations of AI as a service optimization solution. When you venture beyond the realm of the purely technical, AI ceases to be a source of useful insights, and you’ll need humans to make the decisions that AI can’t.

The post The Benefits and Limitations of AI for Service Optimization appeared first on The New Stack.

]]>
Economists Show AI Bringing Positive Impact to Workplaces https://thenewstack.io/economists-show-ai-bringing-positive-impact-to-workplaces/ Sun, 21 May 2023 13:00:46 +0000 https://thenewstack.io/?p=22708107

Stanford-based economist Erik Brynjolfsson is surprisingly cheery about AI’s ultimate impact. “I wouldn’t be surprised 50 years from now, people

The post Economists Show AI Bringing Positive Impact to Workplaces appeared first on The New Stack.

]]>

Stanford-based economist Erik Brynjolfsson is surprisingly cheery about AI’s ultimate impact. “I wouldn’t be surprised 50 years from now, people looked back and say, wow, that was a really seminal set of inventions that happened in the early 2020s…” he told CBS in January. “I think we’re going to have potentially the best decade of the flourishing of creativity that we’ve ever had, because a whole bunch of people, lots more people than before, are going to be able to contribute to our collective art and science.”

But is there evidence to support this optimism? Last month Brynjolfsson teamed up with MIT-based economists Danielle Li and Lindsey R. Raymond, for a new paper to explore the question. Titled “Generative AI at Work,” it begins by noting the lack of other studies on real-world economic effects of AI, calling their work “to our knowledge, the first study of the impact of generative AI when deployed at scale in the workplace” — and over a longer period of time.

So what did they find? The three economists conclude AI “increases worker productivity, improves customer sentiment, and is associated with reductions in employee turnover.”

And the specifics were even more intriguing…

Cyborg Customer Service

Their study focused on chat-based customer service. Among businesses using AI, 22% were using it in their customer service centers, according to a McKinsey Analytics study cited by the researchers.

Though the paper doesn’t name the AI tool, it does describe its functionality: it provides support agents with real-time suggestions for responses and prompts them with links to internal technical documentation. And of course, it tracks entire conversations for context.

Importantly, the system doesn’t make suggestions at all if it doesn’t have enough training data — which “occurs in a large minority of cases” — and human agents always have the choice to disregard all suggestions.

A screenshot from 2023 NBER 'Generative AI at Work' paper

The conversations happened between November of 2020 and February of 2021, using a tool built on large-language models from OpenAI’s GPT family, “with additional ML algorithms specifically fine-tuned to focus on customer service interactions.”

Interestingly, its training data included conversations with both “successful” and “unsuccessful” outcomes (as well as conversations of varying length) — and indicates whether the data came from one of the firm’s top-ranked agents.

“The AI firm then uses these data to look for conversational patterns that are most predictive of call resolution and handle time,” the paper notes, adding it prioritizes “responses that express empathy, surface appropriate technical documentation, and limit unprofessional language.”

The randomly-assigned conversations “are relatively lengthy, averaging 40 minutes,” according to the report, “with much of the conversation spent trying to diagnose the underlying technical problem…”

“Equipped with this new AI system, the company’s customer support representatives are now basically part human, part intelligent machine,” quipped NPR’s Planet Money blog. “Cyborg customer reps, if you will.”

And after 3,007,501 conversations (performed by 5,179 agents) the results came in. Where before the support agents averaged 43 minutes for each call, that average dropped to just 35 minutes when AI suggestions were made available (over 1,180,446 calls).

But that’s just the beginning…

  • The AI suggestions also increased the average number of resolved issues by 13.8% (per hour) — fueled partly by an increase in the number of simultaneous chats that an agent can handle. The researchers write that their evidence suggests that “AI enables agents to both speed up chats and to multitask more effectively.”
  • The introduction of AI tools apparently also improved the retention of employees — in every skill group. And in a possibly related development, the tools also improved the way customers treated their support agents, reflected, among other things, in fewer requests for a manager to intervene. The researchers see changes in the very experience of work, concluding that “generative AI working alongside humans can have a significant positive impact.” As it stands, annual turnover rates for customer service agents can reach 60%, according to one study they cite — leading to costly training for replacement workers and wider variations in experience levels and productivity.
  • AI’s impact varied depending on a worker’s experience and skill level, “with the greatest impact on novice and low-skilled workers, and minimal impact on experienced and highly skilled workers.” In fact, for the highest-skilled workers, the AI help didn’t lower the average time spent on calls at all (though these agents may have been able to handle more calls simultaneously). But even more interestingly, the highest-skilled workers saw “small but statistically significant decreases in resolution rates and customer satisfaction,” the researchers note, positing that AI assistance “may distract the highest-skilled workers, who are already doing their jobs effectively.” (Meanwhile, it’s the less-skilled agents who “consistently see the largest gains.”)

At the same time, there’s evidence that AI “disseminates” the “potentially tacit knowledge of more able workers.” That is, the benefits of experience get passed along to newer workers. The researchers provide specific benchmarks. Among agents given access to the AI, those agents with two months of tenure “perform just as well as untreated agents with over six months of tenure.” And the workers with AI assistance also seemed to get better faster.

Disseminating Productivity

Their paper cites a concept known as Polanyi’s paradox — the observation that much of what we know how to do is hard to articulate into rules — which is considered a potential roadblock to full automation.

But AI appears to have the ability to acquire even those unexplained skills. Their paper argues the systems are “capturing and disseminating the patterns of behavior that characterize the most productive agents,” and the researchers saw gains in every measure of productivity — including speed, success rate, and customer satisfaction.

 Some productivity measures - screenshot from NBER 'Generative AI at Work' paper (2023)

“[W]hat this system did was it took people with just two months of experience and had them performing at the level of people with six months of experience,” Brynjolfsson told Planet Money. “So it got them up the learning curve a lot faster — and that led to very positive benefits for the company.”

The report goes so far as to ask whether top-performing workers should be paid more — since their abilities are now propagated throughout the workforce.

But more importantly, the researchers note specifically that this is “in contrast to studies of prior waves of computerization.” Past automation saw a rarefied handful of engineers carefully mapping tasks onto algorithms — versus this cruder brute-force method of feeding masses of training data into an ML system until an algorithm emerges that everyone can follow.

Planet Money even asked Brynjolfsson if AI “could also reduce inequality by bringing the top and middle down, essentially de-skilling a whole range of occupations, making them easier for anyone to do and thus lowering their wage premium.” They report that Brynjolfsson “seemed a bit skeptical of this” — but they also see another important benefit. “It suggests that AI could benefit those who were left behind in the previous technological era.”

The paper clarifies at one point that their research wasn’t capturing the “wage effects” of AI. And it’s certainly true that AI may grow the economy, Brynjolfsson told Planet Money, using the classic metaphor of a bigger pie. But then he issued a warning. “It’s very clear that it’s not automatic that the bigger pie is evenly shared by everybody… We have to put in place policies, whether it’s in tax policy or the strategy of companies like this one, which make sure the gains are more widely shared.”

The paper’s conclusion concedes that “the effects we find may not generalize across all firms and production processes.”

Just for example, in fast-changing environments, it may be harder to train systems using historical data. But then it also acknowledges many other questions that it’s also leaving unexplored. Will customer service agents be reassigned to “more complex customer responsibilities, increasing aggregate demand”? Will customers prefer and even demand AI-enhanced support? Will AI systems uncover “patterns and insights” that change how workers are managed or how knowledge is shared?

And will wages go up or down?

There’s also the emerging issue of whether AI is passing off someone else’s skills as its own. “[O]ur findings raise questions about whether and how workers should be compensated for the data that they provide to AI systems,” the researchers write. “High-skill workers, in particular, play an important role in model development but see smaller direct benefits in terms of improving their own productivity.

“Given the early stage of generative AI, these and other questions deserve further scrutiny.”

The post Economists Show AI Bringing Positive Impact to Workplaces appeared first on The New Stack.

]]>
Prompt Engineering: Get LLMs to Generate the Content You Want https://thenewstack.io/prompt-engineering-get-llms-to-generate-the-content-you-want/ Fri, 19 May 2023 15:00:44 +0000 https://thenewstack.io/?p=22708334

The generative AI models are trained to emit content based on the input. The more descriptive the input instruction is,

The post Prompt Engineering: Get LLMs to Generate the Content You Want appeared first on The New Stack.

]]>

Generative AI models are trained to emit content based on their input. The more descriptive the input instruction is, the more accurate and precise the output will be. The input instructions fed to a generative AI model are aptly called prompts, and the art of crafting the most suitable prompt leads us to prompt engineering.

This article introduces prompt engineering to developers using large language models (LLMs) such as GPT-4 and PaLM. I will explain the types of LLMs, the importance of prompt engineering and the various types of prompts, with examples.

Understanding Large Language Models

Before getting started with prompt engineering, let’s explore the evolution of LLMs. This will help us understand the significance of prompts.

Generative AI is based on foundation models trained on a large corpus of data based on unsupervised learning techniques. These foundation models form the base for multiple variations of the model fine-tuned for a specific use case or scenario.

Large language models can be classified into base LLMs and instruction-tuned LLMs.

The base LLMs are the foundation models trained on massive datasets available in the public domain. Out of the box, these models are good at word completion. They can predict what comes next in the sentence. Examples of base LLMs include OpenAI’s GPT 3.5 and Meta’s LLaMa. When you pass a string as input to the base model, it generates another string that typically follows the input string.

The instruction-tuned LLMs are fine-tuned variations of a foundation model, designed to follow instructions and generate appropriate output. The instructions are typically in a format that describes a task or asks a question. OpenAI’s gpt-3.5-turbo, Stanford’s Alpaca and Databricks’ Dolly are some examples of instruction-tuned LLMs. The gpt-3.5-turbo model is based on the GPT-3 foundation model, while Alpaca is a fine-tuned variation of LLaMa and Dolly is fine-tuned from EleutherAI’s Pythia family.

These models implement a technique known as Reinforcement Learning from Human Feedback (RLHF), in which human feedback on the model’s outputs is used to fine-tune its behavior. The input prompt for these models is more descriptive and task-oriented than the prompts fed to the foundation models.

The Importance of Prompt Design

Prompt engineering is an essential skill for leveraging the full potential of LLMs. A well-designed prompt ensures clarity of intent, establishes context, controls output style, mitigates biases, and avoids harmful content. By carefully crafting prompts, users can enhance LLMs’ relevance, accuracy, and responsible usage of generative AI in various applications.

Two key aspects of prompt engineering are a thorough understanding of the LLM and a command of English. A poorly crafted prompt generates a poor response: half-baked, inaccurate and prone to hallucination. Using the correct vocabulary to instruct the model in the most concise form is critical to exploiting the power of LLMs.

Since we will be dealing with multiple LLMs, it is also essential to understand the best practices and techniques specific to each model. This typically comes from experience using the model and from carefully analyzing the documentation and examples published by the model provider. LLMs are also limited by the number of tokens (the units into which text is split) they can accept as input and generate as output. Prompts must adhere to the size restrictions imposed by the model.
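
As one practical illustration of working within those limits, OpenAI’s tiktoken library can count the tokens a prompt will consume before it is sent. A minimal sketch follows; the model name and prompt are just examples:

    import tiktoken

    def count_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> int:
        """Return the number of tokens the prompt consumes for the given model."""
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(prompt))

    prompt = "Write a short story about a young girl who discovers a magical key."
    print(count_tokens(prompt))  # stay well under the model's context window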

Types of Prompts

Prompt engineering is still a fuzzy domain with no specific guidelines or principles. As LLMs continue to evolve, so will prompt engineering.

Let’s take a look at some of the common types of prompts used with current LLMs.

Explicit prompts
Explicit prompts provide the LLM with clear and precise direction. Most of the time, they are concise and to the point, giving the LLM a simple task or a question to answer. When you need short, factual answers or a specific task completed, like summarizing a piece of writing or answering a multiple-choice question, explicit prompts can help.

An example of an explicit prompt would look like “Write a short story about a young girl who discovers a magical key that unlocks a hidden door to another world.”

This explicit prompt clearly outlines the story’s topic, setting, and main element, providing the LLM with specific instructions on what to generate. By providing such a prompt, the LLM can focus its response on fulfilling the given criteria and create a story that revolves around the provided concept.

Conversational prompts
Conversational prompts are meant to let you talk with the LLM in a more natural way. Most of the time, these prompts are less structured and give the LLM more freedom in terms of length and style. Conversational prompts are great for producing answers that feel more natural and flow better, as in chatbots or virtual assistants. Let’s take an example of a conversational prompt.

“Hey, Bard! Can you tell me a funny joke about cats?”

In this conversational prompt, the user initiates a conversation with the LLM and explicitly asks for a specific type of content, which is a funny joke about cats. The LLM can then generate a response that fulfills the user’s request by providing a humorous joke related to cats. This conversational prompt allows for a more interactive and engaging interaction with the LLM.

Context-based prompts
Context-based prompts give the LLM more information about the situation, which helps it produce more accurate and useful answers. These prompts often include domain-specific terms or background information that helps the LLM understand the conversation or subject at hand. Context-based prompts are helpful in applications like content creation, where it’s important that the output is correct and makes sense in the given context.

An example of a context-based prompt would be similar to the one shared below:

“I’m planning a trip to New York next month. Can you give me some recommendations for popular tourist attractions, local restaurants, and off-the-beaten-path spots to visit?”

In this context-based prompt, the user provides specific information about their upcoming trip to New York. The prompt includes the user’s intention to seek recommendations for popular tourist attractions, local restaurants, and off-the-beaten-path spots. This context-based prompt helps the LLM understand the user’s current situation and tailor its response by providing relevant suggestions and information specific to the user’s trip to New York.

Open-ended prompts
The open-ended prompt is another way of posing a question to the LLM. It encourages the model to come up with longer, more detailed answers. Open-ended prompts can help you write creatively, tell a story, or come up with ideas for articles or essays. They let the LLM give a more free-flowing answer and explore different ideas and points of view.

Consider the below prompt, which represents an open-ended prompt:

“Tell me about the impact of technology on society.”

In this open-ended prompt, the user initiates a broad topic of discussion without specifying any particular aspect or angle. The LLM has the freedom to explore various aspects of the impact of technology on society, such as social interactions, economy, education, privacy, or any other relevant areas. This open-ended prompt allows the LLM to provide a more comprehensive response by delving into different dimensions and perspectives related to the impact of technology on society.

Bias-mitigating prompts
Prompts can be designed in such a way that they push the LLM to avoid possible biases in its output. For example, prompts can ask for different points of view or direct the LLM to rely on evidence-based reasoning. Such prompts help counteract hidden biases in the LLM and keep the results fair and balanced.

Below is an example of a prompt asking the LLM to avoid bias.

“Please generate a response that presents a balanced and objective view of the following topic: caste-based reservations in India. Consider providing multiple perspectives and avoid favoring any particular group, ideology, or opinion. Focus on presenting factual information, supported by reliable sources, and strive for inclusivity and fairness in your response.”

This prompt encourages the LLM to approach the topic in a neutral and unbiased manner. It emphasizes the importance of presenting multiple perspectives, avoiding favoritism, and relying on factual information from reliable sources. It also emphasizes inclusivity and fairness, urging the LLM to consider various viewpoints without promoting discrimination or prejudice. Providing this kind of prompt aims to mitigate potential biases and promote a more balanced output.

Code-generation prompts
Since LLMs are trained on publicly available code repositories, they can generate snippets in various programming languages. A code-generation prompt asks the LLM to produce code in a specific language. The prompt should be specific and clear and provide enough information for the LLM to generate a correct answer. The following is an example of a code-generation prompt:

“Write a Python function that takes in a list of integers as input and returns the sum of all the even numbers in the list.”

In this example, the prompt asks for a Python function that calculates the sum of all the even numbers in a given list. The generated code defines a function called sum_even_numbers that takes a list of integers as input. It initializes a variable to store the running sum of even numbers and then iterates over each number in the input list. If a number is even (i.e., divisible by 2 with no remainder), it adds that number to the running sum. Finally, the function returns the sum. The LLM also adds documentation and explains how it arrived at the solution.
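
The exact code an LLM returns will vary from run to run, but a response along the lines described above might look like this sketch:

    def sum_even_numbers(numbers):
        """Return the sum of all even numbers in the input list."""
        total = 0
        for number in numbers:
            if number % 2 == 0:  # even: divisible by 2 with no remainder
                total += number
        return total

    print(sum_even_numbers([1, 2, 3, 4, 5, 6]))  # 12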

In the next article of this series, we will explore the techniques used in prompt engineering with examples. Stay tuned.

The post Prompt Engineering: Get LLMs to Generate the Content You Want appeared first on The New Stack.

]]>
With ChatGPT, Honeycomb Users Simply Say What They’re Looking for https://thenewstack.io/with-chatgpt-honeycomb-users-simply-say-what-theyre-looking-for/ Thu, 18 May 2023 16:00:14 +0000 https://thenewstack.io/?p=22707249

Right about now every company is trying to figure out how to get a competitive hand with ChatGPT. For observability

The post With ChatGPT, Honeycomb Users Simply Say What They’re Looking for appeared first on The New Stack.

]]>

Right about now, every company is trying to figure out how to get a competitive edge with ChatGPT. For observability platform provider Honeycomb, the OpenAI technology promises to make querying easier for its users.

The company has embedded into its user interface some natural language querying capabilities, called Query Assistant.

So when a system goes awry, and the operations team jumps on the Honeycomb platform to find out what’s up, they can ask questions in conversational English, rather than spending time parsing their ideas into a SQL-like query that the platform would understand.

The chat assistant may make the platform easy enough to interrogate that even non-specialists can use it to understand code behavior, Charity Majors, CTO of Honeycomb, promised in a phone conversation with The New Stack.

Prior to the Query Assistant, Honeycomb users constructed queries using Honeycomb’s own visual, SQL-like language, with the usual SQL clauses such as GROUP BY, WHERE and LIMIT.

The Query Assistant offers a single blank box where users are urged to phrase their question as “Can you show me…?” For example: “Can you show me all the slow endpoints by status code?”

Majors prefers the ChatGPT approach over the traditional use of AI in operations, often dubbed AIOps. By its conversational format, ChatGPT educates the users on how the underlying systems works, through its question-and-answer format, whereas AIops may just provide a potential answer, with no indication of how it got there.

“Anytime you’re going to rely on AI to make the decision, it’s almost impossible for humans to follow the trail of breadcrumbs and figure out why it did what it did, and then take better actions on that,” Majors said.

Tracking the State

Overall, it took Honeycomb only about six weeks to ship this feature.

Honeycomb chose ChatGPT as a chat assistant because it offered the best mix of features, Majors said. No in-house AI experts were needed: Honeycomb’s in-house software engineers were easily able to work with the technology.

The company had to do a fair amount of work to prepare materials — such as defining keywords, data types and schemas — that ChatGPT could use to interpret the queries, given that ChatGPT can’t, as of now, hold state between requests. For a user to have an extended conversation about a problem, Honeycomb has to collect that contextual information and ship it over with each query.
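
Honeycomb hasn’t published its implementation, but the pattern Majors describes (resupplying the dataset context with every request because the model holds no state) can be sketched roughly as follows. This is an illustrative guess that assumes the OpenAI Python client; the schema dictionary, prompt wording and helper function are invented for the example:

    from openai import OpenAI  # assumes the openai package is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_query_assistant(question, schema):
        """Translate a natural-language question into a query specification.

        Because the chat API holds no state, the dataset context (column
        names, types, keywords) has to be packed into every single request.
        """
        system_prompt = (
            "Translate questions about telemetry into a JSON query with "
            "fields: calculations, filters, breakdowns and time_range. "
            f"Dataset columns and types: {schema}"
        )
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content

    # Hypothetical schema; in Honeycomb's case this comes from the dataset itself.
    schema = {"duration_ms": "float", "http.status_code": "int", "endpoint": "string"}
    print(ask_query_assistant("Show me all the slow endpoints by status code", schema))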

Nonetheless, Majors was surprised by how much ChatGPT could infer from the input. Yet it produced no hallucinations (false facts conjured by the model). “I just feel like with numbers, ChatGPT is a little bit better than with words,” Majors said.

BubbleUp

Query Assistant is not Honeycomb’s first user aid. Another feature, BubbleUp, allows the user to pick a subset of a dataset where something of interest appears to be taking place; the platform then computes all the dimensions that lie inside of that bubble. But while BubbleUp makes the inspection process easier, it still requires the user to be somewhat conversant with the system’s own domain knowledge.

Query Assistant, Majors said, “is very much for the people who aren’t quite sure where to start.”

“It jumps over that whole needing-to-know SQL, and you’re just like, ‘I just deployed something, where are the errors?’ It leaps you to something that might not be exactly what you want, but then you can tweak it,” Majors said.

Query Assistant is a free feature for Honeycomb users, currently offered as an experiment. Admins can turn it off. No customer data is shared with OpenAI or used by Honeycomb internally.

The post With ChatGPT, Honeycomb Users Simply Say What They’re Looking for appeared first on The New Stack.

]]>
Vector Databases Are Having a Moment — A Chat with Pinecone https://thenewstack.io/vector-databases-are-having-a-moment-a-chat-with-pinecone/ Wed, 17 May 2023 15:58:01 +0000 https://thenewstack.io/?p=22708312

We first profiled Pinecone in early 2021, just after it launched its vector database solution. Since that time, the rise

The post Vector Databases Are Having a Moment — A Chat with Pinecone appeared first on The New Stack.

]]>

We first profiled Pinecone in early 2021, just after it launched its vector database solution. Since that time, the rise of generative AI has caused a massive increase in interest in vector databases — with Pinecone now viewed among the leading vendors.

To find out how Pinecone’s business has evolved over the past couple of years, I spoke with Elan Dekel, VP of Product at Pinecone. Prior to joining Pinecone last year, Dekel worked for more than 15 years (in two separate stints) at Google.

First of all, what is a vector database? Microsoft defines it as “a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes.” The data is stored as a vector via a technique called “embedding.”

In a recent post on The New Stack, TriggerMesh co-founder Mark Hinkle used the analogy of a warehouse to explain the use case for vector databases. “Imagine a vector database as a vast warehouse and the AI as the skilled warehouse manager,” Hinkle wrote. “In this warehouse, every item (data) is stored in a box (vector), organized neatly on shelves in a multidimensional space.” The AI can then retrieve or compare items based on their similarities. According to Hinkle, a vector database is “ideal for applications like recommendation systems, anomaly detection and natural language processing.”

What Is Pinecone Being Used for?

As you might imagine, it’s been quite a ride for Pinecone ever since generative AI burst into prominence last year. Dekel told me that Pinecone started its paid product in May of last year and has experienced fantastic growth since that time. The initial use cases were semantic search, he said, with a wide range of applications — such as FinTech, enterprise search, and more.

In December, after the announcement of ChatGPT, Pinecone’s growth accelerated even further. The power of large language models became evident to people, Dekel said, and they recognized that vector databases and embeddings were crucial for implementing this technology in real production systems. Pinecone began seeing a shift in the use cases after that.

Pinecone workflow; graphic via Pinecone

“Now,” said Dekel, “everybody’s asking us, how do I do ‘retrieval-augmented generation’? How do I build a chatbot play? You know, how do we utilize large language models in production — that sort of thing.”

Retrieval Augmented Generation (RAG) is a type of language generation model used by Meta AI and others. According to Dekel, it’s a process of augmenting a large language model — such as ChatGPT — with an external dataset to enhance its intelligence and generate more accurate responses.

He gave an example of a pharmaceutical company using RAG within its intranet, where they have proprietary research documents and domain-specific knowledge. The RAG process involves embedding the internal dataset, creating vectors from it, and storing them in a vector database. When a query is made, the intranet first interacts with the vector database, retrieving relevant content related to the query. This retrieved information serves as context for the large language model. The model is then prompted to answer the question using the provided context, generating a well-written English response.
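
Dekel’s description maps onto a short retrieval-augmented generation loop. The sketch below is only illustrative: it uses the open source sentence-transformers library for embeddings and a plain in-memory NumPy index in place of a managed vector database such as Pinecone, and the documents and final LLM call are placeholders:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model

    # 1. Embed the internal documents and store the vectors (the vector database step).
    documents = [
        "Compound X showed a 12% improvement in trial A.",   # hypothetical content
        "Trial B for compound Y was halted in phase 2.",
        "Compound X dosage guidelines were revised in 2022.",
    ]
    doc_vectors = model.encode(documents, normalize_embeddings=True)

    # 2. Embed the question and retrieve the most similar documents as context.
    question = "What do we know about compound X?"
    q_vector = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vector          # cosine similarity (vectors are normalized)
    top_ids = np.argsort(scores)[::-1][:2]   # keep the two best matches
    context = "\n".join(documents[i] for i in top_ids)

    # 3. Prompt the large language model with the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # response = llm_client.complete(prompt)  # hand the prompt to the LLM of your choice
    print(prompt)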

Enticing New Users and Competing Against Big Players

I asked Dekel who the primary users of Pinecone are.

He replied that they have a diverse user base, including hobbyists interested in vector databases and embeddings, who often utilize the free offering. In terms of enterprise users, they cater to different user groups — such as ML engineers, data scientists, and systems and production engineers.

Of course, given the popularity of generative AI, there are now many other options for vector databases in the market — including from existing database companies that are bolting this functionality on. For example, Redis offers vector database functionality in its Redis Enterprise product.

Dekel claims that Pinecone’s advantage is its ability to deal with scale. For small-scale use cases, running vector retrieval on a laptop with sample code found online can suffice. As the usage tier increases, solutions like Redis and PostgreSQL with vector plugins can be adequate. For large-scale usage, he said, a custom-designed system becomes necessary. He noted that Pinecone’s solution allows large companies to run billions of vectors on hundreds or more machines.

It’s not just existing vendors, though. Multiple specialist vector database products have emerged recently, such as the open source Chroma. How does Pinecone differentiate itself from them?

One way to differentiate, he replied, is by considering open source versus closed-source managed services. Pinecone believes that a managed service is what companies truly need, especially as they scale. Open source solutions can become challenging to manage and optimize when running on a large number of machines, he claimed.

He pointed out the considerations involved in building a production vector database — including data management, metadata handling, scalability, real-time updates, backups, ecosystem integrations, security measures, compliance (such as HIPAA), and more.

Dekel added that Pinecone can also be integrated with data lake providers, like Databricks. The data usually resides elsewhere and needs to be transformed into embeddings by running it through an ML model, he said. After processing and chunking the data, the resulting vectors are sent to Pinecone. Companies like Databricks manage this pipeline by handling data, running models, and hosting them effectively, he explained. Pinecone offers a connector with Databricks, to ensure synchronization throughout the entire process.

Will Every Company Need a Vector Database?

Given the highly promising future of generative AI, I wondered whether every enterprise company will need to eventually adopt vector databases.

Dekel replied that he witnessed the power and importance of vector embeddings over several years while integrating them into Google’s infrastructure, during his second stint with Google (which ended last year). So he believes that vector databases represent a paradigm shift in data utilization, especially as the use of unstructured data — such as images, videos, audio files and webpages — continues to grow exponentially. Vector embeddings are crucial for retrieving and working with this type of data, he said.

There’s no doubt vector databases are all the rage currently, similar perhaps to the massive shift from SQL to NoSQL in the enterprise market a decade ago. So if you’re a developer working with generative AI (and who isn’t these days), it’ll be worth your time learning how to use a vector database.

The post Vector Databases Are Having a Moment — A Chat with Pinecone appeared first on The New Stack.

]]>
Vector Databases: Long-Term Memory for Artificial Intelligence https://thenewstack.io/vector-databases-long-term-memory-for-artificial-intelligence/ Fri, 12 May 2023 14:00:12 +0000 https://thenewstack.io/?p=22707632

Artificial Intelligence, such as ChatGPT, acts much like someone with eidetic memory who goes to a library and reads every

The post Vector Databases: Long-Term Memory for Artificial Intelligence appeared first on The New Stack.

]]>

Artificial Intelligence, such as ChatGPT, acts much like someone with eidetic memory who goes to a library and reads every book. However, when you ask the AI a question that wasn’t answered in any book at the library, it either admits it doesn’t know or hallucinates.

An AI hallucination refers to instances where an artificial intelligence system generates an output that may seem coherent or plausible but is not grounded in reality or accurate information. These outputs can include text, images or other forms of data that the AI model has produced based on its training but may not align with real-world facts or logic.

For example, we could use a generative AI for images, like Midjourney, to generate a picture of an old man. However, the prompt (the way you communicate with an AI like Midjourney or Stable Diffusion) has to be something that the model understands. Suppose you ask the AI to create a picture of a man who is over the hill. In this case, I used Midjourney to do just that, deliberately choosing an example I thought might cause it to hallucinate.

Midjourney-generated image of a man over the hill

Midjourney doesn’t understand euphemisms like over the hill, so it generated a picture of a man who was literally over the top of a hill.

How could you inform the AI what you mean by “over the hill” and other nuances of language it doesn’t know? First, you could provide training data. The way to do this is to convert that data into something known as embeddings and then import them into a vector database.

While this example is a bit far-fetched for effect, many other contexts apply. For example, the medical and legal fields would benefit from being able to train AI on their industry-specific terminology and meanings. Enterprises will want to provide their data to AI without exposing it to public models.

A critical use case for vector databases is enabling large language models to retrieve domain-specific or proprietary facts that can be queried during text generation. Therefore, vector databases will be essential for organizations building proprietary large language models.

Vector vs. NoSQL and SQL Databases

Traditional databases, such as relational databases (e.g., MySQL, PostgreSQL, Oracle) and NoSQL databases (e.g., MongoDB, Cassandra), have been the backbone of business data management for decades. They store and organize data in structured formats like tables, documents or key-value pairs, making it easier to query and manipulate using standard programming languages.

These databases excel at handling structured data with fixed schema, but they often struggle with unstructured data or high-dimensional data, such as images, audio and text. Moreover, as the volume and velocity of data increase, they may face performance bottlenecks, leading to slower response times and scalability issues.

Vector databases, on the other hand, represent a paradigm shift in data storage and retrieval. Instead of relying on structured formats, they store and index data as mathematical vectors in high-dimensional space. This approach, called “vectorization,” allows for more efficient similarity searches and better handling of complex data types, such as images, audio, video and natural language.

Imagine a vector database as a vast warehouse and the AI as the skilled warehouse manager. In this warehouse, every item (data) is stored in a box (vector), organized neatly on shelves in a multidimensional space. The warehouse manager (AI) knows the exact position of each box and can quickly retrieve or compare the items based on their similarities, just as a skilled warehouse manager can find and group similar products.

The boxes represent different types of unstructured data, such as text, images or audio, which have been transformed into a structured numerical format (vectors) to be efficiently stored and managed. The more organized and optimized the warehouse is, the faster and more accurately the warehouse manager (AI) can find the items needed for various tasks, such as making recommendations, recognizing patterns or detecting anomalies.

This analogy helps convey the idea that vector databases serve as a crucial foundation for AI systems, enabling them to efficiently manage, search and process complex data in a structured and organized manner. Just as a well-managed warehouse is essential for smooth business operations, a vector database plays a vital role in the success of AI-driven applications and solutions.

The key advantage of vector databases is their ability to perform approximate nearest neighbor (ANN) search, quickly identifying similar items in a large dataset. Using techniques like dimensionality reduction and indexing algorithms, vector databases can perform these searches at scale, providing lightning-fast response times and making them ideal for applications like recommendation systems, anomaly detection and natural language processing.
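
As a concrete illustration, Faiss (mentioned later in this article as one of Chroma’s backends) exposes exactly this kind of approximate nearest neighbor index; the vectors below are random stand-ins for real embeddings:

    import faiss      # Facebook AI Similarity Search
    import numpy as np

    dim = 128         # dimensionality of the embedding vectors
    rng = np.random.default_rng(0)
    vectors = rng.random((10_000, dim), dtype="float32")  # stand-in for real embeddings

    # Build an approximate nearest-neighbor index (HNSW graph) and add the vectors.
    index = faiss.IndexHNSWFlat(dim, 32)  # 32 = graph neighbors per node
    index.add(vectors)

    # Find the five stored vectors most similar to a new query embedding.
    query = rng.random((1, dim), dtype="float32")
    distances, ids = index.search(query, 5)
    print(ids[0], distances[0])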

Embeddings — Turning Words, Images and Videos into Numbers

Embeddings are techniques that convert complex data, such as words, into simpler numerical representations (called vectors). This makes it easier for AI systems to understand and work with the data. Probability helps create these representations by analyzing how often certain pieces of data appear together.

Probability also helps quantify the similarity of two pieces of data, allowing the AI system to find related items. Probability-based techniques help AI systems quickly find similar data points in large databases without examining every item, group similar data points together and reduce the complexity of the data, making it easier to process and analyze.

Popular Vector Databases

While there are an ever-growing number of vector databases, several factors contribute to their popularity. These factors include efficient performance in storing, indexing and searching high-dimensional vectors, ease of use in integrating with existing machine learning frameworks and libraries, scalability in handling large-scale, high-dimensional data, flexibility in offering multiple backends and indexing algorithms, and active community support with valuable resources, tutorials and examples.

The vector databases most popular with users tend to be those that provide fast and accurate nearest-neighbor search, clustering and similarity matching, and that can be easily deployed on cloud infrastructure or distributed computing systems. Based on popularity among users and the number of stars on GitHub, here are some of the most popular vector databases.

  • Pinecone: Pinecone is a cloud-based vector database designed to efficiently store, index and search extensive collections of high-dimensional vectors. Pinecone’s key features include real-time indexing and searching, handling sparse and dense vectors, and support for exact and approximate nearest-neighbor search. In addition, Pinecone can be easily integrated with other machine learning frameworks and libraries, making it popular for building production-grade NLP and computer vision applications.
  • Chroma: Chroma is an open source vector database that provides a fast and scalable way to store and retrieve embeddings. Chroma is designed to be lightweight and easy to use, with a simple API and support for multiple backends, including RocksDB and Faiss (Facebook AI Similarity Search — a library that allows developers to quickly search for embeddings of multimedia documents that are similar to each other). Chroma’s unique features include built-in support for compression and quantization, as well as the ability to dynamically adjust the size of the database to handle changing workloads. Chroma is a popular choice for research and experimentation due to its flexibility and ease of use.
  • Weaviate: Weaviate is an open source vector database designed to build and deploy AI-powered applications. Weaviate’s key features include support for semantic search and knowledge graphs and the ability to automatically extract entities and relationships from text data. Weaviate also includes built-in support for data exploration and visualization. Weaviate is an excellent choice for applications that require complex semantic search or knowledge graph functionality.
  • Milvus: Milvus is an open source vector database designed for large-scale machine-learning applications. Milvus is optimized for both CPU and GPU-based systems and supports exact and approximate nearest-neighbor searches. Milvus also includes a built-in RESTful API and support for multiple programming languages, including Python and Java. Milvus is a popular choice for building recommendation engines and search systems that require real-time similarity searches. Milvus is part of the Linux Foundation’s AI and Data Foundation, but the primary developer is Zilliz.
  • DeepLake: DeepLake is a cloud-based vector database that is designed for machine learning applications. DeepLake’s unique features include built-in support for streaming data, real-time indexing and searching, and the ability to handle both dense and sparse vectors. DeepLake also provides a RESTful API and support for multiple programming languages. DeepLake is a good choice for applications that require real-time indexing and search of large-scale, high-dimensional data.
  • Qdrant: Qdrant is an open source vector database designed for real-time analytics and search. Qdrant’s unique features include built-in support for geospatial data and the ability to perform geospatial queries. Qdrant also supports exact and approximate nearest-neighbor searches and includes a RESTful API and support for multiple programming languages. Qdrant is an excellent choice for applications that require real-time geospatial search and analytics.

As in the case of SQL and NoSQL databases, vector databases come in many different flavors and address various use cases.

Use Cases for Vector Databases

Artificial intelligence applications rely on efficiently storing and retrieving high-dimensional data to provide personalized recommendations, recognize visual content, analyze text and detect anomalies. Vector databases enable efficient and accurate search and analysis of high-dimensional data, making them essential for developing robust and efficient AI systems.

Recommender Systems

In recommender systems, vector databases have the crucial function of storing and proposing items that best match users’ interests and preferences. These databases facilitate fast and effective searches for similar items by representing items as vectors. This feature allows AI-powered systems to provide personalized recommendations, thus improving user experiences on social networks, streaming services and e-commerce websites.

One commonly used AI-powered recommendation system is the one used by Amazon. Amazon uses a collaborative filtering algorithm that analyzes customer behavior and preferences to make personalized recommendations for products they might be interested in purchasing.

This system considers past purchase history, search queries and items in the customer’s shopping cart to make recommendations. Amazon’s recommendation system also uses natural language-processing techniques to analyze product descriptions and customer reviews to provide more accurate and relevant recommendations.
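
Amazon’s actual system is proprietary, but the core idea of item-based collaborative filtering can be shown in a few lines; the ratings matrix here is made up purely for illustration:

    import numpy as np

    # Rows = customers, columns = products; values = purchase/rating signals (toy data).
    ratings = np.array([
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [0, 1, 5, 4],
    ], dtype=float)

    # Item-to-item cosine similarity: products bought by the same customers score high.
    normalized = ratings / np.linalg.norm(ratings, axis=0, keepdims=True)
    item_sim = normalized.T @ normalized

    def recommend(customer, top_n=2):
        """Score unseen products by their similarity to what the customer already bought."""
        scores = ratings[customer] @ item_sim    # weight every item by similarity
        scores[ratings[customer] > 0] = -np.inf  # exclude already-purchased items
        return np.argsort(scores)[::-1][:top_n]

    print(recommend(1))  # product indices to recommend to customer 1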

Image and Video Recognition

In image and video recognition, vector databases store visual content as high-dimensional vectors. These databases empower AI models to efficiently recognize and understand images or videos, find similarities, and perform object recognition, face recognition, or image classification tasks. This has applications in security and surveillance, autonomous vehicles and content moderation.

One commonly used image and video recognition system powered by AI is the TensorFlow Object Detection API. This open source framework developed by Google allows users to train their own models for object detection tasks, such as identifying and localizing objects within images and videos.

The TensorFlow Object Detection API uses deep learning models, such as the popular Faster R-CNN and SSD models, to achieve high accuracy in object detection. It also provides pre-trained models for everyday object detection tasks, which can be fine-tuned on new datasets to improve performance.

Natural Language Processing (NLP)

Vector databases play a critical role in NLP by storing and managing information about words and sentences as vectors. These databases enable AI systems to perform tasks such as searching for related content, analyzing the sentiment of a piece of text or even generating human-like responses. By harnessing the power of vector databases, NLP models can be used for applications like chatbots, sentiment analysis or machine translation.

One commonly used NLP system is the Natural Language Toolkit (NLTK). NLTK is a comprehensive platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources and a suite of text-processing libraries for classification, tokenization, stemming, tagging, parsing, semantic reasoning and more. Researchers and practitioners widely use NLTK in academia and industry, and it is a popular choice for teaching NLP concepts and techniques.
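
For instance, tokenizing, tagging and stemming a sentence with NLTK takes only a few lines (the downloads fetch the tokenizer and tagger models on first run):

    import nltk
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    # One-time downloads of the tokenizer and part-of-speech tagger models.
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    sentence = "Vector databases are powering the next wave of NLP applications."
    tokens = word_tokenize(sentence)                    # split into word tokens
    tags = nltk.pos_tag(tokens)                         # part-of-speech tagging
    stems = [PorterStemmer().stem(t) for t in tokens]   # crude stemming

    print(tags)
    print(stems)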

Anomaly Detection

Vector databases can help detect unusual activities or behaviors in various areas, such as cybersecurity, fraud detection or industrial equipment monitoring. These databases can quickly identify patterns that deviate from the norm by representing data as vectors. AI models integrated with vector databases can then flag these anomalies and trigger alerts or mitigation measures, ensuring timely and effective responses.
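
A toy version of that idea needs nothing more than NumPy: represent each observation as a vector and flag the ones that sit unusually far from the rest. Production services, like the Azure offering described next, use far more sophisticated algorithms, and the data here is synthetic:

    import numpy as np

    rng = np.random.default_rng(42)
    normal = rng.normal(0.0, 1.0, size=(500, 8))  # vectors for normal behavior
    outlier = np.full((1, 8), 8.0)                # one injected anomaly
    data = np.vstack([normal, outlier])

    # Score each vector by its distance to the centroid of all vectors.
    centroid = data.mean(axis=0)
    distances = np.linalg.norm(data - centroid, axis=1)

    # Flag anything more than three standard deviations above the mean distance.
    threshold = distances.mean() + 3 * distances.std()
    anomalies = np.where(distances > threshold)[0]
    print(anomalies)  # very likely just [500], the injected outlier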

Microsoft Azure Anomaly Detector is a cloud-based service that allows users to monitor and analyze time series data to identify anomalies, spikes and other unusual patterns. Azure Anomaly Detector uses advanced AI algorithms, such as Seasonal Hybrid ESD (S-H-ESD) and Singular Spectrum Analysis (SSA), to automatically detect anomalous behavior in the data and alert users. It also provides a simple REST API so developers can efficiently integrate the service into their applications and workflows.

Summary

Vector databases are critical to many artificial intelligence (AI) applications, including recommender systems, image and video recognition, natural language processing (NLP) and anomaly detection. By storing and managing data as high-dimensional vectors, these databases enable efficient and accurate search and analysis of large datasets, leading to enhanced user experiences, improved automation, and timely detection of anomalies. In the realm of recommender systems, vector databases allow for the quick identification of items most relevant to users’ preferences.

In image and video recognition, they enable efficient object and face recognition. Vector databases also play a crucial role in NLP by storing and managing information about words and sentences as vectors, and in anomaly detection they enable quick identification of unusual patterns or behaviors. Overall, vector databases are essential for developing robust and efficient AI systems across various domains.

The post Vector Databases: Long-Term Memory for Artificial Intelligence appeared first on The New Stack.

]]>
Google’s New TensorFlow Tools and Approach to Fine-Tuning ML https://thenewstack.io/googles-new-tensorflow-tools-and-approach-to-fine-tuning-ml/ Wed, 10 May 2023 20:00:46 +0000 https://thenewstack.io/?p=22707785

Today at Google I/O, the web giant’s annual developer conference, Google announced a bunch of new AI tools — including

The post Google’s New TensorFlow Tools and Approach to Fine-Tuning ML appeared first on The New Stack.

]]>

Today at Google I/O, the web giant’s annual developer conference, Google announced a bunch of new AI tools — including new tooling for the TensorFlow ecosystem, a new one-stop shop for developers called ML Hub, and upgrades to its cross-platform set of ML solutions called MediaPipe.

Ahead of the announcements, I conducted an email interview with Alex Spinelli, Vice President of Product Management for Machine Learning at Google.

The new tools for TensorFlow include KerasCV and KerasNLP (allowing developers access to new “pre-trained” models), DTensor (for scaling via parallelism techniques), JAX2TF (a lightweight API for bringing models from the JAX numerical framework into TensorFlow), and the TF Quantization API (which is “coming soon,” but will allow developers to build models that are “cost and resource efficient”).

The KerasCV + KerasNLP suite (image via Google)

State of Google’s LLMs

I asked Spinelli whether developers will be able to use any of the above tools on Google’s large language models (LLMs).

“In March, we announced that developers who are experimenting with AI can build on top of our language models using the PaLM API,” he replied. “As part of that announcement, we made an efficient model of PaLM available, in terms of size and capabilities, and we’ll add other sizes soon. The API also comes with an intuitive tool called MakerSuite, which lets developers quickly prototype ideas and, over time, will have features for prompt engineering, synthetic data generation and custom-model tuning — all supported by robust safety tools.”

Spinelli added that at I/O, Google will be opening up a “private preview” of the PaLM API, “so more developers can prototype directly on the web with MakerSuite or with the tools they know and love, with integrations in Firebase and Colab.”

Why Use TensorFlow and Not LLMs

PaLM is Google’s biggest LLM, at 540 billion parameters, but it has a few other LLMs listed on the Stanford HELM index: Flan-T5 (11B), UL2 (20B), and T5 (11B). I asked Spinelli why a developer might want to use ML models via TensorFlow instead of Google’s LLMs. In other words, are there specific use cases that are best for TensorFlow?

He replied with three different use cases for TensorFlow ML:

  1. A developer wants to build their own model;
  2. A developer can solve a problem by using someone else’s model — either directly, or by fine-tuning it; and
  3. A developer can solve a problem by using a hosted large model — be it language, images, or a multi-modal combination of both.

On the first use case, Spinelli said a combination of TensorFlow and Keras (a library that provides a Python interface to TensorFlow) was the best choice for building your own model. “They make it easy for you to define model architecture and train on your own data,” he said.

TensorFlow and Keras are also the best choice when using someone else’s model, Spinelli told me.

“Many models (see Kaggle Models or tfhub.dev) have been created by other developers with extension via Transfer Learning in mind,” he continued. “TF [TensorFlow] makes it super simple for you to do this — for example — take a model that’s great at recognizing generic images, and retrain it to be excellent at spotting specific, particular, images; like diseases on an X-Ray.”
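
The workflow Spinelli sketches, reusing a generic image model and retraining only a new top for a narrow task, looks roughly like this in Keras; the 10-class head and the commented-out training data are placeholders:

    import tensorflow as tf

    # Load a generic image model pre-trained on ImageNet, without its classifier head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet"
    )
    base.trainable = False  # freeze the generic features (transfer learning)

    # Add a small new head for the specific task, e.g. 10 disease categories.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # model.fit(train_images, train_labels, epochs=5)  # supply your own labeled data
    model.summary()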

As for using a hosted large model, Spinelli said that “We’re working to extend TF and Keras to make their high-level APIs useful for developers to access existing large-language or other generative models.”

Fine-Tuning in Google’s Models

There is mention of devs being able to train models with the new tools, but no mention of fine-tuning. TensorFlow’s own documentation defines fine-tuning as training “the weights of the top layers of the pre-trained model alongside the training of the classifier you added.”

Fine-tuning is something that Meta offers with its LLaMA model, but no other big LLM currently offers access to the weights. So I asked Spinelli if there is anything in the new tools that will help devs with this fine-tuning.

“In its strictest sense, fine-tuning involves creating an entirely new instance of a model, but with some parts retrained for one’s specific scenario,” he replied. “However, when dealing with LLMs, you don’t usually do that, with the exception that you noted [LLaMA], because of the storage and costs involved.”

Spinelli claims that developers can get the same overall effect of fine-tuning using what he called “prompt tuning” or “parameter efficient tuning” [PET]. He said that both can be done with MakerSuite. “You can also prompt tune and P.E.T. programmatically with the PaLM API,” he added.

With all that said, Spinelli noted there will be one exception to the “prompt tune” and PET approaches. With Cloud AI (part of the Google Cloud suite), he said, “you can fine-tune our code-generation model with your own codebase, and you’ll get a private VPC with that instance of our codegen model that you can use to generate code that is aware of your particular codebase as well as our general purpose one.”

An ML Hub

With all these new product announcements, Google clearly wants to become a hub for ML developers — similar to how it caters to web developers with regular browser, SEO and other web platform updates. The new front page for ML developers, ML Hub, is being positioned as a kind of portal to “enable developers to build bespoke ML solutions.” It will likely be similar to web.dev, Google’s front page for web developers.

Indeed, like Google’s web development tooling, there is something for everyone in Google’s newly expanded ML toolset — including ways to access those much larger, and trendier, generative models.

The post Google’s New TensorFlow Tools and Approach to Fine-Tuning ML appeared first on The New Stack.

]]>