February 9, 2023
Large language models – the artificial intelligence systems that can have seemingly human chat conversations – are very much in the news these days. And make no mistake, they are large and quite comprehensive.
But just how big can AI models get? The answer to that question provides a fascinating window into the development of AI language models and their potential applications.
Two Ways of Measuring
When experts talk about advances in AI, they usually focus on two aspects of a model’s size: the amount of data used to train the model, and the number of parameters that the model contains.
For example, GPT-3, the predecessor to the much-discussed ChatGPT, was trained on nearly 45 terabytes of text data, and has 175 billion parameters. Other AI models are getting even larger, with large companies working on models said to exceed 1.6 trillion parameters.
But what does that mean? Let’s break it down, starting with a look at what a parameter is.
A parameter is a numerical value inside an AI model, set during training on the data the model has seen, that the model uses to generate its output. In the case of a language model like GPT-3, for example, that output is text.
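To make the idea concrete, here is a minimal sketch in Python that counts the parameters in a small, fully connected neural network. The function name count_parameters and the layer sizes are hypothetical, chosen only for illustration. Models like GPT-3 use a far more elaborate architecture, but their parameters are likewise values such as weights and biases that are learned in training.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units in each layer, input first.
    (Hypothetical helper for illustration only.)
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per unit in the receiving layer
    return total

# A toy network: 4 inputs, 8 hidden units, 2 outputs.
print(count_parameters([4, 8, 2]))  # 58
```

Even this toy network has 58 adjustable values, which gives some sense of how quickly the count grows as layers get wider and deeper.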
“Although they really do seem like magic, it is easier to think of [natural language models] as highly sophisticated autocomplete functions,” said IEEE Member Yale Fox. “You provide an input, which is commonly referred to as a prompt, in the form of a question. The model then ‘autocompletes’ your answer.”
The output is based on previously recognized patterns. With large language models, the autocomplete function is vastly more complex because the models have been trained on more data and feature more parameters.
“The number of parameters has an influence on the variety of outputs; the more parameters used, the less repetitive your outputs will be,” Fox said.
Which brings up a second question. Just how much information is 45 TB of text? Quite a lot. One TB can hold approximately 6.5 million pages of documents stored in common formats like word processing files or PDFs.
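The arithmetic behind that comparison can be sketched in a few lines of Python. The pages-per-terabyte figure is the approximation quoted above, and the names used here are hypothetical.

```python
PAGES_PER_TB = 6_500_000   # approximate figure quoted in the article
TRAINING_DATA_TB = 45      # GPT-3 training text, as cited in the article

def pages_for(terabytes, pages_per_tb=PAGES_PER_TB):
    """Rough page-count equivalent of a given volume of text."""
    return terabytes * pages_per_tb

total_pages = pages_for(TRAINING_DATA_TB)
print(f"{total_pages:,} pages")  # 292,500,000 pages
```

By that rough measure, 45 TB of text is the equivalent of nearly 300 million pages.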
“As a general rule, having more data to train from leads to better performance in many types of models,” said IEEE Senior Member Eleanor “Nell” Watson. “It allows the model to learn more about the underlying patterns and relationships found within the data.”
But the number of parameters isn’t always directly related to the size of the training data. Developers could, for example, train a very large model on 10 books, or a smaller model on 1,000 books, and they may have similar performance.
“Larger models are exponentially more expensive to train, as well as vastly more difficult to audit for issues such as bias, and to make them explainable,” Watson said. “Having too many parameters applied to too little data can make a model more prone to overfitting (generalizing inaccurately from an example which is given too much prominence).”
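Overfitting can be illustrated with a toy example far simpler than a language model. The Python sketch below is an assumption-laden illustration, not anything Watson describes: it fits two models to five noisy points drawn from the line y = 2x. One is a straight line with 2 parameters, and the other is a polynomial with 5 parameters, one for every training point. All data and function names are made up for the demonstration.

```python
# Toy data: the true relationship is y = 2x, but each training point is noisy.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0]  # 2x plus noise of +1 or -1

# Held-out points that follow the true relationship exactly.
test_x = [0.5, 1.5, 2.5, 3.5, 5.0]
test_y = [2 * x for x in test_x]

def fit_line(xs, ys):
    """Least-squares straight line: a model with 2 parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_interpolant(xs, ys):
    """Polynomial through every training point (Lagrange form):
    as many parameters as there are data points."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

small = fit_line(train_x, train_y)
big = fit_interpolant(train_x, train_y)

for name, model in [("2 parameters", small), ("5 parameters", big)]:
    print(name,
          "train error:", mean_squared_error(model, train_x, train_y),
          "test error:", mean_squared_error(model, test_x, test_y))
```

The larger model reproduces the training points perfectly, noise included, but does much worse on the held-out points. At x = 5, where the true value is 10, the straight line predicts about 10.2 while the polynomial predicts 41.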
And simply having a large amount of data to train a model isn’t necessarily a benefit.
“Ten TB of tweets from people around the world may not be as useful as even 1 TB of fact-based knowledge from Wikipedia,” Fox said.
A Question of Scale
That situation is leading to some interesting questions in the world of AI. Namely, just how big can AI models get?
Researchers have noted that, for example, doubling the number of parameters in a model may not yield twice the performance, yet the larger model may cost many multiples more in money, time and computing resources to build. One solution might be to increase the amount of data used in training the model, though it is unclear how much data is needed, and whether enough of it exists.
“It’s therefore being argued that the greatest limiting factor to many latest models may in fact be a lack of quality data in sufficient scale and nuance to allow them to operate at their full capacity,” Watson said.
So, what accounts for the huge leap that language models have made in recent months?
Watson notes that these improvements are a result of a combination of factors, including an increase in the number of parameters, better use of data, and improved training techniques. The engineers behind ChatGPT have emphasized a “human-in-the-loop” approach, where the model is continually fine-tuned and improved based on feedback from human evaluators.
And, as a recent article in IEEE Computer Magazine points out, researchers have turned to a variety of techniques to improve AI models and their efficiency. These include better hardware and software and different computer architectures, as well as multi-modal training data that combine text with images or video.
“The trend of increasing AI model size does not appear to be ceasing,” the author notes. “Nevertheless, only a few major companies and resourceful institutes can keep pace with this trend because the barriers to entry are considerable.”