Google's new model uses the Switch Transformer approach to keep computation manageable as parameter counts grow.
Parameter count is a key driver of a machine learning model's capacity: broadly, the more parameters, the more complex the functions a model can represent. OpenAI's GPT-3, for example, one of the largest language models ever trained, has 175 billion parameters and can draw analogies and even write simple code.
Google researchers report that they have developed a language model with more than a trillion parameters. Their new 1.6-trillion-parameter model is said to offer up to 4 times the speed of the previous largest Google-trained language model, T5-XXL.
The researchers argue that large-scale training is the most effective path toward powerful models: simple architectures, backed by large datasets and large parameter counts, tend to surpass far more complicated algorithms. The catch is that large-scale training is extremely computationally intensive, which is why they turned to the Switch Transformer, a "sparsely activated" architecture that uses only a subset of the model's parameters to transform each input.
The idea is to have multiple expert models, each specialized for different kinds of inputs, all inside one larger model, with a "gating network" that chooses the right expert for any given input.
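To make the routing idea concrete, here is a minimal NumPy sketch of top-1 expert routing in the style the article describes: a small gating (router) layer scores the experts for each token, and only the single top-scoring expert's parameters are applied to that token. All dimensions, weights, and names here are toy assumptions for illustration, not the actual architecture or sizes used by Google.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions for illustration only)
d_model = 8       # token embedding width
num_experts = 4   # number of expert feed-forward blocks
num_tokens = 5    # tokens in this toy batch

# Gating network: a linear layer producing one score per expert
router_weights = rng.standard_normal((d_model, num_experts))

# Each "expert" is just a distinct linear transform in this sketch
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]

tokens = rng.standard_normal((num_tokens, d_model))

def switch_layer(x):
    """Route each token to its single highest-scoring expert (top-1 routing)."""
    logits = x @ router_weights                                   # (tokens, experts)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    chosen = probs.argmax(axis=1)                                 # one expert per token
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        # Only the chosen expert's parameters touch this token, so most of
        # the model's weights stay idle for any given input ("sparse activation").
        out[i] = probs[i, e] * (x[i] @ experts[e])
    return out, chosen

out, chosen = switch_layer(tokens)
print(out.shape, chosen)
```

The point of the sketch: total parameter count grows with the number of experts, but the per-token compute stays roughly constant, since each token only passes through one expert.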
You can find the full paper here. Don't forget to join our new Telegram channel and our Discord, and follow us on Instagram and Twitter, where we share breakdowns, the latest news, awesome artwork, and more.