This article is part of our coverage of the latest in AI research.
In early May, Meta released the Open Pretrained Transformer (OPT-175B), a large language model (LLM) that can perform a variety of tasks. Large language models have become one of the hottest areas of research in artificial intelligence over the past few years.
OPT-175B is the latest entrant in the LLM arms race triggered by OpenAI’s GPT-3, a deep neural network with 175 billion parameters. GPT-3 showed that LLMs can perform many tasks without additional training, after seeing only a few examples or none at all (few-shot or zero-shot learning). Microsoft later integrated GPT-3 into several of its products, showing not only the scientific but also the commercial promise of LLMs.
As the model’s name suggests, what makes OPT-175B unique is Meta’s commitment to “openness.” Meta has made the model available to the public (with a few caveats). It has also released a ton of details about the training and development process. In a post published on the Meta AI blog, the company described its release of OPT-175B as “democratizing access to large-scale language models.”
Meta’s move toward transparency is commendable. However, the competition over large language models has reached a point where it can no longer be democratized.
Meta’s release of OPT-175B has a few key features. It includes both the code required to train and use the LLM and the pretrained model. Pretrained models are especially useful for organizations that lack the computational resources to train models (training neural networks is much more resource-intensive than running them). The release will also help reduce the massive carbon footprint caused by the computational resources required to train large neural networks.
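To make the point about training versus running a model concrete, here is a back-of-the-envelope sketch. It assumes the widely used approximations of about 6·N·D floating-point operations to train a dense transformer with N parameters on D tokens, and about 2·N operations per generated token at inference time. The 180-billion-token training budget is an assumption for illustration, not a figure from this article.

```python
# Rough comparison of training vs. inference compute for a dense transformer.
# Approximations (from the scaling-law literature, not from Meta):
#   training  ~ 6 * N * D FLOPs  (forward + backward pass over D tokens)
#   inference ~ 2 * N FLOPs per generated token (forward pass only)

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total FLOPs to train a model with n_params on n_tokens."""
    return 6.0 * n_params * n_tokens


def inference_flops_per_token(n_params: float) -> float:
    """Approximate FLOPs to generate a single token."""
    return 2.0 * n_params


if __name__ == "__main__":
    n_params = 175e9   # OPT-175B parameter count
    n_tokens = 180e9   # ASSUMPTION: ~180 billion training tokens
    train = training_flops(n_params, n_tokens)
    per_token = inference_flops_per_token(n_params)
    print(f"training:  {train:.2e} FLOPs")
    print(f"inference: {per_token:.2e} FLOPs per token")
    # How many tokens one training run's worth of compute could generate
    print(f"tokens for the same compute: {train / per_token:.2e}")
```

Under these approximations, the compute of a single training run could generate hundreds of billions of tokens, which is why downloading pretrained weights saves so much.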
Like GPT-3, OPT comes in a variety of sizes, ranging from 125 million to 175 billion parameters (models with more parameters have greater learning capacity). At the time of this writing, all models up to OPT-30B are available for download. The full 175-billion-parameter model will be made available to select researchers and institutions who fill out a request form.
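To get a feel for what the range from 125 million to 175 billion parameters means in practice, the sketch below estimates how much memory each model’s weights occupy. It assumes 2 bytes per parameter (16-bit floats) and ignores activations and other runtime overhead; the intermediate sizes follow the OPT family as published by Meta and are worth double-checking against the official model cards.

```python
# Approximate memory needed just to hold each OPT model's weights.
# ASSUMPTION: 2 bytes per parameter (fp16/bf16). Activations, caches and
# optimizer state are ignored, so real requirements are higher.

OPT_SIZES = {
    "OPT-125M": 125e6,
    "OPT-350M": 350e6,
    "OPT-1.3B": 1.3e9,
    "OPT-2.7B": 2.7e9,
    "OPT-6.7B": 6.7e9,
    "OPT-13B": 13e9,
    "OPT-30B": 30e9,
    "OPT-66B": 66e9,
    "OPT-175B": 175e9,
}

BYTES_PER_PARAM = 2  # 16-bit floating point


def weight_memory_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, n in OPT_SIZES.items():
        print(f"{name:>9}: {weight_memory_gb(n):8.1f} GB")
```

The smallest model fits comfortably on a laptop, while OPT-30B already needs about 60 GB and OPT-175B about 350 GB for its weights alone.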
According to the Meta AI blog, “To maintain integrity and prevent abuse, we are releasing our model under a noncommercial license to focus on research use cases. Access to the model will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; along with industry research laboratories around the world.”
In addition to the models, Meta has released a full logbook that provides a detailed technical timeline of how OPT-175B was developed and trained. Published papers usually include information only about the final model. According to Meta, the logbook “provides valuable insight into how much compute was used to train OPT-175B and the human overhead required when underlying infrastructure or the training process itself becomes unstable at scale.”
In its blog post, Meta states that large language models are mostly accessible through “paid APIs” and that this restricted access “has limited researchers’ ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.”
It is a jab at OpenAI (and by extension Microsoft), which released GPT-3 as a black-box API service instead of making its model weights and source code available to the public. One of the reasons OpenAI gave for not making GPT-3 public was to prevent misuse and the development of harmful applications.
Meta believes that by making the models available to a wider audience, it will be in a better position to study them and prevent any harm they may cause.
Here’s how Meta describes the effort: “We hope that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field.”
However, it is worth noting that “transparency and openness” is not the same as “democratizing large language models.” The cost of training, configuring, and running large language models remains prohibitive and is likely to grow in the future.
According to Meta’s blog post, its researchers have managed to considerably reduce the cost of training large language models. The company says that the model’s carbon footprint is one-seventh that of GPT-3. Experts I previously spoke to estimated GPT-3’s training costs at up to $27.6 million.
This means that training OPT-175B would still cost several million dollars: even if costs fell in the same proportion as the carbon footprint, one-seventh of $27.6 million is roughly $4 million. Fortunately, the pretrained model eliminates the need to train the model from scratch, and Meta says it will provide the codebase used to train and deploy the full model “using only 16 NVIDIA V100 GPUs.” That is the equivalent of an Nvidia DGX-2, which costs about $400,000, not a small sum for a cash-constrained research lab or an individual researcher. (According to a paper that provides more details on OPT-175B, Meta trained its own model on 992 80GB A100 GPUs, which are significantly faster than the V100.)
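The “16 V100 GPUs” figure can be sanity-checked with simple arithmetic. This sketch assumes the 32 GB V100 variant used in the DGX-2, 16-bit weights (2 bytes per parameter), and an even split of the weights across GPUs; it counts weights only, so it gives a lower bound rather than a deployment plan.

```python
import math

# Weights-only check of how many GPUs are needed to hold OPT-175B.
# ASSUMPTIONS: 2 bytes per parameter, weights sharded evenly with no overhead.


def min_gpus_for_weights(n_params: float, gpu_mem_gb: float, bytes_per_param: int = 2) -> int:
    """Smallest number of GPUs whose combined memory can hold the weights."""
    total_gb = n_params * bytes_per_param / 1e9
    return math.ceil(total_gb / gpu_mem_gb)


if __name__ == "__main__":
    n = 175e9
    print("32 GB V100s needed:", min_gpus_for_weights(n, 32))  # 350 / 32 = 10.9 -> 11
    print("80 GB A100s needed:", min_gpus_for_weights(n, 80))  # 350 / 80 = 4.4 -> 5
    per_gpu = n * 2 / 1e9 / 16
    print(f"weights per GPU across 16 V100s: {per_gpu:.1f} GB of 32 GB")
```

Under these assumptions the weights alone need at least 11 of the 32 GB cards, so a 16-GPU machine leaves roughly 10 GB per card of headroom for activations and other runtime state.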
Meta AI’s logbook further confirms that training large language models is a very complex task. The OPT-175B timeline is riddled with server crashes, hardware failures, and other complications that require highly technical staff to resolve. The researchers had to restart the training process several times, tweak hyperparameters, and change loss functions. All of these incur extra costs that small labs cannot afford.
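The cost of restarts can be illustrated with a toy simulation. Everything in it is hypothetical: the step counts, the failure points, and the policy of halving the learning rate on every restart are illustrative stand-ins, not details taken from Meta’s logbook. The point is that every failure throws away the work done since the last checkpoint.

```python
# Toy simulation of a training run that restarts from checkpoints.
# HYPOTHETICAL: step counts, failure points and the "halve the learning
# rate on restart" policy are illustrative, not taken from Meta's logbook.


def simulate_run(total_steps: int, ckpt_every: int, failures: set[int], lr: float = 1.0):
    """Return (steps_executed, final_lr, restarts).

    When a failure occurs at a step, all progress since the last checkpoint
    is lost, training resumes from that checkpoint, and the learning rate
    is halved.
    """
    step = 0                 # next step to run
    last_ckpt = 0            # step at which the last checkpoint was saved
    executed = 0             # total steps of compute spent, including wasted ones
    restarts = 0
    pending = set(failures)  # each failure fires only once
    while step < total_steps:
        executed += 1
        if step in pending:
            pending.discard(step)
            restarts += 1
            lr /= 2          # hypothetical mitigation after an instability
            step = last_ckpt  # roll back to the last checkpoint
            continue
        step += 1
        if step % ckpt_every == 0:
            last_ckpt = step
    return executed, lr, restarts


if __name__ == "__main__":
    print(simulate_run(total_steps=100, ckpt_every=10, failures={25, 73}))
```

For example, simulate_run(100, 10, {25, 73}) returns (110, 0.25, 2): two failures cost 10 extra steps of compute in this toy run, and in a real run every wasted step translates into GPU-hours.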
Language models such as OPT and GPT are based on the transformer architecture. One of the key features of transformers is their ability to process long sequences of data (e.g., text) in parallel and at scale.
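The parallelism comes from self-attention, which relates every token in a sequence to every other token using a handful of matrix multiplications instead of a token-by-token loop. The following is a minimal single-head, causally masked sketch in NumPy; it is a simplified illustration (no multiple heads, layer normalization, or feed-forward block), and the sizes and random weights are arbitrary.

```python
import numpy as np

# Minimal single-head causal self-attention, the core operation of
# decoder-only transformers. Every position is processed by the same few
# matrix multiplications, with no sequential loop over tokens.


def causal_self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """x: (seq_len, d_model); w_*: (d_model, d_head). Returns (seq_len, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project all tokens at once
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (seq_len, seq_len) similarities
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)          # block attention to future tokens
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # each row sums to 1
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, d_head = 6, 16, 8
    x = rng.normal(size=(seq_len, d_model))
    w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    out = causal_self_attention(x, w_q, w_k, w_v)
    print(out.shape)  # (6, 8)
```

Because the whole sequence is handled as matrices, the work maps naturally onto GPUs, which is what makes training on very large text corpora feasible.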
In recent years, researchers have shown that adding more layers and parameters to transformer models improves their performance on language tasks. Some researchers believe that reaching higher levels of intelligence is only a problem of scale. Accordingly, cash-rich research labs like Meta AI, DeepMind (owned by Alphabet), and OpenAI (backed by Microsoft) are moving toward building bigger and bigger neural networks.
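The link between layers and parameters can be made concrete with a common rule of thumb for GPT-style decoder-only transformers: each layer contributes roughly 12·d² weights, where d is the hidden size (about 4·d² for the attention projections and 8·d² for the feed-forward block). The sketch below ignores biases and layer norms; the OPT-175B shape it plugs in (96 layers, hidden size 12,288, a vocabulary of about 50,000 tokens, 2,048 positions) is stated as an assumption to check against Meta’s published configuration.

```python
# Rough parameter count for a GPT-style decoder-only transformer.
# Per layer: ~4*d^2 attention weights (Q, K, V, output projections)
#          + ~8*d^2 feed-forward weights (two d x 4d matrices) = ~12*d^2.
# Biases and layer norms are ignored.


def approx_params(n_layers: int, d_model: int, vocab_size: int, n_positions: int = 2048) -> int:
    per_layer = 12 * d_model ** 2
    embeddings = (vocab_size + n_positions) * d_model  # token + position embeddings
    return n_layers * per_layer + embeddings


if __name__ == "__main__":
    # ASSUMPTION: OPT-175B-like shape with a ~50k-token vocabulary
    n = approx_params(n_layers=96, d_model=12288, vocab_size=50272)
    print(f"{n / 1e9:.1f}B parameters")
    # Doubling the depth roughly doubles the non-embedding parameters
    print(f"{approx_params(192, 12288, 50272) / 1e9:.1f}B parameters")
```

With those assumptions the estimate lands at roughly 174.6 billion parameters, close to the advertised 175 billion, and doubling the number of layers roughly doubles the count.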
Last year, Microsoft and Nvidia created a 530-billion-parameter language model called Megatron-Turing NLG (MT-NLG). Last month, Google introduced the Pathways Language Model (PaLM), an LLM with 540 billion parameters. And there are rumors that OpenAI will release GPT-4 in the next few months.
However, larger neural networks also require greater financial and technical resources. And while larger language models will have new bells and whistles (and new failures), they will inevitably centralize power in the hands of a few wealthy companies, making it even harder for smaller research labs and independent researchers to work on large language models.
On the business side, big tech companies will have an even greater advantage. Running large language models is very expensive and challenging. Companies like Google and Microsoft have specialized servers and processors that allow them to run these models at scale and profitably. For smaller companies, the overhead of running their own version of an LLM such as GPT-3 is prohibitive. Just as most businesses use cloud hosting services rather than setting up their own servers and data centers, out-of-the-box systems such as the GPT-3 API will gain more traction as large language models become more popular.
This in turn will further centralize AI in the hands of big tech companies. More AI research labs will have to partner with big tech to fund their research. And it will give big tech more power to decide the future directions of AI research (which will probably be aligned with their financial interests). This may come at the cost of areas of research that do not offer a short-term return on investment.
The bottom line is that, while we celebrate Meta’s move to bring transparency to LLMs, let’s not forget that large language models are undemocratic by nature and favor the very companies that are promoting them.
This article was originally written by Ben Dickson and published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the bad side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article here.


