GPT-4 will arrive next week, according to Microsoft, and its great novelty is that it will be “multimodal”


This week, four engineers from Microsoft’s German division held an event dedicated to the revolution that LLMs (Large Language Models) like GPT are bringing to the company. During that conference, they shared unexpected details about the anticipated new version of the OpenAI model.

GPT-4. When GPT-3 appeared in 2020, it did so as a private beta. That kept the model from showing what it could do, but in 2022 the arrival of ChatGPT, based on an iteration of GPT-3, changed everything. For months there has been talk of what awaits us with GPT-4, and according to Heise Online, Andreas Braun, CTO of Microsoft Germany, stated that the model will arrive next week.

Kosmos-1. The arrival of GPT-4 seemed especially close after Microsoft announced in early March the release of Kosmos-1, a Multimodal Large Language Model (MLLM) that responds not only to text prompts but also to images. That makes it behave somewhat like Google Lens: it can extract information and context from an image.

Bigger, better. One characteristic clearly expected of GPT-4 is that it will be larger than GPT-3. While GPT-3 has 175 billion parameters, there is talk that GPT-4 will have 100 trillion, a figure that Sam Altman, CEO of OpenAI, dismissed as “complete stupidity”. Even so, what is certain is that it will be bigger, and that will allow it to handle more complex situations and generate even more “human” responses.


Multimodal? This is one of the great innovations of GPT-4, if not the greatest: a multimodal model that, as Kosmos-1 already hinted, will accept input from different sources or “modalities” such as text (what ChatGPT uses), images, video, spoken voice, or other formats.

Give me data and I will analyze it. These models use deep learning and natural language processing to understand the relationships and correlations between these different types of data. By combining multiple “modalities”, the AI model can improve its accuracy and provide analysis of complex data.

An example: video. An immediate practical application of these models is video. With GPT-4, theoretically, a video and its associated audio can be given as input so that the engine understands the conversation and even the emotions of those taking part in it. It will also be able to recognize objects (or people) and extract information. Thus, one could get a summary of a movie or a YouTube video, much as we now get meeting summaries.

Saving time. One of Microsoft’s engineers pointed out how this type of engine would be helpful in call centers, where GPT-4 could transcribe calls and then summarize them, something human agents normally have to do. According to his estimates, this could save 500 hours of work a day for a Microsoft customer in the Netherlands that receives 30,000 calls a day. The prototype was created in two hours, a developer spent a couple of weeks on it, and the result was apparently a success.
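
To put those figures in perspective, the quoted estimate can be broken down per call. The short Python sketch below only restates the two numbers given above (500 hours and 30,000 calls per day); the function name is hypothetical and the result is simple arithmetic, not a figure reported by Microsoft.

```python
def minutes_saved_per_call(hours_saved_per_day: float, calls_per_day: int) -> float:
    """Convert a daily time saving into the average saving per call, in minutes."""
    return hours_saved_per_day * 60 / calls_per_day


# Figures quoted in the article: 500 hours saved across 30,000 calls a day.
print(minutes_saved_per_call(500, 30_000))  # prints 1.0
```

In other words, the estimate amounts to roughly one minute of summarizing work saved on each call.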

GPT-4 will continue to make mistakes. Although the new model will undoubtedly be more powerful, Microsoft’s engineers wanted to make it clear that artificial intelligence will not always answer correctly and that its answers will need to be validated.

Just in case, let’s be cautious. Expectations for GPT-4 are enormous, and Sam Altman himself, CEO of OpenAI, made it clear weeks ago that the industry and users should lower them because “people are crying out to be disappointed, and that is what will happen.”
