The release of ChatGPT shook the entire technology community. One remarkable thing OpenAI achieved was bringing everyone to a consensus, in a very short time, about the tremendous potential of this technology. Such a widespread consensus is not easy to achieve.
I consider myself a technology enthusiast, and I believe the emergence of ChatGPT is a significant development and the beginning of an era. I have described it as “the steam engine of the intelligent era.” ChatGPT still falls short of what the intelligent era will require, but when we look back in three years, it will undoubtedly be a different story. Back in February this year, some of my friends expressed doubts about this, and even some senior figures in the NLP field believed it was just a probabilistic game. It is normal for views to differ when something new appears. Today, however, I believe fewer and fewer people hold such opinions, and even those outside the technical community are increasingly buying into this technology.
We can also see that people in the technology community are quite anxious. Over the past few months, every morning when we wake up and check the news, new models or new applications have come out. It is hard enough to keep up with all the papers, let alone try everything out ourselves, and that is where the anxiety comes from.
Large models such as GPT-4, PaLM 2, and Stable Diffusion are progressing extremely rapidly. At a Zhihu conference held in April, I mentioned that, objectively, large models still differ in their Chinese and English language capabilities, with their Chinese ability being slightly weaker. The primary reason is data. Large models have become completely data-centric: in this era, the amount and quality of data determine a model’s capability.
In our work, however, we can also see that the corpus in China is still very rich, even though the publicly accessible data may be relatively limited. We are pleased to see that the Beijing municipal government recently released a document (“Measures for Promoting the Innovative Development of General Artificial Intelligence in Beijing (2023-2025) (Draft for Soliciting Opinions)”). The government is also aware of the importance of open data and is organizing the industrial and academic sectors to build open databases.