
Will Sora Put Hollywood Out of a Job?


(About the author: Xu Siqing, founding partner and CEO of Alpha Startup Fund, is a serial entrepreneur who has embarked on three entrepreneurial ventures. He contributed to the IPO of Alpha Startup Fund on NASDAQ as COO in 2010. He served as an investment partner at Sinovation Ventures, Chief Marketing Officer at Qihoo 360, and later joined WI Harper Group as Managing Director, overseeing investment and management operations in China. In 2015, he established angel investment firm Alpha Startup Fund.

Xu has over 20 years of experience in the IT, Internet, and telecommunications industries. He established Microsoft's South China business and served as its first General Manager, and he has also held positions including General Manager of Data Business at China Netcom Co., Ltd. and Chief Marketing Officer at eLong Travel.

Xu was awarded the title of "Forbes China Best Venture Capitalist Top 100" in 2020, 2022, and 2023, and has received multiple other awards, including 36Kr's "Most Popular Investor Among Entrepreneurs in 2023." Xu holds a Bachelor’s degree in modern mechanics from the University of Science and Technology of China and a Master’s degree in material physics from the Chinese Academy of Sciences.)

BEIJING, February 19 (TMTPOST) -- OpenAI released its first AI video generation model, Sora, last Thursday. The release marks a historic milestone: the diffusion model, combined with the transformer architecture that OpenAI has used so successfully, has achieved a breakthrough in visual generation comparable to the one large language models achieved in text. Undoubtedly, a commercial revolution in the field of visual generation will follow.

This article will discuss: 1. What Sora is and how it works; 2. The industrial opportunities Sora presents; 3. Whether AI startups will fail to survive.

What Sora is and how it works

Sora has redefined the standards for AI video generation models in several aspects:

· Sora has extended the video length from the previous 5 to 15 seconds to one minute, which fully meets the needs of short-video creation. According to OpenAI, generating videos longer than one minute would be easy if needed.

· Sora can generate multiple shots within one video, keeping characters and visual style consistent from shot to shot.

· Sora can generate videos from text prompts and also supports video-to-video editing. It can generate high-quality images as well. Sora can even stitch together completely different videos so that they merge into one coherent piece.

· Sora is a large visual model that combines diffusion with a Transformer. It exhibits emergent capabilities and shows a deeper understanding of, and ability to interact with, the real world, resembling an early form of a world model.

Sora can generate more realistic, highly consistent multi-shot long videos

OpenAI released dozens of sample videos, demonstrating the powerful capabilities of the Sora model.

Facial features such as pupils, eyelashes, and skin texture are so realistic that the naked eye can detect no flaws, an enormous improvement in authenticity over previous AI-generated videos. The gap between AI videos and reality is becoming ever harder to discern.

A drone's-eye view of Tokyo street scenes showcases Sora's strengths in rendering complex settings and natural character movement.

A retro SUV driving along mountain roads looks highly realistic.

Sora can also bridge two input videos, creating seamless transitions between clips with completely different themes and scenes.

How the Diffusion Model+Transformer Works

Inspired by the large-scale training of large language models, the OpenAI team took a similar approach. Just as large language models operate on text tokens, Sora operates on chunks of visual data. The team first compresses a video into lower-dimensional latent features and then decomposes that representation into spatiotemporal patches, which play the same role for Sora's training that tokens play in large language models.
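The patch-splitting step can be sketched in a few lines of Python. This is a minimal illustration only, not OpenAI's implementation: the function name patchify, the patch sizes, and the toy 4 x 4 x 4 latent grid are all assumptions made for the example, and a real system would work on multi-channel tensors with a tensor library rather than nested lists.

```python
def patchify(latent, pt, ph, pw):
    """Split a T x H x W latent grid (nested lists) into flattened spacetime patches.

    Hypothetical helper for illustration; pt, ph, pw are the patch sizes
    along time, height, and width.
    """
    T, H, W = len(latent), len(latent[0]), len(latent[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("each dimension must be divisible by its patch size")
    patches = []
    for t0 in range(0, T, pt):
        for h0 in range(0, H, ph):
            for w0 in range(0, W, pw):
                # One patch covers a small block of frames AND a small spatial
                # window, so it carries both temporal and spatial information.
                patches.append([
                    latent[t][h][w]
                    for t in range(t0, t0 + pt)
                    for h in range(h0, h0 + ph)
                    for w in range(w0, w0 + pw)
                ])
    return patches


# Toy "latent video": 4 frames of 4 x 4 values, each value encoding its position.
latent = [[[t * 16 + h * 4 + w for w in range(4)] for h in range(4)] for t in range(4)]
tokens = patchify(latent, pt=2, ph=2, pw=2)
print(len(tokens), len(tokens[0]))  # 8 patches, each holding 8 values
```

Each flattened patch then plays the role of one token in the sequence fed to the transformer.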

Simply put, Sora has tokenized images and videos.

Sora is a video model based on diffusion; more specifically, it is a diffusion transformer. Transformers have already proven their capability across language, vision, and image generation.

It builds on earlier research on the DALL·E and GPT models. It adopts the re-captioning technique of DALL·E 3, and it leverages GPT to turn user prompts into more detailed captions, so that the model follows text instructions more accurately when generating videos.

So, Sora is a visual large model that combines a diffusion model with a transformer.
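The diffusion half of that combination can be illustrated with the standard forward-noising formula of DDPM-style diffusion models, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps. The sketch below is a toy under stated assumptions: OpenAI has not published Sora's noise schedule or training details, so the linear schedule, the function names, and the four-number "patch" are hypothetical, and the true noise stands in for what a trained transformer would predict.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward process: blend the clean values with Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt, predicted_eps, alpha_bar):
    """Invert the forward process, given a prediction of the noise."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, predicted_eps)]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
x0 = [0.5, -1.0, 0.25, 2.0]              # one toy latent patch
xt, eps = add_noise(x0, alpha_bars[499], rng)
# A trained transformer would predict eps from xt (plus the text prompt);
# here the true noise is passed in to show that a perfect prediction recovers x0.
x0_hat = estimate_x0(xt, eps, alpha_bars[499])
```

In training, the transformer's job is to make its noise prediction as close as possible to the true noise; generation then starts from pure noise and removes the predicted noise step by step.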

In addition to generating videos from text instructions, the model can turn an existing static image into a video, animating the image's content accurately and in detail. It can also extend existing videos or fill in missing frames.

The emergence of Sora has further widened the gap between China and the U.S. in the field of AI.

Major flaws to be addressed

However, despite the significant improvements in technology and performance, Sora still has many limitations, particularly in understanding complex scenes involving physical principles, cause-and-effect relationships, spatial details, and the passage of time. For example, it does not represent the shattering of glass well.

In another sample, the candle flame shows no change before and after the candle is blown out.

It also got the direction wrong for a person running on a treadmill.

OpenAI has provided only demonstrations of generated videos, and the release of Sora has sparked concerns about the misuse of video generation technology. For this reason, the company has not made Sora available for public use, but has carefully selected a group of "trusted" professionals for testing.

Industrial opportunities Sora presents

Firstly, this marks a milestone in technological advancement.

Secondly, in video applications, an impressive demonstration does not equal practical usefulness. If commercialization requires a score of 100 (60 points for technology + 40 points for scenario), human beings could score 90 points, while Sora scores 60 points, or up to 75 points. The remaining distance to commercialization must be covered by manual effort or by a combination of technological and business innovation.

First, controllability. Whether in commercial or creative scenarios, videos must be produced in line with human intent or objective laws, which poses a huge challenge for Sora.

Take physical modeling, which some have raised. Sora can generate beautiful, flashy motion, but a specific scenario such as a rubber ball bouncing repeatedly off the ground requires the support of a physical model, which goes beyond the capability of current diffusion + transformer technology.
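To make "the support of a physical model" concrete, here is a minimal sketch of the rule such a model would enforce for a bouncing ball. It assumes an idealized ball with a constant coefficient of restitution and no air resistance; the function name and the numbers are illustrative and are not taken from any Sora material.

```python
def bounce_peaks(initial_height, restitution, num_bounces):
    """Peak height reached after each successive bounce.

    Each impact scales the ball's speed by the restitution coefficient e.
    Peak height is proportional to speed squared, so it scales by e ** 2.
    """
    if not 0.0 <= restitution <= 1.0:
        raise ValueError("restitution must lie between 0 and 1")
    peaks = []
    height = initial_height
    for _ in range(num_bounces):
        height *= restitution ** 2
        peaks.append(height)
    return peaks


# A ball dropped from 1 m that keeps half its speed at every impact.
print(bounce_peaks(1.0, 0.5, 3))  # [0.25, 0.0625, 0.015625]
```

A generated video is physically plausible only if every successive peak obeys a constraint of this kind, which a model trained purely on pixels is not guaranteed to respect.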

Second, prompting remains a technical challenge. Non-professionals generally find it difficult to use visual generation effectively; turning laymen into experts will take both training and technological breakthroughs.

As such, there is still considerable room for improvement in creation for practical scenarios. Beyond the 60 to 75 points the model delivers, there are opportunities for scenario innovation.

The opportunities for scenario innovation belong to creators who understand the scenarios and the models.

Those who have watched the TV drama "Blossoms" know that, for famous directors like Wong Kar-wai, new technological tools can at most improve the efficiency of presenting specific scenes. Characters like Ba Zong, Lingzi, and Ye Shu cannot be replaced by machines in the short term.

What we can expect is not AI putting filmmakers out of work, but AI empowering them to create better works.

Will AI Startups at Home and Abroad Fail to Survive?

First of all, winner-takes-all does not apply in every case. A notable feature of the U.S. business ecosystem is that top-tier companies build platforms, second-tier companies produce a full range of products, and third-tier companies focus on winning customers.

OpenAI's Sora marks significant engineering progress, of a kind more often associated with state-funded scientific research. This breakthrough, however, was first realized in industry rather than academia, and there is still some distance to go before commercialization.

Leading companies need to secure their position in key areas, make technological breakthroughs, build platforms, and develop some vertical applications. But they place more emphasis on attracting a wide range of developers than on trying to cover every application themselves, which would spread them too thin.

Therefore, there is a lot to be achieved beyond 60 points. The thousands of applications built on Salesforce make this clear.

Secondly, OpenAI's paper clearly lays out the path to supporting 60-second videos, saving many startups tens of millions in exploration costs and giving entrepreneurs a great deal of room for imagination.

If only 15 seconds are needed, if the video subject must be more controllable, or if the subject's path through the video must be controlled, could there be other options? Could the diffusion transformer be used in a better way? Again, the capability of the model determines how high a startup team can reach, and above a score of 60, the applications built on the model will set teams apart. Startups that understand both models and applications have great opportunities.

In the U.S. market, large companies playing catch-up like to narrow the gap through mergers and acquisitions, and small teams that start quickly and move fast are highly valued when they join large companies.

In China, mergers and acquisitions are less active, and big tech companies prefer to enter a field and do everything themselves. But with OpenAI moving so fast and so many opportunities emerging in such a large field, big companies can hardly avoid considering other options, lest a rival beat them to the punch.

Once again, this is a grand arena where everyone can play on an equal footing.

Admittedly, behind large video models lies super-linear growth in the computing power needed for training and inference. Rising demand, coupled with a greater need for computing power, infrastructure, and tools, has generated more new opportunities than ever before for Chinese and U.S. entrepreneurs.

(This article was first published on the TMTPost App)

