The Unseen Architects of Video Generation AI: Training Data
Following the release of Sora 2 two days ago, Sam Altman has become a fixture of popular social media videos.
[Video captions: In Shanghai · In Ancient China · Talks to a random person]
Models like Sora 2 are gaining popularity; however, detailed public documentation of their underlying training methodologies remains scarce. OpenAI has noted only that Sora takes inspiration from large language models (LLMs), which acquire generalist capabilities by training on “internet-scale data.” OpenAI may have scraped YouTube content without Google’s permission. Google’s Veo, on the other hand, is assumed to benefit from YouTube’s high-quality video. The implication is clear: the ability to generate realistic video depends heavily on access to petabytes of high-quality, varied footage. ...