Sora Official Introduction (English)

Sora is an AI model that can create realistic and imaginative scenes from text instructions.

Capabilities

We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.

Introducing Sora, our text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.

Today, Sora is becoming available to red teamers to assess critical areas for harms or risks. We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.

We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon.

Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.

The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.

Safety

We’ll be taking several important safety steps ahead of making Sora available in OpenAI’s products. We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model.

We’re also building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora. We plan to include C2PA metadata in the future if we deploy the model in an OpenAI product.

In addition to developing new techniques to prepare for deployment, we’re leveraging the existing safety methods that we built for our products that use DALL·E 3, which are applicable to Sora as well.

For example, once in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others. We’ve also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it’s shown to the user.
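As a purely illustrative aside, the two-stage review described above can be sketched as a wrapper that screens the text prompt before generation and every output frame after it. `text_policy_check`, `frame_policy_check`, and `generate_video` below are hypothetical stand-ins, not OpenAI's actual classifiers:

```python
# Conceptual sketch of the two-stage safety flow described above: screen the
# text prompt first, then screen every frame of the generated video before
# returning it. All three callables are hypothetical stand-ins.
def generate_safely(prompt, generate_video, text_policy_check, frame_policy_check):
    if not text_policy_check(prompt):       # reject violating prompts up front
        raise ValueError("Prompt violates usage policies")
    frames = generate_video(prompt)         # e.g. a list of frame images
    for frame in frames:                    # review each generated frame
        if not frame_policy_check(frame):
            raise ValueError("Generated video violates usage policies")
    return frames                           # only compliant output reaches the user
```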

We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.

Research techniques

Sora is a diffusion model, which generates a video by starting off with one that looks like static noise and gradually transforms it by removing the noise over many steps.
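To make “removing the noise over many steps” concrete, here is a minimal DDPM-style sampling loop. It assumes a hypothetical trained network `denoiser(x, t)` that predicts the noise present in its input; Sora's actual sampler, schedule, and representation are not public, so this shows only the general recipe:

```python
# Minimal DDPM-style sampling sketch: start from pure static and repeatedly
# subtract the noise a trained network predicts. Values and shapes are
# illustrative; `denoiser(x, t)` is assumed to predict the noise in x at step t.
import torch

@torch.no_grad()
def sample_video(denoiser, shape=(16, 3, 64, 64), steps=50):
    x = torch.randn(shape)                     # frames x channels x H x W of noise
    betas = torch.linspace(1e-4, 0.02, steps)  # illustrative noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    for t in reversed(range(steps)):
        eps = denoiser(x, t)                   # predicted noise at this step
        # Standard DDPM update: strip the predicted noise, then re-inject a
        # small amount of fresh noise on every step except the last.
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x                                   # denoised tensor read as video frames
```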

Sora is capable of generating entire videos all at once or extending generated videos to make them longer. By giving the model foresight of many frames at a time, we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily.

Similar to GPT models, Sora uses a transformer architecture, unlocking superior scaling performance.

We represent videos and images as collections of smaller units of data called patches, each of which is akin to a token in GPT. By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions and aspect ratios.
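As intuition for what a “patch” is, the sketch below cuts a raw video tensor into non-overlapping spacetime cubes and flattens each cube into one token row. The real pipeline is understood to operate on compressed latent representations rather than raw pixels, so this only shows the shape arithmetic:

```python
# Illustrative spacetime "patchify": split a (T, C, H, W) video into
# non-overlapping pt x ph x pw cubes and flatten each cube into one token,
# the video analogue of ViT image patches. Shapes only; not Sora's tokenizer.
import torch

def patchify(video, pt=4, ph=16, pw=16):
    """video: (T, C, H, W) -> (num_patches, pt*ph*pw*C) token matrix."""
    T, C, H, W = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = video.reshape(T // pt, pt, C, H // ph, ph, W // pw, pw)
    x = x.permute(0, 3, 5, 1, 4, 6, 2)      # group cube indices, then cube contents
    return x.reshape(-1, pt * ph * pw * C)  # one row per spacetime patch

video = torch.randn(16, 3, 256, 256)        # 16 frames of 256x256 RGB
print(patchify(video).shape)                # torch.Size([1024, 3072])
```

Because any clip, whatever its duration or resolution, reduces to a variable-length sequence of such tokens, a single transformer can be trained across all of them.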

Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user’s text instructions in the generated video more faithfully.
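The recaptioning step can be pictured as a preprocessing pass over the training set, as in the sketch below; `captioner` is a hypothetical video-captioning model standing in for the one OpenAI trained:

```python
# Sketch of DALL·E 3-style recaptioning: replace sparse original captions with
# rich model-generated descriptions, then train text-to-video on those pairs.
# `captioner` is a hypothetical video-captioning model.
def recaption_dataset(videos, captioner):
    """Yield (detailed_caption, video) training pairs."""
    for video in videos:
        yield captioner(video), video  # highly descriptive synthetic caption
```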

Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.

In addition to being able to generate a video solely from text instructions, the model is able to take an existing still image and generate a video from it, animating the image’s contents with accuracy and attention to small detail. The model can also take an existing video and extend it or fill in missing frames. Learn more in our technical report.
