How ChatGPT Works

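Summary: Besides the self-attention mechanism, the Transformer architecture also uses techniques such as residual connections and layer normalization to speed up training and improve model performance. A Transformer typically consists of an encoder and a decoder: the encoder maps the input sequence to a sequence of hidden representations, and the decoder uses those hidden representations to generate the output sequence. In the self-attention sublayer, every element of the input sequence can interact with every other element, and learned attention weights determine how strongly different positions are related.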
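
As a rough illustration of the residual connection and layer normalization described above, here is a minimal pure-Python sketch of one "Add & Norm" step in the post-norm arrangement used by the original Transformer, LayerNorm(x + Sublayer(x)). It assumes a hypothetical toy_sublayer (a simple scaling) in place of a real self-attention or feed-forward layer, and it omits the learnable gain and bias that real layer normalization has.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance.

    The learnable gain and bias of real layer normalization are
    omitted here (equivalent to gain=1, bias=0).
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add_and_norm(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Post-norm residual block: LayerNorm(x + sublayer(x))."""
    y = sublayer(x)
    # Residual connection: add the sublayer output back onto its input.
    residual = [a + b for a, b in zip(x, y)]
    return layer_norm(residual)


def toy_sublayer(x: Vector) -> Vector:
    # Hypothetical stand-in for a self-attention or feed-forward sublayer.
    return [0.5 * v for v in x]


if __name__ == "__main__":
    out = add_and_norm([1.0, 2.0, 3.0, 4.0], toy_sublayer)
    print([round(v, 3) for v in out])
```

The output of the block has zero mean and approximately unit variance whatever the sublayer returns, while the residual path guarantees the original input still reaches the output even when the sublayer contributes little; this is commonly cited as one reason such connections make deep stacks easier to train.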

At its core, ChatGPT is built on the Transformer architecture.

Quoted from the original English source:

Transformers are based on the “attention mechanism,” which allows the model to pay more attention to some inputs than others, regardless of where they show up in the input sequence. For example, let’s consider the following sentence:
[Figure: the example sentence (image not preserved)]

In this scenario, when the model is predicting the verb “bought,” it needs to match the past tense of the verb “went.” In order to do that, it has to pay a lot of attention to the token “went.” In fact, it may pay more attention to the token “went” than to the token “and,” despite the fact that “went” appears much earlier in the input sequence.
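
The intuition in this passage can be checked with a toy calculation of scaled dot-product attention, softmax(q·k / √d). The sketch below scores a query for "bought" against keys for "went" and "and". The 2-dimensional vectors are made up by hand (dimension 0 is a hypothetical "past tense" feature, dimension 1 a hypothetical "conjunction" feature); real models learn high-dimensional query and key projections. Because the score is computed from the vectors' content rather than from how far apart the tokens are, "went" receives the larger weight.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Vector, keys: Dict[str, Vector]) -> Dict[str, float]:
    """Scaled dot-product attention weights of one query over named keys."""
    d = len(query)
    names = list(keys)
    scores = [dot(query, keys[n]) / math.sqrt(d) for n in names]
    return dict(zip(names, softmax(scores)))


# Hypothetical hand-picked vectors:
# dimension 0 ~ "past tense", dimension 1 ~ "conjunction".
query_bought = [2.0, 0.0]
keys = {
    "went": [2.0, 0.1],  # appears early in the sentence
    "and": [0.1, 2.0],   # appears just before "bought"
}

if __name__ == "__main__":
    for token, w in attention_weights(query_bought, keys).items():
        print(f"{token}: {w:.3f}")
```

Running it prints a weight of roughly 0.94 for "went" and 0.06 for "and". This toy ignores word order entirely; real Transformers add positional information to the input embeddings, so position can influence the scores, but nothing forces distant tokens to receive less weight.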

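The attention mechanism lets the model, while processing an input sequence, attend to information at every position in the sequence at the same time, so it can better capture long-range dependencies.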
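
A minimal sketch of single-head self-attention over a whole sequence, in pure Python for readability. Every position's query is scored against every position's key in one step, producing an n×n weight matrix, so the first and last tokens are connected directly however long the sequence is. To keep it short, the sketch assumes the queries, keys and values are simply the input vectors themselves (no learned projection matrices), and it includes an optional causal mask of the kind used in decoder-style models, where each position may only attend to itself and earlier positions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Masked entries are -inf; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, causal: bool = False) -> Tuple[Matrix, Matrix]:
    """Single-head self-attention with Q = K = V = x (no learned projections).

    Returns (weights, outputs): weights[i][j] is how much position i
    attends to position j; outputs[i] is the weighted sum of value rows.
    """
    n, d = len(x), len(x[0])
    weights: Matrix = []
    for i in range(n):
        scores = []
        for j in range(n):
            if causal and j > i:
                # Causal mask: position i may not look at later positions.
                scores.append(float("-inf"))
            else:
                scores.append(sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d))
        weights.append(softmax(scores))
    outputs = [
        [sum(weights[i][j] * x[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return weights, outputs


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
    w, _ = self_attention(seq)
    # Full attention: the last position attends directly to the first.
    print(round(w[3][0], 3))
    w_causal, _ = self_attention(seq, causal=True)
    # With the mask, the first position can only attend to itself.
    print(w_causal[0])
```

Each row of the weight matrix sums to 1, and every pair of positions is linked in a single step rather than through a chain of intermediate steps, which is the property the paragraph above refers to.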
