知者也

All tags

#Deep Learning ¹ #Ubuntu ⁵ #Electric Power ⁰ #Math ³ #Katex ¹ #Latex ¹ #Optimization ³ #test ²

Paper Close Reading #Deep Learning

A Close Reading of Attention Is All You Need: Understanding the Transformer Model

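This post records my reading of the paper Attention Is All You Need and of the Transformer model. The explanation of the principles draws on "Transformer代码完全解读" (a complete walkthrough of the Transformer code). Why self-attention? For an input sequence of length n in which each token's feature representation (embedding) has dimension d, Self-At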

Zoecitron

Published on 2024-04-11