Alejandro Armenta's Blog
LLM Engineer
Friday, August 22, 2025
Masked Multihead Attention
I implemented a masked multi-head attention mechanism for LLMs.
This is the code:
The context vectors from the individual heads are concatenated along the last dimension, with each head learning something different from the input x.
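To make the concatenation step concrete, here is a minimal sketch of masked multi-head attention. It is an illustration under assumptions, not necessarily the implementation from this post: it uses NumPy rather than a deep learning framework, and the names `causal_attention_head`, `masked_multihead_attention`, `w_q`, `w_k`, `w_v`, and all the sizes are hypothetical choices made for the example. Each head computes its own causally masked context vectors, and the results are joined along the last dimension.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def causal_attention_head(x, w_q, w_k, w_v):
    """One masked (causal) attention head (hypothetical helper).

    x: (batch, num_tokens, d_in); w_q, w_k, w_v: (d_in, d_head).
    Returns context vectors of shape (batch, num_tokens, d_head).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    # Scaled dot-product scores: (batch, num_tokens, num_tokens).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    num_tokens = x.shape[1]
    # True above the diagonal marks future positions a token must not see.
    future = np.triu(np.ones((num_tokens, num_tokens), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v


def masked_multihead_attention(x, heads):
    """heads: list of (w_q, w_k, w_v) tuples, one per head."""
    contexts = [causal_attention_head(x, *weights) for weights in heads]
    # Concatenate along the last dimension:
    # (batch, num_tokens, num_heads * d_head).
    return np.concatenate(contexts, axis=-1)


# Assumed toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
batch, num_tokens, d_in, d_head, num_heads = 2, 4, 6, 3, 2
x = rng.normal(size=(batch, num_tokens, d_in))
heads = [
    tuple(rng.normal(size=(d_in, d_head)) for _ in range(3))
    for _ in range(num_heads)
]
out = masked_multihead_attention(x, heads)
print(out.shape)  # (2, 4, 6)
```

With two heads of size 3, the output's last dimension is 6: the first three columns come from the first head and the last three from the second. Because of the mask, the output for a token depends only on that token and the ones before it.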