Mixhead: Breaking the low-rank bottleneck in multi-head attention language models.
2022