Addons Top — Hyperdeep
Is this for a description or a creative writing project?
Let $x$ be the input to a top layer. In a standard transformer, the output is $y = Wx + b$. In HDAT, the weight matrix $W$ is decomposed. The output becomes: $$ y = W_\textfrozenx + \textAddon(x, z) $$ Where $z$ is the latent code generated by the hypernetwork conditioning on the task context. The "Top" aspect refers to the specific focus on the final $N$ layers where $z$ has the highest variance and impact. hyperdeep addons top