
in reply to ☆ Yσɠƚԋσʂ ☆

Honestly, this makes more sense than shoving everything into one model. Some of the main models seem to use a single one to do everything, and I would break them into parts: have a model filter the input, try to categorize it, and feed it to the proper model. Like having a "council of experts" and asking the right "expert".

  • I am not an expert in AI models
in reply to RedWeasel

That's part of the idea with the whole mixture of experts (MoE) approach in newer models actually.

Rather than using a single neural net that's, say, 512 wide, you split it into eight channels/experts of 64 each. If the network can pick the correct channel for each inference, you only have to run 1/8th of the neurons on every forward pass.

Of course, once you have your 8 experts in parallel, you need to decide which one to use for each token you want to process. That's the job of the router, which takes in an input and decides which expert to send it to. The router itself is a tiny neural network, essentially a matrix that maps the input vector to a choice of expert, and its small set of trainable weights gets trained together with the rest of the MoE.
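To make that concrete, here's a minimal sketch of top-1 routing with random (untrained) weights, just to show the mechanics: a small router matrix scores the experts, and only the single chosen expert's weights are actually run. The dimensions and the ReLU expert MLPs are illustrative assumptions, not from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16   # token vector width (illustrative)
N_EXPERTS = 8  # number of expert channels
D_EXPERT = 8   # each expert is much narrower than one big net would be

# Router: one small trainable matrix mapping a token -> expert scores.
W_router = rng.normal(0, 0.1, (D_MODEL, N_EXPERTS))

# Each expert: a tiny two-layer MLP (trained jointly with the router in practice).
experts = [
    (rng.normal(0, 0.1, (D_MODEL, D_EXPERT)),
     rng.normal(0, 0.1, (D_EXPERT, D_MODEL)))
    for _ in range(N_EXPERTS)
]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x):
    """Route the token to one expert and run only that expert's weights."""
    probs = softmax(x @ W_router)   # router's per-expert probabilities
    k = int(np.argmax(probs))       # top-1: pick the single best expert
    w1, w2 = experts[k]
    h = np.maximum(x @ w1, 0.0)     # chosen expert's MLP (ReLU)
    out = probs[k] * (h @ w2)       # scale by the gate so the router is trainable
    return out, k

token = rng.normal(size=D_MODEL)
out, chosen = moe_forward(token)
print(f"routed to expert {chosen}, output shape {out.shape}")
```

Real MoE layers typically route to the top-2 experts and add load-balancing losses so the router doesn't dump every token on one expert, but the core trick is the same: the compute per token scales with the experts you run, not with the total parameter count.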

in reply to ☆ Yσɠƚԋσʂ ☆

thank you for the translation; I understood like five words in the link. sadly, the entry barrier of expensive hardware is way too high to play with it, moral issues notwithstanding. bookmarked to revisit in a decade or so, inshallah.
in reply to glitching

just wait for the bubble to pop, and I'm sure we'll see a lot of affordable GPUs flood the market from the abandoned data centres :)