
in reply to ☆ Yσɠƚԋσʂ ☆

Honestly, this makes more sense than shoving everything into one model. Some of the main models seem to use a single one to do everything, and I would break them into parts: have a model filter the input, try to categorize it, and feed it to the proper model. Like having a "council of experts" and asking the right "expert".

  • I am not an expert in AI models
in reply to RedWeasel

That's part of the idea with the whole mixture of experts (MoE) approach in newer models actually.

Rather than using a single neural net that's, say, 512 wide, you split it into eight channels/experts of 64 each. If the network can pick the correct channel for each inference, you only have to run 1/8th of the neurons on every forward pass.

Of course, once you have your 8 experts in parallel, you need to decide which one to use for each token you want to process. That's the job of the router, which takes in an input and decides which expert to send it to. The router itself is a tiny neural network, essentially a matrix that maps the input vector to a choice of expert, and its small set of trainable weights gets trained together with the rest of the MoE.
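To make that concrete, here's a minimal sketch of top-1 routing with random (untrained) weights, just to show the mechanics: a small router matrix scores the experts, and only the single chosen expert's weights are actually run. The dimensions and the ReLU expert MLPs are illustrative assumptions, not from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16   # token vector width (illustrative)
N_EXPERTS = 8  # number of expert channels
D_EXPERT = 8   # each expert is much narrower than one big net would be

# Router: one small trainable matrix mapping a token -> expert scores.
W_router = rng.normal(0, 0.1, (D_MODEL, N_EXPERTS))

# Each expert: a tiny two-layer MLP (trained jointly with the router in practice).
experts = [
    (rng.normal(0, 0.1, (D_MODEL, D_EXPERT)),
     rng.normal(0, 0.1, (D_EXPERT, D_MODEL)))
    for _ in range(N_EXPERTS)
]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x):
    """Route the token to one expert and run only that expert's weights."""
    probs = softmax(x @ W_router)   # router's per-expert probabilities
    k = int(np.argmax(probs))       # top-1: pick the single best expert
    w1, w2 = experts[k]
    h = np.maximum(x @ w1, 0.0)     # chosen expert's MLP (ReLU)
    out = probs[k] * (h @ w2)       # scale by the gate so the router is trainable
    return out, k

token = rng.normal(size=D_MODEL)
out, chosen = moe_forward(token)
print(f"routed to expert {chosen}, output shape {out.shape}")
```

Real MoE layers typically route to the top-2 experts and add load-balancing losses so the router doesn't dump every token on one expert, but the core trick is the same: the compute per token scales with the experts you run, not with the total parameter count.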

in reply to ☆ Yσɠƚԋσʂ ☆

thank you for the translation; I understood like five words in the link. sadly, the entry barrier of expensive hardware is way too high to play with it, moral issues notwithstanding. bookmarked to revisit in a decade or so, inshallah.
in reply to glitching

just wait for the bubble to pop, and I'm sure we'll see a lot of affordable GPUs flood the market from the abandoned data centres :)