

German researchers achieved 71.6% on ARC-AGI using a regular consumer GPU at about 2 cents per task. OpenAI's o3 gets 87%, but at $17 per task it's roughly 850x more expensive.
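The 850x figure follows directly from the two quoted per-task costs; a quick back-of-the-envelope check (the exact prices vary between benchmark runs):

```python
# Sanity-check the cost ratio quoted above.
o3_cost_per_task = 17.00    # USD per task, reported for OpenAI's o3
local_cost_per_task = 0.02  # USD per task, the consumer-GPU setup

ratio = o3_cost_per_task / local_cost_per_task
print(f"o3 is roughly {ratio:.0f}x more expensive per task")
```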


in reply to ☆ Yσɠƚԋσʂ ☆

I don’t know much about running this on my own computer other than using ollama. Is that what you mean about running it on my own?
in reply to neon_nova

I haven't tried it with ollama, but it can download GGUF files directly if you point it at a Hugging Face repo. There are a few other runners like vllm and llama.cpp, and you can also just run the project directly with Python. I expect the whole Product of Experts algorithm is going to get adopted by all models going forward since it's such a huge improvement, and you can just swap out the current approach.
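For example, both llama.cpp and ollama can pull a GGUF model straight from a Hugging Face repo; the repo names below are placeholders, not the actual model from the paper (check each tool's docs for the exact syntax):

```shell
# llama.cpp: the -hf flag downloads a GGUF from a Hugging Face repo and runs it
llama-cli -hf <user>/<repo>-GGUF

# ollama: prefix the Hugging Face repo path with hf.co to run its GGUF directly
ollama run hf.co/<user>/<repo>-GGUF
```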
in reply to ☆ Yσɠƚԋσʂ ☆

So is this a huge breakthrough that's going to be adopted by AI companies across the board? Or maybe there is some downside.
in reply to neon_nova

Almost certainly, given that it drastically reduces the cost of running models. Whether you run them locally or it's a company selling a service, the benefits here are pretty clear.
in reply to ☆ Yσɠƚԋσʂ ☆

It just sounds too good to be true. So, no critics have claimed downsides to this?
in reply to neon_nova

I mean, the paper and code are published. This isn't a heuristic, so there's no loss of accuracy. I'm not sure why you're saying this is too good to be true; the whole tech is very new, and there's lots of low-hanging fruit for optimizations that people are discovering. Right now, a discovery like this comes along every few months. Eventually people will pluck all the easy wins and it's going to get harder to dramatically improve performance, but for the foreseeable future we'll be seeing a lot more stuff like this.