

I'm tired of LLM bullshitting. So I fixed it.


This entry was edited (2 weeks ago)
in reply to SuspciousCarrot78

Based AF. Can anyone more knowledgeable explain how it works? I am not able to understand.
in reply to SuspciousCarrot78

As I understand it, it corrects the output of LLMs. If so, how does it actually work?
in reply to SuspciousCarrot78

That is much clearer. Thank you for making this. It actually makes LLMs useful with far fewer downsides.
in reply to itkovian

God, I hope so. Else I just pissed 4 months up the wall and shouted a lot of swears at my monitor for nada :)

Let me know if it works for you

in reply to SuspciousCarrot78

This is very cool. Will dig into it a bit more later but do you have any data on how much it reduces hallucinations or mistakes? I’m sure that’s not easy to come by but figured I would ask. And would this prevent you from still using the built-in web search in OWUI to augment the context if desired?

in reply to FrankLaskey

Comment removed by (auto-mod?) cause I said sexy bot. Weird.

Restating again:
On the stuff you use the pipeline/s on? About 85-90% in my tests. Just don't GIGO (Garbage In, Garbage Out) your source docs...and don't use a dumb LLM. That's why I recommend Qwen3-4B 2507 Instruct. It does what you tell it to (even the abliterated one I use).

in reply to SuspciousCarrot78

abliterated one


Please elaborate, that alone piqued my curiosity. Pardon me if I could've searched

in reply to SuspciousCarrot78

Thank you again for your explanations. After being washed up with everything AI, I'm genuinely excited to set this up. I know what I'm doing today! I will surely be back
in reply to 7toed

Please enjoy. Make sure you use >>FR mode at least once. You probably won't like the seed quotes but maybe just maybe you might and I'll be able to hear the "ha" from here.
in reply to SuspciousCarrot78

This is so cool to read about, thx for doing what you do and pls keep doing it! We need high quality and trustworthy information now more than ever I think. Damn nzs spewing their propaganda everywhere and radicalising the vulnerable. Thanks!
in reply to SuspciousCarrot78

I have no remarks, just really amused with your writing in your repo.

Going to build a Docker and self host this shit you made and enjoy your hard work.

Thank you for this!

in reply to BaroqueInMind

Thank you ❤

Please let me know how it works...and enjoy the >>FR settings. If you've ever wanted to be trolled by Bender (or a host of other 1990s / 2000s era memes), you'll love it.

in reply to Diurnambule

There are literally dozens of us. DOZENS!

I'm on a potato, so I can't attach it to something super sexy, like a 405B or a MoE.

If you do, please report back.

PS: You may see (in the docs) occasional references to MoA that slipped past me. That doesn't stand for Mixture of Agents. That stood for "Mixture of Assholes". That's always been my mental model for this.

Or, in the language of my people, this was my basic design philosophy:

YOU (question) -> ROUTER+DOCS ("Ah shit, here we go again. I hate my life")

ROUTER+DOCS -> Asshole 1: Qwen ("I'm right")

ROUTER+DOCS -> Asshole 2: Phi ("No, I'm right")

ROUTER+DOCS -> Asshole 3: Nanbeige ("Idiots, I'm right!")

(all assholes) -> ROUTER+DOCS ("Jesus, WTF. I need booze now")

ROUTER+DOCS -> YOU (answer)

(this could have been funnier if the ASCII actually worked but man...Lemmy borks that)

EDIT: If you want to be boring about it, it's more like this

pastebin.com/gNe7bkwa

PS: If you like it, let other people in other places know about it.

in reply to SuspciousCarrot78

Fuck yeah...good job. This is how I would like to see "AI" implemented. Is there some way to attach other data sources? Something like a local hosted wiki?
in reply to Terces

Hmm. I dunno - never tried. I suppose if the wiki could be imported in a compatible format...it should be able to chew thru it just fine. Wikis are usually just gussied-up text files anyway :) Drop the contents of your wiki in there as .md's and see what it does
in reply to SuspciousCarrot78

I wanna just plug Wikipedia into this and see if it turns an LLM into something useful for the general case.
in reply to SpaceNoodle

LOL. Don't do that. Wikipedia is THE noisiest source.

Would you like me to show you HOW and WHY the SUMM pathway works? I built it after I tried to "YOLO wikipedia into that shit - done, bby!" It...ended poorly

in reply to SuspciousCarrot78

Not OP, just a random human.

Glad you tried the "YOLO Wikipedia", and are sharing that fact, as it saves the rest of us time. :)

in reply to db0

AI Horde has an OpenAI-compatible REST API (oai.aihorde.net/). They say it doesn't support the full feature set of their native API, but it will almost assuredly work with this.

OP manually builds the OpenAI-API JSON payload and then uses the Python requests library to handle the request.

The fields they're using match the documentation on oai.aihorde.net/docs

You would need to add a header with your AI Horde API key. Looks like that would only need to be done in router_fastapi.py - call_model_prompt() (line 269) and call_model_messages() (line 303) - and then everything else is set up according to the documentation
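For what it's worth, here's a minimal sketch of what that change might look like. The `apikey` header name and the exact payload fields are assumptions based on AI Horde's docs, not something I've tested against their server:

```python
OAI_BASE = "https://oai.aihorde.net/v1"  # OpenAI-compatible endpoint mentioned above

def build_request(api_key: str, model: str, prompt: str) -> dict:
    """Assemble URL, auth header and JSON body the way call_model_prompt()
    presumably would. The 'apikey' header name is an assumption."""
    return {
        "url": f"{OAI_BASE}/chat/completions",
        "headers": {"apikey": api_key, "Content-Type": "application/json"},
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
    }

def call_model_prompt(api_key: str, model: str, prompt: str) -> str:
    import requests  # the library OP already uses
    req = build_request(api_key, model, prompt)
    resp = requests.post(req["url"], headers=req["headers"],
                         json=req["json"], timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Basically: build the same payload, bolt one extra header on top.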

in reply to FauxLiving

in reply to SuspciousCarrot78

Very impressive. The only mistake on the third one is that the kudos are actually transferable (i.e. "tradable"), but we forbid exchanges for monetary rewards.

Disclaimer: I'm the lead developer of the AI Horde. I also like what you've achieved here and would be interested in whether we can promote this usage via the AI Horde in some way. If you can think of some integration or collaboration we could do, hit me up!

PS: While the OpenAI API technically works, we still prefer people to use our own API, as it's much more powerful (allowing people to use multiple models, filter workers, tweak more vars, and so on). If you would support our native API, I'd be happy to add a link to your software on our frontpage in the integrations area for LLMs.

in reply to db0

Oh shit! Uh...thank you! Umm. Yes. That was unexpected :)

Re: collab. I'm away for a bit with work, but let me think on it for a bit? There's got to be a way to make this useful to more peeps.

Believe it or not, I am not a CS guy at ALL (I work in health-care) and I made this for fun, in a cave, with a box of scraps.

I'm not good at CS. I just have a ... "very special" brain. As in, I designed this thing from first principles using invariants, which I understand now is not typical CS practice.

in reply to SuspciousCarrot78

No worries, just wanted to point out we're always happy to collaborate with other cool FOSS projects.
in reply to db0

Thank you :) I've been eating a lot of shit on HN (and other places) about this thing. It's nice not to be called a goon-coder or fantasist, just once.
in reply to SuspciousCarrot78

WTF is a "goon-coder" lol :D

I haven't had good experiences with HN myself, even when I was simply trying to post about the AI Horde.

in reply to db0

I had to look it up. Apparently, it's someone who over-optimises the bells and whistles and never ships a finished product.

gooncode.dev/

in reply to SuspciousCarrot78

At first blush, this looks great to me. Are there limitations with what models it will work with? In particular, can you use this on a lightweight model that will run in 16 GB RAM to prevent it hallucinating? I've experimented a little with running ollama as an NPC AI for Skyrim - I'd love to be able to ask random passers-by if they know where the nearest blacksmith is, for instance. It was just far too unreliable, and worse, it was always confidently unreliable.

This sounds like it could really help these kinds of uses. Sadly I'm away from home for a while so I don't know when I'll get a chance to get back on my home rig.

in reply to rollin

My brother in virtual silicon: I run this shit on a $200 p.o.s. with 4 GB of VRAM.

If you can run an LLM at all, this will run. BONUS: because of the way "Vodka" operates, you can run with a smaller context window without eating shit from OOM errors. So that means: if you could only run a 4B model (because the GGUF itself is 3 GB before overheads, and then you add the drag from KV-cache accumulation), maybe you can now run the next size up...or enjoy no-slowdown chats with the model size you have.
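Back-of-envelope on why a smaller context window frees real memory. The model shape below is an illustrative 4B-class guess, not Qwen3-4B's actual config:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    """Rough fp16 KV-cache size: one K and one V tensor per layer,
    each n_kv_heads * head_dim wide, for every token in the context."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# Hypothetical 4B-class shape: 36 layers, 8 KV heads, head_dim 128
print(round(kv_cache_gb(36, 8, 128, 32768), 2))  # full 32k context
print(round(kv_cache_gb(36, 8, 128, 4096), 2))   # trimmed 4k context
```

On a 4 GB card the KV cache at full context can cost more than the quantized weights themselves, which is why trimming context buys you a whole model size.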

in reply to SuspciousCarrot78

I never knew LLMs can run on such low-spec machines now! That's amazing. You said elsewhere you're using Qwen3-4B (abliterated), and I found a page saying that there are Qwen3 models that will run on "Virtually any modern PC or Mac; integrated graphics are sufficient. Mobile phones"

Is there still a big advantage to using Nvidia GPUs? Is your card Nvidia?

My home machine that I've installed ollama on (and which I can't access in the immediate future) has an AMD card, but I'm now toying with putting it on my laptop, which is very midrange and has Intel Arc graphics (which performs a whole lot better than I was expecting in games)

in reply to rollin

Yep, LLMs can and do run on edge devices (weak hardware).

One of the driving forces for this project was in fact trying to make my $50 Raspberry Pi more capable of running LLMs. It sits powered on all the time, so why not?

No special magic with NVIDIA per se, other than ubiquity.

Yes, my card is NVIDIA, but you don't need a card to run this.

in reply to als

Yes. Several reasons -

  • Focuses on making LOCAL LLMs more reliable. You can hitch it to OpenRouter or ChatGPT if you want to leak your personal deets everywhere, but that's not what this is for. I built this to make local, self-hosted stuff BETTER.
  • The entire system operates on curating (and ticketing, with provenance trails) local data...so you don't need to YOLO requests through god-knows-where to pull information.
  • In theory, you could automate a workflow that does this - poll SearXNG, grab whatever you wanted, make a .md summary, drop it into your KB folder, then tell your LLM "do the thing". Or even use Scrapy if you prefer: github.com/scrapy/scrapy
  • Your memory is stored on disk, at home, in a tamper-proof file that you can inspect. No one else can see it. It doesn't get leaked by the LLM anywhere. Because until you ask it, it literally has no idea what facts you've stored. The contents of your KBs, memory stores etc. are CLOSED OFF from the LLM.
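That SearXNG-to-KB workflow could be sketched like this. All names here are hypothetical (not part of the project), and it assumes your SearXNG instance has the JSON output format enabled:

```python
from pathlib import Path

def results_to_md(query: str, results: list) -> str:
    """Render SearXNG-style hits ({'title','url','content'}) as one KB note."""
    lines = [f"# Search notes: {query}", ""]
    for r in results:
        lines += [f"## {r['title']}", f"Source: {r['url']}", r.get("content", ""), ""]
    return "\n".join(lines)

def search_to_kb(searx_url: str, query: str, kb_dir: str) -> Path:
    import requests  # SearXNG serves JSON at /search?format=json when enabled
    resp = requests.get(f"{searx_url}/search",
                        params={"q": query, "format": "json"}, timeout=30)
    resp.raise_for_status()
    md = results_to_md(query, resp.json().get("results", [])[:5])
    out = Path(kb_dir) / f"{query.replace(' ', '_')}.md"
    out.write_text(md, encoding="utf-8")
    return out
```

Then point the pipeline at the KB folder and "do the thing".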
in reply to als

Yes, because making locally hosted LLMs actually useful means you don't need cloud-based and often proprietary models like ChatGPT or Gemini, which hoover up all of your data.
in reply to SuspciousCarrot78

Super interesting build

And if programming doesn't pan out please start writing for a magazine, love your style (or was this written with your AI?)

in reply to Angel Mountain

Once again: I am a meat popsicle (with ASD), not AI. All errors and foibles are mine :)
in reply to SuspciousCarrot78

meat popsicle


( ͡° ͜ʖ ͡°)

Anyway, the other person is right. Your writing style is great !

I successfully read your whole post and even the README. Probably the random outbursts grabbed my attention back to the text.

Anyway, version 2: this is a very cool idea! I cannot wait to either:
- incorporate it into my workflows
- let it sit in a tab, never to be touched again
- theorycraft, do tests and request features so much as to burn out

Last but not least, thank you for not using github as your primary repo

in reply to Karkitoo

Hmm. One of those things is not like the other, one of those things just isn't the same...

About the random outburst: caused by TOO MUCH FUCKING CHATGPT WASTING HOURS OF MY FUCKING LIFE, LEADING ME DOWN BLIND ALLEYWAYS, YOU FUCKING PIEC...

...sorry, sorry...

Anyway, enjoy. Don't spam my Github inbox plz :)

in reply to SuspciousCarrot78

Don't spam my Github inbox plz


I can spam your codeberg's then ? :)

About the random outburst: caused by TOO MUCH FUCKING CHATGPT WASTING HOURS OF MY FUCKING LIFE, LEADING ME DOWN BLIND ALLEYWAYS, YOU FUCKING PIEC...
..sorry, sorry...


Understandable, have a great day.

in reply to Karkitoo

Don't spam my Codeberg either.

Just send nudes.

In ASCII format.

By courier pigeon

in reply to SuspciousCarrot78

I don't see how it addresses hallucinations. It's really cool! But seems to still be inherently unreliable (because LLMs are)
in reply to Alvaro

don’t see how it addresses hallucinations. It’s really cool! But seems to still be inherently unreliable (because LLMs are)


LLMs are inherently unreliable in “free chat” mode. What llama-conductor changes is the failure mode: it only allows the LLM to argue from user curated ground truth and leaves an audit trail.

You don't have to trust it (black box). You can poke it (glass box). Failure leaves a trail and it can’t just hallucinate a source out of thin air without breaking LOUDLY and OBVIOUSLY.

TL;DR: it won't piss in your pocket and tell you it's rain. It may still piss in your pocket (but much less often, because it's house trained)

in reply to SuspciousCarrot78

Very impressive! Do you have a benchmark to test the reliability? A paper would be awesome, to contribute to the science.
in reply to bilouba

Just bush-league ones I did myself, that have no validation or normative values. Not that any of the LLM benchmarks seem to have those either LOL

I'm open to ideas, time willing. Believe it or not, I'm not a code monkey. I do this shit for fun to get away from my real job

in reply to SuspciousCarrot78

I understand, no idea on how to do it. I heard about SWE‑Bench‑Lite that seems to focus on real-world usage.
Maybe try contacting "AI Explained" on YT; he's the best IMO. Your solution might be novel or not, but he might help you figure that out. If it is indeed novel, it might be worth sharing with the larger community.
Of course, I totally get that you might not want to do any of that.
Thank you for your work!
in reply to SuspciousCarrot78

This seems astonishingly more useful than the current paradigm, this is genuinely incredible!

I mean, fellow Autist here, so I guess I am also... biased towards... facts...

But anyway, ... I am currently uh, running on Bazzite.

I have been using Alpaca so far, and have been successfully running Qwen3 8B through it... your system would address a lot of problems I have had to figure out my own workarounds for.

I am guessing this is not available as a flatpak, lol.

I would feel terrible to ask you to do anything more after all of this work, but if anyone does actually set up a podman installable container for this that actually properly grabs all required dependencies, please let me know!

in reply to sp3ctr4l

Indeed. And have you heard? That makes the normies think we're clankers (bots). How delightful.

Re: the Linux stuff...please, if someone can do that, please do. I have no idea how to do that. I can figure it out but making it into a "one click install" git command took several years off my life.

Believe it or not, I'm not actually an IT / CS guy. My brain just decided to latch onto this problem one day 6 months ago and do an autism.

I'm 47 and I still haven't learned how to operate this vehicle...and my steering is getting worse, not better, with age.

in reply to sp3ctr4l

Not famous, no :)

I hear you, brother. Normally, my hyperfocus is BJJ (I've been at that for 25 years; it's a sickness). I herniated a disc in my lower back and lost the ability to exercise for going on 6 months.

BJJ is like catnip for autists. There is an overwhelming population of IT, engineering and ASD-coded people in the BJJ world.

There's even a gent we lovingly call Blinky McHeelhook, because well...see for yourself

Noticing the effects of elbow position, creating an entire algorithm, flow chart and epistemology off the fact?

"VERY NORMAL."

Anyway, when my body said "sit down", my brain went "ok, watch this".

I'm sorry. I'm so sorry. No one taught me how to drive this thing :)

PS: I only found out after my eldest was diagnosed. Then my youngest. Then my MIL said "go get tested". I did.

Result - ASD.

Her response - "We know".

Great - thanks for telling me. Would have been useful to know, say... 40ish years ago.

in reply to sp3ctr4l

No promises, but if I end up running this it will be by putting it in a container. If I do, then I'll put a PR on Codeberg with a Docker Compose file (compatible with Podman on Bazzite).

@SuspciousCarrot78@lemmy.world

in reply to SuspciousCarrot78

I’m probably going to give this a try, but I think you should make it clearer for those who aren’t going to dig through the code that it’s still LLMs all the way down and can still have issues - it’s just there are LLMs double-checking other LLMs work to try to find those issues. There are still no guarantees since it’s still all LLMs.
in reply to WolfLink

Fair point on setting expectations, but this isn’t just LLMs checking LLMs. The important parts are non-LLM constraints.

The model never gets to “decide what’s true.” In KB mode it can only answer from attached files. Don't feed it shit and it won't say shit.

In Mentats mode it can only answer from the Vault. If retrieval returns nothing, the system forces a refusal. That’s enforced by the router, not by another model.

The triple-pass (thinker → critic → thinker) is just for internal consistency and formatting. The grounding, provenance, and refusal logic live outside the LLM.

So yeah, no absolute guarantees (nothing in this space has those), but the failure mode is “I don’t know / not in my sources, get fucked” not “confidently invented gibberish.”
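A toy version of that router-side gate, to make the "enforced by the router, not by another model" point concrete. This is my own sketch, not llama-conductor's actual code; `retrieve` and `llm` are stand-ins:

```python
def grounded_answer(query: str, retrieve, llm) -> str:
    """The refusal lives in plain code, before any model runs: if retrieval
    comes back empty, the LLM is never called at all."""
    chunks = retrieve(query)
    if not chunks:
        return "Not in my sources. I don't know."
    context = "\n\n".join(c["text"] for c in chunks)
    prompt = ("Answer ONLY from the sources below. If they don't contain "
              f"the answer, say so.\n\nSOURCES:\n{context}\n\nQ: {query}")
    return llm(prompt)

# Empty retrieval refuses without ever touching the model:
print(grounded_answer("who won in 2042?", lambda q: [], llm=None))
# -> Not in my sources. I don't know.
```

No amount of model confidence can route around an `if not chunks` that runs first.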

in reply to WolfLink

I haven’t tried this tool specifically, but I do on occasion ask both Gemini and ChatGPT’s search-connected models to cite sources when claiming stuff and it doesn’t seem to even slightly stop them bullshitting and claiming a source says something that it doesn’t.
in reply to skisnow

Yeah, this is different. Try it. It gives you a cryptographic key to the source (which you must provide yourself; be aware: GIGO).
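If you're wondering what that key amounts to: it's presumably content-addressing. A sketch (hypothetical helper names, not the project's real functions):

```python
import hashlib

def source_key(path: str) -> str:
    """SHA-256 over the exact bytes of a source doc. Same bytes, same key;
    one flipped byte and verification fails loudly."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def verify(path: str, expected_key: str) -> bool:
    """Check that a cited source still matches the key stored with the answer."""
    return source_key(path) == expected_key
```

So the key doesn't make the model smarter; it pins each answer to an exact, checkable version of its source.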
in reply to SuspciousCarrot78

How does having a key solve anything? Its not that the source doesn’t exist, it’s that the source says something different to the LLM’s interpretation of it.
in reply to skisnow

in reply to SuspciousCarrot78

The hash proves which bytes the answer was grounded in, should I ever want to check it. If the model misreads or misinterprets, you can point to the source and say "the mistake is here, not in my memory of what the source was."


Eh. This reads very much like your headline is massively over-promising clickbait. If your fix for an LLM bullshitting is that you have to check all its sources then you haven’t fixed LLM bullshitting

If it does that more than twice, straight in the bin. I have zero chill any more.


That’s… not how any of this works…

in reply to SuspciousCarrot78

Awesome work. And I agree that we can have good and responsible AI (and other tech) if we start seeing it for what it is and isn't, and actually being serious about addressing its problems and limitations. It's projects like yours that can demonstrate pathways toward achieving better AI.
in reply to SuspciousCarrot78

THIS IS AWESOME!!! I've been working on using an obsidian vault and a podman ollama container to do something similar, with VSCodium + continue as middleware. But this! This looks to me like it is far superior to what I have cobbled together.

I will study your codeberg repo, and see if I can use your conductor with my ollama instance and vault program. I just registered at codeberg, if I make any progress I will contact you there, and you can do with it what you like.

On an unrelated note, you can download wikipedia. Might work well in conjunction with your conductor.

en.wikipedia.org/wiki/Wikipedi…

in reply to UNY0N

Please enjoy :) Hope it's of use to you!

EDIT: Please don't yeet wikipedia into it. It will die. And you will be sad.

in reply to SuspciousCarrot78

I’m sure this is neat but I couldn’t get through the ai generated description without getting turned off. The way ai writes is like nails on a chalkboard
in reply to brettvitaz

For the record: none of my posts here are AI-generated. The only model output in this thread is in clearly labeled, cited examples.

I built a tool to make LLMs ground their answers and refuse without sources, not to replace anyone’s voice or thinking.

If it’s useful to you, great. If not, that’s fine too - but let’s keep the discussion about what the system actually does.

Also, being told my writing “sounds like a machine” lands badly, especially as an ND person, so I’d prefer we stick to the technical critique.

in reply to SuspciousCarrot78

Sorry I accused you. Your writing is extremely ai like and very unpleasant to read.
in reply to brettvitaz

I'm sorry if my method of writing is unpleasant to you.

Your method of communicating your thoughts is ABHORRENT to me.

Let's go our separate ways.

Peace favour your sword.

in reply to btsax

Oh god, I think liked being called a clanker more :P

(Not North Dakotan. West Australian. Proof: cunt cunt cunty cunt cuntington).

in reply to SuspciousCarrot78

I wouldn't know how to get this going, but I very much enjoyed reading it and your comments and think that it looks like a great project. 👍

(I mean, as a fellow autist I might be able to hyperfocus on it for a while, but I'm sure that the ADHD would keep me from finishing to go work on something else. 🙃)

in reply to Murdoc

Ah - ASD, ADHD and Lemmy. You're a triple threat, Harry! :)

Glad if it was entertaining, if even a little!

in reply to SuspciousCarrot78

I really need this. Each time I try messing with GPT4All's "reasoning" model, it pisses me off. I'm selective on my inputs, low temperature, local docs, and it'll tell me things like tension matters for a coil's magnetic field. Oh and it spits out what I assume is unformatted LaTeX, so if anyone has an interface/stack recommendation please let me know
in reply to 7toed

I feel your pain. Literally.

I once lost ... 24? 26? hrs over a period of days to GPT...each time it confidently asserting "no, for realz, this is the fix".

This thing I built? Purely spite driven engineering + caffeine + ASD to overcome "Bro, trust me bro".

I hope it helps.

in reply to SuspciousCarrot78

Okay, pardon the double comment, but I now have no choice but to set this up after reading your explanations. Doing what TRILLIONS of dollars hasn't cooked up yet.. I hope you're ready, by whatever means you deem, for when someone else "invents" this
in reply to 7toed

It's copyLEFT (AGPL-3.0 license). That means free to share, copy, modify...but you can't roll a closed-source version of it and sell that for profit.

In any case, I didn't build this to get rich (fuck! I knew I forgot something).

I built this to try to unfuck the situation / help people like me.

I don't want anything for it. Just maybe a fist bump and an occasional "thanks dude. This shit works amazing"

in reply to SuspciousCarrot78

Responding to my own top post like a FB boomer: May I make one request?

If you found this little curio interesting at all, please share in the places you go.

And especially, if you're on Reddit, where normies go.

I used to post heavily on there, but then Reddit did a Reddit and I'm done with it.

lemmy.world/post/41398418/2152…

Much as I love Lemmy and HN, they're not exactly normcore, and I'd like to put this into the hands of people :)

PS: I am thinking of taking some of the questions you all asked me here (de-identified) and writing a "Q&A_with_drBobbyLLM.md" and sticking it on the repo. It might explain some common concerns.

And, If nothing else, it might be mildly amusing.

in reply to SuspciousCarrot78

I have a Strix Halo machine with 128GB VRAM so I'm definitely going to give this a try with gpt-oss-120b this weekend.
in reply to Domi

Show off :)

You're self-hosting that, right? I will not be held responsible for some dodgy OpenRouter quant hosted by ToTaLlY NoT a ScAM LLC :)

in reply to Domi

This is the way. Good luck with OSS-120B. Those OSS models, they

  • really
  • like
  • bullet
  • points
in reply to SuspciousCarrot78

gpt-oss is pretty much unusable without custom system prompt.

Sycophancy turned to 11, bullet points everywhere and you get a summary for the summary of the summary.

in reply to Domi

Strix halo gang. Out of curiosity, what OS are you using?
in reply to SuspciousCarrot78

I want to believe you, but that would mean you solved hallucination.

Either:

A) you're lying

B) you're wrong

C) KB is very small

in reply to ThirdConsul

Hallucination isn't nearly as big a problem as it used to be. Newer models aren't perfect but they're better.

The problem addressed by this isn't hallucination, it's the training to avoid failure states. Instead of guessing (different from hallucination), the system forces a negative response.
That's easy, and any big or small company could do it; big companies just like the bullshit

in reply to Kobuster

Buuuuullshit. Asked different models about the ten highest summer transfer scorers and got wildly different answers. They then tried to explain why and got more wrong numbers.
in reply to Kobuster

A benchmark very much tailored to LLMs' strengths calls you a liar.

artificialanalysis.ai/articles…
(A month ago the hallucination rate was ~50-70%)

in reply to Kobuster

^ Yes! That. Exactly that. Thank you!

I don't like the bullshit...and I'm not paid to optimize for bullshit-leading-to-engagement-chatty-chat.

"LLM - tell me the answer and then go away. If you can't, say so and go away. Optionally, roast me like you've watched too many episodes of Futurama while doing it"

in reply to SuspciousCarrot78

So... RAG with extra steps and RAG summarization? What about facts that don't come from RAG retrieval?
in reply to ThirdConsul

in reply to SuspciousCarrot78

The system summarizes and hashes docs. The model can only answer from those summaries in that mode


Oh boy. So hallucination will occur here, and all further retrievals will be deterministically poisoned?

in reply to ThirdConsul

Woof, after reading your "contributions" here, are you this fucking insufferable IRL or do you keep it behind a keyboard?

Goddamn. I'm assuming you work in tech in some capacity? Shout-out to anyone unlucky enough to white-knuckle through a workday with you, avoiding an HR incident would be a legitimate challenge, holy fuck.

in reply to SuspciousCarrot78

re: the KB tool, why not just skip the LLM and do two chained fuzzy finds? (which knowledge base & question keywords)
in reply to SuspciousCarrot78

This is amazing! I will either abandon all my other commitments and install this tomorrow, or I will maybe hopefully get it done in the next 5 years.

Likely-accurate jokes aside, this will be a perfect match with my Obsidian vault, as well as for researching things much more quickly.

in reply to pineapple

I hope it does what I claim it does for you. Choose a good LLM model. Not one of the sex-chat ones. Or maybe, exactly one of those. For uh...research.
in reply to SuspciousCarrot78

This is awesome. I've been working on something similar. You're not likely to get much useful from here though. Anything AI is by default bad here
in reply to Zexks

Well, to butcher Sinatra: if it can make it on Lemmy and HN, it can make it anywhere :)
in reply to SuspciousCarrot78

Hallucination is mathematically proven to be unsolvable with LLMs. I don't deny this may have drastically reduced it, or not, I have no idea.

But hallucinations will just always be there as long as we use LLMs.

in reply to SuspciousCarrot78

This sounds really interesting, I'm looking forward to reading the comments here in detail and looking at the project, might even end up incorporating it into my own!

I'm working on something that addresses the same problem in a different way, the problem of constraining or delineating the specifically non-deterministic behavior one wants to involve in a complex workflow. Your approach is interesting and has a lot of conceptual overlap with mine, regarding things like strictly defining compliance criteria and rejecting noncompliant outputs, and chaining discrete steps into a packaged kind of "super step" that integrates non-deterministic substeps into a somewhat more deterministic output, etc.

How involved was it to build it to comply with the OpenAI API format? I haven't looked into that myself but may.

in reply to PolarKraken

Cheers!

Re: OpenAI API format: 3.6 - not great, not terrible :)

In practice I only had to implement a thin subset: POST /v1/chat/completions + GET /v1/models (most UIs just need those). The payload is basically {model, messages, temperature, stream...} and you return a choices[] with an assistant message. The annoying bits are the edge cases: streaming/SSE if you want it, matching the error shapes UIs expect, and being consistent about model IDs so clients don’t scream “model not found”. Which is actually a bug I still need to squash some more for OWUI 0.7.2. It likes to have its little conniptions.

But TL;DR: more plumbing than rocket science. The real pain was sitting down with pen and paper and drawing what went where and what wasn't allowed to do what. Because I knew I'd eventually fuck something up (I did, many times), I needed a thing that told me "no, that's not what this is designed to do. Do not pass go. Do not collect $200".
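For anyone curious, the "thin subset" really can be small. A sketch of the two response shapes (field values illustrative; you'd wire these into FastAPI routes the way router_fastapi.py presumably does):

```python
def models_payload() -> dict:
    """GET /v1/models - the bare minimum most chat UIs poll for.
    Keeping the 'id' stable avoids the 'model not found' conniptions."""
    return {"object": "list",
            "data": [{"id": "conductor", "object": "model", "owned_by": "local"}]}

def completion_payload(model: str, reply: str) -> dict:
    """POST /v1/chat/completions response body (non-streaming case)."""
    return {"id": "chatcmpl-0",
            "object": "chat.completion",
            "model": model,
            "choices": [{"index": 0,
                         "message": {"role": "assistant", "content": reply},
                         "finish_reason": "stop"}]}
```

Everything else (SSE streaming, error shapes) is the "annoying bits" layer on top of these two dicts.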

shrug I tried.

in reply to SuspciousCarrot78

The very hardest part of designing software, and especially designing abstractions that aim to streamline use of other tools, is deciding exactly where you draw the line(s) between intended flexibility (user should be able and find it easy to do what they want), and opinionated "do it my way here, and I'll constrain options for doing otherwise".

You have very clear and thoughtful lines drawn here, about where the flexibility starts and ends, and where the opinionated "this is the point of the package/approach, so do it this way" parts are, too.

Sincerely that's a big compliment and something I see as a strong signal about your software design instincts. Well done! (I haven't played with it yet, to be clear, lol)

in reply to PolarKraken

in reply to SuspciousCarrot78

Holy shit I'm glad to be on the autistic side of the internet.

Thank you for proving that fucking JSON text files are all you need and not "just a couple billion more parameters bro"

Awesome work, all the kudos.

in reply to floquant

Thanks. It's not perfect but I hope it's a step in a useful direction
in reply to SuspciousCarrot78

I strongly feel that the best way to improve the useability of LLMs is through better human-written tooling/software. Unfortunately most of the people promoting LLMs are tools themselves and all their software is vibe-coded.

Thank you for this. I will test it on my local install this weekend.

in reply to SuspciousCarrot78

soooo if it doesn't know something it won't say anything, and if it does know something it'll show sources...so essentially if you plug this into Claude, it's just never going to say anything to you ever again?

neat.

in reply to rozodru

I see what you did there :)

Claude! Look how they massacred my boy!

in reply to SuspciousCarrot78

don't get me wrong, I love what you've built and it IS something that is sorely needed. I just find it funny that because of this you've pretty much made something like Claude completely shut up. You've pretty much shown off the extremely sad state of Anthropic.