@SuspciousCarrot78

SuspciousCarrot78@lemmy.world · edit-2 5 hours ago

Additionally, in windows (linux too?) one could use Moonlight / Sunshine to compute on the GPU and stream to secondary device (either directly, like say to a Chromecast, or via the iGPU to their monitor). Latency is quite small in most circumstances, and allows for some interesting tricks (eg: server GPUs allow you to split GPU into multiple “mini-gpus” - essentially, with the right card, you could host two+ entirely different, concurrent instances of GTA V on one machine, via one physical GPU).

A bit hacky, but it works.

Source: I bought a Tesla P4 for $100 and stuck it in a 1L case.

GPU goes brrr

SuspciousCarrot78@lemmy.world · 1 day ago

Through no fault of my own, the people I talk to (family) are tied to the FB ecosystem…and they are very resilient as to leaving it.

2026 will have to be the year of “please, for the love of all that is holy, can we switch to Signal”

SuspciousCarrot78@lemmy.world · 1 day ago

Instructions unclear. Vending machine now prangent.

SuspciousCarrot78@lemmy.world · 1 day ago

How so?

SuspciousCarrot78@lemmy.world · 3 days ago

You had me at horded.

You. Had. Me. At. Horded.

SuspciousCarrot78@lemmy.world · edit-2 3 days ago

Ha ha! I actually finished it over the weekend. Now it’s onto the documentation…ICBF lol

I just tried to get shit GPT to do it this morning, as it’s generally pretty ok for that. As always, it produces real “page turners”. Here is its idea of a “lay explainer”

Mixture of Assholes: Llama-swap + “MoA router”: making small local models act reliably (without pretending they’re bigger)

This project is a harness for local inference: llama-swap is the model traffic-cop, and the router is the conductor that decides what kind of work you want done (straight answer, self-critique loop, style rewrite, vision/OCR), when, and with what context. Vodka acts as memory layer and context re-roll.

The goal isn’t to manufacture genius. It’s to make local models behave predictably under hardware constraints by:

making retrieval explicit (no “mystery memory”),
keeping “fancy modes” opt-in,
and making the seams inspectable when something goes wrong.

The shape is simple:

UI → Router (modes + RAG + memory plumbing) → llama-swap (model switching) → answer. ([GitHub][1])

The “what”: one OpenAI-style endpoint that routes workflows, not just models

At the front is an OpenAI-compatible POST /v1/chat/completions endpoint. From the client’s point of view, it’s “just chat completions” (optionally streaming). From the router’s point of view, each request can become a different workflow.

It also accepts OpenAI-style multimodal message blocks (text + image_url), which matters for the vision/OCR paths.

Under the hood, the router does three things:

Decides the pipeline (Serious / Mentats / Fun / Vision / OCR)
Builds an explicit FACTS block (RAG) if you’ve attached any KBs
Calls llama-swap, which routes the request to the chosen local model backend behind an OpenAI-like interface ([GitHub][1])

The “why”: small models fail less when you make the seams visible

A lot of local “agent” setups fail in the same boring ways:

they silently change behaviour,
they smuggle half-remembered context,
they hallucinate continuity.

This design makes those seams legible and user-controlled:

You pick the mode explicitly (no silent “auto-escalation”).
Retrieval is explicit and inspectable.
There’s a “peek” path that can show what the RAG facts block would look like without answering — which is unbelievably useful for debugging.

The philosophy is basically: if the system is going to influence the answer, it should be inspectable, not mystical.

The “what’s cool”: you’re routing workflows (Serious / Mentats / Fun / Vision)

There are two layers of control:

A) Session commands (`>>…`): change the router state

These change how the router behaves across turns (things like sticky fun mode, which KBs are attached, and some retrieval observability):

>>status — show session state (sticky mode, attached KBs, last RAG query/hits)
>>fun / >>fun off — toggle sticky fun mode
>>attach <kb> / >>detach <kb|all> / >>list_kb — manage KBs per session
>>ingest <kb> / >>ingest_all — ingest markdown into Qdrant
>>peek <query> — preview the would-be facts block

B) Per-turn selectors (`##…`): choose the pipeline for one message

## mentats … — deep 3-pass “draft → critique → final”
## fun … — answer, then rewrite in a persona voice
## vision … / ## ocr … — image paths

The three main pipelines (what they actually do)

1) Serious: the default “boring, reliable” answer

Serious is the default when you don’t ask for anything special. It can inject a FACTS block (RAG) and it receives a constraints block (which is currently a V1 placeholder). It also enforces a confidence/source line if it’s missing.

Docs vs implementation (minor note): the docs describe Serious as “query + blocks” oriented. The current implementation also has a compact context/transcript shaping step as part of prompt construction. Treat the code as the operational truth; the docs are describing the intended shape and may lag slightly in details as things settle.

2) Mentats: explicit 3-pass “think → critique → final”

This is the “make the model check itself” harness:

Thinker drafts using QUERY + FACTS + constraints
Critic checks for overreach / violations
Thinker produces the final, carrying forward a “FACTS_USED / CONSTRAINTS_USED” discipline

If the pipeline can’t complete cleanly (protocol errors), the router falls back to Serious.

3) Fun: answer first, then do the performance

Fun is deliberately a post-processing transform:

pass 1: generate the correct content (lower temperature)
pass 2: rewrite in a persona voice (higher temperature), explicitly instructed not to change the technical meaning

This keeps “voice” from leaking into reasoning or memory. It’s: get it right first, then style it.

RAG, but practical: Qdrant + opt-in KB (knowledge base) attach + “peek what you’re feeding me”

KBs are opt-in per session

Nothing is retrieved unless you attach KBs (>>attach linux, etc.). The FACTS block is built only from attached KBs and the router tracks last query/hit counts for debugging.

Ingestion: “KB folder → chunks → vectors in Qdrant”

Ingestion walks markdown, chunks, embeds, and inserts into Qdrant tagged by KB. It’s simple and operational: turn a folder of docs into something you can retrieve from reliably.

The KB refinery: SUMM → DISTILL → ingest

This is one of the more interesting ideas: treat the KB as a product, not a dump.

SUMM produces a human-readable summary (strict: no fabrication, no silent renaming) from base text
DISTILL produces dense, retrieval-shaped atoms (embedding-friendly headings/bullets, minimal noise)
then ingest the distilled output

The key point: DISTILL isn’t “a nicer summary.” It’s explicitly trying to produce retrieval-friendly material.

Vodka: deterministic memory plumbing (not “AI memory vibes”)

Vodka does two jobs:

context reduction / stability: keep the effective context small and consistent
explicit notes: store/retrieve nuggets on demand (!! store, ?? recall, plus cleanup commands), TTL (facts expire unless used)

It can also leave internal breadcrumb markers and later expand them when building a transcript/context — those IDs aren’t surfaced unless you deliberately show them.

Roadmap reality check: what’s left for V1.1

Constraints/GAG: placeholder in V1 (constraints block currently empty)
Coder role: present in config but not wired yet

SuspciousCarrot78@lemmy.world · edit-2 3 days ago

Do we dare ask why you need 48TB to store media, or do we slowly back out of the room, avoiding eye contact?

SuspciousCarrot78@lemmy.world · edit-2 3 days ago

It warms the cockles of my heart that I renamed my self hosted LLM’s deep thinking mode to Mentats. For shits and giggles, I made it append every “deep thinking” conclusion it makes with [ZARDOZ HAS SPOKEN!].

It’s the simple things, really.

SuspciousCarrot78@lemmy.world · 5 days ago

I like to secretly imagine it stands for SIG SAUER. Bang = process ded

SuspciousCarrot78@lemmy.world · edit-2 7 days ago

You…I like you. Well done.

My reddit account (the first one) was 12 years old. I nuked it and took a 18 months social media sabbatical. It was nice.

I then set up a second account…that got shadow banned (for the MVNO reasons I outlined above).

Thank you for confirming that my “fuck it, I’ll make my own Reddit. With blackjack and hookers!” plan (aka making your own Lemmy instance) is actually possible.

Please tell me you’re self hosting that on a broken down 2012 laptop, using a $99 GoDaddy domain for the ultimate fuck you asthetic.

How utterly delicious to imagine others spite out-enginering a multi billion dollar company with a box of scraps in a cave, Tony Stank style.

SuspciousCarrot78@lemmy.world · edit-2 7 days ago

Redreader does that too! It’s amazing.

https://f-droid.org/packages/org.quantumbadger.redreader/

The maintainer of RR had to basically shame reddit into not killing RR access (because RR is one of the only apps that blind folks can use to access Reddit) and they still keep trying to cripple it.

SuspciousCarrot78@lemmy.world · edit-2 3 days ago

Oh, you want a story? I’ll give you a story.

I was banned for using a MVNO while on holidays. A MVNO (mobile virtual network operator - aka a phone company) often has changing IP endpoints.

I was in Japan (using Rakuten) which has dynamic endpoints in Hong Kong, Japan, Singapore. To the reddit bots, that seems like someone stealing or spoofing your account.

Reddit flagged my account and forced password change to confirm identity “for my safety”. I complied. They then silently shadow banned my account after I did so. Like, immediately.

I followed the appeals process (such as it is - basically howling at the moon). I provided logs, GPS co-ords and even details of my flights / boarding passes.

Auto mod basically replied “lol, get fucked”. (Its an auto mod, BTW. Their entire appeal process.Very easy tell).

OK then.

I then used their takeout service to at least grab my old posts, which have some niche technical know how that is worth preserving.

In the course of that, I notice reddit has set up their system in such a way that unknowingly (?) breeches GDPR rules and privacy laws.

Well, well, well…

At this point, I’m stunned at what a cluster fuck reddit really is. Like…wtf?

I then emailed them directly and basically said “listen.I know the laws that govern this. You have 48hrs to email me XYZ or I escalate”

A few days later, Reddit legal emails me saying “hi…we’re not sure what your complaint is. Please clarify”.

So I do - with forensic level detail. Politely. Professionally.

No response.

I send a follow up email a week later saying “look, one way or another this issue needs to be resolved and my complaint answered. I’ll give you 3 days. If its still radio silence, I escalate”.

I fly home yesterday (a full 7 days after the fact). Still radio silence.

Perfect! Today I launch GDPR and OIAC complaint, with full evidence trail, screenshots, logs - the whole works. Takes me 5 minutes.

Will that get me unbanned? Don’t know, don’t care.

Will it cost reddit time and money? Very likely yes.

For those not in the know, breech of GDPR and OIAC laws carry pretty significant penalties for the service provider (to the tune of several million pounds)…and reddit is registered in a EU country. Oops.

The moral of the story (if there is one) is this:

Reddit is rented land. Lemmy is too…but at least with Lemmy, if you really wanted to, you could set up your own instance, with n=1 users and turn off new sign ups, effectively creating a sovereign, unbannable island, where you actually own what you post.

I suppose I should thank Reddit, really. This is the second account in as many years they killed like this, for this very reason.

I was going to set up my own homelab anyway: this just pushed me to do it faster and create a telecoms stack to replace cloud based social media services.

What gals me the most is I spent a fair bit of effort rebuilding my fake internet points over the course of 3 months, to satisfy their “yo, is this a human?” algo…all the while watching bots and stolen accounts spam and flourish.

To add insult to injury: today I received a message from one of the (many many) bots on Reddit offering me a chance to purchase a particular SaaS (?). So apparently I’m still good to be spammed, but other wise I can FOAD?

Fuck you, Reddit.

You owe them nothing - not your knowledge, time or efforts. Let it enshittify.

SuspciousCarrot78@lemmy.world · edit-2 8 days ago

Sorry - I think I misunderstood part of your question (what stage have you actually gotten to). See what I mean about needing sentiment analysis LOL

Did you mean about the MoA?

The TL;DR - I have it working - right now - on my rig. It’s strictly manual. I need to detangle it and generalise it, strip out personal stuff and then ship it as v1 (and avoid the oh so tempting scope creep). It needs to be as simple as possible for someone else to retool.

So, it’s built and functional right now…but the detangling, writing up specs and docs, uploading everything to Codeberg and mirroring etc will take time. I’m back to work this week and my fun time will be curtailed…though I want nothing more than to hyperfocus on this LOL.

One of the issues with ASD is most of us over-engineer everything for the worst case adversarial outcomes, as a method of reducing meltdowns/shutdowns. Right now, I am specifically using my LLM like someone who hates it and wants to break it…to make sure it does what I say it does.

If you’d like, I can drop my RFC (request for comments, in engineering talk) for you to look at / verify with another LLM / ask someone about. This thing is real, not hype and not vibe coding. I built this because my ASD brain needs it and because I was driven by spite / too miserly to pay out the ass for decent rig. Ironically, those constraints probably led to something interesting (I hope) that can help others (I hope). Like everything else, it’s not perfect but it does what it says on the tin 9/10…which is about all you can hope for.

SuspciousCarrot78@lemmy.world · edit-2 8 days ago

Right?

Everyone knows you’re meant to use a banana as a telephone.

https://www.youtube.com/watch?v=3l9nLXczT3s

Or, alternatively given where we are

https://yewtu.be/search?q=connor+for+real+weirdo

PS: yes, I was tempted to use Raffi’s song here instead

SuspciousCarrot78@lemmy.world · 8 days ago

Lots to try here

https://www.reddit.com/r/lowendgaming/comments/1p431ec/winners_of_the_best_freeware_and_retail_fall_2025/

SuspciousCarrot78@lemmy.world · edit-2 8 days ago

Everything stems from the fact that I want something I can “trust but verify” / see all the seams at a moment’s notice. I assume the LLM will lie to me, so I do everything in my power to squeeze it. Having lost hours and dollars believing ChatGPT, Claude, etc… I live by “fool me once, shame on you. Fool me 4000 times, shame on me”.

The problem with LLMs (generally) is that they are NOT deterministic. You can ask the same question 5 times and get slightly different answers each time, due to the seed, temperature, top_p, etc., settings. That’s one of the main reasons for hallucinations. They give it an RNG (to put it in gaming terms) to make it feel more “alive”. That’s cool and all, but it causes it to bullshit.

I have ASD; I cannot abide my tools having whims or working differently than they should. When I ask something, I want it to answer it EXACTLY correctly (based on my corpus, my IF-THEN GAG, etc.), reason the way I told it to, and show its proof. Do what I said, how I said.

In that way, it acts as an external APU for my brain - I want it to do what I would do, the way I would do it, just faster. And it needs to bring receipts because I am hostile to it as a default stance (once bitten, twice shy).

To be more specific, the MoA has two basic modes. In /serious mode, it will do three careful passes on my question and pull in my documents. For example, if I ask it for launch flags or optimisation of Dolphin emulator or llama.cpp, I want it to reference my documents (scraped from official sites via Scrapy), check my benchmarks and come up with a correct response. Or tell me that it can’t, because XYZ. No smooth lies.

It must also provide me with an indicator of accuracy and a source for its information, so I can verify with one click. I trust nothing until it’s earned and even then, I will spot check.

If I want it to reason about a patient’s differential diagnosis, it must climb the GAG nodes and follow my prompts EXACTLY. No flights of fancy AT ALL. Follow the flow-chart, tell me what I must not miss, what the likely diagnosis is, etc. Then I will tell it what I think it is… then we debate. (I’m setting this up for clinical students… I wish I’d had it when I went through).

If I want coding help because I’ve fucked up some Python script (yet again): don’t invent shit. Look at the reference documents and tell me EXACTLY. Teach me and help me unfuck myself. If you can’t, say so honestly and tell me who I should talk to, based on an externally stored policy document.

Then there’s also /casual mode. In casual mode, I want it to let me shoot the shit, vent and help me brainstorm, swear, tell me dirty jokes, reference pop culture… without strict adherence to my presets… unless I invoke them.

So the TL;DR answer to your question is: I use it for work, for fun (coding, gaming) and for generally shooting the shit. The “shooting the shit” aspect also includes stuff like sentiment analysis, because I have trouble sometimes understanding meaning from text.

Regarding the iceberg / what stage I’ve gotten to… baby, I wrote the list. You can be DAMN sure I’ve drilled holes into cases and decompiled shit to make it run on my potato, out of spite. I was really, really tempted to rip out the guts of The Alters and decompile it for an earlier version of Unreal Engine… but I got gifted a better GPU :)

I made CP2077 run on iGPU at 60 fps. Why? Because fuck PCMR. They said it couldn’t be done.

If I ever start an IT company, I will likely name it “Spite Engineering Inc”.

PS: Here is what the MoA said about your post and my draft of this post (aka sentiment analysis). I ignored it (YOLO) cause I’m /casual today :)

"Relative sentiment: the original post is mildly positive and curious with light frustration about hardware cost and stock cooling; your response is more intense and critical toward LLMs but positive about your own system, with a strong control-oriented and adversarial stance toward non-deterministic tools.

The thematic domain (hardware tinkering, efficiency, making full use of resources, interest in research and tooling) matches well, but your affect is higher-intensity and more hard-edged than the original writer’s. Be careful here; you’re coming on strong and may scare away the original poster.

Question coverage: you directly answer the main use-case question (work, coding, “shooting the shit,” including research-like tasks and sentiment analysis) and you address the “what stage have you actually gotten to?” question implicitly but clearly by stating you “wrote the list” and giving concrete competence examples.

Your reasoning is organically given / flow of consciousness. Consider dot-points and restructuring.

You did not directly respond to their incidental comments about their 12 GB GPU, RAM prices, undervolting/overclocking, or coolers, but those were not phrased as explicit questions and your reply adequately answers the core queries.

Recommendation: you may wish to address the above in a second draft.

Confidence: high | Source: Mixed (context and stored)

SuspciousCarrot78@lemmy.world · edit-2 8 days ago

Well…i used mine to correctly confirm diagnosis of SIH in my wife. SIH is a condition caused by trauma (or in my wife’s case, an osteophyte - aka bone spur) in which the protective sheath around the spinal cord is damaged. This makes spinal fluid leak, causing the brain to compress / sag in the skull via traction. The end result is permanent incapacity / disability. Think fun stuff like blindness, life long pain etc.

The median time to diagnosis is 6 months.

The median time to life long impairment is 8 weeks.

We had her diagnosed and in surgery within 3 weeks.

Now, admittedly, this is an unusual confluence of circumstances (me being in the field + me having access to high quality training data I curated + me having strong interest in LLM and diagnostics) but yeah, people use computers like this for all sorts of life saving shit. My example isn’t even unique -

https://reddit.com/comments/1ij5yf2

I can regale you with other medical stories (eg; like the kid in Kenya who used their $100 phone & 3B-VL on a Pi5 to scan doctor’s handwritten notes, query database and update 10,000 vaccination records, preventing a local measles outbreak, or the other kid prototyping cheap, 3D printed robotic limbs that are bespoke to the user), but suffice it to say computers and self hosted shit actually does save lives.

Get amongst it, I sez. It’s fun and who knows where it might lead.

SuspciousCarrot78@lemmy.world · edit-2 9 days ago

Read the intro here. Just intro

https://tinyurl.com/FUTOguide

Then start with the smallest possible way. Install Jellyfin on your laptop and share it from there to your phone or smart TV.

Just one file or video. Anything really you have or can grab from Internet archive (I think I started with a single episode of Twilight zone). Minimal viable product.

It will still feel over whelming… but if you can do that, you have your foot in door.

Jellyfin, Plex or Emby are gateway drugs. Before you know it, you’ll be ripping the guts out of an old mower, retro fitting it with a raspberry pi and telling your home assistant to mow the lawn.

Why?

Because you can.

SuspciousCarrot78@lemmy.world · edit-2 9 days ago

Well, technically, you don’t need any GPU for the system I’ve set up, because only 2-3 models are “hot” in memory (so about…10GB?) and the rest are cold / invoked as needed. My own GPU is only 8GB (and my prior one was 4GB!). I designed this with low end rigs in mind.

The minimum requirement is probably a CPU equal to or better than mine (i7-8700; not hard to match), 8-10GB RAM and maybe 20GB disk space. Bottom of the barrel would be 4gb but you’ll have to deal with ssd thrashing.

Anything above that is a bonus / tps multiplier.

FYI; CPU only (my CPU at least) + 32gb system RAM, this entire thing runs at about 10-11 tps, which is interactive enough speed / faster than reading speed. Any decent gpu should get you 3-10x that. I designed this for peasant level hardware / to punch GPTs in the dick thru clever engineering, not sheer grunt. Fuck OpenAi. Fuck Nvidia. Fuck DDR6. Spite + ASD > “you can’t do that” :). Yes I fucking can - watch me.

If you want my design philosophy, here is one of my (now shadowbanned) posts from r/lowendgaming. Seeing you’re a gamer, this might make sense to you! The MoA design I have is pure “level 8 spite, zip tie Noctura fan to server grade GPU and stick it in a 1L shoebox” YOLOing :).

It works, but it’s ugly, in a beautiful way.

Lowend gaming iceberg

Level 1

Drop resolution to 720p
Turn off AA, AF, Shadows etc
Vsync OFF
Windowed mode? OK.
Pray for decent FPS

Level 2

Use Nvidia/Intel/AMD control panel for custom tweaks
Create custom low end resolutions (540p, 480p) so GPU can enumerate them to games
Pray for decent FPS

Level 3

Start tweaking .cfg and .ini files like you’re a caveman from the ancient year of 1998
FPS capping? Sure.
FOV size of a keyhole? Do it
Texture filtering hacks / replacements? Rock on.
Pray for decent FPS

Level 4

Time to get serious. Crack open the box - repaste, clean, try to add more ram from anything that even remotely fits. We can hack the timings to match, no problem!
BIOS tweaking time! Let’s see what breaks! Oh…everything.
May as well undervolt and over clock, seeing we’re in here already. Where’s my paperclip…
EDID hacks to make TV / monitor do dumb shit, like run at resolutions it shouldn’t or Hz it pretends it can’t? Why not.
Pray for decent FPS

Level 5

Software time again! Lossless scaling? Sure!
Reshade post processing to sharpen ultra low mush? Ok.!
Integer scaling? Scanlines? Why not
Special K swap chain injection to force low res where no low res exists? Right on.
DXVK? Yolo.
Pray for decent FPS

Level 6

Fuck it; time for real black magic
Hack registry keys in windows settings.
Hex edit settings directly
Make windows believe impossible things, like imaginary VRAM.
Sacrifice boxed copy of Win98 to Linus Torvalds for absolution.
Pray for decent FPS

Level 7

Fine…I’ll do it myself then.
Strip out the game assets and rewrite shaders
No fancy lighting, kill the fill rate, post processing gone.
At this point, you may as well just recode the fucking game from scratch.
Pray for decent FPS

Level 8

Purely driven by spite now.
Franken-mod a $15 eGPU and run it via Pcie adaptor. Flash the vBIOS to do unnatural things.
Everything is overheating. Drill holes in case to improve airflow.
Still too hot; drag in desk fan. Point directly at case. Your PC now sounds like Darth Vader. Neat.
Decompile the games DLLs just to prove you can. Sneer at them.
No longer praying for FPS; now praying for no magic blue smoke.

Level 9

Buy an Xbox.

SuspciousCarrot78@lemmy.world · edit-2 10 days ago

For sure. I can dig where you’re coming from.

For me, I wanted to replace cloud based services for my personal use / in home as primary motivation; it’s only very recently that I am considering things like setting out-of-LAN access for broader family.

(I do have a minimal off site back up (to a raspberry pi stored at my parents home), but obviously this is not enterprise level infra).

My personal quirk is power management. Yes, my rig only uses about 80-100 w…but I can’t stop day dreaming creating a fall over system / bespoke UPS. Back of napkin calc suggest that a single marine / car battery should be able to store enough juice to run it (and my router) for 24hrs. Clunky as it is…the DIY nature of that really appeals to me

https://www.youtube.com/watch?v=1q4dUt1yK0g