Great write up, thanks. For video learners, Wolfgang does a good step-by-step on YouTube
Great write up, thanks. For video learners, Wolfgang does a good step-by-step on YouTube
I’d love you to check back later with your conclusions.
Guide to Self Hosting LLMs with Ollama.
ollama run llama3.2
If it’s an M1, you def can and it will work great. With Ollama.
+1 for Forgejo. I started on Gogs, then gathered that there had been some drama with that and Gitea. Forgejo is FOSS, simple to get going, and comfortable to use if you’re coming from GitHub. It’s actively maintained, and communication with the project is great.
Thanks, I ended up going with Garage, but it has the same issue. I assumed I could just specify some buckets with their keys in the docker-compose or garage.toml, but no - they had to be done through the api or command line.
This is correct, I already installed the minio cli, but when I came back and read this, I tried it out and yes, once garage is running in the container, you can
alias garage="docker exec -ti <container name> /garage"
so you can do the cli things like garage bucket info test-bucket
or whatever. The --help
for the garage
command is pretty great, which is good since they don’t write it up much in the docs.
Thanks. I ended up going with Garage (in Docker), and installed the minio client cli for these tasks.
One I’m writing. I use the host file system (as I have a strong preference for simple) for it’s storage, but I’m interested in adding Litestream for replicating the database onto AWS.
"Convert this text to make it sound like from a random person: "
Love the effort you’ve put into this question. You’ve clearly done some quality research and thinking.
When I asked myself this same question a couple of years ago, I ended up just buying a second hand Synology NAS to use alongside my mini-pc. That would meet your criteria, and avoids the (I’m not sure what magnitude) reliability risk of using disks connected over USB. It’s more proprietary than I’d like, but it’s battle tested and reliable for me.
Shoutout to Magic Earth, the (weirdly named) iOS app that uses OpenStreeMap data. Works on CarPlay, has reliable routing, and I get a buzz out of updating a changed a speed limit or something on OSM and then seeing the change implemented a few weeks later when I’m driving through there again.
starcoder2:latest f67ae0f64584 1.7 GB 3 days ago
phi3:latest d184c916657e 2.2 GB 3 weeks ago
deepseek-coder-v2:latest 8577f96d693e 8.9 GB 3 weeks ago
llama3:8b-instruct-q8_0 1b8e49cece7f 8.5 GB 3 weeks ago
dolphin-mistral:latest 5dc8c5a2be65 4.1 GB 3 weeks ago
codeqwen:latest df352abf55b1 4.2 GB 3 weeks ago
llama3:latest 365c0bd3c000 4.7 GB 4 weeks ago
I mostly use starcoder2 with Continue for code autocomplete, the big deepseek coder is a bit slow (I can feel it thinking), but it and the regular llama3 are good for chatbot type programming questions.
I don’t really have anything to compare the M1 performance to. I guess the 8GB models output text a little slower than the web versions of the same models, and the 4GB ones about the same. Using ollama in the terminal, there’s sometimes a 0.5-2 second pause before it starts outputting. Not with phi3 though - it’s surprisingly snappy for the quality of answers.
An M1 MacBook with 16GB cheerfully runs llama3:8b outputting about 5 words a second. A second hand MacBook like that probably costs half to a third of a secondhand RTX3090.
It must suck to be a bargain hunting gamer. First bitcoin, and now AI.
edit: a letter
I use the Continue VS Code plugin with Ollama to use a couple of different models (deepseek-coder-v2 & starcoder2) to recreate a local only Github Copilot type experience for coding. This is on an M1 Apple Silicon though. For autocomplete the generation needs to be pretty brisk - I’m not sure how that would go in a VM without a GPU.
Yep, I think there’s sound arguments for separating out your storage (NAS) and network (router/DNS/PiHole) infrastructure. After that, whatever suits your purpose. I virtualise all my serious services on one machine under Proxmox (mostly for ease of snapshots) then have another machine for things I’m fiddling with, usually again under Proxmox so they are easy to move to production when I’m happy with them.
My NAS and production server run 24/7, I’ve got a dev server that I turn off if I’m not expecting to use it for a week or so. Usually when I do that, I immediately need it for something and I’m away from home. I have chosen equipment to try and minimize energy use to allow for constant running.
My view on UPS is it’s a crucial part of getting your availability percentage up. As my home lab turned into crucial services I used to replace commercial cloud options, that became more important to me. Whether it is to you will depend on what you’re running and why.
I’ve heard that one of the most likely times for hard drives to fail is on power up, and it also makes sense to me that the heating/cooling cycles would be bad for the magnetic coating, so my NAS is configured to keep them spinning, and it hasn’t been turned off since I last did a drive change.
I agree. Get a domain name, point it to the internal address of your NGINX Proxy manager (or other reverse proxy that manages certificates that you are used to). A bit of work initially, then trivial to add services afterwards.
I didn’t really need encryption for my internal services (although I guess that’s good), but I kept getting papercuts with browser warnings, not being able to save passwords, and some services (eg container repository on Forgejo) just flat out refusing to trust a http connection.
My step-up from Pi was to ebay HP 800 G1 minis then G2’s. They are really well made, there’s full repair manuals available, and they are just a pleasure to swap bits in and out. I’ve heard good things about, and expect similar build quality from the 1 liter Lenovos.
I agree that RAM is a likely constraint rather than processor for self-hosting workloads. Particularly in my case as I’m on Proxmox and run all my docker containers in separate LXCs. I run 32GB in the G2’s which was a straightforward upgrade (they take laptop like memory). One some of them I’ve upgraded the SSDs, or if not, I’ve added M.2 NVME drives (that the G2’s have a slot for).
Build anything small into a container on your laptop, push it to DockerHub or the Github package registry then host it on fly.io for free.