Here is a setup that gives you the best of both worlds: AI models running on your own hardware, secure access from anywhere, and an automatic fallback to the cloud when your local machine is busy or offline. It is useful for privacy-focused work, for cutting cost, and for keeping AI available when the internet is not.
What you need
A Mac with Apple Silicon, or a PC with a capable GPU. 16 GB of memory for small models, 32 GB or more for larger ones.
LM Studio (free) to run the models.
Tailscale (free for personal use) for secure remote access.
An orchestration layer (a tool that routes each request to a provider) if you want automatic fallback. Optional, but it is what makes the setup dependable.
Step 1: Run a model locally
Install LM Studio and download a model. On Apple Silicon, a small coding model is a good starting point on 16 to 24 GB of memory, and a larger version is comfortable on 48 GB or more. Load the model with a modest context length, turn on KV cache quantization to save memory, then start the local server. Confirm it is listening on its port (1234 by default in LM Studio) before moving on.
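One way to confirm the server is up is to ask it which models it exposes. The sketch below assumes LM Studio's default port (1234) and its OpenAI-style `/v1/models` endpoint; the function names are made up for illustration, and you should adjust the host and port if you changed them.

```python
import json
import urllib.request


def models_url(host: str = "localhost", port: int = 1234) -> str:
    """Build the URL of the OpenAI-style model listing endpoint.

    Port 1234 is assumed here because it is LM Studio's default.
    """
    return f"http://{host}:{port}/v1/models"


def parse_model_ids(body: str) -> list[str]:
    """Pull model ids out of a response shaped like {"data": [{"id": ...}]}."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(host: str = "localhost", port: int = 1234,
                timeout: float = 3.0) -> list[str]:
    """Ask the server which models it exposes.

    Raises OSError (urllib's URLError is a subclass) if nothing answers.
    """
    with urllib.request.urlopen(models_url(host, port), timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

Calling `list_models()` on the machine running LM Studio should return the identifier of the model you loaded. An `OSError` usually means the server is not started or is on a different port.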
Step 2: Reach it from anywhere with Tailscale
Tailscale builds a private network between your devices. No port forwarding, no firewall rules, nothing exposed to the open internet. Install it on the machine running LM Studio, sign in, and note the private address it assigns (an IPv4 address in the 100.x.y.z range). One detail that trips people up: set LM Studio to listen on all network interfaces, not just localhost, or it will refuse connections that arrive over Tailscale.
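If a remote device cannot connect, two quick checks narrow it down: is the address you noted really a Tailscale address, and is the port reachable from the other device? The sketch below relies on Tailscale handing out IPv4 addresses from the 100.64.0.0/10 range; the helper names are illustrative, and the example address in the note after it is a placeholder.

```python
import ipaddress
import socket

# Tailscale assigns IPv4 addresses from the carrier-grade NAT range.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_tailscale_ip(address: str) -> bool:
    """Return True if `address` is an IPv4 address inside 100.64.0.0/10."""
    try:
        return ipaddress.ip_address(address) in TAILSCALE_RANGE
    except ValueError:
        # Not a parseable IP address at all.
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

From a second device on your tailnet, `port_open("100.101.102.103", 1234)` returning `False` while the same call succeeds with `"localhost"` on the serving machine is the typical sign that LM Studio is still bound to localhost only.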
Step 3: Point your AI client at the local server
In your orchestration tool, register the LM Studio server as a provider using that private address. Give the model a short alias so it is easy to call. Restart the tool, and you have a local model you can reach from any of your devices.
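Every orchestration tool has its own config format, so treat the following as a shape, not a recipe. It is a minimal sketch of what "register a provider and give it an alias" amounts to. The `Provider` fields, the helper names, the address, and the model identifier are all hypothetical, and port 1234 is assumed as the LM Studio default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """One registered endpoint (hypothetical structure, not a real tool's schema)."""
    name: str      # label for the machine
    base_url: str  # OpenAI-style base URL of the server
    model: str     # model identifier as the server reports it
    alias: str     # short name you type when calling the model


def lmstudio_provider(name: str, tailscale_ip: str, model: str,
                      alias: str, port: int = 1234) -> Provider:
    """Describe an LM Studio server reachable at its Tailscale address."""
    return Provider(name=name,
                    base_url=f"http://{tailscale_ip}:{port}/v1",
                    model=model,
                    alias=alias)


def resolve(alias: str, providers: list[Provider]) -> Provider:
    """Find the provider registered under `alias`."""
    for provider in providers:
        if provider.alias == alias:
            return provider
    raise KeyError(f"no provider registered under alias {alias!r}")


# Placeholder values: substitute your own address and model identifier.
studio = lmstudio_provider("studio", "100.101.102.103", "example-coder-7b", "coder")
```

With that in place, `resolve("coder", [studio])` returns the entry whose `base_url` points at the Tailscale address, which is the same lookup your tool performs when you call the model by its alias.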
Step 4: Add the fallback chain
This is the part that makes it reliable. If you have more than one machine, register each as a provider, then set an order:
Try the most capable machine first.
Fall back to the always-on machine if the first is offline.
Fall back to a cloud model if both are unavailable.
If both local machines are off, each attempt times out after a few seconds and the cloud model takes over. You never hit a dead end.
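The order above can be sketched as a loop over providers that moves on whenever one cannot be reached. This is an illustration of the idea, not how any particular orchestration tool implements it. The function and exception names are hypothetical, and each provider is represented by a plain callable so the example stays self-contained.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain was unreachable."""


def complete_with_fallback(
    prompt: str,
    chain: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order and return (name, reply) from the first that works.

    Only connection-level failures (OSError and its subclasses, which include
    timeouts and refused connections) trigger the fallback.
    """
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except OSError as err:
            errors.append(f"{name}: {err}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-ins for real provider calls, for demonstration only.
def offline(prompt: str) -> str:
    raise ConnectionRefusedError("machine is off")


def cloud(prompt: str) -> str:
    return f"cloud reply to: {prompt}"


used, reply = complete_with_fallback(
    "hello", [("most-capable", offline), ("always-on", offline), ("cloud", cloud)]
)
```

Here `used` ends up as `"cloud"` because both local stand-ins refuse the connection. In a real setup the few-second wait comes from the timeout on each provider call, so keep those timeouts short.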
Pick a model that fits your memory
| Memory available | Model size (parameters) | Context length (tokens) |
|---|---|---|
| 8 to 16 GB | 3B to 7B | 2,048 to 4,096 |
| 16 to 24 GB | 7B to 13B | 4,096 to 8,192 |
| 24 to 48 GB | 13B to 32B | 8,192 to 16,384 |
| 48 GB and up | 32B and up | 16,384 and up |
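If you manage several machines, the table is easy to turn into a lookup. The sketch below makes two assumptions the table does not state: a boundary value such as 16 GB falls into the higher tier, and the suggested starting context is the low end of each range. The names are illustrative.

```python
from typing import Optional

# (minimum memory in GB, model size range, starting context length in tokens)
TIERS = [
    (8, "3B to 7B", 2048),
    (16, "7B to 13B", 4096),
    (24, "13B to 32B", 8192),
    (48, "32B and up", 16384),
]


def recommend(memory_gb: float) -> Optional[tuple[str, int]]:
    """Return (model size range, starting context length) for a machine.

    Returns None below 8 GB, which the table does not cover.
    """
    best = None
    for min_gb, size, context in TIERS:
        if memory_gb >= min_gb:
            best = (size, context)  # keep the highest tier that fits
    return best
```

For example, `recommend(32)` returns `("13B to 32B", 8192)`: start there, and raise the context length once the model loads cleanly.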
A few things that save you time
Close memory-hungry apps before loading a large model.
Start with a small context length and raise it once the model loads cleanly.
Wired beats wireless for the machine serving the model.
If a model will not load, drop the context length or step down to a smaller model before anything else.
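The last tip can be written down as an ordered list of attempts. The sketch below assumes you shrink the context length first and only then step down to a smaller model. That ordering is one reasonable reading of the tip, not a rule. `try_load` is a hypothetical callback standing in for whatever you use to load a model.

```python
from typing import Callable, Iterator, Optional, Sequence


def step_down_plan(models: Sequence[str],
                   context_lengths: Sequence[int]) -> Iterator[tuple[str, int]]:
    """Yield (model, context_length) attempts, shrinking context before model.

    Both sequences are expected in order from largest to smallest.
    """
    for model in models:
        for context in context_lengths:
            yield model, context


def first_that_loads(
    plan: Iterator[tuple[str, int]],
    try_load: Callable[[str, int], bool],
) -> Optional[tuple[str, int]]:
    """Return the first (model, context_length) for which try_load succeeds."""
    for model, context in plan:
        if try_load(model, context):
            return model, context
    return None
```

With `models=["13B", "7B"]` and `context_lengths=[8192, 4096]`, the plan tries the 13B model at both context lengths before moving to the 7B model.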
Why bother
Building a local AI setup is not only about saving money or protecting privacy, though it does both. It is about owning the tools that shape your work. Whether you are an educator protecting student data, a consultant handling sensitive information, or someone who just wants more control, this gives you flexibility and independence. The future of AI is not only in the cloud. Some of it can sit right on your desk.
If you want help thinking through what a setup like this looks like for a school or district, reach out.
Dr. Chris Sanzeri is the founder of Evalve Consulting, an AI implementation practice for education organizations. He spent 15+ years in education leadership and builds custom AI tools, automations, and local AI systems for schools and districts.
