While the idea behind naming it all vibe coding has made me consider it should only be used for roles similar to a Full-Stack Mood Developer, I’ve enjoyed the strong capabilities of the Copilot Pro access I received at GitHub Constellation Johannesburg back on November 14, 2024. I had been using the lesser Copilot subscription before, though as most people know, I usually enjoy running my own local models anyway! Thanks for the invite Geekulcha, you do tremendous work to help our dev community here in SA.
As a note, for the upcoming SA Game Jam 2025 (on itch.io), writing code via any LLM legitimately counts as generative AI, and Rule 13 states “Entries using generative AI will automatically be disqualified.”
A discussion with a friend this past week brought up the idea: can you use your own local model(s) in VSCode with ease? I nodded a yes, of course you can. It has honestly become easier. Want to run your own code-oriented LLMs offline? The steps to get started are simple.
If this method stops working, let me know, and I’ll update it. I’ve been using it since shortly after my VSCode plugin Deepseek experiment.
Note: Please remember, running a local model uses more power, so there will be higher energy costs involved in running a local LLM on your machine(s). Also consider the CUDA capabilities of your GPU – lower-end GPUs won’t perform well enough for it to be an enjoyable experience.
Step 1: If you do not have it yet, get LM Studio for yourself. On the left pane you will see a magnifying glass; you can search for and download models there.

The model I’ve used most in the last few months is lmstudio-community/Qwen3-Coder-30B-A3B-Instruct (Q4_K_M, ~18GB) – while slower on my machine (around 3 to 4 tokens a second), it has given me the best results.
For lower-end hardware, people have recommended a few smaller models over time that might fit your experimentation better (note: I’ve had fewer good responses from them, and some are chat-only models):
- Deepseek Coder V2 Lite Instruct (~16GB) – Python, TypeScript, C#
- NextCoder 7B (~8GB) – Supposedly all of Python, JS, Java, C, C++, Go, Rust, Shell, and Bash. It’s also competent at C#; however, it lacks deep framework-specific logic (e.g. ASP.NET MVC routing), can’t do complex generics or reflection-based code, and has poor memory handling, so at times it doesn’t take all the possibilities it needs to into consideration.
- CodeLlama 7B (~8GB) – Python, C++
- StarCoderBase 7B (~8GB) – Python, JavaScript
- Phi-2 Code (~2GB) – Python, C# (basic)
- Replit Code 3B (~2GB) – Python, JavaScript
As a side note, the larger model I use focuses on: Python, JS, C#, Go, Rust, SQL, HTML/CSS, and TypeScript. I primarily work in Python, C#, and SQL – with a focus on ML workflows, bulk project modifications, Godot 3.5 and 4.4.1 C#-oriented game projects, my C# MAUI Blazor Hybrid code, and other data-oriented problems I feel like improving my knowledge of.
While I’m loving the new VS2026 Insiders, I haven’t dabbled in getting local AI working in Visual Studio itself. Remember, it’s an Insiders preview, so it can be buggy while the bugs are still getting ironed out.
Step 2: If you don’t have it already, get VSCode for yourself.

Step 3: In LM Studio, open the Developer view and select the model using Ctrl + L. Once it has loaded, remember the model’s name if you have multiple models, and pay attention to the port shown – for instance mine is :2345, not :1234.

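If you want to double-check that the local server is actually listening before touching VSCode, a quick sketch like this works – it assumes you’ve started the server in the Developer view and that you swap the port for your own:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class CheckLmStudio
{
    static async Task Main()
    {
        using var client = new HttpClient();
        // Replace 1234 with the port LM Studio shows in the Developer view.
        // LM Studio exposes an OpenAI-compatible API, so /v1/models lists
        // whichever model(s) are currently loaded.
        var models = await client.GetStringAsync("http://localhost:1234/v1/models");
        Console.WriteLine(models);
    }
}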
Step 4: In VSCode, open the Continue extension’s settings (the cog at the top right of its view – install Continue from the marketplace first if you don’t have it yet). You will see you can use shortcuts to interact in a slightly adjusted manner compared to Copilot.

Then, going back to chat, you will see “+ Add Chat model”; click on it.

You will see there are ways to connect to APIs for your paid subscriptions to other models, but for LM Studio you will see the option under Additional Providers.

When you then click Connect you will see the settings; I had to change the port below to the one shown in my LM Studio image above (top right).
name: Local Agent
version: 1.0.0
schema: v1
models:
  - name: Autodetect
    provider: lmstudio
    model: AUTODETECT
    apiBase: http://localhost:1234/v1/

Step 5: In VSCode, you can then open your project folder and use the Continue chat with your local model. You can use normal prompting guidelines for the best experience.
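For reference – since the snippet above still points at the default :1234 – once I changed the port to match LM Studio and picked a specific model, the file looked roughly like this (the model id here is just a placeholder; use whatever name LM Studio lists, or keep AUTODETECT):

name: Local Agent
version: 1.0.0
schema: v1
models:
  - name: Qwen3 Coder 30B (local)
    provider: lmstudio
    model: qwen/qwen3-coder-30b-a3b-instruct
    apiBase: http://localhost:2345/v1/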

While StarCoderBase wasn’t the right model to use for this bugfix example (it got a little lost in my git commit history), it was just used to quickly show off local LLM use.
You can stop here and experiment with the local LLM on your code projects; please note that different models are good at solving different problems. This is just a guide to get started offline.
Optional Improvement – Step 6: You can give your prompts guidance from within the project folder. Add the folders .continue/rules, then create a markdown prompt template there. For instance, in my Godot 3.6 C# project folder I have the file godot-36-csharp.md:
---
name: Godot C# Signal Handler
alwaysApply: false
---
You are a Godot 3.6 C# expert. Write idiomatic C# code for the following
game mechanic, use Godot's signal system and node structure. Include
comments.

Then in chat, when you type ‘@’ you will see your Signal Handler Rule:

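To give an idea of what the rule is steering towards, here’s a hypothetical sketch (not the model’s actual output) of the idiomatic Godot 3.6 C# signal pattern it asks for:

using Godot;

// Hypothetical example only: the [Signal] delegate, Connect in _Ready,
// and EmitSignal are the Godot 3.x patterns the rule nudges the model to use.
public class Player : KinematicBody2D
{
    [Signal]
    public delegate void HealthChanged(int newHealth);

    private int _health = 100;

    public override void _Ready()
    {
        // Listen to our own signal on this node.
        Connect(nameof(HealthChanged), this, nameof(OnHealthChanged));
    }

    public void TakeDamage(int amount)
    {
        _health -= amount;
        EmitSignal(nameof(HealthChanged), _health);
    }

    private void OnHealthChanged(int newHealth)
    {
        GD.Print($"Health is now {newHealth}");
    }
}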
As you can no doubt tell, you can add complex prompts for your projects with ease. As a note, the smaller models cannot always apply code changes themselves, so there might be a manual step.

As an amusing test to end things: at first, instead of Qwen3 Coder, I tried out Ernie 4.5 21B A3B Thinking on a Godot 3.6 C# project using the Signal Handler rule I made above. Ernie appears to be a model that enjoys thinking things through, but waffles on far too much when given prompts that are too generic. Great deep thinking, I believe, but unless you’re very specific it spirals into extra considerations and never got to any coding actions for as long as I waited. Oh well.
So I stuck to Qwen3 Coder 30B; it took a minute, but it was a success:

Then, it was awesome that while it needs slight adjustments (it drifts off the top left a bit on the x and y axes), it builds and runs successfully.

While better processing power would help, it’s useful at times to manage code projects using local LLMs in LM Studio. My main use has been speedy project porting (e.g. porting AeF from plain C# MAUI to C# MAUI Blazor was a much quicker port of the logic), plus my own code reviews. I don’t have someone to go through my pull requests and remind me when I accidentally write lazy code, so I will eventually organise a way for my LM Studio or Ollama API to do pull request reviews on my own Gitea server.
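As a rough sketch of where that’s heading (assumptions: the OpenAI-compatible chat completions endpoint that LM Studio and Ollama both expose, plus my port and a placeholder model name – the Gitea webhook and comment-posting side is left out entirely):

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

class LocalPrReview
{
    static async Task Main()
    {
        // Diff exported beforehand; fetching it from Gitea is a later step.
        var diff = await File.ReadAllTextAsync("changes.diff");

        using var client = new HttpClient();
        var request = new
        {
            model = "qwen3-coder-30b",  // placeholder: use your loaded model's name
            messages = new object[]
            {
                new { role = "system", content = "You are a strict C# code reviewer. List issues with file and line references." },
                new { role = "user", content = diff }
            },
            temperature = 0.2
        };

        // Adjust the port to match your local server.
        var response = await client.PostAsJsonAsync(
            "http://localhost:2345/v1/chat/completions", request);
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        var review = doc.RootElement
            .GetProperty("choices")[0]
            .GetProperty("message")
            .GetProperty("content")
            .GetString();

        Console.WriteLine(review);
    }
}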
The last thought to share is that you can use some of the complex prompts others have shared for all your rules in Continue. Here are some bare-bones examples, or ideas, you can start with:
godot-36-gdscript.md
---
name: Godot C# GDScript Handler
alwaysApply: false
---
Convert this GDScript logic to Godot 3.6 C#, ensure compatibility with
Godot's node lifecycle and signal connections.

maui-csharp-razor1.md
---
name: MAUI C# Razor pages
alwaysApply: false
---
You are a MAUI C# Razor developer. When creating a page use MVVM pattern,
bind to `ObservableCollection` and include navigation logic.

n8n-workflow1.md
---
name: n8n Workflow create
alwaysApply: false
---
Design a modular automation workflow using n8n that includes nodes,
triggers, and fallback logic; share the response as a to-do list
to guide me through implementing it myself.

As developers, we can always add more complex elements to our prompts to give better context for the project, and guide the ML model to give us the best applicable solutions.
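For example, a richer rule can bake in the specific checks you care about – this one is just a sketch (the filename is an example) to adapt to your own standards:
csharp-review1.md
---
name: C# Code Review Checklist
alwaysApply: false
---
You are reviewing C# changes in this repository. Check for misuse of
nullable reference types, missing async/await on I/O calls, LINQ queries
that get enumerated more than once, and magic numbers that should be
constants. Respond with a numbered list of findings (file, line, and a
suggested fix), then a one-paragraph summary of overall risk.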


