
Build Your Own AI Coding Assistant: Integrating LLMs into VS Code or JetBrains

A hands-on guide to building a real AI-powered coding assistant plugin for VS Code or JetBrains IDEs — covering LLM API integration, streaming completions, context-aware prompting, and plugin architecture best practices.

April 22, 2026 · 14 min read · Niraj Kumar

AI coding assistants have gone from novelty to necessity. Tools like GitHub Copilot, Cursor, and JetBrains AI Assistant have fundamentally changed how developers write code — but most developers treat them as black boxes. What if you could build your own?

In this guide, we'll go beyond prompt engineering and actually wire an LLM into a real IDE plugin. You'll learn the architectural patterns, API integration techniques, and UX considerations that make AI coding tools feel useful rather than gimmicky.

Whether you're building for VS Code (TypeScript) or JetBrains (Kotlin/Java), the core ideas are the same — and we'll cover both.


What We're Building

By the end of this post, you'll have a working plugin that:

  • Reads the active file and cursor context from the editor
  • Sends a context-aware prompt to an LLM API (OpenAI or Anthropic Claude)
  • Streams the response back and inserts code inline
  • Adds a sidebar panel for multi-turn chat about the current file

This is a real integration — not a wrapper around a chat UI.


Prerequisites

Before diving in, make sure you're comfortable with:

  • TypeScript (for the VS Code path) or Kotlin (for the JetBrains path)
  • Basic understanding of REST APIs and async programming
  • An API key from OpenAI or Anthropic
  • Node.js 20+ (VS Code) or JDK 17+ (JetBrains)

Understanding the Architecture

Every AI coding assistant, regardless of IDE, follows a similar pattern:

Editor State (cursor, selection, file content)
        ↓
Context Builder (assembles the prompt)
        ↓
LLM API (sends request, receives stream)
        ↓
Response Handler (parses, diffs, inserts)
        ↓
Editor Mutations (applies changes)

Getting this pipeline right — especially the context building and response handling stages — is what separates a useful tool from a frustrating one.
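
In TypeScript terms, the pipeline boils down to a handful of small, composable stages. A minimal sketch (the names are illustrative, not part of any SDK; the rest of this post fills these stages in):

// Illustrative pipeline shape; each stage is a small, testable function.
interface EditorState {
  filePath: string;
  language: string;
  text: string;
  cursorOffset: number;
}

interface AssembledPrompt {
  system: string;
  user: string;
}

declare function assembleContext(state: EditorState): AssembledPrompt;        // Context Builder
declare function callModel(prompt: AssembledPrompt): AsyncIterable<string>;   // LLM API (streamed)
declare function applyToEditor(chunks: AsyncIterable<string>): Promise<void>; // Response Handler + Editor Mutations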


Part 1: Building the VS Code Extension

Scaffolding the Extension

Install the VS Code Extension generator:

npm install -g yo generator-code
yo code

Select "New Extension (TypeScript)" and name it ai-coding-assistant.

Your package.json should register the core command:

{
  "contributes": {
    "commands": [
      {
        "command": "aiAssistant.completeCode",
        "title": "AI: Complete Code at Cursor"
      },
      {
        "command": "aiAssistant.explainSelection",
        "title": "AI: Explain Selected Code"
      }
    ],
    "keybindings": [
      {
        "command": "aiAssistant.completeCode",
        "key": "ctrl+shift+space",
        "mac": "cmd+shift+space",
        "when": "editorTextFocus"
      }
    ]
  }
}

Building the Context Assembler

The most important (and most overlooked) part of any coding assistant is context quality. Raw file dumps are noisy. A smart context builder sends only what the LLM needs.

// src/contextBuilder.ts
import * as vscode from 'vscode';

export interface CodeContext {
  language: string;
  filePath: string;
  prefix: string;        // Code before the cursor
  suffix: string;        // Code after the cursor
  selection?: string;    // Highlighted code (if any)
  diagnostics: string[]; // Current linting errors
}

const MAX_CONTEXT_CHARS = 6000;

export function buildContext(editor: vscode.TextEditor): CodeContext {
  const document = editor.document;
  const cursor = editor.selection.active;
  const fullText = document.getText();
  const offset = document.offsetAt(cursor);

  // Trim prefix/suffix to stay within token budget
  const rawPrefix = fullText.substring(0, offset);
  const rawSuffix = fullText.substring(offset);

  const halfBudget = MAX_CONTEXT_CHARS / 2;
  const prefix = rawPrefix.length > halfBudget
    ? '...' + rawPrefix.slice(-halfBudget)
    : rawPrefix;
  const suffix = rawSuffix.length > halfBudget
    ? rawSuffix.slice(0, halfBudget) + '...'
    : rawSuffix;

  // Gather diagnostics for the current file
  const diagnostics = vscode.languages
    .getDiagnostics(document.uri)
    .filter(d => d.severity === vscode.DiagnosticSeverity.Error)
    .map(d => `Line ${d.range.start.line + 1}: ${d.message}`);

  const selection = editor.selection.isEmpty
    ? undefined
    : document.getText(editor.selection);

  return {
    language: document.languageId,
    filePath: document.fileName,
    prefix,
    suffix,
    selection,
    diagnostics,
  };
}

Why this matters: Sending the entire file is wasteful and often counterproductive. Focusing on the cursor prefix/suffix — the "fill-in-the-middle" (FIM) pattern — mirrors how code models such as DeepSeek Coder and StarCoder are trained and produces dramatically better completions.

Calling the LLM with Streaming

Nobody wants to wait 3 seconds for a completion to appear all at once. Streaming makes the experience feel instantaneous.

// src/llmClient.ts
import Anthropic from '@anthropic-ai/sdk';
import { CodeContext } from './contextBuilder';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

function buildSystemPrompt(ctx: CodeContext): string {
  return `You are an expert ${ctx.language} programmer. 
Your task is to complete or improve code.
Respond ONLY with the code — no explanations, no markdown fences.
${ctx.diagnostics.length > 0
  ? `\nCurrent errors to fix:\n${ctx.diagnostics.join('\n')}`
  : ''}`;
}

function buildUserPrompt(ctx: CodeContext): string {
  if (ctx.selection) {
    return `Refactor or improve this ${ctx.language} code:\n\n${ctx.selection}`;
  }

  return `Complete the following ${ctx.language} code at the <CURSOR> marker:

<PREFIX>
${ctx.prefix}
</PREFIX>
<CURSOR/>
<SUFFIX>
${ctx.suffix}
</SUFFIX>

Insert only the code that should appear at the cursor. Do not repeat the prefix or suffix.`;
}

export async function* streamCompletion(
  ctx: CodeContext,
  signal: AbortSignal
): AsyncGenerator<string> {
  const stream = client.messages.stream({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: buildSystemPrompt(ctx),
    messages: [{ role: 'user', content: buildUserPrompt(ctx) }],
  });

  for await (const event of stream) {
    if (signal.aborted) break;
    if (
      event.type === 'content_block_delta' &&
      event.delta.type === 'text_delta'
    ) {
      yield event.delta.text;
    }
  }
}

Inserting the Streamed Response

Now we connect the stream to the editor, inserting tokens as they arrive:

// src/commands/completeCode.ts
import * as vscode from 'vscode';
import { buildContext } from '../contextBuilder';
import { streamCompletion } from '../llmClient';

export async function completeCodeCommand(): Promise<void> {
  const editor = vscode.window.activeTextEditor;
  if (!editor) return;

  const ctx = buildContext(editor);
  const abortController = new AbortController();

  // Show cancellable progress
  await vscode.window.withProgress(
    {
      location: vscode.ProgressLocation.Notification,
      title: 'AI Assistant: Generating...',
      cancellable: true,
    },
    async (_progress, token) => {
      token.onCancellationRequested(() => abortController.abort());

      const insertPosition = editor.selection.active;
      let currentPosition = insertPosition;

      for await (const chunk of streamCompletion(ctx, abortController.signal)) {
        await editor.edit(
          editBuilder => {
            editBuilder.insert(currentPosition, chunk);
          },
          { undoStopBefore: false, undoStopAfter: false }
        );

        // Advance position by the length of inserted text
        const lines = chunk.split('\n');
        if (lines.length === 1) {
          currentPosition = currentPosition.translate(0, chunk.length);
        } else {
          currentPosition = new vscode.Position(
            currentPosition.line + lines.length - 1,
            lines[lines.length - 1].length
          );
        }
      }
    }
  );
}

Registering Everything in extension.ts

// src/extension.ts
import * as vscode from 'vscode';
import { completeCodeCommand } from './commands/completeCode';

export function activate(context: vscode.ExtensionContext): void {
  const completeDisposable = vscode.commands.registerCommand(
    'aiAssistant.completeCode',
    completeCodeCommand
  );

  context.subscriptions.push(completeDisposable);
}

export function deactivate(): void {}

Run it with F5 inside VS Code to launch the Extension Development Host and test your plugin live.


Part 2: Building the JetBrains Plugin

Scaffolding with the Plugin Template

Use the IntelliJ Platform Plugin Template as your starting point. It handles Gradle configuration, plugin signing, and the marketplace publishing pipeline.
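
The LLM client later in this section uses Ktor and kotlinx-serialization, which the template does not include by default. A minimal sketch of the extra dependencies in build.gradle.kts (version numbers are illustrative; check for current releases):

// build.gradle.kts (sketch): versions are illustrative, check for current releases
dependencies {
    implementation("io.ktor:ktor-client-core:2.3.12")
    implementation("io.ktor:ktor-client-okhttp:2.3.12")
    implementation("io.ktor:ktor-client-content-negotiation:2.3.12")
    implementation("io.ktor:ktor-serialization-kotlinx-json:2.3.12")
    implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.6.3")
}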

In plugin.xml, register your action:

<actions>
  <action
    id="AiAssistant.CompleteCode"
    class="com.yourplugin.actions.CompleteCodeAction"
    text="AI: Complete Code"
    description="Send current context to LLM and insert completion">
    <keyboard-shortcut
      keymap="$default"
      first-keystroke="ctrl shift SPACE"/>
  </action>
</actions>

The Context Builder (Kotlin)

// src/main/kotlin/com/yourplugin/context/ContextBuilder.kt
package com.yourplugin.context

import com.intellij.openapi.editor.Editor
import com.intellij.openapi.fileEditor.FileDocumentManager

data class CodeContext(
    val language: String,
    val prefix: String,
    val suffix: String,
    val selection: String?,
    val filePath: String,
)

object ContextBuilder {
    private const val MAX_CONTEXT_CHARS = 6000

    fun build(editor: Editor): CodeContext {
        val document = editor.document
        val caretOffset = editor.caretModel.offset
        val fullText = document.text

        val rawPrefix = fullText.substring(0, caretOffset)
        val rawSuffix = fullText.substring(caretOffset)
        val halfBudget = MAX_CONTEXT_CHARS / 2

        val prefix = if (rawPrefix.length > halfBudget)
            "..." + rawPrefix.takeLast(halfBudget) else rawPrefix
        val suffix = if (rawSuffix.length > halfBudget)
            rawSuffix.take(halfBudget) + "..." else rawSuffix

        val selectionModel = editor.selectionModel
        val selection = if (selectionModel.hasSelection())
            selectionModel.selectedText else null

        val virtualFile = FileDocumentManager.getInstance().getFile(document)
        val language = virtualFile?.extension ?: "unknown"

        return CodeContext(
            language = language,
            prefix = prefix,
            suffix = suffix,
            selection = selection,
            filePath = virtualFile?.path ?: "",
        )
    }
}

Calling the LLM (Kotlin with Coroutines)

// src/main/kotlin/com/yourplugin/llm/LlmClient.kt
package com.yourplugin.llm

import com.yourplugin.context.CodeContext
import io.ktor.client.*
import io.ktor.client.engine.okhttp.*
import io.ktor.client.plugins.contentnegotiation.*
import io.ktor.client.request.*
import io.ktor.client.statement.*
import io.ktor.http.*
import io.ktor.serialization.kotlinx.json.*
import io.ktor.utils.io.*
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.serialization.json.*

object LlmClient {
    private val httpClient = HttpClient(OkHttp) {
        install(ContentNegotiation) { json() }
    }

    private val apiKey = System.getenv("ANTHROPIC_API_KEY") ?: ""

    fun streamCompletion(ctx: CodeContext): Flow<String> = flow {
        val prompt = if (ctx.selection != null) {
            "Refactor this ${ctx.language} code:\n\n${ctx.selection}"
        } else {
            """Complete this ${ctx.language} code at <CURSOR>:

<PREFIX>
${ctx.prefix}
</PREFIX>
<CURSOR/>
<SUFFIX>
${ctx.suffix}
</SUFFIX>

Return only the inserted code."""
        }

        // preparePost + execute streams the response body as it arrives instead of
        // buffering the whole payload before the first token is available.
        httpClient.preparePost("https://api.anthropic.com/v1/messages") {
            header("x-api-key", apiKey)
            header("anthropic-version", "2023-06-01")
            contentType(ContentType.Application.Json)
            setBody(buildJsonObject {
                put("model", "claude-sonnet-4-20250514")
                put("max_tokens", 1024)
                put("stream", true)
                putJsonArray("messages") {
                    addJsonObject {
                        put("role", "user")
                        put("content", prompt)
                    }
                }
            })
        }.execute { response ->
            val channel = response.bodyAsChannel()
            while (!channel.isClosedForRead) {
                val line = channel.readUTF8Line() ?: break
                if (line.startsWith("data: ")) {
                    val data = line.removePrefix("data: ")
                    val json = Json.parseToJsonElement(data).jsonObject
                    // Anthropic marks the end of the stream with a message_stop event
                    if (json["type"]?.jsonPrimitive?.contentOrNull == "message_stop") break
                    val delta = json["delta"]?.jsonObject
                    val text = delta?.get("text")?.jsonPrimitive?.contentOrNull
                    if (text != null) emit(text)
                }
            }
        }
    }
}

The Action Class

// src/main/kotlin/com/yourplugin/actions/CompleteCodeAction.kt
package com.yourplugin.actions

import com.intellij.openapi.actionSystem.AnAction
import com.intellij.openapi.actionSystem.AnActionEvent
import com.intellij.openapi.actionSystem.CommonDataKeys
import com.intellij.openapi.application.ApplicationManager
import com.intellij.openapi.command.WriteCommandAction
import com.yourplugin.context.ContextBuilder
import com.yourplugin.llm.LlmClient
import kotlinx.coroutines.*

class CompleteCodeAction : AnAction() {
    private val scope = CoroutineScope(Dispatchers.IO + SupervisorJob())

    override fun actionPerformed(e: AnActionEvent) {
        val editor = e.getData(CommonDataKeys.EDITOR) ?: return
        val project = e.project ?: return
        val ctx = ContextBuilder.build(editor)
        val document = editor.document
        val insertOffset = editor.caretModel.offset

        scope.launch {
            var currentOffset = insertOffset
            LlmClient.streamCompletion(ctx).collect { chunk ->
                // Capture the offset for this chunk before scheduling the EDT write:
                // invokeLater runs asynchronously, and currentOffset keeps advancing.
                val offsetForChunk = currentOffset
                ApplicationManager.getApplication().invokeLater {
                    WriteCommandAction.runWriteCommandAction(project) {
                        document.insertString(offsetForChunk, chunk)
                    }
                }
                currentOffset += chunk.length
            }
        }
    }
}

Adding a Chat Sidebar Panel

A completion shortcut is great, but developers also want a conversation interface. Switching back to the VS Code extension, here's how to add a WebView-based sidebar:

// src/panels/ChatPanel.ts
import * as vscode from 'vscode';
import { streamCompletion } from '../llmClient';
import { buildContext } from '../contextBuilder';

export class ChatPanel {
  private static instance?: ChatPanel;
  private readonly panel: vscode.WebviewPanel;
  private conversationHistory: Array<{ role: string; content: string }> = [];

  static show(context: vscode.ExtensionContext): void {
    if (ChatPanel.instance) {
      ChatPanel.instance.panel.reveal();
      return;
    }
    ChatPanel.instance = new ChatPanel(context);
  }

  private constructor(context: vscode.ExtensionContext) {
    this.panel = vscode.window.createWebviewPanel(
      'aiChat',
      'AI Assistant',
      vscode.ViewColumn.Beside,
      { enableScripts: true }
    );

    this.panel.webview.html = this.getHtml();
    this.panel.webview.onDidReceiveMessage(async msg => {
      if (msg.type === 'userMessage') {
        await this.handleUserMessage(msg.text);
      }
    });

    this.panel.onDidDispose(() => {
      ChatPanel.instance = undefined;
    });
  }

  private async handleUserMessage(userText: string): Promise<void> {
    const editor = vscode.window.activeTextEditor;
    const contextNote = editor
      ? `\n\n[Current file: ${editor.document.fileName}]\n`
      : '';

    this.conversationHistory.push({
      role: 'user',
      content: userText + contextNote,
    });

    this.panel.webview.postMessage({ type: 'startAssistant' });

    // Stream the response back to the webview token by token. For brevity this
    // reuses the FIM completion prompt; a real chat panel would send the
    // conversation history to a chat-oriented prompt (see the sketch below).
    let fullResponse = '';
    const ctx = editor ? buildContext(editor) : null;
    if (ctx) {
      for await (const chunk of streamCompletion(ctx, new AbortController().signal)) {
        fullResponse += chunk;
        this.panel.webview.postMessage({ type: 'chunk', text: chunk });
      }
    }

    this.conversationHistory.push({
      role: 'assistant',
      content: fullResponse,
    });
    this.panel.webview.postMessage({ type: 'done' });
  }

  private getHtml(): string {
    return `<!DOCTYPE html>
<html>
<head>
  <style>
    body { font-family: var(--vscode-font-family); padding: 12px; background: var(--vscode-editor-background); color: var(--vscode-editor-foreground); }
    #messages { height: calc(100vh - 80px); overflow-y: auto; margin-bottom: 8px; }
    .message { margin: 8px 0; padding: 8px; border-radius: 4px; }
    .user { background: var(--vscode-inputOption-activeBackground); }
    .assistant { background: var(--vscode-editor-inactiveSelectionBackground); white-space: pre-wrap; }
    #input-row { display: flex; gap: 8px; }
    #userInput { flex: 1; background: var(--vscode-input-background); color: var(--vscode-input-foreground); border: 1px solid var(--vscode-input-border); padding: 6px; border-radius: 4px; }
    button { background: var(--vscode-button-background); color: var(--vscode-button-foreground); border: none; padding: 6px 12px; border-radius: 4px; cursor: pointer; }
  </style>
</head>
<body>
  <div id="messages"></div>
  <div id="input-row">
    <input id="userInput" placeholder="Ask about the current file..." />
    <button onclick="sendMessage()">Send</button>
  </div>
  <script>
    const vscode = acquireVsCodeApi();
    let assistantDiv = null;

    function sendMessage() {
      const input = document.getElementById('userInput');
      const text = input.value.trim();
      if (!text) return;
      appendMessage('user', text);
      vscode.postMessage({ type: 'userMessage', text });
      input.value = '';
    }

    function appendMessage(role, text) {
      const div = document.createElement('div');
      div.className = 'message ' + role;
      div.textContent = text;
      document.getElementById('messages').appendChild(div);
      div.scrollIntoView();
      return div;
    }

    document.getElementById('userInput').addEventListener('keydown', e => {
      if (e.key === 'Enter' && !e.shiftKey) sendMessage();
    });

    window.addEventListener('message', e => {
      const msg = e.data;
      if (msg.type === 'startAssistant') {
        assistantDiv = appendMessage('assistant', '');
      } else if (msg.type === 'chunk' && assistantDiv) {
        assistantDiv.textContent += msg.text;
        assistantDiv.scrollIntoView();
      } else if (msg.type === 'done') {
        assistantDiv = null;
      }
    });
  </script>
</body>
</html>`;
  }
}
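
One thing to note: the panel above reuses streamCompletion, which sends the fill-in-the-middle prompt rather than the conversation itself, so the user's question and accumulated history never reach the model. For genuine multi-turn chat, add a chat-oriented variant to llmClient.ts. A minimal sketch (the function name and system prompt are illustrative):

// src/llmClient.ts — chat-oriented variant (sketch), reusing the Anthropic client above
export interface ChatTurn {
  role: 'user' | 'assistant';
  content: string;
}

export async function* streamChat(
  history: ChatTurn[],
  signal: AbortSignal
): AsyncGenerator<string> {
  const stream = client.messages.stream({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: 'You are a coding assistant. Answer questions about the code the user is working on.',
    messages: history,
  });

  for await (const event of stream) {
    if (signal.aborted) break;
    if (
      event.type === 'content_block_delta' &&
      event.delta.type === 'text_delta'
    ) {
      yield event.delta.text;
    }
  }
}

In handleUserMessage you would then pass this.conversationHistory (with the latest user turn appended) to streamChat instead of calling streamCompletion.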

Best Practices

Prompt Engineering for Code

  • Always specify the language. "Complete this code" is vague; "Complete this TypeScript async function" is not.
  • Use fill-in-the-middle (FIM) format for completions — prefix + suffix framing yields dramatically better results than just a prefix.
  • Include error context. Passing diagnostic errors into the system prompt allows the model to fix lint issues proactively.
  • Separate concerns. Use one prompt template for "complete code," another for "explain code," and another for "refactor." Trying to do everything in one prompt leads to mediocre results across the board.
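
A lightweight way to keep those concerns separate is a small template map keyed by task. A sketch (the exact wording of each template is yours to tune):

// src/prompts.ts — one system prompt per task (sketch)
export type Task = 'complete' | 'explain' | 'refactor';

export const SYSTEM_PROMPTS: Record<Task, (language: string) => string> = {
  complete: lang =>
    `You are an expert ${lang} programmer. Respond ONLY with the code to insert at the cursor.`,
  explain: lang =>
    `You are an expert ${lang} programmer. Explain the selected code clearly and concisely.`,
  refactor: lang =>
    `You are an expert ${lang} programmer. Return an improved version of the selected code that preserves its behavior.`,
};

// Usage: SYSTEM_PROMPTS['explain']('typescript')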

Performance and UX

  • Always stream. A response that appears token by token feels dramatically faster than one that appears all at once after a pause, even if the total latency is identical.
  • Make it cancellable. Developers will trigger completions by accident. An uncancellable in-flight request is infuriating.
  • Debounce inline triggers. If you add inline completion triggers (like on pause-of-typing), debounce by at least 500ms to avoid hammering the API.
  • Cache aggressively. If the prefix hasn't changed, don't make a new API call.
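
For the last point, a cache can be as simple as keying on the trimmed prefix/suffix pair. A naive sketch (a real implementation would bound the cache size and expire stale entries):

// src/completionCache.ts — naive exact-match cache (sketch)
import { CodeContext } from './contextBuilder';

const cache = new Map<string, string>();

export async function cachedCompletion(
  ctx: CodeContext,
  fetchCompletion: (ctx: CodeContext) => Promise<string>
): Promise<string> {
  const key = `${ctx.language}\u0000${ctx.prefix}\u0000${ctx.suffix}`;
  const hit = cache.get(key);
  if (hit !== undefined) return hit;

  const completion = await fetchCompletion(ctx);
  cache.set(key, completion);
  return completion;
}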

Security

  • Never hardcode API keys. Prefer VS Code's SecretStorage (context.secrets) or environment variables over plain settings, and document the setup clearly.
  • Warn users about data transmission. Code sent to third-party LLM APIs may include sensitive business logic. Make this explicit in your README and settings UI.
  • Respect .gitignore and secret patterns. Before sending context, scan for patterns like SECRET, PASSWORD, TOKEN, and redact them.
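
For the last point, a regex pass over the context before it leaves the machine goes a long way. A rough sketch (the patterns are illustrative, not exhaustive; treat this as a safety net, not a guarantee):

// src/redact.ts — very rough secret redaction (sketch)
const SECRET_PATTERNS: RegExp[] = [
  /(api[_-]?key|secret|password|token)\s*[:=]\s*['"][^'"]+['"]/gi,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
];

export function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (redacted, pattern) => redacted.replace(pattern, '[REDACTED]'),
    text
  );
}

// Apply to ctx.prefix and ctx.suffix before building the prompt.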

Common Mistakes

1. Sending Too Much Context

Dumping the entire file (or worse, multiple files) into every request is slow, expensive, and counterproductive. Models have optimal context windows for code tasks. Stick to the cursor neighborhood.

2. Not Handling Streaming Errors

Network interruptions mid-stream are common. Always wrap your streaming loop in try/catch and present a graceful error state rather than a partially inserted snippet.
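
In the completeCodeCommand from Part 1, that means wrapping the for await loop. A sketch of the shape (reusing the variables defined there):

// Inside completeCodeCommand: surface stream failures instead of failing silently
try {
  for await (const chunk of streamCompletion(ctx, abortController.signal)) {
    // ...insert chunk as before...
  }
} catch (err) {
  if (!abortController.signal.aborted) {
    const message = err instanceof Error ? err.message : String(err);
    vscode.window.showErrorMessage(`AI Assistant: stream interrupted (${message})`);
  }
}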

3. Blocking the UI Thread

In JetBrains, all LLM calls must happen in a background coroutine or task: a network request made directly in actionPerformed will freeze the IDE. VS Code extensions run in a separate extension host process, so a slow request won't lock the editor UI, but long-running work should still be wired to cancellation tokens (as in completeCodeCommand) so users can abort it.

4. Forgetting Undo History

When inserting streamed content character-by-character, group edits into logical undo chunks. Nobody wants to Ctrl+Z 200 times to undo one completion. Use undoStopBefore: false, undoStopAfter: false on intermediate edits and a single undoStopAfter: true at the end.
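
Concretely, in completeCodeCommand you can hold each chunk back by one iteration so the last real insertion closes the undo group. A sketch (insertChunk stands in for the editor.edit call and position bookkeeping shown earlier):

// Delay insertion by one chunk so the final edit can carry undoStopAfter: true,
// making the whole completion a single Ctrl+Z step.
let pending: string | undefined;

for await (const chunk of streamCompletion(ctx, abortController.signal)) {
  if (pending !== undefined) {
    await insertChunk(pending, { undoStopBefore: false, undoStopAfter: false });
  }
  pending = chunk;
}
if (pending !== undefined) {
  await insertChunk(pending, { undoStopBefore: false, undoStopAfter: true });
}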

5. Trusting the Output Blindly

LLMs produce plausible-looking code that may be wrong, insecure, or deprecated. Show users a diff before applying longer completions, and always run linting on inserted code.
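
For longer completions, VS Code's built-in vscode.diff command makes that preview step cheap. A sketch (reusing editor from the command; proposedFullText stands for the current file content with the completion spliced in):

// Show a side-by-side diff before applying a long completion (sketch)
const proposedDoc = await vscode.workspace.openTextDocument({
  language: editor.document.languageId,
  content: proposedFullText, // hypothetical: file text with the completion applied
});
await vscode.commands.executeCommand(
  'vscode.diff',
  editor.document.uri,
  proposedDoc.uri,
  'AI Assistant: proposed change'
);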


🚀 Pro Tips

  • Use a model router. For short completions (< 50 tokens), use a fast, cheap model like claude-haiku. Reserve claude-sonnet for explain/refactor tasks that need deeper reasoning (a router sketch follows these tips).
  • Add telemetry (with consent). Track which completions users accept vs. delete. This is gold for improving your prompts over time.
  • Support multiple LLM backends. Abstract your client behind an interface so users can swap between OpenAI, Anthropic, Ollama (local), or any other provider in settings.
  • Implement semantic chunking. Instead of character-count context trimming, use a lightweight AST parser to extract the current function or class as the primary context unit. This is far more coherent for the model.
  • Add a "ghost text" inline completion provider. VS Code's vscode.languages.registerInlineCompletionItemProvider API lets you show completions as grayed-out ghost text (like Copilot), only committing on Tab. This is the highest-quality UX pattern for completions.
// Register ghost-text inline completions
vscode.languages.registerInlineCompletionItemProvider(
  { pattern: '**' },
  {
    async provideInlineCompletionItems(document, position) {
      // Debounce and call your streamCompletion here
      // Return vscode.InlineCompletionItem instances
      return [];
    }
  }
);
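
A router can be a single function that picks a model per task. A sketch (model IDs are illustrative; check the provider's current names):

// Route requests: cheap/fast model for short completions, stronger model for
// explain/refactor tasks that need deeper reasoning.
type RoutedTask = 'complete' | 'explain' | 'refactor';

function pickModel(task: RoutedTask, expectedTokens: number): string {
  if (task === 'complete' && expectedTokens < 50) {
    return 'claude-3-5-haiku-latest';
  }
  return 'claude-sonnet-4-20250514';
}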

📌 Key Takeaways

  • Context quality beats model quality. A well-assembled prompt with tight prefix/suffix context will outperform a bloated full-file prompt on a more powerful model.
  • Streaming is non-negotiable for good UX. Implement it from day one — retrofitting it later is painful.
  • The plugin architecture is the same across IDEs. Learn the pattern once (context → prompt → stream → insert) and apply it to VS Code, JetBrains, Neovim, or Emacs.
  • Security is your responsibility. If you're building an internal tool, audit what code gets sent to external APIs. Consider self-hosting an open-source model for sensitive codebases.
  • Start small, iterate fast. A simple code completion command with streaming is worth more than an elaborate feature-complete plugin that takes months to ship.

Conclusion

Building your own AI coding assistant isn't nearly as daunting as it sounds. The IDE APIs — both VS Code's Extension API and JetBrains' IntelliJ Platform — are mature, well-documented, and genuinely powerful. The LLM APIs are a few HTTP calls away.

What makes or breaks these tools is the layer in between: how thoughtfully you assemble context, how responsively you stream results, and how carefully you handle edge cases. That's where this guide focused — and where you should focus your energy too.

The next step? Pick one feature, ship it, and put it in front of real users. The feedback you get from watching a colleague use your plugin for 10 minutes will be worth more than any amount of solo prototyping.

Happy building. 🛠️


Written by

Niraj Kumar

Software Developer — building scalable systems for businesses.