Build AI Agents(5)-Models

We can build our projects with agents in VS Code, and we use the agent to plan a solution, create and edit multiple files, run commands, and fix their own errors, all from a single natural-language prompt.

• We describe what we want, the agent does the work.

We can hand off tasks to autonomous AI agents that iterate until the job is done. For example, we can use the agent to build a simple personal portfolio page with HTML, CSS and JavaScript.

Create a project folder

Agents usually works in the context of a folder, also known as a workspace, and agents can work across multiple workspaces without opening a separate window for each one.

• We can run any agents with any model across our full project

$ mkdir myportfolio$ git init

We can add additional components and scaffolders based on our development workflow.

• Scaffolding tools like Yeoman and packages from the npm package manager provide templates and tools to create projects.

Start an agent session

The agent sessions are grouped by workspaces by default, we can use the session list to switch between sessions.

• We can configure customizations to modify the agent's behavior to match our coding practices.

Here are the default configuration options:

1. Agent: the generic agent for performing the task. For specialized tasks, we could create custom agents, like a code review or testing agent.
2. Language model: depending on our setup, we can choose from multiple language models and configure additional settings.
3. Default Approvals: the agent will automatically approve safe actions but will ask for our approval for potentially risky actions.
4. Folder & branch: the agent works directly on the files in our folder and commits to the current branch.

VS Code has built-in support for AI features like inline suggestions and AI agents that help us with coding tasks. Enter the following prompt in the chat input and press Enter:

Create a personal portfolio page with HTML, CSS, and JavaScript in separate files. Include a header with my name and a short bio, a section for projects with cards, and a contact section. Use modern styling and add some sample content.

The agent will analyze our request, plan the work, and then start creating and editing files. If it encounters errors, it self-corrects or asks for clarification and approval.

Iterate on the design

The Agents window is great for workflows where we hand off tasks to the agent and then validate the outcome, rather than the specific code.

For web-based applications, we can preview the agent's work in the integrated browser without having to leave VS Code.

The agent adds the selected element to our prompt as context, including its HTML, CSS, and a screenshot, so we can continue to enter prompt that describes the change we want, and press Enter.

Use a gradient color for the text and use cursive.

The agent will apply the changes to the element that we selected, then we can refresh the page in the integrated browser to see the updates.

Review code changes

The Changes panel lists every file the agent created or modified during its session, we can move between the different files with the navigation controls in the title bar.

After committing the changes, the Changes panel is back empty because there are no pending changes. The change stats are also cleared from the entry in the session list.

Start new agent session

We can start a new session to run an agent to add a theme switcher to your portfolio page. The agent applies the changes directly to our files, and we can review them as inline diffs in the editor.

Add a theme switcher button that toggles between a light and dark color theme for the page.

We can ask the agent to preview the page and validate the new feature itself in the browser.

The agent can iterate on its changes based on what it sees in the browser. Enter the following prompt and press Enter:

Verify that the theme switcher works correctly and review the design aligns with the rest of the page. If there are any issues, fix them.

The agent asks to approve opening the integrated browser. Select Allow in this session to let the agent access the browser for previewing and validating its changes.

The agent will ask to approve opening the integrated browser. Select Allow in this session to let the agent access the browser for previewing and validating its changes.

Configure agent session

There are many building blocks combine each time we send a request in an agent session.

A language model does the reasoning in the agent session. To respond usefully, it needs context:

• VS Code assembles the relevant files, conversation history, and other information and sends it to the model.
• The language model calls tools to read and edit files, run commands, or reach external services to act on our environment.

An agent ties these together in the agent loop, calling tools and feeding the results back to the model until the task is complete.

Choose the right model

A language model processes text input (a "prompt") and generates text output by default, note that language models are trained on data up to a certain date and might produce outdated or incorrect information for topics beyond their training data.

Capabilities are model-dependent and might differ from the built-in models, for example, support for tool calling, vision, or thinking.

• The prompt is assembled from multiple sources:

1. Chat message
2. Conversation history
3. File contents
4. Tool outputs
5. Custom instructions

• The model generates responses that can include explanations, code edits, or requests to call tools.

Language models actually don't execute code or access files directly. Instead, they generate text that the agent loop interprets as actions. When a model requests a tool call, The agent host(VS Code, etc.) executes the tool and feeds the result back to the model for the next iteration.

• The same prompt can produce different results each time.
• The quality of the response depends on the quality and relevance of the context provided in the prompt
• The agent needs to mitigate incorrect information with tools and workspace indexing.

The context window is the total amount of information a model can process in a single request. It includes everything:

• System prompt
• Custom instructions
• Conversation history
• File contents
• Tool outputs
• Current chat message

Different models have different context window sizes.

• When the context window fills up, we need to summarizes older parts of the conversation to make room.
• Important details from early in a long conversation might be compressed or lost.

We can also type /compact in the chat input to manually trigger compaction at any time. Optionally, add custom instructions after the command to guide the summary, for example /compactfocus on the API design decisions.

Some language models can perform extended reasoning, also called "thinking", before producing a response.

Instead of generating an answer immediately, a reasoning model first works through the problem internally, considering multiple approaches, evaluating trade-offs, and building a step-by-step chain of thought. This internal reasoning happens in dedicated thinking tokens that are separate from the final output.

Reasoning models are especially effective for complex tasks like multi-step debugging, architectural planning, code refactoring, and mathematical or scientific analysis.

For simpler tasks like generating boilerplate or answering basic questions, the extra reasoning adds latency without significant benefit.

Thinking effort controls how much reasoning a model applies to each request.

• Higher effort levels produce more thorough internal reasoning, which improves quality for complex problems.
• Lower effort levels reduce latency and token usage by limiting or skipping the thinking step.

The available effort levels and their default values vary by model and provider. Some models also support adaptive thinking, where the model dynamically decides whether and how much to reason based on the complexity of each request, rather than always using a fixed thinking budget.

VS Code sets default effort levels based on evaluations and online performance data, and has adaptive reasoning enabled where supported. For most use cases, the defaults work well without changes.

Thinking tokens count toward the model's context window, even though they are not visible in the response. The actual thinking output is typically returned in summarized form or can be omitted entirely for lower latency.

• Higher thinking effort levels can produce more thinking tokens, which can increase latency.

Each model has different strengths. Some are optimized for speed and work well for quick edits and simple questions. Others have larger context windows or better reasoning capabilities, making them ideal for complex tasks.

• Fast models are best for quick code edits, boilerplate generation, and straightforward questions.
• Reasoning models excel at complex refactoring, architectural decisions, multi-step debugging, and tasks that require analyzing trade-offs.
• Large context models work well for large codebases or long conversations where retaining more information matters

Auto model selection combines two systems to route each request to the optimal model.

• One system tracks real-time model health and availability.
• The other evaluates task complexity.

They match each task to the model that can solve it most efficiently, reserving higher-cost reasoning models for problems that need them and routing simpler tasks to faster models.

Other factors also affect credit consumption, such as thinking effort (higher effort produces more thinking tokens), context window size, and tool usage.