In the Cascade panel (Ctrl/⌘ + L), you can easily switch between models.

Depending on the model you select, each of your input prompts will consume a different number of prompt credits.

Under the text input box, you will find a model selection dropdown menu with the following models available:

| Model | Prompt credits | Free | Pro/Trial | Teams | Enterprise | Images? |
| --- | --- | --- | --- | --- | --- | --- |
| SWE-1 | 0¹ | | | | | Yes |
| SWE-1-lite | 0 | | | | | |
| GPT-4o | 1 | | | | | Yes |
| GPT-4.1 | 0.25¹ | | | | | Yes |
| o3 | 1 | | | | | |
| o3 (high reasoning) | 1 | | | | | |
| o3-mini (medium reasoning) | 1 | | | | | |
| o4-mini (medium reasoning) | 0.25¹ | | | | | Yes |
| o4-mini (high reasoning) | 0.25¹ | | | | | Yes |
| Claude 3.5 Sonnet | 1 | 🗝 | ✓ Pro, 🗝 Trial | | | Yes |
| Claude 3.7 Sonnet | 1 | 🗝 | ✓ Pro, 🗝 Trial | | | Yes |
| Claude 3.7 Sonnet (Thinking) | 1.25 | 🗝 | ✓ Pro, 🗝 Trial | | | Yes |
| Claude Sonnet 4 | 🗝 or 🪙 | 🗝 | 🗝 or 🪙 | 🪙 | 🪙 | Yes |
| Claude Sonnet 4 (Thinking) | 🗝 or 🪙 | 🗝 | 🗝 or 🪙 | 🪙 | 🪙 | Yes |
| Claude 4 Opus | 🗝 | 🗝 | 🗝 | | | Yes |
| Claude 4 Opus (Thinking) | 🗝 | 🗝 | 🗝 | | | Yes |
| DeepSeek-V3-0324 | 0 | | | | | |
| DeepSeek-R1 | 0.5 | | | | | |
| Gemini 2.0 Flash | 0.25 | | | | | |
| Gemini 2.5 Flash | 0.1 | | | | | Yes |
| Gemini 2.5 Flash (Thinking) | 0.15 | | | | | Yes |
| Gemini 2.5 Pro | 0.75¹ | | | | | Yes |
| xAI Grok-3 | 1 | | | | | |
| xAI Grok-3 mini (Thinking) | 0.125 | | | | | |

✓ Available via flat credit pricing
🗝 Available via BYOK
🪙 Available via API Pricing
¹ Promo pricing, only available for a limited time

SWE-1

SWE-1 is our family of in-house frontier models built specifically for software engineering tasks.

Based on our internal evals, it has performance nearing that of frontier models from the foundation labs.

  • SWE-1: High-reasoning, tool-capable, and Cascade-optimized. Claude 3.5-level performance at a fraction of the cost.
  • SWE-1-lite: Replaces Cascade Base — faster, better, and free for all.
  • SWE-1-mini: Powers passive suggestions in Windsurf Tab, optimized for real-time latency.

Bring your own key (BYOK)

This is only available to free and paid individual users.

For certain models, we allow users to bring their own API keys. In the model dropdown menu, individual users will see models labeled with BYOK.

Note that this is different from API Pricing.

To input your API key, navigate to this page in the subscription settings and add your key.

If you have not configured your API key, attempting to use a BYOK model will return an error.

Currently, we only support BYOK for these models:

  • Claude Sonnet 4
  • Claude Sonnet 4 (Thinking)
  • Claude 4 Opus
  • Claude 4 Opus (Thinking)

API Pricing

Unlike flat rate pricing, where a fixed number of credits is consumed for each user prompt, API pricing charges a fixed number of credits per token processed (i.e. proportional to compute). The number of credits per token varies based on the model selected.

Both API pricing and flat rate pricing consume the same pool of credits. Models with API pricing are clearly marked in the model selector.

We charge the model’s API price plus a 20% margin. Each credit corresponds to $0.04.
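As a sketch of that conversion, the snippet below turns a provider's list price into credits per million tokens. The dollar figures are assumptions for illustration (roughly Anthropic's published Claude Sonnet 4 rates), not quoted from this page.

```python
# Sketch: converting a provider's per-token API price into credits,
# assuming the rules above: a 20% margin and $0.04 per credit.

MARGIN = 1.20          # 20% margin on the provider's API price
USD_PER_CREDIT = 0.04  # each credit corresponds to $0.04

def credits_per_million_tokens(usd_per_million: float) -> float:
    """API price plus margin, expressed in credits per million tokens."""
    return usd_per_million * MARGIN / USD_PER_CREDIT

# Assumed list prices (USD per million tokens) for illustration only:
print(round(credits_per_million_tokens(3.00), 2))   # input tokens  → 90.0
print(round(credits_per_million_tokens(0.30), 2))   # cache reads   → 9.0
print(round(credits_per_million_tokens(15.00), 2))  # output tokens → 450.0
```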

We utilize the same tokenizers as the model providers (Anthropic’s for Claude models, OpenAI’s for GPT models, etc.) to ensure accurate and consistent token counting and pricing. View OpenAI’s tokenizer demo.

API pricing charges for all tokens processed from your prompts, automatically retrieved context, tool calls, past conversation history, etc. Since token processing costs from model providers differ between input and output tokens, cached or non-cached, the number of credits consumed for the same overall number of tokens can differ given different distributions of these token types. Reasoning tokens from “thinking” models are charged as output tokens. Windsurf balances the context length with costs for prompt cache reads by summarizing the conversation when it gets too long; the resulting summary would be charged as non-cache input tokens. Fractional credit usage is rounded up to the nearest hundredth of a credit.
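To make that accounting concrete, here is a minimal sketch of the per-request calculation. The per-million-token rates are illustrative assumptions, and `request_credits` is a hypothetical helper, not a Windsurf API.

```python
import math

# Illustrative rates in credits per million tokens (assumed, not official):
RATES = {
    "input": 90.0,         # non-cached input tokens
    "cache_read": 9.0,     # prompt cache hits
    "cache_write": 112.5,  # input written to the prompt cache (+25% on Anthropic)
    "output": 450.0,       # output tokens, incl. reasoning tokens from thinking models
}

def request_credits(token_counts: dict[str, int]) -> float:
    """Sum credits across token types, rounding up to the nearest hundredth."""
    raw = sum(RATES[kind] * n / 1_000_000 for kind, n in token_counts.items())
    # round() first damps float noise before the round-up to a hundredth
    return math.ceil(round(raw * 100, 6)) / 100

# A prompt that writes 20k tokens to the cache and produces 1k tokens of output:
print(request_credits({"cache_write": 20_000, "output": 1_000}))  # → 2.7
```

Note how the same total token count can cost more or less depending on how it splits across these buckets, which is why identical-length conversations can consume different numbers of credits.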

Note that API pricing is separate from BYOK (bring-your-own-key). With BYOK all usage goes through your own API key, so Windsurf does not charge any credits.

Here is the pricing for models that are available via API pricing in various plans:

| Model | Plans with API Pricing | Input Tokens (Credits / Million Tokens)¹ | Cache Read Tokens (Credits / Million Tokens)² | Output Tokens (Credits / Million Tokens) |
| --- | --- | --- | --- | --- |
| Claude Sonnet 4 | Pro, Teams, Enterprise (contracted), Enterprise (self-serve) | 90 | 9 | 450 |
| Claude Sonnet 4 (Thinking) | Pro, Teams, Enterprise (contracted), Enterprise (self-serve) | 90 | 9 | 450 |
¹ For Anthropic models, prompt cache writes cost an extra 25% compared to uncached input. Most input tokens into Windsurf are written into the prompt cache for subsequent steps. Output tokens are written into the prompt cache at no extra cost. See Anthropic’s API pricing for details.
² The prompt cache has a limited TTL (time-to-live) determined by the model provider (e.g. approximately 5 minutes on Anthropic). Even within the TTL, a prompt cache hit is not guaranteed; prompt cache misses are charged as input tokens.

Example Conversation

To show how API pricing works in practice, let us walk through an example conversation with Cascade using Claude Sonnet 4 directly.

| Role | Message | Tokens | Note | Cost per message |
| --- | --- | --- | --- | --- |
| User | Refactor @my_function | 20k | Input (cache write). Note: incl. full shared timeline, editor context & system prompt. | 2.25 Credits |
| Windsurf | Let me first analyze my_function to come up with a plan to refactor it. | 1k | Output tokens. | 0.45 Credits |
| tool_call | Analyze my_function | 23k | Input (cache read) + Input (cache write). | 0.42 Credits |
| Windsurf | Here is a plan to refactor my_function […] do you want me to continue with implementing? | 2k | Output tokens. | 0.90 Credits |
| User | Yes, continue. | 46k | Input (cache read) + Input (cache write). | 0.52 Credits |
| tool_call | Edit foo.py | 50k | Input (cache read) + Output tokens. | 2.22 Credits |
| tool_call | Add bar.py | 56k | Input (cache read) + Output tokens. | 3.15 Credits |
| Windsurf | I am done refactoring my_function. Here is a summary of my changes: […] | 2k | Output tokens. | 0.90 Credits |
| Total | | 200k | | 10.81 Credits |
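As a sanity check, the first two rows above can be reproduced from inferred Claude Sonnet 4 rates: roughly 112.5 credits per million cache-write input tokens (the base input rate plus the 25% cache-write surcharge) and 450 credits per million output tokens. These rates are assumptions derived from the example, not official figures.

```python
# Inferred rates in credits per million tokens (assumed for illustration):
CACHE_WRITE_PER_M = 112.5  # input rate + 25% prompt-cache-write surcharge
OUTPUT_PER_M = 450.0       # output tokens

# Row 1: the 20k-token user prompt, written entirely to the prompt cache.
print(20_000 * CACHE_WRITE_PER_M / 1_000_000)  # → 2.25 credits

# Row 2: Windsurf's 1k-token reply, charged as output tokens.
print(1_000 * OUTPUT_PER_M / 1_000_000)        # → 0.45 credits
```

Later rows mix cache reads with cache writes or outputs in proportions the table does not break out, which is why they cannot be recomputed from the totals alone.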