Models
In the Cascade panel (Ctrl/⌘ + L), you can easily switch between models of your choosing.
Depending on the model you select, each of your input prompts will consume a different number of prompt credits.
Under the text input box, a model selection dropdown menu lists the following available models:
| Model | Prompt credits | Free | Pro/Trial | Teams | Enterprise | Images? |
| --- | --- | --- | --- | --- | --- | --- |
| SWE-1 | 0¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SWE-1-lite | 0 | ✓ | ✓ | ✓ | ✓ | |
| GPT-4o | 1 | ✓ | ✓ | ✓ | ✓ | ✓ |
| GPT-4.1 | 0.25¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| o3 | 1 | ✓ | ✓ | ✓ | ✓ | ✓ |
| o3 (high reasoning) | 1 | ✓ | ✓ | ✓ | ✓ | ✓ |
| o3-mini (medium reasoning) | 1 | ✓ | ✓ | ✓ | | |
| o4-mini (medium reasoning) | 0.25¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| o4-mini (high reasoning) | 0.25¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Claude 3.5 Sonnet | 1 | 🗝 | ✓ Pro / 🗝 Trial | ✓ | ✓ | ✓ |
| Claude 3.7 Sonnet | 1 | 🗝 | ✓ Pro / 🗝 Trial | ✓ | ✓ | ✓ |
| Claude 3.7 Sonnet (Thinking) | 1.25 | 🗝 | ✓ Pro / 🗝 Trial | ✓ | ✓ | ✓ |
| Claude Sonnet 4 | 🗝 or 🪙 | 🗝 | 🗝 or 🪙 | 🪙 | 🪙 | ✓ |
| Claude Sonnet 4 (Thinking) | 🗝 or 🪙 | 🗝 | 🗝 or 🪙 | 🪙 | 🪙 | ✓ |
| Claude 4 Opus | 🗝 | 🗝 | 🗝 | | | ✓ |
| Claude 4 Opus (Thinking) | 🗝 | 🗝 | 🗝 | | | ✓ |
| DeepSeek-V3-0324 | 0 | ✓ | | | | |
| DeepSeek-R1 | 0.5 | ✓ | | | | |
| Gemini 2.0 Flash | 0.25 | ✓ | ✓ | ✓ | | |
| Gemini 2.5 Flash | 0.1 | ✓ | ✓ | ✓ | | ✓ |
| Gemini 2.5 Flash (Thinking) | 0.15 | ✓ | ✓ | ✓ | | ✓ |
| Gemini 2.5 Pro | 0.75¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| xAI Grok-3 | 1 | ✓ | | | | |
| xAI Grok-3 mini (Thinking) | 0.125 | ✓ | | | | |
🗝 Available via BYOK
🪙 Available via API Pricing
¹ Promo pricing only available for a limited time
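For flat-rate models, the per-prompt cost above simply multiplies by the number of prompts you send. For example, five prompts to GPT-4.1 at its promotional rate of 0.25 credits per prompt consume 5 × 0.25 = 1.25 prompt credits.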
SWE-1
SWE-1 is our family of in-house frontier models built specifically for software engineering tasks.
Based on our internal evals, it has performance nearing that of frontier models from the foundation labs.
- SWE-1: High-reasoning, tool-capable, and Cascade-optimized. Claude 3.5-level performance at a fraction of the cost.
- SWE-1-lite: Replaces Cascade Base; faster, better, and free for all.
- SWE-1-mini: Powers passive suggestions in Windsurf Tab, optimized for real-time latency.
Bring your own key (BYOK)
For certain models, we allow users to bring their own API keys. In the model dropdown menu, individual users will see models labeled with BYOK.
Note that this is different from API Pricing.
To input your API key, navigate to this page in the subscription settings and add your key.
If you have not configured your API key, Cascade will return an error when you try to use a BYOK model.
Currently, we only support BYOK for these models:
- Claude Sonnet 4
- Claude Sonnet 4 (Thinking)
- Claude 4 Opus
- Claude 4 Opus (Thinking)
API Pricing
Unlike flat-rate pricing, where a fixed number of credits is used for each user prompt, API pricing charges a fixed number of credits per token processed (i.e., proportional to compute). The number of credits per token varies by model.
Both API pricing and flat rate pricing consume the same pool of credits. Models with API pricing are clearly marked in the model selector.
We charge the model’s API price plus a 20% margin. Each credit corresponds to $0.04.
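As a worked example (using an assumed provider price of $15 per million output tokens): $15 × 1.2 margin ÷ $0.04 per credit = 450 credits per million output tokens, which matches the Claude Sonnet 4 output rate in the table below.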
API pricing charges for all tokens processed: your prompts, automatically retrieved context, tool calls, past conversation history, and so on. Because model providers charge differently for input versus output tokens, and for cached versus non-cached tokens, the number of credits consumed for the same overall number of tokens can differ depending on the mix of token types. Reasoning tokens from “thinking” models are charged as output tokens. Windsurf balances context length against prompt cache read costs by summarizing the conversation when it gets too long; the resulting summary is charged as non-cached input tokens. Fractional credit usage is rounded up to the nearest hundredth of a credit.
Note that API pricing is separate from BYOK (bring-your-own-key). With BYOK, all usage goes through your own API key, so Windsurf does not charge any credits.
Here is the pricing for models that are available via API pricing in various plans:
| Model | Plans with API Pricing | Input Tokens (Credits / Million Tokens)¹ | Cache Read Tokens (Credits / Million Tokens)² | Output Tokens (Credits / Million Tokens) |
| --- | --- | --- | --- | --- |
| Claude Sonnet 4 | Pro, Teams, Enterprise (contracted), Enterprise (self-serve) | 90 | 9 | 450 |
| Claude Sonnet 4 (Thinking) | Pro, Teams, Enterprise (contracted), Enterprise (self-serve) | 90 | 9 | 450 |
² The prompt cache has a limited TTL (time to live) determined by the model provider (e.g., approximately 5 minutes on Anthropic). Even within the TTL, a prompt cache hit is not guaranteed. Prompt cache misses are charged as input tokens.
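As a rough sketch of the arithmetic, credits for a single message could be computed from its token mix as below. This is a minimal illustration assuming the Claude Sonnet 4 rates above; the function and rate table are hypothetical names, not part of Windsurf's product, and cache-write pricing (not listed in the table) is omitted.

```python
import math

# Credits per million tokens for Claude Sonnet 4, taken from the table above.
# Illustrative only: cache-write rates are not listed in the table, so they
# are omitted here.
RATES = {"input": 90, "cache_read": 9, "output": 450}

def credits_for_message(input_tokens: int, cache_read_tokens: int, output_tokens: int) -> float:
    """Estimate credits for one message from its token mix."""
    raw = (
        input_tokens * RATES["input"]
        + cache_read_tokens * RATES["cache_read"]
        + output_tokens * RATES["output"]
    ) / 1_000_000
    # Fractional credit usage is rounded up to the nearest hundredth of a credit.
    return math.ceil(raw * 100) / 100

# A 2k-token model reply: 2_000 * 450 / 1_000_000 = 0.90 credits.
print(credits_for_message(0, 0, 2_000))  # 0.9
```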
Example Conversation
To show how API pricing works in practice, let us walk through an example conversation with Cascade using Claude Sonnet 4 under API pricing.
| Role | Message | Tokens | Note | Cost per message |
| --- | --- | --- | --- | --- |
| User | Refactor @my_function | 20k | Input (cache write). Note: incl. full shared timeline, editor context & system prompt. | 2.25 Credits |
| Windsurf | Let me first analyze my_function to come up with a plan to refactor it. | 1k | Output tokens. | 0.45 Credits |
| tool_call | Analyze my_function | 23k | Input (cache read) + input (cache write). | 0.42 Credits |
| Windsurf | Here is a plan to refactor my_function […] do you want me to continue with implementing? | 2k | Output tokens. | 0.90 Credits |
| User | Yes, continue. | 46k | Input (cache read) + input (cache write). | 0.52 Credits |
| tool_call | Edit foo.py | 50k | Input (cache read) + output tokens. | 2.22 Credits |
| tool_call | Add bar.py | 56k | Input (cache read) + output tokens. | 3.15 Credits |
| Windsurf | I am done refactoring my_function. Here is a summary of my changes: […] | 2k | Output tokens. | 0.90 Credits |
| Total | | 200k | | 10.81 Credits |
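The output-only rows can be checked against the sketch above: 1k output tokens × 450 credits per million = 0.45 credits, and 2k × 450 / 1M = 0.90 credits. The cache-write rows additionally depend on a cache-write rate not listed in the pricing table, and the per-message costs sum to 2.25 + 0.45 + 0.42 + 0.90 + 0.52 + 2.22 + 3.15 + 0.90 = 10.81 credits.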