Models
In the Cascade panel (Ctrl/⌘ + L), you can easily switch between models of your choosing.
Depending on the model you select, each of your input prompts will consume a different number of prompt credits.
Under the text input box, a model selection dropdown menu lists the following available models:
| Model | Prompt credits | Free | Pro/Trial | Teams | Enterprise | Images? |
| --- | --- | --- | --- | --- | --- | --- |
| SWE-1 | 0¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SWE-1-lite | 0 | ✓ | ✓ | ✓ | ✓ | |
| GPT-4o | 1 | ✓ | ✓ | ✓ | ✓ | ✓ |
| GPT-4.1 | 0.25¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| o3 | 1 | ✓ | ✓ | ✓ | ✓ | ✓ |
| o3 (high reasoning) | 1 | ✓ | ✓ | ✓ | ✓ | ✓ |
| o3-mini (medium reasoning) | 1 | ✓ | ✓ | ✓ | | |
| o4-mini (medium reasoning) | 0.25¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| o4-mini (high reasoning) | 0.25¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Claude 3.5 Sonnet | 1 | 🗝 | ✓ Pro / 🗝 Trial | ✓ | ✓ | ✓ |
| Claude 3.7 Sonnet | 1 | 🗝 | ✓ Pro / 🗝 Trial | ✓ | ✓ | ✓ |
| Claude 3.7 Sonnet (Thinking) | 1.25 | 🗝 | ✓ Pro / 🗝 Trial | ✓ | ✓ | ✓ |
| Claude Sonnet 4 | 🗝 or 🪙 | 🗝 | 🗝 or 🪙 | 🪙 | 🪙 | ✓ |
| Claude Sonnet 4 (Thinking) | 🗝 or 🪙 | 🗝 | 🗝 or 🪙 | 🪙 | 🪙 | ✓ |
| Claude 4 Opus | 🗝 | 🗝 | 🗝 | | | ✓ |
| Claude 4 Opus (Thinking) | 🗝 | 🗝 | 🗝 | | | ✓ |
| DeepSeek-V3-0324 | 0 | ✓ | | | | |
| DeepSeek-R1 | 0.5 | ✓ | | | | |
| Gemini 2.0 Flash | 0.25 | ✓ | ✓ | ✓ | | |
| Gemini 2.5 Flash | 0.1 | ✓ | ✓ | ✓ | | ✓ |
| Gemini 2.5 Flash (Thinking) | 0.15 | ✓ | ✓ | ✓ | | ✓ |
| Gemini 2.5 Pro | 0.75¹ | ✓ | ✓ | ✓ | ✓ | ✓ |
| xAI Grok-3 | 1 | ✓ | | | | |
| xAI Grok-3 mini (Thinking) | 0.125 | ✓ | | | | |
🗝 Available via BYOK
🪙 Available via API Pricing
¹ Promo pricing only available for a limited time
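For flat-rate models, the per-prompt cost above simply multiplies by the number of prompts you send. For example, five prompts to GPT-4.1 at its promotional rate of 0.25 credits per prompt consume 5 × 0.25 = 1.25 prompt credits.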
SWE-1
SWE-1 is our family of in-house frontier models built specifically for software engineering tasks.
Based on our internal evals, it has performance nearing that of frontier models from the foundation labs.
- SWE-1: High-reasoning, tool-capable, and Cascade-optimized. Claude 3.5-level performance at a fraction of the cost.
- SWE-1-lite: Replaces Cascade Base; faster, better, and free for all.
- SWE-1-mini: Powers passive suggestions in Windsurf Tab, optimized for real-time latency.
Bring your own key (BYOK)
For certain models, we allow users to bring their own API keys. In the model dropdown menu, individual users will see models labeled with BYOK.
Note that this is different from API Pricing.
To input your API key, navigate to this page in the subscription settings and add your key.
If you have not configured your API key, Cascade will return an error when you try to use a BYOK model.
Currently, we only support BYOK for these models:
- Claude Sonnet 4
- Claude Sonnet 4 (Thinking)
- Claude 4 Opus
- Claude 4 Opus (Thinking)
API Pricing
Unlike flat-rate pricing, where a fixed number of credits is used for each user prompt, API pricing charges a fixed number of credits per token processed (i.e., proportional to compute). The number of credits per token varies by model.
Both API pricing and flat rate pricing consume the same pool of credits. Models with API pricing are clearly marked in the model selector.
We charge the model’s API price plus a 20% margin. Each credit corresponds to $0.04.
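As a worked example (using an assumed provider price of $15 per million output tokens): $15 × 1.2 margin ÷ $0.04 per credit = 450 credits per million output tokens, which matches the Claude Sonnet 4 output rate in the table below.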
API pricing charges for all tokens processed: your prompts, automatically retrieved context, tool calls, past conversation history, and so on. Because model providers charge differently for input versus output tokens, and for cached versus non-cached tokens, the number of credits consumed for the same overall number of tokens can differ depending on the mix of token types. Reasoning tokens from “thinking” models are charged as output tokens. Windsurf balances context length against prompt cache read costs by summarizing the conversation when it gets too long; the resulting summary is charged as non-cached input tokens. Fractional credit usage is rounded up to the nearest hundredth of a credit.
Note that API pricing is separate from BYOK (bring-your-own-key). With BYOK, all usage goes through your own API key, so Windsurf does not charge any credits.
Here is the pricing for models that are available via API pricing in various plans:
| Model | Plans with API Pricing | Input Tokens (Credits / Million Tokens)¹ | Cache Read Tokens (Credits / Million Tokens)² | Output Tokens (Credits / Million Tokens) |
| --- | --- | --- | --- | --- |
| Claude Sonnet 4 | Pro, Teams, Enterprise (contracted), Enterprise (self-serve) | 90 | 9 | 450 |
| Claude Sonnet 4 (Thinking) | Pro, Teams, Enterprise (contracted), Enterprise (self-serve) | 90 | 9 | 450 |
² The prompt cache has a limited TTL (time to live) determined by the model provider (e.g., approximately 5 minutes on Anthropic). Even within the TTL, a prompt cache hit is not guaranteed. Prompt cache misses are charged as input tokens.
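As a rough sketch of the arithmetic, credits for a single message could be computed from its token mix as below. This is a minimal illustration assuming the Claude Sonnet 4 rates above; the function and rate table are hypothetical names, not part of Windsurf's product, and cache-write pricing (not listed in the table) is omitted.

```python
import math

# Credits per million tokens for Claude Sonnet 4, taken from the table above.
# Illustrative only: cache-write rates are not listed in the table, so they
# are omitted here.
RATES = {"input": 90, "cache_read": 9, "output": 450}

def credits_for_message(input_tokens: int, cache_read_tokens: int, output_tokens: int) -> float:
    """Estimate credits for one message from its token mix."""
    raw = (
        input_tokens * RATES["input"]
        + cache_read_tokens * RATES["cache_read"]
        + output_tokens * RATES["output"]
    ) / 1_000_000
    # Fractional credit usage is rounded up to the nearest hundredth of a credit.
    return math.ceil(raw * 100) / 100

# A 2k-token model reply: 2_000 * 450 / 1_000_000 = 0.90 credits.
print(credits_for_message(0, 0, 2_000))  # 0.9
```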
Example Conversation
To show how API pricing works in practice, let us walk through an example conversation with Cascade using Claude Sonnet 4 under API pricing.
| Role | Message | Tokens | Note | Cost per message |
| --- | --- | --- | --- | --- |
| User | Refactor @my_function | 20k | Input (cache write). Note: incl. full shared timeline, editor context & system prompt. | 2.25 Credits |
| Windsurf | Let me first analyze my_function to come up with a plan to refactor it. | 1k | Output tokens. | 0.45 Credits |
| tool_call | Analyze my_function | 23k | Input (cache read) + input (cache write). | 0.42 Credits |
| Windsurf | Here is a plan to refactor my_function […] do you want me to continue with implementing? | 2k | Output tokens. | 0.90 Credits |
| User | Yes, continue. | 46k | Input (cache read) + input (cache write). | 0.52 Credits |
| tool_call | Edit foo.py | 50k | Input (cache read) + output tokens. | 2.22 Credits |
| tool_call | Add bar.py | 56k | Input (cache read) + output tokens. | 3.15 Credits |
| Windsurf | I am done refactoring my_function. Here is a summary of my changes: […] | 2k | Output tokens. | 0.90 Credits |
| Total | | 200k | | 10.81 Credits |
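The output-only rows can be checked against the sketch above: 1k output tokens × 450 credits per million = 0.45 credits, and 2k × 450 / 1M = 0.90 credits. The cache-write rows additionally depend on a cache-write rate not listed in the pricing table, and the per-message costs sum to 2.25 + 0.45 + 0.42 + 0.90 + 0.52 + 2.22 + 3.15 + 0.90 = 10.81 credits.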