All Your AI Inference Needs
One Platform
From small dev teams to large enterprises: unified serverless, reserved, or private-cloud inference, with no fragmentation.
Model | Description |
---|---|
FLUX.1 Kontext [pro] | ... |
FLUX.1 Kontext [max] | ... |
FLUX 1.1 [pro] | ... |
Ultra | ... |
Wan2.1-T2V-14B (Turbo) | ... |
Wan2.1-T2V-14B | ... |
Wan2.1-I2V-14B-720P (Turbo) | ... |
High-Speed Inference for Image, Video, and Beyond
From image generation to visual understanding, our platform accelerates multimodal models with unmatched performance.
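As a rough illustration, a text-to-image request to the models listed above typically looks like the sketch below. The endpoint URL, model identifier, and parameter names are assumptions for illustration, not documented values; check the platform's API reference before use.

```python
import requests

# Hypothetical endpoint and payload shape for a text-to-image request.
API_URL = "https://api.example-inference.com/v1/images/generations"  # assumed URL
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "FLUX.1 Kontext [pro]",  # one of the image models listed above; exact ID format may differ
    "prompt": "a watercolor lighthouse at dawn",
    "size": "1024x1024",              # assumed parameter name
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # usually contains a URL or base64 payload for the generated image
```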
Run Powerful LLMs Faster, Smarter, at Any Scale
Serve open and commercial LLMs through our optimized stack. Lower latency, higher throughput, and predictable costs.
Model |
---|
DeepSeek-R1 |
DeepSeek-V3 |
GLM-4.5 |
GLM-4.5-Air |
Qwen3-235B-A2 |
Qwen3-235B-2507 |
Kimi-K2-Instruct |
GLM-4.1V-9B |
ERNIE-4.5-300B |
Hunyuan-A13B-Instruct |
MiniMax-M1-80k |
Qwen3-30B-A3B |
Qwen3-32B |
Qwen3-14B |
Qwen3-8B |
Qwen3-Reranker-8B |
Qwen3-Embedding-8B |
Qwen3-Reranker-4B |
Qwen3-Embedding-4B |
Qwen3-Reranker-0.6B |
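A minimal sketch of calling one of the models above through an OpenAI-compatible endpoint is shown below. The base URL and model identifier are assumptions; substitute the values from the platform's documentation.

```python
from openai import OpenAI

# Minimal chat-completion sketch against an assumed OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.example-inference.com/v1",  # hypothetical base URL
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # assumed ID for the DeepSeek-R1 entry above
    messages=[
        {"role": "user", "content": "Summarize the trade-offs between serverless and reserved inference."}
    ],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```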
Flexible Deployment Options, Built for Every Use Case
Run models serverlessly, on dedicated endpoints, or bring your own setup.
Run any model instantly: no setup, no scaling headaches. Just call the API and pay only for what you use.
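For example, a pay-per-use embeddings call needs nothing provisioned in advance; the sketch below assumes the same hypothetical OpenAI-compatible endpoint as above, and the model identifier is illustrative only.

```python
from openai import OpenAI

# Serverless, pay-per-use call: no capacity to reserve, billed per request/token.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.example-inference.com/v1")  # assumed URL

resp = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",  # assumed ID for the embedding model listed above
    input=["serverless inference", "dedicated endpoints"],
)
print(len(resp.data), "vectors, dimension", len(resp.data[0].embedding))
```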
Easily adapt base models to your data. Fine-tune with built-in monitoring and elastic compute, without managing infrastructure.
Lock in GPU capacity for stable performance and predictable billing. Ideal for high-volume or scheduled inference jobs.