This article explains OpenAI API 429 errors caused by TPM/RPM limits and shows how to handle them with exponential-backoff retries and multi-API-key rotation when building robust LLM applications.

OpenAI API rate limits cap the number of requests and tokens that can be sent in a given time period. When a limit is exceeded, the API responds with HTTP 429 (Too Many Requests). The rest of the article covers the causes of these errors, basic error handling, and retry strategies.

| Type | Meaning | Typical organization-level limit |
|---|---|---|
| RPM | Requests per minute | Usually 200-500 |
| TPM | Tokens per minute | Usually 60K-120K |
| RPD | Requests per day | Depends on the subscription plan |
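
To make the RPM and TPM budgets in the table concrete, here is a minimal client-side sketch that refuses a request when it would exceed either budget within a rolling 60-second window. The class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are illustrative assumptions, not part of any OpenAI SDK, and the server's own accounting may differ from this approximation.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical helper: tracks requests and tokens used in the last `window` seconds."""

    def __init__(self, rpm, tpm, window=60.0, clock=time.monotonic):
        self.rpm = rpm          # max requests per window
        self.tpm = tpm          # max tokens per window
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.events = deque()   # one (timestamp, tokens) entry per accepted request

    def _evict(self, now):
        # Drop entries that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Return True and record the request if it fits both budgets, else False."""
        now = self.clock()
        self._evict(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before each request and wait briefly when it returns `False`, which avoids spending a round trip on a request that is likely to come back as a 429.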
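
When a limit is hit, the SDK raises a rate-limit exception that can be caught directly:

    import openai

    # Legacy interface (openai<1.0). In openai>=1.0 the equivalents are
    # client.chat.completions.create(...) and openai.RateLimitError.
    try:
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello"}],
        )
    except openai.error.RateLimitError as e:
        # Raised when the API answers with HTTP 429.
        print(f"Rate Limit Error: {e}")
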
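
Not every 429 is worth retrying. As a rough sketch, the helper below separates a transient RPM/TPM limit from an exhausted quota, where waiting does not help. The function name `classify_429` is hypothetical, and the error body shape (an `error` object with `code`, `type`, and `message` fields, and the `insufficient_quota` code) is an assumption about the response format that should be checked against the current API documentation.

```python
def classify_429(status_code, error_body):
    """Return 'retry', 'quota', or 'other' for an API error response.

    `error_body` is the parsed JSON body; its layout is assumed, not guaranteed.
    """
    if status_code != 429:
        return "other"
    error = (error_body or {}).get("error") or {}
    code = error.get("code") or error.get("type") or ""
    message = (error.get("message") or "").lower()
    if code == "insufficient_quota" or "quota" in message:
        return "quota"   # waiting will not help; check plan and billing
    return "retry"       # transient RPM/TPM limit; back off and retry
```

Only the `"retry"` case should be fed into a backoff loop; a `"quota"` result is better surfaced to the operator immediately.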
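
Retry with exponential backoff can be added with the `tenacity` library:

    import openai
    from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

    @retry(
        retry=retry_if_exception_type(openai.error.RateLimitError),  # retry only on rate-limit errors
        stop=stop_after_attempt(5),                                  # give up after 5 attempts
        wait=wait_exponential(multiplier=1, min=2, max=60),          # exponential wait, clamped to 2-60 s
        reraise=True,                                                # re-raise the original error at the end
    )
    def call_with_retry(messages):
        return openai.ChatCompletion.create(
            model="gpt-4",
            messages=messages,
        )
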
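
For readers who want to see what the decorator does under the hood, the following dependency-free sketch implements exponential backoff by hand and adds random jitter so that many clients do not retry in lockstep. `RateLimited`, `backoff_delay`, and `retry_with_backoff` are illustrative names, and `RateLimited` merely stands in for the SDK's rate-limit exception. Note that `wait_exponential` itself adds no jitter; tenacity offers `wait_random_exponential` for that.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the SDK's rate-limit exception."""


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter delay for a 0-based attempt: rng() * min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def retry_with_backoff(fn, max_attempts=5, sleep=time.sleep, **delay_kwargs):
    """Call fn(); on RateLimited, wait and retry, making at most max_attempts calls."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, **delay_kwargs))
```

The `sleep` and `rng` parameters are injectable purely so the logic can be exercised in tests without real waiting.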
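
Multiple API keys can also be rotated. Because limits are enforced per organization rather than per key, rotation only helps when the keys belong to different organizations:

    import openai

    API_KEYS = ["key1", "key2", "key3"]  # placeholders
    current_key_index = 0

    def call_with_rotation(messages):
        """Try each key in turn, moving to the next one on a rate-limit error."""
        global current_key_index
        for _ in range(len(API_KEYS)):
            openai.api_key = API_KEYS[current_key_index]
            try:
                return openai.ChatCompletion.create(model="gpt-4", messages=messages)
            except openai.error.RateLimitError:
                current_key_index = (current_key_index + 1) % len(API_KEYS)
        raise RuntimeError("All keys exhausted")
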
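
The rotation above retries a rate-limited key again on the very next call. One possible refinement, sketched below under the assumption that a fixed cooldown is acceptable, is to remember when each key was last limited and skip it until the cooldown has passed. `KeyRotator` and its methods are hypothetical names, and the class is not thread-safe as written.

```python
import time


class KeyRotator:
    """Hypothetical helper: round-robin over keys, skipping any that are cooling down."""

    def __init__(self, keys, cooldown=60.0, clock=time.monotonic):
        if not keys:
            raise ValueError("at least one key is required")
        self.keys = list(keys)
        self.cooldown = cooldown
        self.clock = clock  # injectable for testing
        self.blocked_until = {k: 0.0 for k in self.keys}
        self.index = 0

    def next_key(self):
        """Return the next usable key, or None if every key is cooling down."""
        now = self.clock()
        for _ in range(len(self.keys)):
            key = self.keys[self.index]
            self.index = (self.index + 1) % len(self.keys)
            if self.blocked_until[key] <= now:
                return key
        return None

    def mark_limited(self, key):
        """Record that `key` just hit a rate limit and should rest for `cooldown` seconds."""
        self.blocked_until[key] = self.clock() + self.cooldown
```

A caller would take `next_key()` before each request, call `mark_limited(key)` when that request raises a rate-limit error, and fall back to backoff when `next_key()` returns `None`.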