Completions Response Format

Here’s the response schema as a TypeScript type:
TypeScript
// Definitions of subtypes are below
type Response = {
  id: string;
  // Depending on whether you set "stream" to "true" and
  // whether you passed in "messages" or a "prompt", you
  // will get a different output shape
  choices: (NonStreamingChoice | StreamingChoice | NonChatChoice)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';

  system_fingerprint?: string; // Only present if the provider supports it

  // Usage data is always returned for non-streaming.
  // When streaming, you will get one usage object at
  // the end accompanied by an empty choices array.
  usage?: ResponseUsage;
};
// If the provider returns usage, we pass it down
// as-is. Otherwise, we count using the GPT-4 tokenizer.

type ResponseUsage = {
  /** Including images and tools if any */
  prompt_tokens: number;
  /** The tokens generated */
  completion_tokens: number;
  /** Sum of the above two fields */
  total_tokens: number;
  /** Detailed breakdown of completion tokens */
  completion_tokens_details?: {
    accepted_prediction_tokens?: number | null;
    audio_tokens?: number | null;
    reasoning_tokens?: number; // Tokens used for reasoning (for models like o1, o3)
    rejected_prediction_tokens?: number | null;
    image_tokens?: number;
  };
  /** Detailed breakdown of prompt tokens */
  prompt_tokens_details?: {
    audio_tokens?: number | null;
    cached_tokens?: number;
  };
  /** Total cost of the request */
  cost?: number;
  /** Whether the request used Bring Your Own Key */
  is_byok?: boolean;
  /** Detailed cost breakdown */
  cost_details?: {
    upstream_inference_cost?: number;
    upstream_inference_prompt_cost?: number;
    upstream_inference_completions_cost?: number;
  };
};
// Subtypes:
type NonChatChoice = {
  finish_reason: string | null;
  text: string;
  error?: ErrorResponse;
};

type NonStreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  message: {
    content: string | null;
    role: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type StreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  delta: {
    content: string | null;
    role?: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type ErrorResponse = {
  code: number; // See "Error Handling" section
  message: string;
  metadata?: Record<string, unknown>; // Contains additional error information such as provider details, the raw error message, etc.
};

type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};

type FunctionCall = {
  name: string;
  arguments: string; // JSON-encoded arguments
};
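
Because the shape of each element in choices depends on how the request was made, a few type guards can make downstream handling explicit. Here is a minimal sketch against the types above; handleResponse is a hypothetical helper, not part of the API:
TypeScript
type Choice = NonStreamingChoice | StreamingChoice | NonChatChoice;

// NonChatChoice is the only variant with a top-level "text" field.
function isNonChatChoice(c: Choice): c is NonChatChoice {
  return 'text' in c;
}

// StreamingChoice is the only variant with a "delta" field.
function isStreamingChoice(c: Choice): c is StreamingChoice {
  return 'delta' in c;
}

function handleResponse(res: Response): void {
  for (const choice of res.choices) {
    if (choice.error) {
      console.error(`Error ${choice.error.code}: ${choice.error.message}`);
      continue;
    }
    if (isNonChatChoice(choice)) {
      console.log(choice.text); // prompt-based (non-chat) completion
    } else if (isStreamingChoice(choice)) {
      process.stdout.write(choice.delta.content ?? ''); // one streamed chunk
    } else {
      console.log(choice.message.content); // full chat completion
    }
  }
  // When streaming, usage arrives once at the end,
  // alongside an empty choices array.
  if (res.usage) {
    console.log(`Total tokens: ${res.usage.total_tokens}`);
  }
}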
Here’s an example response:
{
  "id": "gen-1770283226-LLYbKoYDuMJMdKzmarHz",
  "created": 1770283226,
  "model": "o3-mini",
  "object": "chat.completion",
  "system_fingerprint": null,
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Building the world's tallest skyscraper is an immensely complex undertaking...",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      },
      "provider_specific_fields": {
        "native_finish_reason": "completed"
      }
    }
  ],
  "usage": {
    "completion_tokens": 1671,
    "prompt_tokens": 16,
    "total_tokens": 1687,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": null,
      "reasoning_tokens": 512,
      "rejected_prediction_tokens": null,
      "image_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0
    },
    "cost": 0.00737,
    "is_byok": false,
    "cost_details": {
      "upstream_inference_cost": 0.00737,
      "upstream_inference_prompt_cost": 1.76e-05,
      "upstream_inference_completions_cost": 0.0073524
    }
  },
  "provider": "OpenAI"
}
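Note that the usage fields in this example are internally consistent: prompt_tokens + completion_tokens = total_tokens (16 + 1671 = 1687), and the two upstream costs in cost_details sum to upstream_inference_cost (0.0000176 + 0.0073524 = 0.00737). Here is a small sketch that checks these invariants on a parsed response; treat the cost check as a sanity check rather than a guaranteed contract:
TypeScript
// Verify that token counts and cost details are additive.
function checkUsage(usage: ResponseUsage): boolean {
  const tokensOk =
    usage.prompt_tokens + usage.completion_tokens === usage.total_tokens;
  const d = usage.cost_details;
  const costOk =
    !d ||
    d.upstream_inference_cost === undefined ||
    Math.abs(
      (d.upstream_inference_prompt_cost ?? 0) +
        (d.upstream_inference_completions_cost ?? 0) -
        d.upstream_inference_cost
    ) < 1e-9;
  return tokensOk && costOk;
}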
When using models that support reasoning (such as OpenAI's o1 and o3 series), the reasoning_tokens field in completion_tokens_details shows how many tokens were used for the model's internal reasoning process. These tokens are included in the completion_tokens count and contribute to the overall cost.
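Because reasoning tokens are counted inside completion_tokens, the size of the visible reply can be recovered by subtraction. A minimal sketch, assuming res is a parsed Response as typed above:
TypeScript
// Visible output tokens = completion tokens minus hidden reasoning tokens.
const reasoning =
  res.usage?.completion_tokens_details?.reasoning_tokens ?? 0;
const visible = (res.usage?.completion_tokens ?? 0) - reasoning;
// For the example above: 1671 - 512 = 1159 tokens of visible output.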

Finish Reason

Some models and providers may return additional finish reasons. The raw finish_reason string returned by the model is available via the native_finish_reason property.
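
For example, a handler might branch on the normalized finish_reason while logging the raw value for debugging. A sketch against the NonStreamingChoice type above; the cases shown are common normalized values, not an exhaustive list:
TypeScript
function logFinish(choice: NonStreamingChoice): void {
  switch (choice.finish_reason) {
    case 'stop':
      break; // the model finished naturally
    case 'length':
      console.warn('Response truncated at the token limit');
      break;
    case 'tool_calls':
      console.info('Model requested one or more tool calls');
      break;
    default:
      console.info(`Finish reason: ${choice.finish_reason}`);
  }
  // The raw, provider-specific value, untranslated:
  console.log(`native_finish_reason: ${choice.native_finish_reason}`);
}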