Navigating OpenAI's JSON-Structured Outputs: Limitations and Solutions

I recently encountered some limitations and unexpected behaviors while working with OpenAI's JSON-structured outputs. After spending some time troubleshooting, I decided to document these findings for future reference and to help others who might face similar challenges.

JSON Schema Limitations

The first issue I discovered is that OpenAI's implementation supports only a subset of JSON Schema, so certain standard features simply don't work as expected. You can check the supported schema features in OpenAI's documentation.

Default Values

One specific limitation is the inability to use default values in your JSON Schema or Pydantic model. When you include defaults, the API call fails, even though this is a standard feature in JSON Schema.
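As a rough illustration of one way around this, the sketch below strips every "default" keyword from a plain JSON schema dict before it is sent. The helper name strip_defaults and the example schema are hypothetical, and this is only a minimal sketch: it removes the keyword and nothing else.

```python
def strip_defaults(schema):
    """Return a copy of `schema` with every "default" keyword removed."""
    def walk(node):
        if isinstance(node, dict):
            cleaned = {}
            for key, value in node.items():
                if key == "default":
                    continue  # drop the keyword itself
                if key in ("properties", "$defs", "definitions") and isinstance(value, dict):
                    # Keys here are field or definition names, not keywords,
                    # so a field that happens to be called "default" is kept.
                    cleaned[key] = {name: walk(sub) for name, sub in value.items()}
                else:
                    cleaned[key] = walk(value)
            return cleaned
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(schema)


# Hypothetical example schema with one default value
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "default": 0},
    },
    "required": ["name"],
}
cleaned = strip_defaults(schema)
```

The original schema is left untouched, so the defaults are still available for filling in values after the response comes back.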

I found a helpful blog post by the team at Fractional AI that addresses this issue. They've developed a wrapper implementation that allows you to use your predefined Pydantic models as you normally would. The wrapper converts your models to a JSON schema that's compatible with OpenAI's API and then converts the responses back to your original model format.
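The wrapper from that post isn't reproduced here. The sketch below only illustrates the "convert back" half of the idea on plain dicts, under the assumption that the model returns null (or omits the field) where the original schema had a default. The function name apply_defaults and the example schema are hypothetical, and only top-level properties are handled.

```python
import copy


def apply_defaults(data, schema):
    """Fill missing or null top-level fields in `data` from the schema's defaults."""
    result = dict(data)
    for name, field_schema in schema.get("properties", {}).items():
        if result.get(name) is None and "default" in field_schema:
            # deepcopy so mutable defaults (lists, dicts) are not shared between results
            result[name] = copy.deepcopy(field_schema["default"])
    return result


# Hypothetical original schema, still carrying its defaults
original_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "default": 0},
    },
}
restored = apply_defaults({"name": "Ada", "age": None}, original_schema)
```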

This solution worked exceptionally well for my use case, though I needed to make a few tweaks since I was using self-references in my Pydantic models. With some minor adjustments, I got it working perfectly.

additionalProperties: false

Another important consideration is setting additionalProperties: false in all objects within your JSON schema. This prevents the model from including unexpected fields in the response that aren't defined in your schema.

Without this setting, you might get an API call error like this (where YourSchemaName is the name of your Pydantic model or JSON schema):

openai.BadRequestError: Error code: 400 - {
    'error': {
        'message': "Invalid schema for response_format 'YourSchemaName': In context=(), 'additionalProperties' is required to be supplied and to be false.",
        'type': 'invalid_request_error',
        'param': 'response_format',
        'code': None
    }
}

Here's an example of how to properly set this in your schema:

{
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" }
    },
    "required": ["name", "age"],
    "additionalProperties": false
}
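For nested schemas, adding the flag by hand to every object gets tedious. A minimal sketch of a recursive helper follows. The name forbid_additional_properties and the example schema are hypothetical, and the helper assumes object schemas are marked with "type": "object" (or a type list containing "object").

```python
def forbid_additional_properties(schema):
    """Return a copy of `schema` with additionalProperties=False on every object schema."""
    def walk(node):
        if isinstance(node, dict):
            out = {key: walk(value) for key, value in node.items()}
            node_type = out.get("type")
            is_object = node_type == "object" or (
                isinstance(node_type, list) and "object" in node_type
            )
            if is_object:
                out["additionalProperties"] = False
            return out
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(schema)


# Hypothetical nested schema: the inner "address" object needs the flag too
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    "required": ["name", "address"],
}
strict_ready = forbid_additional_properties(person_schema)
```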

Debugging Empty Parsed Objects

Another issue that puzzled me for quite some time involved making API calls that appeared successful (returning 200 OK responses) but resulted in empty parsed objects. When debugging, I could see that the response contained JSON data, but when trying to access the parsed attribute of the message, it returned None.

After some debugging, I discovered that the JSON returned by the model didn't fully conform to the JSON schema I had provided. The root cause was that I hadn't set strict: true in the JSON schema definition passed with the OpenAI API call.

# client is an openai.OpenAI instance; messages, response_format and kwargs are defined elsewhere
response = client.beta.chat.completions.parse(
    messages=messages,
    response_format=response_format,
    **kwargs,
)

message = response.choices[0].message
# Raw JSON string returned by the API; here it did not fully conform to the JSON schema
content = message.content
# Parsed object; None because the content did not conform to the JSON schema
parsed_response = message.parsed
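One way to make this failure loud instead of silent is a small guard around the message. This is a sketch, not part of the OpenAI SDK: require_parsed is a hypothetical helper, the refusal attribute is read defensively with getattr, and a SimpleNamespace stands in for a real API message.

```python
from types import SimpleNamespace


def require_parsed(message):
    """Return message.parsed, or raise with the raw content if it is None."""
    refusal = getattr(message, "refusal", None)
    if refusal:
        raise ValueError(f"Model refused to answer: {refusal}")
    if message.parsed is None:
        raise ValueError(f"No parsed object; raw content was: {message.content!r}")
    return message.parsed


# Stand-in for a real API message whose content did not parse
fake_message = SimpleNamespace(parsed=None, refusal=None, content='{"name": "Ada"}')
try:
    require_parsed(fake_message)
except ValueError as exc:
    error_text = str(exc)
```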

The key takeaway here is to always set strict: true in your JSON schema definition. Without this specification, you might get a successful API response (200 OK) but with no content in the parsed object.
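For a raw JSON schema, the flag sits next to the schema inside the response_format value. The sketch below builds that value following the shape used in OpenAI's examples. The helper name build_response_format and the "person" schema are hypothetical.

```python
def build_response_format(name, schema):
    """Wrap a JSON schema in a response_format dict with strict mode enabled."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": schema,
            "strict": True,  # the flag this section is about
        },
    }


# Hypothetical schema, matching the earlier additionalProperties example
response_format = build_response_format(
    "person",
    {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name", "age"],
        "additionalProperties": False,
    },
)
```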

One important side note: if the API call cannot generate a valid answer (i.e., a JSON object that conforms to your schema), you'll receive an explicit error rather than a silent failure. This is actually a helpful feature, as it prevents you from proceeding with invalid data and makes debugging easier.

Best Practices for Working with JSON-Structured Outputs

When working with OpenAI's JSON-structured outputs, there are several important limitations and requirements to keep in mind:

First, be aware of the JSON schema limitations in the OpenAI API. If you have existing Pydantic models or JSON schemas that use default values, you'll need to use a wrapper to adapt these schemas before making the API call, as defaults aren't supported by the API.

Remember to add additionalProperties: false to all the objects within your JSON schema.

Make sure to pass strict: true in your JSON schema definition. It appears in all the examples in OpenAI's docs, but the docs don't state explicitly what happens if you omit it: you get a response with parsed = None.

Finally, before implementing any solution, verify two important details: first, check that the specific OpenAI model you plan to use supports JSON structured outputs. Second, confirm that you have access to this model in your current usage tier by reviewing the usage tiers documentation. This can save you time and frustration during implementation.
