Chat Inference
This endpoint runs chat inference against a model and returns a completion.
Body
The user message to give the model.
The system prompt to give the model.
The conversation history to give the model. Optional.
The maximum number of tokens to generate. Defaults to 256 tokens.
The number of highest-probability tokens to sample from (top-k). Defaults to 50.
The nucleus (top-p) sampling parameter. Defaults to 0.9.
The sampling temperature; 0 means greedy (deterministic) sampling. Defaults to 1.0.
The presence penalty parameter. Defaults to 1.0.
The frequency penalty parameter. Defaults to 1.0.
Whether to stream the response or not. Defaults to false.
The id of the model to use for inference.
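The parameters above can be collected into a single JSON request body. A minimal sketch follows; the field names used here (model, system, message, history, max_tokens, top_k, top_p, temperature, presence_penalty, frequency_penalty, stream) are assumptions modeled on common chat-completion APIs, not confirmed names from this endpoint's schema — check the actual schema before use.

```python
import json

# Hypothetical request body -- every key name below is an assumption;
# the values show the documented defaults.
request_body = {
    "model": "example-model-id",               # id of the model to use
    "system": "You are a helpful assistant.",  # system prompt
    "message": "What is the capital of France?",  # user message
    "history": [],                             # optional prior turns
    "max_tokens": 256,                         # default: 256
    "top_k": 50,                               # default: 50
    "top_p": 0.9,                              # default: 0.9
    "temperature": 1.0,                        # default: 1.0; 0 = greedy
    "presence_penalty": 1.0,                   # default: 1.0
    "frequency_penalty": 1.0,                  # default: 1.0
    "stream": False,                           # default: false
}

payload = json.dumps(request_body)
print(payload)
```

The serialized payload would then be sent as the POST body, typically with a Content-Type of application/json.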
Response
The id of the completion.
The type of the completion. Always "chat.completion".
The Unix timestamp of the completion.
The completion choices. Each choice is a JSON object with the fields "role" (either "assistant" or "user") and "content" (the text of the choice).
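Putting the response fields together, a client might parse the JSON and pull out the assistant's text like this. The exact key names ("id", "object", "created", "choices") are assumptions based on the descriptions above; verify them against a real response.

```python
import json

# Hypothetical response -- the shape is assumed from the field
# descriptions above, not confirmed against the live endpoint.
raw = json.dumps({
    "id": "cmpl-123",
    "object": "chat.completion",   # always "chat.completion"
    "created": 1700000000,         # Unix timestamp of the completion
    "choices": [
        {"role": "assistant", "content": "Paris is the capital of France."}
    ],
})

response = json.loads(raw)
# Take the first choice whose role is "assistant".
answer = next(
    c["content"] for c in response["choices"] if c["role"] == "assistant"
)
print(answer)  # -> Paris is the capital of France.
```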