Create a Dataset

Once you’re logged in to the platform, you can create a dataset. Go to the datasets page by clicking either the “Create a Dataset” button or the datasets tab.

Give the dataset a name, and choose either the “Chat” option or the “Text Completion” option.

Use “Chat” when you want a model you can converse with, and “Text Completion” when you want the model to generate more text similar to what you’ve already given it.

Then click “create” to open the screen for your new dataset.

Upload Data to the Dataset

Next, click “Upload File” to upload a file that your model will learn from. We currently support .csv, .jsonl, and .txt files up to several gigabytes in size.
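Before uploading, it can help to confirm that a file has one of the supported extensions. The snippet below is a minimal sketch, not part of the platform: the helper name is_supported is hypothetical, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import Path

# Extensions listed as supported in this guide.
SUPPORTED_EXTENSIONS = {".csv", ".jsonl", ".txt"}

def is_supported(filename):
    """Return True if the file extension is one of the supported types.

    Hypothetical local check; comparing in lowercase is an assumption
    about how the platform treats extensions.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

if __name__ == "__main__":
    for name in ["train.jsonl", "notes.txt", "sheet.xlsx"]:
        print(name, is_supported(name))
```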

Getting this part right is essential for the best model results. If you chose the “Chat” option earlier, your data should be formatted as question-and-answer pairs.

Sample files for each supported format can be downloaded on the platform, and you can model your own files after them.
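As a rough illustration of question-and-answer data for the “Chat” option, the sketch below turns a list of pairs into JSONL text (one JSON object per line). The field names human and assistant are assumptions chosen for illustration, and the function name pairs_to_jsonl is hypothetical; the sample files on the platform remain the reference for the exact layout.

```python
import json
from pathlib import Path

def pairs_to_jsonl(pairs, input_key="human", output_key="assistant"):
    """Serialize (question, answer) pairs as JSONL text.

    The key names are illustrative assumptions; check the platform's
    sample files for the layout it actually expects.
    """
    lines = []
    for question, answer in pairs:
        record = {input_key: question, output_key: answer}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    pairs = [
        ("What is a dataset?", "A collection of examples the model learns from."),
        ("Which file types can I upload?", "CSV, JSONL, and TXT files."),
    ]
    # Write the result to a local file that can then be uploaded.
    Path("chat_dataset.jsonl").write_text(pairs_to_jsonl(pairs), encoding="utf-8")
```

Each line of the resulting file is a standalone JSON object, so a single malformed pair does not affect how the other lines are parsed.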

Once your file is uploaded, you’ll be asked to choose the input and output columns. The input column holds the question (for example, a column named human), and the output column holds the answer (for example, a column named assistant).
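Since every row needs both an input and an output, a quick local check of a CSV file before uploading can catch missing columns or empty cells. This is a minimal sketch under stated assumptions: the column names human and assistant follow the examples above, and the helper check_columns is hypothetical rather than anything the platform provides.

```python
import csv
import io

def check_columns(csv_text, input_col="human", output_col="assistant"):
    """Return the 1-based data-row numbers that lack an input or an output.

    Raises ValueError if either column is absent from the header.
    Column names are illustrative assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {input_col, output_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    bad_rows = []
    for row_number, row in enumerate(reader, start=1):
        # Short rows yield None for absent cells, so fall back to "".
        question = (row[input_col] or "").strip()
        answer = (row[output_col] or "").strip()
        if not question or not answer:
            bad_rows.append(row_number)
    return bad_rows

if __name__ == "__main__":
    sample = "human,assistant\nHi,Hello!\nWhat is 2+2?,\n"
    print(check_columns(sample))  # the second data row has no answer
```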

After choosing the input and output columns, click upload.

Now that your data is uploaded, you’ll see your I/O pairs on the right. To add another question-and-answer (input/output) pair, click the “Create I/O pair” button.

You’ve now completed the data upload process and can begin fine-tuning your model.