OpenAI's flagship product, GPT-3, was known to have problems with toxic and unwanted text. Unfortunately, systems like GPT-3 are trained on content found on the internet, which includes the misinformation, hate, and toxic messages that people put out there. As a result, text generators may produce output that is biased, misogynistic, racist, or harmful in other ways. This is a problem not only for language models but also for chatbots, translators, and other tools that generate text.
To deal with these issues, OpenAI has built a newer version called InstructGPT. It is said to be better at following users' instructions and at tailoring its content to the user. InstructGPT also reportedly makes up facts less often than the previous version. As the diagrams below show, InstructGPT produces slightly fewer toxic outputs, scores better on truthfulness, and is more appropriate for users.


To create the updated model, the company added another layer of training on top of the existing GPT-3. They used reinforcement learning to teach the model which pieces of text it should and should not include in the generated output. Because the internet contains less content from minority groups, they could not train the model from scratch, as that would have resulted in a worse-performing algorithm.
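The idea behind this reinforcement-learning layer can be sketched in miniature: human raters compare candidate outputs, a reward model learns from those comparisons, and generation is then nudged toward higher-reward text. The toy Python below illustrates only the concept; all names, data, and the trivial counting-based "reward model" are invented for illustration and bear no resemblance to OpenAI's actual implementation.

```python
import math
import random

# Step 1 (hypothetical data): human raters compare pairs of candidate
# outputs. Each tuple is (preferred_output, rejected_output).
human_preferences = [
    ("a helpful, polite answer", "an insulting answer"),
    ("a factual summary", "a made-up claim"),
]

# Step 2: a trivially simple "reward model" that scores outputs by how
# often they were preferred. Real systems train a neural network here.
reward = {}
for preferred, rejected in human_preferences:
    reward[preferred] = reward.get(preferred, 0.0) + 1.0
    reward[rejected] = reward.get(rejected, 0.0) - 1.0

# Step 3: the generator's candidates are reweighted by the reward model,
# so higher-reward outputs become more likely to be emitted.
def sample_output(candidates, temperature=1.0):
    weights = [math.exp(reward.get(c, 0.0) / temperature) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

candidates = ["a helpful, polite answer", "an insulting answer"]
print(sample_output(candidates))
```

In this sketch the reweighting makes the preferred answer roughly seven times as likely as the rejected one; in the real system, the reward signal instead drives gradient updates to the language model itself.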
OpenAI did not, however, get rid of GPT-3: it is still available to users, although its use is no longer recommended. InstructGPT is now the default model for users.
Sources: