I think (slash worry) that this is going to be a simple upgrade in future iterations. Obviously there are powerful attention mechanisms at work to keep the subject matter coherent, but we’re not too far off the model being able to generate a core thesis, and ensure that all the generated text supports that in some way.
I think that if that worked it would prove that either language is a much more powerful tool than we realize, or our cognitive capacities are much more trivial than we realize.
The model fundamentally has no understanding of the world, so if it can successfully argue about a central thesis without simply selecting pre-existing fragments, then it would suggest that the statistical relations between words capture directly our reasoning about the world.