Synthetic data (data from some kind of generative AI) has been used in some form or another for quite some time[0]. The license for LLaMA 3.1 has been updated to specifically allow its use for generation of synthetic training data. Famously, there is a ToS clause from OpenAI in terms of using them for data generation for other models but it's not enforced ATM. It's pretty common/typical to look through a model card, paper, etc and see the use of an LLM or other generative AI for some form of synthetic data generation in the development process - various stages of data prep, training, evaluation, etc.
Phi is another really good example but that's already covered from the article.
Phi is another really good example but that's already covered from the article.
[0] - https://www.latent.space/i/146879553/synthetic-data-is-all-y...