Synthetic data (data produced by some kind of generative AI) has been used in one form or another for quite some time[0]. The license for LLaMA 3.1 has been updated to explicitly allow its use for generating synthetic training data. Famously, OpenAI's ToS has a clause about using its models to generate data for training other models, but it isn't being enforced at the moment. It's pretty common to look through a model card, paper, etc. and see an LLM or other generative AI used for some form of synthetic data generation somewhere in the development process: data prep, training, evaluation, and so on.

Phi is another really good example, but that's already covered in the article.

[0] - https://www.latent.space/i/146879553/synthetic-data-is-all-y...
