
If it works for predicting the next token in a very long stream of tokens, why not? The question is what architecture and training regimen it needs to generalize.

