Hacker News

It's not the parameters that are sent, it's the layer outputs. That comes to a few thousand floats per token.


Whoops! I would have thought the number of neurons roughly equals the number of parameters, but you are right. The number of parameters is much higher.


The embedding size is only 8k, while the parameter count is 70B, so it's a huge difference.
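To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes "8k" means a hidden size of 8192, that "70B" means 70e9 parameters, and that every value is stored as a 2-byte float (fp16); none of those specifics are stated in the thread, and the function names are made up for illustration.

```python
# Rough comparison: per-token layer output vs. full parameter set.
# ASSUMPTIONS (not from the thread): hidden size 8192, 70e9 parameters,
# 2 bytes per value (fp16).
HIDDEN_SIZE = 8192
PARAM_COUNT = 70_000_000_000
BYTES_PER_VALUE = 2


def activation_bytes_per_token(hidden_size=HIDDEN_SIZE,
                               bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to send one layer's output for a single token."""
    return hidden_size * bytes_per_value


def parameter_bytes(param_count=PARAM_COUNT,
                    bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to send every parameter of the model once."""
    return param_count * bytes_per_value


def params_per_activation_value(param_count=PARAM_COUNT,
                                hidden_size=HIDDEN_SIZE):
    """How many parameters exist for each float in one token's layer output."""
    return param_count / hidden_size


if __name__ == "__main__":
    print(f"layer output per token: {activation_bytes_per_token() / 1024:.0f} KiB")
    print(f"all parameters:         {parameter_bytes() / 1e9:.0f} GB")
    print(f"parameters per output float: {params_per_activation_value():,.0f}")
```

Under those assumptions, one token's layer output is about 16 KiB, while the full parameter set is about 140 GB, so there are roughly 8.5 million parameters for every float sent per token.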



