
I recently experimented with running llama-3.1-8b-instruct locally on my consumer hardware, an Nvidia RTX 4060 with 8GB of VRAM, because I wanted to try prompting PDFs with a large context, which is extremely expensive given how hosted LLMs are priced.

I was able to fit the model and a 20k-token context entirely on the GPU, at a decent speed of 30 tokens/second.
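For a rough sense of why a 20k-token context can fit on an 8GB card, here is a back-of-the-envelope estimate. It is a sketch under assumptions the comment does not state: roughly 4-bit quantized weights (about 4.5 bits/weight including scales), an unquantized FP16 KV cache, and Llama 3.1 8B's published shape (32 layers, 8 key/value heads via grouped-query attention, head dimension 128). The helper names `kv_cache_bytes` and `weight_bytes` are hypothetical.

```python
# Rough VRAM estimate for llama-3.1-8b-instruct with a 20k-token context.
# Assumptions: ~4.5 bits/weight quantization, FP16 KV cache,
# 32 layers, 8 KV heads (grouped-query attention), head dim 128.

N_PARAMS = 8.03e9   # approximate parameter count of Llama 3.1 8B
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
GIB = 1024 ** 3


def kv_cache_bytes(n_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """Bytes needed to cache keys and values for n_tokens (FP16 by default)."""
    # Factor of 2 because every layer stores both a key and a value vector per KV head.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return n_tokens * per_token


def weight_bytes(bits_per_weight: float) -> float:
    """Bytes needed for the model weights at a given average bit width."""
    return N_PARAMS * bits_per_weight / 8


kv = kv_cache_bytes(20_000)
w = weight_bytes(4.5)  # hypothetical average for a 4-bit format with scales
print(f"KV cache: {kv / GIB:.2f} GiB")
print(f"Weights:  {w / GIB:.2f} GiB")
print(f"Total:    {(kv + w) / GIB:.2f} GiB (before activations and runtime overhead)")
```

Under these assumptions the KV cache is about 2.4 GiB and the weights about 4.2 GiB, leaving a little over 1 GiB for activations and runtime overhead. With 8-bit weights (roughly 7.5 GiB on their own) the same context would not fit.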

For summarization, the performance of these models is decent enough. Unfortunately, for my use case, Gemini's free tier, with its multimodal capabilities and much better output quality, made running local LLMs not really worth it right now, at least for consumers.



you moved the goalposts when you added 'multimodal' there. Also, AFAIK no one reads PDF tables and illustrations perfectly, at any price.


Supposedly, the best way to handle charts and tables right now is to submit screenshots of the PDF (at a large enough zoom per tile/page) to OpenAI's GPT-4o or whatever Google's current best model is.
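One way to read "a large enough zoom per tile" is: render the page at a zoom factor, then split it into tiles so no single image exceeds some maximum side length. The sketch below only computes the clip rectangles in PDF points; `tile_clips` is a hypothetical helper, and the 1024 px limit is an assumed placeholder, not any provider's documented image limit.

```python
import math


def tile_clips(page_w_pt: float, page_h_pt: float, zoom: float, max_tile_px: int):
    """Split a page into clip rectangles (x0, y0, x1, y1) in PDF points so that
    each tile, rendered at `zoom`, is at most `max_tile_px` pixels on a side."""
    # At zoom = 1, one PDF point maps to one pixel (72 dpi).
    page_w_px = page_w_pt * zoom
    page_h_px = page_h_pt * zoom

    # Smallest grid whose cells fit inside the pixel budget.
    cols = math.ceil(page_w_px / max_tile_px)
    rows = math.ceil(page_h_px / max_tile_px)
    tile_w = page_w_pt / cols
    tile_h = page_h_pt / rows

    clips = []
    for r in range(rows):
        for c in range(cols):
            clips.append((c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h))
    return clips


# US Letter page (612 x 792 pt) rendered at 3x zoom, tiles capped at 1024 px.
clips = tile_clips(612, 792, zoom=3, max_tile_px=1024)
for clip in clips:
    print(clip)
```

For a US Letter page at 3x zoom this gives a 2 x 3 grid of six tiles, each about 918 x 792 px when rendered. Each rectangle could then be passed as the `clip` argument of a renderer such as PyMuPDF's `Page.get_pixmap`. In practice you may want the tiles to overlap slightly so that table rows are not cut in half at tile boundaries.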



