In the past month local models have been ramping up in major way meanwhile the namesake providers have upped prices, went offline randomly, and started doing slimier and slimier things.
I really think the future is local compute. Or at least self hosted models.
Is there a library of good tools for LLMs to call? I have to imagine the bot-detection avoidance mechanisms are a major engineering effort and not likely to work out of the box with a simple harness and random local LLM.
Kagi also has an API. People who hate ads are probably the same folk that should be paying for Kagi. That's the sane alternative world where companies respect their users.
Oh, you got me so excited. I've had a Kagi sub for 3 years, but their API is still in closed beta. I guess I could (and should reach out and ask for access).
firecrawl: "if you post content or intellectual property within the Services or give us Feedback about the Services, you hereby grant to us a worldwide, irrevocable, non-exclusive, royalty-free license to use, reproduce, modify, publish, translate and distribute any content that you submit in any form [...] You also grant to us the right to sub-license these rights"
exa: "Query Data is used to improve our products and technology, including by training and fine-tuning models that power our Services"
perplexity: "Perplexity may retain, copy, distribute and otherwise use Search Data for its lawful business purposes, including the improvement and development of products and services."
linkup: "Client grants Linkup a worldwide right to use, reproduce and modify the Client Data, including prompts, for the purposes of providing, maintaining, developing, training"
tavily: "we may use certain portions of your query data to improve our responses to future queries"..."We may share your query data with third-party search index providers (e.g., Google)"
That's not how it works. Whether local or hosted, every modern model has a cutoff date for its training data, and can be leveraged by agents / harnesses / tools to fetch context from the internet or wherever.
Qwen 3.6 which was released this month is a large but still smaller model. Supposedly it's at about sonnet level when configured correctly. It can be run on commodity hardware without purchasing a data center.
https://www.reddit.com/r/LocalLLaMA/comments/1so1533/qwen36_...
Then there are middle size ones which require multiple gpus which are like gpts latest flagships.
It's basically whatever you can afford. Any trash heap laptop can run code auto complete models locally no problem. The rest require some level of investment, an idle gaming pc, or a serious investment
GLM 5.1 and DeepSeek 4 are acceptable, but the cost of hardware and energy cost that depending on your use case you may as well purchase a Tokens. They get useless and stupid rapidilty if you quant enough to run on single 16-24GB GPU style.
I really think the future is local compute. Or at least self hosted models.