My setup is a bit of a mess because I've been experimenting with different ways of configuring and hosting local models. At one point I tried the router server but stopped using it, so some of my settings are still in models.ini while others are on the command line.
These are the relevant settings in models.ini. (I honestly don't know whether they are applied when I'm not using the router server; it's been hard to figure out which settings actually take effect when I use both the command line and models.ini.)
[*]
jinja = true
seed = 3407
flash-attn = on
[unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL]
temperature = 1.0
top_p = 0.95
top_k = 64
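To reason about which values end up in effect, here's a rough Python sketch that merges the file the way I *assume* it works: the [*] section supplies defaults, a model's own section overrides them, and command-line switches override both. That precedence order is my guess, not documented llama.cpp behavior. The function name `effective_settings` and the `cli_overrides` dict are hypothetical, just for illustration. Note that this only models the layering and can't tell you whether llama-server reads models.ini at all outside router mode.

```python
import configparser

# The models.ini content from above, inlined so the sketch is self-contained.
MODELS_INI = """
[*]
jinja = true
seed = 3407
flash-attn = on

[unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL]
temperature = 1.0
top_p = 0.95
top_k = 64
"""


def effective_settings(ini_text, model, cli_overrides=None):
    """Hypothetical merge of [*] defaults, a per-model section, and CLI overrides.

    ASSUMPTION: precedence is CLI > model section > [*]. This is a guess
    for illustration, not confirmed llama.cpp behavior.
    """
    # default_section="*" makes every section inherit the keys under [*];
    # interpolation=None keeps values like "50%" from being parsed specially.
    parser = configparser.ConfigParser(default_section="*", interpolation=None)
    parser.read_string(ini_text)
    if parser.has_section(model):
        # Iterating a section proxy includes the inherited [*] keys.
        settings = dict(parser[model])
    else:
        # Unknown model: only the [*] defaults apply.
        settings = dict(parser.defaults())
    # Hypothetical command-line switches win over anything in the file.
    settings.update(cli_overrides or {})
    return settings


if __name__ == "__main__":
    merged = effective_settings(
        MODELS_INI,
        "unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL",
        {"temperature": "0.7"},  # pretend this came from the command line
    )
    for key, value in sorted(merged.items()):
        print(f"{key} = {value}")
```

Printing the merged dict for a given model is a quick way to spot keys that are silently inherited from [*] or shadowed by a switch, assuming the real server layers things the same way.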
I'm using pi as my harness, with a pretty vanilla config.
Anyhow, Gemma 4 31B worked in this config, but it was slow and RAM-hungry. Since then, I've mostly moved to Qwen 3.6 35b-a3b because it's a lot faster.
I'm not doing anything useful with these yet, but in my experiments Qwen 3.6 35b-a3b handled some fairly long, mostly unsupervised agentic loops.
Can you share your command-line switches and your approach to tool use?