  • In general, for code completion you want something fast (ideally a model that fits entirely in your GPU's VRAM) so suggestions arrive as fast as you can type. For chat, you'll probably want the most intelligent/largest model you can run, and it's likely fine if that one runs on the CPU/RAM, since the quality of an individual answer matters more than the speed at which many small answers can be generated. So: probably Qwen for both, but at different sizes/quants for the two use cases (see the sketch below).
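
    Here's a minimal sketch of that split, assuming you're serving both models through Ollama with its Python client; the model tags and prompts are illustrative, not a recommendation — pick whatever sizes actually fit your VRAM/RAM:

    ```python
    # sketch: two Qwen models sized per use case (tags are illustrative)
    import ollama

    # small coder model -- should fit entirely in VRAM for low-latency completions
    completion = ollama.generate(
        model="qwen2.5-coder:1.5b",
        prompt="def fibonacci(n):",
    )
    print(completion["response"])

    # large general model -- fine if it spills to CPU/RAM, since answer
    # quality matters more than tokens/sec for a single chat reply
    chat = ollama.chat(
        model="qwen2.5:32b",
        messages=[{"role": "user", "content": "Explain quantization in one paragraph."}],
    )
    print(chat["message"]["content"])
    ```

    The point is just that the two workloads have opposite constraints: completion is latency-bound, chat is quality-bound, so one model rarely serves both well.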