I was mass-sending everything to GPT-4. Here's what I changed.

Source: DEV Community
I'm a solo dev from Argentina building AI tools. For months I was doing what most of us do — every API call went straight to GPT-4 (now GPT-4o). Summarization? GPT-4. Formatting JSON? GPT-4. Answering "what's 2+2"? You guessed it. Then I looked at my bill and did some math.

## The numbers that made me stop

Here's what the main LLM providers charge per 1M tokens right now:

| Model | Input | Output |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Llama 3.1 8B (via Groq) | $0.05 | $0.05 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |

Look at that gap between GPT-4o and Llama. That's a 50x price difference on input tokens. And here's the thing — for probably 70% of what I was sending to GPT-4o, Llama would've given me the same answer.

## What I tried first

The obvious solution: just add some if/else logic.

```python
if is_simple(prompt):
    model = "llama-3.1-8b-instant"
else:
    model = "gpt-4o"
```

Sounds easy. It's not. What's "simple"? How do you define that? Token count? Keywords? And then you need different API clients for OpenAI vs
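To make the if/else idea concrete, here's a minimal sketch of what a naive router might look like. The `is_simple` heuristic (a word-count threshold plus a check for code fences) is purely illustrative — it's exactly the kind of arbitrary rule that makes this approach fragile, which is the point:

```python
# Naive prompt router: short, low-stakes prompts go to a cheap model,
# everything else goes to GPT-4o. The is_simple() heuristic below is an
# illustrative assumption, not a production-ready rule.

CHEAP_MODEL = "llama-3.1-8b-instant"  # via Groq, per the pricing table
EXPENSIVE_MODEL = "gpt-4o"

def is_simple(prompt: str, max_words: int = 30) -> bool:
    """Crude heuristic: short prompts with no code blocks count as simple."""
    return len(prompt.split()) <= max_words and "```" not in prompt

def pick_model(prompt: str) -> str:
    """Route a prompt to a model name based on the heuristic."""
    return CHEAP_MODEL if is_simple(prompt) else EXPENSIVE_MODEL

print(pick_model("What's 2+2?"))                # -> llama-3.1-8b-instant
print(pick_model("Refactor this: " + "x " * 200))  # -> gpt-4o
```

Even this toy version shows the problem: the 30-word cutoff is a guess, and a short prompt can still be hard ("prove Fermat's last theorem") while a long one can be trivial.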