Question Generation Flow Breakdown
📝 Generate Quiz (10 Multiple-Choice Questions)
Multiple choice: 300–500 tokens/question · Essay: 400–600 tokens · True/False: 200–350 tokens
🃏 Generate Flashcards (20 Cards)
Per card: 150–250 tokens (question + answer)
▶ Generate from YouTube (10 minutes)
📄 Upload PDF Document (10 Pages)
📓 Generate from Teacher Journal
✏️ Generate from Free Text
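The per-item ranges above can be turned into a quick output-token budget. The following is a minimal sketch, assuming the ranges apply uniformly to every item of a given type; the names `TOKEN_RANGES` and `estimate_output_tokens` are illustrative and not part of any API.

```python
# Per-item output token ranges taken from the flow breakdown above.
# TOKEN_RANGES and estimate_output_tokens are illustrative names.
TOKEN_RANGES = {
    "multiple_choice": (300, 500),
    "essay": (400, 600),
    "true_false": (200, 350),
    "flashcard": (150, 250),
}


def estimate_output_tokens(item_type: str, count: int) -> tuple[int, int]:
    """Return the (low, high) output-token estimate for `count` items."""
    if count < 0:
        raise ValueError("count must be non-negative")
    low, high = TOKEN_RANGES[item_type]
    return low * count, high * count


if __name__ == "__main__":
    print(estimate_output_tokens("multiple_choice", 10))
    print(estimate_output_tokens("flashcard", 20))
```

The result is an output-only range; input tokens (prompt, schema, source material) have to be added separately.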
Estimated Cost per Request
Cost Summary per Flow (Mid-Tier Models)
| Flow | Est. Tokens | Claude Sonnet 4.6 | Gemini 2.5 Flash | GPT-5.4 Mini |
|---|---|---|---|---|
| 10-question multiple-choice quiz | ~5,000 | $0.024 | $0.014 | $0.023 |
| 20 flashcards | ~4,600 | $0.022 | $0.013 | $0.021 |
| 10-minute YouTube video | ~8,500 | $0.041 | $0.023 | $0.039 |
| 10-page document | ~10,500 | $0.051 | $0.029 | $0.048 |
| Teacher journal | ~7,000 | $0.034 | $0.019 | $0.032 |
| Free text | ~5,500 | $0.027 | $0.015 | $0.025 |
Knowledge processing → generate 5 multiple-choice questions (price per request)
The same scenario applies to every row: output of ~2,000 tokens (5 multiple-choice questions + options + brief explanations) and a prompt + JSON schema of ~450 tokens. The Est. input (material) column is the text, or its token equivalent, that reaches the LLM after processing (RAG chunk, journal text, vision summary of an image, YouTube transcript, etc.). Costs for downloading or transcribing YouTube videos, storage, and OCR performed outside the model call are not included.
| Knowledge type / processing | Est. input (material) | Total est. tokens* | Sonnet 4.6 | Gemini 2.5 Flash | GPT-5.4 Mini |
|---|---|---|---|---|---|
| Text chunk / RAG (document excerpt) | ~2,500 | ~4,950 | ~$0.0389 | ~$0.0059 | ~$0.0112 |
| Teacher journal (3–5 pages of text) | ~2,000 | ~4,450 | ~$0.0374 | ~$0.0057 | ~$0.0108 |
| Image / diagram (vision caption equivalent) | ~600 | ~3,050 | ~$0.0332 | ~$0.0053 | ~$0.0098 |
| YouTube link (~10-minute transcript excerpt) | ~4,000 | ~6,450 | ~$0.0434 | ~$0.0063 | ~$0.0123 |
*Total tokens = input (material + 450) + output 2,000. Prices use the per-million-token rates from section 02 (Sonnet $3/$15 in/out, Flash $0.30/$2.50, Mini $0.75/$4.50).
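The footnote's formula can be checked with a few lines of arithmetic. This sketch assumes only the rates and token counts quoted above; the names `RATES` and `request_cost` are illustrative.

```python
# USD per million tokens (input, output), as quoted in the footnote above.
RATES = {
    "Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "GPT-5.4 Mini": (0.75, 4.50),
}

PROMPT_TOKENS = 450    # system prompt + JSON schema
OUTPUT_TOKENS = 2000   # 5 questions + options + brief explanations


def request_cost(material_tokens: int, model: str) -> float:
    """USD cost of one request: (material + prompt) input plus fixed output."""
    in_rate, out_rate = RATES[model]
    input_tokens = material_tokens + PROMPT_TOKENS
    return (input_tokens * in_rate + OUTPUT_TOKENS * out_rate) / 1_000_000


if __name__ == "__main__":
    scenarios = [("RAG chunk", 2500), ("Teacher journal", 2000),
                 ("Image caption", 600), ("YouTube transcript", 4000)]
    for label, material in scenarios:
        costs = ", ".join(f"{m}: ${request_cost(material, m):.4f}" for m in RATES)
        print(f"{label:<20} {costs}")
```

Because output is fixed at 2,000 tokens and output rates are several times the input rates, the output term dominates every row; the material size shifts the total only slightly.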
Cost Detail: Generate 10-Question Multiple-Choice Quiz (1K input / 4K output)
| Provider | Model | Input Cost | Output Cost | Total |
|---|---|---|---|---|
| OpenAI | GPT-5.5 | $0.0050 | $0.1200 | $0.1250 |
| OpenAI | GPT-5.4 | $0.0025 | $0.0600 | $0.0625 |
| OpenAI | GPT-5.4 Mini | $0.0008 | $0.0180 | $0.0188 |
| OpenAI | GPT-5.4 Nano | $0.0002 | $0.0050 | $0.0052 |
| Claude | Opus 4.7 | $0.0050 | $0.1000 | $0.1050 |
| Claude | Sonnet 4.6 | $0.0030 | $0.0600 | $0.0630 |
| Claude | Haiku 4.5 | $0.0010 | $0.0200 | $0.0210 |
| Gemini | 2.5 Pro | $0.0013 | $0.0400 | $0.0413 |
| Gemini | 2.5 Flash | $0.0003 | $0.0100 | $0.0103 |
| Gemini | 2.5 Flash-Lite | $0.0001 | $0.0016 | $0.0017 🏆 |
AI Provider Comparison
| Provider | Model | Input $/M | Output $/M | Context | Speed | Quality |
|---|---|---|---|---|---|---|
| OpenAI | GPT-5.5 | $5.00 | $30.00 | 1M | Fast | Excellent |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | 1M | Fast | Excellent |
| OpenAI | GPT-5.4 Mini | $0.75 | $4.50 | 1M | Very Fast | Good |
| OpenAI | GPT-5.4 Nano | $0.20 | $1.25 | 1M | Ultra Fast | Fair |
| Claude | Opus 4.7 | $5.00 | $25.00 | 1M | Medium | Excellent |
| Claude | Sonnet 4.6 | $3.00 | $15.00 | 1M | Fast | Excellent |
| Claude | Haiku 4.5 | $1.00 | $5.00 | 200K | Very Fast | Good |
| Gemini | 3.1 Pro Preview | $2.00 | $12.00 | 2M | Medium | Excellent |
| Gemini | 2.5 Pro | $1.25 | $10.00 | 1M | Medium | Excellent |
| Gemini | 2.5 Flash | $0.30 | $2.50 | 1M | Very Fast | Very Good |
| Gemini | 2.5 Flash-Lite | $0.10 | $0.40 | 1M | Ultra Fast | Good |
Additional Features
| Feature | OpenAI | Claude | Gemini |
|---|---|---|---|
| Batch API (50% off) | ✅ | ✅ | ✅ |
| Prompt Caching (90% off) | ✅ | ✅ | ✅ |
| Web Search Tool | ✅ $10/1K | ✅ $10/1K | ✅ $14/1K |
| Structured Output | ✅ Native | ✅ Native | ✅ Native |
| Max Context | 1M tokens | 1M tokens | 2M tokens |
Model Recommendations for Question Generation
🏆 Best Overall (Recommended)
Gemini 2.5 Flash
Best as the production default: the strongest price/performance ratio, 1M context, multimodal support, and a generous free tier.
💎 Best Premium Quality (Premium)
Claude Sonnet 4.6
Highest quality for educational content: essay questions, in-depth explanations, consistent formatting, and the best safety.
⚡ Best Ultra-High Volume (Budget)
Gemini 2.5 Flash-Lite
Suited to flashcards, True/False questions, A/B testing, or volumes of 100K+ requests/month.
✨ Hybrid Strategy (Optimal)
Mix & Match
Model Selection Flowchart
[Flowchart not preserved. Surviving labels: $0.0017, $0.0630, 2M context, $0.0103]
Quality/Cost Ratio per Model
[Chart not preserved]
Pros and Cons of Each Model
Claude API
Opus 4.7
Pros:
- Highest-quality reasoning and explanations
- Best for complex educational content
- Superior safety and appropriateness
- 1M context window + prompt caching
Cons:
- Most expensive ($5/$25 per M tokens)
- Higher latency
- Overkill for simple questions
Sonnet 4.6
Pros:
- Best quality/cost balance in the Claude family
- Excellent for educational content
- Consistent, well-formatted output
- 1M context window
Cons:
- About 6x more expensive than Gemini 2.5 Flash for a 10-question quiz ($0.0630 vs $0.0103)
- Not as fast as the Flash models
Haiku 4.5
Pros:
- Very fast responses
- The most cost-effective model in the Claude family
- Good quality for the price point
Cons:
- Context window of only 200K
- Noticeable quality gap vs Sonnet
- Still more expensive than Gemini Flash
Gemini API
2.5 Flash
Pros:
- Best price/performance ratio
- Very fast responses
- 1M context window
- Multimodal support (PDF, PPT + images)
- Batch API (50% off) + prompt caching
Cons:
- Quality slightly below Claude Sonnet
- Function calling not as robust as Claude's
- Less consistent formatting
2.5 Flash-Lite
Pros:
- Ultra cost-effective: the cheapest option
- Ultra-fast response time
- Scales to millions of requests
- 1M context window
Cons:
- Quality gap on complex questions
- Limited reasoning depth
- Needs more prompt engineering
Cost Optimization
⚡ Batch Processing
All providers (OpenAI, Claude, Gemini) support a Batch API. It suits generating question banks or other pre-generated content.
🗄 Prompt Caching
Cache system prompts, question templates, and frequently used document context. With the 90% caching discount, a 50% hit rate cuts the cost of the cacheable input tokens by roughly 45% on average.
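The effect of the cache hit rate on input cost can be estimated directly. This is a rough sketch under a simplifying assumption: on a hit the whole input is billed at the discounted rate, and on a miss at the full rate. It ignores provider-specific details such as cache-write surcharges or minimum cacheable prompt sizes, and `effective_input_cost` is an illustrative name.

```python
def effective_input_cost(input_tokens: int, rate_per_million: float,
                         hit_rate: float, cache_discount: float = 0.90) -> float:
    """Expected USD input cost when a fraction of requests hit the cache.

    Simplifying assumption: on a hit, every input token is billed at the
    discounted rate; on a miss, every token is billed at the full rate.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    full = input_tokens * rate_per_million / 1_000_000
    return full * (1.0 - hit_rate * cache_discount)


if __name__ == "__main__":
    # 2,950 input tokens at $0.30 per million, with and without caching.
    print(effective_input_cost(2950, 0.30, hit_rate=0.0))
    print(effective_input_cost(2950, 0.30, hit_rate=0.5))
```

Caching only reduces the input side; in the scenarios in this document most of the cost comes from output tokens, so the saving on the total bill is smaller than the saving on input.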
🔀 Intelligent Model Routing
Route requests automatically by question type and complexity. Use the cheapest model that is good enough for the task.
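A router of this kind can start as a handful of rules. The sketch below follows the recommendations in this document (Flash-Lite for flashcards and True/False, Sonnet for essays and in-depth explanations, Flash as the default). The model identifier strings, the `GenerationTask` fields, and the rule order are assumptions for illustration, not a tested production policy.

```python
from dataclasses import dataclass

# Model identifiers are illustrative; check each provider's current model names.
BUDGET_MODEL = "gemini-2.5-flash-lite"
DEFAULT_MODEL = "gemini-2.5-flash"
PREMIUM_MODEL = "claude-sonnet-4.6"


@dataclass(frozen=True)
class GenerationTask:
    question_type: str                    # "multiple_choice", "essay", "true_false", "flashcard"
    needs_deep_explanation: bool = False  # hypothetical complexity flag


def choose_model(task: GenerationTask) -> str:
    """Pick the cheapest tier that this document treats as adequate for the task."""
    if task.question_type == "essay" or task.needs_deep_explanation:
        return PREMIUM_MODEL
    if task.question_type in {"flashcard", "true_false"}:
        return BUDGET_MODEL
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(choose_model(GenerationTask("flashcard")))
    print(choose_model(GenerationTask("multiple_choice")))
    print(choose_model(GenerationTask("essay")))
```

Checking the premium rule first means a True/False item flagged as needing a deep explanation still goes to the premium tier; real routing would likely be tuned from measured quality and cost data.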
✂️ Output Token Limiting
Set max_tokens=2000 to cap output length and cut unnecessary verbosity.
Total Potential Savings — 10K Requests/Month
| Strategy | Monthly Cost | Cost/Generate | Quality |
|---|---|---|---|
| Claude Opus Only | $1,050 | $0.105 | Excellent |
| Claude Sonnet Only | $630 | $0.063 | Excellent |
| Gemini Flash Only | $103 | $0.0103 | Very Good |
| Hybrid Optimized | $109 | $0.0109 | Excellent |
| Hybrid + Batch + Cache 🏆 | $70 | $0.007 | Excellent |
Final Recommendation & Roadmap
Phased Implementation
MVP / Testing
Best value for testing and fast iteration. Set up API keys for all providers, build prompt templates, and measure actual token usage.
Production: Hybrid Strategy
Scale: Full Optimization
Refine model selection based on real data, implement advanced caching, and consider fine-tuning for specific domains.
Cost Projection by Scale
| Monthly Requests | Gemini Flash | Hybrid | Optimized | Claude Sonnet |
|---|---|---|---|---|
| 1,000 | $10 | $11 | $7 | $63 |
| 5,000 | $52 | $55 | $35 | $315 |
| 10,000 | $103 | $109 | $70 | $630 |
| 50,000 | $515 | $545 | $350 | $3,150 |
| 100,000 | $1,030 | $1,090 | $700 | $6,300 |
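The projection scales each strategy's per-request cost linearly with volume. A minimal sketch that reproduces the table, assuming the Cost/Generate figures from the savings table and half-up rounding to whole dollars; `Decimal` is used so that values such as 54.5 round the same way as in the table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-request cost in USD for each strategy, from the savings table.
COST_PER_GENERATE = {
    "Gemini Flash": Decimal("0.0103"),
    "Hybrid": Decimal("0.0109"),
    "Optimized": Decimal("0.007"),
    "Claude Sonnet": Decimal("0.063"),
}


def monthly_cost(strategy: str, requests: int) -> int:
    """Projected monthly spend in whole USD, rounding half up."""
    total = COST_PER_GENERATE[strategy] * requests
    return int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for requests in (1_000, 5_000, 10_000, 50_000, 100_000):
        row = " | ".join(f"${monthly_cost(s, requests):,}" for s in COST_PER_GENERATE)
        print(f"{requests:>7,} | {row}")
```

A linear projection ignores volume discounts, free tiers, and changes in cache hit rate at scale, so it is best read as an upper-level planning figure.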
Implementation Checklist
- Set up API keys: OpenAI, Claude, Gemini
- Build model router logic
- Test with sample questions of every type
- Measure quality & cost
- Implement Gemini Flash as default
- Build prompt templates per question type
- Set up basic caching
- Deploy for internal testing
- Analyze token usage patterns
- Implement intelligent routing
- Set up batch processing
- Refine prompts based on results
- Deploy hybrid strategy
- Set up monitoring & alerts
- Implement cost tracking per feature
- Launch to beta teachers
Monitoring & KPIs
📊 Cost Metrics
- Cost per generate
- Monthly total cost
- Cost by model & question type
- Cache hit rate & savings
🎓 Quality Metrics
- Teacher satisfaction score
- Edit rate (target <30%)
- Rejection rate
- Student performance on AI Qs
⚡ Performance Metrics
- Response time P50/P95/P99
- Success rate (target >98%)
- Error rate by model
- Tokens per request
Alert Thresholds
    alerts:
      cost:
        daily_spend: "> $50"
        cost_per_generate: "> $0.15"
        monthly_projection: "> $2000"
      performance:
        response_time_p95: "> 10s"
        error_rate: "> 2%"
      quality:
        rejection_rate: "> 20%"
        edit_rate: "> 50%"
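The thresholds above can be evaluated against collected metrics with a simple comparison. The sketch below hard-codes the limits rather than parsing the configuration; the metric key names (with unit suffixes) and the `breached` helper are illustrative assumptions.

```python
# Limits copied from the alert thresholds above.
# A metric triggers an alert only when its value is strictly above the limit.
THRESHOLDS = {
    "daily_spend_usd": 50.0,
    "cost_per_generate_usd": 0.15,
    "monthly_projection_usd": 2000.0,
    "response_time_p95_s": 10.0,
    "error_rate": 0.02,
    "rejection_rate": 0.20,
    "edit_rate": 0.50,
}


def breached(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that exceed their threshold, sorted."""
    return sorted(
        name for name, value in metrics.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    )


if __name__ == "__main__":
    sample = {"daily_spend_usd": 62.0, "error_rate": 0.01, "edit_rate": 0.55}
    print(breached(sample))
```

Metrics without a configured threshold are ignored rather than treated as errors, which keeps the check usable while the metric set is still changing.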
JSON Output Format
Response — Multiple Choice
    {
      "request_id": "req_abc123",
      "status": "success",
      "model_used": "gemini-2.5-flash",
      "usage": {
        "input_tokens": 6500,
        "output_tokens": 4200,
        "total_tokens": 10700,
        "cost_usd": 0.0125
      },
      "questions": [
        {
          "id": "q1",
          "type": "multiple_choice",
          "points": 10,
          "question": { "text": "...", "latex": "\\int_{0}^{2}..." },
          "options": [
            { "key": "a", "value": "..." },
            { "key": "b", "value": "..." },
            …
          ],
          "correct_answer": "b",
          "explanation": { "text": "...", "steps": [ ... ] },
          "bloom_taxonomy": "apply"
        }
      ]
    }
Response — Flashcard
    {
      "model_used": "gemini-2.5-flash-lite",
      "usage": { "cost_usd": 0.0016 },
      "flashcards": [
        {
          "id": "fc1",
          "front": { "text": "What is a definite integral?" },
          "back": {
            "text": "An integral that has a lower limit and an upper limit...",
            "latex": "\\int_{a}^{b} f(x)dx",
            "key_concept": "Integral with limits"
          }
        }
      ]
    }
Error Response
    {
      "status": "error",
      "error": {
        "code": "INSUFFICIENT_CONTEXT",
        "message": "The document does not contain enough content for 10 questions",
        "details": {
          "suggestion": "Upload a longer document or reduce the number of questions"
        }
      }
    }
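A consumer of these responses will usually want to check them before showing questions to a teacher. The following is a minimal sketch of such a check, assuming the field names shown in the examples above; `validate_response` is a hypothetical helper, and the two rules it applies (unique option keys, answer present among the options) are illustrative rather than a full schema validation.

```python
import json


def validate_response(raw: str) -> list[str]:
    """Return a list of problems found in a generation response (empty if none)."""
    data = json.loads(raw)

    # Error responses carry a code and message instead of questions.
    if data.get("status") == "error":
        err = data.get("error", {})
        return [f"{err.get('code', 'UNKNOWN')}: {err.get('message', '')}"]

    problems = []
    for q in data.get("questions", []):
        if q.get("type") != "multiple_choice":
            continue
        keys = [opt.get("key") for opt in q.get("options", [])]
        if len(keys) != len(set(keys)):
            problems.append(f"{q.get('id')}: duplicate option keys")
        if q.get("correct_answer") not in keys:
            problems.append(f"{q.get('id')}: correct_answer not among options")
    return problems
```

The input must be complete JSON; the abbreviated examples in this section, which use `…` as an elision marker, would need to be filled in before they parse.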
Case Study: 5 Multiple-Choice Questions + Knowledge, with IDR Cost & JSON
Token & knowledge assumptions (single request)
Scenario: a teacher requests 5 multiple-choice questions from material that has already been embedded as knowledge (RAG chunk / document excerpt). Token estimates follow the pattern used in this document (multiple choice ~300–500 output tokens per question); the knowledge is added explicitly to the input.
📚 Knowledge / material context: ~2,500 tokens
Equivalent to ~2–3 pages of dense text or one retrieval chunk.
⚙️ System + instructions + JSON schema: ~450 tokens
Fixed prompt, structured output format.
⬇️ Total input: ~2,950 tokens
⬆️ Output (5 multiple-choice questions + options + brief explanations): ~2,000 tokens
Total request ≈ 4,950 tokens.
cost_idr below = cost_usd × 17,678, rounded to whole rupiah (integer).
Cost comparison (reference models vs. section 02)
Costs are computed from the per-million-token prices (Input $/M and Output $/M) of the three reference models (Claude Sonnet 4.6, Gemini 2.5 Flash, GPT-5.4 Mini) with 2,950 input and 2,000 output tokens.
| Provider | Model | Input (tokens) | Output (tokens) | Cost (USD) | Cost (IDR)* |
|---|---|---|---|---|---|
| Claude | Sonnet 4.6 | 2,950 | 2,000 | ~$0.0389 | ~Rp 688 |
| OpenAI | GPT-5.4 Mini | 2,950 | 2,000 | ~$0.0112 | ~Rp 198 |
| Gemini | 2.5 Flash | 2,950 | 2,000 | ~$0.0059 | ~Rp 104 🏆 |
*IDR = USD × 17,678, rounded to whole rupiah per request (exchange rate as of 18 May 2026).
For 1,000 requests of the same scenario (×1,000): Gemini ≈ Rp 104,000 → GPT-5.4 Mini ≈ Rp 198,000 → Sonnet ≈ Rp 688,000 (tax, caching, and the Batch API are not included).
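The rupiah figures follow from one multiplication and a rounding step. This sketch assumes the exchange rate quoted in this case study and takes the rounded per-request USD figures from the comparison table as inputs; `usd_to_idr` is an illustrative name, and amounts are passed as strings so that `Decimal` avoids floating-point rounding surprises.

```python
from decimal import Decimal, ROUND_HALF_UP

FX_IDR_PER_USD = Decimal("17678")  # rate quoted in this case study


def usd_to_idr(cost_usd: str) -> int:
    """Convert a USD amount (given as a string) to whole rupiah, rounding half up."""
    idr = Decimal(cost_usd) * FX_IDR_PER_USD
    return int(idr.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    # USD inputs are the rounded per-request figures from the comparison table.
    table = [("Sonnet 4.6", "0.0389"), ("GPT-5.4 Mini", "0.0112"),
             ("Gemini 2.5 Flash", "0.0059")]
    for model, usd in table:
        per_request = usd_to_idr(usd)
        print(f"{model}: Rp {per_request:,} per request, "
              f"Rp {per_request * 1000:,} per 1,000 requests")
```

The 1,000-request totals here multiply the already-rounded per-request rupiah value, matching how the figures above are presented; converting the unrounded USD total instead would give slightly different amounts.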
Example JSON response (knowledge + usage + 5 questions)
The structure matches section 09; the knowledge object records which material was used. Multiple-choice options are an array of { "key", "value" } objects. Cost differences by knowledge type (journal, image, YouTube, chunk) are in the Knowledge processing table in section 02. The usage figures below correspond to the chunk / RAG row with Gemini 2.5 Flash.
    {
      "request_id": "req_5pg_knowledge_01",
      "status": "success",
      "model_used": "gemini-2.5-flash",
      "knowledge": {
        "source": "rag_chunk",
        "document_id": "doc_integral_kelas12",
        "token_estimate": 2500,
        "excerpt": "The definite integral ∫_a^b f(x)dx is the limit of Riemann sums..."
      },
      "usage": {
        "input_tokens": 2950,
        "output_tokens": 2000,
        "total_tokens": 4950,
        "cost_usd": 0.00589,
        "cost_idr": 104,
        "fx_idr_per_usd": 17678
      },
      "questions": [
        {
          "id": "q1",
          "type": "multiple_choice",
          "question": { "text": "If ∫_0^2 x dx = L, what is the value of L?" },
          "options": [
            { "key": "a", "value": "1" },
            { "key": "b", "value": "2" },
            { "key": "c", "value": "3" },
            { "key": "d", "value": "4" }
          ],
          "correct_answer": "b",
          "explanation": { "text": "∫_0^2 x dx = [x²/2]_0^2 = 2" }
        },
        {
          "id": "q2",
          "type": "multiple_choice",
          "question": { "text": "The linearity property of integrals applies to?" },
          "options": [ { "key": "a", "value": "..." }, { "key": "b", "value": "..." }, … ],
          "correct_answer": "a"
        },
        {
          "id": "q3",
          "type": "multiple_choice",
          "question": { "text": "The fundamental theorem of calculus connects …" },
          "options": [ { "key": "a", "value": "..." }, … ],
          "correct_answer": "c"
        },
        {
          "id": "q4",
          "type": "multiple_choice",
          "question": { "text": "The area under the curve y=f(x) from a to b …" },
          "options": [ { "key": "a", "value": "..." }, … ],
          "correct_answer": "d"
        },
        {
          "id": "q5",
          "type": "multiple_choice",
          "question": { "text": "Substitution in a definite integral is used when …" },
          "options": [ { "key": "a", "value": "..." }, … ],
          "correct_answer": "b"
        }
      ]
    }
Comparison summary: for the knowledge + 5 multiple-choice questions scenario, Gemini 2.5 Flash has the lowest cost per request in the reference table, GPT-5.4 Mini sits in the middle, and Claude Sonnet 4.6 is the most expensive but remains relevant for longer content or heavy reasoning (in line with the hybrid recommendation above). Actual token counts can rise if the knowledge or the per-question explanations are expanded.