Research Report · v1.0

Token & Cost Analysis
AI Question Generation
LMS Kuis Cards School

May 2026 · AI Research Team · 3 Providers · 14 Models Reviewed · CTO & CFO
Cost per Generate: $0.007 (Hybrid Optimized)
Savings vs Sonnet: 89% (Full Optimization)
Budget / Month: $70 (10K requests)
Quality Score: 9.0/10 (Hybrid Strategy)
Models Compared: 14 (3 Providers)
🎯
Key Recommendation: Use the Hybrid Strategy: Gemini 2.5 Flash (default), Gemini 2.5 Flash-Lite (flashcards), Claude Sonnet 4.6 (essay/complex). With Batch API + Prompt Caching, cost drops by up to 89% compared with using Claude Sonnet only.
01

Question Generation Flow Breakdown

📝 Generate Quiz (10 Multiple-Choice Questions)

Input Tokens: ~1,000
Output Tokens: ~4,000
Total: ~5,000 tokens

Multiple choice: 300–500 tokens/question · Essay: 400–600 tokens · True/False: 200–350 tokens

🃏 Generate Flashcards (20 Cards)

Input Tokens: ~600
Output Tokens: ~4,000
Total: ~4,600 tokens

Per card: 150–250 tokens (question + answer)

▶ Generate from YouTube (10 Minutes)

Transcript: ~3,000–5,000
Instructions + Output: ~4,500
Total: ~8,500 tokens

📄 Upload PDF Document (10 Pages)

Document: ~6,000
Instructions + Output: ~4,500
Total: ~10,500 tokens

📓 Generate from Teacher Journal

Journal, 3–5 pages: ~2,000
Instructions + Output: ~4,500
Total: ~7,000 tokens

✏️ Generate from Free Text

User Prompt: ~300–1,500
System + Output: ~4,500
Total: ~5,500 tokens
02

Estimated Cost per Request

Cost Summary per Flow (Mid-Tier Models)

Flow | Est. Tokens | Claude Sonnet 4.6 | Gemini 2.5 Flash | GPT-5.4 Mini
Quiz, 10 multiple-choice questions | ~5,000 | $0.024 | $0.014 | $0.023
Flashcards, 20 cards | ~4,600 | $0.022 | $0.013 | $0.021
YouTube, 10 minutes | ~8,500 | $0.041 | $0.023 | $0.039
Document, 10 pages | ~10,500 | $0.051 | $0.029 | $0.048
Teacher journal | ~7,000 | $0.034 | $0.019 | $0.032
Free text | ~5,500 | $0.027 | $0.015 | $0.025
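
The per-flow figures can be cross-checked against the per-million-token rates listed in section 03. The sketch below is a minimal recomputation, not the report's own method. It assumes that "Instructions + Output ~4,500" splits into roughly 500 instruction tokens and 4,000 output tokens (an assumption, though it agrees with the 6,500 input tokens in the section 09 sample response), and the flow names and the `request_cost` helper are illustrative. Under those assumptions the quiz flow on Claude Sonnet 4.6 comes to $0.0630, which matches the Detail Cost table in this section rather than the $0.024 in the summary above, so the two tables appear to use different bases.

```python
# USD per million tokens as (input, output), copied from the section 03 rate table.
RATES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "GPT-5.4 Mini": (0.75, 4.50),
}

# (input_tokens, output_tokens) per flow. Quiz and flashcard splits come from
# section 01; the other rows ASSUME ~500 instruction tokens + ~4,000 output tokens.
FLOWS = {
    "Quiz, 10 multiple-choice questions": (1_000, 4_000),
    "Flashcards, 20 cards": (600, 4_000),
    "YouTube, 10 minutes": (4_500, 4_000),
    "Document, 10 pages": (6_500, 4_000),
    "Teacher journal": (2_500, 4_000),
    "Free text": (1_500, 4_000),
}


def request_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Cost in USD of one request: tokens times the per-million rate, per side."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    for flow, (tokens_in, tokens_out) in FLOWS.items():
        costs = ", ".join(
            f"{model}: ${request_cost(tokens_in, tokens_out, model):.4f}"
            for model in RATES
        )
        print(f"{flow} -> {costs}")
```

Because output tokens dominate every flow, the recomputed cost changes little between flows for a given model; most of the spread comes from the choice of model.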

Knowledge processing → generate 5 multiple-choice questions (price per request)

Same scenario for every row: ~2,000 output tokens (5 multiple-choice questions + options + brief explanations) and ~450 tokens of prompt + JSON schema. The Est. input (material) column is the text, or its token equivalent, that reaches the LLM after processing (RAG chunk, journal text, vision summary of an image, YouTube transcript, etc.). Costs for YouTube download/transcription, storage, or OCR outside the model call are not included.

Knowledge / processing type | Est. input (material) | Total est. tokens* | Sonnet 4.6 | Gemini 2.5 Flash | GPT-5.4 Mini
Text chunk / RAG (document excerpt) | ~2,500 | ~4,950 | ~$0.0389 | ~$0.0059 | ~$0.0112
Teacher journal (3–5 pages of text) | ~2,000 | ~4,450 | ~$0.0374 | ~$0.0057 | ~$0.0108
Image / diagram (vision caption equivalent) | ~600 | ~3,050 | ~$0.0332 | ~$0.0053 | ~$0.0098
YouTube link (transcript excerpt, ~10 minutes) | ~4,000 | ~6,450 | ~$0.0434 | ~$0.0063 | ~$0.0123

*Total tokens = input (material + 450) + 2,000 output. Prices use the per-million-token rates in section 03 (Sonnet $3/$15 in/out, Flash $0.30/$2.50, Mini $0.75/$4.50).

Cost Detail: Generate Quiz, 10 Multiple-Choice Questions (1K input / 4K output)

Provider | Model | Input Cost | Output Cost | Total
OpenAI | GPT-5.5 | $0.0050 | $0.1200 | $0.1250
OpenAI | GPT-5.4 | $0.0025 | $0.0600 | $0.0625
OpenAI | GPT-5.4 Mini | $0.0008 | $0.0180 | $0.0188
OpenAI | GPT-5.4 Nano | $0.0002 | $0.0050 | $0.0052
Claude | Opus 4.7 | $0.0050 | $0.1000 | $0.1050
Claude | Sonnet 4.6 | $0.0030 | $0.0600 | $0.0630
Claude | Haiku 4.5 | $0.0010 | $0.0200 | $0.0210
Gemini | 2.5 Pro | $0.0013 | $0.0400 | $0.0413
Gemini | 2.5 Flash | $0.0003 | $0.0100 | $0.0103
Gemini | 2.5 Flash-Lite | $0.0001 | $0.0016 | $0.0017 🏆
03

AI Provider Comparison

Provider | Model | Input $/M | Output $/M | Context | Speed | Quality
OpenAI | GPT-5.5 | $5.00 | $30.00 | 1M | Fast | Excellent
OpenAI | GPT-5.4 | $2.50 | $15.00 | 1M | Fast | Excellent
OpenAI | GPT-5.4 Mini | $0.75 | $4.50 | 1M | Very Fast | Good
OpenAI | GPT-5.4 Nano | $0.20 | $1.25 | 1M | Ultra Fast | Fair
Claude | Opus 4.7 | $5.00 | $25.00 | 1M | Medium | Excellent
Claude | Sonnet 4.6 | $3.00 | $15.00 | 1M | Fast | Excellent
Claude | Haiku 4.5 | $1.00 | $5.00 | 200K | Very Fast | Good
Gemini | 3.1 Pro Preview | $2.00 | $12.00 | 2M | Medium | Excellent
Gemini | 2.5 Pro | $1.25 | $10.00 | 1M | Medium | Excellent
Gemini | 2.5 Flash | $0.30 | $2.50 | 1M | Very Fast | Very Good
Gemini | 2.5 Flash-Lite | $0.10 | $0.40 | 1M | Ultra Fast | Good

Additional Features

Feature | OpenAI | Claude | Gemini
Batch API (50% off) | ✅ | ✅ | ✅
Prompt Caching (90% off) | ✅ | ✅ | ✅
Web Search Tool | ✅ $10/1K | ✅ $10/1K | ✅ $14/1K
Structured Output | ✅ Native | ✅ Native | ✅ Native
Max Context | 1M tokens | 1M tokens | 2M tokens
04

Model Recommendations for Question Generation

🏆 Best Overall · Recommended

Gemini 2.5 Flash

Input: $0.30/M tokens
Output: $2.50/M tokens
10K req/month: $103/month

Best production default: the best price/performance ratio, 1M context, multimodal support, and a generous free tier.

💎 Best Premium Quality · Premium

Claude Sonnet 4.6

Input: $3.00/M tokens
Output: $15.00/M tokens
10K req/month: $630/month

Highest quality for educational content: essay questions, in-depth explanations, consistent formatting, and the best safety.

⚡ Best Ultra-High Volume · Budget

Gemini 2.5 Flash-Lite

Input: $0.10/M tokens
Output: $0.40/M tokens
10K req/month: $17/month

Suited to flashcards, True/False questions, A/B testing, or volumes of 100K+ requests/month.

✨ Hybrid Strategy · Optimal

Mix & Match

40% Flash-Lite: ~$7
50% Gemini Flash: ~$52
10% Claude Sonnet: ~$63
Total, 10K req/month: ~$122/month
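
The hybrid total is a traffic-weighted sum of per-request costs. The sketch below reproduces it from the per-request totals in the Detail Cost table (1K input / 4K output); the function name `monthly_mix_cost` and the dictionary layout are illustrative assumptions. The unrounded total is about $121.30, while the ~$122 above is the sum of the three rounded components.

```python
# USD per request, taken from the Detail Cost table (1K input / 4K output).
COST_PER_REQUEST = {
    "Gemini 2.5 Flash-Lite": 0.0017,
    "Gemini 2.5 Flash": 0.0103,
    "Claude Sonnet 4.6": 0.0630,
}


def monthly_mix_cost(shares: dict, requests_per_month: int = 10_000) -> dict:
    """Return the monthly USD cost per model for a given traffic split."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return {
        model: requests_per_month * share * COST_PER_REQUEST[model]
        for model, share in shares.items()
    }


# The 40/50/10 split from the Hybrid Strategy card.
mix = {
    "Gemini 2.5 Flash-Lite": 0.40,
    "Gemini 2.5 Flash": 0.50,
    "Claude Sonnet 4.6": 0.10,
}
breakdown = monthly_mix_cost(mix)
total = sum(breakdown.values())

if __name__ == "__main__":
    for model, cost in breakdown.items():
        print(f"{model}: ${cost:.2f}")
    print(f"Total: ${total:.2f}")
```

The Claude Sonnet share is only 10% of traffic but accounts for roughly half of the monthly total, which is why the routing rule for essay and advanced questions matters most for the budget.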

Model Selection Flowchart

🚀 Request received
Question type?
  • Flashcard → Flash-Lite ($0.0017)
  • Essay / Advanced → Claude Sonnet ($0.0630)
  • Document >50 pages → Gemini 3.1 Pro (2M context)
  • Default (everything else) → Gemini 2.5 Flash ($0.0103)
Urgent?
  • Yes → Real-time API
  • No → Batch API (50% off)
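
The flowchart can be read as a small routing function. The following is a minimal sketch under stated assumptions: the `Route` type, the `select_route` name, the question-type labels, and the order in which the branches are checked are all illustrative. The identifiers `gemini-2.5-flash` and `gemini-2.5-flash-lite` appear in the sample responses in section 09; the other two model strings are placeholders, not confirmed API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str       # model label to call (placeholder strings, see note above)
    use_batch: bool  # True when the request can go through the Batch API


def select_route(question_type: str, document_pages: int = 0, urgent: bool = True) -> Route:
    """Pick a model by question type and document size, then real-time vs batch."""
    qtype = question_type.lower()
    if qtype == "flashcard":
        model = "gemini-2.5-flash-lite"
    elif qtype in {"essay", "advanced"}:
        model = "claude-sonnet-4.6"
    elif document_pages > 50:
        model = "gemini-3.1-pro-preview"
    else:
        model = "gemini-2.5-flash"  # default for everything else
    # Non-urgent requests take the batch path (50% off per the flowchart).
    return Route(model=model, use_batch=not urgent)


if __name__ == "__main__":
    print(select_route("flashcard"))
    print(select_route("essay", urgent=False))
    print(select_route("multiple_choice", document_pages=80))
```

Checking the question type before the page count is a design choice of this sketch; the flowchart does not say which branch wins when an essay request also carries a long document.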

Quality/Cost Ratio per Model

Flash-Lite (ROI): 0.42
Gemini Flash: 0.08
Claude Sonnet: 0.015
Claude Opus: 0.009
05

Pluses and Minuses of Each Model

Claude API

Claude Opus 4.7 · Premium · $5/$25 per M
Plus
  • Highest quality reasoning & explanation
  • Best for complex educational content
  • Superior safety & appropriateness
  • 1M context window + prompt caching
Minus
  • Most expensive ($5/$25 per M tokens)
  • Higher latency
  • Overkill for simple questions
Claude Sonnet 4.6 · ⭐ Recommended · $3/$15 per M
Plus
  • Best quality/cost balance in the Claude family
  • Excellent for educational content
  • Consistent, well-formatted output
  • 1M context window
Minus
  • ~6x more expensive than Gemini Flash per request ($0.0630 vs $0.0103 at 1K input / 4K output)
  • Not as fast as the Flash models
Claude Haiku 4.5 · Budget · $1/$5 per M
Plus
  • Very fast response
  • The cost-effective option in the Claude family
  • Good quality for the price point
Minus
  • Context limited to 200K
  • Noticeable quality gap vs Sonnet
  • Still more expensive than Gemini Flash

Gemini API

Gemini 2.5 Flash · ⭐ Best Value · $0.30/$2.50 per M
Plus
  • Best price/performance ratio
  • Very fast response
  • 1M context window
  • Multimodal support (PDF, PPT + images)
  • Batch API 50% + prompt caching
Minus
  • Quality slightly below Claude Sonnet
  • Function calling not as robust as Claude's
  • Less consistent formatting
Gemini 2.5 Flash-Lite · 🏆 Best ROI · $0.10/$0.40 per M
Plus
  • Ultra cost-effective, the cheapest option
  • Ultra-fast response time
  • Scalable to millions of requests
  • 1M context window
Minus
  • Quality gap on complex questions
  • Limited reasoning depth
  • Needs more prompt engineering
06

Cost Optimization

⚡ Batch Processing

50% Discount

All three providers (OpenAI, Claude, Gemini) support a Batch API. It suits generating question banks or other pre-generated content.

Claude Sonnet, 10K/month: $630 → $315

🗄 Prompt Caching

70% Savings

Cache frequently used system prompts, question templates, and document context. Even a 50% hit rate yields significant savings.

First call: $0.0195
Subsequent (cached): $0.006 (69% off)
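
To make the effect of the hit rate concrete, the expected cost per call is a weighted average of the uncached and cached prices. The sketch below uses the two figures above; the function names are illustrative assumptions, and it treats every cache hit as costing the full cached price.

```python
def expected_cost(first_call: float, cached_call: float, hit_rate: float) -> float:
    """Average USD cost per call when a fraction `hit_rate` of calls hit the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * first_call + hit_rate * cached_call


def savings_rate(first_call: float, cached_call: float, hit_rate: float) -> float:
    """Fraction saved relative to paying the uncached price on every call."""
    return 1.0 - expected_cost(first_call, cached_call, hit_rate) / first_call


if __name__ == "__main__":
    for rate in (0.0, 0.5, 1.0):
        cost = expected_cost(0.0195, 0.006, rate)
        saved = savings_rate(0.0195, 0.006, rate)
        print(f"hit rate {rate:.0%}: ${cost:.5f} per call, {saved:.1%} saved")
```

With these inputs a 50% hit rate averages $0.01275 per call, a saving of about 35%, and the full 69% is reached only when every call is a cache hit.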

🔀 Intelligent Model Routing

30-40% Savings

Route automatically by question type and complexity, using the cheapest model that is good enough for the task.

✂️ Output Token Limiting

10-20% Savings

Set a max_tokens cap per flow to control output and cut unnecessary verbosity, for example max_tokens=2000 for a 5-question request (a 10-question quiz needs ~4,000 output tokens per section 01).

Total Potential Savings (10K Requests/Month)

Strategy | Monthly Cost | Cost/Generate | Quality
Claude Opus Only | $1,050 | $0.105 | Excellent
Claude Sonnet Only | $630 | $0.063 | Excellent
Gemini Flash Only | $103 | $0.0103 | Very Good
Hybrid Optimized | $109 | $0.0109 | Excellent
Hybrid + Batch + Cache 🏆 | $70 | $0.007 | Excellent
💰
Hybrid + Optimization: 93% cheaper than Opus-only · 89% cheaper than Sonnet-only · 32% cheaper than Flash-only, with quality on par with Sonnet.
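
The three percentages follow directly from the monthly costs in the table. A minimal check, where `percent_cheaper` is an illustrative helper name:

```python
# Monthly cost in USD at 10K requests, copied from the savings table.
MONTHLY_COST = {
    "Claude Opus Only": 1050,
    "Claude Sonnet Only": 630,
    "Gemini Flash Only": 103,
    "Hybrid + Batch + Cache": 70,
}


def percent_cheaper(baseline: float, optimized: float) -> float:
    """Percentage by which `optimized` undercuts `baseline`."""
    return 100.0 * (1.0 - optimized / baseline)


if __name__ == "__main__":
    optimized = MONTHLY_COST["Hybrid + Batch + Cache"]
    for name, cost in MONTHLY_COST.items():
        if name != "Hybrid + Batch + Cache":
            print(f"vs {name}: {percent_cheaper(cost, optimized):.1f}% cheaper")
```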
07

Final Recommendation & Roadmap

Phased Implementation

Phase 1 · Months 1–3

MVP / Testing

Primary Model: Gemini 2.5 Flash
Target Budget: $100–150/month

Best value for testing and fast iteration. Set up API keys for all providers, build prompt templates, and measure actual token usage.

Phase 2 · Months 4–6

Production: Hybrid Strategy

60% Gemini Flash: $62
30% Flash-Lite (flashcards): $5
10% Claude Sonnet (complex): $63
Total: ~$130/month

Phase 3 · Month 7+

Scale: Full Optimization

With Batch + Caching: ~$70–80/month

Fine-tune model selection using real data, implement advanced caching, and consider fine-tuning for specific domains.

Cost Projection by Scale

Monthly Requests | Gemini Flash | Hybrid | Optimized | Claude Sonnet
1,000 | $10 | $11 | $7 | $63
5,000 | $52 | $55 | $35 | $315
10,000 | $103 | $109 | $70 | $630
50,000 | $515 | $545 | $350 | $3,150
100,000 | $1,030 | $1,090 | $700 | $6,300

Implementation Checklist

Week 1–2: Setup
  • Set up API keys: OpenAI, Claude, Gemini
  • Build model router logic
  • Test with sample questions of every type
  • Measure quality & cost
Week 3–4: Build
  • Implement Gemini Flash as default
  • Build prompt templates per question type
  • Set up basic caching
  • Deploy for internal testing
Week 5–6: Optimize
  • Analyze token usage patterns
  • Implement intelligent routing
  • Set up batch processing
  • Fine-tune prompts based on results
Week 7–8: Production
  • Deploy hybrid strategy
  • Set up monitoring & alerts
  • Implement cost tracking per feature
  • Launch to beta teachers
08

Monitoring & KPIs

📊 Cost Metrics

  • Cost per generate
  • Monthly total cost
  • Cost by model & question type
  • Cache hit rate & savings

🎓 Quality Metrics

  • Teacher satisfaction score
  • Edit rate (target <30%)
  • Rejection rate
  • Student performance on AI Qs

⚡ Performance Metrics

  • Response time P50/P95/P99
  • Success rate (target >98%)
  • Error rate by model
  • Tokens per request

Alert Thresholds

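alerts:
  # Each value is a "greater than" trigger. Values are quoted so the leading ">"
  # is not parsed as a YAML folded-scalar indicator.
  cost:
    daily_spend: "> $50"
    cost_per_generate: "> $0.15"
    monthly_projection: "> $2000"

  performance:
    response_time_p95: "> 10s"
    error_rate: "> 2%"

  quality:
    rejection_rate: "> 20%"
    edit_rate: "> 50%"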
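
One way to act on these thresholds is to compare a snapshot of current metrics against them. The sketch below is a minimal illustration, not a monitoring implementation: the metric names mirror the configuration keys, but the numeric encoding (USD, seconds, fractions), the `triggered_alerts` function, and the shape of the metrics dictionary are assumptions.

```python
from __future__ import annotations

# Upper limits taken from the alert thresholds; units are noted per entry.
THRESHOLDS = {
    "daily_spend": 50.0,           # USD per day
    "cost_per_generate": 0.15,     # USD per request
    "monthly_projection": 2000.0,  # USD per month
    "response_time_p95": 10.0,     # seconds
    "error_rate": 0.02,            # fraction of requests
    "rejection_rate": 0.20,        # fraction of generated questions
    "edit_rate": 0.50,             # fraction of generated questions
}


def triggered_alerts(metrics: dict[str, float]) -> list[str]:
    """Return the sorted names of metrics that exceed their threshold.

    Metrics missing from the snapshot are skipped rather than treated as zero.
    """
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if name in metrics and metrics[name] > limit
    )


if __name__ == "__main__":
    snapshot = {"daily_spend": 62.0, "error_rate": 0.01, "edit_rate": 0.55}
    print(triggered_alerts(snapshot))
```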
09

JSON Output Format

Response — Multiple Choice

{
  "request_id": "req_abc123",
  "status": "success",
  "model_used": "gemini-2.5-flash",
  "usage": {
    "input_tokens": 6500,
    "output_tokens": 4200,
    "total_tokens": 10700,
    "cost_usd": 0.0125
  },
  "questions": [
    {
      "id": "q1",
      "type": "multiple_choice",
      "points": 10,
      "question": { "text": "...", "latex": "\\int_{0}^{2}..." },
      "options": [
        { "key": "a", "value": "..." },
        { "key": "b", "value": "..." }
      ],
      "correct_answer": "b",
      "explanation": { "text": "...", "steps": ["..."] },
      "bloom_taxonomy": "apply"
    }
  ]
}
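
A consumer of this response may want to sanity-check it before showing questions to a teacher. The sketch below is one possible check using only the fields shown above; the function name `validate_mc_response` and the choice of checks are assumptions, not part of the documented API.

```python
import json


def validate_mc_response(payload: str) -> list:
    """Return a list of problems found; an empty list means the payload passed."""
    data = json.loads(payload)
    problems = []

    # The usage block should be internally consistent.
    usage = data.get("usage", {})
    token_sum = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    if token_sum != usage.get("total_tokens"):
        problems.append("usage.total_tokens != input_tokens + output_tokens")

    # Every question needs unique option keys and an answer that is one of them.
    for question in data.get("questions", []):
        keys = [option["key"] for option in question.get("options", [])]
        if len(keys) != len(set(keys)):
            problems.append(f"{question.get('id')}: duplicate option keys")
        if question.get("correct_answer") not in keys:
            problems.append(f"{question.get('id')}: correct_answer not among option keys")
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "usage": {"input_tokens": 6500, "output_tokens": 4200, "total_tokens": 10700},
        "questions": [{
            "id": "q1",
            "options": [{"key": "a", "value": "..."}, {"key": "b", "value": "..."}],
            "correct_answer": "b",
        }],
    })
    print(validate_mc_response(sample))
```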

Response — Flashcard

{
  "model_used": "gemini-2.5-flash-lite",
  "usage": { "cost_usd": 0.0016 },
  "flashcards": [
    {
      "id": "fc1",
      "front": { "text": "Apa itu integral tentu?" },
      "back": {
        "text": "Integral yang memiliki batas bawah dan batas atas...",
        "latex": "\\int_{a}^{b} f(x)dx",
        "key_concept": "Integral dengan batas"
      }
    }
  ]
}

Error Response

{
  "status": "error",
  "error": {
    "code": "INSUFFICIENT_CONTEXT",
    "message": "Dokumen tidak memiliki cukup konten untuk 10 soal",
    "details": {
      "suggestion": "Upload dokumen lebih panjang atau kurangi jumlah soal"
    }
  }
}
10

Case Study: 5 Multiple-Choice Questions + Knowledge (Cost in IDR & JSON)

Token & knowledge assumptions (one request)

Scenario: a teacher requests 5 multiple-choice questions from material already embedded as knowledge (RAG chunks / document excerpts). Token estimates follow this document's pattern (~300–500 output tokens per multiple-choice question); the knowledge is added explicitly to the input.

📚 Knowledge / material context

Est. tokens: ~2,500

Equivalent to ~2–3 pages of dense text or one retrieval chunk.

⚙️ System + instructions + JSON schema

Est. tokens: ~450

Fixed prompt, structured output format.

⬇️ Total input

Knowledge + prompt: ~2,950

⬆️ Output (5 multiple-choice questions + options + brief explanations)

Est. tokens: ~2,000

Total request ≈ 4,950 tokens.

💱
Exchange rate (18 May 2026): Rp 17,678 / USD (this document's reference value). The Finance team can substitute an internal or Bank Indonesia (BI) rate. cost_idr below = cost_usd × 17,678, rounded to whole rupiah (integer).

Cost comparison (reference models from section 02)

Costs are computed from the per-million-token rates in section 03 (the same rates behind the Detail Cost table in section 02) for Claude Sonnet 4.6, Gemini 2.5 Flash, and GPT-5.4 Mini, with 2,950 input and 2,000 output tokens.

Provider | Model | Input (tokens) | Output (tokens) | Cost (USD) | Cost (IDR)*
Claude | Sonnet 4.6 | 2,950 | 2,000 | ~$0.0389 | ~Rp 688
OpenAI | GPT-5.4 Mini | 2,950 | 2,000 | ~$0.0112 | ~Rp 198
Gemini | 2.5 Flash | 2,950 | 2,000 | ~$0.0059 | ~Rp 104 🏆

*IDR = USD × 17,678, rounded to whole rupiah per request (exchange rate of 18 May 2026).

For 1,000 requests of the same scenario (×1,000): Gemini ≈ Rp 104,000 → GPT-5.4 Mini ≈ Rp 198,000 → Sonnet ≈ Rp 688,000 (tax, caching, and Batch API not included).
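
The USD and IDR columns can be reproduced with decimal arithmetic, which avoids floating-point surprises when rounding currency. The sketch below is one way to do it and rests on an assumption: the USD cost is rounded to four decimals (half up) before conversion, which is what yields about Rp 688 for Sonnet; converting the unrounded $0.03885 would give Rp 687. The names `cost_usd` and `cost_idr` mirror the JSON fields, but the functions themselves are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

FX_IDR_PER_USD = Decimal("17678")  # reference rate of 18 May 2026 from this document

# USD per million tokens as (input, output), from the section 03 rate table.
RATES = {
    "Claude Sonnet 4.6": (Decimal("3.00"), Decimal("15.00")),
    "GPT-5.4 Mini": (Decimal("0.75"), Decimal("4.50")),
    "Gemini 2.5 Flash": (Decimal("0.30"), Decimal("2.50")),
}


def cost_usd(model: str, input_tokens: int = 2950, output_tokens: int = 2000) -> Decimal:
    """Request cost in USD, rounded half up to four decimals (an assumption)."""
    in_rate, out_rate = RATES[model]
    raw = (input_tokens * in_rate + output_tokens * out_rate) / Decimal(1_000_000)
    return raw.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def cost_idr(usd: Decimal) -> int:
    """Convert USD to whole rupiah, rounding half up."""
    return int((usd * FX_IDR_PER_USD).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for model in RATES:
        usd = cost_usd(model)
        idr = cost_idr(usd)
        print(f"{model}: ${usd} = Rp {idr:,} per request, Rp {idr * 1000:,} per 1,000 requests")
```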

Sample JSON response (knowledge + usage + 5 questions)

The structure follows section 09, with a knowledge object to trace the source material. Multiple-choice options are an array of { "key", "value" } objects. Cost differences by knowledge type (journal, image, YouTube, chunk) are in the knowledge-processing table in section 02. The usage figures below correspond to the chunk / RAG row with Gemini 2.5 Flash.

{
  "request_id": "req_5pg_knowledge_01",
  "status": "success",
  "model_used": "gemini-2.5-flash",
  "knowledge": {
    "source": "rag_chunk",
    "document_id": "doc_integral_kelas12",
    "token_estimate": 2500,
    "excerpt": "Integral tentu ∫_a^b f(x)dx adalah limit jumlah Riemann..."
  },
  "usage": {
    "input_tokens": 2950,
    "output_tokens": 2000,
    "total_tokens": 4950,
    "cost_usd": 0.00589,
    "cost_idr": 104,
    "fx_idr_per_usd": 17678
  },
  "questions": [
    {
      "id": "q1",
      "type": "multiple_choice",
      "question": { "text": "Jika ∫_0^2 x dx = L, nilai L adalah?" },
      "options": [
        { "key": "a", "value": "1" },
        { "key": "b", "value": "2" },
        { "key": "c", "value": "3" },
        { "key": "d", "value": "4" }
      ],
      "correct_answer": "b",
      "explanation": { "text": "∫_0^2 x dx = [x²/2]_0^2 = 2" }
    },
    { "id": "q2", "type": "multiple_choice", "question": { "text": "Sifat linearitas integral berlaku untuk?" }, "options": [ { "key": "a", "value": "..." }, { "key": "b", "value": "..." }, … ], "correct_answer": "a" },
    { "id": "q3", "type": "multiple_choice", "question": { "text": "Teorema dasar kalkulus menghubungkan …" }, "options": [ { "key": "a", "value": "..." }, … ], "correct_answer": "c" },
    { "id": "q4", "type": "multiple_choice", "question": { "text": "Luas di bawah kurva y=f(x) dari a ke b …" }, "options": [ { "key": "a", "value": "..." }, … ], "correct_answer": "d" },
    { "id": "q5", "type": "multiple_choice", "question": { "text": "Substitusi pada integral tentu digunakan ketika …" }, "options": [ { "key": "a", "value": "..." }, … ], "correct_answer": "b" }
  ]
}

Comparison summary: for the knowledge + 5 multiple-choice questions scenario, Gemini 2.5 Flash has the lowest cost per request in the reference table, GPT-5.4 Mini sits in the middle, and Claude Sonnet 4.6 is the highest but remains relevant for longer content or heavy reasoning (in line with the hybrid recommendation above). Actual token counts can rise if the knowledge or the per-question explanations are expanded.

📎

Resources & References