Token Efficiency of Programming Languages for LLM Code Generation
Source: research by Martin Alderson (RosettaCode, GPT-4 tokenizer, 19 languages)
Rankings
| # | Language | Avg Tokens | Category |
|---|---|---|---|
| 1 | J | ~70 | Array language, pure ASCII |
| 2 | Clojure | 109 | Functional |
| 3 | APL | 110 | Array language |
| 4 | Haskell | 115 | Functional |
| 5 | F# | 118 | Functional |
| 6 | Python | 130 | Dynamic |
| 7 | Ruby | ~135 | Dynamic |
| 8 | JavaScript | 148 | Dynamic |
| 9 | Go | ~160 | Statically typed |
| 10 | C# | ~170 | Statically typed |
| 11 | C | 182 | Procedural |
Key fact: there is a 2.6× gap between the most efficient language (J) and the least efficient (C).
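The gap can be reproduced from the table with a few lines of arithmetic. The sketch below copies the averages listed above (the `~` values are approximate in the source) and expresses each as a multiple of the smallest one; the names `AVG_TOKENS` and `relative_cost` are illustrative, not part of the original study.

```python
# Average token counts copied from the rankings table above.
# Values shown with "~" in the table are approximate.
AVG_TOKENS = {
    "J": 70, "Clojure": 109, "APL": 110, "Haskell": 115, "F#": 118,
    "Python": 130, "Ruby": 135, "JavaScript": 148, "Go": 160,
    "C#": 170, "C": 182,
}


def relative_cost(tokens_by_language):
    """Return each language's average as a multiple of the smallest average."""
    baseline = min(tokens_by_language.values())
    return {
        lang: round(count / baseline, 2)
        for lang, count in tokens_by_language.items()
    }


ratios = relative_cost(AVG_TOKENS)

# Print languages from most to least token-efficient.
for lang, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{lang:<12}{ratio:.2f}x")
```

With these inputs C comes out at 2.6 times J, matching the key fact, while Python sits at roughly 1.86 times J.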
Key Findings
Dynamic languages win on token count
No type declarations means fewer tokens. JavaScript is a notable outlier, though: at 148 tokens it is the most verbose dynamic language in the set.
Functional languages punch above their weight
Haskell and F# compete with dynamic languages despite being statically typed. Reason: excellent type inference eliminates the need for explicit type annotations.
APL/J paradox
APL's famous terseness does not carry over to token counts: its Unicode glyphs (⍳, ⍴, ⌽) tokenize poorly, each becoming multiple tokens. J uses plain ASCII and leads the ranking at roughly 70 tokens on average.
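One way to see why the glyphs are expensive: a byte-level BPE tokenizer (the family the GPT-4 tokenizer belongs to) starts from UTF-8 bytes and only produces a single token where its vocabulary has learned a merge for that byte sequence. The sketch below, using only the standard library, counts the UTF-8 bytes behind each glyph. It does not run a real tokenizer, so the byte count is an upper bound on the token count rather than the actual figure. The ASCII strings are the J spellings of comparable primitives, included for contrast.

```python
def utf8_bytes(text):
    """Number of UTF-8 bytes a byte-level tokenizer sees before any merges."""
    return len(text.encode("utf-8"))


apl_glyphs = ["⍳", "⍴", "⌽"]       # APL primitives named in the text
ascii_symbols = ["i.", "$", "|."]   # J counterparts, shown for contrast

glyph_bytes = {glyph: utf8_bytes(glyph) for glyph in apl_glyphs}
ascii_bytes = {symbol: utf8_bytes(symbol) for symbol in ascii_symbols}

for glyph, size in glyph_bytes.items():
    print(f"{glyph!r}: {size} bytes for 1 character")
for symbol, size in ascii_bytes.items():
    print(f"{symbol!r}: {size} bytes for {len(symbol)} characters")
```

Each APL glyph occupies three bytes for a single visible character, whereas the ASCII spellings cost one byte per character. Measuring real token counts would require running the text through the actual tokenizer.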
Typed languages still win for LLM development
- Hallucinations are caught at compile time
- LSP integration works better
- Rapid feedback loop
Using typed languages for LLMs has an awful lot of benefits — not least because it can compile and get rapid feedback on any syntax errors or method hallucinations.
Frameworks matter more than languages
Follow-up research found that web framework choice has a larger token impact than language selection.
Why It Matters
As LLMs become primary coding assistants, the context window is a hard limit. Every token spent on boilerplate shrinks space for:
- Business logic
- Tests
- Documentation
- Code review
Token footprint directly drives productivity and API cost (OpenAI/Anthropic per-token billing).
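To make the context-window pressure concrete, the sketch below estimates how many average-sized solutions fit into a fixed token budget. The 8,000-token budget is a hypothetical figure chosen for illustration, not a real model limit, and the per-language averages are taken from the rankings table; `solutions_that_fit` is an illustrative helper name.

```python
CONTEXT_BUDGET = 8_000  # hypothetical token budget, not a real model limit

# Averages from the rankings table (J's value is approximate in the source).
AVG_TOKENS_PER_SOLUTION = {"J": 70, "Python": 130, "C": 182}


def solutions_that_fit(budget_tokens, avg_tokens_per_solution):
    """Whole number of average-sized solutions that fit in the budget."""
    return budget_tokens // avg_tokens_per_solution


fit = {
    lang: solutions_that_fit(CONTEXT_BUDGET, avg)
    for lang, avg in AVG_TOKENS_PER_SOLUTION.items()
}

for lang, count in fit.items():
    print(f"{lang:<8}{count} solutions")
```

Under these assumptions the same budget holds 114 J solutions, 61 Python solutions, or 43 C solutions. Real code does not divide this neatly, but the ratio shows how quickly verbose source crowds out the other items competing for the window.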
Practical Takeaways
| Context | Recommended |
|---|---|
| Long AI-assisted sessions | F#, Haskell, Ruby, Clojure |
| Ecosystem + AI tooling | Python (reasonable middle ground) |
| Maximum efficiency | J (but niche, steep learning curve) |
| Performance-critical modules | C / Rust (keep separate from LLM context) |
| Avoid for token-constrained contexts | C, Java, plain Go unless necessary |
Hybrid approach: token-efficient orchestration layer (Clojure/F#) + performance modules (C/Rust).