1. Why a single per-tool calculator is not enough
Almost every commentary on "the cost of AI" today picks one tool and quotes its per-unit price. That is misleading, because no real piece of content is produced by a single tool. A 60-second YouTube Short is a workflow: a language model writes the script, a video model renders the footage, a text-to-speech model dubs the voice-over, an image model produces the thumbnail, and a translation model localises the captions for two or three secondary markets. The total bill is the sum of five line items quoted in four different price units: per million tokens, per second of video, per million characters (for both speech and translation), and per image. Sticker prices like "GPT-5 costs $5 per million input tokens" or "Sora is $0.75 per second" answer nothing on their own; what creators and marketers need is the workflow total, broken down so they know which stage to optimise.
That is what this calculator does. It encodes five common content scenarios — Shorts, short-form ads, online-lecture episodes, audio-only podcasts, and SEO blog posts with illustrations — into deterministic stage graphs. Each stage has a sensible default model picked for you at the low / mid / high quality tier you choose, and you can swap any model independently to see the immediate effect on the total. The site does not store anything server-side; pricing data ships as a static JSON file with the build, and your inputs are persisted in localStorage and (optionally) shared via a base64-encoded URL token.
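The share link described above can be sketched as a JSON payload wrapped in URL-safe base64. This is a minimal illustration, not the site's actual code: the field names (`scenario`, `tier`, `length_seconds`, `languages`), the URL-safe alphabet, and the stripped padding are all assumptions.

```python
import base64
import json


def encode_share_token(inputs: dict) -> str:
    """Serialise calculator inputs into a compact URL-safe base64 token."""
    # Sorted keys and tight separators keep the token stable and short.
    raw = json.dumps(inputs, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Assumption: padding is stripped so the token is clean inside a URL.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_share_token(token: str) -> dict:
    """Restore the inputs from a token produced by encode_share_token."""
    padding = "=" * (-len(token) % 4)  # re-add the padding that was stripped
    raw = base64.urlsafe_b64decode(token + padding)
    return json.loads(raw.decode("utf-8"))


# Hypothetical input set; the real calculator's schema may differ.
example_inputs = {"scenario": "shorts", "tier": "mid", "length_seconds": 60, "languages": 3}
token = encode_share_token(example_inputs)
restored = decode_share_token(token)
```

Because the token carries the whole input set, a shared link can be restored without any server-side state.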
2. The five cost categories explained
2.1 Script-writing LLM
Every workflow starts with text: the script for a Short, the ad copy, the lecture transcript, or the blog draft. Frontier LLMs price input and output separately, with output usually four to five times more expensive than input. The calculator estimates output tokens from your length input using the conventional 2.2 spoken words per second, then applies a constant multiplier (1.4) to convert words to BPE tokens. For a 60-second Short the script is small (about 185 output tokens, that is 60 s × 2.2 words/second × 1.4 tokens/word, plus a ~600-token system brief), so even Claude Opus 4.7 at $75 per million output tokens costs less than $0.02 in output for the script stage. For a 20-minute podcast the same formula gives about 3,700 output tokens ($0.28 at the Opus rate); a script that runs to 6,000 tokens exceeds $0.40 on the premium tier for the script alone.
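The token estimate can be written out in a few lines. This sketch uses only the constants named in the text (2.2 words per second, 1.4 tokens per word); the function names and the rounding to whole tokens are assumptions, and the input and output prices are passed in because they differ per model.

```python
WORDS_PER_SECOND = 2.2   # spoken-word rate quoted in the text
TOKENS_PER_WORD = 1.4    # words-to-BPE-tokens multiplier quoted in the text


def estimate_output_tokens(length_seconds: float) -> int:
    """Estimate script output tokens from the spoken length of the piece."""
    return round(length_seconds * WORDS_PER_SECOND * TOKENS_PER_WORD)


def script_cost_usd(length_seconds: float, input_tokens: int,
                    input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Cost of the script stage; prices are USD per million tokens."""
    output_tokens = estimate_output_tokens(length_seconds)
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


# 60-second Short on GPT-4o mini ($0.15 in / $0.60 out) with a 600-token brief.
short_script_usd = script_cost_usd(60, 600, 0.15, 0.60)
```

For a 60-second input the bare formula yields 185 output tokens, and for 20 minutes it yields 3,696; any padding the real calculator adds on top is not specified in the text.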
2.2 AI video generation
Video is the line item that dominates ad and Shorts budgets. As of 2026-05, public list prices range from Kling 2 at $0.20 per second of 1080p output, through Runway Gen-3 at $0.40, Veo 2 at $0.50, and Sora at $0.75 per second on the Pro tier. The calculator multiplies the chosen per-second rate by your length input, with a sanity cap on the maximum continuous shot each model supports (Veo 2 maxes at 8 seconds per clip; Sora can do 20). For longer content you would chain multiple shots, but the cost arithmetic is the same. The "best mix" panel on the home page surfaces the cheapest video model for your scenario so you can see the absolute floor before committing.
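As a rough sketch of the video stage, the snippet below applies the per-second rates quoted above and derives how many chained clips a given length needs. The model keys are hypothetical identifiers, clip limits are filled in only where the text states them, and treating the cap as a clip count (rather than a hard limit on the input) is an assumption.

```python
import math

# Per-second list prices from the text; max clip length only where stated.
VIDEO_MODELS = {
    "kling-2": {"usd_per_second": 0.20, "max_clip_seconds": None},
    "runway-gen-3": {"usd_per_second": 0.40, "max_clip_seconds": None},
    "veo-2": {"usd_per_second": 0.50, "max_clip_seconds": 8},
    "sora-pro": {"usd_per_second": 0.75, "max_clip_seconds": 20},
}


def video_stage(model: str, length_seconds: float) -> dict:
    """Return the cost and, where the clip limit is known, the clips to chain."""
    spec = VIDEO_MODELS[model]
    cap = spec["max_clip_seconds"]
    clips = math.ceil(length_seconds / cap) if cap else None  # None = limit unknown
    return {"usd": spec["usd_per_second"] * length_seconds, "clips": clips}


def cheapest_video_model() -> str:
    """The floor for the video stage: the model with the lowest per-second rate."""
    return min(VIDEO_MODELS, key=lambda name: VIDEO_MODELS[name]["usd_per_second"])


veo_minute = video_stage("veo-2", 60)
```

Chaining changes the number of generations, not the bill: 60 seconds on Veo 2 is eight clips, but the cost is still the rate times the total length.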
2.3 Text-to-speech voice-over
Voice-over cost is driven by character count, and 60 seconds of natural-pace speech is around 750 characters in Latin scripts. ElevenLabs is by far the most expensive premium voice at an effective $165 per million characters (Creator tier amortised), but for sponsorship reads and ad spots its perceived quality justifies the spend. OpenAI's TTS-1 standard tier is $15 per million characters, eleven times cheaper, and Google's WaveNet sits at $16. For long-form podcasts, switching from ElevenLabs to OpenAI HD at $30 per million characters cuts the audio bill by roughly 80 percent at list price, at the cost of a less expressive voice. The calculator labels these tiers low / mid / high so you can swap them in one click.
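The voice-over arithmetic is a single multiplication, sketched here with the per-million-character rates and the 750-characters-per-minute figure from the paragraph above. The dictionary keys and function names are illustrative, not identifiers from the calculator.

```python
CHARS_PER_MINUTE = 750  # natural-pace speech in Latin scripts, per the text

# USD per million characters, as quoted in the text.
TTS_USD_PER_M_CHARS = {
    "elevenlabs": 165.0,
    "openai-tts-1-hd": 30.0,
    "google-wavenet": 16.0,
    "openai-tts-1": 15.0,
}


def tts_cost_usd(minutes: float, voice: str) -> float:
    """Voice-over cost for a given spoken length."""
    chars = minutes * CHARS_PER_MINUTE
    return chars * TTS_USD_PER_M_CHARS[voice] / 1_000_000


def saving_fraction(from_voice: str, to_voice: str) -> float:
    """Share of the audio bill saved by switching voices at list price."""
    return 1 - TTS_USD_PER_M_CHARS[to_voice] / TTS_USD_PER_M_CHARS[from_voice]


podcast_narration_usd = tts_cost_usd(20, "elevenlabs")
hd_saving = saving_fraction("elevenlabs", "openai-tts-1-hd")
```

At these list prices a 20-minute narration on ElevenLabs comes to about $2.48, and moving it to OpenAI HD saves about 82 percent.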
2.4 Image generation (thumbnails and inline visuals)
For Shorts and lectures, a single thumbnail is enough. For ads you usually generate two key visuals for retargeting, and for SEO blog posts a hero image plus two inline illustrations is standard. List prices have converged around $0.04 per 1024×1024 image for DALL-E 3 standard, Midjourney v6 amortised, and Imagen 3, with FLUX 1.1 Pro slightly higher at $0.05 for photorealistic thumbnails. The cheap end is self-hosted Stable Diffusion XL on a rented GPU, which works out to under $0.01 per image if you batch generate. The calculator counts images by the scenario's recipe — one for Shorts, two for ads, three for blog posts.
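The per-scenario image recipe can be expressed as a lookup table. The counts and per-image prices below come from the paragraph above; the scenario and model keys are hypothetical names, and self-hosted Stable Diffusion XL is left out because the text gives only an upper bound for it.

```python
# Images generated per piece of content, per the scenario recipes in the text.
IMAGES_PER_SCENARIO = {"shorts": 1, "lecture": 1, "ad": 2, "blog": 3}

# USD per 1024x1024 image, as quoted in the text.
IMAGE_USD = {
    "dall-e-3-standard": 0.04,
    "midjourney-v6": 0.04,
    "imagen-3": 0.04,
    "flux-1.1-pro": 0.05,
}


def image_cost_usd(scenario: str, model: str) -> float:
    """Image-stage cost: recipe count times the per-image price."""
    return IMAGES_PER_SCENARIO[scenario] * IMAGE_USD[model]


blog_on_flux = image_cost_usd("blog", "flux-1.1-pro")
```

Even the most image-heavy recipe here, a blog post with three FLUX images, stays at about $0.15.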
2.5 Subtitle and copy translation
If you publish in two or more languages, translation is its own line item. DeepL Pro and Google Cloud Translation v3 are priced at $20 per million characters for top quality. A much cheaper alternative is GPT-4o mini used in prompt-translation mode, which works out to roughly $1.20 per million characters of output and is good enough for subtitles when post-edited. The calculator multiplies your character count by (languages − 1), so the source-language version is free and only the extra locales accrue translation cost.
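The (languages − 1) rule looks like this in code. It is a sketch with an assumed function name; the rates are the per-million-character prices quoted above.

```python
def translation_cost_usd(source_chars: int, languages: int, usd_per_m_chars: float) -> float:
    """Translation cost: only locales beyond the source language are billed."""
    if languages < 1:
        raise ValueError("languages must include at least the source language")
    extra_locales = languages - 1
    return source_chars * extra_locales * usd_per_m_chars / 1_000_000


# 750 characters of captions, three languages in total.
subtitles_llm = translation_cost_usd(750, 3, 1.20)    # GPT-4o mini translate
subtitles_deepl = translation_cost_usd(750, 3, 20.0)  # DeepL Pro
```

With a single language the function returns zero, which matches the rule that the source-language version is free.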
3. Worked example: a 60-second YouTube Short in three languages
Take the default Shorts scenario at the mid quality tier with three languages (English plus two locales). Stage by stage at 2026-05 prices: the script costs well under $0.001 on GPT-4o mini ($0.15 per million input tokens, $0.60 per million output), the 60-second video costs $24 on Runway Gen-3 at $0.40 per second, the voice-over costs about $0.012 on Google WaveNet at $16 per million characters (750 chars × 1 language), the thumbnail costs $0.04 on Imagen 3, and the subtitle translation for two extra languages costs about $0.0018 on GPT-4o mini translate ($1.20 per million characters × 750 chars × 2). The total comes out to roughly $24.05, dominated almost entirely by the video stage. Push quality up to "high" and the video jumps to $30 on Veo 2 at $0.50/s; push down to "low" with Kling 2 at $0.20/s and the entire piece costs $12.05.
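The stage-by-stage arithmetic above can be reproduced in a few lines. This sketch hard-codes the mid-tier prices from the paragraph; the stage names are illustrative, and the script's output tokens come from the 2.2 words/second × 1.4 tokens/word estimate, rounded to a whole number.

```python
LENGTH_SECONDS = 60
CAPTION_CHARS = 750      # one minute of speech, per the text
LANGUAGES = 3            # English plus two locales
SYSTEM_BRIEF_TOKENS = 600
OUTPUT_TOKENS = round(LENGTH_SECONDS * 2.2 * 1.4)  # 185 by the stated formula

stages = {
    # GPT-4o mini: $0.15 per million input tokens, $0.60 per million output.
    "script": (SYSTEM_BRIEF_TOKENS * 0.15 + OUTPUT_TOKENS * 0.60) / 1_000_000,
    # Runway Gen-3 at $0.40 per second.
    "video": LENGTH_SECONDS * 0.40,
    # Google WaveNet at $16 per million characters, source language only.
    "voice": CAPTION_CHARS * 16 / 1_000_000,
    # One Imagen 3 thumbnail.
    "thumbnail": 0.04,
    # GPT-4o mini translate at $1.20 per million characters, extra locales only.
    "translation": CAPTION_CHARS * (LANGUAGES - 1) * 1.20 / 1_000_000,
}

total_usd = sum(stages.values())
video_share = stages["video"] / total_usd

for name, usd in stages.items():
    print(f"{name:12s} ${usd:.4f}")
print(f"{'total':12s} ${total_usd:.2f}")
```

Video accounts for more than 99 percent of the total, so changing the video model is the only swap that moves the bill noticeably.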
The same math run on a 20-minute podcast looks very different. The video stage disappears (it's audio-only), the script cost jumps to about $0.50 on Claude Sonnet 4.6 because output is now 6,000+ tokens, the narration is the dominant line at roughly $2.50 on ElevenLabs for 15,000 characters of speech, the cover art adds $0.05, and translating a 3,000-character show note into two locales on DeepL costs about $0.12. Total: under $3.20 per episode at premium quality, or about $0.16 per finished minute against roughly $24 per minute for the Short. That is why the "AI podcast" format scales so well: per minute of content, its cost is about two orders of magnitude below short-form video.
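To compare the two formats on equal footing, it helps to normalise by finished minutes. The sketch below takes the podcast stage costs as quoted above (the $0.50 script figure is used as given, not recomputed) and compares the per-minute result with the Short's video stage alone, at $0.40 per second.

```python
import math

EPISODE_MINUTES = 20

podcast_stages = {
    "script": 0.50,                                  # as quoted in the text
    "narration": 15_000 * 165 / 1_000_000,           # ElevenLabs, 15,000 chars
    "cover_art": 0.05,
    "translation": 3_000 * 2 * 20 / 1_000_000,       # DeepL, two extra locales
}

podcast_total = sum(podcast_stages.values())
podcast_per_minute = podcast_total / EPISODE_MINUTES

short_video_per_minute = 60 * 0.40  # Runway Gen-3, video stage only

ratio = short_video_per_minute / podcast_per_minute
orders_of_magnitude = math.log10(ratio)
```

The ratio comes out above 100, which is where the per-minute gap of roughly two orders of magnitude comes from; per episode the gap is closer to one order of magnitude.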
4. Where to look for cost savings
Three rules cover most optimisation work. First, cap output tokens on every LLM stage: output is four to five times more expensive than input, and a runaway 8,000-token script on Opus costs $0.60 at $75 per million output tokens, more than ten times the combined voice-over, thumbnail, and translation cost of the Short in the worked example. Second, pick the cheapest credible video model for your target quality: Kling 2 vs Veo 2 is a 2.5× swing, and viewers usually cannot tell the difference at 1080p mobile playback. Third, route translation through a cheap LLM when post-editing is acceptable: $1.20 per million characters on GPT-4o mini is roughly 17 times cheaper than DeepL at $20 and produces serviceable subtitles for casual content.
The cost categories are not independent. A premium narration is wasted on a low-quality video, and a Veo 2 hero shot is overkill for casual Shorts. Use the quality-tier slider as a coordinated dial — it picks consistent defaults across all five stages — and override individual stages only when a specific category needs extra polish (typically the voice for ads, and the video for hero pieces).
5. Caveats and how to keep this current
All prices in this calculator are public list prices as of 2026-05. Enterprise agreements, volume discounts, and per-month subscription credits (ElevenLabs, Midjourney, Runway) can change the effective cost by 20 to 40 percent in either direction; the calculator amortises subscription plans into a per-unit equivalent where possible. Video generation pricing is the most volatile category — three of the five video models on the list re-priced between 2025-11 and 2026-05 — so always cross-check your largest line item against the vendor's current pricing page before committing to a campaign budget.
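One way to read "amortises subscription plans into a per-unit equivalent" is to divide the monthly fee by the units actually consumed. The sketch below is an assumption about that method, not the calculator's documented formula, and the plan figures in the example are hypothetical rather than any vendor's real pricing.

```python
from typing import Optional


def effective_unit_price(monthly_fee_usd: float, included_units: float,
                         used_units: Optional[float] = None) -> float:
    """Per-unit price of a subscription, amortised over the units consumed.

    If used_units is omitted, the full allowance is assumed to be used.
    """
    used = included_units if used_units is None else min(used_units, included_units)
    if used <= 0:
        raise ValueError("at least one unit must be used to amortise the fee")
    return monthly_fee_usd / used


# Hypothetical plan: $30 per month with 100 generations included.
full_use = effective_unit_price(30, 100)       # whole allowance consumed
partial_use = effective_unit_price(30, 100, 60)  # only 60 generations used
```

Unused credits raise the effective price, which is one reason a subscription's real cost can sit well above its amortised list figure.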
The dataset behind the calculator is published as a static JSON file at /prices.json under CC0; you are welcome to fork it for your own internal tooling. For the full assumption set (words per second, system-prompt sizes, and language conversion factors), read the source code linked in the footer.
Continue to the FAQ for quick answers to the questions creators ask most, or jump back to the calculator to model your own content.