Whether you need to capture meeting minutes without frantic note-taking, turn an interview recording into a draft article, or follow a lecture hands-free, iPhone already has surprisingly capable transcription built in. iOS 18's Voice Memos can automatically transcribe and separate speakers on-device. Pair that with a dedicated app like Otter.ai or Notta and you get AI-generated summaries, Zoom integration, and batch conversion of existing audio files. This guide covers everything from the free built-in options to the premium services worth paying for — including picks by use case so you can go straight to what you need.
Table of Contents
- What iPhone transcription can do
- iOS built-in transcription (free)
- Free third-party apps
- Premium apps compared
- Picks by use case
- 5 tips to improve transcription accuracy
- Common questions
- Summary
What iPhone transcription can do
When it's useful
iPhone transcription turns spoken audio into editable text — and the range of situations where that's genuinely useful is wider than most people expect.
- Meeting and call notes: record first, transcribe later, then share a clean summary with the team
- Interviews and journalism: convert recorded conversations into draft text for faster article writing
- Lectures and seminars: capture everything when note-taking can't keep up with the speaker
- Language practice: speak into Dictation and see exactly how your pronunciation is recognized
- Quick idea capture: dictate a thought in 10 seconds instead of typing it out
- Captions and subtitles: generate text for video or podcast content without manual typing
People who say they "can't keep up" in fast meetings or interviews often find that switching to a record-then-transcribe workflow cuts their post-meeting work by half.
Built-in vs. third-party apps
It helps to understand what iOS already handles and what dedicated apps add before you commit to anything.
| Category | Strengths | Weaknesses |
|---|---|---|
| iOS built-in | Real-time dictation, Voice Memos auto-transcription, free, works offline (iOS 16+) | No audio file import, limited AI summary, speaker labels basic |
| Free third-party apps | Longer recording sessions, cloud sync, better multilingual support | Monthly free-tier caps, ads, limited export formats |
| Premium services | Speaker diarization, AI summaries, Zoom/Meet integration, batch file processing | $10–$20 / month subscription |
For short voice memos, quick dictation, or the occasional meeting, the built-in features cover most of what you need at zero cost. If you're transcribing meetings or interviews daily, a paid service pays for itself in saved time quickly.
iOS built-in transcription (free)
iOS ships with several distinct transcription methods — each suited to a different workflow. Knowing which one to reach for saves a lot of frustration.
Keyboard Dictation (mic icon)
The microphone icon on the iPhone keyboard is the fastest way to get speech into text. It works in any text field on the system — Notes, Mail, Messages, search bars, web forms, everything.
- Languages: English, Spanish, French, and dozens more
- Accuracy: solid for everyday conversational speech
- Internet: works offline in iOS 16 and later (on-device processing)
- Punctuation: say "period," "comma," "new line," or "new paragraph" to insert punctuation
For quick idea capture or drafting a short email, Dictation needs no setup. If you just want to turn speech into text right now, this is the fastest path.
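To make the punctuation commands concrete, here is a minimal Python sketch of the idea: a toy post-processor that swaps spoken command words for the marks they stand for. It is an illustration only, not how Apple's Dictation is implemented. The function name `apply_spoken_punctuation` and the `COMMANDS` table are assumptions made up for this example.

```python
import re

# Spoken commands from the list above, mapped to the text they insert.
# Hypothetical table for illustration; real Dictation handles this internally.
COMMANDS = {
    "new paragraph": "\n\n",
    "new line": "\n",
    "period": ".",
    "comma": ",",
}

def apply_spoken_punctuation(raw: str) -> str:
    """Replace spoken punctuation commands in a raw stream of words."""
    text = raw
    # Handle longer phrases first so "new paragraph" wins over "new line".
    for phrase in sorted(COMMANDS, key=len, reverse=True):
        mark = COMMANDS[phrase]
        # Also swallow the space before the command so the mark
        # attaches to the previous word.
        pattern = rf"\s*\b{re.escape(phrase)}\b"
        text = re.sub(pattern, lambda _m, mark=mark: mark, text, flags=re.IGNORECASE)
    # Remove spaces left behind at the start of a new line.
    text = re.sub(r"\n[ \t]+", "\n", text)
    return text.strip()

print(apply_spoken_punctuation(
    "hello team comma the meeting moved period new line see you at three period"
))
```

A naive table like this would also convert a genuine use of the word "period" (as in "trial period"), which is one reason real dictation engines rely on context rather than simple substitution.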
Voice Memos transcription (iOS 18+)
iOS 18 added automatic transcription directly inside the Voice Memos app — including speaker detection — all processed on-device by Apple Silicon. This is the most significant built-in transcription upgrade in years.
- Open the Voice Memos app and record (or tap an existing recording)
- Tap the waveform to open the detail view
- Tap the "…" menu in the top right and choose "Transcribe"
- Wait 30 seconds to a few minutes depending on the recording length
- Speaker diarization: yes — labels speakers as "Speaker 1," "Speaker 2," and so on
- Processing: fully on-device (your audio never leaves your iPhone)
- Languages: English and a growing list of others
- Cost: free with iOS 18
Recording a meeting in Voice Memos and transcribing it afterward is the simplest and most private way to create meeting notes without any third-party accounts or subscriptions.
Notes app voice input
Open Notes, start a new note, and tap the mic icon on the keyboard to dictate directly into the note. The result syncs over iCloud to all your Apple devices automatically.
- Saved to: iCloud (automatic backup across devices)
- Features: format as checklists, add images, mix typed and dictated text
- Apple Pencil: handwriting and voice input can live in the same note
This is ideal when you want to skip the copy-paste step. Dictate your thoughts directly where they'll live, and Notes handles storage, search, and sharing from there.
Live Captions (iOS 16+)
Live Captions is a system-wide real-time captioning feature that overlays subtitles on anything playing audio — whether that's a FaceTime call, a Zoom meeting, a YouTube video, or ambient sound picked up by the microphone.
- Go to Settings → Accessibility → Live Captions
- Toggle Live Captions on
- A caption box appears at the top of the screen whenever audio is detected
- Tap the microphone icon to caption ambient sound; the speaker icon for device audio
- Processing: fully on-device (no cloud required)
- Languages: optimized for English; other languages vary by iOS version and region
- Saving: captions are displayed only — take a screenshot to save text
Live Captions is particularly useful for English-language meetings and video calls. It captions speech but does not translate it, so if you need to follow a fast-paced English conversation in real time, it's the closest iOS gets to a live stenographer.
Tips to improve accuracy
The recording environment has a bigger impact on transcription accuracy than the app you choose. A few simple habits make a noticeable difference.
- Keep the microphone close to the speaker: within about 12 inches is ideal; don't leave the iPhone in the middle of a large conference table
- Minimize background noise: air conditioning, coffee shop ambient noise, and music all degrade accuracy significantly
- Speak at a steady pace: rushing through sentences is one of the most common causes of misrecognition
- Use punctuation voice commands: "period," "comma," "question mark," and "new paragraph" work in Dictation
- Add custom vocabulary: enter proper nouns and industry jargon in Settings → General → Keyboard → Text Replacement, which can help Dictation spell them correctly
Getting these basics right means you can often use the free built-in tools for serious work without needing to upgrade.
Free third-party apps
When the built-in features aren't quite enough — you need longer sessions, cloud backup, or better multilingual support — these free and freemium apps fill the gap without an upfront cost.
Google Docs voice typing
Google Docs has a voice typing feature that lets you dictate directly into a document with Google's speech recognition engine behind it. It's especially strong for long-form dictation where you want results saved to the cloud instantly.
- Cost: free (Google account required)
- Platforms: iPhone / iPad / Android / web
- Highlights: auto-saves to Google Drive, accessible from any device, automatic punctuation
- Best for: drafting documents, reports, or blog posts by voice
Because everything saves to Google Drive, you can start a document on your iPhone and continue editing it on a laptop without missing a beat. For voice-driven document creation, this is the most seamless free option.
Otter.ai (free plan)
Otter.ai is the leading name in English transcription, and its free plan is genuinely useful. It records and transcribes in real time, stores the transcript with timestamps, and lets you search across all your past recordings.
- Cost: free (300 monthly transcription minutes, 30-minute limit per conversation)
- Platforms: iPhone / Android / Web / Chrome extension
- Highlights: real-time transcription, speaker identification, keyword search across transcripts
- Languages: English primary; limited support for other languages
The free tier is enough for several short meetings a week. If your meetings are in English and you want the most polished free transcription experience, Otter.ai is the place to start.
Notta (free plan)
Notta is a transcription service with strong multilingual support — particularly good at both English and Japanese. The free plan gives you 120 minutes per month and supports importing existing audio files, which the built-in iOS features don't handle.
- Cost: free (120 minutes / month); Pro plans from $14.99 / month
- Platforms: iPhone / Android / Web / Chrome extension
- Highlights: real-time transcription, audio file import (MP3, MP4, WAV, M4A), text export (TXT, DOCX, PDF)
- Speaker diarization: available on Pro plan
120 minutes covers two or three short meetings a week, at roughly 10 to 15 minutes each. Try the free tier to gauge the accuracy for your accent and subject matter before committing to a subscription.
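If you want to check whether a monthly cap fits your own schedule, the arithmetic is easy to sketch in Python. This assumes meetings are spread evenly across the month and uses the 120-minute figure quoted above; the function name `minutes_per_meeting` is made up for this example.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33 weeks in an average month

def minutes_per_meeting(monthly_cap_min: float, meetings_per_week: float) -> float:
    """Average meeting length that a monthly minute cap can sustain."""
    meetings_per_month = meetings_per_week * WEEKS_PER_MONTH
    return monthly_cap_min / meetings_per_month

for per_week in (2, 3):
    avg = minutes_per_meeting(120, per_week)
    print(f"{per_week} meetings/week -> about {avg:.0f} minutes each")
# 2 meetings/week -> about 14 minutes each
# 3 meetings/week -> about 9 minutes each
```

Swap in your own cap and meeting count to see how quickly a free tier runs out.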
Speechy Lite
Speechy Lite keeps things simple: open the app, tap record, and your words appear on screen as you speak. It's a clean, focused real-time transcription app that's a good entry point if you've never used a dedicated transcription tool.
- Cost: free (with ads); paid version removes limits
- Platforms: iPhone / iPad
- Highlights: real-time transcription, save as note, Siri Shortcuts support, minimal interface
- Languages: matches your iOS keyboard language settings
There's almost no learning curve — you tap once to start and once to stop. If you want to try transcription without any account setup or configuration, Speechy Lite is the lowest-friction starting point.
Premium apps compared
If transcription is a daily part of your workflow — meetings, interviews, podcasts — the free tier limits will eventually become a friction point. These paid services remove the caps and add features that serious users depend on.
Quick comparison
| Service | Price (approx.) | English accuracy | Speaker ID | AI summary | Zoom/Meet bot |
|---|---|---|---|---|---|
| Otter.ai Pro | $16.99 / month | ◎ | ◎ | ◎ | ◎ |
| Notta Pro | $14.99 / month | ◎ | ◎ | ◎ | ◎ |
| Descript | $15 / month | ◎ | ◎ | ○ | △ |
| Rev.com | $1.50 / minute | ◎◎ (human) | ◎ | △ | ✕ |
Ratings: ◎ = excellent, ○ = good, △ = limited, ✕ = not available. Prices change, so check each service's website for current rates before subscribing.
Otter.ai Pro
Otter.ai Pro is the go-to choice for English-language meetings. Its Zoom and Google Meet bot joins calls automatically, transcribes in real time, and delivers a summary with action items after the call ends — all without you lifting a finger once it's set up.
- Cost: $16.99 / month (check otter.ai for current pricing)
- Transcription: unlimited minutes (meeting-length limits lifted)
- Speaker diarization: yes — names speakers based on meeting participants
- AI features: meeting summaries, action item extraction, follow-up suggestions
- Integrations: Zoom, Google Meet, Microsoft Teams (bot joins automatically)
- Languages: optimized for English; other languages limited
The shared live transcript — where everyone in the meeting can view and highlight text in real time — is a genuine productivity feature. For teams that run on English video calls, Otter.ai Pro is the easiest automation to put in place.
Web app: otter.ai
Notta Pro
Notta Pro differentiates itself with strong multilingual support — it handles English, Japanese, Spanish, and over 50 other languages with consistently high accuracy. If your work involves content in more than one language, Notta's multi-language capability is a meaningful advantage over English-first competitors.
- Cost: $14.99 / month (annual plans available; check notta.ai for current pricing)
- Transcription: unlimited minutes on Pro
- Speaker diarization: up to 10 speakers
- AI features: meeting summary, action item extraction, AI chat with transcripts
- Integrations: Zoom, Google Meet, Microsoft Teams, Webex
- File import: MP3, MP4, WAV, M4A, and more
- Export: TXT, DOCX, PDF, SRT
The ability to import existing audio files and process them in bulk makes Notta especially useful for journalists or researchers who record interviews separately. For anyone working across multiple languages or needing to batch-process archived recordings, Notta Pro is the strongest all-around choice.
Web and desktop app: notta.ai
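SRT, one of the export formats listed above, is the plain-text subtitle format that most video players and editors accept. As a rough sketch of what such an export contains, the Python below turns timestamped transcript segments into SRT text. The segment data and the function names are invented for illustration, and this is not Notta's actual export code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Build SRT text from (start_sec, end_sec, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: a counter, a time range, the caption text, a blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

# Made-up segments for illustration.
segments = [
    (0.0, 2.5, "Welcome to the show."),
    (2.5, 6.0, "Today we are talking about transcription."),
]
print(to_srt(segments))
```

Each numbered cue pairs a time range with the words spoken in it, which is why a transcript with timestamps converts to captions so directly.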
Descript
Descript is a different kind of transcription tool — it's primarily a podcast and video editor where the transcript is the editing interface. Delete a sentence from the text and the audio or video clip disappears with it. For content creators, this workflow is transformative.
- Cost: around $15 / month (check descript.com for current plans)
- Transcription: automatic with upload; strong English accuracy
- Speaker diarization: yes
- AI features: filler word removal ("um," "uh"), AI voice overdub, text-based editing
- File support: audio (MP3, WAV, M4A) and video (MP4, MOV)
- Best for: podcast production, video interviews, YouTube content
Descript doesn't have a real-time meeting transcription mode — it's designed for recorded files. If you produce audio or video content and want to edit by editing text rather than scrubbing a timeline, Descript is in a category of its own.
Web app: descript.com
Rev.com
Rev.com offers both AI-generated transcription (fast and cheap) and human-verified transcription (slower, more expensive, and significantly more accurate). The human option is particularly valuable when accuracy is non-negotiable — legal depositions, medical notes, archival research.
- Cost: AI transcription around $0.25 / minute; human transcription around $1.50 / minute (check rev.com for current rates)
- Turnaround: AI is near-instant; human transcription typically within a few hours
- Accuracy: human transcription routinely achieves 99%+ accuracy
- Speaker diarization: yes, on both tiers
- Best for: legal, medical, academic, archival use cases where errors are costly
Rev doesn't have a recurring subscription model for individual users — you pay per minute of audio, which makes it economical for occasional high-stakes work. When the transcription absolutely has to be right, human-verified Rev is the safest choice.
Web: rev.com
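To see where pay-per-minute pricing stops being the cheaper route, you can compute a simple break-even point. The Python sketch below uses the approximate prices quoted in this guide, which may have changed, and it compares cost only: human transcription is a different product from an AI subscription. The function name `break_even_minutes` is made up for this example.

```python
def break_even_minutes(monthly_subscription: float, per_minute_rate: float) -> float:
    """Minutes per month at which per-minute billing equals a flat subscription."""
    return monthly_subscription / per_minute_rate

OTTER_PRO = 16.99  # USD per month, approximate price quoted in this guide
REV_AI = 0.25      # USD per minute, approximate
REV_HUMAN = 1.50   # USD per minute, approximate

print(f"Rev AI vs. Otter.ai Pro: {break_even_minutes(OTTER_PRO, REV_AI):.0f} min/month")
print(f"Rev human vs. Otter.ai Pro: {break_even_minutes(OTTER_PRO, REV_HUMAN):.0f} min/month")
# Rev AI vs. Otter.ai Pro: 68 min/month
# Rev human vs. Otter.ai Pro: 11 min/month
```

Below roughly an hour of audio a month, per-minute AI transcription costs less than the subscription at these prices; above that, the flat fee wins.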
Japanese-language specialists
Two additional services are worth knowing about if you're working with Japanese-language audio content.
Rimo Voice is a Japanese-specialized transcription service with industry-leading accuracy for Japanese speech. It's a pay-per-use service (around ¥10 per 30 seconds) with speaker diarization and AI summary features. Because it processes audio on servers in Japan, it's a practical choice for organizations with data-sovereignty requirements. rimo.app
CLOVA Note by LINE is a free transcription app optimized for Japanese that includes speaker diarization for up to eight speakers at no cost — which is unusual at the free tier. It's primarily popular in Japan and works well for Japanese meetings. If you need to transcribe Japanese content without a paid subscription, CLOVA Note is worth a look. clovanote.line.me
Picks by use case
Different workflows call for different tools. Here's a quick-reference guide to the best option for each situation.
| Use case | Recommended |
|---|---|
| Quick voice memo or short dictation | iOS Keyboard Dictation (built-in) |
| Private meeting recording and transcription | Voice Memos (iOS 18+) |
| Real-time English meeting transcription (free) | Otter.ai free plan |
| Multilingual transcription or audio file import (free) | Notta free plan |
| English online meetings with Zoom/Meet automation | Otter.ai Pro |
| Multilingual meetings or batch file processing | Notta Pro |
| Podcast or video production | Descript |
| Legal, medical, or archival accuracy | Rev.com (human) |
| Academic lectures (real-time) | Live Captions (built-in) or Otter.ai |
| English pronunciation practice | iOS Keyboard Dictation |
| Japanese audio content | Notta Pro or Rimo Voice |
For most people starting out, Voice Memos (iOS 18+) for recorded audio and Otter.ai's free plan for live meetings is the best zero-cost combination. Add a paid plan only when the free-tier minutes become a consistent bottleneck.
5 tips to improve transcription accuracy
No matter which app you use, the quality of the recording is the single biggest factor in transcription accuracy. These five habits will make a measurable difference.
- Get the microphone close to the speaker
iPhone's microphone has a useful range of about 12 to 18 inches. In multi-person meetings, place the phone near the center of the table, not tucked away in a pocket or bag. For one-on-one interviews, put the phone between you and the other person.
- Eliminate background noise
Air conditioning hum, coffee shop ambient noise, and background music all compete with speech. Even moving from a busy open-plan office to a conference room can dramatically improve results. When recording outdoors, use a windscreen or lapel mic attachment.
- Use speaker diarization when available
Apps that identify individual speakers (Voice Memos on iOS 18+, Otter.ai, Notta Pro) produce transcripts that are much easier to turn into meeting minutes. Enable diarization before recording whenever it's available.
- Register custom vocabulary
Industry jargon, product names, and proper nouns are among the most common sources of transcription errors. Adding them to iOS's Text Replacement list (Settings → General → Keyboard → Text Replacement) can help with words the system tends to mishear.
- Use AI post-processing to clean up the raw text
Spoken language is full of filler words ("um," "like," "you know"), false starts, and incomplete sentences. Services like Otter.ai and Notta Pro offer AI summaries that extract the key points and action items automatically, turning raw transcript text into usable meeting notes.
Combining a good recording environment with post-processing AI means that even the built-in free tools can produce genuinely useful meeting notes without manual editing.
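If your tool doesn't offer AI cleanup, even a few lines of Python can take the worst filler out of a raw transcript. The sketch below is a simple rule-based pass, which is far cruder than the AI summaries in Otter.ai or Notta and is not how those services work. The filler list and the function name `strip_fillers` are assumptions made up for this example.

```python
import re

# Hypothetical filler list for illustration. "like" is left out on purpose
# because it is often a real word in the sentence.
FILLERS = ["you know", "um", "uh"]

def strip_fillers(text: str) -> str:
    """Remove common filler words and tidy the leftover spacing."""
    alternatives = "|".join(re.escape(filler) for filler in FILLERS)
    # Match the filler, an optional trailing comma, and following spaces.
    pattern = rf"\b(?:{alternatives})\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse doubled spaces and remove spaces before punctuation.
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    cleaned = re.sub(r"\s+([,.?!])", r"\1", cleaned)
    cleaned = cleaned.strip()
    # Restore a capital letter if the sentence opener was removed.
    return cleaned[:1].upper() + cleaned[1:]

print(strip_fillers("Um, so the launch is uh next Tuesday, you know, if QA signs off."))
# So the launch is next Tuesday, if QA signs off.
```

A pass like this only deletes words. It can't fix false starts or pull out action items, which is where the AI features earn their keep.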
Common questions
Can I transcribe offline?
Yes — for the built-in iOS features. Keyboard Dictation in iOS 16 and later uses on-device processing and works without an internet connection. Voice Memos transcription in iOS 18+ also runs entirely on your iPhone using Apple Silicon, so it works offline and keeps your audio private.
Third-party apps like Otter.ai, Notta, and Descript require an internet connection because they use cloud-based AI models. If you're in an area with poor connectivity — on a plane, in a basement, or at an outdoor venue — stick with the built-in iOS tools.
How do I switch between English and other languages?
For iOS Keyboard Dictation, the language follows your active keyboard. Switch to a different language keyboard in Settings → General → Keyboard → Keyboards, then tap the globe icon on the keyboard to toggle between languages. The microphone icon will transcribe in whichever language the keyboard is set to.
For third-party apps, language selection is usually done inside the app before you start recording. Most apps don't support mid-recording language switching — choose the primary language of your meeting beforehand. For mixed-language meetings, Notta handles multiple languages better than most competitors.
Can it distinguish between speakers?
Yes, but not all options support it. Here's a quick breakdown:
- Voice Memos (iOS 18+): yes — labels speakers as "Speaker 1," "Speaker 2," etc.
- Otter.ai (free and Pro): yes — can even name speakers based on calendar invites
- Notta Pro: yes — up to 10 speakers
- Descript: yes — speaker labels are editable in the transcript
- Rev.com (human): yes — with speaker identification in the formatted transcript
- iOS Keyboard Dictation / Speechy Lite: no speaker separation
For any multi-person meeting where knowing who said what matters, choose a tool with speaker diarization enabled before you start recording.
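To show why speaker labels matter for minutes, here is a small Python sketch that merges consecutive segments from the same speaker into readable turns. The segment data is invented and the function name `merge_speaker_turns` is hypothetical. Real apps export diarized transcripts in their own formats.

```python
from itertools import groupby

def merge_speaker_turns(segments):
    """Merge consecutive (speaker, text) segments from the same speaker."""
    turns = []
    # groupby only groups adjacent items, so a speaker who talks again
    # later in the meeting gets a separate turn.
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        text = " ".join(seg[1] for seg in group)
        turns.append((speaker, text))
    return turns

# Made-up diarized output for illustration.
segments = [
    ("Speaker 1", "Let's start with the budget."),
    ("Speaker 1", "We are ten percent over."),
    ("Speaker 2", "I can cut the travel line."),
    ("Speaker 1", "Good, do that."),
]
for speaker, text in merge_speaker_turns(segments):
    print(f"{speaker}: {text}")
```

Without speaker labels, the same transcript is one undifferentiated block of text, and working out who agreed to what means going back to the audio.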
Can I transcribe existing audio files (MP3/M4A)?
Yes, with a third-party app. Notta, Descript, and Rev.com all accept uploaded audio files (MP3, MP4, WAV, M4A, and more) and transcribe them. This is useful when you have a backlog of recordings from voice recorders, Zoom, or other sources.
The iOS built-in transcription doesn't support external file import — Voice Memos can only transcribe audio you've recorded in the Voice Memos app itself. If you need to transcribe an existing file, Notta's free tier (120 minutes / month) is a good starting point.
What about privacy and security?
The iOS built-in features — Keyboard Dictation (iOS 16+) and Voice Memos transcription (iOS 18+) — process everything on your device. Your audio never leaves the iPhone, making these the safest choice for confidential conversations, medical discussions, or legally sensitive content.
Third-party apps send your audio to their cloud servers for processing. Most reputable services (Otter.ai, Notta, Descript) publish data retention policies and comply with GDPR and similar frameworks. Rev.com, which also handles legal and medical transcription, has enterprise-grade data security options. Always review a service's privacy policy before recording sensitive meetings.
Summary
iPhone's built-in transcription is more capable than most people realize. iOS 18's Voice Memos can automatically transcribe and separate speakers on-device for free — no account, no subscription, no audio uploaded anywhere. That alone covers a large share of everyday transcription needs, and it's the right place to start.
When the built-in tools hit their limits, the upgrade path is clear: Otter.ai's free plan for English meetings, Notta for multilingual content or file import, Descript for podcast and video production, and Rev.com's human transcription when accuracy is non-negotiable. Most people find that Voice Memos plus Otter.ai's free tier handles everything without spending a cent. From there, move to a paid plan only when the monthly minute cap becomes a genuine bottleneck in your workflow.

