How to Transcribe Audio on iPhone | Built-in Speech-to-Text and the Best Transcription Apps

Whether you need to capture meeting minutes without frantic note-taking, turn an interview recording into a draft article, or follow a lecture hands-free, iPhone already has surprisingly capable transcription built in. iOS 18's Voice Memos can automatically transcribe and separate speakers on-device. Pair that with a dedicated app like Otter.ai or Notta and you get AI-generated summaries, Zoom integration, and batch conversion of existing audio files. This guide covers everything from the free built-in options to the premium services worth paying for — including picks by use case so you can go straight to what you need.

What iPhone transcription can do

When it's useful

iPhone transcription turns spoken audio into editable text — and the range of situations where that's genuinely useful is wider than most people expect.

Meeting and call notes: record first, transcribe later, then share a clean summary with the team
Interviews and journalism: convert recorded conversations into draft text for faster article writing
Lectures and seminars: capture everything when note-taking can't keep up with the speaker
Language practice: speak into Dictation and see exactly how your pronunciation is recognized
Quick idea capture: dictate a thought in 10 seconds instead of typing it out
Captions and subtitles: generate text for video or podcast content without manual typing

People who feel they can't keep up with note-taking in fast meetings or interviews often find that a record-then-transcribe workflow substantially reduces their post-meeting work.

Built-in vs. third-party apps

It helps to understand what iOS already handles and what dedicated apps add before you commit to anything.

Category	Strengths	Weaknesses
iOS built-in	Real-time dictation, Voice Memos auto-transcription, free, works offline (iOS 16+)	No audio file import, limited AI summaries, only basic speaker labels
Free third-party apps	Longer recording sessions, cloud sync, better multilingual support	Monthly free-tier caps, ads, limited export formats
Premium services	Speaker diarization, AI summaries, Zoom/Meet integration, batch file processing	Typically a $10–$20 / month subscription (Rev.com charges per minute instead)

For short voice memos, quick dictation, or the occasional meeting, the built-in features cover most of what you need at zero cost. If you're transcribing meetings or interviews daily, a paid service pays for itself in saved time quickly.

iOS built-in transcription (free)

iOS ships with several distinct transcription methods — each suited to a different workflow. Knowing which one to reach for saves a lot of frustration.

Keyboard Dictation (mic icon)

The microphone icon on the iPhone keyboard is the fastest way to get speech into text. It works in any text field on the system — Notes, Mail, Messages, search bars, web forms, everything.

Languages: English, Spanish, French, and dozens more
Accuracy: solid for everyday conversational speech
Internet: works offline in iOS 16 and later (on-device processing)
Punctuation: say "period," "comma," "new line," or "new paragraph" to insert punctuation

For quick idea capture or drafting a short email, Dictation is the fastest path from speech to text, with no setup required.

Voice Memos transcription (iOS 18+)

iOS 18 added automatic transcription directly inside the Voice Memos app — including speaker detection — all processed on-device by Apple Silicon. This is the most significant built-in transcription upgrade in years.

Open the Voice Memos app and record (or tap an existing recording)
Tap the waveform to open the detail view
Tap the "…" menu in the top right and choose "Transcribe"
Wait 30 seconds to a few minutes depending on the recording length

Speaker diarization: yes — labels speakers as "Speaker 1," "Speaker 2," and so on
Processing: fully on-device (your audio never leaves your iPhone)
Languages: English and a growing list of others
Cost: free with iOS 18

Recording a meeting in Voice Memos and transcribing it afterward is the simplest and most private way to create meeting notes without any third-party accounts or subscriptions.

Notes app voice input

Open Notes, start a new note, and tap the mic icon on the keyboard to dictate directly into the note. The result syncs over iCloud to all your Apple devices automatically.

Saved to: iCloud (automatic backup across devices)
Features: format as checklists, add images, mix typed and dictated text
Apple Pencil: handwriting and voice input can live in the same note

This is ideal when you want your spoken words to land directly in a note and skip the copy-paste step. Dictate your thoughts where they'll live, and Notes handles storage, search, and sharing from there.

Live Captions (iOS 16+)

Live Captions is a system-wide real-time captioning feature that overlays subtitles on anything playing audio — whether that's a FaceTime call, a Zoom meeting, a YouTube video, or ambient sound picked up by the microphone.

Go to Settings → Accessibility → Live Captions
Toggle Live Captions on
A caption box appears at the top of the screen whenever audio is detected
Tap the microphone icon to caption ambient sound, or the speaker icon to caption audio playing on the device

Processing: fully on-device (no cloud required)
Languages: optimized for English; other languages vary by iOS version and region
Saving: captions are displayed only — take a screenshot to save text

Live Captions is particularly useful for English-language meetings and video calls. If you need to follow a fast-paced English conversation in real time, it's the closest iOS gets to a live captioner.

Tips to improve accuracy

The recording environment has a bigger impact on transcription accuracy than the app you choose. A few simple habits make a noticeable difference.

Keep the microphone close to the speaker — within about 12 inches is ideal; don't leave the iPhone in the middle of a large conference table
Minimize background noise — air conditioning, coffee shop ambient noise, and music all degrade accuracy significantly
Speak at a steady pace — rushing through sentences causes more misrecognitions than any other factor
Use punctuation voice commands — "period," "comma," "question mark," and "new paragraph" work in Dictation
Add custom vocabulary: adding proper nouns and industry jargon in Settings → General → Keyboard → Text Replacement can help the keyboard recognize them

Getting these basics right means you can often use the free built-in tools for serious work without needing to upgrade.

Free third-party apps

When the built-in features aren't quite enough — you need longer sessions, cloud backup, or better multilingual support — these free and freemium apps fill the gap without an upfront cost.

Google Docs voice typing

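Google Docs has a voice typing feature (Tools → Voice typing) that lets you dictate directly into a document with Google's speech recognition engine behind it. That menu tool works only in the Chrome browser on a computer; on iPhone, you dictate into the Google Docs app with the keyboard's mic icon instead. Either way, the text saves to Google Drive instantly, which makes it well suited to long-form dictation.

Cost: free (Google account required)
Platforms: voice typing tool in Chrome on a computer; keyboard dictation in the Google Docs app on iPhone / iPad / Android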
Highlights: auto-saves to Google Drive, accessible from any device, automatic punctuation
Best for: drafting documents, reports, or blog posts by voice

Because everything saves to Google Drive, you can start a document on your iPhone and continue editing it on a laptop without missing a beat. For voice-driven document creation, this is the most seamless free option.

Otter.ai (free plan)

Otter.ai is the leading name in English transcription, and its free plan is genuinely useful. It records and transcribes in real time, stores the transcript with timestamps, and lets you search across all your past recordings.

Cost: free (300 monthly transcription minutes, 30-minute limit per conversation)
Platforms: iPhone / Android / Web / Chrome extension
Highlights: real-time transcription, speaker identification, keyword search across transcripts
Languages: English primary; limited support for other languages

The free tier is enough for several short meetings a week. If your meetings are in English and you want the most polished free transcription experience, Otter.ai is the place to start.

Notta (free plan)

Notta is a transcription service with strong multilingual support — particularly good at both English and Japanese. The free plan gives you 120 minutes per month and supports importing existing audio files, which the built-in iOS features don't handle.

Cost: free (120 minutes / month); Pro plans from $14.99 / month
Platforms: iPhone / Android / Web / Chrome extension
Highlights: real-time transcription, audio file import (MP3, MP4, WAV, M4A), text export (TXT, DOCX, PDF)
Speaker diarization: available on Pro plan

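120 minutes a month works out to roughly 28 minutes a week: enough for a couple of brief check-ins or the occasional imported file, but not for regular half-hour meetings. Try the free tier to gauge the accuracy for your accent and subject matter before committing to a subscription.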
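To check whether a free tier fits your routine, the arithmetic is simple: meetings per week × minutes per meeting × about 52/12 weeks per month, compared against the plan's monthly allowance. The Python sketch below does that with the free-plan figures quoted in this guide (Otter.ai: 300 minutes a month with a 30-minute limit per conversation; Notta: 120 minutes a month). The `FreeTier` class and `check_plan` function are hypothetical helpers for illustration, not part of either app, and plan limits change, so substitute current numbers.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class FreeTier:
    name: str
    monthly_minutes: int
    per_recording_cap: Optional[int] = None  # None = no per-recording limit


def check_plan(plan: FreeTier, meetings_per_week: int, meeting_minutes: int) -> dict:
    """Estimate whether a weekly meeting routine fits a plan's monthly allowance."""
    # 52 weeks / 12 months is about 4.33 weeks in an average month.
    monthly_use = meetings_per_week * meeting_minutes * 52 / 12
    # Meetings longer than the per-recording cap must be split across recordings.
    if plan.per_recording_cap is None:
        recordings_per_meeting = 1
    else:
        recordings_per_meeting = math.ceil(meeting_minutes / plan.per_recording_cap)
    return {
        "plan": plan.name,
        "monthly_minutes_used": round(monthly_use, 1),
        "fits": monthly_use <= plan.monthly_minutes,
        "recordings_per_meeting": recordings_per_meeting,
    }


# Free-plan limits as quoted in this guide; check each service for current numbers.
otter_free = FreeTier("Otter.ai free", monthly_minutes=300, per_recording_cap=30)
notta_free = FreeTier("Notta free", monthly_minutes=120)

print(check_plan(otter_free, meetings_per_week=2, meeting_minutes=30))
print(check_plan(notta_free, meetings_per_week=2, meeting_minutes=15))
print(check_plan(otter_free, meetings_per_week=1, meeting_minutes=45))
```

With these figures, two 15-minute meetings a week come to about 130 minutes a month, just over Notta's 120-minute allowance, while two 30-minute meetings a week (about 260 minutes) stay within Otter.ai's 300. A single 45-minute meeting on Otter.ai's free plan would need two recordings because of the 30-minute cap.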

Speechy Lite

Speechy Lite keeps things simple: open the app, tap record, and your words appear on screen as you speak. It's a clean, focused real-time transcription app that's a good entry point if you've never used a dedicated transcription tool.

Cost: free (with ads); paid version removes limits
Platforms: iPhone / iPad
Highlights: real-time transcription, save as note, Siri Shortcuts support, minimal interface
Languages: matches your iOS keyboard language settings

There's almost no learning curve — you tap once to start and once to stop. If you want to try transcription without any account setup or configuration, Speechy Lite is the lowest-friction starting point.

Premium apps compared

If transcription is a daily part of your workflow — meetings, interviews, podcasts — the free tier limits will eventually become a friction point. These paid services remove the caps and add features that serious users depend on.

Quick comparison

Service	Price (approx.)	English accuracy	Speaker ID	AI summary	Zoom/Meet bot
Otter.ai Pro	$16.99 / month	◎	◎	◎	◎
Notta Pro	$14.99 / month	◎	◎	◎	◎
Descript	$15 / month	◎	◎	○	△
Rev.com	$1.50 / minute	◎◎ (human)	◎	△	✕

◎ = strong, ○ = good, △ = limited, ✕ = not offered. Prices change; check each service's website for current rates before subscribing.

Otter.ai Pro

Otter.ai Pro is the go-to choice for English-language meetings. Its Zoom and Google Meet bot joins calls automatically, transcribes in real time, and delivers a summary with action items after the call ends — all without you lifting a finger once it's set up.

Cost: $16.99 / month (check otter.ai for current pricing)
Transcription: a much larger monthly minute allowance and longer per-conversation limits than the free plan (check otter.ai for current caps)
Speaker diarization: yes — names speakers based on meeting participants
AI features: meeting summaries, action item extraction, follow-up suggestions
Integrations: Zoom, Google Meet, Microsoft Teams (bot joins automatically)
Languages: optimized for English; other languages limited

The shared live transcript — where everyone in the meeting can view and highlight text in real time — is a genuine productivity feature. For teams that run on English video calls, Otter.ai Pro is the easiest automation to put in place.

Web app: otter.ai

Notta Pro

Notta Pro differentiates itself with strong multilingual support: it handles English, Japanese, Spanish, and over 50 other languages. If your work involves content in more than one language, that breadth is a meaningful advantage over English-first competitors.

Cost: $14.99 / month (annual plans available; check notta.ai for current pricing)
Transcription: far more monthly minutes than the free plan (check notta.ai for current caps)
Speaker diarization: up to 10 speakers
AI features: meeting summary, action item extraction, AI chat with transcripts
Integrations: Zoom, Google Meet, Microsoft Teams, Webex
File import: MP3, MP4, WAV, M4A, and more
Export: TXT, DOCX, PDF, SRT

The ability to import existing audio files and process them in bulk makes Notta especially useful for journalists or researchers who record interviews separately. For anyone working across multiple languages or needing to batch-process archived recordings, Notta Pro is the strongest all-around choice.

Web and desktop app: notta.ai
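Notta's SRT export produces a standard subtitle file: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the caption text, separated by blank lines. If you ever need to turn timestamped transcript segments into captions yourself, a minimal Python sketch looks like this; the `Segment` class and the sample timings are hypothetical stand-ins for whatever your transcription tool exports.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Build an SRT document: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)


segments = [
    Segment(0.0, 2.5, "Thanks for joining the interview."),
    Segment(2.5, 6.0, "Happy to be here."),
]
print(to_srt(segments))
```

Working in whole milliseconds and splitting with `divmod` avoids the rounding drift you can get from formatting fractional seconds directly.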

Descript

Descript is a different kind of transcription tool — it's primarily a podcast and video editor where the transcript is the editing interface. Delete a sentence from the text and the audio or video clip disappears with it. For content creators, this workflow is transformative.

Cost: around $15 / month (check descript.com for current plans)
Transcription: automatic with upload; strong English accuracy
Speaker diarization: yes
AI features: filler word removal ("um," "uh"), AI voice overdub, text-based editing
File support: audio (MP3, WAV, M4A) and video (MP4, MOV)
Best for: podcast production, video interviews, YouTube content

Descript doesn't have a real-time meeting transcription mode — it's designed for recorded files. If you produce audio or video content and want to edit by editing text rather than scrubbing a timeline, Descript is in a category of its own.

Web app: descript.com

Rev.com

Rev.com offers both AI-generated transcription (fast and cheap) and human-verified transcription (slower, more expensive, and significantly more accurate). The human option is particularly valuable when accuracy is non-negotiable — legal depositions, medical notes, archival research.

Cost: AI transcription around $0.25 / minute; human transcription around $1.50 / minute (check rev.com for current rates)
Turnaround: AI is near-instant; human transcription typically within a few hours
Accuracy: Rev advertises 99%+ accuracy for human transcription
Speaker diarization: yes, on both tiers
Best for: legal, medical, academic, archival use cases where errors are costly

Rev doesn't have a recurring subscription model for individual users — you pay per minute of audio, which makes it economical for occasional high-stakes work. When the transcription absolutely has to be right, human-verified Rev is the safest choice.
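Whether per-minute pricing beats a subscription comes down to division: the monthly fee divided by the per-minute rate gives the number of minutes at which the two cost the same. The sketch below uses the approximate prices quoted in this guide ($0.25 and $1.50 per minute for Rev's AI and human tiers, and $16.99 a month as an example subscription in the Otter.ai Pro range); the function names are illustrative, and the comparison ignores the accuracy gap between human and AI transcription.

```python
import math

# Approximate prices quoted in this guide; check each site for current rates.
REV_AI_PER_MIN = 0.25
REV_HUMAN_PER_MIN = 1.50
SUBSCRIPTION_PER_MONTH = 16.99  # example flat monthly fee


def pay_per_minute_cost(minutes: float, rate: float) -> float:
    """Cost in dollars of transcribing `minutes` of audio at a per-minute rate."""
    return round(minutes * rate, 2)


def break_even_minutes(monthly_fee: float, rate: float) -> int:
    """Smallest whole number of minutes per month at which pay-per-minute
    costs at least as much as the flat monthly fee."""
    return math.ceil(monthly_fee / rate)


# A single 60-minute interview
print(pay_per_minute_cost(60, REV_AI_PER_MIN))     # 15.0
print(pay_per_minute_cost(60, REV_HUMAN_PER_MIN))  # 90.0

print(break_even_minutes(SUBSCRIPTION_PER_MONTH, REV_AI_PER_MIN))     # 68
print(break_even_minutes(SUBSCRIPTION_PER_MONTH, REV_HUMAN_PER_MIN))  # 12
```

At these rates, one hour-long interview costs about $15 with AI transcription and $90 with human transcription, and if you need fewer than roughly 68 minutes of AI transcription a month, paying per minute stays cheaper than a ~$17 subscription.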

Web: rev.com

Japanese-language specialists

Two additional services are worth knowing about if you're working with Japanese-language audio content.

Rimo Voice is a transcription service built specifically for Japanese speech. It's a pay-per-use service (around ¥10 per 30 seconds) with speaker diarization and AI summary features. Because it processes audio on servers in Japan, it's a practical choice for organizations with data-sovereignty requirements. rimo.app

CLOVA Note by LINE is a free transcription app optimized for Japanese that includes speaker diarization for up to eight speakers at no cost — which is unusual at the free tier. It's primarily popular in Japan and works well for Japanese meetings. If you need to transcribe Japanese content without a paid subscription, CLOVA Note is worth a look. clovanote.line.me

Picks by use case

Different workflows call for different tools. Here's a quick-reference guide to the best option for each situation.

Use case	Recommended
Quick voice memo or short dictation	iOS Keyboard Dictation (built-in)
Private meeting recording and transcription	Voice Memos (iOS 18+)
Real-time English meeting transcription (free)	Otter.ai free plan
Multilingual transcription or audio file import (free)	Notta free plan
English online meetings with Zoom/Meet automation	Otter.ai Pro
Multilingual meetings or batch file processing	Notta Pro
Podcast or video production	Descript
Legal, medical, or archival accuracy	Rev.com (human)
Academic lectures (real-time)	Live Captions (built-in) or Otter.ai
English pronunciation practice	iOS Keyboard Dictation
Japanese audio content	Notta Pro or Rimo Voice

For most people starting out, Voice Memos (iOS 18+) for recorded audio and Otter.ai's free plan for live meetings is the best zero-cost combination. Add a paid plan only when the free-tier minutes become a consistent bottleneck.

5 tips to improve transcription accuracy

No matter which app you use, the quality of the recording is the single biggest factor in transcription accuracy. These five habits will make a measurable difference.

Get the microphone close to the speaker
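iPhone's microphone has a useful range of about 12 to 18 inches. In a small multi-person meeting, place the phone near the center of the table, roughly equidistant from everyone, rather than tucked away in a pocket or bag; on a large conference table, people seated beyond that range will be picked up poorly. For one-on-one interviews, put the phone between you and the other person.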
Eliminate background noise
Air conditioning hum, coffee shop ambient noise, and background music all compete with speech. Even moving from a busy open-plan office to a conference room can dramatically improve results. When recording outdoors, use a windscreen or lapel mic attachment.
Use speaker diarization when available
Apps that identify individual speakers — Voice Memos (iOS 18+), Otter.ai, Notta Pro — produce transcripts that are much easier to turn into meeting minutes. Enable diarization before recording whenever it's available.
Register custom vocabulary
Industry jargon, product names, and proper nouns are the most common transcription errors. iOS's Text Replacement feature (Settings → General → Keyboard → Text Replacement) lets you add these terms to your keyboard's vocabulary, which can help Dictation get them right.
Use AI post-processing to clean up the raw text
Spoken language is full of filler words ("um," "like," "you know"), false starts, and incomplete sentences. Services like Otter.ai and Notta Pro offer AI summaries that extract the key points and action items automatically, turning raw transcript text into usable meeting notes.
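Speaker diarization and post-processing work together: once a transcript is split by speaker, even simple cleanup makes it far easier to turn into minutes. The Python sketch below assumes a plain-text export with one `Speaker N: text` line per utterance (a hypothetical format; real apps label and export speakers differently), removes a few filler words, and merges consecutive lines from the same speaker into one turn. It is a rule-based illustration, not the AI summarization that Otter.ai or Notta perform.

```python
import re
from typing import List, Tuple

# Filler words to drop. "like" is left out on purpose: it often carries meaning.
FILLERS = re.compile(r"\b(?:um|uh|you know)\b[,.]?\s*", re.IGNORECASE)
# Assumed export format: "Speaker 1: some text"
SPEAKER_LINE = re.compile(r"^(Speaker \d+):\s*(.*)$")


def clean_transcript(raw: str) -> List[Tuple[str, str]]:
    """Return (speaker, text) turns with fillers removed and consecutive
    lines from the same speaker merged."""
    turns: List[Tuple[str, str]] = []
    for line in raw.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed format
        speaker, text = match.groups()
        text = FILLERS.sub("", text).strip()
        if not text:
            continue  # the line was nothing but filler
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


sample = """Speaker 1: Um, let's start with the budget.
Speaker 1: Uh, we're over by ten percent.
Speaker 2: You know, that matches my numbers."""

for speaker, text in clean_transcript(sample):
    print(f"{speaker}: {text}")
```

Rules like these leave rough edges (here "we're" stays lowercase after its filler is removed), which is exactly the kind of tidying that AI post-processing handles better.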

Combining a good recording environment with AI post-processing keeps manual editing to a minimum, and a clean recording alone is often enough for the free built-in tools to produce usable meeting notes.

Common questions

Can I transcribe offline?

Yes — for the built-in iOS features. Keyboard Dictation in iOS 16 and later uses on-device processing and works without an internet connection. Voice Memos transcription in iOS 18+ also runs entirely on your iPhone using Apple Silicon, so it works offline and keeps your audio private.

Third-party apps like Otter.ai, Notta, and Descript require an internet connection because they use cloud-based AI models. If you're in an area with poor connectivity — on a plane, in a basement, or at an outdoor venue — stick with the built-in iOS tools.

How do I switch between English and other languages?

For iOS Keyboard Dictation, the language follows your active keyboard. Switch to a different language keyboard in Settings → General → Keyboard → Keyboards, then tap the globe icon on the keyboard to toggle between languages. The microphone icon will transcribe in whichever language the keyboard is set to.

For third-party apps, language selection is usually done inside the app before you start recording. Most apps don't support mid-recording language switching — choose the primary language of your meeting beforehand. For mixed-language meetings, Notta handles multiple languages better than most competitors.

Can it distinguish between speakers?

Yes, but not all options support it. Here's a quick breakdown:

Voice Memos (iOS 18+): yes — labels speakers as "Speaker 1," "Speaker 2," etc.
Otter.ai (free and Pro): yes — can even name speakers based on calendar invites
Notta Pro: yes — up to 10 speakers
Descript: yes — speaker labels are editable in the transcript
Rev.com (human): yes — with speaker identification in the formatted transcript
iOS Keyboard Dictation / Speechy Lite: no speaker separation

For any multi-person meeting where knowing who said what matters, choose a tool with speaker diarization enabled before you start recording.

Can I transcribe existing audio files (MP3/M4A)?

Yes, with a third-party app. Notta, Descript, and Rev.com all accept uploaded audio files (MP3, MP4, WAV, M4A, and more) and transcribe them. This is useful when you have a backlog of recordings from voice recorders, Zoom, or other sources.

The iOS built-in transcription doesn't support external file import — Voice Memos can only transcribe audio you've recorded in the Voice Memos app itself. If you need to transcribe an existing file, Notta's free tier (120 minutes / month) is a good starting point.

What about privacy and security?

The iOS built-in features — Keyboard Dictation (iOS 16+) and Voice Memos transcription (iOS 18+) — process everything on your device. Your audio never leaves the iPhone, making these the safest choice for confidential conversations, medical discussions, or legally sensitive content.

Third-party apps send your audio to their cloud servers for processing. Most reputable services (Otter.ai, Notta, Descript) publish data retention policies and comply with GDPR and similar frameworks. Rev.com, which also handles legal and medical transcription, has enterprise-grade data security options. Always review a service's privacy policy before recording sensitive meetings.

Summary

iPhone's built-in transcription is more capable than most people realize. iOS 18's Voice Memos can automatically transcribe and separate speakers on-device for free — no account, no subscription, no audio uploaded anywhere. That alone covers a large share of everyday transcription needs, and it's the right place to start.

When the built-in tools hit their limits, the upgrade path is clear: Otter.ai's free plan for English meetings, Notta for multilingual content or file import, Descript for podcast and video production, and Rev.com's human transcription when accuracy is non-negotiable. For many people, Voice Memos plus Otter.ai's free tier covers everyday needs without spending a cent. From there, move to a paid plan only when the monthly minute cap becomes a genuine bottleneck in your workflow.