This page contains promotions.

How to Fix Character Encoding Issues (Mojibake) on Mac | Text, ZIP, PDF, and Email Diagnosis


Ever opened a text file from a Windows colleague and seen nothing but garbled symbols? Or extracted a ZIP archive only to find the filenames completely broken? Mojibake (文字化け) is the term for garbled text caused by character encoding mismatches, and on a Mac nearly every case traces back to the same root cause: UTF-8 vs. Shift_JIS.

macOS has used UTF-8 as its standard encoding since OS X launched in 2001. Japanese Windows historically used Shift_JIS (CP932) instead. Newer Windows 10 and 11 releases offer an optional UTF-8 setting, but Shift_JIS files are still widespread in business settings. When the two sides interpret the same bytes with different rules, mojibake happens.

This article covers macOS Sonoma / Sequoia and gives fixes that work on both Apple Silicon (M1/M2/M3/M4) and Intel Macs, organized by symptom: text files, ZIPs, PDFs, email, Terminal, Office for Mac, websites, and missing fonts. See also: Mac Troubleshooting Guide | Fixes Organized by Symptom.

Table of Contents

  1. How Mojibake Works: Encoding Mismatches Explained
    1. macOS Defaults to UTF-8; Windows Long Used Shift_JIS
    2. Quick-Reference Table of Encoding Names
    3. Tofu (□) and Mojibake Are Different Problems
  2. Diagnose First: Identify Your Symptom
    1. Symptom Quick-Reference Table
  3. Garbled Text Files (.txt / .csv)
    1. Open with a Specific Encoding in CotEditor
    2. Visual Studio Code's "Reopen with Encoding"
    3. Using the mi Text Editor
  4. Garbled Filenames After Extracting a ZIP
    1. Auto-Detect with The Unarchiver
    2. Cross-Platform Compatibility with MacWinZipper
    3. Extract with CP932 Specified in Terminal
  5. Garbled Text When Copying from a PDF
    1. Re-Extract Text in Preview.app
    2. Re-Recognize with Adobe Acrobat Reader
  6. Fixing Garbled Email
    1. Manually Switch Encoding in Apple Mail
    2. Garbled Email in Outlook for Mac
  7. Garbled Text in Terminal / Command Line
    1. Set the LANG Environment Variable to UTF-8
    2. Check the Locale on an SSH Remote Host
    3. Encoding Settings for tmux / screen
  8. Issues Specific to Office for Mac (Excel / Word)
    1. CSV Opens Garbled in Excel for Mac
    2. Old .doc Files Open Garbled in Word
  9. Garbled Text on Specific Websites
    1. Manually Set the Text Encoding in Safari
    2. Add the "Charset" Extension to Chrome
  10. Tofu (□) Caused by Missing Fonts
    1. Install Additional Chinese or Korean Fonts
    2. Fix It with Noto Sans CJK
  11. Summary: Troubleshooting Checklist in Order

How Mojibake Works: Encoding Mismatches Explained

Mojibake occurs when the program that wrote a file and the program reading it use different encoding rules — that is, different mappings between characters and numeric values. A file saved in Shift_JIS but read as UTF-8 produces completely wrong characters because the numeric values mean entirely different things under each encoding.
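To see what "different numeric values" means concretely, here is a minimal shell sketch that prints the bytes the same word, 日本語, becomes in UTF-8 and in CP932. It assumes only the iconv, od and awk commands that ship with macOS; show_bytes is a made-up helper name, not a built-in command.

```shell
# show_bytes TEXT ENCODING  (a hypothetical helper, not a built-in command)
# Prints the bytes TEXT becomes under ENCODING as space-separated hex.
# Assumes TEXT arrives as UTF-8, which is what a macOS shell passes along.
show_bytes() {
  printf '%s' "$1" |
    iconv -f UTF-8 -t "$2" |
    od -An -v -tx1 |
    awk '{ for (i = 1; i <= NF; i++) printf "%s%s", (n++ ? " " : ""), $i }
         END { print "" }'
}

show_bytes '日本語' UTF-8    # e6 97 a5 e6 9c ac e8 aa 9e
show_bytes '日本語' CP932    # 93 fa 96 7b 8c ea
```

The CP932 bytes start with 0x93, which can never begin a character in UTF-8, so a UTF-8 reader cannot decode them. The reverse mistake, UTF-8 bytes read as Shift_JIS, typically yields strings of unrelated kanji instead.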

macOS Defaults to UTF-8; Windows Long Used Shift_JIS

macOS has used UTF-8 across virtually the entire system since OS X launched in 2001. Japanese Windows, by contrast, historically defaulted to Shift_JIS (CP932) as its system encoding. Newer Windows 10 and 11 releases offer an optional (beta) UTF-8 system setting, but Shift_JIS files remain common in business environments, which is why cross-platform file exchange is still a frequent source of mojibake on Mac.

macOS TextEdit and Preview.app do attempt automatic encoding detection, but the accuracy is limited. To reliably open files with a non-UTF-8 encoding, you need an app that lets you specify the encoding explicitly.

Quick-Reference Table of Encoding Names

Encoding Name | Aliases / Identifiers | Primary Use
--- | --- | ---
UTF-8 | UTF8 | Standard for macOS and the web. Covers all Unicode characters.
Shift_JIS | SJIS / MS_Kanji | Traditional Japanese encoding; the basis of CP932.
EUC-JP | EUC_JP / eucJP | Japanese encoding used on Linux and legacy UNIX systems.
ISO-2022-JP | JIS / JIS7 | 7-bit Japanese encoding traditionally used for email (RFC 1468).
CP932 | Windows-31J / MS932 | Microsoft's extension of Shift_JIS and the default on Japanese Windows. Widely used for ZIP filenames.

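CP932 is technically Microsoft's extension of Shift_JIS: it adds characters such as circled numbers (①) and the NEC and IBM extended kanji, and it is what Japanese Windows actually uses. ZIP files created on Japanese Windows usually store their filenames in CP932.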
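The difference matters because characters like ① exist only in the CP932 extension. This sketch assumes the iconv that ships with macOS accepts CP932 as an encoding name; to_hex is a hypothetical helper. It encodes ① as CP932 and then decodes the two bytes back.

```shell
# to_hex: print stdin as space-separated hex bytes (hypothetical helper name).
to_hex() {
  od -An -v -tx1 |
    awk '{ for (i = 1; i <= NF; i++) printf "%s%s", (n++ ? " " : ""), $i }
         END { print "" }'
}

# ① (CIRCLED DIGIT ONE) is one of the NEC extension characters that CP932 adds
# on top of JIS X 0208-based Shift_JIS. In CP932 it is the byte pair 0x87 0x40.
printf '①' | iconv -f UTF-8 -t CP932 | to_hex          # 87 40

# Decoding the same two bytes (written here in octal) explicitly as CP932
# gives the character back.
printf '\207\100' | iconv -f CP932 -t UTF-8; echo       # ①
```

A converter set to strict Shift_JIS may reject characters like these, so when a file from Windows fails to convert as Shift_JIS, retrying with CP932 (Windows-31J) is usually the better choice.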

Tofu (□) and Mojibake Are Different Problems

When you see "□" on screen — sometimes called tofu (square boxes for missing fonts) — that is not an encoding problem. It means the encoding was interpreted correctly, but the font in use doesn't contain a glyph for that character. This is especially common with Traditional Chinese, Simplified Chinese, Korean, rare kanji, emoji, and special symbols. The fix is installing the missing font, not converting encodings. See the "Tofu (□) Caused by Missing Fonts" section later in this article.

Diagnose First: Identify Your Symptom

When mojibake occurs, the first step is to identify which app and which file type are involved. The cause and fix differ significantly by scenario — use the quick-reference table below to find your situation, then jump to the relevant section.

Symptom Quick-Reference Table

Symptom | Likely Cause | Go to Section
--- | --- | ---
Opened a .txt or .csv and it's full of symbols | A Shift_JIS or EUC-JP file opened as UTF-8 | Garbled Text Files (.txt / .csv)
Extracted a ZIP and filenames are broken | macOS Archive Utility mishandled CP932 filenames | Garbled Filenames After Extracting a ZIP
Copied text from a PDF and it's garbled | Embedded fonts lack a correct Unicode mapping | Garbled Text When Copying from a PDF
Email body or subject is garbled | Auto-detection failed for ISO-2022-JP or Shift_JIS | Fixing Garbled Email
Japanese text is broken in Terminal | LANG / LC_ALL not set to a UTF-8 locale | Garbled Text in Terminal / Command Line
CSV opened in Excel comes out garbled | Excel for Mac often reads BOM-less CSVs as Shift_JIS | Issues Specific to Office for Mac (Excel / Word)
Only certain websites show garbled text | Incorrect or missing <meta charset> in the page HTML | Garbled Text on Specific Websites
Squares (□) appear on screen | Missing font, not an encoding issue | Tofu (□) Caused by Missing Fonts

Garbled Text Files (.txt / .csv)

macOS TextEdit has weak auto-detection for Shift_JIS and is one of the most common sources of mojibake on Mac. To reliably handle Shift_JIS or EUC-JP files, switch to an editor that lets you specify the encoding explicitly.
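Before picking an encoding in an editor, you can narrow it down from Terminal. This is a rough heuristic sketch, not a real detector: it assumes the file is Japanese text and simply tries strict conversions with iconv in a fixed order. The function name guess_japanese_encoding is hypothetical.

```shell
# guess_japanese_encoding FILE  (hypothetical helper, heuristic only)
# Tries strict conversions in a fixed order and prints the first encoding that
# converts the whole file without an error. iconv exits non-zero on any byte
# sequence that is invalid in the source encoding, which is what this relies on.
# Order: UTF-8 has the strictest rules; EUC-JP goes before CP932 because CP932
# accepts far more byte combinations, so many EUC-JP files would also "pass" it.
guess_japanese_encoding() {
  for enc in UTF-8 EUC-JP CP932; do
    if iconv -f "$enc" -t UTF-8 "$1" >/dev/null 2>&1; then
      echo "$enc"
      return 0
    fi
  done
  echo "unknown"
  return 1
}
```

ISO-2022-JP is left out on purpose: it uses only 7-bit bytes, so it always passes the UTF-8 check and has to be recognized by its escape sequences instead. Short files can also be valid in more than one encoding, so confirm any guess by reopening the file with that encoding.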

Open with a Specific Encoding in CotEditor

CotEditor is a free, Japanese-friendly text editor available on the Mac App Store. It lets you open files with an explicitly chosen encoding and save them in a different one.

  1. Search for "CotEditor" in the App Store and install it.
  2. Drag and drop the garbled file onto CotEditor to open it.
  3. The current encoding is shown in the status bar at the bottom of the window.
  4. Click the encoding name → select "Japanese (Shift JIS)" or "Japanese (EUC-JP)" from the list.
  5. Choose "Reopen with Encoding" and the text should display correctly.
  6. To save as UTF-8, go to File → Save As and set the encoding to "Unicode (UTF-8)".

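For CSV files, convert to UTF-8 first, then open the file in Numbers and the mojibake will be gone. Excel for Mac additionally needs the file saved as UTF-8 with BOM (see Issues Specific to Office for Mac).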
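If you have several files to convert, the same Shift_JIS to UTF-8 conversion can be scripted with iconv. This is a minimal sketch that assumes the source is CP932 unless told otherwise; to_utf8 is a hypothetical name.

```shell
# to_utf8 SRC DEST [FROM]  (hypothetical helper)
# Converts SRC from FROM (default CP932, the Windows flavour of Shift_JIS) to
# UTF-8 and writes DEST. Output goes to a temporary file first, so a failed
# conversion (for example, a wrong source encoding) leaves no half-written DEST.
to_utf8() {
  src=$1; dest=$2; from=${3:-CP932}
  part="$dest.part.$$"
  if iconv -f "$from" -t UTF-8 "$src" > "$part"; then
    mv "$part" "$dest"
  else
    rm -f "$part"
    return 1
  fi
}
```

For example, to_utf8 data.csv data-utf8.csv (hypothetical file names) converts a CSV from a Windows colleague; pass EUC-JP as the third argument for files from Linux systems. Because iconv stops on invalid input, a wrong guess produces an error instead of a silently garbled file.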

Visual Studio Code's "Reopen with Encoding"

Visual Studio Code (free) has excellent encoding conversion support. You don't need to be a developer to use it for this purpose.

  1. Open the file in VS Code (garbled state is fine).
  2. Click the encoding name shown in the bottom-right status bar (e.g., UTF-8).
  3. A command palette prompt appears — choose "Reopen with Encoding".
  4. Select "Japanese (Shift JIS)" or "Japanese (EUC-JP)" and the text should display correctly.
  5. To save in UTF-8, click the same status bar item → choose "Save with Encoding" → "UTF-8".

Using the mi Text Editor

mi is a long-standing Japanese text editor for Mac. It supports explicitly choosing an encoding via File → Open with Encoding, and can easily convert between Shift_JIS, EUC-JP, and UTF-8. It's a solid alternative if you prefer not to use CotEditor or VS Code.

Garbled Filenames After Extracting a ZIP

macOS's built-in Archive Utility (the default double-click ZIP extractor) has a known limitation: it cannot correctly handle filenames encoded in CP932 (Shift_JIS). ZIPs created on Japanese Windows with Japanese filenames will almost certainly produce garbled names when extracted with Archive Utility.

Auto-Detect with The Unarchiver

The Unarchiver is a free archive app available on the Mac App Store. It auto-detects CP932 and many other encodings, extracting filenames correctly.

  1. Search for "The Unarchiver" in the App Store and install it.
  2. Right-click the ZIP file → "Open With" → "The Unarchiver".
  3. Choose a destination folder and extract.
  4. To make The Unarchiver the default for all ZIPs, right-click any ZIP → "Get Info" → "Open With" → select "The Unarchiver" → click "Change All".

Cross-Platform Compatibility with MacWinZipper

If you need to create ZIPs that open without mojibake on both Mac and Windows, MacWinZipper is the right tool. It also handles the reverse problem — ZIPs you create on Mac showing garbled filenames on Windows. If you regularly send files to Windows users, making MacWinZipper your default ZIP creator will save a lot of headaches.

Extract with CP932 Specified in Terminal

If you're comfortable with Terminal, the unzip command accepts a -O cp932 option to correctly extract ZIPs with CP932 filenames.

  1. Open Terminal (Applications → Utilities → Terminal).
  2. Run the following command (replace /path/to/archive.zip with your actual file path):

unzip -O cp932 /path/to/archive.zip -d ~/Desktop/output/

Note: the -O option is not supported in all versions of the macOS system unzip. If it doesn't work, install a newer version via Homebrew or simply use The Unarchiver — it's easier.

Garbled Text When Copying from a PDF

PDF mojibake stems from how fonts are embedded and whether they map correctly to Unicode. Older PDFs and PDFs from print shops often have a mismatch between the visible glyph and the character code that gets copied to the clipboard.

Re-Extract Text in Preview.app

Preview.app in macOS Sonoma / Sequoia includes a machine-learning-based OCR (Live Text) feature. When a font-embedding issue causes copy-paste mojibake, Live Text can sometimes recover the correct text.

  1. Open the PDF in Preview.app.
  2. On the garbled page, select the text cursor tool (the cursor icon in the toolbar).
  3. Select the text you need and copy it.
  4. If it's still garbled, try Tools → Select Text and let Live Text recognize the content as machine-readable text.

For scanned PDFs with no underlying text layer, OCR is required. If Preview.app's Live Text doesn't produce good results, try Adobe Acrobat.

Re-Recognize with Adobe Acrobat Reader

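Adobe Acrobat can re-run text recognition (OCR) through its Edit Text & Images tool (Edit PDF in newer versions), and its OCR accuracy is often higher than Preview.app's Live Text. OCR is part of the paid Acrobat Pro / Standard; the free Acrobat Reader shows the tool but asks you to upgrade.

  1. Open the PDF in Adobe Acrobat.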
  2. Click Tools → Edit Text & Images in the menu bar.
  3. If Acrobat prompts you to run OCR, allow it.
  4. After re-recognition, select and copy the text — the mojibake may be resolved.

If mojibake persists, the fundamental fix is to ask whoever created the PDF to re-embed the fonts with proper Unicode mapping.

Fixing Garbled Email

Japanese email mojibake typically occurs when a message sent in ISO-2022-JP (JIS code) or Shift_JIS is misidentified by the receiving mail client. Some older mail clients (particularly legacy Outlook versions) also send messages without a proper encoding declaration.
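Mail headers label each part's encoding; a subject line, for instance, may arrive as an encoded word like =?ISO-2022-JP?B?GyRCRnxLXDhsGyhC?=. If you save a message as raw source, a sketch like the one below lets you decode such a word by hand and compare what the label claims with what the bytes contain. It assumes only the base64 and iconv commands that ship with macOS; decode_b_word is a hypothetical name.

```shell
# decode_b_word PAYLOAD CHARSET  (hypothetical helper)
# Decodes the base64 payload of a MIME "B" encoded word, i.e. the part between
# "?B?" and "?=" in =?ISO-2022-JP?B?GyRCRnxLXDhsGyhC?=, then converts it from
# CHARSET to UTF-8 and ends the line.
decode_b_word() {
  printf '%s' "$1" | base64 --decode | iconv -f "$2" -t UTF-8 && echo
}

decode_b_word 'GyRCRnxLXDhsGyhC' ISO-2022-JP    # 日本語
```

The decoded bytes start with ESC $ B and end with ESC ( B, the escape sequences that switch ISO-2022-JP into and out of JIS X 0208. A client that ignores them shows the same word as $BF|K\8l(B, which is a quick way to recognize an ISO-2022-JP message being read as plain ASCII.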

Manually Switch Encoding in Apple Mail

Apple Mail (Mail.app) handles most cases automatically, but you can manually override the encoding when auto-detection fails.

  1. Select and open the garbled message.
  2. In the menu bar, go to Message → Text Encoding.
  3. Try "Japanese (ISO-2022-JP)", "Japanese (Shift JIS)", and "Japanese (EUC)" in turn.
  4. Select the one that displays the message correctly.

Gmail in a web browser has better auto-detection than Mail.app and is less prone to this problem. If Mail.app can't decode the message, checking the same email in Gmail's browser interface is worth trying.

Garbled Email in Outlook for Mac

Outlook for Mac has had Japanese encoding issues in certain versions. If received messages are garbled, try these steps:

  • Check for Outlook updates and install the latest version (Help → Check for Updates).
  • If only messages from a specific sender are garbled, ask the sender to resend with the encoding explicitly set to UTF-8.
  • Having the sender resend from Windows Outlook sometimes resolves the issue.
  • As a last resort, open the same account in Gmail's browser interface to view the message.

Garbled Text in Terminal / Command Line

When Japanese characters appear as "?" or "□" in Terminal (Terminal.app or iTerm2), or character counts seem off, the cause is almost always a locale (language and encoding) configuration problem.

Set the LANG Environment Variable to UTF-8

In macOS's default shell (zsh), Japanese display and input can break if the environment variables LANG and LC_ALL are not set to UTF-8.

  1. Open Terminal and check the current locale:

locale

  2. If LANG is not set to a UTF-8 locale such as ja_JP.UTF-8 (en_US.UTF-8 works too), edit ~/.zshrc and add the following lines:

export LANG=ja_JP.UTF-8
export LC_ALL=ja_JP.UTF-8

  3. Restart Terminal or run source ~/.zshrc to apply the changes.

Also check Terminal.app's own encoding setting: go to Settings → Profiles → Advanced → Character Encoding and make sure it is set to "Unicode (UTF-8)".
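If you set up several Macs or user accounts, the ~/.zshrc edit can be scripted so it is safe to run more than once. A minimal sketch; add_utf8_locale is a hypothetical name, and it assumes zsh reads ~/.zshrc, which is the macOS default.

```shell
# add_utf8_locale RCFILE [LOCALE]  (hypothetical helper)
# Appends LANG and LC_ALL exports to RCFILE unless it already exports LANG,
# so running it twice does not duplicate the lines. LOCALE defaults to
# ja_JP.UTF-8; any UTF-8 locale (for example en_US.UTF-8) also avoids mojibake.
add_utf8_locale() {
  rc=$1; loc=${2:-ja_JP.UTF-8}
  touch "$rc"
  if grep -q '^export LANG=' "$rc"; then
    echo "$rc already exports LANG; check that value by hand" >&2
    return 0
  fi
  printf 'export LANG=%s\nexport LC_ALL=%s\n' "$loc" "$loc" >> "$rc"
}
```

Run it as add_utf8_locale ~/.zshrc, then open a new Terminal window so the shell rereads the file.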

Check the Locale on an SSH Remote Host

If Japanese characters are garbled on a Linux server you've connected to via SSH, the problem may lie in the remote host's locale settings.

  1. After connecting via SSH, run the locale command.
  2. If LANG=ja_JP.UTF-8 is not set, ask the server administrator to configure it, or run export LANG=ja_JP.UTF-8 for the current session.
  3. If your Mac is forwarding its locale to the server, check whether ~/.ssh/config contains SendEnv LANG LC_* and whether the server's /etc/ssh/sshd_config has a matching AcceptEnv directive.

Encoding Settings for tmux / screen

tmux and screen perform their own encoding processing, which can cause Japanese mojibake independent of the shell locale.

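  • tmux: tmux 2.2 and later detect UTF-8 from the locale (LANG / LC_ALL) and no longer accept the utf8 option, so set the locale as above or launch with the tmux -u flag to force UTF-8. Only older versions need set -g utf8 on in ~/.tmux.conf.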
  • GNU screen: Add defutf8 on to ~/.screenrc, or launch with screen -U.

Issues Specific to Office for Mac (Excel / Word)

Microsoft Office for Mac is designed with Windows compatibility in mind, but its encoding behavior differs from Windows in a few important ways.

CSV Opens Garbled in Excel for Mac

Excel for Mac often assumes Shift_JIS (CP932) when opening a CSV that has no BOM, so a UTF-8 CSV opened by double-clicking will frequently display garbled Japanese. The same trap applies in reverse: if you convert a Shift_JIS CSV from Windows to UTF-8 in a text editor but save it without a BOM, Excel will misread it again.

The reliable approach:

  1. Open the CSV in CotEditor or VS Code and convert from Shift_JIS to UTF-8 with BOM, then save.
  2. Open the converted CSV in Excel for Mac — it should display correctly.
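The conversion with BOM can also be done in Terminal. The BOM is just the three bytes EF BB BF at the start of the file, so this sketch writes them followed by the converted text. It assumes the source is CP932 unless you pass another encoding; to_utf8_bom is a hypothetical name.

```shell
# to_utf8_bom SRC DEST [FROM]  (hypothetical helper)
# Writes DEST as UTF-8 preceded by a byte-order mark (EF BB BF), the cue Excel
# uses to read a CSV as UTF-8. FROM defaults to CP932 (Shift_JIS from Windows);
# pass UTF-8 for a file that is already UTF-8 but lacks the BOM. Do not run it
# on a file that already starts with a BOM, or the result will contain two.
to_utf8_bom() {
  bsrc=$1; bdest=$2; bfrom=${3:-CP932}
  bpart="$bdest.part.$$"
  # \357\273\277 is EF BB BF written in octal, which printf understands.
  if { printf '\357\273\277' && iconv -f "$bfrom" -t UTF-8 "$bsrc"; } > "$bpart"; then
    mv "$bpart" "$bdest"
  else
    rm -f "$bpart"
    return 1
  fi
}
```

For example, to_utf8_bom sales.csv sales-excel.csv (hypothetical file names) produces a copy that Excel for Mac should open as UTF-8 on double-click.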

Alternatively, Apple Numbers has better auto-detection than Excel and is less likely to garble the file. You can open, review, and edit in Numbers, then export as CSV for downstream use.

If you receive CSVs exported as UTF-8 from a web service or tool, use Excel's Import feature (Data → From Text/CSV) and manually specify the delimiter and encoding rather than double-clicking the file.

Old .doc Files Open Garbled in Word

When a Word 97–2003 format (.doc) file opens garbled in Word for Mac, the encoding metadata embedded in the file is likely not being handled correctly by the current version of Word for Mac.

  • Try opening the .doc file in LibreOffice (free). LibreOffice sometimes shows an encoding dialog — select "Japanese (Shift JIS)" to open it correctly.
  • If LibreOffice opens it successfully, use File → Save As → .docx to re-save it, then reopen in Word for Mac.
  • Uploading the .doc to Google Docs and viewing it there is another viable workaround.

Garbled Text on Specific Websites

Modern web browsers auto-detect encoding reliably, so browser mojibake is rare. When it does occur, the website's HTML is usually missing or has an incorrect <meta charset> declaration, causing the browser to guess the wrong encoding.
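To confirm what a page actually declares, save its HTML (for example with curl -s followed by the page URL, redirected to a file) and look for the charset value. A minimal sketch; declared_charset is a hypothetical name, and it only reads the HTML. If the server sends a charset in its Content-Type header (visible with curl -sI), browsers give that header precedence over the meta tag.

```shell
# declared_charset FILE  (hypothetical helper)
# Prints the first charset= value found in a saved HTML file, lower-cased,
# e.g. "shift_jis" for <meta ... content="text/html; charset=Shift_JIS">.
# LC_ALL=C makes tr and grep treat the file as raw bytes, so pages that are
# not UTF-8 do not trigger "illegal byte sequence" errors.
declared_charset() {
  LC_ALL=C tr 'A-Z' 'a-z' < "$1" |
    LC_ALL=C grep -o "charset=[\"']*[a-z0-9_-]*" |
    head -n 1 |
    sed "s/^charset=[\"']*//"
}
```

If the declared value disagrees with how the text really is encoded, that mismatch is the cause, and the manual overrides below are the workaround.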

Manually Set the Text Encoding in Safari

Safari on macOS retains a hidden Text Encoding menu.

  1. Check Safari's menu bar and open the View menu.
  2. If "Text Encoding" is not listed, enable Safari's hidden Debug menu by running the following in Terminal:

defaults write com.apple.Safari IncludeInternalDebugMenu 1

  3. Restart Safari. A Debug menu should appear in the menu bar; look there for Text Encoding and try "Japanese (Shift JIS)" or another encoding as needed.

Add the "Charset" Extension to Chrome

Google Chrome has no built-in encoding switcher, but the "Charset" extension adds one.

  1. Search for "Charset" in the Chrome Web Store and install it.
  2. With the garbled page open, click the extension icon.
  3. Select "Shift_JIS" or "EUC-JP" from the list — the page will reload with the new encoding applied.

Firefox includes a built-in View → Repair Text Encoding command (it replaced the older Text Encoding menu), with no extensions required. If you regularly encounter garbled websites, Firefox is the most convenient browser for this purpose.

Tofu (□) Caused by Missing Fonts

Seeing "□" means the encoding is being interpreted correctly, but the current font has no glyph for that code point. Most commonly this affects specific characters within an otherwise readable Japanese text — the surrounding text is fine, but certain characters show as squares.
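Because tofu is a display problem, the underlying character survives copy and paste. Checking its code point tells you which Unicode block it belongs to; anything from U+20000 upward, for example, is in CJK Extension B or later, which many fonts do not cover. A minimal sketch using iconv, od and awk; codepoints is a hypothetical name.

```shell
# codepoints  (hypothetical helper)
# Reads UTF-8 text on stdin and prints one U+XXXX code point per character.
# UTF-32BE stores every character as exactly four bytes, so the hex dump is
# split into 8-digit groups, leading zeros are trimmed, and at least 4 digits kept.
codepoints() {
  iconv -f UTF-8 -t UTF-32BE | od -An -v -tx1 |
    awk '{ for (i = 1; i <= NF; i++) hex = hex $i }
         END {
           for (p = 1; p + 7 <= length(hex); p += 8) {
             cp = toupper(substr(hex, p, 8))
             sub(/^0+/, "", cp)
             while (length(cp) < 4) cp = "0" cp
             print "U+" cp
           }
         }'
}
```

On a Mac, copy the character that shows as □ and run pbpaste | codepoints to see its code point.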

Install Additional Chinese or Korean Fonts

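macOS ships with Hiragino Sans and Hiragino Mincho for everyday Japanese, plus system fonts for Chinese (PingFang) and Korean (Apple SD Gothic Neo), so common Chinese and Korean text normally displays. □ tends to appear for rarer characters that none of the installed fonts cover; several additional Chinese and Korean typefaces are optional downloads that may fill those gaps.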

Fix: add the relevant language, which installs the corresponding fonts automatically.

  1. Open System Settings → General → Language & Region.
  2. Click the "+" button under "Preferred Languages".
  3. Add "Chinese (Simplified)", "Chinese (Traditional)", or "Korean" as needed.
  4. Restart if prompted.
  5. The fonts for the added language will be installed automatically, and the □ characters should disappear.

If you prefer not to change your language settings, you can install fonts individually through Font Book.app.

Fix It with Noto Sans CJK

Noto Sans CJK (by Google, open source) covers Japanese, Chinese, and Korean in a single font family, including the commonly used CJK Unified Ideographs, though not every rare extension block. Installing it is a broad fix for many characters that appear as □.

  1. Download Noto Sans CJK from GitHub (notofonts/noto-cjk). Google Fonts (fonts.google.com) offers the same design split into per-language families such as Noto Sans JP.
  2. Double-click the downloaded font file (.otf or .ttc).
  3. Click "Install Font" in the preview window.
  4. Restart any apps that were showing □ — they should now display the characters correctly.

If □ appears only in a specific app, also check that app's font settings. In Terminal and VS Code, for example, switching to the monospaced member of the family, Noto Sans Mono CJK JP, often resolves Japanese display issues entirely.

Summary: Troubleshooting Checklist in Order

Use this checklist alongside the symptom-specific sections above.

  1. Determine whether you're seeing tofu (□) or mojibake (symbols) — tofu is a font problem; symbols are an encoding problem.
  2. Text or CSV mojibake → open in CotEditor or VS Code and specify the correct encoding (Shift_JIS or EUC-JP).
  3. Garbled ZIP filenames → install The Unarchiver and use it to extract.
  4. Garbled PDF copy-paste → try Live Text in Preview.app or re-run OCR in Adobe Acrobat.
  5. Garbled email → use Mail.app's Message → Text Encoding menu to switch manually.
  6. Broken Japanese in Terminal → add export LANG=ja_JP.UTF-8 to ~/.zshrc.
  7. Excel / Word mojibake → convert to UTF-8 with BOM in CotEditor, or open with LibreOffice.
  8. Garbled website → use Safari's Text Encoding menu or Chrome's Charset extension.
  9. Tofu (□) → add the relevant language in System Settings → Language & Region, or install Noto Sans CJK.

Most cases are resolved by CotEditor, The Unarchiver, or VS Code. For a broader look at Mac issues, see Mac Troubleshooting Guide | Fixes Organized by Symptom.