How-to · June 1, 2026 · 5 min read

How to Fix Garbled Characters (Mojibake) in a CSV

Seeing Ã©, â€™, ï»¿, or � boxes in your CSV? That's mojibake, a decoding mismatch. Here's how to spot it and fix garbled CSV characters fast.


You open a CSV and the names look wrong. José shows up as JosÃ©, a curly apostrophe became â€™, and there's a stray ï»¿ glued to the very first cell. Some files are even worse, with every accented character shown as a black box. This garbled text has a name: mojibake. The reassuring part is that your data is almost always intact. The bytes are fine; they're just being decoded with the wrong rulebook.

This post is about recognizing the symptoms and fixing them quickly. If you want the deeper background on how encodings work, the companion guide on CSV encoding problems covers that side.

What mojibake actually is

Every text file is just a sequence of bytes. An encoding is the rulebook that turns those bytes back into letters. When a file is saved with one rulebook and opened with another, the letters come out scrambled.

The most common case: a file is written in UTF-8 (where accented and non-Latin characters take two or more bytes) but opened as Windows-1252 (where every byte is treated as a single character). UTF-8's multi-byte sequences get split apart, and each piece is shown as its own stray symbol. That's why one accented letter turns into two or three garbage characters.

So mojibake on its own is not a corrupted-file problem. It's a mismatch between how the file was saved and how it's being read.
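The mismatch is easy to reproduce. The following is a minimal sketch, using the sample name from this post and Python's `cp1252` codec as a stand-in for a Windows-1252 reader:

```python
# Encode with one rulebook (UTF-8), then decode with another (Windows-1252).
original = "José"
raw = original.encode("utf-8")    # the bytes that end up on disk
garbled = raw.decode("cp1252")    # what a Windows-1252 reader displays
restored = raw.decode("utf-8")    # the same bytes, read with the right rulebook

print(raw.hex(" "))  # 4a 6f 73 c3 a9  (é is stored as the two bytes c3 a9)
print(garbled)       # JosÃ©
print(restored)      # José
```

The bytes never change between the two reads. Only the codec used to interpret them does.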

A quick reference for common garbled characters

Most mojibake comes from the same handful of UTF-8-read-as-Windows-1252 mismatches. If you spot any of these, you've almost certainly found the cause.

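| You see | It should be | What it is |
|---|---|---|
| Ã© | é | e-acute |
| Ã¨ | è | e-grave |
| Ã± | ñ | n-tilde |
| Ã¼ | ü | u-umlaut |
| Ã (plus a space) | à | a-grave |
| â€™ | ’ | curly apostrophe |
| â€œ â€ | “ ” | curly quotation marks |
| â€“ | – | en dash |
| â€¦ | … | ellipsis |
| ï»¿ | (nothing) | a UTF-8 byte-order mark shown as text |
| � | (any character) | a byte the current encoding can't decode |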

A useful tell: if you see a capital Ã or the sequence â€ scattered through your text, the file is UTF-8 being read as a single-byte encoding. If you see � boxes, the reader has hit bytes it genuinely can't interpret.
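Those tells can be turned into a rough check. This is only a heuristic sketch: the `MARKERS` tuple and both function names are hypothetical, and legitimate text can contain Ã (Portuguese in capitals, for example), so treat a hit as a hint rather than proof.

```python
# Hypothetical marker list: sequences that typically show up when UTF-8
# bytes are displayed through a single-byte encoding such as Windows-1252.
MARKERS = ("Ã", "â€", "ï»¿")

def looks_like_mojibake(text: str) -> bool:
    """True if the text contains any of the usual UTF-8-as-cp1252 tells."""
    return any(marker in text for marker in MARKERS)

def has_undecodable_bytes(text: str) -> bool:
    """True if a decoder already substituted U+FFFD (the � box)."""
    return "\ufffd" in text

print(looks_like_mojibake("JosÃ©"))       # True
print(looks_like_mojibake("José"))        # False
print(has_undecodable_bytes("Jos\ufffd"))  # True
```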

The special case of ï»¿

That ï»¿ at the start of the first cell is a byte-order mark (BOM): three invisible bytes (EF BB BF) that some programs add to flag a file as UTF-8. Handled correctly, you never see it. Handled wrong, it gets pulled into your first header and breaks column matching. There's a full explainer in what a BOM is and how to handle it.
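Here is a small sketch of how the BOM ends up in the first header, and one way to avoid it in Python. The sample bytes and the `read_header` helper are made up for illustration; `utf-8-sig` is the standard-library codec that drops a leading BOM on read.

```python
import csv
import io

# Hypothetical file contents: a UTF-8 BOM (EF BB BF) followed by two CSV rows.
data = b"\xef\xbb\xbfname,city\r\nJos\xc3\xa9,M\xc3\xa1laga\r\n"

def read_header(raw: bytes, encoding: str) -> list:
    """Decode the bytes and return the first CSV row."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))

with_bom = read_header(data, "utf-8")         # BOM kept as U+FEFF in the first header
without_bom = read_header(data, "utf-8-sig")  # BOM stripped by the codec

print(with_bom[0] == "name")     # False: the header is really "\ufeffname"
print(without_bom[0] == "name")  # True
print(data[:3].decode("cp1252"))  # ï»¿ (the same three bytes seen as Windows-1252)
```

This is why a lookup such as `row["name"]` can fail on a file that looks perfectly normal in an editor.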

When characters are truly gone

One important exception: if a file was previously saved with the wrong encoding, the original characters may have been replaced by ? or � at write time. No re-opening can bring those back, because the information is no longer in the file. This is why fixing mojibake early, before re-saving, matters. Catch it on read and you lose nothing.
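A short sketch of both kinds of permanent loss, assuming the earlier tool substituted characters it could not handle (Python's `errors="replace"` behaves this way):

```python
# Loss at write time: the target encoding has no é, so a literal "?" is saved.
original = "José"
saved = original.encode("ascii", errors="replace")
print(saved)  # b'Jos?'

# No choice of encoding brings the é back; the byte on disk really is "?".
recovered = [saved.decode(enc) for enc in ("utf-8", "cp1252", "latin-1")]
print(recovered)  # ['Jos?', 'Jos?', 'Jos?']

# Loss on a bad read followed by a re-save: Latin-1 bytes decoded as UTF-8
# with replacement become U+FFFD, and saving that text discards the originals.
legacy = b"Jos\xe9 Garc\xeda"
decoded = legacy.decode("utf-8", errors="replace")
resaved = decoded.encode("utf-8")
print(b"\xe9" in resaved)  # False: the original byte is gone for good
```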

How to fix it

The fix is always the same idea: read the file with the encoding it was actually written in, then save a clean copy. The only hard part is figuring out which encoding that is, and that's mostly trial and error by eye.
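The trial-and-error step can be sketched in a few lines. The candidate list and the `preview_decodings` helper are assumptions for illustration, not how any particular tool does it. Strict decoding rules out encodings that fail outright, but a decode that succeeds can still be wrong, which is why the final check is done by eye.

```python
from typing import Dict, Optional

# Hypothetical candidate list, in the order you might try them.
CANDIDATES = ("utf-8-sig", "utf-16", "cp1252")

def preview_decodings(raw: bytes, candidates=CANDIDATES) -> Dict[str, Optional[str]]:
    """Decode raw bytes with each candidate; None means the decode failed."""
    results: Dict[str, Optional[str]] = {}
    for encoding in candidates:
        try:
            results[encoding] = raw.decode(encoding)  # strict: no silent �
        except UnicodeDecodeError:
            results[encoding] = None
    return results

# Sample bytes: "José" as saved by a Windows-1252 program.
results = preview_decodings(b"Jos\xe9")
for encoding, text in results.items():
    print(f"{encoding:10} {text!r}")
```

On this sample, UTF-8 fails, Windows-1252 yields José, and UTF-16 "succeeds" but produces unrelated characters.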

CEESVEE is built to take the guesswork out of it. When you open a file it auto-detects the encoding (UTF-8, UTF-16 LE/BE, Windows-1252) along with the delimiter, and it handles the BOM correctly so ï»¿ never leaks into your data. If the detection is ever wrong, there's an encoding override, and changing it re-decodes the whole grid instantly, so the moment you pick the right one, JosÃ© snaps back to José in front of you.

Step by step

  1. Open CEESVEE and open the file. In most cases the auto-detected encoding is already right and the text looks correct.
  2. If you still see mojibake, change the Encoding dropdown. Try UTF-8 first, then Windows-1252, then UTF-16 LE/BE. The grid re-decodes live, so the correct choice is obvious — the garbled characters resolve into real letters.
  3. Lock in the fix with Save As → choose UTF-8 as the export encoding. You can also leave the BOM off here so the next tool doesn't show ï»¿. Now every other program will read the file correctly.

That's the whole loop: open with the right encoding, confirm the text looks right, export as UTF-8.
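The same loop can be scripted. This is a sketch under the assumption that you already know the source encoding; `to_clean_utf8` and `convert_file` are hypothetical helper names, and this is not a description of what CEESVEE does internally.

```python
from pathlib import Path

def to_clean_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode with the real source encoding, drop a leading BOM, re-encode as UTF-8."""
    text = raw.decode(source_encoding)  # strict: raises instead of writing �
    if text.startswith("\ufeff"):       # BOM that survived decoding
        text = text[1:]
    return text.encode("utf-8")         # plain UTF-8, no BOM

def convert_file(src: str, dst: str, source_encoding: str) -> None:
    """Write a clean UTF-8 copy to dst; the original file is left untouched."""
    Path(dst).write_bytes(to_clean_utf8(Path(src).read_bytes(), source_encoding))

# A Windows-1252 file's bytes come out as valid UTF-8 with line endings preserved.
print(to_clean_utf8(b"name\r\nJos\xe9\r\n", "cp1252"))
# b'name\r\nJos\xc3\xa9\r\n'
```

Writing to a separate destination keeps the original bytes around in case the chosen encoding turns out to be wrong.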

Why a dedicated viewer helps

A spreadsheet app will often auto-open a CSV with whatever encoding it guesses, give you no easy way to change that guess, and then re-mangle the file when you save. You end up fighting the tool at both ends. A viewer that detects encoding on open, lets you flip it live, and exports explicitly removes the guesswork from the whole round-trip — and because the grid re-decodes instantly, you can see the right answer instead of saving and reopening to check.

It's also worth ruling out a related look-alike: if your columns are split in odd places rather than the characters being garbled, that's a delimiter problem, not encoding. Files that use semicolons instead of commas are a common cause — see opening semicolon-delimited CSVs.
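To tell the two problems apart in code, you can check the delimiter separately from the encoding. A minimal sketch using the standard library's `csv.Sniffer` on made-up sample text (sniffing is a guess and can fail on unusual files):

```python
import csv

# Hypothetical sample: correctly decoded text that uses semicolons.
sample = "name;city;country\nJosé;Málaga;Spain\nZoë;Köln;Germany\n"

# Restrict the guess to the delimiters we consider plausible.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
rows = list(csv.reader(sample.splitlines(), dialect))

print(repr(dialect.delimiter))  # ';'
print(rows[1])                  # ['José', 'Málaga', 'Spain']
```

The accented characters are intact here, so the only thing that needed fixing was how the columns were split.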

The bottom line

Garbled CSV characters look alarming, but they're a decoding mismatch, not lost data. Match the file to its real encoding and the text snaps back; save it as UTF-8 and it stays fixed for every tool downstream. The only thing to avoid is re-saving while it's still garbled, which can make the loss permanent.

Download CEESVEE for free — it's open source, fully local, and detects the encoding for you, so most of the time you'll never see mojibake at all.

Frequently asked questions

What is mojibake in a CSV file?

Mojibake is garbled text like Ã© or â€™ that appears when a file is decoded with the wrong character encoding, for example a UTF-8 file read as Windows-1252. The underlying bytes are usually fine; only the display is wrong.

Why does my CSV show Ã© instead of é?

Because a UTF-8 file is being read as Windows-1252. The é is stored as two bytes in UTF-8 (C3 A9), and reading them one at a time produces Ã followed by ©. Reopening the file as UTF-8 restores the correct character.

What are the � boxes or question marks in my CSV?

A � (replacement character) marks a byte the current encoding can't interpret. It usually means you're reading the file with the wrong encoding, or characters were already lost in an earlier bad save.

How do I get rid of garbled characters permanently?

Open the file with the correct source encoding so the text looks right, then use Save As and export as UTF-8. Every tool that opens the new file afterward will read it correctly.
