CSV basics · May 19, 2026 · 5 min read

What Is a BOM in a CSV File (and How to Handle It)?

A plain-English guide to the byte order mark (BOM) in CSV files: what it is, why some UTF-8 files start with one, and how to add or remove it.


If you have ever opened a CSV and seen a strange ï»¿ glued to the front of your first column name, or watched a lookup quietly fail because the first header did not match anything, you have met the byte order mark. It is tiny, invisible when handled correctly, and the source of a surprising amount of confusion. Here is what it is and how to deal with it.

What a byte order mark is

A text file is just a sequence of bytes, and an encoding is the rulebook that turns those bytes back into characters. A byte order mark (BOM) is an optional set of bytes at the very start of the file that announces which encoding is in use.

The name comes from UTF-16, where the mark also tells the reader the byte order, little-endian (LE) or big-endian (BE), so it knows which of the two bytes in each 16-bit unit comes first. In UTF-8 there is no byte order to signal, so the mark is purely a flag that says "this file is UTF-8."

Each encoding has its own BOM bytes:

  • UTF-8: EF BB BF
  • UTF-16 LE: FF FE
  • UTF-16 BE: FE FF
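As a rough illustration, the table above can be turned into a small check on the first bytes of a file. This is a minimal sketch in Python; the helper name detect_bom is made up for this example, and it only looks for the three marks listed here.

```python
import codecs

# BOM byte sequences for the three encodings listed above.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data: bytes):
    """Hypothetical helper: return the encoding named by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

A file with no mark returns None, which says nothing about its encoding; it only means nobody announced it.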

When a reader understands the BOM, it consumes those bytes and never shows them. The mark does its job silently and you never know it was there.

Why some UTF-8 files start with one

UTF-8 does not need a BOM. It is unambiguous on its own, and most modern tools write UTF-8 with no mark at all. So why do some files have one?

The usual culprit is software that wants to remove guesswork on the reading end. On Windows in particular, several programs — including some versions of Excel — will treat a plain text file as a legacy encoding like Windows-1252 unless something tells them otherwise. Writing a UTF-8 BOM is that something: it is a clear signal that says "decode me as UTF-8," which keeps accented names and other non-ASCII characters from turning into mojibake. So when you save a CSV "as UTF-8" from certain tools, you may quietly get a BOM along with it.

The symptom: when a BOM goes wrong

A BOM only causes trouble when the program reading the file does not know to strip it. Then the BOM bytes get decoded as if they were ordinary text, and two things tend to happen:

  • You see ï»¿ before the first header. Those three characters are exactly what the UTF-8 BOM bytes (EF BB BF) look like when decoded as Windows-1252. Your data is fine; the reader simply printed the mark instead of removing it.
  • The first column stops matching. This one is sneakier because nothing looks wrong. The BOM gets attached to the first header as an invisible character (U+FEFF), so a column that reads id on screen is actually U+FEFF followed by id underneath. A lookup, join, or filter that searches for id finds nothing, and you spend an hour wondering why only the first column is broken.

Both are the same problem: the BOM was written, but the tool reading the file did not handle it.
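Both symptoms are easy to reproduce. The sketch below uses Python's standard library on a tiny made-up file (the bytes in raw are sample data, not from a real export): decoding as Windows-1252 produces the visible ï»¿, and decoding as plain UTF-8 leaves an invisible U+FEFF stuck to the first header.

```python
import csv
import io

# Sample file contents: UTF-8 BOM, then a header row and one data row.
raw = b"\xef\xbb\xbfid,name\r\n1,Zo\xc3\xab\r\n"

# Symptom 1: a reader that assumes Windows-1252 shows the mark as text.
as_cp1252 = raw.decode("cp1252")

# Symptom 2: Python's plain "utf-8" codec keeps the mark as U+FEFF,
# so it ends up inside the first column name.
text = raw.decode("utf-8")
rows = list(csv.DictReader(io.StringIO(text, newline="")))
first_header = list(rows[0].keys())[0]

# The lookup that "should" work does not.
has_id_column = "id" in rows[0]
```

Printing repr(first_header) is a quick way to confirm the problem, because repr shows the hidden character as \ufeff.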

When you want a BOM — and when you do not

There is no universal right answer; it depends entirely on what will open the file next.

Keep the BOM when a downstream tool needs the hint. Some Excel and Windows workflows open a BOM-less UTF-8 file as a legacy encoding and garble the special characters, but open the same file with a UTF-8 BOM correctly. If that is your situation, the BOM is doing useful work and you should keep it.
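If a script produces the file, one way to write the mark is Python's utf-8-sig codec, which emits EF BB BF before the first character. This is a sketch under assumptions: the function name, the file name people.csv, and the rows are all invented for the example.

```python
import csv
import os
import tempfile


def write_csv_with_bom(path, rows):
    """Hypothetical helper: write rows as UTF-8 CSV with a leading BOM."""
    # "utf-8-sig" writes the three BOM bytes once, at the start of the file.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


# Write a sample file to a temporary directory and inspect its first bytes.
path = os.path.join(tempfile.mkdtemp(), "people.csv")
write_csv_with_bom(path, [["id", "name"], [1, "Zoë"]])
with open(path, "rb") as f:
    head = f.read(3)
```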

Drop the BOM when the file feeds a database import, a script, a programming language's CSV parser, or a web service. Many of these readers do not strip a leading BOM, so it leaks into your first field and causes the silent matching failures above. For most automated pipelines, no BOM is the safer default.
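On the reading side, a script can protect itself even when it cannot control how the file was written. In Python, decoding with utf-8-sig removes a leading BOM if one is present and behaves like plain UTF-8 if not. The function name read_rows below is an assumption for this sketch.

```python
import csv
import io


def read_rows(data: bytes):
    """Hypothetical helper: parse UTF-8 CSV bytes, ignoring a leading BOM."""
    # "utf-8-sig" strips the BOM when it is there and is harmless when it is not.
    text = data.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))


with_bom = read_rows(b"\xef\xbb\xbfid,name\r\n1,Ana\r\n")
without_bom = read_rows(b"id,name\r\n1,Ana\r\n")
```

Both calls give the same rows, with a first column that really is named id.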

When you are not sure, leave the BOM off and only add it if a specific tool turns out to need it.

How to add or remove a BOM on export

The clean way to handle this is to control the BOM explicitly on read and on write, rather than letting whatever tool you used last decide for you.

CEESVEE handles the BOM correctly in both directions. On open, it auto-detects the encoding (UTF-8, UTF-16 LE/BE, Windows-1252) and strips the BOM so it never shows up as ï»¿ in your data or gets stuck to your first header. The grid shows your real column names, and find, sort, and find-and-replace all work against the actual text, with no invisible character sabotaging the first column.

On export, the choice is yours:

  1. Open your CSV in CEESVEE. If the file had a BOM, it is already stripped, and the encoding is detected for you.
  2. Choose Save or Save As. Along with the delimiter, quoting style, and line endings (LF or CRLF), there is an explicit option for whether to write a BOM.
  3. Turn the BOM on if you are handing the file to an Excel or Windows workflow that expects one, or leave it off for databases, scripts, and parsers that do not want it.

Because the BOM is a deliberate setting and not an accident of which program saved the file last, you can produce exactly the file each destination needs.
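For scripted pipelines, the same idea of treating the BOM as a deliberate setting can be sketched as one small function. This is an illustration only, not how CEESVEE works internally; the name set_utf8_bom is invented, and the input is assumed to already be UTF-8.

```python
import codecs


def set_utf8_bom(data: bytes, want_bom: bool) -> bytes:
    """Hypothetical helper: return UTF-8 bytes with exactly one BOM, or none."""
    body = data
    # Remove any existing mark first (repeated saves can stack more than one),
    # so the result depends only on want_bom.
    while body.startswith(codecs.BOM_UTF8):
        body = body[len(codecs.BOM_UTF8):]
    return codecs.BOM_UTF8 + body if want_bom else body
```

Calling it twice with the same setting changes nothing, so it is safe to run on files whose history you do not know.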

The bottom line

A byte order mark is a few bytes at the start of a file that flag its encoding. Done right, it is invisible. Done wrong, it shows up as ï»¿ before your first header or silently breaks a match on the first column. The fix is to read it correctly (strip it on open) and to decide deliberately whether to write one on export.

Download CEESVEE for free — it strips the BOM on read so it never pollutes your data, and lets you choose whether to write one when you save.

Frequently asked questions

What is a BOM in a CSV file?

A byte order mark (BOM) is a few invisible bytes at the very start of a text file that signal its encoding. In a UTF-8 file the BOM is the byte sequence EF BB BF. Handled correctly it stays invisible; handled wrong it shows up as ï»¿ before your first column header.

Should my CSV have a BOM?

It depends on the tool reading it. Some Excel and Windows workflows want a UTF-8 BOM so they open the file as UTF-8 instead of a legacy encoding. Many other tools prefer no BOM. When in doubt, leave it off and only add it if a specific tool needs it.

Why does ï»¿ appear before my first header?

That is a UTF-8 BOM being read by a tool that does not recognize it, so the three BOM bytes are decoded as the visible characters ï»¿. The data is fine; the reader just is not stripping the mark.

How do I remove a BOM from a CSV?

Open the file in CEESVEE, which strips the BOM on read so it never appears as data, then use Save As and turn off the write-BOM option. The exported file will have no byte order mark.
