When working with UTF-8-encoded PHP files in web applications, a common and hard-to-track-down error is “Headers already sent” or “Cannot modify header information”. It usually occurs during a call to the function header(), which manipulates the HTTP header.
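One reason for this is that the UTF-8 file starts with an invisible(!) byte order mark (BOM) consisting of the three bytes 0xEF, 0xBB, 0xBF. PHP sends these bytes to the client as ordinary output before any code runs, so by the time header() is called, the HTTP header has already been sent.
The BOM can be removed by opening the file in a suitable text editor and unticking the Add Byte Order Mark (BOM) option (or similar).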
A more convenient way using sed is the following:
sed -i '1 s/^\xef\xbb\xbf//' utf8_file.txt
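(-i enables in-place operation of sed; the leading 1 is an address that restricts the substitution to the first line of the file; ^ anchors the match at the start of that line. This form of -i and the \x.. byte escapes are GNU sed extensions, so other sed implementations may need a different invocation.)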
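For several files at once, the same command can be wrapped in a small shell function. The following is only a sketch: the name strip_bom is made up for illustration, GNU sed is assumed, and LC_ALL=C is a cautious addition (not part of the original command) that makes sed treat the input as raw bytes regardless of the current locale.

```shell
#!/bin/sh
# strip_bom FILE...: remove a leading UTF-8 BOM (EF BB BF) from each FILE in place.
# Hypothetical helper; assumes GNU sed (-i without a suffix, \xHH escapes).
strip_bom() {
    for f in "$@"; do
        # Address 1 limits the substitution to the first line of each file.
        # Files without a BOM are left with the same content.
        LC_ALL=C sed -i '1 s/^\xef\xbb\xbf//' "$f"
    done
}
```

It could then be called as strip_bom utf8_file.txt, or with a list of PHP files.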
Example
Let’s consider a file consisting of two lines (‘A’, ‘B’) stored with the BOM:
<BOM>A
B
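To reproduce the example, such a file can be created with printf, which writes the three BOM bytes as octal escapes (357 273 277 are the octal forms of 0xEF 0xBB 0xBF). This is a minimal sketch; the file name testfile.txt matches the one used in the commands of this example.

```shell
#!/bin/sh
# Write the BOM (EF BB BF, given as octal escapes) followed by the lines "A" and "B".
printf '\357\273\277A\nB\n' > testfile.txt

# The file is 7 bytes long: 3 BOM bytes + "A\n" + "B\n".
wc -c < testfile.txt
```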
Investigating this file with the hex dump tool od:
$ od -t c -t x1 testfile.txt
we obtain the following output:
0000000 357 273 277   A  \n   B  \n
         ef  bb  bf  41  0a  42  0a
0000007
The three BOM bytes are clearly visible.
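The same check can be scripted instead of read off by eye. The sketch below compares the first three bytes of a file against ef bb bf; the function name has_bom is hypothetical, and it assumes a head that supports -c (available in GNU and BSD versions).

```shell
#!/bin/sh
# has_bom FILE: exit status 0 if FILE starts with the UTF-8 BOM, non-zero otherwise.
# Hypothetical helper for illustration.
has_bom() {
    # Take only the first three bytes and dump them as hex without the offset column.
    first_bytes=$(head -c 3 "$1" | od -An -t x1 | tr -d ' \n')
    [ "$first_bytes" = "efbbbf" ]
}
```

For example, has_bom testfile.txt && echo "BOM found" prints the message only for files that start with a BOM.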
After running
sed -i '1 s/^\xef\xbb\xbf//' testfile.txt
the output of od looks as follows, showing that the BOM is gone:
0000000   A  \n   B  \n
         41  0a  42  0a
0000004
References
- Original source: stackoverflow.com