Analyzing and manipulating hidden characters in text files

When we encounter strange, unexplainable problems with text files, hidden characters may be the reason. This article describes several ways to tackle line-ending and whitespace problems.

Correcting mixed line endings

If a file has mixed line endings, the command-line tool flip may help you:

echo -e "unix\nmicro\r\n" > test.txt
file test.txt
#result: test.txt: ASCII text, with CRLF, LF line terminators

A check with file reveals that the file test.txt has mixed line endings. Flip unifies the line endings to Unix (-u) or Windows (-m) standard:

flip -u test.txt
file test.txt # result: test.txt: ASCII text

flip -m test.txt
file test.txt # result: test.txt: ASCII text, with CRLF line terminators
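
Where flip is not installed, the same normalization can be approximated with standard tools. The following is only a minimal sketch, not a replacement for flip: the function names to_unix and to_dos and the file sample.txt are made up for this example, and it assumes a POSIX shell with tr, sed and printf available.

```shell
# Hypothetical helpers (names are illustrative, not part of flip).

# Convert to Unix line endings by deleting every carriage return.
# Caveat: this also removes any CR that is not part of a line ending.
to_unix() {
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Convert to Windows line endings: strip existing CRs first so that
# lines already ending in CRLF do not get a second CR, then append
# one CR to the end of every line.
to_dos() {
    cr=$(printf '\r')
    tr -d '\r' < "$1" | sed "s/\$/$cr/" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage: create a file with mixed line endings and unify it.
printf 'unix\nmicro\r\n' > sample.txt
to_unix sample.txt
od -c sample.txt
```

Both helpers write to a temporary file and then rename it, because redirecting a command's output into the same file it is reading from would truncate the file before it is read.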

Examining files

vim can show whitespace characters if you enable the list option. In command mode, execute the following to make whitespace such as tabs and line ends visible. Unfortunately, this view does not distinguish between different types of line endings.

:set list

Use :set nolist to return to the normal view. With :set ff, vim reports the line-ending format it detected for the file (unix, dos or mac).

If you need a detailed picture of the whitespace characters in your document, the dump tool od may be helpful. It prints octal values by default, but with the options used here it displays each byte as a hexadecimal value with the corresponding (interpreted) ASCII character underneath:

echo -e "item1\titem2\titem3\r\nline2 (unix)\n" > test.txt
od -A x -t x1c test.txt

The result looks as follows:

000000  69  74  65  6d  31  09  69  74  65  6d  32  09  69  74  65  6d
         i   t   e   m   1  \t   i   t   e   m   2  \t   i   t   e   m
000010  33  0d  0a  6c  69  6e  65  32  20  28  75  6e  69  78  29  0a
         3  \r  \n   l   i   n   e   2       (   u   n   i   x   )  \n
000020  0a
        \n
000021

Using cat -v test.txt, you can see that non-Unix (CRLF) line endings are marked with a special symbol, ^M:

item1   item2   item3^M
line2 (unix)
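
For longer files, scanning the cat -v output by eye becomes tedious. As a rough sketch, grep can count or locate the lines that end in a carriage return. The helper names count_crlf and list_crlf and the file sample.txt are invented for this example, and a POSIX shell with grep, cut and printf is assumed.

```shell
# Hypothetical helpers (names are illustrative only).

# Store a literal carriage return; command substitution only strips
# trailing newlines, so the CR survives.
cr=$(printf '\r')

# Print the number of lines that end in a carriage return.
count_crlf() {
    grep -c "$cr\$" "$1"
}

# Print the line numbers of all lines that end in a carriage return.
list_crlf() {
    grep -n "$cr\$" "$1" | cut -d: -f1
}

# Usage: the first line ends in CRLF, the second in LF only.
printf 'item1\titem2\titem3\r\nline2 (unix)\n' > sample.txt
count_crlf sample.txt
list_crlf sample.txt
```

For this sample file, both commands print 1, since only the first line ends in CRLF.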
