This has been dumped on me at the last minute. I'm having an issue with a few files: the files I get from the source have a BOM (byte order mark) at the top of every file, and I need to check for its existence and remove it.
<feff>
header
Column1|column2......n
I know I can simply run sed on it like this to get rid of the 1st line:
sed 1d FilewidFEFF.csv > other.txt
That works great and removes the 1st line.
But my goal is to first check whether the 1st line has a BOM, and only then delete the 1st line. Since it's Unicode, I have NOT been able to grep for it successfully.
Any ideas, please?
Thanks so much for your input, truly appreciate it.
Exactly how the BOM is encoded in the file depends on whether it is UTF-8, UTF-16 or UTF-32, plus whether the text is big-endian or little-endian.
The BOM is supposed to be at the very beginning of the text, hence bipinajith used ^ to indicate that. What you show as a BOM denotes UTF-16 big-endian. Is that in fact what you have? Because what you were given by bipinajith should have worked, which tells me something is not right. Not all BOMs appear in the file as the bytes 0xFE 0xFF.
Bytes         Encoding Form
00 00 FE FF   UTF-32, big-endian
FF FE 00 00   UTF-32, little-endian
FE FF         UTF-16, big-endian
FF FE         UTF-16, little-endian
EF BB BF      UTF-8
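To see which of these you actually have, you can look at the raw leading bytes instead of trusting how an editor displays them. The following is only a rough sketch: `detect_bom` is a made-up helper name, and it assumes your `head` accepts `-c` (GNU and BSD versions do).

```shell
#!/bin/sh
# detect_bom FILE: print the encoding implied by the file's leading bytes.
# NOTE: detect_bom is a hypothetical helper name, not a standard utility.
detect_bom() {
    # Dump the first four bytes as contiguous lowercase hex, e.g. "efbbbf68".
    hex=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case $hex in
        0000feff*) echo "UTF-32BE" ;;
        fffe0000*) echo "UTF-32LE" ;;   # must be tested before UTF-16LE (shared FF FE prefix)
        efbbbf*)   echo "UTF-8" ;;
        feff*)     echo "UTF-16BE" ;;
        fffe*)     echo "UTF-16LE" ;;
        *)         echo "none" ;;
    esac
}
```

Calling `detect_bom FilewidFEFF.csv` prints one of the labels above, or `none` when the file does not start with any of the listed byte sequences.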
file xyz.csv
xyz.csv: UTF-8 Unicode text, with very long lines
I tried piconv from UTF-8 to ASCII, and it does convert <feff> to ?.
Then I can grep for ? and delete the 1st line.
Is that an ideal solution?
I wanted something robust. What if the file has a ? somewhere else in it, etc.?
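A check on the raw bytes avoids the ? problem entirely, because it never converts the rest of the file. Here is a minimal sketch, assuming the file really is UTF-8 as `file` reports, so the BOM is the three bytes EF BB BF. `strip_utf8_bom` is a hypothetical name, and `head -c` is assumed to be available.

```shell
#!/bin/sh
# strip_utf8_bom IN OUT: copy IN to OUT, dropping a leading UTF-8 BOM if present.
# NOTE: strip_utf8_bom is a hypothetical helper name used for illustration.
strip_utf8_bom() {
    # Hex of the first three bytes only; later ? characters are never examined.
    first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        tail -c +4 "$1" > "$2"   # start output at byte 4, skipping the 3 BOM bytes
    else
        cat "$1" > "$2"          # no BOM: copy the file unchanged
    fi
}
```

For example, `strip_utf8_bom xyz.csv other.txt`. This removes only the BOM bytes, not the whole line. If the BOM really sits on a line of its own, the output will start with an empty line, which `sed 1d` can then drop.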