Problem with awk and sed output

I have a file from which I am extracting email addresses, but the problem is that the output is very ugly.

I am using this command

The original file has no such characters, though. Please suggest a fix.

Show us your input and the expected output.
Also, awk usage is awk '{...code...}' file

My file format is like this:


learnk@gmail.com, , http://www.mymail.com/files/js/js_-jenMcWHoY-_YofME9QdfIdN78Hvtfo2npip2cxdObU.js


The expected output is:

learnk@gmail.com

I don't think this ^@r^@e^@g^@i^@s^@t^@r^@a^@r^@@^@a^@j^@a^@.^@e^@d^@u^@.^@q^@a^@,^@ is coming from nowhere; it must be in the original file. Where did the file come from? If it's from Windows, it may be in some odd Unicode character set.


One thing: when I use the command below, the output looks fine:
awk '{print $1}' filename | tr -d ,

But when I use this command:

awk '{print $1}' filename | tr -d , > 1.txt

and open the file in vi, I see this: @@@


I don't understand where the problem is.

Assuming the first line of the file contains the email address:

$ awk -F, '{print $1;exit}' file
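With -F, the comma becomes the field separator, so $1 is everything before the first comma and the extra tr -d , step is no longer needed. A minimal sketch on the sample line from the question, assuming the input is already plain ASCII/UTF-8 (which, as it turns out later in this thread, this file is not). Against a real file you would run awk -F, '{print $1}' file, dropping the exit to process every line:

```shell
#!/bin/sh
# Sample line copied from the question; assumes plain ASCII/UTF-8 input.
line='learnk@gmail.com, , http://www.mymail.com/files/js/js_-jenMcWHoY-_YofME9QdfIdN78Hvtfo2npip2cxdObU.js'

# -F, splits on commas, so $1 is the text before the first comma.
printf '%s\n' "$line" | awk -F, '{print $1}'
```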

I ran the command in the terminal; here is the output:


# awk -F, '{print $1;exit}' output-save-2014-01-24-1.txt 
klaus.har33@abcd.com

Where did you get the file? I don't think awk is 'inventing' the garbage characters; I think they are in the original file.

The original file has no such characters. When I open the file with vi, everything is OK, but after the awk operation I get this problem. Please advise: is it some problem with my environment variables, or something else? I am totally confused.

It would take an incredibly sick, faulty version of awk, or bad memory errors, for awk to create all those garbage characters out of nowhere.

hexdump -C inputfile

It shows the file like this. How do I get my file back into its original condition?

00027f80  77 00 77 00 77 00 2e 00  71 00 61 00 74 00 61 00  |w.w.w...r.a.t.p.|


I applied the awk command on another server and the output is the same. I am using Fedora 17 on my desktop; is it a problem with my desktop?

^@ is a null. The original file uses 16 bits per character. When it is viewed on a system that expects 8 bits per character, the nulls do not show up. I had a file like that once; it had been created on Windows.
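You can make the nulls visible in the terminal with cat -v, which prints a NUL byte as ^@, the same notation vi uses. A small sketch; the commented-out line assumes 1.txt is the output file from the awk | tr pipeline above:

```shell
#!/bin/sh
# printf '\000' emits a single NUL byte; printed raw to a terminal it is usually invisible.
# cat -v shows non-printing characters instead, and a NUL appears as ^@.
printf 'a\000b\n' | cat -v

# The same check on a real file, e.g. the 1.txt written by the awk pipeline:
# cat -v 1.txt | head
```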

Your file is not ASCII at all but 16-bit Unicode, and as I suspected (but you refused to say) it must have come from a Windows system. I guess your version of vi either detected and converted it, or stripped out all the nulls before displaying...

Use iconv to convert it into something your UNIX utilities can understand.

iconv -f UTF16LE -t UTF8 < inputfile | awk ...
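A fuller sketch of that pipeline, assuming the Windows program wrote UTF-16LE with CRLF line endings and no byte-order mark (BOM). The file name sample-utf16.txt is hypothetical; the file is generated here with iconv itself so the example runs anywhere:

```shell
#!/bin/sh
# Build a hypothetical UTF-16LE test file with Windows (CRLF) line endings.
printf 'learnk@gmail.com, , http://www.mymail.com/files/js/js_-jenMcWHoY-_YofME9QdfIdN78Hvtfo2npip2cxdObU.js\r\n' |
    iconv -f UTF-8 -t UTF-16LE > sample-utf16.txt

# Convert back to UTF-8, strip the carriage returns, then print field 1.
iconv -f UTF-16LE -t UTF-8 < sample-utf16.txt | tr -d '\r' | awk -F, '{print $1}'
```

If the file starts with a BOM (the bytes ff fe), -f UTF-16 is the safer choice: iconv then consumes the BOM, whereas with -f UTF-16LE it is passed through as an invisible U+FEFF character glued to the first field.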

Oh, I am sorry; it is generated by some Windows software. By the way, how can I verify that a file contains 16-bit Unicode characters? How do I check this format in Linux?

hexdump is a good way. Look at the output you posted: it shows two bytes for each letter.
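Another quick check is to count the NUL bytes. Ordinary ASCII or UTF-8 text contains none, while UTF-16 text made of mostly Latin characters has roughly one NUL per character. This is a heuristic, not a proof; the file utility may also report something like "Little-endian UTF-16 Unicode text" when the file starts with a BOM. count_nuls, plain.txt and wide.txt are hypothetical names used only for this sketch:

```shell
#!/bin/sh
# count_nuls (hypothetical helper): print how many NUL bytes a file contains.
count_nuls() {
    # -c complements the set and -d deletes, so tr keeps only the NUL bytes;
    # wc -c then counts them.
    tr -cd '\000' < "$1" | wc -c
}

printf 'www.example\n' > plain.txt
printf 'www.example\n' | iconv -f UTF-8 -t UTF-16LE > wide.txt

count_nuls plain.txt   # 0 NUL bytes
count_nuls wide.txt    # 12 NUL bytes: one per character, newline included
```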

I am really sorry, I don't understand how I can verify the two bytes per character. Could you please explain in a bit more detail? I am very thankful to you.

Actually, I think it's "iconv" rather than "iconf".


From your output:

00027f80  77 00 77 00 77 00 2e 00  71 00 61 00 74 00 61 00  |w.w.w...r.a.t.p.|

Normal text:

hexdump -C /etc/gentoo-release
00000000  47 65 6e 74 6f 6f 20 42  61 73 65 20 53 79 73 74  |Gentoo Base Syst|
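To reproduce this comparison yourself, encode the same text both ways and dump the bytes. A sketch using the "www." fragment from the dump above; od -An -tx1 is used here only because its output is easy to compare in a script, and hexdump -C shows the same bytes:

```shell
#!/bin/sh
# One byte per character: 77 77 77 2e
printf 'www.' | od -An -tx1

# UTF-16LE: every ASCII character is followed by a 00 byte: 77 00 77 00 77 00 2e 00
printf 'www.' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1
```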

Sorry about that, it's one of those typos my brain refuses to catch.

Wonderful explanation, I totally understand the problem now. Thanks so much!