I have a file and I am extracting email addresses from it, but the output is very ugly.
I am using this command:
The original file has no such characters. Please suggest a fix.
Show us your input and expected output.
awk usage is: awk '{...code...}' file
My file format looks like this:
learnk@gmail.com, , http://www.mymail.com/files/js/js_-jenMcWHoY-_YofME9QdfIdN78Hvtfo2npip2cxdObU.js
The expected output is:
learnk@gmail.com
I don't think this ^@r^@e^@g^@i^@s^@t^@r^@a^@r^@@^@a^@j^@a^@.^@e^@d^@u^@.^@q^@a^@,^@ is coming from nowhere; it must be in the original file. Where did the file come from? If it's from Windows, it may be in some odd Unicode character set.
One thing: when I use the command below, the output looks fine:
awk {'print $1'} filename | tr -d ,
But when I use this command
awk {'print $1'} filename | tr -d , > 1.txt
and open the file in vi, I see these @@@ characters.
I don't understand where the problem is.
Assuming the first line in the file contains the email address:
$ awk -F, '{print $1;exit}' file
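Without exit, the same -F, split applies to every line, which gives the first comma-separated field directly and makes the extra tr -d , step in the earlier pipeline unnecessary. A minimal sketch, assuming a plain ASCII/UTF-8 input file; the file name emails.txt and the second address are hypothetical:

```shell
# Hypothetical sample in the question's format (plain ASCII/UTF-8, not UTF-16).
printf '%s\n' \
  'learnk@gmail.com, , http://www.mymail.com/files/js/js_-jenMcWHoY-_YofME9QdfIdN78Hvtfo2npip2cxdObU.js' \
  'someone@example.com, , http://www.example.com/' > emails.txt

# -F, makes the comma the field separator, so $1 is everything
# before the first comma on each line.
awk -F, '{print $1}' emails.txt
```

On this sample it prints learnk@gmail.com and someone@example.com, one per line.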
I ran the command in the terminal, and this is the output:
# awk -F, '{print $1;exit}' output-save-2014-01-24-1.txt
klaus.har33@abcd.com
Where did you get the file? I don't think awk is 'inventing' the garbage characters, I think they are in the original file.
The original file has no such characters. When I open the file with vi everything is OK, but after the awk operation I get the problem. Please advise: is it a problem with my environment variables, or something else? I am totally confused.
It would take an incredibly sick, faulty version of awk, or bad memory errors, for awk to create all those garbage characters out of nowhere.
hexdump -C inputfile
It shows the file like this. How can I get my file back to its original condition?
00027f80 77 00 77 00 77 00 2e 00 71 00 61 00 74 00 61 00 |w.w.w...r.a.t.p.|
---------- Post updated at 12:13 PM ---------- Previous update was at 12:11 PM ----------
I ran the awk command on another server and the output is the same. I am using Fedora 17 on my desktop; is it a problem with my desktop?
^@ is a null. The original file is 16 bits per character. When viewed on an 8-bits-per-character system, the nulls do not show up. I had a file like that once. My file had been created on Windows.
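You can reproduce what vi shows without vi: cat -v prints control characters in caret notation, so a NUL byte comes out as ^@. A minimal sketch that writes the UTF-16LE bytes of a short word by hand (each ASCII letter followed by a NUL):

```shell
# Each letter is followed by a NUL byte (\0), as in UTF-16LE text.
# cat -v renders every NUL as ^@, the same thing vi displays.
printf 'l\0e\0a\0r\0n\0\n' | cat -v
```

This prints l^@e^@a^@r^@n^@, the same pattern as the garbage in the awk output.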
Your file is not ASCII at all but 16-bit Unicode, and as I suspected (but you refused to say) must have come from a Windows system. I guess your version of vi either detected and converted it, or stripped out all the nulls before displaying...
Use iconv to convert it into something your UNIX utilities can understand.
iconv -f UTF-16LE -t UTF-8 < inputfile | awk ...
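A minimal end-to-end sketch, assuming the Windows file is UTF-16LE. It builds a hypothetical sample file sample16.txt in the question's format, converts it to UTF-8 with iconv, and only then runs the -F, extraction, so the file written to 1.txt contains no NUL bytes:

```shell
# Hypothetical sample: one line in the question's format, encoded as UTF-16LE
# (two bytes per ASCII character), the way a Windows program might write it.
printf 'learnk@gmail.com, , http://www.mymail.com/files/js/js_-jenMcWHoY-_YofME9QdfIdN78Hvtfo2npip2cxdObU.js\n' |
  iconv -f UTF-8 -t UTF-16LE > sample16.txt

# Convert to UTF-8 first, then let awk split on commas.
iconv -f UTF-16LE -t UTF-8 < sample16.txt | awk -F, '{print $1}' > 1.txt

cat 1.txt
```

Opening 1.txt in vi should now show just learnk@gmail.com.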
Oh, I am sorry, it is generated by some Windows software. By the way, how can I verify that the file contains 16-bit Unicode characters? How can I check this format in Linux?
hexdump is a good way. Look at the output you posted: it shows two bytes for each letter.
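A rough check, assuming the file should be mostly ASCII text: count the NUL bytes (ordinary ASCII/UTF-8 text normally has none) and look at the first two bytes, because many Windows programs begin UTF-16LE files with the byte-order mark ff fe. Not every program writes that mark, so a missing one is inconclusive. The sketch uses od, which is POSIX, instead of hexdump; the file name utf16-check.txt is hypothetical and the file is created here only so the commands have something to inspect:

```shell
f=utf16-check.txt   # hypothetical file to inspect

# Test data: BOM (ff fe) followed by "www" and a newline in UTF-16LE.
printf '\377\376w\0w\0w\0\n\0' > "$f"

# Count NUL bytes: tr -cd keeps only the NULs, wc -c counts them.
nuls=$(tr -cd '\000' < "$f" | wc -c | tr -d ' ')
echo "NUL bytes: $nuls"

# Show the first two bytes in hex; "fffe" suggests UTF-16LE with a BOM.
bom=$(od -An -tx1 -N2 "$f" | tr -d ' \n')
echo "first two bytes: $bom"
```

On this test data it reports 4 NUL bytes and fffe as the first two bytes.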
I am really sorry, I don't understand how I can verify the two bytes per character. Can you please explain in some more detail? I am very thankful to you.
Actually, I think it's "iconv" rather than "iconf".
From your output:
00027f80 77 00 77 00 77 00 2e 00 71 00 61 00 74 00 61 00 |w.w.w...r.a.t.p.|
Normal text:
hexdump -C /etc/gentoo-release
00000000 47 65 6e 74 6f 6f 20 42 61 73 65 20 53 79 73 74 |Gentoo Base Syst|
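The same comparison can be made with one word encoded both ways. od -An -tx1 prints the bytes in hex, like the middle columns of hexdump -C: in ASCII/UTF-8 each letter is one byte, while in UTF-16LE each letter is the same byte followed by 00, as in the dump of the Windows file.

```shell
# ASCII/UTF-8: one byte per character -> 77 77 77 2e
printf 'www.' | od -An -tx1

# UTF-16LE: each byte followed by 00 -> 77 00 77 00 77 00 2e 00
printf 'www.' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1
```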
Sorry about that, it's one of those typos my brain refuses to catch.
Wonderful explanation, I understand the problem now. Thank you so much!