Mr_47
August 9, 2011, 12:39am
1
Hi to all,
I have this content in the file http.log.20110808.gz:
[07/Aug/2011:07:37:39 +0800] mail1 httpd[14646]: Account Notice: close [192.168.10.128] igchung@abc.com 2011/8/7 7:37:36 0:00:03 0 0 1
[07/Aug/2011:07:37:44 +0800] mail1 httpd[14647]: Account Information: login [192.168.10.131:17187] sastria9@abc.com proxy sid=gFp4DLm5HnU
[07/Aug/2011:07:37:44 +0800] mail1 httpd[14648]: Account Notice: close [192.168.10.131] sastria9@abc.com 2011/8/7 7:37:44 0:00:00 0 0 1
[07/Aug/2011:07:37:45 +0800] mail1 httpd[14647]: Account Information: login [192.168.10.131:17194] sastria9@abc.com proxy sid=gSiaecABc/E
[07/Aug/2011:07:38:37 +0800] mail1 httpd[14646]: Account Information: login [192.168.10.129:2063] pntcdor1@abc.com proxy sid=ZGhAdmqmz3k
[07/Aug/2011:07:38:37 +0800] mail1 httpd[14647]: Account Notice: close [192.168.10.129] pntcdor1@abc.com 2011/8/7 7:38:37 0:00:00 0 0 1
[07/Aug/2011:07:38:38 +0800] mail1 httpd[14646]: Account Information: login [192.168.10.129:2071] pntcdor1@abc.com proxy sid=PtwbGuIk+I4
[07/Aug/2011:07:38:48 +0800] mail1 httpd[14646]: Account Information: login [192.168.10.130:14272] visnet@abc.com proxy sid=4W6xBKPXXvk
[07/Aug/2011:07:38:48 +0800] mail1 httpd[14647]: Account Notice: close [192.168.10.130] visnet@abc.com 2011/8/7 7:38:48 0:00:00 0 0 1
[07/Aug/2011:07:38:48 +0800] mail1 httpd[14646]: Account Information: login [192.168.10.130:14279] visnet@abc.com proxy sid=/qenNd/tps8
[07/Aug/2011:07:38:59 +0800] mail1 httpd[14646]: Account Notice: close [192.168.10.130] visnet@abc.com 2011/8/7 7:38:48 0:00:11 0 0 1
[07/Aug/2011:07:39:06 +0800] mail1 httpd[14647]: Account Information: login [192.168.10.130:14367] animan86@abc.com proxy sid=VdYyCOMtPsQ
How can I generate a new file with the content below from the file above?
igchung@abc.com
sastria9@abc.com
pntcdor1@abc.com
visnet@abc.com
animan86@abc.com
With grep/sort/uniq:
grep -o "[^ ]*@[^ ]*" http.log.20110808.gz | sort | uniq
With awk:
awk ' /@/ { sub("^.*] ",""); sub(" .*", ""); if(!($0 in E)) print; E[$0]} ' http.log.20110808.gz
Note: if the file is gzipped, as the extension seems to imply, you may need to pipe the output of gzip -dc into these solutions.
Mr_47
August 9, 2011, 1:46am
3
Hi,
I am using Solaris 10 for this. Is this right?
[root] grep -o "[^ ]*@[^ ]*" http.log.20110801.gz | sort | uniq >1.out
grep: illegal option -- o
Usage: grep -hblcnsviw pattern file . . .
[root] grep -o "[^ ]*@[^ ]*" http.log.20110801.gz | sort
grep: illegal option -- o
Usage: grep -hblcnsviw pattern file . . .
[root] awk ' /@/ { sub("^.*] ",""); sub(" .*", ""); if(!($0 in E)) print; E[$0]} ' http.log.20110801.gz
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1
yazu
August 9, 2011, 1:51am
4
Try nawk instead of awk. On Solaris, the default /usr/bin/awk is the old awk, which does not support sub(); nawk does.
Mr_47
August 9, 2011, 2:07am
5
Still can't. The result is unreadable (binary).
Mr_47
August 9, 2011, 2:32am
7
Where should I put the gzip -d?
Like this?
[root|reports.tm.net.my:/data2/mail1/201108] grep -o "[^ ]*@[^ ]*" http.log.20110801.gz | sort | uniq | gzip -d
grep: illegal option -- o
Usage: grep -hblcnsviw pattern file . . .
gzip: stdin: unexpected end of file
[root|reports.tm.net.my:/data2/mail1/201108] awk ' /@/ { sub("^.*] ",""); sub(" .*", ""); if(!($0 in E)) print; E[$0]} ' http.log.20110801.gz | gzip -d
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1
gzip: stdin: unexpected end of file
yazu
August 9, 2011, 3:05am
8
zcat http.log.20110801.gz | nawk...
OR
gunzip -c http.log.20110801.gz | nawk...
OR
gzip -dc http.log.20110801.gz | nawk ...
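Put together, the order is: decompress to stdout first, then filter. The sketch below is only an illustration on a small made-up sample (the file names sample.log.gz and emails.out are hypothetical), and it calls awk so it runs on most systems; on Solaris 10, substitute nawk as suggested above.

```shell
# Build a tiny gzipped sample from three of the log lines in the first post.
cat > sample.log <<'EOF'
[07/Aug/2011:07:37:39 +0800] mail1 httpd[14646]: Account Notice: close [192.168.10.128] igchung@abc.com 2011/8/7 7:37:36 0:00:03 0 0 1
[07/Aug/2011:07:37:44 +0800] mail1 httpd[14647]: Account Information: login [192.168.10.131:17187] sastria9@abc.com proxy sid=gFp4DLm5HnU
[07/Aug/2011:07:37:44 +0800] mail1 httpd[14648]: Account Notice: close [192.168.10.131] sastria9@abc.com 2011/8/7 7:37:44 0:00:00 0 0 1
EOF
gzip -f sample.log            # produces sample.log.gz

# gzip -dc writes the decompressed text to stdout; awk then
#  1. strips everything up to the last "] ",
#  2. strips everything after the first remaining space,
#  3. prints each address only the first time it is seen.
gzip -dc sample.log.gz |
  awk '/@/ { sub(/^.*\] /, ""); sub(/ .*/, ""); if (!($0 in seen)) print; seen[$0] }' > emails.out

cat emails.out
```

Placing gzip -d at the end of the pipeline, as in the attempt above, cannot work, because by then grep or awk has already read the compressed bytes.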
You can also use zgrep (the -o option needs a grep that supports it, such as GNU grep):
/user/ahamed> zgrep -o "[^ ]*@[^ ]*" http.log.20110808.gz
igchung@abc.com
sastria9@abc.com
sastria9@abc.com
sastria9@abc.com
pntcdor1@abc.com
pntcdor1@abc.com
pntcdor1@abc.com
visnet@abc.com
visnet@abc.com
visnet@abc.com
visnet@abc.com
animan86@abc.com
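Using sed (-n with the p flag prints only lines where an address was found, and the pattern is not limited to .com domains):
gzip -dc http.log.20110801.gz | sed -n 's/.*] \([^ ]*@[^ ]*\) .*/\1/p' | sort | uniq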
regards,
Ahamed
Mr_47
August 16, 2011, 5:18am
11
My god, you guys are pros. It works now, every one of them. Thanks, guys.
---------- Post updated at 05:18 PM ---------- Previous update was at 03:44 PM ----------
Another question: I generated this file, a2.out. How can I generate another file from it that lists only the unique email addresses?
more a2.out
116borrul@bx.com
133fird@b.com
147aedzra@.com
152najib@bx.com
154rshakir@bluehyppo.com
154zadzli@bc.com
155buddin@bx.com
Access to this service for 116borrul@bx.com
Access to this service for 133fird@b.com
Access to this service for 147aedzra@b.com
Access to this service for 152najib@bx.com
Access to this service for 154rshakir@b.com
Access to this service for 154zadzli@bc.com
Access to this service for 155buddin@bx.com
It should look like this:
more uniqueemail.out
116borrul@bx.com
133fird@b.com
147aedzra@.com
152najib@bx.com
154rshakir@bluehyppo.com
154zadzli@bc.com
155buddin@bx.com
Try this:
gzip -dc http.log.20110808.gz | nawk ' /@/ { sub("^.*] ",""); sub(" .*", ""); if(!($0 in E)) print; E[$0]} ' > uniqueemail.out
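If only a2.out is at hand, the same result can probably be reached from that file directly. This is a minimal sketch, assuming every line of a2.out ends with the email address (as in the listing above); the sample below is a shortened copy of that listing.

```shell
# Shortened sample of a2.out: bare addresses plus "Access to this service for ..." lines.
cat > a2.out <<'EOF'
116borrul@bx.com
133fird@b.com
Access to this service for 116borrul@bx.com
Access to this service for 133fird@b.com
EOF

# $NF is the last whitespace-separated field, i.e. the address on both kinds of line.
# sort -u then sorts the addresses and drops the duplicates.
awk '{ print $NF }' a2.out | sort -u > uniqueemail.out

cat uniqueemail.out
```

Note that addresses differing in the domain (such as 154rshakir@bluehyppo.com and 154rshakir@b.com in the listing) are distinct strings, so both would be kept.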
Mr_47
August 23, 2011, 1:19am
13
My god, it works perfectly. Thank you so much.
Mr_47
August 25, 2011, 12:04am
14
Hi, I have another question.
How can I remove the domain part (@something.com) from the entries in a file like this one?
-bash-3.00# more 30days.out
user/ris1@yiris.net/INBOX
user/ris2@giris.net/INBOX
user/ris3@iris.net/INBOX
user/ris4@hiris.net/INBOX
user/str1@eamyx.com/INBOX
user/str2@amyx.com/INBOX
user/tg4@titangroup.com/INBOX
The output should look like this:
-bash-3.00# more 30days.out
user/ris1/INBOX
user/ris2/INBOX
user/ris3/INBOX
user/ris4/INBOX
user/str1/INBOX
user/str2/INBOX
user/tg4/INBOX
$ sed 's/@.*\//\//' test
user/ris1/INBOX
user/ris2/INBOX
user/ris3/INBOX
user/ris4/INBOX
user/str1/INBOX
user/str2/INBOX
user/tg4/INBOX
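One caveat with the command above: `.*` is greedy, so `@.*\/` runs to the last slash on the line. That makes no difference for the entries shown, but if a path ever went deeper than INBOX (an assumption; the entry with /Sent below is hypothetical), the intermediate folder would be removed as well. A variant that stops at the first slash after the @ might look like this:

```shell
# Sample input; the last entry with a deeper path is a hypothetical case.
cat > 30days.out <<'EOF'
user/ris1@yiris.net/INBOX
user/tg4@titangroup.com/INBOX
user/admin/INBOX
user/str1@eamyx.com/INBOX/Sent
EOF

# Delete "@" plus everything up to, but not including, the next "/".
# Lines without "@" pass through unchanged.
sed 's/@[^/]*//' 30days.out > 30days.clean

cat 30days.clean
```

Writing to a second file (30days.clean is a made-up name) avoids overwriting the input while sed is still reading it.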
Mr_47
August 26, 2011, 12:17am
16
Works perfectly. Thank you so much, you have helped me a lot.
Mr_47
September 11, 2011, 1:33pm
17
Hi,
I need help with another question about extracting selected information from one file into a new file, based on some conditions.
I have a file, a.out, that lists 3000 user email addresses in the pattern below. How can I extract only the 1000 addresses in the domain @titangroup.com into a new file, b.out?
more a.out
user/admin/INBOX
user/ris1@iris.net/INBOX
user/ris2@iris.net/INBOX
user/ris3@iris.net/INBOX
user/ris4@iris.net/INBOX
user/str1@streamyx.com/INBOX
user/str2@streamyx.com/INBOX
user/str3@streamyx.com/INBOX
user/str4@streamyx.com/INBOX
user/tg1@titangroup.com/INBOX
user/tg2@titangroup.com/INBOX
user/tg3@titangroup.com/INBOX
user/tg4@titangroup.com/INBOX
user/tmnet1/INBOX
user/tmnet2/INBOX
user/tmnet3/INBOX
user/tmnet4/INBOX
The output should look like this (1000 user email addresses should be listed), e.g.:
more b.out
user/tg1@titangroup.com/INBOX-----> e.g number 1
user/tg2@titangroup.com/INBOX
.
.
.
user/tg3@titangroup.com/INBOX
user/tg1000@titangroup.com/INBOX -------->e.g number 1000
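One possible approach, offered as a sketch rather than a tested answer for the real file: since every wanted line contains the literal text @titangroup.com followed by a slash, a plain grep with a basic pattern (no -o needed, so it should also work with the Solaris grep) can select them. The sample below is a shortened copy of the a.out listing.

```shell
# Shortened sample of a.out from the post above.
cat > a.out <<'EOF'
user/admin/INBOX
user/ris1@iris.net/INBOX
user/str1@streamyx.com/INBOX
user/tg1@titangroup.com/INBOX
user/tg2@titangroup.com/INBOX
user/tmnet1/INBOX
EOF

# Keep only lines containing "@titangroup.com/".
# The dot is escaped so it matches a literal "." and not any character;
# the trailing slash keeps a longer domain such as titangroup.com.my out.
grep '@titangroup\.com/' a.out > b.out

# Count the selected lines to check against the expected total.
wc -l < b.out
```

On the real 3000-line file, the count printed by wc -l should come out at 1000 if the figure in the question is right.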