Hi all,
I have a file containing a large number of domains and subdomains, and I want to extract only the main domains, without duplicates.
For example:
0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia NS AS2.DNS.ASIA.CN.
www.0008.asia NS AS2.DNS.ASIA.CN.
anish.asia NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
Using this command:
awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ && !_[$1]++{print $1; tot++}END{print "Total",tot,"Domains"}' file1
I only get every unique name, subdomains included.
But I want the output to contain only the main domains:
0008.ASIA
anish.asia
Total 2 domains
Any suggestions for solving this are welcome!
joeyg
September 19, 2011, 3:55pm
2
$ echo 13.14 | awk -F'.' '{print $(NF-1)"." $NF}'
13.14
$ echo 12.13.14 | awk -F'.' '{print $(NF-1)"." $NF}'
13.14
After this, you could pipe the output through
sort -u
to list each domain only once.
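Putting the two steps together on the sample from the first post might look like the sketch below. It assumes the name is always the first whitespace-separated field and that the main domain is simply its last two labels. The function name main_domains is made up for illustration, and the lowercasing and trailing-dot removal are additions so that 0008.ASIA. and www.0008.asia count as the same domain.

```shell
# main_domains: hypothetical helper, reads records on stdin.
main_domains() {
    awk '{
        name = tolower($1)            # compare names case-insensitively
        sub(/\.$/, "", name)          # drop the trailing root dot, if any
        n = split(name, label, ".")   # split the name into its labels
        if (n >= 2) print label[n-1] "." label[n]
    }' | sort -u                      # keep each domain only once
}

# Sample records from the first post.
main_domains <<'EOF'
0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia NS AS2.DNS.ASIA.CN.
www.0008.asia NS AS2.DNS.ASIA.CN.
anish.asia NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
EOF
# prints:
# 0008.asia
# anish.asia
```

Appending "| wc -l" to the call would give the total count.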
sk1418
September 19, 2011, 4:06pm
3
dirty and NOT generic solution:
awk -F' NS' '{ gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[$1]++;}END{for (x in b)print x}' yourFile
You can use a match rule to get the start of the "main domain". Then substr() it...
(i=match($1, "[^.]+.asia")) && (d=tolower(substr($1,i))) && !a[d]++ { print d; tot++ }
END { print "Total",tot,"Domains" }
Another solution:
awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&!_[$1]++{print $1"."$2}
END{print "Total",length(_),"Domains"}' file1
sk1418:
dirty and NOT generic solution:
awk -F' NS' '{ gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[$1]++;}END{for (x in b)print x}' yourFile
Thanks a lot,
awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ { gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[$1]++;}END{for (x in b)print x}'
I used your command like this, but it sometimes skips the "www." domains entirely.
;start: 1315288329
;File created: 2011-09-06 05:52:09 IST
;Export host: 199.115.158.5
;Record count: 2330419
;Created by ANISH
$ORIGIN asia.
@ IN SOA A.COM.ANISH.INFO. NOC.ANISH.INFO. (
2008334441 ; serial
10800 ; refresh
3600 ; retry
2592000 ; expire
86400 ; minimum
)
$TTL 86400
0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia. NS AS2.DNS.ASIA.CN.
www.0008.asia. NS AS2.DNS.ASIA.CN.
anish.asia. NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
ANISH.ASIA. NS AS2.DNS.ASIA.CN.
;End of file: 1315288329
This is the exact format of the file, guys.
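One possible cause of the skipped names, offered as a guess: the length(a)==2 test keeps only names that already consist of exactly two labels, so a domain that appears in the file only as a subdomain never reaches the output. A minimal sketch with made-up records (example.asia is not from the real zone file, and only_two_label_names is a hypothetical helper):

```shell
# Filter approach: print a name only if it has exactly two labels.
only_two_label_names() {
    awk '{
        d = tolower($1)
        sub(/\.$/, "", d)                       # drop the trailing dot
        if (split(d, a, ".") == 2 && !seen[d]++) print d
    }'
}

# A two-label name passes the filter.
echo 'example.asia. NS AS2.DNS.ASIA.CN.' | only_two_label_names
# prints example.asia

# A three-label name is dropped, and nothing is printed for its parent.
echo 'www.example.asia. NS AS2.DNS.ASIA.CN.' | only_two_label_names
```

Reducing every name to its last two labels, instead of filtering on the label count, avoids this loss.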
awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&!_[$1]++{print $1"."$2}END{print "Total",length(_),"Domains"}' filename
either this or
Any ideas, guys? I have been stuck on this script for nearly a week, still with no luck.
Mine worked fine. Did you not try it?
[mute@geek ~/asia]$ awk '(i=match($1,/[^.]+\.asia/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' zone
0008.asia
anish.asia
Total 2 Domains
neutronscott:
Mine worked fine. Did you not try it?
[mute@geek ~/asia]$ awk '(i=match($1,/[^.]+\.asia/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' zone
0008.asia
anish.asia
Total 2 Domains
It works, but what I posted here was only a sample zone file. Sorry, that was my mistake.
Suppose my zone file contains entries in mixed case; your code won't work then, right?
But with this case-insensitive bracket pattern it works. Thanks a lot for your help, man.
awk '(i=match($1,/[^.]+\.[Aa][Ss][Ii][Aa]/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' filename
How about this:
awk -F'[. ]' 'tolower($3)=="asia" {$1=$2;$2=$3} NF>3&&$2=="asia"&&!_[tolower($1)]++{print $1"."$2}
END{print "Total",length(_),"Domains"}' file1
It works only when the TLDs are lowercase. If a zone file contains data in any other case, your code won't include it in the count.
anishkumarv:
But with this case-insensitive bracket pattern it works. Thanks a lot for your help, man.
awk '(i=match($1,/[^.]+\.[Aa][Ss][Ii][Aa]/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' filename
Oh, my mistake. I see the problem now. That is a good solution. Alternatively, use gawk's IGNORECASE=1 like you had before:
gawk '...' IGNORECASE=1 file
Or move the tolower() call to the front:
awk '(d=tolower($1))&&(i=match(d,/[^.]+\.asia/))&&(d=substr(d,i,RLENGTH))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' zone
Many ways...
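As a sketch of the tolower()-first variant, spread over several lines: it should behave the same in any POSIX awk, since it does not rely on gawk's IGNORECASE. The function name asia_domains and the variable names are illustrative, and the sample records are adapted from the zone file above.

```shell
# asia_domains: hypothetical helper, reads zone records on stdin.
asia_domains() {
    awk '{
        d = tolower($1)                     # lowercase before matching
        if (match(d, /[^.]+\.asia/)) {      # leftmost "label.asia"
            d = substr(d, RSTART, RLENGTH)  # keep only the matched part
            if (!seen[d]++) { print d; tot++ }
        }
    }
    END { print "Total", tot + 0, "Domains" }'
}

asia_domains <<'EOF'
0008.ASIA. NS AS2.DNS.ASIA.CN.
www.0008.asia. NS AS2.DNS.ASIA.CN.
anish.asia. NS AS2.DNS.ASIA.CN.
ANISH.ASIA. NS AS2.DNS.ASIA.CN.
EOF
# prints:
# 0008.asia
# anish.asia
# Total 2 Domains
```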
---------- Post updated at 04:18 PM ---------- Previous update was at 03:57 PM ----------
I found that 'scott.asia.asia' would not match correctly. I rewrote it to be generic, so that "asia" is not even a part of the code. I think this will work best in the future.
[mute@geek ~/asia]$ cat scr
#!/usr/bin/awk -f
$1 ~ /^[^;@$]+.+\..+/{d=tolower($1);gsub(/\.$/,"",d);n=split(d,a,".");d=a[n-1]"."a[n];if(!_[d]++){tot++;print d}}
END{print "Total",tot,"Domains"}
[mute@geek ~/asia]$ ./scr zone
0008.asia
anish.asia
asia.asia
Total 3 Domains
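To see why the rewrite matters, the sketch below runs both ideas on a single record. The helper names by_match and by_last_two are made up; the first mirrors the match()-based one-liner and the second the last-two-labels logic of the script above.

```shell
# Leftmost "label.asia" match, as in the earlier one-liner.
by_match() {
    awk '{
        d = tolower($1)
        if (match(d, /[^.]+\.asia/)) print substr(d, RSTART, RLENGTH)
    }'
}

# Last two labels of the name, with no TLD hard-coded.
by_last_two() {
    awk '{
        d = tolower($1)
        sub(/\.$/, "", d)            # drop the trailing dot
        n = split(d, a, ".")
        if (n >= 2) print a[n-1] "." a[n]
    }'
}

echo 'scott.asia.asia. NS AS2.DNS.ASIA.CN.' | by_match      # prints scott.asia
echo 'scott.asia.asia. NS AS2.DNS.ASIA.CN.' | by_last_two   # prints asia.asia
```

The match() version stops at the first "label.asia" it finds, while the split() version always keeps the rightmost two labels.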