Using AWK, how is this possible?

Hi all,

I have a file with N domains and subdomains. From it, I want to extract only the main domains, without duplicates.

For example:

0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia NS AS2.DNS.ASIA.CN.
www.0008.asia NS AS2.DNS.ASIA.CN.
anish.asia NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN

Using this command:

awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ && !_[$1]++{print $1; tot++}END{print "Total",tot,"Domains"}' file1

I can only get output like this:

But I want the output to contain only the main domains.

Any suggestions to solve this are welcome!

$ echo 13.14 | awk -F'.' '{print $(NF-1)"." $NF}'
13.14

$ echo 12.13.14 | awk -F'.' '{print $(NF-1)"." $NF}'
13.14

After this, you could pipe the result through

sort -u

to print each domain only once.
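A minimal sketch of that pipeline, assuming the domain is always the first whitespace-separated column. Applied to a whole zone-file line, `-F'.'` also splits the NS target, so `0008.ASIA. NS AS2.DNS.ASIA.CN.` would print `CN.`; and a name with a trailing root dot such as `0008.ASIA.` alone would print `ASIA.`. The sketch therefore splits only `$1`, strips the trailing dot, and lowercases so `ANISH.ASIA.` and `anish.asia` collapse together. The function name `main_domains` is hypothetical.

```shell
#!/bin/sh
# main_domains (hypothetical name): print the last two labels of the first
# column, lowercased and without the trailing root dot; sort -u removes dups.
main_domains() {
    awk '
    {
        d = tolower($1)
        sub(/\.$/, "", d)                 # "0008.ASIA." -> "0008.asia"
        n = split(d, label, ".")
        if (n >= 2)                       # skip blank lines and bare labels
            print label[n-1] "." label[n]
    }' "$@" | sort -u
}

# Usage with the sample data from the question:
main_domains <<'EOF'
0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia NS AS2.DNS.ASIA.CN.
www.0008.asia NS AS2.DNS.ASIA.CN.
anish.asia NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
EOF
```

Doing the deduplication with `sort -u` keeps the awk part trivial; the trade-off is that the output comes back sorted rather than in file order.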

A quick-and-dirty (and NOT generic) solution:

awk -F' NS' '{ gsub(/\.$/,"",$1); n=split($1,a,".") } n==2 { b[$1]++ } END { for (x in b) print x }' yourFile
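Why this approach can miss domains (as reported further down the thread): it only prints names that have exactly two labels once the trailing dot is removed, so a domain that appears in the file only as a subdomain, e.g. `www.foo.asia`, never shows up. A minimal portable sketch of the same filter makes this visible. It lowercases the name (a change from the one-liner above) and uses `split()`'s return value rather than `length()` on an array, which is not POSIX. `two_label_names` is a hypothetical name.

```shell
#!/bin/sh
# two_label_names (hypothetical name): keep only names with exactly two labels.
two_label_names() {
    awk -F' NS' '
    {
        d = tolower($1)
        sub(/\.$/, "", d)                 # drop the trailing root dot
        if (split(d, label, ".") == 2)    # exactly two labels, e.g. anish.asia
            seen[d]++
    }
    END { for (x in seen) print x }' "$@"
}

# "foo.asia" occurs only as www.foo.asia, so it is silently dropped:
two_label_names <<'EOF'
anish.asia. NS AS2.DNS.ASIA.CN.
www.foo.asia. NS AS2.DNS.ASIA.CN.
EOF
```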

You can use match() to find where the "main domain" starts, then extract it with substr():

(i=match($1, /[^.]+\.asia/)) && (d=tolower(substr($1,i,RLENGTH))) && !a[d]++ { print d; tot++ }
END { print "Total",tot,"Domains" }
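The detail that matters here is the third argument to `substr()`. `match()` sets `RSTART` and `RLENGTH`; without `RLENGTH`, `substr()` returns everything from the match to the end of the field, trailing dot included, so `0008.asia.` and `0008.asia` would be counted as different domains. This sketch (the helper name `show_substr` is hypothetical) prints both variants side by side. The regex also escapes the dot so it only matches a literal `.`.

```shell
#!/bin/sh
# show_substr (hypothetical name): compare substr() with and without RLENGTH.
show_substr() {
    awk '{
        if (i = match($1, /[^.]+\.asia/))   # i == RSTART, 0 when no match
            print substr($1, i) "|" substr($1, i, RLENGTH)
    }' "$@"
}

echo 'www.0008.asia. NS AS2.DNS.ASIA.CN.' | show_substr
# prints: 0008.asia.|0008.asia
```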

Another solution:

awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&!_[$1]++{print $1"."$2}
END{print "Total",length(_),"Domains"}' file1
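`IGNORECASE` and `length()` on an array are gawk extensions. Below is a sketch of the same field-splitting idea for a POSIX awk, assuming it is acceptable to lowercase the whole line. With `-F'[. ]'`, `ns1.0008.asia. NS ...` splits into `$1="ns1"`, `$2="0008"`, `$3="asia"`, so the rule shifts the labels left when the TLD sits in `$3`. The `NF > 3` guard (borrowed from a later reply in this thread) skips the `$ORIGIN asia.` line, which otherwise would be printed as `$origin.asia`. `asia_domains` is a hypothetical name.

```shell
#!/bin/sh
# asia_domains (hypothetical name): POSIX-awk variant of the -F'[. ]' approach.
asia_domains() {
    awk -F'[. ]' '
    { $0 = tolower($0) }                    # lowercase; fields are re-split
    $3 == "asia" { $1 = $2; $2 = $3 }       # one-level subdomain: shift left
    NF > 3 && $2 == "asia" && !seen[$1]++ { print $1 "." $2; tot++ }
    END { print "Total", tot + 0, "Domains" }' "$@"
}

asia_domains <<'EOF'
$ORIGIN asia.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia. NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
ANISH.ASIA. NS AS2.DNS.ASIA.CN.
EOF
```

A plain counter replaces `length(_)`; `tot + 0` makes the total print as `0` rather than an empty string when nothing matches.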

Thanks a lot,

awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ { gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[$1]++;}END{for (x in b)print x}'

I used your command like this, but it sometimes skips the "www." domains entirely.

;start: 1315288329
;File created: 2011-09-06 05:52:09 IST
;Export host: 199.115.158.5
;Record count: 2330419
;Created by ANISH

$ORIGIN asia.
@ IN SOA A.COM.ANISH.INFO. NOC.ANISH.INFO. (
                                    2008334441 ; serial
                                    10800 ; refresh
                                    3600 ; retry
                                    2592000 ; expire
                                    86400 ; minimum
                                    )
$TTL 86400

0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia. NS AS2.DNS.ASIA.CN.
www.0008.asia. NS AS2.DNS.ASIA.CN.
anish.asia. NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
ANISH.ASIA. NS AS2.DNS.ASIA.CN.

;End of file: 1315288329

This is the exact format of the file, guys.

awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&!_[$1]++{print $1"."$2}END{print "Total",length(_),"Domains"}' filename 

Either this or

Any ideas, guys? I've been stuck on this script for nearly a week and still have no luck.

Mine worked fine. Did you not try it?

[mute@geek ~/asia]$ awk '(i=match($1,/[^.]+\.asia/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' zone

0008.asia
anish.asia
Total 2 Domains

It works, but what I posted here was only a sample zone file, dude. Sorry, that was my mistake.

Suppose my zone file contains this: your code won't work then, right?

In that case it fails, but using this mixed-case pattern it works. Thanks a lot for your help, man!

awk '(i=match($1,/[^.]+\.[Aa][Ss][Ii][Aa]/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' filename

How about this:

awk -F'[. ]' 'tolower($3)=="asia" {$1=$2;$2=$3} NF>3&&$2=="asia"&&!_[tolower($1)]++{print $1"."$2}
END{print "Total",length(_),"Domains"}' file1

It works only if the TLDs are lowercase. Suppose a zone file contains

this kind of data; then your code won't include it in the count :(

Oh, my folly; I see the problem now. That is a good solution, or you can use gawk's IGNORECASE=1 as you had before:

gawk '...' IGNORECASE=1 file

Or call tolower() first, so the match runs on the lowercased name:

awk '(d=tolower($1))&&(i=match(d,/[^.]+\.asia/))&&(d=substr(d,i,RLENGTH))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' zone
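A self-contained sketch of the tolower-first variant, wrapped in a hypothetical `first_match_domain` function so it can be tried on mixed-case input. Because `match()` returns the leftmost match, a name like `scott.asia.asia` comes out as `scott.asia`, which is the limitation raised in the follow-up post.

```shell
#!/bin/sh
# first_match_domain (hypothetical name): lowercase first, then match().
first_match_domain() {
    awk '
    {
        d = tolower($1)
        if (i = match(d, /[^.]+\.asia/)) {
            d = substr(d, i, RLENGTH)     # only the matched "label.asia"
            if (!seen[d]++) { print d; tot++ }
        }
    }
    END { print "Total", tot + 0, "Domains" }' "$@"
}

first_match_domain <<'EOF'
ANISH.ASIA. NS AS2.DNS.ASIA.CN.
www.0008.asia NS AS2.DNS.ASIA.CN.
scott.asia.asia NS AS2.DNS.ASIA.CN.
EOF
```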

Many ways...

---------- Post updated at 04:18 PM ---------- Previous update was at 03:57 PM ----------

I found that 'scott.asia.asia' would not match correctly, so I rewrote it to be generic: "asia" is not even a part of the code. I think this will work best in the future.

[mute@geek ~/asia]$ cat scr
#!/usr/bin/awk -f
$1 ~ /^[^;@$]+.+\..+/{d=tolower($1);gsub(/\.$/,"",d);n=split(d,a,".");d=a[n-1]"."a[n];if(!_[d]++){tot++;print d}}
END{print "Total",tot,"Domains"}
[mute@geek ~/asia]$ ./scr zone
0008.asia
anish.asia
asia.asia
Total 3 Domains
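The same generic logic, spread out with comments and wrapped in a shell function (the name `registrable` is hypothetical), keeping the script's filter regex as-is. One caveat: taking the last two labels assumes every TLD is a single label. Under a suffix such as `co.uk` it would print `co.uk` for every domain, so that case would need a public-suffix list instead.

```shell
#!/bin/sh
# registrable (hypothetical name): last two labels of every owner name.
registrable() {
    awk '
    $1 ~ /^[^;@$]+.+\..+/ {               # skip comments, @ and $ directives
        d = tolower($1)
        sub(/\.$/, "", d)                  # drop the trailing root dot
        n = split(d, a, ".")
        d = a[n-1] "." a[n]                # keep only the last two labels
        if (!seen[d]++) { tot++; print d }
    }
    END { print "Total", tot + 0, "Domains" }' "$@"
}

registrable <<'EOF'
;Created by ANISH
$ORIGIN asia.
0008.ASIA. NS AS2.DNS.ASIA.CN.
www.0008.asia. NS AS2.DNS.ASIA.CN.
scott.asia.asia. NS AS2.DNS.ASIA.CN.
EOF
```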