How is this possible using AWK?

anishkumarv · September 19, 2011, 3:38pm

Hi all,

I have a file containing N domains and subdomains, and from it I want to extract only the main domains, without duplicates.

For example:

0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia NS AS2.DNS.ASIA.CN.
www.0008.asia NS AS2.DNS.ASIA.CN.
anish.asia NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN

Using this command:

awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ && !_[$1]++{print $1; tot++}END{print "Total",tot,"Domains"}' file1

I can only get output like this:

But I want the output to contain only the main domains.

Any suggestions to solve this are welcome!

joeyg · September 19, 2011, 3:55pm

$ echo 13.14 | awk -F'.' '{print $(NF-1)"." $NF}'
13.14

$ echo 12.13.14 | awk -F'.' '{print $(NF-1)"." $NF}'
13.14

After this, you could do a

sort -u

to get data only once.
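The two steps above can be combined into one pipeline. This is only a minimal sketch: it assumes the domain is the first whitespace-separated column, and the function name `main_domains`, the tolower() call, and the trailing-dot strip are additions for illustration, not part of the post above.

```shell
# main_domains is a hypothetical helper name; it reads zone lines on stdin.
main_domains() {
    # Lowercase column 1, drop a trailing dot, then keep only the
    # last two dot-separated labels; sort -u removes duplicates.
    awk '{ d = tolower($1); sub(/\.$/, "", d)
           n = split(d, p, ".")
           if (n >= 2) print p[n-1] "." p[n] }' | sort -u
}

printf '%s\n' \
    '0008.ASIA. NS AS2.DNS.ASIA.CN.' \
    'ns1.0008.asia NS AS2.DNS.ASIA.CN.' \
    'anish.asia NS AS2.DNS.ASIA.CN.' \
    'ns2.anish.asia NS AS2.DNS.ASIA.CN' | main_domains
# prints:
# 0008.asia
# anish.asia
```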

sk1418 · September 19, 2011, 4:06pm

A dirty and NOT generic solution:

awk -F' NS' '{ gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[$1]++;}END{for (x in b)print x}' yourFile

neutronscott · September 19, 2011, 4:20pm

You can use a match rule to get the start of the "main domain". Then substr() it...

(i=match($1, "[^.]+.asia")) && (d=tolower(substr($1,i))) && !a[d]++ { print d; tot++ }
END { print "Total",tot,"Domains" }
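To see what match() and substr() do here, the sketch below prints the match position, the match length, and the extracted text for one input line. It assumes a regex literal with the dot escaped and an explicit RLENGTH argument, which differ slightly from the snippet above; `locate_main` is a hypothetical name used only for illustration.

```shell
# locate_main is a hypothetical helper; it reads zone lines on stdin.
locate_main() {
    # match() returns the start index of the leftmost match (0 if none)
    # and sets RLENGTH to the match length (-1 if none).
    awk '{ i = match($1, /[^.]+\.asia/)
           print i, RLENGTH, substr($1, i, RLENGTH) }'
}

echo 'www.0008.asia. NS AS2.DNS.ASIA.CN.' | locate_main
# prints: 5 9 0008.asia
```

Passing RLENGTH to substr() cuts off anything after the match, such as the trailing dot of a fully qualified name.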

Chubler_XL · September 19, 2011, 5:39pm

Another solution:

awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&!_[$1]++{print $1"."$2}
END{print "Total",length(_),"Domains"}' file1

anishkumarv · September 19, 2011, 9:43pm

Thanks a lot,

awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ { gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[$1]++;}END{for (x in b)print x}'

I used your command like this, but it sometimes skips "www." domains entirely.
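One likely explanation, offered as an assumption about the data rather than a confirmed diagnosis: a filter that only accepts names with exactly two labels never sees a domain that appears in the file solely as a subdomain such as www.something.asia. The sketch below reproduces that behaviour with made-up input; `two_label_only` and the name only-sub.asia are hypothetical.

```shell
# two_label_only is a hypothetical helper mimicking the two-label filter.
two_label_only() {
    awk '{ d = $1; sub(/\.$/, "", d)      # drop a trailing dot
           n = split(d, a, ".")           # count the labels
           if (n == 2) seen[d]++ }        # keep bare two-label names only
         END { for (x in seen) print x }'
}

printf '%s\n' \
    'www.only-sub.asia NS AS2.DNS.ASIA.CN.' \
    'anish.asia NS AS2.DNS.ASIA.CN.' | two_label_only
# prints: anish.asia   (only-sub.asia is never reported)
```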

anishkumarv · September 22, 2011, 2:38pm

;start: 1315288329
;File created: 2011-09-06 05:52:09 IST
;Export host: 199.115.158.5
;Record count: 2330419
;Created by ANISH

$ORIGIN asia.
@ IN SOA A.COM.ANISH.INFO. NOC.ANISH.INFO. (
                                    2008334441 ; serial
                                    10800 ; refresh
                                    3600 ; retry
                                    2592000 ; expire
                                    86400 ; minimum
                                    )
$TTL 86400

0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia. NS AS2.DNS.ASIA.CN.
www.0008.asia. NS AS2.DNS.ASIA.CN.
anish.asia. NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
ANISH.ASIA. NS AS2.DNS.ASIA.CN.

;End of file: 1315288329

This is the exact format of the file, guys.

awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&!_[$1]++{print $1"."$2}END{print "Total",length(_),"Domains"}' filename

either this or

Any ideas, guys? I have been stuck on this script for nearly a week, still with no luck.

neutronscott · September 22, 2011, 2:44pm

Mine worked fine. Did you not try it?

[mute@geek ~/asia]$ awk '(i=match($1,/[^.]+\.asia/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' zone

0008.asia
anish.asia
Total 2 Domains

anishkumarv · September 22, 2011, 3:03pm

neutronscott:

Mine worked fine. Did you not try it?

[mute@geek ~/asia]$ awk '(i=match($1,/[^.]+\.asia/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' zone

0008.asia
anish.asia
Total 2 Domains

It works, but I only posted a sample zone file here. Sorry, that was my mistake.

Suppose this is in my zone file; then your code won't work, right?

But using this mixed-case pattern it works. Thanks a lot for your help, man.

awk '(i=match($1,/[^.]+\.[Aa][Ss][Ii][Aa]/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' filename

Chubler_XL · September 22, 2011, 3:37pm

How about this:

awk -F'[. ]' 'tolower($3)=="asia" {$1=$2;$2=$3} NF>3&&$2=="asia"&&!_[tolower($1)]++{print $1"."$2}
END{print "Total",length(_),"Domains"}' file1

anishkumarv · September 22, 2011, 3:43pm

It works only when the TLDs are lowercase. Suppose a zone file contains

this kind of data; then your code won't include it in the count.

neutronscott · September 22, 2011, 4:18pm

anishkumarv:

But using this mixed-case pattern it works. Thanks a lot for your help, man.
awk '(i=match($1,/[^.]+\.[Aa][Ss][Ii][Aa]/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' filename

Oh, my mistake. I see the problem now. That is a good solution; alternatively, use gawk's IGNORECASE=1 as you had before:

gawk '...' IGNORECASE=1 file

Or move the tolower() call so that it comes first:

awk '(d=tolower($1))&&(i=match(d,/[^.]+\.asia/))&&(d=substr(d,i,RLENGTH))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}' zone

Many ways...

---------- Post updated at 04:18 PM ---------- Previous update was at 03:57 PM ----------

I found that 'scott.asia.asia' would not match correctly. I rewrote it to be generic, in that "asia" is not even a part of the code. This will work best, I think, in the future.

[mute@geek ~/asia]$ cat scr
#!/usr/bin/awk -f
$1 ~ /^[^;@$]+.+\..+/{d=tolower($1);gsub(/\.$/,"",d);n=split(d,a,".");d=a[n-1]"."a[n];if(!_[d]++){tot++;print d}}
END{print "Total",tot,"Domains"}
[mute@geek ~/asia]$ ./scr zone
0008.asia
anish.asia
asia.asia
Total 3 Domains
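The 'scott.asia.asia' case can be checked by running both approaches on the same line. This is a sketch under stated assumptions: `by_match` and `by_split` are hypothetical names, and each is a cut-down version of the corresponding idea from the posts above (no de-duplication or totals).

```shell
# Regex approach: takes the leftmost "<label>.asia" it can find.
by_match() {
    awk '(d = tolower($1)) && (i = match(d, /[^.]+\.asia/)) {
             print substr(d, i, RLENGTH) }'
}

# Split approach: always takes the last two labels, whatever the TLD is.
by_split() {
    awk '{ d = tolower($1); sub(/\.$/, "", d)
           n = split(d, a, ".")
           if (n >= 2) print a[n-1] "." a[n] }'
}

echo 'scott.asia.asia. NS AS2.DNS.ASIA.CN.' | by_match   # prints: scott.asia
echo 'scott.asia.asia. NS AS2.DNS.ASIA.CN.' | by_split   # prints: asia.asia
```

The regex stops at the first ".asia" it meets, while counting labels from the right does not depend on the TLD name at all.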