Hi All,
I have a couple of ASCII files with the following data:
File 1
#lport1:dc1:lport2:dc2 - All records were delimited by :
6300:ADEF12:6305:ATNE59
3411:EGFE31:3499:GDEF21
. . . .
. . . .
total of 55,000 Records
File 2
#seqno:lport1:id:dlc1:vid:lport2:nni:dc2:ci - All records delimited by :
60568:3411:98:EGFE31:965:3499:3799:GDEF21:432
. . . . . . . . .
. . . . . . . . .
total of 58,000 Records
I need to compare the lport1, dc1, lport2 and dc2 values of file1 with the lport1, dc1, lport2 and dc2 values of file2 and, if there is a match, write the entire line from file2 to another file. I tried writing a Perl script under solaris 2.5.8, which took almost 6 hours to finish.
Could any of you help me get this task to run much faster, i.e. in less than 15 minutes, using an awk/shell script?
Thanks in advance.
Assuming:
File 2
#seqno:lport1:id:dlc1:vid:lport2:nni:dc2:ci - All records delimited by :
actually means:
File 2
#seqno:lport1:id:dc1:vid:lport2:nni:dc2:ci - All records delimited by :
nawk -f jsusheel.awk file1 file2
jsusheel.awk:
BEGIN {
FS=OFS=":"
}
NR==FNR { f1[$1, $2, $3, $4]; next }
($2 SUBSEP $4 SUBSEP $6 SUBSEP $8) in f1
Hi Vgersh99,
Thanks for the reply. Yes, your assumption is correct: it should be dc1 instead of dlc1. Sorry for the typo.
When I executed the awk script there was no matching output. The block starting with NR==FNR works perfectly and reads all the input records from file1; I verified this using print $0.
However, I do not have any clue about the line ($2 SUBSEP $4 SUBSEP $6 SUBSEP $8) in f1. Could you please help me decipher this line, as I am not very comfortable with awk?
Also, please note that records in file1 do not match records in file2 on a one-to-one basis, i.e. the first record in file1 may match the 100th record in file2 and the second record in file1 may match the 40123rd record in file2.
Again, thank you for sparing your time...
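On the point about record order: a lookup in an awk associative array does not depend on where a record sits in either file, so a positional mismatch by itself should not prevent matches. The following is only a minimal sketch of that behaviour, not the poster's actual setup. The function name match_keys, the temporary file names and the two non-matching sample records are made up for illustration; on the Solaris machine from the thread, awk would be nawk.

```sh
#!/bin/sh
# Sketch only: match_keys and the sample file names are hypothetical.

match_keys() {
    # $1 = file with lport1:dc1:lport2:dc2 records, $2 = nine-field file
    awk -F: '
        NR == FNR { f1[$1, $2, $3, $4]; next }  # first file: remember each key
        ($2, $4, $6, $8) in f1                  # second file: print line if key is known
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' '6300:ADEF12:6305:ATNE59' \
              '3411:EGFE31:3499:GDEF21' > "$tmp/file1"
printf '%s\n' '60001:9999:98:XXXX00:965:9998:3799:YYYY00:432' \
              '60568:3411:98:EGFE31:965:3499:3799:GDEF21:432' \
              '60002:6300:98:ADEF12:965:6305:3799:ATNE59:432' > "$tmp/file2"

# Matching lines go to a separate file, as the original request asked.
match_keys "$tmp/file1" "$tmp/file2" > "$tmp/matched"
cat "$tmp/matched"
```

Here the second record of file1 matches the second line of file2 and the first record of file1 matches the third line. Both are written to the output file, in file2 order.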
Hi,
I have an idea for your requirements, but it may be very slow when the files contain many records, because it compares each record of the second file against every record of the first.
Just for your reference.
Input:
first.txt:
1:a:2:b
3:c:4:d
5:e:6:f
7:g:8:h
second.txt:
60568:1:98:a:965:2:3799:b:432
60568:1:98:f:965:2:3799:b:432
60568:3:98:c:965:4:3799:d:432
60568:3:98:c:965:4:3799:w:432
60568:5:98:e:965:6:3799:f:432
Output:
60568:1:98:a:965:2:3799:b:432
60568:3:98:c:965:4:3799:d:432
60568:5:98:e:965:6:3799:f:432
Code:
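awk 'BEGIN{FS=":"}
{
    if (NF<=4)
        pre[NR]=$0          # 4-field record from first.txt: remember the whole line
    else
    {
        # build the lport1:dc1:lport2:dc2 key from fields 2, 4, 6 and 8
        a=sprintf("%s:%s:%s:%s",$2,$4,$6,$8)
        for (i in pre)
            if (pre[i]==a)  # compare against every stored first.txt record
                print $0
    }
}' first.txt second.txt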
f1:
6300:ADEF12:6305:ATNE59
3411:EGFE31:3499:GDEF21
f2:
60568:3411:98:EGFE31:965:3499:3799:GDEF21:432
60568:3422:98:EGFE31:965:3499:3799:GDEF21:432
produces:
60568:3411:98:EGFE31:965:3499:3799:GDEF21:432
Looks good to me given your original description of the fields and the matching criteria.
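The '($2 SUBSEP $4 SUBSEP $6 SUBSEP $8)' expression is the matching key for file2: fields 2, 4, 6 and 8 of each file2 record are concatenated, with SUBSEP between them, into a key that is looked up in the associative array 'f1'. That is the same form in which f1[$1, $2, $3, $4] stored the file1 keys, and a pattern with no action prints the record when the lookup succeeds.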
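To make the key construction concrete, here is a small sketch showing that a comma-separated subscript and an explicit SUBSEP concatenation name the same array element, while a colon-joined string does not. The array name seen and the sample values are only illustrative. SUBSEP defaults to the character "\034", which is unlikely to appear in the data.

```sh
#!/bin/sh
# Sketch only: the array name "seen" and the sample values are illustrative.
out=$(awk 'BEGIN {
    seen["3411", "EGFE31"]      # stored under the key "3411" SUBSEP "EGFE31"
    if (("3411" SUBSEP "EGFE31") in seen) print "explicit SUBSEP: found"
    if (("3411", "EGFE31") in seen)       print "comma form: found"
    if (!(("3411:EGFE31") in seen))       print "colon-joined key: not found"
}')
printf '%s\n' "$out"
```

So ($2 SUBSEP $4 SUBSEP $6 SUBSEP $8) in f1 could equally be written ($2, $4, $6, $8) in f1. Either form only matches keys that were stored with comma subscripts, not whole lines stored as plain strings.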
Hi,
Many thanks to Summer_cherry and vgersh99 for the responses.
Again, these scripts consume a lot of CPU and take a long time
to complete. I have decided to run these scripts at midnight.
Thanks a lot...