Unix Shell Scripting : Comparision of two files

Hi,
We need to compare a text file File1.txt and config file File2.txt in a way that it checks if the content of File1.txt exists between the range mentioned in File2.cfg.
The range here is the range between col1 and col2 of File2.cfg
If the content of File1.txt lies between the range of File2.cfg then output should display the count of col3 of File2.cfg in Outputfile.

For Example :
File1.txt

65005
65007
65006
27117
68700
68399

File2.cfg

col1        col2     col3
65005    65008    A
68399    68399    A
68700    68700    A
22980   22999     B
27109    27125    C

Output File :

col3  Count
A        5
B        0
C        1

Could anyone please help me to do the same:confused:.

Thanks in Advance
CFA

With awk:

awk 'FNR==NR{range[$1,$2]=$3;next}
{
for(i in range)
{
 c[range]+=0
 split(i,r,SUBSEP)
 if($1>=r[1] && $1<=r[2])
  c[range]++
}
}
END{
for(i in c)
print i,c|"sort"}' file2 file1
2 Likes

Hi Elixir

Thanks a bunch for your prompt reply.

We tried this command but its providing the output as '0'.
Also the two files that we mentioned are in different paths.

Could you please help:confused:.

Thanks in Advance
CFA

Read what I wrote carefully. file2 is mentioned first. This is important.

cat file1
65005
65007
65006
27117
68700
68399

cat file2
65005 65008 A
68399 68399 A
68700 68700 A
22980 22999 B
27109 27125 C

awk 'FNR==NR{range[$1,$2]=$3;next}
> {
> for(i in range)
> {
>  c[range]+=0
>  split(i,r,SUBSEP)
>  if($1>=r[1] && $1<=r[2])
>   c[range]++
> }
> }
> END{
> for(i in c)
> print i,c|"sort"}' file2 file1
A 5
B 0
C 1

If the 2 files reside in different directories:

awk '<program>' /path/to/file2 /path/to/file1
1 Like

Hi Elixir,

Thanks a lot for your help.

Now its working fine:).
Would deem it a great favour if you could please explain how exactly the provided code works.

Thanks in Advance
CFA:D

In brief:

awk '
### THIS STORES THE RANGES FROM THE FIRST FILE ON THE COMMAND LINE (file2) ###
### IN AN ARRAY IN MEMORY. REQUIRED FOR LOOKUP LATER WHEN WE START READING ###
### file1.                                                                 ###
FNR==NR{range[$1,$2]=$3;next}
##############################################################################
{
### FOR EACH LINE FROM file1, THIS LOOPS THROUGH THE ARRAY STORED IN MEMORY###
### AND CHECKS IF THE LINE READ IS IN ANY OF THE RANGES. IF YES, THE COUNT ###
### OF THE RANGE NAME (A,B,C,D,ETC.) IS INCREMENTED BY ONE.                ###
for(i in range)
{
 c[range]+=0
 split(i,r,SUBSEP)
 if($1>=r[1] && $1<=r[2])
  c[range]++
}
###############################################################################
}
### THIS IS DONE AFTER READING THE 2 FILES COMPLETELY. THIS SIMPLY PRINTS   ###
### THE ARRAY INDEX (RANGE NAME) AND THE CORRESPONDING VALUE (COUNT). THE   ###
### OUTPUT IS PIPED TO THE SORT COMMAND TO GET THE OUTPUT DESIRED.          ###
END{
for(i in c)
print i,c|"sort"}
################################################################################
' file2 file1
1 Like

Hi Elixir

Thanks a tonne...!!!! Appreciate your quick response:).
The explanation was very clear:D. Got a clear picture how the code is working.

Thanks and Regards,
CFA:)

Hi

For the same example.
What should we do if we need to count and print all the contents of File1.txt which do not fall in the range of File2.cfg?

Thanks in Advance,
CFA :slight_smile:

---------- Post updated at 06:39 AM ---------- Previous update was at 06:33 AM ----------

Hi

For the same example.

What should we do if need to get the count and print the content of File1.txt which do not fall in the range of File2.cfg.

Please Help !!! :S

Thanks in Advance
CFA:confused:

awk 'FNR==NR{range[$1,$2]=$3;next}
{found="N";totlines=FNR
for(i in range)
{
 split(i,r,SUBSEP)
 if($1>=r[1] && $1<=r[2])
 {
  found="Y"
  count++
  break
 }
}
}found=="N"
END{print "Not Found (Total) : " (totlines-count)}' file2 file1

Hi Elixir,

Thanks a lot for your reply.:slight_smile:

We tried this but the output is incorrect.
Also we need both the matching content and the non matching content to be displayed together.

For Example :

File1.txt

65005
65007
65006
27117
68700
68399
27382
27481
27382

File2.cfg

65005 65008 A
68399 68399 A
68700 68700 A
22980 22999 B
27109 27125 C

Like here 27382 and 27481 of File1.txt does not fall anywhere in the range of File2.cfg. Then it should display Output as :

Output File:

A 5
B 0
C 1
Unmatched 3

Thanks in Advance
CFA:confused:

awk 'FNR==NR{range[$1,$2]=$3;next}
{totlines=FNR
for(i in range)
{
 c[range]+=0
 split(i,r,SUBSEP)
 if($1>=r[1] && $1<=r[2])
  c[range]++
}}
END{
for(i in c){print i,c|"sort";totfound+=c}
close("sort")
print "Unmatched " (totlines-totfound)}' File2.cfg File1.txt

Hi Elixir,

Thanks a lot for your reply.
This one works fine.

The only problem being our cfg file consist of some descriptive headers before each range specified. So while running the code we are also getting that in the count. Thus in output for all the descriptive header it is giving us the total count.

Example:

File2.cfg :

#Start End session
#----- ----- -------
65005 65008 A
68399 68399 A
68700 68700 A
22980 22999 B
This is for your information
27109 27125 C

File1.txt

65005
65007
65006
27117
68700
68399
27382
27481
27382

Output File Now:
7
----- 0
A 5
B 0
C 1
sessions 7
for 7
Unmatched -20

Output File Needed
A 5
B 0
C 1
Unmatched 1

Please help !!!

Thanks and Regards,
CFA

Assuming your "data" lines in File2.cfg start with a number, change

awk 'FNR==NR{range[$1,$2]=$3;next}

to

awk 'FNR==NR{if(/^[0-9]/) range[$1,$2]=$3;next}