awk command to compare a file with a set of files in a directory

anandek · May 30, 2012, 5:43am

Hi,

I need to compare one file, say file1.txt, with a set of files in a directory. The directory contains more than 100 files.

To be more precise, the requirement is to compare the first field of file1.txt with the first field in all the files in the directory. The files in the directory have many fields separated by '~'.

I tried 'awk' command as below

awk -F"~" 'FILENAME=="file1.txt"{A[$1]=$1} FILENAME=="SBL_LOYALTY_SALE_TXNS*.txt"{if(A[$1]){print}}' file1.txt SBL_LOYALTY_SALE_TXNS*.txt > common.txt

This did not generate any output.

Please provide your valuable suggestions. Any modification to the 'awk' command, or any other unix command, will help.

guruprasadpr · May 30, 2012, 6:07am

Hi

Hoping you want to print the matching first lines, and not the filename itself:

$ cat f1.txt
guru~abc~efg
ammi
abc

$ cat f2.txt
abc~guru~ammi
bcd
efg

$ cat f3.txt
guru~abc~ammi
bcd
efg

$ awk -F '~' 'NR==1 && NR==FNR{x=$1;next}FNR==1{if ($1==x)print;}' f1.txt f2.txt f3.txt
guru~abc~ammi

anandek · May 30, 2012, 6:26am

Hi guruprasad,

Many thanks for the reply.

The requirement is to compare the first column (not the first line) of all the records in the first file (Txn_queued.txt) with the first column of the files in the directory (SBL_LOYALTY_SALE_TXNS_20120405.txt, SBL_LOYALTY_SALE_TXNS_20120420.txt).

Output should be the matching records from the files in the directory.

I tried using the above mentioned command as

awk -F '~' 'NR==1 && NR==FNR{x=$1;next}FNR==1{if ($1==x)print;}
  ' Txn_queued.txt SBL_LOYALTY_SALE_TXNS_20120405.txt SBL_LOYALTY_SALE_TXNS_20120420.txt > new.txt

This did not give any output.

guruprasadpr · May 30, 2012, 6:28am

Hi
Post your sample input files and output file.

anandek · May 30, 2012, 6:33am

sample records in file1.txt

271:13-APR-12:1:6189:1
1183:13-APR-12:1:6689:1
1183:13-APR-12:2:8993:10
1183:13-APR-12:2:8993:11
1183:13-APR-12:2:8993:12

sample records from files in directory

1183:13-APR-12:1:6689:1~1148141~380392198~1183~04-13-12~0~1~1~12.49~0~0.00~S~~0.00~~1~12.49~~N~
1183:13-APR-12:2:8993:10~863432~380391909~1183~04-13-12~0~1~1~0.95~0~0.00~S~~0.00~~12~92.46~~N~
2769:14-APR-12:2:5385:1~669725~399469944~2769~04-14-12~0~1~1~21.99~0~0.00~S~~0.00~~12~128.88~~N~
2769:14-APR-12:2:5385:2~1352601~399469944~2769~04-14-12~0~1~1~10.99~0~0.00~S~~0.00~~12~128.88~~N~
2769:14-APR-12:2:5385:3~1035266~399469944~2769~04-14-12~0~1~1~9.99~0~0.00~S~~0.00~~12~128.88~~N~

Output file should contain values like

1183:13-APR-12:1:6689:1~1148141~380392198~1183~04-13-12~0~1~1~12.49~0~0.00~S~~0.00~~1~12.49~~N~
1183:13-APR-12:2:8993:10~863432~380391909~1183~04-13-12~0~1~1~0.95~0~0.00~S~~0.00~~12~92.46~~N~

guruprasadpr · May 30, 2012, 6:40am

Hi

$ cat f1.txt
271:13-APR-12:1:6189:1
1183:13-APR-12:1:6689:1
1183:13-APR-12:2:8993:10
1183:13-APR-12:2:8993:11
1183:13-APR-12:2:8993:12

$ cat f2.txt
1183:13-APR-12:1:6689:1~1148141~380392198~1183~04-13-12~0~1~1~12.49~0~0.00~S~~0.00~~1~12.49~~N~
1183:13-APR-12:2:8993:10~863432~380391909~1183~04-13-12~0~1~1~0.95~0~0.00~S~~0.00~~12~92.46~~N~
2769:14-APR-12:2:5385:1~669725~399469944~2769~04-14-12~0~1~1~21.99~0~0.00~S~~0.00~~12~128.88~~N~
2769:14-APR-12:2:5385:2~1352601~399469944~2769~04-14-12~0~1~1~10.99~0~0.00~S~~0.00~~12~128.88~~N~
2769:14-APR-12:2:5385:3~1035266~399469944~2769~04-14-12~0~1~1~9.99~0~0.00~S~~0.00~~12~128.88~~N~

$ grep -f f1.txt f2.txt
1183:13-APR-12:1:6689:1~1148141~380392198~1183~04-13-12~0~1~1~12.49~0~0.00~S~~0.00~~1~12.49~~N~
1183:13-APR-12:2:8993:10~863432~380391909~1183~04-13-12~0~1~1~0.95~0~0.00~S~~0.00~~12~92.46~~N~
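
One caveat with this approach, offered as a hedged sketch rather than a drop-in fix: grep -f treats each line of f1.txt as an unanchored regular expression, so a key such as 1183:13-APR-12:2:8993:1 would also match the record keyed 1183:13-APR-12:2:8993:10. Assuming the keys contain no regex metacharacters (true for the samples here), anchoring each pattern to the start of the line and the '~' delimiter avoids that. The file names (keys.txt, data.txt, patterns.txt) and the shortened records below are illustrative only:

```shell
#!/bin/sh
# Scratch directory with hypothetical, shortened sample data.
dir=$(mktemp -d)
printf '%s\n' '1183:13-APR-12:2:8993:1' > "$dir/keys.txt"
printf '%s\n' \
  '1183:13-APR-12:2:8993:1~111~N~' \
  '1183:13-APR-12:2:8993:10~222~N~' > "$dir/data.txt"

# Unanchored: the key is also found inside the longer key ...8993:10.
loose=$(grep -f "$dir/keys.txt" "$dir/data.txt")

# Anchored: turn each key K into the pattern ^K~ before handing it to grep.
sed 's/.*/^&~/' "$dir/keys.txt" > "$dir/patterns.txt"
strict=$(grep -f "$dir/patterns.txt" "$dir/data.txt")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -r "$dir"
```

Here the loose match returns both records, while the anchored match returns only the record whose first field is exactly the key.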

anandek · May 30, 2012, 6:48am

Hi,

This is fine if we are comparing only two files.

But the requirement is to compare the first file with all the files in a directory. All those files in the directory will have records similar to the ones in the second file.

jothi_basu · May 30, 2012, 6:50am

awk -F "~" 'FNR==NR{A[$1,$1]=1;next} A[$1,$1]' file1.txt SBL_LOYALTY_SALE_TXNS*.txt >comman.txt

Not sure about this but you can try...

Scrutinizer · May 30, 2012, 7:08am

You could try:

awk -F~ 'NR==FNR{A[$1];next}$1 in A' file1.txt SBL_LOYALTY_SALE_TXNS*.txt

Or, if there are so many files that the maximum command-line length is exceeded:

for f in SBL_LOYALTY_SALE_TXNS*.txt; do
  cat "$f"
done | awk -F~ 'NR==FNR{A[$1];next}$1 in A' file1.txt -
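
As a side note on why the first attempt in this thread printed nothing: awk compares FILENAME as a plain string, so FILENAME=="SBL_LOYALTY_SALE_TXNS*.txt" is only true for a file whose name literally contains the asterisk. The shell expands the wildcard on the command line, not inside the awk program. A minimal sketch of that same approach with a regex match on FILENAME instead, run against made-up sample files in a scratch directory (the keys and records are illustrative only):

```shell
#!/bin/sh
# Illustrative sample data in a scratch directory (hypothetical contents).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'k1' 'k2' > file1.txt
printf '%s\n' 'k1~a~b' 'k9~c~d' > SBL_LOYALTY_SALE_TXNS_20120405.txt
printf '%s\n' 'k2~e~f' 'k8~g~h' > SBL_LOYALTY_SALE_TXNS_20120420.txt

# String comparison against the unexpanded pattern: never true, so no output.
literal=$(awk -F'~' 'FILENAME=="file1.txt"{A[$1]=$1}
  FILENAME=="SBL_LOYALTY_SALE_TXNS*.txt"{if(A[$1]){print}}' \
  file1.txt SBL_LOYALTY_SALE_TXNS*.txt)

# Regex match on FILENAME: true for every expanded data file name.
regex=$(awk -F'~' 'FILENAME=="file1.txt"{A[$1]=$1}
  FILENAME ~ /^SBL_LOYALTY_SALE_TXNS.*\.txt$/ {if(A[$1]){print}}' \
  file1.txt SBL_LOYALTY_SALE_TXNS*.txt)

printf 'literal: [%s]\n' "$literal"
printf 'regex:\n%s\n' "$regex"
```

The NR==FNR form is still shorter and does not depend on the file names at all, which is why I would prefer it.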

anandek · May 30, 2012, 8:50am

Hi Scrutinizer,

Many thanks for your help!!!!!!!!
It works absolutely fine.

awk -F~ 'NR==FNR{A[$1];next}$1 in A' file1.txt SBL_LOYALTY_SALE_TXNS*.txt

It would be very helpful if you could briefly explain the logic, just so I understand it.

Scrutinizer · May 30, 2012, 9:21am

Sure:

awk -F~ '                                # Use ~ as the input field separator
  NR==FNR{                               # If FNR is equal to NR, in other words, if we are reading the first file file1.txt
    A[$1]                                # Then create an empty associative array element in array "A" with field 1 as the index
    next                                 # Proceed with the next record and start over, do not process the rest of the script.
  }                                     
  $1 in A                                # If we get here we are reading the rest of the files. If the first field is present in array A,
                                         # then perform the default action, which is {print $0}, i.e. print the record.
' file1.txt SBL_LOYALTY_SALE_TXNS*.txt   # Use file1.txt as the first file and the files with the specified pattern in the directory as the rest of the files
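
To see the NR==FNR test at work, here is a small sketch that prints both counters for every record. NR keeps counting across all input files while FNR restarts at 1 for each file, so the two are equal only while the first file is being read (this assumes the first file is not empty; with an empty first file the test would be true for the second file instead). The file names keys.txt, data1.txt and data2.txt and their contents are made up for illustration:

```shell
#!/bin/sh
# Scratch directory with hypothetical sample data.
dir=$(mktemp -d)
printf '%s\n' 'k1' 'k2' > "$dir/keys.txt"
printf '%s\n' 'k1~a' 'k9~b' > "$dir/data1.txt"
printf '%s\n' 'k2~c' > "$dir/data2.txt"

# Print NR, FNR and the first field of each record across all three files.
counters=$(awk -F'~' '{print NR, FNR, $1}' \
  "$dir/keys.txt" "$dir/data1.txt" "$dir/data2.txt")
printf '%s\n' "$counters"

# Same idiom as above: load keys from the first file, filter the rest.
matches=$(awk -F'~' 'NR==FNR{A[$1];next} $1 in A' \
  "$dir/keys.txt" "$dir/data1.txt" "$dir/data2.txt")
printf '%s\n' "$matches"
rm -r "$dir"
```

In the first listing, NR and FNR agree only on the two lines that come from keys.txt; from the third line on, FNR has restarted and the filter rule takes over.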