Join, GREP

Hi All,

I have two files. ACC_NUM contains only account numbers, and ACC_DETAIL contains all the information, including the account number, separated by the ~ delimiter.

I am searching ACC_DETAIL for each account number in ACC_NUM. If it exists, the whole line should be copied into the ACC_DETAIL_NEW file. For that I have written the following logic. It works, but it takes far too long.

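for i in $(cat ACC_NUM)          # command substitution: one word per account number
do
    grep -w "$i" ACC_DETAIL      # one full pass over ACC_DETAIL per account number
done > ACC_DETAIL_NEW            # redirect once, so earlier matches are not overwritten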
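A minimal sketch of a single-pass alternative, assuming a POSIX grep: instead of starting one grep per account, grep can read all account numbers from ACC_NUM with -f and treat them as fixed strings with -F, so ACC_DETAIL is scanned only once. The sample data is the example from this post, written to a scratch directory so no real files are overwritten. Caveat: -w matches an account number anywhere on the line, not only in the first column.

```shell
#!/bin/sh
# Work in a scratch directory so existing ACC_* files are untouched.
cd "$(mktemp -d)" || exit 1

# Sample data taken from the example in the question.
printf '%s\n' ABC4567 JUB7546 > ACC_NUM
printf '%s\n' 'ABC4567~ameet~bangalore~savings' \
              'UIO8907~rakesh~delhi~xxxxx' \
              'JUB7546~aup~mangalore~insurance' > ACC_DETAIL

# -F: patterns are fixed strings (no regex), -w: whole words only,
# -f ACC_NUM: read one pattern per line. ACC_DETAIL is read once.
grep -F -w -f ACC_NUM ACC_DETAIL > ACC_DETAIL_NEW

cat ACC_DETAIL_NEW
```

The output keeps the line order of ACC_DETAIL rather than the order of ACC_NUM.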

====

The account number is alphanumeric and is the first column in the file. For example:

ACC_NUM

ABC4567
JUB7546

ACC_DETAIL

ABC4567~ameet~bangalore~savings
UIO8907~rakesh~delhi~xxxxx
JUB7546~aup~mangalore~insurance

The following lines should then be copied into ACC_DETAIL_NEW:

ABC4567~ameet~bangalore~savings
JUB7546~aup~mangalore~insurance

Because grep searches the entire file for each account number, this takes a long time. Could someone suggest a more efficient way, e.g. using awk or join? I have tried join, but I am not a UNIX expert and am not getting any results.

ACC_NUM has 6780 records (account numbers) and ACC_DETAIL has 20000 records.

Looks like homework; please read The UNIX and Linux Forums - Forum Rules.
Tip: Search the forum for "compare file awk"
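For reference, a common awk idiom for this kind of two-file comparison is sketched below (an illustration, not necessarily the thread the tip points to). awk reads ACC_NUM first and stores each account number as an array key, then prints every ACC_DETAIL line whose first ~-separated field is one of those keys. Each file is read once and no sorting is needed. It assumes ACC_NUM is not empty (with an empty first file, NR == FNR would also hold while reading ACC_DETAIL) and that lines carry no trailing spaces or carriage returns.

```shell
#!/bin/sh
# Scratch directory with the sample data from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' ABC4567 JUB7546 > ACC_NUM
printf '%s\n' 'ABC4567~ameet~bangalore~savings' \
              'UIO8907~rakesh~delhi~xxxxx' \
              'JUB7546~aup~mangalore~insurance' > ACC_DETAIL

# NR == FNR is true only while reading the first file (ACC_NUM):
#   store the account number as an array key, then skip to the next line.
# For the second file (ACC_DETAIL): print the line if field 1 is a stored key.
awk -F'~' '
    NR == FNR { want[$1]; next }
    $1 in want
' ACC_NUM ACC_DETAIL > ACC_DETAIL_NEW

cat ACC_DETAIL_NEW
```

Because the lookup is a hash of the keys, this scales with the combined file sizes rather than with their product.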

Try using the join command. First sort both files on the field they have in
common, which in your case is the alphanumeric account number.

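sort -d ACC_NUM > a.num                         # sort the account numbers
sort -d ACC_DETAIL +0 -1 > acc.num              # +0 -1: old-style key syntax
# options before the file names; the -o field list is comma-separated
join -t~ -1 1 -2 1 -o 2.1,2.2,2.3,2.4 a.num acc.num > output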

For join to work, both files must be sorted on the key field.
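A sketch of the same sort-and-join approach with the sort key stated explicitly, assuming POSIX sort and join. Setting LC_ALL=C makes sort and join agree on plain byte order, and -t '~' -k1,1 sorts ACC_DETAIL on its first ~-separated field only, which is exactly the field join compares. The sample input is deliberately out of order; the file names a.num, acc.num and output follow the reply above.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Deliberately unsorted sample input.
printf '%s\n' JUB7546 ABC4567 > ACC_NUM
printf '%s\n' 'UIO8907~rakesh~delhi~xxxxx' \
              'JUB7546~aup~mangalore~insurance' \
              'ABC4567~ameet~bangalore~savings' > ACC_DETAIL

# Same collation (plain byte order) for sort and join.
LC_ALL=C
export LC_ALL

sort ACC_NUM > a.num                       # one field per line: the key
sort -t '~' -k1,1 ACC_DETAIL > acc.num     # key field only, not the whole line

# Join on field 1 of both files; ~ is both the input and output separator.
join -t '~' -1 1 -2 1 a.num acc.num > output
cat output
```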

Cheers

Hi Mahesh,

Thanks for your reply. It is working fine, but I would also like to understand some of the options you used.

sort -d: I understand this is used to sort alphanumeric fields. But you also gave the +0 -1 option. What does it mean?

join -1 1 -2 1 -t~ a.num acc.num -o 2.1 2.2 2.3 2.4 > output: this command works fine. But file 2 has more than 4 columns (10 columns). Will it work the same if I leave out 2.1 ... 2.4, or must I list everything up to 2.10? Is there another option to include all columns of file 2 in the join output?

Please advise so that I can understand it properly.

Dear Amit,
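The command sort -d +0 -1 sorts the input file on field one in dictionary
(phone-directory) order, i.e. only letters, digits and blanks are compared.
+0 -1 is the old zero-based key syntax: the key starts at field 0 and ends
before field 1, which is -k1,1 in current syntax. The first field in your case
is the key field and it is alphanumeric, hence dictionary order. Note that
without -t, sort splits fields on blanks rather than on ~, so for your
~-delimited file the current equivalent is sort -t '~' -k1,1.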

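Secondly, the numbers 2.1 2.2 2.3 etc. name the fields required in the output
file, in the form FILE.FIELD: 2.3 is field 3 of file 2. I chose them as per
your requirement; you may alter them to suit your needs. If you leave out -o,
join prints the join field, then the remaining fields of file 1, then the
remaining fields of file 2; since a.num holds only the key, that gives every
column of acc.num.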
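A small sketch of both forms, using one record from the question's example and assuming a POSIX join: with -o only the listed FILE.FIELD entries are printed, in the order given; without -o every field of acc.num appears, because a.num contributes only the join field.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' ABC4567 > a.num
printf '%s\n' 'ABC4567~ameet~bangalore~savings' > acc.num

# Explicit field list: field 1 and field 4 of file 2, in that order.
join -t '~' -o 2.1,2.4 a.num acc.num
# prints: ABC4567~savings

# Default output: join field, rest of file 1 (nothing here), rest of file 2.
join -t '~' a.num acc.num
# prints: ABC4567~ameet~bangalore~savings
```

For a 75-column file, leaving out -o is therefore simpler than listing 2.1 through 2.75.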

I hope this clears up your doubts.

Hi Mahesh,

Thanks for your help. It works fine for me: it takes only 25 seconds to complete the task, compared to 5 hours with grep.

But I still have one issue.

a.num contains 13155 account numbers.
acc.num contains data for 660469 accounts, with 75 columns delimited by ~.

When I run the command

join -1 1 -2 1 -t~ a.num acc.num > output

3 accounts are missing.

In the output file the total number of records should be 13155, but it shows only 13152.

Both files are sorted. I have also checked that all 3 account numbers are present in both files.

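Dear Amit
I only came back today and read your feedback. For join to succeed, both
files must be sorted on the key field itself, in the same order join uses to
compare keys. Keys of different lengths, where one key is a prefix of another,
are a common culprit when whole lines are sorted instead of the key field.
Just check those three records and you will know where the fault is.
I hope I made things clear.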
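One way to track down such records, sketched with hypothetical keys ABC12 and ABC123 (not from the thread) and assuming the C locale: join -v 1 prints the lines of file 1 that found no partner, and sort -c reports whether a file is already sorted on the given key. When whole lines are sorted, ABC123~y comes before ABC12~x because '3' sorts before '~', while join compares only the keys and expects ABC12 first.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
LC_ALL=C
export LC_ALL

# Hypothetical keys: one is a prefix of the other.
printf '%s\n' ABC12 ABC123 > a.num
printf '%s\n' 'ABC12~x' 'ABC123~y' > detail

sort detail > by_line                 # whole line: ABC123~y first ('3' < '~')
sort -t '~' -k1,1 detail > by_key     # key only:   ABC12~x first

# Is by_line sorted on field 1? sort -c exits non-zero if it is not.
sort -t '~' -k1,1 -c by_line 2>/dev/null || echo 'by_line: not sorted on field 1'

# Unmatched keys from a.num. GNU join may also warn about the disorder
# on stderr and exit non-zero, hence the "|| true".
join -t '~' -v 1 a.num by_line || true   # prints ABC12
join -t '~' -v 1 a.num by_key            # prints nothing
```

Running join -v 1 against the real a.num and acc.num should list exactly the account numbers that were dropped.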

Cheers