Adding Leading Zero with Right Justified and Removing Duplicates

hello everyone. I'm hoping someone can help me out here. I have 2 files. It looks like this:

File 1:

abc1, defg, 50.00, mno,990
abc2, cats, 100.00, pops,991
abc3, dogs, 1.00, treat,992

File 2:

990, airplanes, runway, doctor
991, jets, birds, much
990, *airplanes,runway,doctor
992, cars, ford, toyota

Here is what I would like to do:

  1. In file 1, column 3, the amount, i would like it to be right justified with leading zero

  2. In file 2, all the duplicate need to be removed. Like 990, *airplanes

  3. After file 1 and 2 are done, then join them by using the key in column 5 in file 1 and column 1 in file 2.

If you need more clarification, let me know. I appreciate it greatly if someone can help me out. Thank you.

by the way, the final file would be something like this:

abc1, defg, 000050.00, mno ,990, airplanes, runway,doctor
abc2, cats, 000100.00, pops ,991, jets,birds, much
abc3, dogs, 000001.00, treat,992, cars,ford, toyota

for number 3, I would assume use the join command?

your first file can be arranged in this way

 
awk -F, '{printf "%s,%s,%09.2f,%s,%s\n",$1,$2,$3,$4,$5}' filename

Hi, hope below can help you.
But really do not know how many leading '0' you are expecting.

awk -F"," '{
        if(_[$1]==0){
                _[$1]=1
                print $0
        }
}' a > t1
join -t"," -11 -25 t1 b

thank you for the response so far. The field will be 9 digit. So it will vary on how many zero are place. Trying to make it look like this:

000000000
000000100
000000090
000001100
000012000

The last 2 digit represent the cents. Example 5.50 will be 000000550

awk -F"," '{
        if(_[$1]==0){
                _[$1]=1
                print $0
        }
}' a > t1
join -t"," -11 -25 t1 b | nawk -F"," '$3=sprintf("%011.2f",$3);gsub(/\./,"",$3)'

would you be so kind to explain the code a bit for me. Thank you for your help.

any help on explaining the code for me. Thank you

I tried that code however it doesn't work...can someone please help me out. Thank you

I thought that it was working for you till the join command. So I just added the nawk part. I think you need to review the awk/join cmd.

i tried

awk -F"," '{
if(_[$1]==0){
_[$1]=1
print $0
}
}' a > t1

with file a and outputting it to t1 however it doesn't do anything. The output file looks the same as the input file.

when I do this code, "awk -F, '{printf "%s,%s,%09.2f,%s,%s\n",$1,$2,$3,$4,$5}' filename"

it actually does output to something I can use however it duplicate it. Example.

File A:

abc1, defg, 50.00, mno,990
abc2, cats, 100.00, pops,991
abc3, dogs, 1.00, treat,992

running the command:

awk -F"," '$1=sprintf("%s,%s,%011.2f,%s,%s",$1,$2,$3,$4,$5)'

get the following results:

abc1, defg,00000050.00, mno,990 defg 50.00 mno 990
abc2, cats,00000100.00, pops,991 cats 100.00 pops 991
abc3, dogs,00000001.00, treat,992 dogs 1.00 treat 992

how do i not create the duplicate

any help thanks

Try this:

awk -F, '
NR==FNR{a[$1]=$0;next}
a[$5]{$3=sprintf("%011.2f",$3);$5=a[$5]}1' OFS="," file2 file1

Regards