Compare a column across 3 files and merge the matching lines

ganesh_mak · April 10, 2008, 7:24am

I have 3 files, each of which has 80000 records.

file1.txt
-----------------------
ABC001;active;modify;accept;
ABC002;notactive;modify;accept;
ABC003;notactive;no-modify;accept;
ABC004;active;modify;accept;
ABC005;active;no-modify;accept;

file2.txt
---------------------------
ABC001;change;modify;numbers;
ABC002;no-change;print;fractions;
ABC003;different;color;accept;
ABC005;done;modify;accept;

file3.txt
---------------------------
ABC001;got it;modify;numbers;
ABC002;happening;print;fractions;
ABC003;different;color;accept;
ABC004;classify;modify;accept;

Now I have to compare the first field of file1 (ABC001, ABC002, ABC003, ...)
with file2 to check whether file2 contains that first field, and do the same comparison with file3.
If it exists, then merge all three lines to get output like below.

ABC001;active;modify;accept;change;modify;numbers;got it;modify;numbers;

Merge such that the output is the whole line from file1, followed by field 2 through field n of the matching lines from file2 and file3.

thanks
Ganesh

ghostdog74 · April 10, 2008, 8:27am

you can combine them into one file first.

cat file1 file2 file3 > newfile
awk 'BEGIN{OFS=FS=";"}
{
  org=$1
  $1=""
  a[org]=a[org]";"$0
}
END {
 for (i in a) print i,a[i]
}' newfile

krishmaths · April 10, 2008, 8:41am

Try the join command; it would be simple. The limitation is that only two files can be joined at a time.

This works like join in SQL.

man join
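
A minimal sketch of the join approach, chaining two joins to cover three files. The function name join3 is made up for illustration. It assumes every record ends with a trailing ";" (as in the samples), strips it so join does not see an empty last field, and sorts each file on the key first because join expects sorted input. This keeps only keys present in all three files; join's -a, -e and -o options would be the place to look for keeping unmatched lines.

```shell
# join3 is a hypothetical helper name: join3 file1 file2 file3
join3() {
  dir=$(mktemp -d) || return 1
  n=0
  for f in "$@"; do
    n=$((n + 1))
    # Drop the trailing ";" and sort on the first field, as join requires.
    sed 's/;$//' "$f" | LC_ALL=C sort -t ';' -k 1,1 > "$dir/$n"
  done
  # Join file1 with file2, then join that result with file3,
  # and put the trailing ";" back on every output line.
  LC_ALL=C join -t ';' "$dir/1" "$dir/2" |
    LC_ALL=C join -t ';' - "$dir/3" |
    sed 's/$/;/'
  rm -r "$dir"
}
```

Called as join3 file1.txt file2.txt file3.txt, it prints one merged line per key found in all three files.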

ganesh_mak · April 11, 2008, 5:51am

ghostdog74:

you can combine them into one file first.

cat file1 file2 file3 > newfile
awk 'BEGIN{OFS=FS=";"}
{
  org=$1
  $1=""
  a[org]=a[org]";"$0
}
END {
 for (i in a) print i,a[i]
}' newfile

Thanks for the reply.

With the above code I am getting output like this:

ABC001;;;active;modify;accept;;;change;modify;numbers;;;got it;modify;numbers;
ABC002;;;notactive;modify;accept;;;no-change;print;fractions;;;happening;print;fractions;
ABC003;;;notactive;no-modify;accept;;;different;color;accept;;;different;color;accept;
ABC004;;;active;modify;accept;;;classify;modify;accept;
ABC005;;;active;no-modify;accept;;;done;modify;accept;

But I need it like this:

ABC001;active;modify;accept;change;modify;numbers;got it;modify;numbers;
ABC002;notactive;modify;accept;no-change;print;fractions;happening;print;fractions;
ABC003;notactive;no-modify;accept;different;color;accept;different;color;accept;
ABC004;active;modify;accept;;;;classify;modify;accept;
ABC005;active;no-modify;accept;done;modify;accept;;;;

There should be no repeated semicolons after ABCxxx.

ABC004 is in file1 and file3 but not in file2, so the output should have 3 blank fields in place of the file2 fields: ABC004;active;modify;accept;;;;classify;modify;accept;

The same goes for ABC005, with blanks in place of the file3 fields.

If ABCxxx is in file2 and file3 but not in file1, then the output should be
ABCxxx;;;;out21;out22;out23;out31;out32;out33;

thanks
Ganesh

radoulov · April 11, 2008, 8:57am

awk '{ s = $1; sub(/^[^;]*;/,"")
  if (FILENAME == "file1")
    _[s] = $0
  if (FILENAME == "file2" && (s in _))
    f[s] = _[s] $0
  if (FILENAME == "file3" && (s in f))
    f_[s] = f[s] $0
} END {
for (k in f_)
  print k FS f_[k]
}' FS=\; file1 file2 file3

Use nawk or /usr/xpg4/bin/awk on Solaris.
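
The requested output also keeps keys that are missing from one of the files, with blanks in their place. A rough sketch of that variant follows. The function name merge3 and the variable pad are made up for illustration, and it assumes each file contributes exactly 3 fields after the key, so a missing file is filled with ";;;". Keys are printed in the order they are first seen.

```shell
# merge3 is a hypothetical helper name: merge3 file1 file2 file3
merge3() {
  awk -F';' '
    BEGIN { pad = ";;;" }            # assumption: 3 fields per file after the key
    {
      key = $1
      rest = $0
      sub(/^[^;]*;/, "", rest)       # drop the key and its separator
      if (!(key in seen)) { seen[key] = 1; order[++n] = key }
      if (FILENAME == ARGV[1])      a[key] = rest
      else if (FILENAME == ARGV[2]) b[key] = rest
      else                          c[key] = rest
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        line = k ";"
        line = line ((k in a) ? a[k] : pad)
        line = line ((k in b) ? b[k] : pad)
        line = line ((k in c) ? c[k] : pad)
        print line
      }
    }' "$@"
}
```

With the sample data, merge3 file1.txt file2.txt file3.txt should give ABC004;active;modify;accept;;;;classify;modify;accept; for the key that is missing from file2.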

ghostdog74 · April 11, 2008, 8:57am

It's left as an exercise for you, on purpose.

ganesh_mak · April 14, 2008, 6:14am

Thanks for the reply, it's working.
I need to add a few more conditions to this. I'll try these.

But I am not able to append numbers to the string:

for(i=0;i<max;i++) {
_[s_i] = i;
print _[s_i]
}

I want it like s_1 = 1; s_2 = 2; ...

It does not work. How do I make it work?

era · April 14, 2008, 6:42am

Try ["s_" i] with quotes around the string part. As it is, you are declaring and using a variable called s_i which is not what you want.
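
A small sketch of the quoted-prefix suggestion: in awk, putting a string next to a variable concatenates them, so "s_" i becomes s_1, s_2 and so on. The function name literal_keys and the array name arr are made up for illustration.

```shell
# literal_keys is a hypothetical helper name: literal_keys MAX
literal_keys() {
  awk -v max="$1" 'BEGIN {
    for (i = 1; i <= max; i++) {
      arr["s_" i] = i              # subscript is the string "s_1", "s_2", ...
      print "s_" i "=" arr["s_" i]
    }
  }'
}

literal_keys 3
```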

ganesh_mak · April 14, 2008, 7:56am

Sorry, I forgot to mention:

s is also a variable holding a string value.

s_i

format it
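
If s is itself a variable, one possible approach (a sketch, not something posted in the thread) is to build the key from the value of s, an underscore and the counter, either by concatenation (s "_" i) or with sprintf. The function name make_keys and the array name arr are made up for illustration.

```shell
# make_keys is a hypothetical helper name: make_keys PREFIX MAX
make_keys() {
  awk -v s="$1" -v max="$2" 'BEGIN {
    for (i = 1; i <= max; i++) {
      key = sprintf("%s_%d", s, i)   # same string as: s "_" i
      arr[key] = i
      print key "=" arr[key]
    }
  }'
}

make_keys ABC 3
```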