Merging fields: join is not working

repinementer · May 22, 2009, 5:35am

Hi guys, sorry for posting a simple query. I have tried the methods posted previously on this site, but I'm unable to join matching values that appear in columns of two different files.
I used sort -u file1 and join, but it didn't help.
I'm attaching my input files; please check them.
I have two files.
First file (file1):
1234
133
1345
134
23
4555
Second file (file2; "tab" marks a tab character):
1234 tab kshgjkghj
23 tab drjghfg
134 tab drkhgjgj
1345 tab djghf
4555 tab khdjhgjfg
133tabjghhdgf

Desired output:
1234tab1234 tab kshgjkghj
133tabjghhdgf
1345tabdjghf
134tabdrkhgjgj
23tabdrjghfg
4555tab

vidyadhar85 · May 22, 2009, 5:49am

If both files have the same number of lines, sort them and then paste them side by side:

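# sort prints to stdout, so save the sorted copies before pasting them
sort -k1,1 file1 > file1.sorted
sort -k1,1 file2 > file2.sorted
paste -d" " file1.sorted file2.sorted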
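The thread title asks about join. When the files have different line counts, join(1) matches lines on a key field instead of by position, but both inputs must be sorted on that field with the same collation; sorting with -n, or sorting and joining under different locales, is one likely way for join to appear not to work. A hedged sketch, using sample data modeled on the question and assuming the second file is tab-separated with the key in column 1:

```shell
#!/bin/sh
# Sketch: merge two files on their first column with join(1).
# Sample data is modeled on the question; file2 is assumed to be
# tab-separated with the key in column 1.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
tab=$(printf '\t')
printf '%s\n' 1234 133 1345 134 23 4555 > file1
printf '1234\tkshgjkghj\n23\tdrjghfg\n134\tdrkhgjgj\n1345\tdjghf\n4555\tkhdjhgjfg\n133\tjghhdgf\n' > file2

# join needs both inputs sorted on the join field with the same collation,
# so sort lexically (not with -n) and pin the locale to C for sort and join.
LC_ALL=C sort -u file1 > file1.sorted
LC_ALL=C sort -t "$tab" -k1,1 file2 > file2.sorted

# -t sets the field separator for both input and output to a tab.
result=$(LC_ALL=C join -t "$tab" file1.sorted file2.sorted)
printf '%s\n' "$result"
```

The output comes out in lexical key order (1234, 133, 134, 1345, 23, 4555), not in the original order of either file; that is the price of join's sorted-input requirement.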

repinementer · May 22, 2009, 5:51am

No, they don't have the same number of lines; the second file is huge. You can see the attachments I posted.
I tried commands such as sort, print, egrep and awk, but no luck.
Thanks anyway for your time.

vidyadhar85 · May 22, 2009, 5:59am

Then you can try this:

while read -r key; do
  # -v passes the key to awk as a variable instead of splicing it into the program text
  awk -v k="$key" '$1 == k { print k "\t" $0 }' file2
done < file1

repinementer · May 22, 2009, 6:09am

Hey, sorry, it's not running at all.
Could you please try a script on my attached files?

cambridge · May 22, 2009, 6:19am

You haven't made it entirely clear whether your output should include only data from the q1 file, or data from both files even where it is missing from one. So here are a few alternatives; pick the one that suits your needs.

The first one uses grep. The \t in the command may not work with your version of sed or grep; if it doesn't, replace it with a literal tab character. This one outputs only the two columns from the q2 file for each key it matches from q1:

sed "s/.*/grep '^&\t' q2.txt/" q1.txt | sh
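The sed pipeline above starts one grep process per key in q1.txt. One possible variant (not from the thread) writes all of the anchored patterns to a file and runs grep once with -f. Building each pattern with a literal tab from printf sidesteps the question of whether sed or grep understands \t. The sample rows come from the question; the sketch assumes the keys contain no regex metacharacters.

```shell
#!/bin/sh
# Sketch: one grep run over all keys instead of one grep per key.
# q1.txt and q2.txt hold sample rows taken from the question; keys are
# assumed to contain no regex metacharacters.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
tab=$(printf '\t')
printf '%s\n' 1234 134 23 > q1.txt
printf '1234\tkshgjkghj\n23\tdrjghfg\n134\tdrkhgjgj\n1345\tdjghf\n4555\tkhdjhgjfg\n133\tjghhdgf\n' > q2.txt

# Turn each key into "^KEY<TAB>" so that 134 cannot match the 1345 line.
sed "s/.*/^&$tab/" q1.txt > patterns.txt

# A single grep reads every pattern; matches keep q2.txt's line order.
result=$(grep -f patterns.txt q2.txt)
printf '%s\n' "$result"
```

The trailing tab in each pattern is what keeps 134 from also selecting the 1345 row.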

The second one uses awk, so it will be quicker, and it outputs three columns. It outputs only the lines whose key is found in q1, in the order they appear in q2:

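# (getline < file) returns -1 if the file cannot be read, so test for > 0 to avoid an endless loop
awk 'BEGIN { while ((getline < "q1.txt") > 0) data[$1]=1 } { if (data[$1]) print $1 "\t" $0 }' q2.txt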

The third one is much like the second, but it also includes the lines from q2 whose key was not found in q1, with an empty first column:

awk 'BEGIN { while ((getline < "q1.txt") > 0) data[$1]=1 } { if (data[$1]) print $1 "\t" $0; else print "\t" $0 }' q2.txt
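Neither the second nor the third command reports keys that are in q1.txt but have no line in q2.txt. If "data from both files even where it was missing from one" is what is wanted, one possible extension is to remember the q1 keys in order and print the unmatched ones at the end. The sketch uses the two-file FNR == NR awk idiom that appears later in the thread; the key 777 is hypothetical, standing in for a key with no match.

```shell
#!/bin/sh
# Sketch: merge that also reports q1.txt keys missing from q2.txt.
# 777 is a hypothetical key with no q2.txt line; 134 is a q2.txt line
# whose key is not in q1.txt.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf '%s\n' 1234 23 777 > q1.txt
printf '1234\tkshgjkghj\n23\tdrjghfg\n134\tdrkhgjgj\n' > q2.txt

result=$(awk -F '\t' -v OFS='\t' '
  # First file: remember each key and the order it appeared in.
  FNR == NR { order[++n] = $1; want[$1] = 1; next }
  # Second file: prefix the key if it was wanted, else an empty column.
  ($1 in want) { seen[$1] = 1; print $1, $0; next }
  { print "", $0 }
  # Finally, list q1.txt keys that never matched a q2.txt line.
  END { for (i = 1; i <= n; i++) if (!(order[i] in seen)) print order[i], "" }
' q1.txt q2.txt)
printf '%s\n' "$result"
```

Unmatched q1 keys are printed last, with an empty second column, so they are easy to spot or filter out.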

vidyadhar85 · May 22, 2009, 6:22am

It's running fine:

fnsonlu1-/home/> cat file1
1234
133
1345
134
23
4555
fnsonlu1-/home/> cat file2
1234 tab kshgjkghj
23 tab drjghfg
134 tab drkhgjgj
1345 tab djghf
4555 tab khdjhgjfg
133tabjghhdgf
fnsonlu1-/home/l> while read line ; do
awk '$1'==$line'{print '$line'"\t"$0}' file2
done < vv
1234    1234 tab kshgjkghj
1345    1345 tab djghf
134     134 tab drkhgjgj
23      23 tab drjghfg
4555    4555 tab khdjhgjfg

panyam · May 22, 2009, 6:35am

@repinementer ,

It is difficult to understand what your requirement is unless you state it clearly.

Please post a simple, valid input and the output you are expecting.

repinementer · May 22, 2009, 6:56am

Thank you, Vidhya and Cambridge, for your valuable time and suggestions.
The code is working great.
Vidhya, your code is good, but it is too slow!
Thanks once again.
Great work!

ghostdog74 · May 22, 2009, 7:35am

Usually there's no need for a while read loop if you are using awk; awk can read both files in one pass:

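# keeps file1's order: load file2 keyed on column 1, then look up each file1 key
awk 'FNR==NR{ a[$1]=$0; next } ($1 in a){ print $1, a[$1] }' file2 file1
# keeps file2's order: load the keys from file1, then print matching file2 lines
awk 'FNR==NR{ a[$1]; next } ($1 in a){ print $1, $0 }' file1 file2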
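A sketch of how the second command works, run on sample data modeled on the question. The -v OFS='\t' is an addition here so that the key and the original line are separated by a tab, as in the desired output, rather than by awk's default single space.

```shell
#!/bin/sh
# Sketch of the two-file awk idiom; sample data modeled on the question,
# with file2 assumed to be tab-separated.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf '%s\n' 1234 133 1345 134 23 4555 > file1
printf '1234\tkshgjkghj\n23\tdrjghfg\n134\tdrkhgjgj\n1345\tdjghf\n4555\tkhdjhgjfg\n133\tjghhdgf\n' > file2

# FNR restarts at 1 for each file but NR does not, so FNR == NR holds only
# while awk reads the first file named (file1). a[$1] just creates the key,
# and "next" skips the second rule. For file2, ($1 in a) keeps the lines
# whose key was listed in file1.
result=$(awk -v OFS='\t' 'FNR == NR { a[$1]; next } ($1 in a) { print $1, $0 }' file1 file2)
printf '%s\n' "$result"
```

Each file is read once, which is why this is much faster on a large second file than running awk once per key. One caveat of the idiom: if the first file is empty, FNR == NR stays true for the second file, and nothing is printed.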

repinementer · May 26, 2009, 1:07am

Hi Vidhya,

Thanks for the great script.
How could I run the same script if the second file has more columns?
Could you please help me with this?

Hi Ghost dog,

Your commands work like a charm. Thank you so much :)
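On the follow-up about more columns in the second file: the awk commands above print $0, so extra columns are carried through unchanged. A sketch, assuming the second file is tab-separated; the extra column values are hypothetical, -F '\t' keeps a column that contains spaces (such as "x y") in a single field, and printing the key plus column 3 is only an example of selecting columns.

```shell
#!/bin/sh
# Sketch: the key lookup is unchanged when file2 has extra columns.
# Extra column values are hypothetical; file2 is assumed tab-separated.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf '%s\n' 1234 23 > file1
printf '1234\tkshgjkghj\tx y\t10\n23\tdrjghfg\tz\t20\n134\tdrkhgjgj\tw\t30\n' > file2

# Whole matching line, all columns included.
all=$(awk -F '\t' -v OFS='\t' 'FNR == NR { a[$1]; next } ($1 in a) { print $1, $0 }' file1 file2)
# Only selected columns: the key and column 3 (an arbitrary example).
some=$(awk -F '\t' -v OFS='\t' 'FNR == NR { a[$1]; next } ($1 in a) { print $1, $3 }' file1 file2)
printf '%s\n' "$all" "$some"
```

Splitting on tabs only matters when a column can contain spaces; with whitespace-free columns, awk's default field splitting gives the same $1.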