Match a line in File 1 with Column in File 2 and print whole line in file 2 when matched

mustafa.abdulsa · June 7, 2011, 8:31am

Hi Experts,

I am very new to scripting and have a prb since few days and it is urgent to solve so much appreciated if u help me.

i have 2 files

file1.txt

9647810043118
9647810043126
9647810043155
9647810043161
9647810043166
9647810043185
9647810043200
9647810043203
9647810043250
9647810043277

file2.txt

418200100045528,2100,A,ALL,T11,9647810042955,9647802999956
418200100045549,2100,A,ALL,T11,9647810042976,9647802999956
418200100045652,2100,A,ALL,T11,9647810043081,9647802999956
418200100045665,2100,A,ALL,T11,9647810043094,9647802999956
418200100045688,2100,A,ALL,T11,9647810043118,9647802999956
418200100045696,2301,A,ALL,T11,9647810043126,9647802999978
418200100045724,2100,A,ALL,T11,9647810043155,9647802999956
418200100045730,2100,A,ALL,T11,9647810043161,9647802999956
418200100045735,2100,A,ALL,T11,9647810043166,9647802999977
418200100045754,2301,A,ALL,T11,9647810043185,9647802999956
418200100045769,2100,A,ALL,T11,9647810043200,9647802999938
418200100045772,2100,A,ALL,T11,9647810043203,9647802999902
418200100045817,2100,A,ALL,T11,9647810043250,9647802999902
418200100045844,2100,A,ALL,T11,9647810043277,9647802999902
418200100045851,2100,A,ALL,T11,9647810043284,9647802999981
418200100045886,2100,A,ALL,T11,9647810043320,9647802999938
418200100045912,2301,A,ALL,T11,9647810043356,9647802999980
418200100045946,2100,A,ALL,T11,9647810043393,9647802999938
418200100046067,2100,A,ALL,T11,9647810043515,9647802999904
418200100046068,0,A,ALL,T11,9647810043516,9647802999938
418200100046091,2301,A,ALL,T11,9647810043540,9647802999938
418200100046132,2100,A,ALL,T11,9647810043582,9647802999980
418200100046135,2100,A,ALL,T11,9647810043585,9647802999980
418200100046168,2100,A,ALL,T11,9647810043618,9647802999923
418200100046239,2100,A,ALL,T11,9647810043691,9647802999938
418200100046285,2100,A,ALL,T11,9647810043738,9647802999980
418200100046364,2301,A,ALL,T11,9647810043818,9647802999980
418200100046382,2100,A,ALL,T11,9647810043836,9647802999980

this is just a sample, the original files are much bigger (file1 is 100K records and file 2 is 3750K records)

the result.txt should be:

418200100045688,2100,A,ALL,T11,9647810043118,9647802999956
418200100045696,2301,A,ALL,T11,9647810043126,9647802999978
418200100045724,2100,A,ALL,T11,9647810043155,9647802999956
418200100045730,2100,A,ALL,T11,9647810043161,9647802999956
418200100045735,2100,A,ALL,T11,9647810043166,9647802999977
418200100045754,2301,A,ALL,T11,9647810043185,9647802999956
418200100045769,2100,A,ALL,T11,9647810043200,9647802999938
418200100045772,2100,A,ALL,T11,9647810043203,9647802999902
418200100045817,2100,A,ALL,T11,9647810043250,9647802999902
418200100045844,2100,A,ALL,T11,9647810043277,9647802999902

which is doing matching line in 1st file to the 6th column (, separated) in file2; when matched prints the full line in file2 to a new file

the script i am using is

for line in `awk '{print $1 }' $1`   
do                                   
grep $line $2 results_$1             
done

but it is taking too long, would any one please help me in this?

Thank you in advance and much appreciated

panyam · June 7, 2011, 9:00am

awk -F"," 'NR==FNR {a[$0]=$0;next} ($6 in a) { print $0 }' file1 file2

mustafa.abdulsa · June 7, 2011, 9:04am

Thank u very much for ur reply,

i got the error shown:

root@intestn2> awk -F"," 'NR==FNR {a[$0]=$0;next} ($6 in a) { print $0 }' ccc.txt ddd.txt
awk: syntax error near line 1
awk: bailing out near line 1
root@intestn2>

panyam · June 7, 2011, 9:06am

 
Use nawk or /usr/xpg4/bin/awk on Solaris.

mustafa.abdulsa · June 7, 2011, 9:24am

Hi, the awk worked, my files names were wrong, now there is no output

panyam · June 7, 2011, 9:33am

I believe your files are not "," separated then; For the input files you provided , I got the desired output.

Please also make sure that , file1 does not have trailing spaces!!

mustafa.abdulsa · June 7, 2011, 9:37am

Hi, the files are the same as i pasted ..

file1 has no separators it has just those lines with numbers and the , separator has no space before or after the word or the number in file 2, am i missing something?

panyam · June 7, 2011, 10:20am

 
 
Can you try this:
 
awk -F"," 'NR==FNR {a[$1]=$1;next} ($6 in a) { print $0 }' file1 file2

mustafa.abdulsa · June 7, 2011, 10:42am

no more help

the command worked on a colleague machine , why not for my machines? i am using:

686 i686 i386 GNU/Linux
and second is
SunOS intestn2 5.9 Generic_122300-32 sun4u sparc SUNW,Sun-Fire-V890

vgersh99 · June 7, 2011, 10:49am

Once again, as indicated previously, use /usr/bin/nawk or /usr/xpg4/bin/awk on Sun/Solaris.

Also your file1 might have leading/trailing space on each line - make sure that you don't.
You could also try:

nawk 'NR==FNR {a[$1]=$1;next} $6 in a' file1 FS=, OFS=, file2

mustafa.abdulsa · June 7, 2011, 10:53am

linux, no NAWK
unix, no output

i am sorry i am very new to these things

vgersh99 · June 7, 2011, 10:58am

copy/paste (using code tags) the following output:

head -10 file1 | cat -vet

Do the same for file2.

mustafa.abdulsa · June 7, 2011, 11:01am

cat -vet is not found

nsn@linux-nsn:~/flipflop> head -10 scc.txt | cat-vet
bash: cat-vet: command not found

root@intestn2> head -10 scc.txt | cat-vet
cat-vet: not found

vgersh99 · June 7, 2011, 11:03am

cat -vet - not cat-vet
And, please code tags as previously requested!

mustafa.abdulsa · June 7, 2011, 11:06am

from linux

nsn@linux-nsn:~/flipflop> head -10 scc.txt | cat -vet
9647810043118^M$
9647810043126^M$
9647810043155^M$
9647810043161^M$
9647810043166^M$
9647810043185^M$
9647810043200^M$
9647810043203^M$
9647810043250^M$
9647810043277^M$

nsn@linux-nsn:~/flipflop> head -10 sdd.txt | cat -vet
418200100045528,2100,A,ALL,T11,9647810042955,9647802999956^M$
418200100045549,2100,A,ALL,T11,9647810042976,9647802999956^M$
418200100045652,2100,A,ALL,T11,9647810043081,9647802999956^M$
418200100045665,2100,A,ALL,T11,9647810043094,9647802999956^M$
418200100045688,2100,A,ALL,T11,9647810043118,9647802999956^M$
418200100045696,2301,A,ALL,T11,9647810043126,9647802999978^M$
418200100045724,2100,A,ALL,T11,9647810043155,9647802999956^M$
418200100045730,2100,A,ALL,T11,9647810043161,9647802999956^M$
418200100045735,2100,A,ALL,T11,9647810043166,9647802999977^M$
418200100045754,2301,A,ALL,T11,9647810043185,9647802999956^M$

and from unix

root@intestn2> head -10 scc.txt | cat -vet
9647810043118^M$
9647810043126^M$
9647810043155^M$
9647810043161^M$
9647810043166^M$
9647810043185^M$
9647810043200^M$
9647810043203^M$
9647810043250^M$
9647810043277^M$

root@intestn2> head -10 sdd.txt | cat -vet
418200100045528,2100,A,ALL,T11,9647810042955,9647802999956^M$
418200100045549,2100,A,ALL,T11,9647810042976,9647802999956^M$
418200100045652,2100,A,ALL,T11,9647810043081,9647802999956^M$
418200100045665,2100,A,ALL,T11,9647810043094,9647802999956^M$
418200100045688,2100,A,ALL,T11,9647810043118,9647802999956^M$
418200100045696,2301,A,ALL,T11,9647810043126,9647802999978^M$
418200100045724,2100,A,ALL,T11,9647810043155,9647802999956^M$
418200100045730,2100,A,ALL,T11,9647810043161,9647802999956^M$
418200100045735,2100,A,ALL,T11,9647810043166,9647802999977^M$
418200100045754,2301,A,ALL,T11,9647810043185,9647802999956^M$

Thank u for ur patience

panyam · June 7, 2011, 11:12am

So the awk command I suggested giving the result OK on unix but not on Linux?

Please paste here as it is , how you are executing the awk command line in Linux.

mustafa.abdulsa · June 7, 2011, 11:16am

It did not work on my machines, it worked on my colleagues one and it is not with me to get the result, he just tried for me remotely

vgersh99 · June 7, 2011, 11:19am

your files contain ^M-s - they've been edited under Windows and ftp-ed in bin mode to Unix/Linux (most likely).

Run tr -d '\015' < scc.txt > scc.txt_new and tr -d '\015' < sdd.txt > sdd.txt_new .
Or the same with dos2unix myFile myNewFile for both files.

panyam · June 7, 2011, 11:19am

Nothing much we can do then.

Since , awk will work fine either in unix/linux.

Ask your friend/colleague to try with gawk/nawk on the box and see if that does work.

mustafa.abdulsa · June 8, 2011, 5:06am

That's amazing.. thank you so much so much guys it is very fast command

---------- Post updated 06-08-11 at 12:06 PM ---------- Previous update was 06-07-11 at 06:54 PM ----------

Hi guys,

thanks again for ur support.

the below command also gave me the same result as urs but it prints only column 6 and 7 instead of all line in file 2.

 gawk -F ',' 'FNR==NR{ a[$1]=$1;next } ($6 in a){print $6","$7}'

what if i need the reverse? I mean print the 6th and 7th column from file 2 when only the 6th column of file 2 is not found in file 1

Thanks.