Stuck on a matching, copying type problem

derek3131 · April 4, 2008, 8:26am

Hi all, I am new to these forums and to scripting in general, for that matter. I have been looking through the forums for something that could explain my problem. I have become pretty familiar with sed and awk, but I cannot seem to figure this out... I am trying to match data from 3 files, but everything I have tried doesn't seem to do what I need it to. And maybe this is something better suited for Perl or Python (which I have no experience with). OK, here goes...

Problem:

I want to match $1b against $2a and, if they match, copy $2b to $4a.
I want to match $1c against $2a and, if they match, copy $2c to $4a (overwriting whatever is already in $4a).

file_a is comma separated and has 4 fields

$1a,$2a,$3a,$4a

file_b is comma separated and has 2 fields

$1b,$2b

file_c is comma separated and has 2 fields

$1c,$2c

Here is an example of the data in the files:

berserker# more file_a

gpf135cm,gpf090079cs,purple,
gpf136cm,gpf100002cs,blue,
gpf138cm,gpf100065cs,purple,
gpf140cm,gpf110005cs,purple,
gpf141cm,gpf110037cs,purple,
gpf139cm,gpf100101cs,purple,
gpf139cm,gpf100119cs,purple,

 
berserker# more file_b

hou020067cs, synopsis = Urgent message file problems reported
hou090022cs, synopsis = Urgent message file problems reported
hou090056cs, synopsis = Node was detected as causing a spike by job 16811167
hou090064cs, synopsis = Node was disabled by Jeff ext8592
gpf090079cs, synopsis = Excessive ECC errors detected
gpf100002cs, synopsis = Excessive ECC errors detected
gpf090079cs,Apr  1 20:28:20 gpf090079cs kernel: MC1: CE page 0x1ac685 offset 0x240 grain 8 syndrome 0x4a row 0 channel 0 label "": bluesmoke_k8
gpf100002cs, 


berserker# more file_c

gpf090079cs MC1: row 0 channel 0

So far I have figured out how to extract all this information and organize it in these files using awk and sed, but I am not sure that I can get them to solve this type of problem... Any help on this problem, which I have been beating on my keyboard over, would be greatly appreciated. Thank you.

era · April 5, 2008, 10:50am

The join command is a completely generic solution to this type of problem. However, it requires its input files to be sorted on the join field. Alternatively, you could write an awk script which reads in one file in the BEGIN part and then processes the other as its regular input. Perl or Python might be better suited to this type of problem, though. Learning the basics of Perl is not particularly challenging if you are already familiar with sed and awk.
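To make the awk suggestion concrete, here is a minimal sketch rather than a finished solution. It loads file_b and then file_c into an array in the BEGIN part, then rewrites field 4 of file_a wherever field 2 is a known key. The file names follow the thread and the rows are a trimmed sample; the `load` function, the `desc` array and the temp directory are names made up for illustration, and it assumes file_c really is comma separated, as the question states.

```shell
#!/bin/sh
# Sketch only: trimmed sample rows written to a scratch directory.
tmp=$(mktemp -d)

cat > "$tmp/file_a" <<'EOF'
gpf135cm,gpf090079cs,purple,
gpf136cm,gpf100002cs,blue,
gpf138cm,gpf100065cs,purple,
EOF

cat > "$tmp/file_b" <<'EOF'
gpf090079cs, synopsis = Excessive ECC errors detected
gpf100002cs, synopsis = Excessive ECC errors detected
EOF

# Assumption: file_c uses a comma after the node name, like file_b.
cat > "$tmp/file_c" <<'EOF'
gpf090079cs, MC1: row 0 channel 0
EOF

merged=$(awk -F, -v OFS=, -v b="$tmp/file_b" -v c="$tmp/file_c" '
# Read a lookup file and remember everything after the first comma per key.
function load(path,    line, pos, val) {
    while ((getline line < path) > 0) {
        pos = index(line, ",")
        if (pos == 0) continue            # skip lines without a comma
        val = substr(line, pos + 1)
        sub(/^[ \t]+/, "", val)           # drop leading blanks
        desc[substr(line, 1, pos - 1)] = val
    }
    close(path)
}
# file_c is loaded last, so its entries overwrite those from file_b.
BEGIN { load(b); load(c) }
# Regular input is file_a: fill field 4 when field 2 is a known key.
$2 in desc { $4 = desc[$2] }
{ print }
' "$tmp/file_a")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

With these sample rows, the first line picks up the file_c text (it overrides file_b), the second line gets the file_b synopsis, and the third line is printed unchanged because its node appears in neither lookup file. No sorting is needed, which is the main difference from join.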

derek3131 · April 7, 2008, 9:52am

Cool, thanks for your reply, era. I in fact went to a half-price bookstore this weekend and picked up O'Reilly's Learning Perl, Third Edition, for 5 bucks. I think the newest edition is the fourth, so I am sure it will be fine.

ag79 · April 7, 2008, 10:05am

I see that at any point in time you'll be handling two files. Since you're familiar with awk, the following may work for you, even though I agree Perl might be a better tool for the job.

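for FILE_1 in file_b file_c
do

#Read file_b (then file_c) line by line
while IFS= read -r file1line
do
#Extract the column from file_b or file_c that you want to compare to in file a
#(assuming the key is the first comma-separated field)
coltocompare_bc=$(printf '%s\n' "$file1line" | awk -F, '{print $1}')

#Here, get the column from file a (assuming the key is in its second field)
coltocompare_a=$(awk -F, -v key="$coltocompare_bc" '$2 == key {print $2}' file_a)

#A non-empty coltocompare_a means the key was found in file a; do what you like here.
if [ -n "$coltocompare_a" ]; then
echo "Maybe this will work, maybe not: $coltocompare_bc is in file_a"
fi

done < "$FILE_1"

done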

derek3131 · April 14, 2008, 5:18pm

Good gravy, why am I not getting how to do this seemingly simple problem? I really feel that I am making this harder than it needs to be.

#!/usr/bin/perl
#
$data = 'hou100cm,hou010008cs,purple
hou132cm,hou090026cs,purple
hou133cm,hou090057cs,purple
hou134cm,hou090064cs,blue
hou190cm,hou230095cs,blue
hou193cm,hou240058cs,purple
hou195cm,hou240124cs,purple
hou195cm,hou240125cs,purple
gpf132cm,gpf090013cs,purple
gpf132cm,gpf090028cs,purple
gpf133cm,gpf090036cs,purple
gpf133cm,gpf090051cs,blue
gpf133cm,gpf090059cs,purple
gpf134cm,gpf090067cs,purple
gpf134cm,gpf090079cs,blue
gpf136cm,gpf100002cs,blue
gpf136cm,gpf100003cs,purple
gpf136cm,gpf100024cs,blue
gpf141cm,gpf110059cs,blue
gpf139cm,gpf100099cs,blue
gpf139cm,gpf100100cs,purple
gpf139cm,gpf100101cs,purple';

$blue = 'gpf100024cs, MC1: row 1 channel 1
gpf100002cs, Bad DIMM slot (motherboard)
gpf090079cs, MC1: row 0 Channel 0
hou230095cs, Urgent message file problems reported
hou090064cs,  NODE USE DISABLED BY Jeff';

(master, node, status) = split(',', $data);
(node2,desc) = split(',', $blue);

I have tried to match $blue{$node2} =~ $data{$node} ;

then join($blue{$desc}) to $data for days now. I think I am confusing myself, since I have come up with at least 5 different ways that I thought would work on this... but no luck...

Can someone tell me what I am doing wrong here? My eyes have grown tired and my mind numb. Please help if you can.

Basically there are two data sets, $data and $blue.
$data has fields 0, 1 and 2, and $blue has fields 0 and 1.
If $blue{field0} =~ $data{field1}, then copy or move $blue{field1} to $data{field3}. And of course field3 is newly created.

Thank you for any help.
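For what it is worth, the rule described above (match field 0 of $blue against field 1 of $data, then append field 1 of $blue as a new field 3) may be easier to see outside Perl first. The following is a rough sketch using sort and join, assuming the two data sets are saved as comma-separated files; data.csv and blue.csv are hypothetical names, and only a few of the sample rows are used.

```shell
#!/bin/sh
# Sketch only: data.csv and blue.csv are hypothetical files holding a few
# rows of the $data and $blue samples.
tmp=$(mktemp -d)
export LC_ALL=C   # sort and join must agree on the collation order

cat > "$tmp/data.csv" <<'EOF'
hou134cm,hou090064cs,blue
gpf134cm,gpf090079cs,blue
gpf136cm,gpf100003cs,purple
EOF

cat > "$tmp/blue.csv" <<'EOF'
gpf090079cs, MC1: row 0 Channel 0
hou090064cs,  NODE USE DISABLED BY Jeff
EOF

# join needs both inputs sorted on the field being joined.
sort -t, -k2,2 "$tmp/data.csv" > "$tmp/data.sorted"
sort -t, -k1,1 "$tmp/blue.csv" > "$tmp/blue.sorted"

# -1 2 -2 1 : match field 2 of data against field 1 of blue
# -a 1      : keep data rows that have no match in blue
# -o ...    : print master,node,status from data, then the blue description
joined=$(join -t, -1 2 -2 1 -a 1 -o 1.1,1.2,1.3,2.2 \
    "$tmp/data.sorted" "$tmp/blue.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

The output comes back ordered by node name rather than in the original order, and rows without a match end in an empty fourth field. The blanks after the comma in blue.csv are carried over as they are. The same idea in Perl would be a hash keyed on the node name, filled from $blue one line at a time.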