Jahn
December 9, 2010, 11:18am
1
Hi all,
I have a text file consisting of 4 columns. What I am trying to do is see whether column 2 repeats multiple times, and collapse those repeats into one row. For example, here is a snippet of the file I am trying to analyze:
1 Gamble_Win 14.282 0.502
1 Sure_Thing 14.858 0.174
1 Sure_Thing 15.043 0.39
1 Gamble_Loss 15.496 1.236
1 Gamble_Loss 16.982 0.402
1 Gamble_Loss 17.647 0.19
1 Gamble_Win 17.914 0.236
1 Arrow 18.203 0.371
Here is what I am trying to do: where conditions such as "Sure_Thing" and "Gamble_Loss" repeat on consecutive rows, I want to collapse each run into a single line, keeping columns 1-3 of the first row and adding up all of column 4 over the repeats. So after I gawk it, I want it to look something like this:
1 Gamble_Win 14.282 0.502
1 Sure_Thing 14.858 0.564
1 Gamble_Loss 15.496 1.828
1 Gamble_Win 17.914 0.236
1 Arrow 18.203 0.371
Here is the code I have used to analyze it so far, but it only works for 2 adjacent repeats; I want to generalize it for multiple repeats:
igawk '
BEGIN{
OFS=" "
prevTrial = "-";
prevTime = "0";
prevDur = "0";
}
{
if ($2 == prevTrial)
print $1, prevTrial, prevTime, prevDur+$4;
else if ($2 != prevTrial)
print $0;
prevTrial = $2;
prevTime = $3;
prevDur = $4;
}
' $*
I appreciate any input!
Tytalus
December 9, 2010, 11:59am
2
Is this close to what you want?
# awk '$2==t{s+=$4}$2!=t{if(NR>1)print x,s;x=$1" "$2" "$3;t=$2;s=$4}END{print x,s}' infile
1 Gamble_Win 14.282 0.502
1 Sure_Thing 14.858 0.564
1 Gamble_Loss 15.496 1.828
1 Gamble_Win 17.914 0.236
1 Arrow 18.203 0.371
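For readers who find the one-liner dense, here is a minimal sketch of the same idea spelled out with comments. It assumes the input is already ordered so that repeats sit on adjacent rows, as in the sample above. The function name `collapse_runs` and the variable names `head`, `trial`, and `dur` are made up for illustration; they are not from the posts in this thread.

```shell
# Hypothetical helper: collapse adjacent rows that share column 2,
# keeping columns 1-3 of the first row and summing column 4.
# Reads the files named as arguments, or stdin when none are given.
collapse_runs() {
  awk '
    # Same trial name as the previous row: extend the current run.
    NR > 1 && $2 == trial { dur += $4; next }
    # A different trial name: print the run that just ended
    # (there is nothing to print before the first row).
    NR > 1 { print head, dur }
    # Start a new run from the current row.
    { head = $1 " " $2 " " $3; trial = $2; dur = $4 }
    # Print the last run, unless the input was empty.
    END { if (NR > 0) print head, dur }
  ' "$@"
}
```

Calling `collapse_runs infile` on the sample data should give the same five lines as above. The `next` in the first rule is what lets a run be any length: repeated rows only add to the running sum, and nothing is printed until the trial name changes.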
You need to modify it as below:
gawk '
BEGIN{
OFS=" "
prevTrial = "-";
prevTime = "0";
prevDur = "0";
}
{
if ($2 == prevTrial) { next ;}
else if ($2 != prevTrial) {print $0; prevTrial = $2 ; prevTime = $3; prevDur = $4}
}
' infile.txt
Output:
1 Gamble_Win 14.282 0.502
1 Sure_Thing 14.858 0.174
1 Gamble_Loss 15.496 1.236
1 Gamble_Win 17.914 0.236
1 Arrow 18.203 0.371
---------- Post updated at 19:03 ---------- Previous update was at 18:59 ----------
Even better, you can use this shorter version:
gawk '($2==p){ next ; }{print $0 ; p=$2 }' infile.txt
Output:
1 Gamble_Win 14.282 0.502
1 Sure_Thing 14.858 0.174
1 Gamble_Loss 15.496 1.236
1 Gamble_Win 17.914 0.236
1 Arrow 18.203 0.371
;)
Jahn
December 9, 2010, 12:59pm
4
Thanks Tytalus, that was exactly what I was looking for.
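use strict;
use warnings;
my $val = "---";          # trial name (column 2) of the current run
my ($prefix, $suffix);    # columns 1-3 of the run's first row, and its summed column 4
while (<DATA>) {
    my @tmp = split;
    if ($val eq $tmp[1]) {
        # same trial as the previous row: keep adding up column 4
        $suffix += $tmp[3];
    }
    else {
        # new trial: print the finished run (there is none before the first row)
        print $prefix, " ", $suffix, "\n" unless $. == 1;
        $prefix = $tmp[0] . " " . $tmp[1] . " " . $tmp[2];
        $suffix = $tmp[3];
        $val    = $tmp[1];
    }
}
print $prefix, " ", $suffix, "\n";    # print the last run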
__DATA__
1 Gamble_Win 14.282 0.502
1 Sure_Thing 14.858 0.174
1 Sure_Thing 15.043 0.39
1 Gamble_Loss 15.496 1.236
1 Gamble_Loss 16.982 0.402
1 Gamble_Loss 17.647 0.19
1 Gamble_Win 17.914 0.236
1 Arrow 18.203 0.371
1 Arrow 18.203 0.371