Input
#GEO-1-type-1-fwd-Initial 890 1519
OPKHIJEFVTEFVHIJEFVOPKHIJTOPKEFVHIJTEFVOPKOPKHIJHIJHIJTTOPKHIJHIJEFVEFVOPKHIJOPKHIJOPKEFVEFVOPKHIJHIJEFVHIJHIJEFVTHIJOPKOPKTEFVEFVEFVOPKHIJOPKOPKHIJTTEFVEFVTEFV
#GEO-1-type-2-fwd-Terminal 1572 2030
HIJOPKHIJEFVTOPKOPKTTOPKHIJOPKHIJEFVOPKTOPKTOPKHIJHIJTEFVOPKTOPKTOPKEFVOPKOPKEFVEFVTEFVOPKHIJEFVEFVOPKHIJOPKOPKHIJHIJEFVEFVHIJEFVEFVTOPKEFVOPKTHIJTTHIJOPK
#GEO-2-type-1-rev-Terminal 2734 2475
EFVTEFVTTOPKTOPKTEFVOPKHIJTEFVTTTOPKEFVTEFVOPKTTOPKTHIJTTTOPKEFVTOPKTEFVEFVEFVTHIJEFVHIJOPKEFVHIJOPKHIJEFVEFVHIJEFVEFVEFVTHIJEFVHIJOPKTHIJ
#GEO-2-type-2-rev-Internal 3041 2804
TEFVEFVOPKHIJTEFVHIJHIJHIJOPKOPKTTOPKHIJTOPKTOPKEFVEFVEFVEFVOPKHIJEFVTEFVTHIJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPK
#GEO-2-type-3-rev-Terminal 4050 3990
IJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPK
Output
#GEO-1-fwd 890 1519 1572 2030
OPKHIJEFVTEFVHIJEFVOPKHIJTOPKEFVHIJTEFVOPKOPKHIJHIJHIJTTOPKHIJHIJEFVEFVOPKHIJOPKHIJOPKEFVEFVOPKHIJHIJEFVHIJHIJEFVTHIJOPKOPKTEFVEFVEFVOPKHIJOPKOPKHIJTTEFVEFVTEFVHIJOPKHIJEFVTOPKOPKTTOPKHIJOPKHIJEFVOPKTOPKTOPKHIJHIJTEFVOPKTOPKTOPKEFVOPKOPKEFVEFVTEFVOPKHIJEFVEFVOPKHIJOPKOPKHIJHIJEFVEFVHIJEFVEFVTOPKEFVOPKTHIJTTHIJOPK
#GEO-2-rev 4050 3990 3041 2804 2734 2475
IJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPKTEFVEFVOPKHIJTEFVHIJHIJHIJOPKOPKTTOPKHIJTOPKTOPKEFVEFVEFVEFVOPKHIJEFVTEFVTHIJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPKEFVTEFVTTOPKTOPKTEFVOPKHIJTEFVTTTOPKEFVTEFVOPKTTOPKTHIJTTTOPKEFVTOPKTEFVEFVEFVTHIJEFVHIJOPKEFVHIJOPKHIJEFVEFVHIJEFVEFVEFVTHIJEFVHIJOPKTHIJ
I would like to concatenate the string content of records based on their header description. For headers containing "fwd", the content should be appended in ascending order. For headers containing "rev", the content should be appended in descending order. I am trying to achieve this with awk and perl. Thanks a lot for any advice and suggestions.
Straightforward approach:
awk -F '[ -]' '{if (NF>1){r=$1"-"$2"-"$5; m=$5;
if (m=="fwd"){A[r]=A[r]" "$8" "$9}
else if (m=="rev"){A[r]=$8" "$9" "A[r]} }
else if (!/^$/){
if (m=="fwd") {B[r]=B[r]$1}
else {if (m=="rev") B[r]=$1B[r]} } }
END{for (i in A) {print i, A[i]; print B[i] }}' infile
Thanks a lot, Scrutinizer.
Your code works perfectly.
But it gives the output like this:
#GEO-2-rev 4050 3990 3041 2804 2734 2475
IJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPKTEFVEFVOPKHIJTEFVHIJHIJHIJOPKOPKTTOPKHIJTOPKTOPKEFVEFVEFVEFVOPKHIJEFVTEFVTHIJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPKEFVTEFVTTOPKTOPKTEFVOPKHIJTEFVTTTOPKEFVTEFVOPKTTOPKTHIJTTTOPKEFVTOPKTEFVEFVEFVTHIJEFVHIJOPKEFVHIJOPKHIJEFVEFVHIJEFVEFVEFVTHIJEFVHIJOPKTHIJ
#GEO-1-fwd 890 1519 1572 2030
OPKHIJEFVTEFVHIJEFVOPKHIJTOPKEFVHIJTEFVOPKOPKHIJHIJHIJTTOPKHIJHIJEFVEFVOPKHIJOPKHIJOPKEFVEFVOPKHIJHIJEFVHIJHIJEFVTHIJOPKOPKTEFVEFVEFVOPKHIJOPKOPKHIJTTEFVEFVTEFVHIJOPKHIJEFVTOPKOPKTTOPKHIJOPKHIJEFVOPKTOPKTOPKHIJHIJTEFVOPKTOPKTOPKEFVOPKOPKEFVEFVTEFVOPKHIJEFVEFVOPKHIJOPKOPKHIJHIJEFVEFVHIJEFVEFVTOPKEFVOPKTHIJTTHIJOPK
By the way, may I ask what A[r] and B[r] mean, and what $9 represents in your awk code?
As I understand it, the header only has fields $1 to $8, right?
Thanks again, Scrutinizer.
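use strict;
use warnings;

my (%hash, @order, $key, $dir);
while (<DATA>) {
    chomp;
    if (/^#/) {
        # Header line: split on runs of "-" or spaces
        my @tmp = split /[- ]+/;
        $dir = $tmp[4];                      # "fwd" or "rev"
        $key = "$tmp[0]-$tmp[1]-$dir";       # e.g. "#GEO-1-fwd"
        push @order, $key unless exists $hash{$key};
        my $pos = "$tmp[6] $tmp[7]";         # the two coordinates
        my $old = $hash{$key}{POS};
        # fwd appends the coordinates, rev prepends them
        $hash{$key}{POS} = !defined($old) ? $pos
                         : $dir eq 'fwd'  ? "$old $pos" : "$pos $old";
    }
    elsif (length) {
        # Sequence line: fwd appends, rev prepends
        my $old = $hash{$key}{DATA} // '';
        $hash{$key}{DATA} = $dir eq 'fwd' ? $old . $_ : $_ . $old;
    }
}
# Print the groups in the order they first appeared in the input
foreach my $k (@order) {
    print "$k $hash{$k}{POS}\n";
    print "$hash{$k}{DATA}\n";
}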
__DATA__
#GEO-1-type-1-fwd-Initial 890 1519
OPKHIJEFVTEFVHIJEFVOPKHIJTOPKEFVHIJTEFVOPKOPKHIJHIJHIJTTOPKHIJHIJEFVEFVOPKHIJOPKHIJOPKEFVEFVOPKHIJHIJEFVHIJHIJEFVTHIJOPKOPKTEFVEFVEFVOPKHIJOPKOPKHIJTTEFVEFVTEFV
#GEO-1-type-2-fwd-Terminal 1572 2030
HIJOPKHIJEFVTOPKOPKTTOPKHIJOPKHIJEFVOPKTOPKTOPKHIJHIJTEFVOPKTOPKTOPKEFVOPKOPKEFVEFVTEFVOPKHIJEFVEFVOPKHIJOPKOPKHIJHIJEFVEFVHIJEFVEFVTOPKEFVOPKTHIJTTHIJOPK
#GEO-2-type-1-rev-Terminal 2734 2475
EFVTEFVTTOPKTOPKTEFVOPKHIJTEFVTTTOPKEFVTEFVOPKTTOPKTHIJTTTOPKEFVTOPKTEFVEFVEFVTHIJEFVHIJOPKEFVHIJOPKHIJEFVEFVHIJEFVEFVEFVTHIJEFVHIJOPKTHIJ
#GEO-2-type-2-rev-Internal 3041 2804
TEFVEFVOPKHIJTEFVHIJHIJHIJOPKOPKTTOPKHIJTOPKTOPKEFVEFVEFVEFVOPKHIJEFVTEFVTHIJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPK
#GEO-2-type-3-rev-Terminal 4050 3990
IJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPK
Hi Patrick,
That is because in awk the order in which for (i in A) visits the elements of an associative array is unspecified. On my computer they get printed in the right order, but that is by chance. If the order is important, we'd have to do something to ensure it.
As for the $9: every single space and every - is used as a separator, so two separators in a row produce an empty field and there are more fields than you would expect, hence the $9. We could improve the robustness by using * in the -F specification, so that a run of separators counts as one, and then using the proper field numbers.
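A minimal sketch of the field-splitting point, assuming a header that has two spaces before the first coordinate (that spacing is a guess about the original input, not something visible in the sample above). It uses + rather than *, which states the same intent without depending on how a particular awk treats a separator that matches the empty string. The variable names header, single and run are made up for the illustration.

```shell
# Hypothetical header with TWO spaces before the first coordinate
# (an assumption about the original input).
header='#GEO-1-type-1-fwd-Initial  890 1519'

# Single-character class: every space or hyphen is a separator, so the
# double space yields an empty field and the coordinates are $8 and $9.
single=$(printf '%s\n' "$header" | awk -F '[ -]' '{ print NF ": " $8 " " $9 }')

# One-or-more: a run of separators counts once, so the coordinates
# move to $7 and $8.
run=$(printf '%s\n' "$header" | awk -F '[ -]+' '{ print NF ": " $7 " " $8 }')

echo "$single"
echo "$run"
```

The first line printed reports nine fields and the second eight, with the same two coordinates in both cases.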
Thanks for your explanation, Scrutinizer.
I get what you mean now.
I will try to fix the problem by making sure they are in the right order.
I tried your script a few times just now.
Every run gives the "rev" result first and only then "fwd".
Thanks again, Scrutinizer.
Perhaps you could give this a try then:
awk -F '[ -]*' '{ if (NF>1){
r=$1"-"$2"-"$5; m=$5;
if (!A[r]) O[i++]=r
if (m=="fwd") A[r]=A[r]" "$7" "$8
else if (m=="rev") A[r]=$7" "$8" "A[r]
}
else if (NF>0)
if (m=="fwd") B[r]=B[r]$1
else if (m=="rev") B[r]=$1B[r]
}
END{for (j=0;j<i;j++) {k=O[j];print k, A[k]; print B[k] }}' infile
---------- Post updated 16-12-09 at 00:24 ---------- Previous update was 15-12-09 at 11:47 ----------
Slightly simplified:
awk -F '[ -]*' 'NF>1 { r=$1"-"$2"-"$5; m=$5; if (!A[r]) O[i++]=r
if (m=="fwd") A[r]=A[r]" "$7" "$8
else if (m=="rev") A[r]=$7" "$8" "A[r] }
NF==1 { if (m=="fwd") B[r]=B[r]$1
else if (m=="rev") B[r]=$1B[r] }
END { for (j=0;j<i;j++) {k=O[j];print k, A[k]; print B[k]} }' infile
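The ordering trick in the scripts above, reduced to its core, might look like the sketch below. The input is made up, and the names S, O, n and ordered are only illustrative. It tests membership with ($1 in S) instead of !A[r], which is a small substitution on my part: the effect is the same here, but it does not depend on the stored value being non-empty.

```shell
# Tiny made-up input: three keys, one of them repeated.
input='b 1
a 2
b 3
c 4'

# O[] records each key the first time it is seen; n counts the keys.
# Looping j = 0 .. n-1 replays them in first-seen order, which
# "for (k in S)" does not guarantee.
ordered=$(printf '%s\n' "$input" | awk '
    !($1 in S) { O[n++] = $1 }        # first occurrence: remember its position
    { S[$1] = S[$1] " " $2 }          # accumulate the values per key
    END { for (j = 0; j < n; j++) print O[j] S[O[j]] }
')

echo "$ordered"
```

The keys come out as b, a, c, which is the order of their first appearance in the input.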
ichigo, December 16, 2009, 6:47am:
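gawk '/^#.*fwd.*/{
    o=$0
    gsub(/-type.*/,"",o)             # keep only the "#GEO-1" part of the header
    fh=o
    fstr=$(NF-1) OFS $NF OFS fstr    # collect the two coordinates (sorted later)
    getline                          # the sequence is on the next line
    fl=fl $0                         # fwd: append the sequence
}
/^#.*rev.*/{
    o=$0
    gsub(/-type.*/,"",o)
    rh=o
    rstr=$(NF-1) OFS $NF OFS rstr
    getline
    rl=$0 rl                         # rev: prepend the sequence
}
END{
    split(fstr,F," ")
    fidx=asort(F,farr)               # asort() is a gawk extension
    for(i=1;i<=fidx;i++){            # ascending coordinates for fwd
        fs=fs OFS farr[i]
    }
    print fh"-fwd" fs
    print fl
    print ""
    split(rstr,R," ")
    ridx=asort(R,rarr)
    for(i=ridx;i>=1;i--){            # descending coordinates for rev
        rs=rs OFS rarr[i]
    }
    print rh"-rev" rs
    print rl
}' infile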