What I would like to do is this: if lines starting with % have the same name (ID), combine the last 9 letters of the string underneath the last occurrence of that ID with the first 9 letters of the string underneath the first occurrence of that ID.
Notice how %GOGG now has "nfungoson" put together with "doggocatc", and the same goes for %SONSON. It's worth mentioning that an ID can occur more than twice.
Is this classwork/homework?
It can be perfectly done with awk, using its associative arrays and its substr and getline functions.
What have you tried already?
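To make the substr part concrete, here is a minimal sketch of pulling the first 9 and the last 9 letters out of a string. The helper name first_last9 and the sample string are made up for illustration; only substr and length are real awk functions.

```shell
# first_last9 is a hypothetical helper name, not part of awk.
first_last9() {
  printf '%s\n' "$1" | awk '{
    print substr($0, 1, 9)            # first 9 letters
    print substr($0, length($0) - 8)  # last 9 letters (to the end of the string)
  }'
}

# Hypothetical sample string
first_last9 "doggocatchesthefrisbeenfungoson"
```

If the string is shorter than 9 letters, both calls simply return the whole string.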
This is not homework, actually. It's just an annoying problem I've run into and can't figure out. What I have tried is the following, but it is nowhere near useful in my opinion.
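awk '
{
  if ( $1 ~ /^%/ )
  {
    A[$1]++                 # count the occurrences of this ID
    i = $1                  # remember the current ID
  }
  else
    R[A[i],i] = $0          # string underneath the n-th occurrence of the ID
}
END {
  for ( k in A )            # note: the order of "for in" is unspecified
  {
    print k
    # last 9 letters of the last occurrence, then first 9 letters of the first
    print substr(R[A[k],k],length(R[A[k],k])-8) substr(R[1,k],1,9)
  }
}
' file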
Scrutinizer, yours seems to work only when there are multiple occurrences of the string ID, not in cases where the string ID exists only once. It should still handle those cases.
Your solution is actually what I was looking for, except that for single-occurring %IDs the string should stay as long as it is in the input. So something like this would be the same in the output as in the input.
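One way the single-occurrence case could be handled is sketched below; it is not necessarily how Scrutinizer's solution works. It assumes exactly one string line underneath each %ID line, and the names combine_ids, count, order and str are made up for this example. IDs are printed in the order they first appear, and an ID that occurs only once gets its string back unchanged.

```shell
# combine_ids is a hypothetical name; assumes one string line per %ID line.
combine_ids() {
  awk '
  $1 ~ /^%/ {
    id = $1
    if (!(id in count)) order[++n] = id    # remember first-seen order
    count[id]++
    next
  }
  { str[count[id], id] = $0 }               # string under the current occurrence
  END {
    for (j = 1; j <= n; j++) {
      id = order[j]
      print id
      if (count[id] == 1) {
        print str[1, id]                    # single occurrence: print unchanged
      } else {
        last = str[count[id], id]
        # last 9 letters of the last occurrence + first 9 of the first
        print substr(last, length(last) - 8) substr(str[1, id], 1, 9)
      }
    }
  }' "$@"
}
```

It can be called as combine_ids file, or fed from a pipe when no file name is given.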
Assume we have, say, 4 occurrences of the same ID and want to combine the end of ID 4 with the beginning of ID 4, the beginning of ID 3, and the beginning of ID 2, but not with the beginning of ID 1. How can this be done?
Basically, the combinations should look like this:
end of 4 with beginning of 4
         with beginning of 3
         with beginning of 2
end of 3 with beginning of 3
         with beginning of 2
end of 2 with beginning of 2
end of 1 with beginning of 1
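The pairing listed above could be sketched with two nested loops in the END block. This is only a sketch under assumptions: one string line underneath each %ID line, and an output layout (the ID once, then one combination per line) that is my own guess, since the desired layout was not specified. The names combine_all, count, order and str are hypothetical.

```shell
# combine_all is a hypothetical name; the output layout is an assumption.
combine_all() {
  awk '
  $1 ~ /^%/ {
    id = $1
    if (!(id in count)) order[++n] = id    # remember first-seen order
    count[id]++
    next
  }
  { str[count[id], id] = $0 }               # string under the current occurrence
  END {
    for (j = 1; j <= n; j++) {
      id = order[j]
      print id
      for (e = count[id]; e >= 1; e--) {    # occurrence supplying the end
        tail = substr(str[e, id], length(str[e, id]) - 8)
        stop = (e == 1) ? 1 : 2             # beginning of 1 pairs only with end of 1
        for (b = e; b >= stop; b--)         # occurrence supplying the beginning
          print tail substr(str[b, id], 1, 9)
      }
    }
  }' "$@"
}
```

With 4 occurrences this prints 4+4, 4+3, 4+2, 3+3, 3+2, 2+2 and 1+1, in that order. An ID that occurs only once falls into the e == 1 case, so a string longer than 18 letters would be shortened to its last 9 plus its first 9 letters.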