Please advise if you know how to solve this.
I have a tab-delimited INPUT FILE in which the records are separated by a line of dashes (-----):
-----
ABC 4935402 4936680 Pattern=Cheers07080.1
ABC 4932216 4932368 Pattern=Cheers07080.1
ABC 4931932 4932122 Pattern=Cheers07080.1
-----
ABC 4675209 4676057 Pattern=Cheers06520.1
ABC 4676269 4676713 Pattern=Cheers06520.1
ABC 4682346 4682510 Pattern=Cheers06520.1
ABC 4682606 4682796 Pattern=Cheers06520.1
-----
ABC 48341587 48344548 Pattern=Cheers45590.1
-----
ABC 34297519 34298743 Pattern=Cheers31410.1
ABC 34298957 34299678 Pattern=Cheers31410.1
-----
The required OUTPUT file is:
-----
Xyz (4935402-4932368)-1 Pattern=Cheers07080.1
Xyz (4932216-4932122)-1 Pattern=Cheers07080.1
-----
Xyz (4676269-4676057)-1 Pattern=Cheers06520.1
Xyz (4682346-4676713)-1 Pattern=Cheers06520.1
Xyz (4682606-4682510)-1 Pattern=Cheers06520.1
-----
Xyz 0 Pattern=Cheers45590.1
-----
Xyz (34298957-34298743)-1 Pattern=Cheers31410.1
-----
The output is based on these criteria:
Within a record, compare consecutive rows. If column2(row1) > column2(row2), subtract column3(row2) from column2(row1), and continue the same way with each following pair of rows until the record ends. If column2(row1) < column2(row2), subtract column3(row1) from column2(row2) instead, and so on. In both cases 1 is then subtracted from the difference.
If a record has only one row, print 'Xyz 0' followed by the value of column 4.
The expression (4935402-4932368)-1 is written out only for clarity; the output should contain its value.
Thanks in advance.
Something like this?
$ cat file
-----
ABC 4935402 4936680 Pattern=Cheers07080.1
ABC 4932216 4932368 Pattern=Cheers07080.1
ABC 4931932 4932122 Pattern=Cheers07080.1
-----
ABC 4675209 4676057 Pattern=Cheers06520.1
ABC 4676269 4676713 Pattern=Cheers06520.1
ABC 4682346 4682510 Pattern=Cheers06520.1
ABC 4682606 4682796 Pattern=Cheers06520.1
-----
ABC 48341587 48344548 Pattern=Cheers45590.1
-----
ABC 34297519 34298743 Pattern=Cheers31410.1
ABC 34298957 34299678 Pattern=Cheers31410.1
-----
$
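$ awk '/-----/{
    if(f){                          # previous record had only one row
      print "Xyz\t0" "\t" s
    }
    print; getline                  # print the separator, then read row 1 of the record
    a=$2; s=$NF; f=1                # a = column2(row1), s = column4
    next
  }
  /ABC/{
    print "Xyz\t" a-$3-1 "\t" $NF   # previous column2 - current column3 - 1
    a=$2; f=0
  }' file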
-----
Xyz 3033 Pattern=Cheers07080.1
Xyz 93 Pattern=Cheers07080.1
-----
Xyz -1505 Pattern=Cheers06520.1
Xyz -6242 Pattern=Cheers06520.1
Xyz -451 Pattern=Cheers06520.1
-----
Xyz 0 Pattern=Cheers45590.1
-----
Xyz -2160 Pattern=Cheers31410.1
-----
$
Thanks for your response, Franklin. I'll take care of the text formatting. But there is a problem with the output: it contains negative values, whereas the smaller number has to be subtracted from the larger one each time.
Can you post the desired output from the given input file?
The desired OUTPUT file is:
-----
Xyz 3033 Pattern=Cheers07080.1
Xyz 93 Pattern=Cheers07080.1
-----
Xyz 211 Pattern=Cheers06520.1
Xyz 5632 Pattern=Cheers06520.1
Xyz 95 Pattern=Cheers06520.1
-----
Xyz 0 Pattern=Cheers45590.1
-----
Xyz 213 Pattern=Cheers31410.1
-----
The difference between records is that the numbers in column 2 are in either descending or ascending order, and the subtraction varies accordingly.
Thanks.
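If I understand your question correctly, these should be the criteria: if column2(row1) > column2(row2), print row1(column2)-row2(column3)-1, else print row2(column2)-row1(column3)-1.
In that case you can't get the desired output you posted.
This command uses the criteria above: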
awk '/-----/{
    if(f){
      print "Xyz\t0" "\t" s
    }
    print; getline
    a=$2; b=$3; s=$NF; f=1             # a = column2(row1), b = column3(row1)
    next
  }
  /ABC/{
    if(a>$2){                          # if column2(row1) > column2(row2)
      print "Xyz\t" a-$3-1 "\t" $NF    # print row1(column2)-row2(column3)-1
    }
    else {
      print "Xyz\t" $2-b-1 "\t" $NF    # else print row2(column2)-row1(column3)-1
    }
    a=$2; f=0
  }' file
and the output is:
-----
Xyz 3033 Pattern=Cheers07080.1
Xyz 93 Pattern=Cheers07080.1
-----
Xyz 211 Pattern=Cheers06520.1
Xyz 6288 Pattern=Cheers06520.1
Xyz 6548 Pattern=Cheers06520.1
-----
Xyz 0 Pattern=Cheers45590.1
-----
Xyz 213 Pattern=Cheers31410.1
-----
Regards
I have tried to simplify my problem. Please see if you can help.
Now the numbers in each column are only increasing.
INPUT FILE
-----
ABC 4675209 4676057 Pattern01
ABC 4676269 4676713 Pattern01
ABC 4682346 4682510 Pattern01
ABC 4682606 4682796 Pattern01
-----
ABC 48341587 48344548 Pattern09
-----
ABC 34297519 34298743 Pattern10
ABC 34298957 34299678 Pattern10
-----
OUTPUT FILE
-----
Xyz 212 [4676269 - 4676057] Pattern01
Xyz 5633 [4682346 - 4676713] Pattern01
Xyz 96 [4682606 - 4682510] Pattern01
-----
Xyz 0 Pattern09
-----
Xyz 214 [34298957 - 34298743] Pattern10
-----
The values written in [ ] are for explanation only.
Thanks in advance.
Where did the -1 go? Anyway, there was a bug in my code (I forgot to update a variable at the end of the /ABC/ block: b=$3), but this should work:
$ cat file
-----
ABC 4675209 4676057 Pattern01
ABC 4676269 4676713 Pattern01
ABC 4682346 4682510 Pattern01
ABC 4682606 4682796 Pattern01
-----
ABC 48341587 48344548 Pattern09
-----
ABC 34297519 34298743 Pattern10
ABC 34298957 34299678 Pattern10
-----
$ awk '/-----/{
    if(f){                           # previous record had only one row
      print "Xyz\t0" "\t" s
    }
    print; getline
    a=$2; b=$3; s=$NF; f=1           # a = column2(row1), b = column3(row1)
    next
  }
  /ABC/{
    if(a>$2){                        # if column2(row1) > column2(row2)
      print "Xyz\t" a-$3 "\t" $NF    # print row1(column2)-row2(column3)
    }
    else {
      print "Xyz\t" $2-b "\t" $NF    # else print row2(column2)-row1(column3)
    }
    a=$2; b=$3; f=0                  # remember this row for the next comparison
  }' file
-----
Xyz 212 Pattern01
Xyz 5633 Pattern01
Xyz 96 Pattern01
-----
Xyz 0 Pattern09
-----
Xyz 214 Pattern10
-----
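For the original input, where column 2 may run in either direction within a record and the -1 is wanted, the same idea can probably be written without getline. The following is only a sketch and was not posted in this thread: the function name gaps and the variables prev_start, prev_end, label and rows are made up for illustration, and it assumes that every data row has at least four whitespace-separated columns.

```shell
# Hypothetical helper (not from the thread): print the gap between
# consecutive rows of each record, whichever way column 2 is ordered.
gaps() {
  awk '
    # Assumption: a record with exactly one row gets a "Xyz 0" line.
    function emit_single() {
      if (rows == 1) print "Xyz\t0\t" label
      rows = 0
    }
    /^-----/ { emit_single(); print; next }   # a separator ends the record
    NF >= 4 {
      if (rows > 0) {
        if ($2 + 0 > prev_start)              # ascending: this start - previous end
          gap = $2 - prev_end - 1
        else                                  # descending: previous start - this end
          gap = prev_start - $3 - 1
        print "Xyz\t" gap "\t" $NF
      }
      prev_start = $2 + 0; prev_end = $3 + 0; label = $NF; rows++
    }
    END { emit_single() }                     # the file may lack a closing separator
  ' "$@"
}
```

Called as gaps file on the first input file in this thread, it should print 3033 and 93 for the first record, 211, 5632 and 95 for the second, 0 for the third and 213 for the fourth. Dropping the two "- 1" terms should give the values asked for in the simplified version.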