I have a file in the following format:
file01.txt
TERM
TERM
TERM
ABC 12315 68.53 12042013 165144
ABC 12315 62.12 12042013 165145
ABC 12315 122.36 12052013 165146
ABC 12315 582.18 12052013 165147
ABC 12316 2.36 12052013 165141
ABC 12316 68.53 12042013 165142
ABC 12316 62.12 12042013 165143
ABC 12316 122.36 12052013 165144
ABC 12316 122.36 12052013 165145
My desired output is:
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147
ABC 12316 2.36 12052013 165141
ABC 12316 122.36 12052013 165145
In this file, all rows are sorted by columns 2 and 5.
I've tried the following command:
awk '/^ABC/ {if (lastval != $5-1 ) { print line;print $0} lastval = $5; line = $0 }' file01.txt
which adds an extra line at the beginning and skips the last row as well:
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147
ABC 12316 2.36 12052013 165141
I'm seeking your assistance on how to modify the one-liner so that it prints the first and last row of each group, with the number of rows in the group appended to the last row:
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147 4
ABC 12316 2.36 12052013 165141
ABC 12316 122.36 12052013 165145 5
- If a value is missing between the first and last values of a group, the group should not be split. For example, suppose the row
ABC 12316 62.12 12042013 165143
is missing from file01.txt.
The final output should be:
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147 4
ABC 12316 2.36 12052013 165141
ABC 12316 122.36 12052013 165145 4
Thank you in advance for your help
So you want the first and last line of each group (as determined by $2) plus a count of how many lines there were in the group?
It will be difficult to make this a "one-liner" as printing the count requires it to read ahead, to know when the "group" ends.
Yes, that's correct.
It doesn't have to be a one-liner. I only used a one-liner in my own attempts.
$ cat grp2.awk
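# Skip every line that does not start with ABC (e.g. TERM).
!/^ABC/ { next }
# First row of a new group: finish the previous group (last row plus count), then print this row.
!($2 in A) { if(LAST) print LAST,A[LID] ; print }
# Count rows per group and remember the latest row and its group id.
{ A[$2]++; LAST=$0; LID=$2 }
# Flush the final group.
END { if(LAST) print LAST, A[LID] }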
$ awk -f grp2.awk data
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147 4
ABC 12316 2.36 12052013 165141
ABC 12316 122.36 12052013 165145 5
$
Your input data includes five lines for 12316, not four.
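If you want to double-check the group sizes yourself, counting the rows per value of column 2 is a quick way to do it. This is only a sketch: the function name count_groups is made up for illustration, and it assumes the data rows start with ABC as in your sample.

```sh
# count_groups FILE: print each column-2 value with its number of ABC rows.
# (count_groups is an illustrative name, not part of grp2.awk.)
count_groups() {
    awk '
        /^ABC/ { n[$2]++ }                        # tally rows per group id
        END    { for (id in n) print id, n[id] }  # for-in order is unspecified
    ' "$1" | sort                                 # sort to get a stable order
}
```

Run as `count_groups file01.txt`; with the sample from the first post it should report 4 rows for 12315 and 5 rows for 12316.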
Thank you, worked very well.
The output with a count of 4 was for the case where the row
ABC 12316 62.12 12042013 165143
was missing. I tested your script and it works well even if a value is missing from a group.
How?
I am also getting the same result as corona, assuming the file is sorted:
$ awk '!/^ABC/{next}p!=$5-1{printf last ? last FS x[l]++ RS $0 RS : $0 RS}{p=$5;last=$0;l=$2;x[$2]++}END{print last FS x[l]++}' file
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147 4
ABC 12316 2.36 12052013 165141
ABC 12316 122.36 12052013 165145 5
Hi,
Thank you for your reply.
I wanted to be able to use the script even if the values in column $5 are not consecutive.
For example, if the row
ABC 12316 62.12 12042013 165143
is missing, file01.txt would become:
TERM
TERM
TERM
ABC 12315 68.53 12042013 165144
ABC 12315 62.12 12042013 165145
ABC 12315 122.36 12052013 165146
ABC 12315 582.18 12052013 165147
ABC 12316 2.36 12052013 165141
ABC 12316 68.53 12042013 165142
ABC 12316 122.36 12052013 165144
ABC 12316 122.36 12052013 165145
Here is the result of your one-liner:
awk '!/^ABC/{next}p!=$5-1{printf last ? last FS x[l]++ RS $0 RS : $0 RS}{p=$5;last=$0;l=$2;x[$2]++}END{print last FS x[l]++}' file02.txt
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147 4
ABC 12316 2.36 12052013 165141
ABC 12316 62.12 12042013 165143 3
ABC 12316 122.36 12052013 165145
ABC 12316 122.36 12052013 165145 5
My desired output would be:
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147 4
ABC 12316 2.36 12052013 165141
ABC 12316 122.36 12052013 165145 4
Sorry that I didn't describe it more accurately the first time.
Best Regards
In your first post you were checking $5, which caused the confusion. Anyway, this will work, as will corona's solution; both check $2.
$ cat file
TERM
TERM
TERM
ABC 12315 68.53 12042013 165144
ABC 12315 62.12 12042013 165145
ABC 12315 122.36 12052013 165146
ABC 12315 582.18 12052013 165147
ABC 12316 2.36 12052013 165141
ABC 12316 68.53 12042013 165142
ABC 12316 122.36 12052013 165144
ABC 12316 122.36 12052013 165145
$ awk '!/^ABC/{next}p!=$2{print l ? l FS x[p] RS $0 : $0}{p=$2;l=$0;x[$2]++}END{print l FS x[p] }' file
ABC 12315 68.53 12042013 165144
ABC 12315 582.18 12052013 165147 4
ABC 12316 2.36 12052013 165141
ABC 12316 122.36 12052013 165145 4
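For anyone who prefers a script to a one-liner, the same idea can be sketched in a longer form: a new group starts whenever $2 changes, and $5 is never inspected, so gaps in $5 cannot split a group. This is an illustrative rewrite, not the code posted above. The function name first_last_count and the variable names are made up for the example, and it assumes the file is sorted so that the rows of each group are contiguous.

```sh
# first_last_count FILE: print the first and last ABC row of each group,
# with the row count appended to the last row. Groups are keyed on column 2.
first_last_count() {
    awk '
        !/^ABC/ { next }                          # ignore TERM and other non-data lines
        !started || ($2 "") != (key "") {         # column 2 changed (compared as strings)
            if (started) print last, n            # close previous group: last row + count
            print                                 # first row of the new group
            key = $2; n = 0; started = 1
        }
        { last = $0; n++ }                        # remember the latest row and count it
        END { if (started) print last, n }        # close the final group
    ' "$1"
}
```

Called as `first_last_count file` on the data shown above, it should give the same four lines as the one-liner. Note that in this sketch a group with a single row is printed twice, once as the first row and once as the last row with a count of 1.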
Thank you so much for your time.