Organize output with awk

Ophiuchus · June 28, 2018, 4:28pm

Hello, maybe someone could help me with this.

I'm trying to print in a more ordered way this input.

the input file is:

AB   XY  UT   ZW                                      PRAT   
  0   3    4  7214800                                    93
  0   3    4  730770                                     93
         VLL  VMI  VKO  VER
          39         5

  0   1    4  7318                                       93
  0   2    4  87477                                     187
         VLL  VMI  VKO  VER
         102    7       0-0

  0   9    4  49365                                     185    
  0   1    4  5307                                       93

For some lines there are 2 related lines that contain "VLL VMI VKO VER" and in next line a corresponding value.
For example:

The line where $4=730770 (header is ZW), has the value VLL=39 and VKO=5
The line where $4=87477 has the value VLL=102, VMI=7 and VER=0-0

So, I would like to print this related values in the sale line when happens.

The output I'm looking for is like this:

AB   XY  UT   ZW                PRAT   VLL  VMI  VKO  VER 
  0   3    4  7214800              93
  0   3    4  730770               93   39         5
  0   1    4  7318                 93
  0   2    4  87477               187  102    7       0-0
  0   9    4  49365               185    
  0   1    4  5307                 93

I've was able to get this awk script but is far from desired output since basically is removing the lines with "VLL VMI VKO VER" and leaving the values in same place (in next line).

awk 'BEGIN{print "AB   XY  UT   ZW        PRAT    VLL  VMI  VKO  VER"}
NR>1 && $0!~"VLL" {print}
/VLL  VMI  VKO  VER/{getline v; print v}' inputfile

AB   XY  UT   ZW        PRAT    VLL  VMI  VKO  VER
  0   3    4  7214800                                    93
  0   3    4  730770                                     93
          39         5

  0   1    4  7318                                       93
  0   2    4  87477                                     187
         102    7       0-0

  0   9    4  49365                                     185
  0   1    4  5307                                       93

Thanks in advance

Scrutinizer · June 29, 2018, 4:01am

Try something like this:

awk '{i=1} NR==1{print $1,$4; i=2}{print $i; print $(i+1), $(i+3)}' RS=  FS='\n' file

RudiC · June 29, 2018, 4:07am

How about

awk '
NR == 1         {HD = "      VLL  VMI  VKO  VER"
                 printf "%s %s", $0, HD
                 next
                }
$0 ~ HD         {getline
                 printf "%s", $0
                 getline
                 next
                }
                {printf RS "%s", $0
                }
END             {printf RS
                }
' file
AB   XY  UT   ZW                                      PRAT          VLL  VMI  VKO  VER
  0   3    4  7214800                                    93
  0   3    4  730770                                     93          39         5
  0   1    4  7318                                       93
  0   2    4  87477                                     187         102    7       0-0
  0   9    4  49365                                     185    
  0   1    4  5307                                       93

Ophiuchus · June 29, 2018, 9:58am

Hello Scrutinizer and RudiC,

Thanks for the help. I noticed something I didn't considered. If I want to pass the output for example to Excel and I say the delimiter is space, the values for VLL, VMI, VKO, VER are located in wrong columns when for example VMI or VKO is empty in input file. Can be added an output delimiter to your solutions?

Thanks again

RudiC · June 29, 2018, 10:54am

Sliding in an output delimiter is not a difficult task, but finding empty fields in the input when the input delimiters are a couple of spaces, and the input fields are space padded, too, is NOT a deterministic task. In this case, I'll take 5 spaces (two as field delimiters, three for field width) as an empty field identifier, but YMMV. Try

awk '
NR == 1         {HD = " VLL  VMI  VKO  VER"
                 $0 = $0 HD
                 $1 = $1
                 printf "%s", $0
                 next
                }
$0 ~ HD         {getline
                 sub (/^ */, _)
                 gsub (/     /, OFS)
                 $1 = $1
                 printf OFS "%s", $0
                 getline
                 next
                }
                {$1 = $1
                 printf RS "%s", $0
                }
END             {printf RS
                }
' OFS="," file
AB,XY,UT,ZW,PRAT,VLL,VMI,VKO,VER
0,3,4,7214800,93
0,3,4,730770,93,39,,5
0,1,4,7318,93
0,2,4,87477,187,102,7,,0-0
0,9,4,49365,185
0,1,4,5307,93

Ophiuchus · June 29, 2018, 1:37pm

Thanks again RudiC. It works with the sample file but as you said different results could be obtained. Trying other input the output is not correct for some values of 4 last columns, but I understand the reason you explain. The input is not clearly determined in its fields so is difficult to parse it.

I think I could use your 3 solutions and manually separate the values in 4 last columns.

Many thanks for your help.

---------- Post updated at 01:37 PM ---------- Previous update was at 12:22 PM ----------

Hello RudiC,

May you explain me please how this part of your first solution works?

$0 ~ HD   {getline
           printf "%s", $0
           getline
           next
           }
          {printf RS "%s", $0}
'

I understand with getline reads the next line after the matched string, but I don't see clear how if you already printed a line after the first getline, then why you apply another getline and a next and how you arrange the correct output with another print.

Thanks again.

RudiC · June 29, 2018, 2:14pm

Look at lines 4 - 6 in your sample in post#1:

.
.
.
VLL  VMI  VKO  VER        # $0 ~ HD   {getline  --> found the header (HD); discard and read new line
      39         5        #            printf "%s", $0
                          #            getline  --> empty line; skip with getline
                          #            next     --> go read  and process next input line

Scrutinizer · June 30, 2018, 12:36am

Alternatively, there is also an option in Excel to use "fixed width" instead of a field separator when reading data. When I tried it only one of the thin dividing lines needed to be moved by one position and it lead to this result:

AB	XY	UT	ZW	PRAT	VLL	VMI	VKO	VER
0	3	4	7214800	93				
0	3	4	730770	93	39		5	
0	1	4	7318	93				
0	2	4	87477	187	102	7		0-0
0	9	4	49365	185				
0	1	4	5307	93

Ophiuchus · July 1, 2018, 2:30am

rudic:

Look at lines 4 - 6 in your sample in post#1:

.
.
.
VLL  VMI  VKO  VER        # $0 ~ HD   {getline  --> found the header (HD); discard and read new line
   39         5        #            printf "%s", $0
   #            getline  --> empty line; skip with getline
   #            next     --> go read  and process next input line

Excellent. Thanks for the explanation.

---------- Post updated at 02:30 AM ---------- Previous update was at 02:28 AM ----------

scrutinizer:

Alternatively, there is also an option in Excel to use "fixed width" instead of a field separator when reading data. When I tried it only one of the thin dividing lines needed to be moved by one position and it lead to this result:
AB	XY	UT	ZW	PRAT	VLL	VMI	VKO	VER
0	3	4	7214800	93				
0	3	4	730770	93	39		5	
0	1	4	7318	93				
0	2	4	87477	187	102	7		0-0
0	9	4	49365	185				
0	1	4	5307	93				

Hi Scrutinizer, Thanks for the tip. I did it in that way and it seems was the final step to parse it as I was looking for.