Organize output with awk

Hello, maybe someone could help me with this.

I'm trying to print in a more ordered way this input.

the input file is:

AB   XY  UT   ZW                                      PRAT   
  0   3    4  7214800                                    93
  0   3    4  730770                                     93
         VLL  VMI  VKO  VER
          39         5

  0   1    4  7318                                       93
  0   2    4  87477                                     187
         VLL  VMI  VKO  VER
         102    7       0-0

  0   9    4  49365                                     185    
  0   1    4  5307                                       93

For some lines there are 2 related lines that contain "VLL VMI VKO VER" and in next line a corresponding value.
For example:

  • The line where $4=730770 (header is ZW), has the value VLL=39 and VKO=5
  • The line where $4=87477 has the value VLL=102, VMI=7 and VER=0-0

So, I would like to print this related values in the sale line when happens.

The output I'm looking for is like this:

AB   XY  UT   ZW                PRAT   VLL  VMI  VKO  VER 
  0   3    4  7214800              93
  0   3    4  730770               93   39         5
  0   1    4  7318                 93
  0   2    4  87477               187  102    7       0-0
  0   9    4  49365               185    
  0   1    4  5307                 93

I've was able to get this awk script but is far from desired output since basically is removing the lines with "VLL VMI VKO VER" and leaving the values in same place (in next line).

awk 'BEGIN{print "AB   XY  UT   ZW        PRAT    VLL  VMI  VKO  VER"}
NR>1 && $0!~"VLL" {print}
/VLL  VMI  VKO  VER/{getline v; print v}' inputfile

AB   XY  UT   ZW        PRAT    VLL  VMI  VKO  VER
  0   3    4  7214800                                    93
  0   3    4  730770                                     93
          39         5

  0   1    4  7318                                       93
  0   2    4  87477                                     187
         102    7       0-0

  0   9    4  49365                                     185
  0   1    4  5307                                       93

Thanks in advance

Try something like this:

awk '{i=1} NR==1{print $1,$4; i=2}{print $i; print $(i+1), $(i+3)}' RS=  FS='\n' file
1 Like

How about

awk '
NR == 1         {HD = "      VLL  VMI  VKO  VER"
                 printf "%s %s", $0, HD
                 next
                }
$0 ~ HD         {getline
                 printf "%s", $0
                 getline
                 next
                }
                {printf RS "%s", $0
                }
END             {printf RS
                }
' file
AB   XY  UT   ZW                                      PRAT          VLL  VMI  VKO  VER
  0   3    4  7214800                                    93
  0   3    4  730770                                     93          39         5
  0   1    4  7318                                       93
  0   2    4  87477                                     187         102    7       0-0
  0   9    4  49365                                     185    
  0   1    4  5307                                       93

Hello Scrutinizer and RudiC,

Thanks for the help. I noticed something I didn't considered. If I want to pass the output for example to Excel and I say the delimiter is space, the values for VLL, VMI, VKO, VER are located in wrong columns when for example VMI or VKO is empty in input file. Can be added an output delimiter to your solutions?

Thanks again

Sliding in an output delimiter is not a difficult task, but finding empty fields in the input when the input delimiters are a couple of spaces, and the input fields are space padded, too, is NOT a deterministic task. In this case, I'll take 5 spaces (two as field delimiters, three for field width) as an empty field identifier, but YMMV. Try

awk '
NR == 1         {HD = " VLL  VMI  VKO  VER"
                 $0 = $0 HD
                 $1 = $1
                 printf "%s", $0
                 next
                }
$0 ~ HD         {getline
                 sub (/^ */, _)
                 gsub (/     /, OFS)
                 $1 = $1
                 printf OFS "%s", $0
                 getline
                 next
                }
                {$1 = $1
                 printf RS "%s", $0
                }
END             {printf RS
                }
' OFS="," file
AB,XY,UT,ZW,PRAT,VLL,VMI,VKO,VER
0,3,4,7214800,93
0,3,4,730770,93,39,,5
0,1,4,7318,93
0,2,4,87477,187,102,7,,0-0
0,9,4,49365,185
0,1,4,5307,93
1 Like

Thanks again RudiC. It works with the sample file but as you said different results could be obtained. Trying other input the output is not correct for some values of 4 last columns, but I understand the reason you explain. The input is not clearly determined in its fields so is difficult to parse it.

I think I could use your 3 solutions and manually separate the values in 4 last columns.

Many thanks for your help.

---------- Post updated at 01:37 PM ---------- Previous update was at 12:22 PM ----------

Hello RudiC,

May you explain me please how this part of your first solution works?

$0 ~ HD   {getline
           printf "%s", $0
           getline
           next
           }
          {printf RS "%s", $0}
'

I understand with getline reads the next line after the matched string, but I don't see clear how if you already printed a line after the first getline, then why you apply another getline and a next and how you arrange the correct output with another print.

Thanks again.

Look at lines 4 - 6 in your sample in post#1:

.
.
.
VLL  VMI  VKO  VER        # $0 ~ HD   {getline  --> found the header (HD); discard and read new line
      39         5        #            printf "%s", $0
                          #            getline  --> empty line; skip with getline
                          #            next     --> go read  and process next input line
1 Like

Alternatively, there is also an option in Excel to use "fixed width" instead of a field separator when reading data. When I tried it only one of the thin dividing lines needed to be moved by one position and it lead to this result:

AB	XY	UT	ZW	PRAT	VLL	VMI	VKO	VER
0	3	4	7214800	93				
0	3	4	730770	93	39		5	
0	1	4	7318	93				
0	2	4	87477	187	102	7		0-0
0	9	4	49365	185				
0	1	4	5307	93				
1 Like

Excellent. Thanks for the explanation.

---------- Post updated at 02:30 AM ---------- Previous update was at 02:28 AM ----------

Hi Scrutinizer, Thanks for the tip. I did it in that way and it seems was the final step to parse it as I was looking for.:b: