Converting Single Column into Multiple rows, but with strings to specific tab column

Dear fellows, I need your help.

I'm trying to write a script to convert a single column into multiple rows.
But it need to recognize the beginning of the string and set it to its specific Column number.
Each Line (loop) begins with digit [0-9] (RANGE).

At this moment it's kind of working, but it is skipping a line each time [in red] (double loop I think).

Would ready appreciate your help. fine tuning this one or simpler version.
This one is an example of the real file.

Thanks

x.txt

20201
OBB-9
BO-11
VER-2
CBAA-80
CBAZA-70
20205
OBB-10
BO-11
VER-2
CBAA-88
CBAZA-66
20209
OBB-8
BO-11
VER-2
MVER-2
PER-2
CBAA-44
CBAZA-25
20210
OBB-8
BO-11
VER-2
MVER-2
PER-2
CBAA-41
CBAZA-22
202111
OBB-9
BO-11
VER-2
ALP-11
CBAA-88
CBAZA-66

output.txt

Range	OBB	BO	VER	MVER	PER	CBAA	CBAZA	ALP
20201	OBB-9	BO-11	VER-2	             CBAA-44	CBAZA-25
20205	OBB-10	BO-11	VER-2		         CBAA-88  CBAZA-66
20209 	OBB-8	BO-11 	VER-2 	MVER-2 	PER-2 	CBAA-44 CBAZA-25 
20210 	OBB-8 	BO-11 	VER-2 	MVER-2 	PER-2 	CBAA-41 CBAZA-22  
20211 	OBB-9 	BO-11 	VER-2  		         CBAA-88 CBAZA-66 ALP-11

awk.awk

BEGIN {
# Print the Headers
printf "%s", "RANGE\t"
printf "%s", "BBO\t"
printf "%s", "BO\t"
printf "%s", "VER\t"
printf "%s", "MVER\t"
printf "%s", "PER\t"
printf "%s", "CBAA\t"
printf "%s", "CBAZA\t"
printf "%s", "ALP\r\n"

}

# Loop every line with digit [0-9]
$1 ~ /^[0-9]/ {

# For every loop clean empty column value
vRANGE = ""
vOBB = ""
vBO = ""
vVER = ""
vMVER = ""
vPER = ""
vCBAA = ""
vCBAZA = ""
vALP = ""

#set vRANGE to current RANGE 
vRANGE = $1

# set column counter to 1
i = 1
# interate to columns with max of 20
while (i <= 20) {
	# read columns value for current RANGE
	getline
	
	# if value is a number than we have hit a new RANGE, stop processing
	if ( $1 ~ /^[0-9]/ ) {
		# kick column counter to column max
		i = 20
	} else {
		# increase column counter
		i++
	}
	
	
	# set specific column values
	
		#vOBB
		if ( $1 ~ /^OBB-/ ) {
			vOBB = $1
		}
		#vBO 
		if ( $1 ~ /^BO-/ ) {
			vBO = $1
		}
		#vVER
		if ( $1 ~ /^VER-/ ) {
			vVER = $1
		}
		#vMVER
		if ( $1 ~ /^MVER-/ ) {
			vMVER = $1
		
		##vPER
		if ( $1 ~ /^PER-/ ) {
			vPER = $1
		}
		##CBAA
		if ( $1 ~ /^CBAA-/ ) {
			vCBAA = $1
		}
		#vCBAZA
		if ( $1 ~ /^CBAZA-/ ) {
			vCBAZA = $1
		}
		#vALP
		if ( $1 ~ /^ALP-/ ) {
			vALP = $1
		}

}

# Display all the columns for the current RANGE
printf "%s\t", vRANGE
printf "%s\t", vOBB
printf "%s\t", vBO
printf "%s\t", vVER
printf "%s\t", vMVER
printf "%s\t", vPER
printf "%s\t", vCBAA
printf "%s\t", vCBAZA
printf "%s\t", vALP
printf "\r\n"

}


20201
OBB-9
BO-11
VER-2
CBAA-80
CBAZA-70

How does this become:

20201	OBB-9	BO-11	VER-2	MVER-2	PER-2	CBAA-44	CBAZA-25

I do not see MVER-2 PER-2 for 20201

ohh sorry you're right. bad copy paste

Range	OBB	BO	VER	MVER	PER	CBAA	CBAZA	 ALP
20201	OBB-9	BO-11   VER-2	                CBAA-44	CBAZA-25

---------- Post updated at 02:21 PM ---------- Previous update was at 12:38 PM ----------

no body ? :confused::slight_smile:

Try this if it fits your needs:

awk     'function printall (RES) {
                         for (i=1; i<=8; i++) printf "%s\t", RES; printf "%s\n", RES[9]
                        }
         NR==1          {split ("RANGE BBO BO VER MVER PER CBAA CBAZA ALP", RES)}

         $0==$0+0       {printall (RES)
                         delete RES
                         RES[1]=$0}
         /^OBB-/        {RES[2]=$1}
         /^BO-/         {RES[3]=$1}
         /^VER-/        {RES[4]=$1}
         /^MVER-/       {RES[5]=$1}
         /^PER-/        {RES[6]=$1}
         /^CBAA-/       {RES[7]=$1}
         /^CBAZA-/      {RES[8]=$1}
         /^ALP-/        {RES[9]=$1}
         END            {printall (RES)}
        ' file
RANGE   BBO     BO      VER     MVER    PER     CBAA    CBAZA   ALP
20201   OBB-9   BO-11   VER-2                   CBAA-80 CBAZA-70
20205   OBB-10  BO-11   VER-2                   CBAA-88 CBAZA-66
20209   OBB-8   BO-11   VER-2   MVER-2  PER-2   CBAA-44 CBAZA-25
20210   OBB-8   BO-11   VER-2   MVER-2  PER-2   CBAA-41 CBAZA-22
202111  OBB-9   BO-11   VER-2                   CBAA-88 CBAZA-66        ALP-11
1 Like

Tnx.
But I'm receiving this error :confused:

[~/scripts/test]$ awk -f format.awk
awk: format.awk:1: awk     'function printall (RES) {
awk: format.awk:1:         ^ invalid char ''' in expression
awk: format.awk:1: awk     'function printall (RES) {
awk: format.awk:1:         ^ syntax error

try also:

awk '
BEGIN {
   cols="RANGE,OBB,BO,VER,MVER,PER,CBAA,CBAZA,ALP";
   cc=split(cols, oc, ",");
   for (j=1; j<=cc; j++) printf oc[j] (j<cc ? "\t":"\n");
}
{if ($1 ~ /^[0-9]+$/) {
   if (d[oc[1]]) for (j=1; j<=cc; j++) { printf d[oc[j]] (j<cc ? "\t":"\n"); delete d[oc[j]] ; }
   d[oc[1]]=$1;
 } else {
   ln=$0; sub("[-].*", "", ln); d[ln]=$0;
 }
}
END { for (j=1; j<=cc; j++) printf d[oc[j]] (j<cc ? "\t":"\n"); }
' infile
1 Like

RudiC & rdrtx1 @ Thank man...really helpful!! There both working :smiley: Hardcore sripting ;):):b:.

---------- Post updated at 03:53 PM ---------- Previous update was at 08:31 AM ----------

@ rdrtx1 : there's a small issue. It doesn't clear the column variable after every loop.