Rows to Columns

Hi Guru's,
I have a requirement where i need to convert rows to column based on a key column.

Input:
Account_id|Trip_Org|Trip_Dest|City|Hotel_Nm
123|DFW|CHI|Dallas|Hyatt
123|LAS|LPA|Vegas|Hyatt Palace

Output:
123|Trip_Org1=DFW,Trip_Org2=LAS,Trip_Dest1=CHI,Trip_Dest2=LPA|City1=Dallas,City2=Vegas|Hotel_Nm1=Hyatt,Hotel_Nm2=Hyatt Palace

Please help me with the code to achieve this.

Note:
There are no MAX limit on any column values

Does it matter which is 1 ? Rows have no order, strictly speaking.

Is it always 2 for 1?

You can achieve this in any shell, awk, perl, C, C++, JAVA or SQL:

select
  Account_id,
  "Trip_Org1="
   || a.Trip_Org
   || ",Trip_Org2="
   || b.Trip_Org
   AS Trip_Org,
  "Trip_Dest1=" 
   || a.Trip_Dest
   || ",Trip_Dest2="
   || b.Trip_Dest
   AS Trip_Dest,
  "City1="
   || a.City
   || ",City2="
   || b.City
   AS City,
  "Hotel_Nm1="
   || a.Hotel_Nm
   || ",Hotel_Nm2="
   || b.Hotel_Nm
   AS Hotel_Nm
 from
  input a
   join input b
    using Account_id
 where
  a.Trip_Org < b.Trip_Org
 order by 1

This will work with nawk for exactly the sample file you have given (but I guess there will come more sophisticated input files):

awk 'NR==1 {n = split ($0, Hd, "|")}
     NR>1  {m = split ($0, Tmp, "|"); for (i=2;i<=n;i++) Out=(Out?Out",":"")HdNR-1"="Tmp}
     END   {printf "%s",Tmp[1]; for (i=2;i<=n;i++) printf "|%s", Out; printf "\n"}
    ' file
123|Trip_Org1=DFW,Trip_Org2=LAS|Trip_Dest1=CHI,Trip_Dest2=LPA|City1=Dallas,City2=Vegas|Hotel_Nm1=Hyatt,Hotel_Nm2=Hyatt Palace

If e.g. the key value in col 1 will change, pls make up your own solution checking for col 1, print out values so far, delete the Out array and start over...

You can do something similar in bash with indexed arrays for the input, saved header, and output, if in a subshell you add '|' to the front of $IFS (If the word is double-quoted, ${name
[*]} expands to a single word with the value of each array member separated by the first character of the IFS special variable), read -a, etc. that will accomodate any nnumber and name columns and rows dynamically (assuming they are sorted on the key column). The usual challenges on starting and ending the loop do not go away. That is why they invented SQL, text ODBC and JDBC, Excel, etc.

Try

awk -F \| 'NR==1{n=split($0,P,"|")}
NR>1{A[$1]++;for(i=2;i<=NF;i++){arr[$1,P]=arr[$1,P]?arr[$1,P]","P""A[$1]"="$i:P""A[$1]"="$i}}
END{for(j in A){printf j; for(i=2;i<=n;i++){printf "|%s", arr[j,P]}print "";}}' file

Highlighted one may cause problem if input changes. Assume $1 has has total more than 2 different values.

like

Account_id|Trip_Org|Trip_Dest|City|Hotel_Nm
123|DFW|CHI|Dallas|Hyatt
123|LAS|LPA|Vegas|Hyatt Palace
1f3|DFW|CHI|Dallas|Hyatt
1f3|LAS|LPA|Vegas|Hyatt Palace
1 Like
while(<DATA>){
	chomp;
	my @tmp = split(/\|/,$_);
	if($. == 1){
		@names = @tmp[1..$#tmp];
		next;
	}
	else{
		for(my $i=1;$i<=$#tmp;$i++){
			push @{$hash{$tmp[0]}->{$i}}, $tmp[$i];
		}
	}
}
foreach my $k (sort {$a<=>$b }keys %hash){
	print $k," ";
	my %h = %{$hash{$k}};
	foreach my $kk (sort {$a<=>$b} keys %h){
		my @arr = @{$h{$kk}};
		for(my $j=0;$j<=$#arr;$j++){
			print $names[$kk-1].'_'.$j.'='.$arr[$j].",";
		}
	}
	print "\n";
}
__DATA__
Account_id|Trip_Org|Trip_Dest|City|Hotel_Nm
123|DFW|CHI|Dallas|Hyatt
123|LAS|LPA|Vegas|Hyatt Palace
456|DFW|CHI|Dallas|Hyatt
456|LAS|LPA|Vegas|Hyatt Palace
456|DFW|CHI|Dallas|Hyatt
456|LAS|LPA|Vegas|Hyatt Palace
1 Like

On sample file :

Account_id|Trip_Org|Trip_Dest|City|Hotel_Nm
123|DFW|CHI|Dallas|Hyatt
123|LAS|LPA|Vegas|Hyatt Palace
124|4_DFW|4_CHI|4_Dallas|4_Hyatt
124|4_LAS|4_LPA|4_Vegas|4_Hyatt Palace
BEGIN {FS="|";fmt="%s|Trip_Org1=%s,Trip_Org2=%s,Trip_Dest1=%s,Trip_Dest2=%s|City1=%s,City2=%s|Hotel_Nm1=%s,Hotel_Nm2=%s\n"}
NR==1{next}
!NR-1%2{z=$0;getline}
{$0=z"|"$0;printf fmt, $1,$2,$7,$3,$8,$4,$9,$5,$10}
1 Like