Convert rows to columns

hi folks,

I have a sample data like what is shown below:

1,ID=1000
1,Org=CedarparkHospital
1,cn=john
1,sn=doe
1,uid=User001
2,uid=User002
2,ID=2000
2,cn=steve
2,sn=jobs
2,Org=Providence

I would like to convert it into the below format:

1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002

Thanks for your help.

Kumar

Kumar,

Is the data ordered? That is, all the information for record 1 is followed by all the information for record 2, record 3, etc. Are the key=value pairs in column order? Or is there a preferred column for each key?

(From your small example, each record is ordered, but the key=value pairs are not)

Can you post what you have tried to solve this on your own...

First attempt in perl - assumes records and keys are unordered, as of yet untested:

use strict;
use warnings;

my @KEYS = qw{ recid };
my %KEYS = ( recid => 1 );
my @DATA = ();

$\ = "\n";

while (<>) {
  chomp;
  my ($recid, $key, $value) = m{^(\d+),([^=]+)=(.*)$};

  unless (defined $value) {
    print STDERR $ARGV, ': invalid record - ', $_;
    next;
  }

  # keys are ordered by occurrence

  unless (defined $KEYS{$key}) {
    push @KEYS, $key;
    $KEYS{$key}++;
  }

  # initial each data-record with the record-id

  unless (defined $DATA[$recid]) {
    $DATA[$recid]->{recid} = $recid;
  }

  $DATA[$recid]->{$key} = $value;
}

$, = ',';

print 0, @KEYS;

foreach my $D (@DATA) {
  next unless defined $D;
  print map { defined $D->{$_} ? $D->{$_} : '' } @KEYS;
}

which generates

0,recid,ID,Org,cn,sn,uid
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002

Yes, I can ensure that the dataand keys are ordered. That should not be an issue.

Kumar

The keys are not ordered: "ID" is the first key of record 1 and the second of record 2; "uid" is the fifth key of record 1 and the first of record 2.

A possible solution for scrambled data

#!/bin/bash

sort -k 1 col2rodata -o /dev/null
ro=1
lines=
while read line
do
    lines=$(( lines + 1 ))

    case ${line:2:2} in
        ID) ID=${line##*=} ;;
        Or) Org=${line##*=} ;;
        cn) cn=${line##*=} ;;
        sn) sn=${line##*=} ;;
        ui) uid=${line##*=} ;;
    esac

    if [ $(( lines % 5 )) = 0 ]; then
        echo "$ro,$ID,$Org,$cn,$sn,$uid"
        ro=$(( $ro + 1 ))
    fi
done < col2rodata

# output
# ------
# 1,1000,CedarparkHospital,john,doe,User001
# 2,2000,Providence,steve,jobs,User002

Another one, using the first record as a model for the order:

awk -F= '
  {
    split($1,K,/,/)
    O[K[1]]
    A[$1]=$2
  } 

  K[1]==1 {
    F[NR]=K[2]
    n=NR
  }  

  END {
    for(i in O) {
      s=i
      for(j=1; j<=n; j++)
        s=s OFS A[i OFS F[j]]
      print s
    }
  }
' OFS=, file

Output:

2,2000,Providence,steve,jobs,User002
1,1000,CedarparkHospital,john,doe,User001

Hello kumar,

Following may help you also.

awk -F, '{gsub(/.*\=/,X,$2);A[$1]=A[$1]?A[$1] OFS $2:$2} END{for(u in A){print u OFS A}}' OFS=, Input_file

Output will be as follows.

1,1000,CedarparkHospital,john,doe,User001
2,User002,2000,steve,jobs,Providence

Thanks,
R. Singh

Thank you all for your inputs/solutions.

Try

$ cat infile
1,ID=1000
1,Org=CedarparkHospital
1,cn=john
1,sn=doe
1,uid=User001
2,uid=User002
2,ID=2000
2,cn=steve
2,sn=jobs
2,Org=Providence
$ cat parse.awk
BEGIN{
	# Modify this variable if you want to change field order or if you want to remove any field
	order = "id,org,cn,sn,uid"

	# Header variable
	header = "RECORD"

	# Field Sep
	FS="[,=]"


	hn    = split(order,ORDER,/,/)
	
}
{
	A[$1,tolower($2)] = $3
	if(!tmp[$1]++)
	{
		SEQ[++c] = $1
	}
}
END{
	delete tmp;
	
	for(i=1; i<=c; i++)
	{
		str = SEQ
		for(j=1; j<=hn; j++)
		{
			if(!Is_header_printed)
			{
				header = header OFS toupper(ORDER[j])
			}
			
			key = SEQ SUBSEP ORDER[j] 
	
			if( key in A )
			{
				str = str OFS A[key]
				delete A[key]
			}else
			{
				str = str OFS "NaN"
			}    
		}
		
		if(!Is_header_printed)
		{
			print header
			Is_header_printed = !Is_header_printed
		}

		print str
	}
}
$ awk -vOFS="," -f parse.awk infile
RECORD,ID,ORG,CN,SN,UID
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002

For the sort, did you want:

sort -k 1 col2rodata -o col2rodata

but that would alter the input data, which may or may not work depending on file size, file permissions, filesystem options, best practices, and alien intervention. Well, maybe not the last one. At any rate, may I suggest:

ro=1
lines=
sort -k 1 "${@}" | while read line
do