Convert rows to columns

vskr72 · November 24, 2014, 2:26pm

hi folks,

I have a sample data like what is shown below:

1,ID=1000
1,Org=CedarparkHospital
1,cn=john
1,sn=doe
1,uid=User001
2,uid=User002
2,ID=2000
2,cn=steve
2,sn=jobs
2,Org=Providence

I would like to convert it into the below format:

1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002

Thanks for your help.

Kumar

derekludwig · November 24, 2014, 2:29pm

Kumar,

Is the data ordered? That is, all the information for record 1 is followed by all the information for record 2, record 3, etc. Are the key=value pairs in column order? Or is there a preferred column for each key?

(From your small example, each record is ordered, but the key=value pairs are not)

shamrock · November 24, 2014, 2:34pm

Can you post what you have tried to solve this on your own...

derekludwig · November 24, 2014, 2:51pm

First attempt in perl - assumes records and keys are unordered, as of yet untested:

use strict;
use warnings;

my @KEYS = qw{ recid };
my %KEYS = ( recid => 1 );
my @DATA = ();

$\ = "\n";

while (<>) {
  chomp;
  my ($recid, $key, $value) = m{^(\d+),([^=]+)=(.*)$};

  unless (defined $value) {
    print STDERR $ARGV, ': invalid record - ', $_;
    next;
  }

  # keys are ordered by occurrence

  unless (defined $KEYS{$key}) {
    push @KEYS, $key;
    $KEYS{$key}++;
  }

  # initial each data-record with the record-id

  unless (defined $DATA[$recid]) {
    $DATA[$recid]->{recid} = $recid;
  }

  $DATA[$recid]->{$key} = $value;
}

$, = ',';

print 0, @KEYS;

foreach my $D (@DATA) {
  next unless defined $D;
  print map { defined $D->{$_} ? $D->{$_} : '' } @KEYS;
}

which generates

0,recid,ID,Org,cn,sn,uid
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002

vskr72 · November 24, 2014, 4:24pm

Yes, I can ensure that the dataand keys are ordered. That should not be an issue.

Kumar

derekludwig · November 24, 2014, 7:15pm

The keys are not ordered: "ID" is the first key of record 1 and the second of record 2; "uid" is the fifth key of record 1 and the first of record 2.

ongoto · November 24, 2014, 9:42pm

A possible solution for scrambled data

#!/bin/bash

sort -k 1 col2rodata -o /dev/null
ro=1
lines=
while read line
do
    lines=$(( lines + 1 ))

    case ${line:2:2} in
        ID) ID=${line##*=} ;;
        Or) Org=${line##*=} ;;
        cn) cn=${line##*=} ;;
        sn) sn=${line##*=} ;;
        ui) uid=${line##*=} ;;
    esac

    if [ $(( lines % 5 )) = 0 ]; then
        echo "$ro,$ID,$Org,$cn,$sn,$uid"
        ro=$(( $ro + 1 ))
    fi
done < col2rodata

# output
# ------
# 1,1000,CedarparkHospital,john,doe,User001
# 2,2000,Providence,steve,jobs,User002

Scrutinizer · November 24, 2014, 11:32pm

Another one, using the first record as a model for the order:

awk -F= '
  {
    split($1,K,/,/)
    O[K[1]]
    A[$1]=$2
  } 

  K[1]==1 {
    F[NR]=K[2]
    n=NR
  }  

  END {
    for(i in O) {
      s=i
      for(j=1; j<=n; j++)
        s=s OFS A[i OFS F[j]]
      print s
    }
  }
' OFS=, file

Output:

2,2000,Providence,steve,jobs,User002
1,1000,CedarparkHospital,john,doe,User001

RavinderSingh13 · November 25, 2014, 12:10am

vskr72:

hi folks,

I have a sample data like what is shown below:
1,ID=1000
1,Org=CedarparkHospital
1,cn=john
1,sn=doe
1,uid=User001
2,uid=User002
2,ID=2000
2,cn=steve
2,sn=jobs
2,Org=Providence
I would like to convert it into the below format:
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002
Thanks for your help.

Kumar

Hello kumar,

Following may help you also.

awk -F, '{gsub(/.*\=/,X,$2);A[$1]=A[$1]?A[$1] OFS $2:$2} END{for(u in A){print u OFS A}}' OFS=, Input_file

Output will be as follows.

1,1000,CedarparkHospital,john,doe,User001
2,User002,2000,steve,jobs,Providence

Thanks,
R. Singh

vskr72 · November 25, 2014, 12:50am

Thank you all for your inputs/solutions.

Akshay_Hegde · November 25, 2014, 5:58am

Try

$ cat infile
1,ID=1000
1,Org=CedarparkHospital
1,cn=john
1,sn=doe
1,uid=User001
2,uid=User002
2,ID=2000
2,cn=steve
2,sn=jobs
2,Org=Providence

$ cat parse.awk
BEGIN{
	# Modify this variable if you want to change field order or if you want to remove any field
	order = "id,org,cn,sn,uid"

	# Header variable
	header = "RECORD"

	# Field Sep
	FS="[,=]"


	hn    = split(order,ORDER,/,/)
	
}
{
	A[$1,tolower($2)] = $3
	if(!tmp[$1]++)
	{
		SEQ[++c] = $1
	}
}
END{
	delete tmp;
	
	for(i=1; i<=c; i++)
	{
		str = SEQ
		for(j=1; j<=hn; j++)
		{
			if(!Is_header_printed)
			{
				header = header OFS toupper(ORDER[j])
			}
			
			key = SEQ SUBSEP ORDER[j] 
	
			if( key in A )
			{
				str = str OFS A[key]
				delete A[key]
			}else
			{
				str = str OFS "NaN"
			}    
		}
		
		if(!Is_header_printed)
		{
			print header
			Is_header_printed = !Is_header_printed
		}

		print str
	}
}

$ awk -vOFS="," -f parse.awk infile
RECORD,ID,ORG,CN,SN,UID
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002

derekludwig · November 25, 2014, 8:56am

For the sort, did you want:

sort -k 1 col2rodata -o col2rodata

but that would alter the input data, which may or may not work depending on file size, file permissions, filesystem options, best practices, and alien intervention. Well, maybe not the last one. At any rate, may I suggest:

ro=1
lines=
sort -k 1 "${@}" | while read line
do