vskr72
November 24, 2014, 2:26pm
1
hi folks,
I have a sample data like what is shown below:
1,ID=1000
1,Org=CedarparkHospital
1,cn=john
1,sn=doe
1,uid=User001
2,uid=User002
2,ID=2000
2,cn=steve
2,sn=jobs
2,Org=Providence
I would like to convert it into the below format:
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002
Thanks for your help.
Kumar
Kumar,
Is the data ordered? That is, all the information for record 1 is followed by all the information for record 2, record 3, etc. Are the key=value pairs in column order? Or is there a preferred column for each key?
(From your small example, each record is ordered, but the key=value pairs are not)
Can you post what you have tried to solve this on your own...
First attempt in perl - assumes records and keys are unordered, as of yet untested:
use strict;
use warnings;
my @KEYS = qw{ recid };
my %KEYS = ( recid => 1 );
my @DATA = ();
$\ = "\n";
while (<>) {
chomp;
my ($recid, $key, $value) = m{^(\d+),([^=]+)=(.*)$};
unless (defined $value) {
print STDERR $ARGV, ': invalid record - ', $_;
next;
}
# keys are ordered by occurrence
unless (defined $KEYS{$key}) {
push @KEYS, $key;
$KEYS{$key}++;
}
# initial each data-record with the record-id
unless (defined $DATA[$recid]) {
$DATA[$recid]->{recid} = $recid;
}
$DATA[$recid]->{$key} = $value;
}
$, = ',';
print 0, @KEYS;
foreach my $D (@DATA) {
next unless defined $D;
print map { defined $D->{$_} ? $D->{$_} : '' } @KEYS;
}
which generates
0,recid,ID,Org,cn,sn,uid
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002
vskr72
November 24, 2014, 4:24pm
5
Yes, I can ensure that the dataand keys are ordered. That should not be an issue.
Kumar
The keys are not ordered: "ID" is the first key of record 1 and the second of record 2; "uid" is the fifth key of record 1 and the first of record 2.
ongoto
November 24, 2014, 9:42pm
7
A possible solution for scrambled data
#!/bin/bash
sort -k 1 col2rodata -o /dev/null
ro=1
lines=
while read line
do
lines=$(( lines + 1 ))
case ${line:2:2} in
ID) ID=${line##*=} ;;
Or) Org=${line##*=} ;;
cn) cn=${line##*=} ;;
sn) sn=${line##*=} ;;
ui) uid=${line##*=} ;;
esac
if [ $(( lines % 5 )) = 0 ]; then
echo "$ro,$ID,$Org,$cn,$sn,$uid"
ro=$(( $ro + 1 ))
fi
done < col2rodata
# output
# ------
# 1,1000,CedarparkHospital,john,doe,User001
# 2,2000,Providence,steve,jobs,User002
Another one, using the first record as a model for the order:
awk -F= '
{
split($1,K,/,/)
O[K[1]]
A[$1]=$2
}
K[1]==1 {
F[NR]=K[2]
n=NR
}
END {
for(i in O) {
s=i
for(j=1; j<=n; j++)
s=s OFS A[i OFS F[j]]
print s
}
}
' OFS=, file
Output:
2,2000,Providence,steve,jobs,User002
1,1000,CedarparkHospital,john,doe,User001
vskr72:
hi folks,
I have a sample data like what is shown below:
1,ID=1000
1,Org=CedarparkHospital
1,cn=john
1,sn=doe
1,uid=User001
2,uid=User002
2,ID=2000
2,cn=steve
2,sn=jobs
2,Org=Providence
I would like to convert it into the below format:
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002
Thanks for your help.
Kumar
Hello kumar,
Following may help you also.
awk -F, '{gsub(/.*\=/,X,$2);A[$1]=A[$1]?A[$1] OFS $2:$2} END{for(u in A){print u OFS A}}' OFS=, Input_file
Output will be as follows.
1,1000,CedarparkHospital,john,doe,User001
2,User002,2000,steve,jobs,Providence
Thanks,
R. Singh
vskr72
November 25, 2014, 12:50am
10
Thank you all for your inputs/solutions.
Try
$ cat infile
1,ID=1000
1,Org=CedarparkHospital
1,cn=john
1,sn=doe
1,uid=User001
2,uid=User002
2,ID=2000
2,cn=steve
2,sn=jobs
2,Org=Providence
$ cat parse.awk
BEGIN{
# Modify this variable if you want to change field order or if you want to remove any field
order = "id,org,cn,sn,uid"
# Header variable
header = "RECORD"
# Field Sep
FS="[,=]"
hn = split(order,ORDER,/,/)
}
{
A[$1,tolower($2)] = $3
if(!tmp[$1]++)
{
SEQ[++c] = $1
}
}
END{
delete tmp;
for(i=1; i<=c; i++)
{
str = SEQ
for(j=1; j<=hn; j++)
{
if(!Is_header_printed)
{
header = header OFS toupper(ORDER[j])
}
key = SEQ SUBSEP ORDER[j]
if( key in A )
{
str = str OFS A[key]
delete A[key]
}else
{
str = str OFS "NaN"
}
}
if(!Is_header_printed)
{
print header
Is_header_printed = !Is_header_printed
}
print str
}
}
$ awk -vOFS="," -f parse.awk infile
RECORD,ID,ORG,CN,SN,UID
1,1000,CedarparkHospital,john,doe,User001
2,2000,Providence,steve,jobs,User002
For the sort, did you want:
sort -k 1 col2rodata -o col2rodata
but that would alter the input data, which may or may not work depending on file size, file permissions, filesystem options, best practices, and alien intervention. Well, maybe not the last one. At any rate, may I suggest:
ro=1
lines=
sort -k 1 "${@}" | while read line
do