Split columns into rows

Any one can help me in converting columns into rows.
example
I have input file

10000|[12080000000]
10002|[13075200000]
10003|[13939200000]
10004|[1347200000,133600000,1152000000,106400000,12800000,117200000,145180000,1451000000,148400000,14240000]
10005|[16000000]

I want output in below format

PARTY|PART_DT
10000|12080000000
10002|13075200000
10003|13939200000
10004|1347200000
10004|133600000
10004|1152000000
10004|106400000
10004|12800000
10004|117200000
10004|145180000
10004|1451000000
10004|148400000
10004|14240000
10005|16000000

Hello syd,

Welcome to the forums, special thanks for using code tags for your code/inputs/samples into your post. Following may help you in same.

awk -F"|" 'BEGIN{print "PARTY|PART_DT"} {gsub(/\[|\]/,X,$NF);num=split($NF, array,",");for(i=1;i<=num;i++){print $1 OFS array}}' OFS="|"  Input_file

Output will be as follows.

PARTY|PART_DT
10000|12080000000
10002|13075200000
10003|13939200000
10004|1347200000
10004|133600000
10004|1152000000
10004|106400000
10004|12800000
10004|117200000
10004|145180000
10004|1451000000
10004|148400000
10004|14240000
10005|16000000
 

Thanks,
R. Singh

1 Like

Also try :

[akshay@localhost tmp]$ cat file.txt 
10000|[12080000000]
10002|[13075200000]
10003|[13939200000]
10004|[1347200000,133600000,1152000000,106400000,12800000,117200000,145180000,1451000000,148400000,14240000]
10005|[16000000]
[akshay@localhost tmp]$ awk -F\| 'BEGIN{print "PARTY|PART_DT"}{gsub(/[\[\]]/,""); gsub(/,/,"\n"$1"|")}1' file.txt 

Output

PARTY|PART_DT
10000|12080000000
10002|13075200000
10003|13939200000
10004|1347200000
10004|133600000
10004|1152000000
10004|106400000
10004|12800000
10004|117200000
10004|145180000
10004|1451000000
10004|148400000
10004|14240000
10005|16000000
1 Like

Hi Ravinder and Akshay

its working.thanks for you prompt reply.

10000|latDate:1442457679000|lat:-99
10001|latDate:1448538549000|lat:3213374
10002|latDate:144558585|lat:34848006
10004|latDate:1442455248000|lat:35872402070386
10005|latDate:35193106096435 

in above input file I have -99 value.

if iam using grep value
grep -o 'lat:\w\+' input.txt

iam getting only positive value , if not negative value

output file

lat:3213374
lat:34848006
lat:35872402070386

can you please let me know how to fetch negative value by using
grep -o 'lat:\w\+' input.txt command

Something like this

[akshay@localhost tmp]$ cat f
10000|latDate:1442457679000|lat:-99
10001|latDate:1448538549000|lat:3213374
10002|latDate:144558585|lat:34848006
10004|latDate:1442455248000|lat:35872402070386
10005|latDate:35193106096435

[akshay@localhost tmp]$ grep -Po 'lat:.*' f
lat:-99
lat:3213374
lat:34848006
lat:35872402070386

---------- Post updated at 06:52 PM ---------- Previous update was at 06:47 PM ----------

lat: matches the characters lat: literally (case sensitive)
.* matches any character (except newline)

Hi Akshay,

its not working if I have any columns next to negative value, its fetching all value .

Ok, try this

[akshay@localhost tmp]$ cat f
10000|latDate:1442457679000|lat:-99|co1|c2
10001|latDate:1448538549000|lat:3213374
10002|latDate:144558585|lat:34848006
10004|latDate:1442455248000|lat:35872402070386
10005|latDate:35193106096435

[akshay@localhost tmp]$ grep -oP 'lat:[-+]?\w+' f
lat:-99
lat:3213374
lat:34848006
lat:35872402070386

---------- Post updated at 07:03 PM ---------- Previous update was at 06:59 PM ----------

lat:[-+]?\w+
 ``` lat: ```  matches the characters lat: literally \(case sensitive\)
 ``` [-+]? ```  match a single character present in the list below
    Quantifier: ? Between zero and one time, as many times as possible
 ``` \w+ ```  match any word character [a-zA-Z0-9_]

PFB requirements I have

INPUT FILE

10000|1442448000000:[-99]|latChangeDate:14457679000|latest:-99
10001|1442448000000:[-99]|latChangeDate:14438549000|latest:351592032174|1448496000000:[35159203213374]
10002|1442448000000:[35578404848006],latChangeDate:1442468000|latest:35578404846
10003
10004|1442448000000:[35872402070386]|latChangeDate:14425248000|latest:358722070386
10005|1448755200000:[35193106096435]|latest:351931096435

code I have used to get value is

sed -i 's/"//g' input.txt
printf "ID|latChangeDate|latest\n" >> output1.csv
while read -r line
do
id=`echo $line | grep -o '^[0-9]\+'`
printf "$id|"
if [[ $line =~ latChangeDate ]]
then
lt_cre=`echo $line | grep -o 'latChangeDate:\w\+'`
lt_cre_val=`echo $lt_cre | cut -d: -f2`
printf "${lt_cre_val}|"
else
printf "XXXXXXXXX|"
fi
if [[ $line =~ latest ]]
then
lt_cre_ac_dt=`echo $line | grep -o 'latest:\w\+'`
lt_cre_ac_dt_val=`echo $lt_cre_ac_dt | cut -d: -f2`
printf "${lt_cre_ac_dt_val}\n"
else
printf "XXXXXXXXX\n"
fi
done < /input.txt

output I got --where -99 value is missing

ID|latChangeDate|latest
10000|14457679000|
10001|14438549000|35159203214
10002|1442468000|35578404846
10003|XXXXXXXXX|XXXXXXXXX
10004|14425248000|358722070386
10005|XXXXXXXXX|351931096435

but I should get

10000|14457679000|-99
10001|14438549000|35159203214
10002|1442468000|35578404846
10003|XXXXXXXXX|XXXXXXXXX
10004|14425248000|358722070386
10005|XXXXXXXXX|351931096435

where -99 value is missing

---------- Post updated 03-29-16 at 12:23 AM ---------- Previous update was 03-28-16 at 11:56 PM ----------

Hi Ravinder,
this is an another file with negative value.

Hello syd,

Not clear still, I am not seeing value 35159203214 in your Input_file, could you please elaborate more about your requirement so that we could help you more on this.

Thanks,
R. Singh

PFb sample data

Input file

10000|1442000000:[-99]|latChangeDate:1400|latest:-99
10001|1446:[-99]|latChangeDate:144385400|latest:351592174|1448400:[35374]
10002|1424:[38006],latChangeDate:144246|latest:355746
10003
10004|14424:[358386]|latChangeDate:14425200|latest:35872386
10005|14480:[351435]|latest:351935

Output I should get is

ID|latChangeDate|latest
10000|1400|-99
10001|144385400|351592174
10002|144246|355746
10003|XXXX|XXXX
10004|14425200|35872386
10005|XXXX|351935

I have used code

sed -i 's/"//g' input.txt
printf "ID|latChangeDate|latest\n" >> output1.csv
while read -r line
do
id=`echo $line | grep -o '^[0-9]\+'`
printf "$id|"
if [[ $line =~ latChangeDate ]]
then
lt_cre=`echo $line | grep -o 'latChangeDate:\w\+'`
lt_cre_val=`echo $lt_cre | cut -d: -f2`
printf "${lt_cre_val}|"
else
printf "XXXXXXXXX|"
fi
if [[ $line =~ latest ]]
then
lt_cre_ac_dt=`echo $line | grep -o 'latest:\w\+'`
lt_cre_ac_dt_val=`echo $lt_cre_ac_dt | cut -d: -f2`
printf "${lt_cre_ac_dt_val}\n"
else
printf "XXXXXXXXX\n"
fi
done < /input.txt

for above script iam getting output as

ID|latChangeDate|latest
10000|1400|
10001|144385400|351592174
10002|144246|355746
10003|XXXX|XXXX
10004|14425200|35872386
10005|XXXX|351935

iam not getting negative value -99.

Hello syd,

Could you please try following and let me know if this helps you.

awk -F"|" 'BEGIN{print "ID|latChangeDate|latest"}{for(i=2;i<=NF;i++){if($i ~ /latChangeDate/){m=sub(/.*:/,X,$i);l++;Q=Q?Q OFS $i:$i};if($i ~ /latest/){n=sub(/.*:/,X,$i);Q=Q?Q OFS $i:$i};};if(!m && !n){Q="XXXX" OFS "XXXX"} else if(!m){Q="XXXX" OFS Q} else if(!n){Q=Q OFS "XXXX"};print $1 OFS Q;Q=m=n=""}' OFS="|"  Input_file

Output will be as follows.

ID|latChangeDate|latest
10000|1400|-99
10001|144385400|351592174
10002|144246|355746
10003|XXXX|XXXX
10004|14425200|35872386
10005|XXXX|351935
 

EDIT: Adding a non-one liner form of solution on same now.

awk -F"|" 'BEGIN{
                        print "ID|latChangeDate|latest"
                }
                {
                        for(i=2;i<=NF;i++){
                                                if($i ~ /latChangeDate/){
                                                                                m=sub(/.*:/,X,$i);
                                                                                Q=Q?Q OFS $i:$i
                                                                        };
                                                if($i ~ /latest/){
                                                                                n=sub(/.*:/,X,$i);
                                                                                Q=Q?Q OFS $i:$i};
                                                                        };
                                                if(!m && !n)            {
                                                                                Q="XXXX" OFS "XXXX"
                                                                        }
                                                else if(!m)             {
                                                                                Q="XXXX" OFS Q
                                                                        }
                                                else if(!n)             {
                                                                                Q=Q OFS "XXXX"
                                                                        };
                                                print $1 OFS Q;
                                                Q=m=n=""
                }
           ' OFS="|"   Input_file
 

Hope this helps you.

Thanks,
R. Singh

1 Like

Thank you Ravinder.

its working :slight_smile:

Try

Input

[akshay@localhost tmp]$ cat f
10000|1442000000:[-99]|latChangeDate:1400|latest:-99
10001|1446:[-99]|latChangeDate:144385400|latest:351592174|1448400:[35374]
10002|1424:[38006],latChangeDate:144246|latest:355746
10003
10004|14424:[358386]|latChangeDate:14425200|latest:35872386
10005|14480:[351435]|latest:351935

command

[akshay@localhost tmp]$ awk -vext='latChangeDate,latest' '
BEGIN{
    FS=OFS="|"
    split(ext,s,/,/)
    print "ID|latChangeDate|latest"
}
{ 
    str=$1 
    for(i=1; i in s; i++)
    str = str OFS ((match($0,s":[^|]*"))? (substr($0,RSTART+length(s)+1,RLENGTH-(length(s)+1))) :"XXXX") 
    print str 
}'  f

output

ID|latChangeDate|latest
10000|1400|-99
10001|144385400|351592174
10002|144246|355746
10003|XXXX|XXXX
10004|14425200|35872386
10005|XXXX|351935

Here's a quick and dirty shell script to do the job. Handles any data.

 #!/bin/ksh
 infile="convertdata"
 outfile="converteddata"
 function filecleanup
 {
 if [[ -f $1 ]] ; then
 rm $1
 fi
 }
 function writeoutput
 {
 i=1
 while :;
 do
 print $part_dt | cut -d, -f$i | read value
 if [[ $value == '' ]] then
 return;
 fi
 set +x
 print $party"|"$value
 ((i += 1))
 done
 }
 filecleanup $outfile
 echo "Party|Part_dt"
 cat $infile | while read rec
 do
 echo $rec | cut -d'|' -f1 | read party
 echo $rec | cut -d'|' -f2 | read part_dt
 part_dt=${part_dt#[} # Strip initial bracket
 part_dt=${part_dt%]} # Strip trailing bracket
 part_dt=${part_dt}"," # cut has strange behavior when no delimiter, so make sure each record has a delimiter
 writeoutput #$party $part_dt
 done
 

---------- Post updated at 01:55 PM ---------- Previous update was at 01:53 PM ----------

Output from above script:

Party|Part_dt
10000|12080000000
10002|13075200000
10002|-999
10003|13939200000
10004|1347200000
10004|133600000
10004|1152000000
10004|106400000
10004|12800000
10004|117200000
10004|145180000
10004|1451000000
10004|148400000
10004|14240000
10005|16000000