syd
March 28, 2016, 4:23am
Can anyone help me with converting columns into rows?
For example, I have this input file:
10000|[12080000000]
10002|[13075200000]
10003|[13939200000]
10004|[1347200000,133600000,1152000000,106400000,12800000,117200000,145180000,1451000000,148400000,14240000]
10005|[16000000]
I want the output in the below format:
PARTY|PART_DT
10000|12080000000
10002|13075200000
10003|13939200000
10004|1347200000
10004|133600000
10004|1152000000
10004|106400000
10004|12800000
10004|117200000
10004|145180000
10004|1451000000
10004|148400000
10004|14240000
10005|16000000
Hello syd,
Welcome to the forums, and special thanks for using code tags for your code/inputs/samples in your post. The following may help you with the same.
awk -F"|" 'BEGIN{print "PARTY|PART_DT"} {gsub(/\[|\]/,X,$NF);num=split($NF, array,",");for(i=1;i<=num;i++){print $1 OFS array[i]}}' OFS="|" Input_file
Output will be as follows.
PARTY|PART_DT
10000|12080000000
10002|13075200000
10003|13939200000
10004|1347200000
10004|133600000
10004|1152000000
10004|106400000
10004|12800000
10004|117200000
10004|145180000
10004|1451000000
10004|148400000
10004|14240000
10005|16000000
Thanks,
R. Singh
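For readers following along, the heart of the one-liner above is awk's split(): break the bracketed list on commas into an array, then print one "id|value" line per element. A minimal standalone sketch (with made-up short values, not the thread's data):

```shell
# gsub strips the brackets, split fills a[1..n], and the loop
# prints one "id|value" pair per array element.
printf '10004|[1,2,3]\n' | awk -F'|' '{
    gsub(/\[|\]/, "", $2)
    n = split($2, a, ",")
    for (i = 1; i <= n; i++) print $1 "|" a[i]
}'
# prints:
# 10004|1
# 10004|2
# 10004|3
```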
Also try:
[akshay@localhost tmp]$ cat file.txt
10000|[12080000000]
10002|[13075200000]
10003|[13939200000]
10004|[1347200000,133600000,1152000000,106400000,12800000,117200000,145180000,1451000000,148400000,14240000]
10005|[16000000]
[akshay@localhost tmp]$ awk -F\| 'BEGIN{print "PARTY|PART_DT"}{gsub(/[\[\]]/,""); gsub(/,/,"\n"$1"|")}1' file.txt
Output
PARTY|PART_DT
10000|12080000000
10002|13075200000
10003|13939200000
10004|1347200000
10004|133600000
10004|1152000000
10004|106400000
10004|12800000
10004|117200000
10004|145180000
10004|1451000000
10004|148400000
10004|14240000
10005|16000000
syd
March 28, 2016, 8:01am
Hi Ravinder and Akshay,
It's working, thanks for your prompt reply.
Now I have another input file:
10000|latDate:1442457679000|lat:-99
10001|latDate:1448538549000|lat:3213374
10002|latDate:144558585|lat:34848006
10004|latDate:1442455248000|lat:35872402070386
10005|latDate:35193106096435
In the above input file I have a -99 value.
If I use
grep -o 'lat:\w\+' input.txt
I get only the positive values, not the negative one.
The output I am getting:
lat:3213374
lat:34848006
lat:35872402070386
Can you please let me know how to also fetch the negative value using the
grep -o 'lat:\w\+' input.txt
command?
Something like this
[akshay@localhost tmp]$ cat f
10000|latDate:1442457679000|lat:-99
10001|latDate:1448538549000|lat:3213374
10002|latDate:144558585|lat:34848006
10004|latDate:1442455248000|lat:35872402070386
10005|latDate:35193106096435
[akshay@localhost tmp]$ grep -Po 'lat:.*' f
lat:-99
lat:3213374
lat:34848006
lat:35872402070386
---------- Post updated at 06:52 PM ---------- Previous update was at 06:47 PM ----------
lat: matches the characters lat: literally (case sensitive)
.* matches any character (except newline), zero or more times
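One caveat worth noting (using a hypothetical line with extra trailing fields, not one of the samples above): because .* runs to the end of the line, it also swallows any |-separated fields that follow the lat: value.

```shell
# .* is greedy: the match extends past the number to the end of the line
printf '10000|latDate:1442457679000|lat:-99|col1|col2\n' | grep -o 'lat:.*'
# prints: lat:-99|col1|col2
```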
syd
March 28, 2016, 8:26am
Hi Akshay,
It's not working if I have any columns after the negative value; it fetches all of them.
Ok, try this
[akshay@localhost tmp]$ cat f
10000|latDate:1442457679000|lat:-99|co1|c2
10001|latDate:1448538549000|lat:3213374
10002|latDate:144558585|lat:34848006
10004|latDate:1442455248000|lat:35872402070386
10005|latDate:35193106096435
[akshay@localhost tmp]$ grep -oP 'lat:[-+]?\w+' f
lat:-99
lat:3213374
lat:34848006
lat:35872402070386
---------- Post updated at 07:03 PM ---------- Previous update was at 06:59 PM ----------
lat:[-+]?\w+
lat: matches the characters lat: literally (case sensitive)
[-+]? matches a single - or + character; the ? quantifier makes it optional (zero or one time)
\w+ matches any word character [a-zA-Z0-9_], one or more times
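If only the number itself is wanted, one option (a sketch, assuming the same field layout) is piping the match through cut, as done elsewhere in this thread:

```shell
# Extract the tag:value match, then keep only the value after the colon
printf '10000|latDate:1442457679000|lat:-99|co1|c2\n' \
  | grep -oE 'lat:[-+]?\w+' | cut -d: -f2
# prints: -99
```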
syd
March 29, 2016, 1:23am
PFB the requirements I have.
INPUT FILE
10000|1442448000000:[-99]|latChangeDate:14457679000|latest:-99
10001|1442448000000:[-99]|latChangeDate:14438549000|latest:351592032174|1448496000000:[35159203213374]
10002|1442448000000:[35578404848006],latChangeDate:1442468000|latest:35578404846
10003
10004|1442448000000:[35872402070386]|latChangeDate:14425248000|latest:358722070386
10005|1448755200000:[35193106096435]|latest:351931096435
The code I have used to get the values is:
sed -i 's/"//g' input.txt
printf "ID|latChangeDate|latest\n" >> output1.csv
while read -r line
do
id=`echo $line | grep -o '^[0-9]\+'`
printf "$id|"
if [[ $line =~ latChangeDate ]]
then
lt_cre=`echo $line | grep -o 'latChangeDate:\w\+'`
lt_cre_val=`echo $lt_cre | cut -d: -f2`
printf "${lt_cre_val}|"
else
printf "XXXXXXXXX|"
fi
if [[ $line =~ latest ]]
then
lt_cre_ac_dt=`echo $line | grep -o 'latest:\w\+'`
lt_cre_ac_dt_val=`echo $lt_cre_ac_dt | cut -d: -f2`
printf "${lt_cre_ac_dt_val}\n"
else
printf "XXXXXXXXX\n"
fi
done < input.txt
The output I got, where the -99 value is missing:
ID|latChangeDate|latest
10000|14457679000|
10001|14438549000|35159203214
10002|1442468000|35578404846
10003|XXXXXXXXX|XXXXXXXXX
10004|14425248000|358722070386
10005|XXXXXXXXX|351931096435
but I should get
10000|14457679000|-99
10001|14438549000|35159203214
10002|1442468000|35578404846
10003|XXXXXXXXX|XXXXXXXXX
10004|14425248000|358722070386
10005|XXXXXXXXX|351931096435
That is, the -99 value is missing from my actual output.
---------- Post updated 03-29-16 at 12:23 AM ---------- Previous update was 03-28-16 at 11:56 PM ----------
Hi Ravinder,
this is another file with a negative value.
Hello syd,
It is still not clear; I am not seeing the value 35159203214 in your Input_file. Could you please elaborate more on your requirement so that we can help you further?
Thanks,
R. Singh
syd
March 29, 2016, 1:41am
PFB sample data.
Input file
10000|1442000000:[-99]|latChangeDate:1400|latest:-99
10001|1446:[-99]|latChangeDate:144385400|latest:351592174|1448400:[35374]
10002|1424:[38006],latChangeDate:144246|latest:355746
10003
10004|14424:[358386]|latChangeDate:14425200|latest:35872386
10005|14480:[351435]|latest:351935
Output I should get is
ID|latChangeDate|latest
10000|1400|-99
10001|144385400|351592174
10002|144246|355746
10003|XXXX|XXXX
10004|14425200|35872386
10005|XXXX|351935
I have used this code:
sed -i 's/"//g' input.txt
printf "ID|latChangeDate|latest\n" >> output1.csv
while read -r line
do
id=`echo $line | grep -o '^[0-9]\+'`
printf "$id|"
if [[ $line =~ latChangeDate ]]
then
lt_cre=`echo $line | grep -o 'latChangeDate:\w\+'`
lt_cre_val=`echo $lt_cre | cut -d: -f2`
printf "${lt_cre_val}|"
else
printf "XXXXXXXXX|"
fi
if [[ $line =~ latest ]]
then
lt_cre_ac_dt=`echo $line | grep -o 'latest:\w\+'`
lt_cre_ac_dt_val=`echo $lt_cre_ac_dt | cut -d: -f2`
printf "${lt_cre_ac_dt_val}\n"
else
printf "XXXXXXXXX\n"
fi
done < input.txt
For the above script I am getting the output as:
ID|latChangeDate|latest
10000|1400|
10001|144385400|351592174
10002|144246|355746
10003|XXXX|XXXX
10004|14425200|35872386
10005|XXXX|351935
I am not getting the negative value -99.
Hello syd,
Could you please try the following and let me know if this helps you.
awk -F"|" 'BEGIN{print "ID|latChangeDate|latest"}{for(i=2;i<=NF;i++){if($i ~ /latChangeDate/){m=sub(/.*:/,X,$i);l++;Q=Q?Q OFS $i:$i};if($i ~ /latest/){n=sub(/.*:/,X,$i);Q=Q?Q OFS $i:$i};};if(!m && !n){Q="XXXX" OFS "XXXX"} else if(!m){Q="XXXX" OFS Q} else if(!n){Q=Q OFS "XXXX"};print $1 OFS Q;Q=m=n=""}' OFS="|" Input_file
Output will be as follows.
ID|latChangeDate|latest
10000|1400|-99
10001|144385400|351592174
10002|144246|355746
10003|XXXX|XXXX
10004|14425200|35872386
10005|XXXX|351935
EDIT: Adding a non-one-liner form of the same solution now.
awk -F"|" 'BEGIN{
print "ID|latChangeDate|latest"
}
{
for(i=2;i<=NF;i++){
if($i ~ /latChangeDate/){
m=sub(/.*:/,X,$i);
Q=Q?Q OFS $i:$i
};
if($i ~ /latest/){
n=sub(/.*:/,X,$i);
Q=Q?Q OFS $i:$i};
};
if(!m && !n) {
Q="XXXX" OFS "XXXX"
}
else if(!m) {
Q="XXXX" OFS Q
}
else if(!n) {
Q=Q OFS "XXXX"
};
print $1 OFS Q;
Q=m=n=""
}
' OFS="|" Input_file
Hope this helps you.
Thanks,
R. Singh
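The key move in both forms of the solution above is sub(/.*:/,X,$i): since .* is greedy, it deletes everything up to and including the last colon in the field, leaving only the value. A minimal standalone sketch:

```shell
# sub(/.*:/, "") strips everything up to and including the last ":" on the line
printf 'latChangeDate:1400\nlatest:-99\n' | awk '{ sub(/.*:/, ""); print }'
# prints:
# 1400
# -99
```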
Try
Input
[akshay@localhost tmp]$ cat f
10000|1442000000:[-99]|latChangeDate:1400|latest:-99
10001|1446:[-99]|latChangeDate:144385400|latest:351592174|1448400:[35374]
10002|1424:[38006],latChangeDate:144246|latest:355746
10003
10004|14424:[358386]|latChangeDate:14425200|latest:35872386
10005|14480:[351435]|latest:351935
command
[akshay@localhost tmp]$ awk -vext='latChangeDate,latest' '
BEGIN{
FS=OFS="|"
split(ext,s,/,/)
print "ID|latChangeDate|latest"
}
{
str=$1
for(i=1; i in s; i++)
str = str OFS ((match($0, s[i]":[^|]*")) ? substr($0, RSTART+length(s[i])+1, RLENGTH-(length(s[i])+1)) : "XXXX")
print str
}' f
output
ID|latChangeDate|latest
10000|1400|-99
10001|144385400|351592174
10002|144246|355746
10003|XXXX|XXXX
10004|14425200|35872386
10005|XXXX|351935
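The extraction in the command above rests on awk's match(), which sets RSTART and RLENGTH for the first match; substr() then slices out the part after the tag name and its colon. A reduced sketch for a single hard-coded tag:

```shell
# "latest:" is 7 characters, so skip 7 from RSTART and shorten RLENGTH by 7;
# [^|]* keeps the match from running into the next |-separated field.
echo '10004|latChangeDate:14425200|latest:35872386' | awk '{
    if (match($0, /latest:[^|]*/))
        print substr($0, RSTART + 7, RLENGTH - 7)
}'
# prints: 35872386
```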
Here's a quick and dirty shell script to do the job. Handles any data.
#!/bin/ksh
infile="convertdata"
outfile="converteddata"
function filecleanup
{
if [[ -f $1 ]] ; then
rm $1
fi
}
function writeoutput
{
i=1
while :;
do
print $part_dt | cut -d, -f$i | read value
if [[ $value == '' ]]; then
return
fi
print $party"|"$value
((i += 1))
done
}
filecleanup $outfile
echo "Party|Part_dt"
cat $infile | while read rec
do
echo $rec | cut -d'|' -f1 | read party
echo $rec | cut -d'|' -f2 | read part_dt
part_dt=${part_dt#[} # Strip initial bracket
part_dt=${part_dt%]} # Strip trailing bracket
part_dt=${part_dt}"," # cut has strange behavior when no delimiter, so make sure each record has a delimiter
writeoutput   # uses the globals $party and $part_dt set above
done
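The two parameter expansions in the script are standard POSIX prefix/suffix stripping: # removes the shortest matching prefix pattern and % the shortest matching suffix. A quick sketch with one of the sample values:

```shell
part_dt="[1347200000,133600000]"
part_dt=${part_dt#\[}   # strip leading bracket  -> 1347200000,133600000]
part_dt=${part_dt%]}    # strip trailing bracket -> 1347200000,133600000
echo "$part_dt"
# prints: 1347200000,133600000
```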
---------- Post updated at 01:55 PM ---------- Previous update was at 01:53 PM ----------
Output from above script:
Party|Part_dt
10000|12080000000
10002|13075200000
10002|-999
10003|13939200000
10004|1347200000
10004|133600000
10004|1152000000
10004|106400000
10004|12800000
10004|117200000
10004|145180000
10004|1451000000
10004|148400000
10004|14240000
10005|16000000