Hi,
I have a huge file containing lines like the following.
2007:10:01:00:00:49:GMT: subject=BMRA.BM.T_ABTH7.FPN, message={SD=2007:10:01:00:00:00:GMT,SP=5,NP=2,TS=2007:10:01:01:00:00:GMT,VP=0.0,TS=2007:10:01:01:30:00:GMT,VP=0.0}
2007:10:01:00:00:49:GMT: subject=BMRA.BM.T_ABTH7G.FPN, message={SD=2007:10:01:00:00:00:GMT,SP=5,NP=2,TS=2007:10:01:01:00:00:GMT,VP=0.0,TS=2007:10:01:01:30:00:GMT,VP=0.0}
I need to parse them into the following format.
2007-10-01,T_ABTH7,2007-10-0100:00:00,5,0.0
2007-10-01,T_ABTH7G,2007-10-0100:00:00,5,0.0
Is there a way to parse the entire file and format the output without reading it one line at a time?
Thanks in advance.
There are many occurrences of 2 and 0.0 in the input lines.
Please highlight the portions of the input line that you want to report.
Thanks
Penchal
Hi Penchal,
I want to parse the highlighted values.
2007:10:01:00:00:49:GMT: subject=BMRA.BM.T_ABTH7.FPN, message={SD=2007:10:01:00:00:00:GMT,SP=5,NP=2,TS=2007:10:01:01:00:00:GMT,VP=0.0,TS=2007:10:01:01:30:00:GMT,VP=0.0}
First output column (2007-10-01): SD=2007:10:01:00:00:00
Second column (T_ABTH7): subject=BMRA.BM.T_ABTH7.FPN
Third column (2007-10-0100:00:00): TS=2007:10:01:01:00:00
Fourth column (5): SP=5
Fifth column (0.0): VP=0.0
Output for a single line :
2007-10-01,T_ABTH7,2007-10-0100:00:00,5,0.0
Please let me know if this is clear.
One way is to use the pattern matching and substitution capabilities of ksh93 or bash:
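#!/usr/bin/ksh93
# Split each input line on commas; c1..c6 hold the first six fields.
IFS=','
while read -r c1 c2 c3 c4 c5 c6 rest
do
    # Settlement date from the SD= field, e.g. 2007:10:01
    tmp1=${c2##*SD=}
    tmp1=${tmp1%%:00:00:00:GMT}
    # Unit name from the subject, still carrying the .FPN suffix
    tmp2=${c1##*=BMRA.BM.}
    # First TS= timestamp without the :GMT suffix
    tmp3=${c5##TS=}
    tmp3=${tmp3%%:GMT}
    tmp3A=${tmp3:0:10}    # date part of the timestamp
    printf "%s,%s,%s%s,%s,%s\n" "${tmp1//:/-}" "${tmp2%%.FPN}" "${tmp3A//:/-}" "${tmp3:11}" "${c3##SP=}" "${c6##VP=}"
done < file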
Output:
2007-10-01,T_ABTH7,2007-10-0101:00:00,5,0.0
2007-10-01,T_ABTH7G,2007-10-0101:00:00,5,0.0
Hi Murphy,
Thanks for the reply.
I am getting the following error message while running the script:
tmp3A=${tmp3:0:10}: bad substitution
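The ${tmp3:0:10} substring form is a ksh93/bash extension, so an older ksh or a plain POSIX sh rejects it with this error. As a rough sketch, assuming the input lines look exactly like the sample above, the same fields can be pulled out with only the #, ##, % and %% expansions. The function name convert_line is made up for illustration and is not part of the original script.

```shell
#!/bin/sh
# Sketch only: convert_line is a hypothetical helper, and the field layout
# (SD=, SP=, TS=, VP= in that order, ":GMT" suffixes) is assumed from the sample.
convert_line() {
    line=$1

    # Unit name: text between "subject=BMRA.BM." and ".FPN"
    unit=${line#*subject=BMRA.BM.}
    unit=${unit%%.FPN*}

    # SD= value reduced to its date part, e.g. 2007:10:01
    sd=${line#*SD=}; sd=${sd%%,*}; sd=${sd%:*:*:*:GMT}

    # SP= value, first TS= value (without :GMT) and first VP= value
    sp=${line#*SP=}; sp=${sp%%,*}
    ts=${line#*TS=}; ts=${ts%%,*}; ts=${ts%:GMT}
    vp=${line#*VP=}; vp=${vp%%,*}

    ts_date=${ts%:*:*:*}    # 2007:10:01
    ts_time=${ts#*:*:*:}    # 01:00:00

    # Replace ${var//:/-}: split both dates on ":" into $1..$6
    old_ifs=$IFS
    IFS=:
    set -- $sd $ts_date
    IFS=$old_ifs

    printf '%s-%s-%s,%s,%s-%s-%s%s,%s,%s\n' \
        "$1" "$2" "$3" "$unit" "$4" "$5" "$6" "$ts_time" "$sp" "$vp"
}
```

It could be driven with a loop such as: while IFS= read -r line; do convert_line "$line"; done < file. For a very large file an awk solution is likely to be faster than a shell loop.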
ripat
May 16, 2008, 9:05am
Solution with gawk:
#!/usr/bin/awk -f
BEGIN {FS=","; OFS=","}
{
    print \
        gensub(/^.+([0-9][0-9][0-9][0-9]:[0-9][0-9]:[0-9][0-9]).+$/, "\\1", 1, $2),
        gensub(/^.+subject=BMRA.BM.(.+).FPN/, "\\1", 1, $1),
        gensub(/^TS=(.+):GMT/, "\\1", 1, $5),
        gensub(/^SP=(.+)/, "\\1", 1, $3),
        gensub(/^VP=(.+)/, "\\1", 1, $6)
}
If your version of awk doesn't support gensub(), there is a solution with substr() and match(). Let me know.
Hi Ripat,
Thanks for the reply. Unfortunately I am not able to run the solution; I think gensub() is not supported. Can you send me the other solution that you have with substr() and match()?
Thanks
It is not the nicest solution, but you can try:
#! /usr/bin/ksh
#set -x
touch file_1
touch file_2
touch large
touch final
if [ -f large ]
then
cat /dev/null > large
fi
if [ -f final ]
then
cat /dev/null > final
fi
cat /dev/null > file_1
cat /dev/null > file_2
while read line
do
echo $line | grep subject >> file_1
echo $line | grep message >> file_2
done < $1
paste file_2 file_1 > large
while read line
do
SUB=$(echo $line | awk -F"." '{print $5}')
SD=$(echo $line | cut -b13-22 | sed 's/:/-/g')
SP=$(echo $line | cut -b40)
TS=$(echo $line |cut -b84-99 | sed 's/:/-/' | sed 's/:/-/')
VP=$(echo $line |cut -b112-114)
echo "$SD,$SUB,$TS,$SP,$VP" >> final
done < large
rm large file_1 file_2
_________
To run it: ./script file_name
The result is written to the file final.
ripat
May 16, 2008, 11:50am
Without gensub:
#!/usr/bin/awk -f
BEGIN {FS=","; OFS=","}
{
    # "subject=BMRA.BM." is 16 characters long, so the unit name starts at RSTART + 16
    one = substr($1, match($1, /subject=BMRA.BM./) + 16, 10)
    sub(/\..+/, "", one)
    print substr($2, match($2, /[0-9][0-9][0-9][0-9]:[0-9][0-9]:[0-9][0-9]/), 10),
        one,
        substr($5, match($5, /TS=/) + 3, 19),
        substr($3, match($3, /SP=/) + 3),
        substr($6, match($6, /VP=/) + 3)
}
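Both awk versions print the date and the timestamp with the original colons, while the requested output uses dashes in the date parts. A possible variant, assuming the comma-separated fields always start with SD=, SP=, TS= and VP= in the positions shown in the sample, needs only sub(), gsub(), index() and substr(), which any POSIX awk should provide. The wrapper name fpn_to_csv is hypothetical.

```shell
#!/bin/sh
# Sketch only: fpn_to_csv is a made-up name; it reads the log lines on stdin.
# Field positions ($2 = SD, $3 = SP, $5 = first TS, $6 = first VP) are assumed
# from the sample lines in this thread.
fpn_to_csv() {
    awk -F, '
    {
        # Unit name: text after "subject=BMRA.BM." up to the next dot
        unit = $1
        sub(/^.*subject=BMRA\.BM\./, "", unit)
        sub(/\..*$/, "", unit)

        # Settlement date: 10 characters after "SD=", colons turned into dashes
        sd = substr($2, index($2, "SD=") + 3, 10)
        gsub(/:/, "-", sd)

        # First timestamp: strip "TS=" and ":GMT", then dash only the date part
        ts = substr($5, 4, 19)
        ts_date = substr(ts, 1, 10)
        gsub(/:/, "-", ts_date)
        ts_time = substr(ts, 12)

        print sd "," unit "," ts_date ts_time "," substr($3, 4) "," substr($6, 4)
    }'
}
```

Usage would be along the lines of fpn_to_csv < file > output.csv. Unlike the substr() version above, the unit name here is not limited to 10 characters.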
[quote=nathasha;302195904]
tmp3A=${tmp3:0:10}: bad substitution[/quote]
Then you either are not using a recent version of ksh93 (later than 1999) or your data structure has changed from what you provided as a sample.