Parsing lines with different delimiters

nathasha · May 16, 2008, 5:19am

Hi,

I have a huge file containing lines like the following.

2007:10:01:00:00:49:GMT: subject=BMRA.BM.T_ABTH7.FPN, message={SD=2007:10:01:00:00:00:GMT,SP=5,NP=2,TS=2007:10:01:01:00:00:GMT,VP=0.0,TS=2007:10:01:01:30:00:GMT,VP=0.0}
2007:10:01:00:00:49:GMT: subject=BMRA.BM.T_ABTH7G.FPN, message={SD=2007:10:01:00:00:00:GMT,SP=5,NP=2,TS=2007:10:01:01:00:00:GMT,VP=0.0,TS=2007:10:01:01:30:00:GMT,VP=0.0}

I need to parse them into the following format.

2007-10-01,T_ABTH7,2007-10-0100:00:00,5,0.0
2007-10-01,T_ABTH7G,2007-10-0100:00:00,5,0.0

Is there a way to parse the entire file and format the output without reading the file one line at a time?

Thanks in advance.

penchal_boddu · May 16, 2008, 5:31am

There are many occurrences of 2 and 0.0 in the input lines.

Please highlight the portions of the input line that you want in the output.

Thanks
Penchal

nathasha · May 16, 2008, 5:46am

Hi Penchal,

I want to parse the highlighted values.

2007:10:01:00:00:49:GMT: subject=BMRA.BM.T_ABTH7.FPN, message={SD=2007:10:01:00:00:00:GMT,SP=5,NP=2,TS=2007:10:01:01:00:00:GMT,VP=0.0,TS=2007:10:01:01:30:00:GMT,VP=0.0}

First output column (2007-10-01): SD=2007:10:01:00:00:00

Second column (T_ABTH7): subject=BMRA.BM.T_ABTH7.FPN

Third column (2007-10-0100:00:00): TS=2007:10:01:01:00:00

Fourth column (5): SP=5

Fifth column (0.0): VP=0.0

Output for a single line:
2007-10-01,T_ABTH7,2007-10-0100:00:00,5,0.0

Please let me know if this is clear.

fpmurphy · May 16, 2008, 7:12am

One way is to use the pattern matching and substitution capabilities of ksh93 or bash

#!/usr/bin/ksh93

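# Split each line on commas: c1 holds the subject, c2 the SD date,
# c3 the SP value, c5 the first TS value and c6 the first VP value.
while IFS=',' read -r c1 c2 c3 c4 c5 c6 rest
do
   tmp1=${c2##*SD=}
   tmp1=${tmp1%%:00:00:00:GMT}      # assumes the SD time is midnight
   tmp2=${c1##*=BMRA.BM.}
   tmp3=${c5##TS=}
   tmp3=${tmp3%%:GMT}
   tmp3A=${tmp3:0:10}               # date part of TS (needs ksh93 or bash)
   printf "%s,%s,%s%s,%s,%s\n" "${tmp1//:/-}" "${tmp2%%.FPN}" "${tmp3A//:/-}" "${tmp3:11}" "${c3##SP=}" "${c6##VP=}"
done < file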

Output:

2007-10-01,T_ABTH7,2007-10-0101:00:00,5,0.0
2007-10-01,T_ABTH7G,2007-10-0101:00:00,5,0.0

nathasha · May 16, 2008, 7:25am

Hi Murphy,

Thanks for the reply.

I am getting the following error message while running the script:

tmp3A=${tmp3:0:10}: bad substitution

ripat · May 16, 2008, 9:05am

Solution with gawk:

#!/usr/bin/awk -f
BEGIN {FS=","; OFS=","}
{
          print \
                  gensub(/^.*SD=([0-9][0-9][0-9][0-9]):([0-9][0-9]):([0-9][0-9]).*$/, "\\1-\\2-\\3", 1, $2),
                  gensub(/^.+subject=BMRA\.BM\.(.+)\.FPN/, "\\1", 1, $1),
                  gensub(/^TS=([0-9]+):([0-9]+):([0-9]+):(.+):GMT/, "\\1-\\2-\\3\\4", 1, $5),
                  gensub(/^SP=(.+)/, "\\1", 1, $3),
                  gensub(/^VP=(.+)/, "\\1", 1, $6)
}

If your version of awk doesn't support gensub(), there is a solution with substr() and match(). Let me know.
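
A quick way to find out whether the installed awk has gensub() is to call it once and look at the exit status. This is only a sketch: the function name has_gensub is made up for illustration, and it assumes that an awk without gensub() exits with a non-zero status when the function is called.

```shell
#!/bin/sh
# has_gensub: hypothetical helper, not part of the solution above.
# Succeeds (exit status 0) if the awk command given as $1 (default: awk)
# understands gensub(); fails otherwise, including when the command is missing.
has_gensub() {
    "${1:-awk}" 'BEGIN { if (gensub(/a/, "b", 1, "a") == "b") exit 0; exit 1 }' >/dev/null 2>&1
}

if has_gensub awk; then
    echo "awk supports gensub()"
else
    echo "awk does not support gensub(); use the substr()/match() approach"
fi
```

If gawk is installed under a different name, pass that name instead, for example: has_gensub gawk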

nathasha · May 16, 2008, 9:26am

Hi Ripat,

Thanks for the reply. Unfortunately I am not able to run the solution. I think gensub() is not supported. Could you send me the other solution, the one with substr() and match()?

Thanks

alamitab · May 16, 2008, 11:26am

It is not the nicest solution, but you can try:

#! /usr/bin/ksh
#set -x
touch file_1
touch file_2
touch large
touch final
if [ -f large ]
then
cat /dev/null > large
fi
if [ -f final ]
then
cat /dev/null > final
fi
cat /dev/null > file_1
cat /dev/null > file_2
while read line
do
        echo $line | grep subject >> file_1
        echo $line | grep message >> file_2
done < $1
paste file_2 file_1 > large
while read line
do
SUB=$(echo $line | awk -F"." '{print $5}')
SD=$(echo $line | cut -b13-22 | sed 's/:/-/g')
SP=$(echo $line | cut -b40)
TS=$(echo $line |cut -b84-99 | sed 's/:/-/' | sed 's/:/-/')
VP=$(echo $line |cut -b112-114)
echo "$SD,$SUB,$TS,$SP,$VP" >> final
done < large
rm large file_1 file_2

_________

Run it as: ./script file_name
The result is written to the file named final.

ripat · May 16, 2008, 11:50am

Without gensub:

#!/usr/bin/awk -f
BEGIN {FS=","; OFS=","}
{
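                # subject: text after "subject=BMRA.BM.", minus the ".FPN" suffix
                match($1, /subject=BMRA\.BM\./)
                one = substr($1, RSTART + RLENGTH)
                sub(/\.FPN.*/, "", one)
                # SD date: first 10 characters of the timestamp, colons to dashes
                sd = substr($2, match($2, /[0-9][0-9][0-9][0-9]:[0-9][0-9]:[0-9][0-9]/), 10)
                gsub(/:/, "-", sd)
                # TS: dashes in the date part, time appended unchanged
                ts = substr($5, match($5, /TS=/) + 3, 19)
                tsdate = substr(ts, 1, 10)
                gsub(/:/, "-", tsdate)
                print sd, one, tsdate substr(ts, 12),
                substr($3, match($3, /SP=/) + 3),
                substr($6, match($6, /VP=/) + 3)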
}

fpmurphy · May 16, 2008, 8:22pm

Quote (nathasha):
tmp3A=${tmp3:0:10}: bad substitution

Then either you are not using a recent version of ksh93 (later than 1999), or the structure of your data has changed from the sample you provided.
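
If upgrading the shell is not an option, the same fields can be extracted without the ${var:offset:length} substring syntax, which older shells (ksh88, for example) typically reject with "bad substitution". The following is only a sketch under that assumption. The function name parse_line is made up for illustration, and it relies on the layout of the sample lines: one subject, then SD, SP, and at least one TS/VP pair.

```shell
#!/bin/sh
# parse_line: hypothetical helper, not from the scripts above.
# Takes one input line as $1 and prints the five comma-separated fields.
# Uses only POSIX parameter expansion and sed, no ${var:offset:length}.
parse_line() {
    line=$1

    # subject: text between "subject=BMRA.BM." and ".FPN"
    subject=${line#*subject=BMRA.BM.}
    subject=${subject%%.FPN*}

    # SD date: first 10 characters after "SD=", colons turned into dashes
    sd=${line#*SD=}
    sd=$(printf '%s\n' "$sd" | sed 's/^\(....\):\(..\):\(..\).*/\1-\2-\3/')

    # SP value: text after "SP=" up to the next comma
    sp=${line#*SP=}
    sp=${sp%%,*}

    # first TS value: dashes in the date part, time appended unchanged
    ts=${line#*TS=}
    ts=${ts%%:GMT*}
    ts=$(printf '%s\n' "$ts" | sed 's/^\(....\):\(..\):\(..\):/\1-\2-\3/')

    # first VP value: text after "VP=" up to the next comma or closing brace
    vp=${line#*VP=}
    vp=$(printf '%s\n' "$vp" | sed 's/[,}].*//')

    printf '%s,%s,%s,%s,%s\n' "$sd" "$subject" "$ts" "$sp" "$vp"
}

# When a file name is given on the command line, process it line by line.
if [ "$#" -gt 0 ]; then
    while IFS= read -r line; do
        parse_line "$line"
    done < "$1"
fi
```

Saved as a script (the name is up to you), it would be run with the data file as its only argument. It starts sed twice per line, so on a huge file one of the awk solutions is likely to be faster.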