Newbie problem with ksh script

sabercats · February 21, 2006, 12:26pm

Hi all,
I have a directory containing all of the .stat and .dat files;
each one is a pipe-separated flat file.
Example:
log-20061202.stat contains the first and last lines of log-20061202.dat, along with the record count for that day.
Example:
Total record = 240
Tom|02-12-2006|1600 W.Santa Clara|SanJose|95123|1001|ENG <--first
Mike|02-12-2006|23 Clayton Rd|San Francisco|94127|6666|PHY <-- last

For each new log-yyyymmdd.dat file compare the record count from the stat file with the record count of the dat file. Raise an error if they do not match.

Each log-yyyymmdd.stat file holds the first and last records of the dat file; confirm that the first and last records of the dat file match the ones in the stat file. Raise an error if they do not match.

For each record written to an output file, the output file is determined by the DATE field of the .dat record, not by the timestamp in the file name.

The output will have name yyyymmdd.dat and contain
NAME|STUDENTID|CLASS

This is a hard problem for me; please help me learn how to use the ksh shell to solve it. Thanks for sharing.

Anubhav · February 21, 2006, 1:56pm

Your problem seems to be divided into two questions.

The first is checking the record count, which you can do like this:


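RUNDATE=`date +%Y%m%d`
# "$2 + 0" turns the text after "=" into a plain number (drops the leading space)
COUNTSTAT=`grep "Total record" log-${RUNDATE}.stat | awk -F"=" '{print $2 + 0}'`
# Read the file on stdin so wc prints only the count, not the file name
COUNTLOG=`wc -l < log-${RUNDATE}.dat`

if [ "${COUNTSTAT}" -ne "${COUNTLOG}" ]
then
     echo "Count of dat ${COUNTLOG} doesn't match count of stat file ${COUNTSTAT}"
fi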
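
One detail worth spelling out: when `wc -l` is given a file name as an argument it prints the name after the count, so a numeric test on its output fails. Reading the file on standard input yields the number alone. The following is a minimal sketch of the count check as a function, assuming the first line of the stat file looks like `Total record = 240`. The sample files and the `check_count` name are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory.
WORK=$(mktemp -d)
cd "$WORK" || exit 1
printf 'Total record = 2\nTom|x\nMike|y\n' > log-20061202.stat
printf 'Tom|x\nMike|y\n' > log-20061202.dat

# check_count YYYYMMDD: compare the count stored in the stat file
# with the number of lines in the matching dat file.
check_count() {
    stat_count=$(awk -F'=' '/Total record/ {print $2 + 0; exit}' "log-$1.stat")
    # tr removes the padding that some wc implementations put before the number
    dat_count=$(wc -l < "log-$1.dat" | tr -d ' ')
    if [ "$stat_count" -ne "$dat_count" ]; then
        echo "Count of dat $dat_count doesn't match count of stat file $stat_count"
        return 1
    fi
    echo "Record count matched: $dat_count"
}

check_count 20061202
```

The function returns a non-zero status on a mismatch, so a calling script can decide whether to stop or carry on with the next file.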

Then, coming to your second question:

Here the idea is to store the first record and the last record in separate variables and then search for them in the dat file.

FIRSTLINE=`head -2 log-${RUNDATE}.stat |tail -1`
LASTLINE=`tail -1 log-${RUNDATE}.stat`

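# -F: fixed string, -x: match the whole line; the quotes keep the record's spaces intact
LINECOUNT=`grep -n -F -x "$FIRSTLINE" log-${RUNDATE}.dat | head -1 | awk -F":" '{print $1}'`
if [ "${LINECOUNT:-0}" -eq 1 ]
then
     echo "FIRST RECORD MATCHED"
else
     echo "FIRST RECORD DIDN'T MATCH"
     exit 1
fi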

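# tail -1 keeps the last matching line number in case the record appears more than once
LINECOUNT=`grep -n -F -x "$LASTLINE" log-${RUNDATE}.dat | tail -1 | awk -F":" '{print $1}'`
COUNTLOG=`wc -l < log-${RUNDATE}.dat`
# The last record sits on line COUNTLOG itself, so nothing is subtracted
if [ "${LINECOUNT:-0}" -eq "${COUNTLOG}" ]
then
     echo "LAST RECORD MATCHED"
else
     echo "LAST RECORD DIDN'T MATCH"
     exit 1
fi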
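
Since the stat file stores the first and last records verbatim, another option is to compare them directly with the first and last lines of the dat file, without any grep. This is only a sketch, assuming the stat file has exactly three lines (count, first record, last record) and does not contain the `<--first` / `<-- last` annotations. The middle sample record and the `check_ends` name are invented for the example.

```shell
# Hypothetical sample data in a scratch directory.
WORK=$(mktemp -d)
cd "$WORK" || exit 1
cat > log-20061202.dat <<'EOF'
Tom|02-12-2006|1600 W.Santa Clara|SanJose|95123|1001|ENG
Ann|02-12-2006|5 Main St|Oakland|94601|2002|BIO
Mike|02-12-2006|23 Clayton Rd|San Francisco|94127|6666|PHY
EOF
cat > log-20061202.stat <<'EOF'
Total record = 3
Tom|02-12-2006|1600 W.Santa Clara|SanJose|95123|1001|ENG
Mike|02-12-2006|23 Clayton Rd|San Francisco|94127|6666|PHY
EOF

# check_ends YYYYMMDD: lines 2 and 3 of the stat file must equal
# the first and last lines of the dat file.
check_ends() {
    rc=0
    # Every expansion is quoted because the records contain spaces.
    if [ "$(sed -n 2p "log-$1.stat")" = "$(head -n 1 "log-$1.dat")" ]; then
        echo "FIRST RECORD MATCHED"
    else
        echo "FIRST RECORD DIDN'T MATCH"
        rc=1
    fi
    if [ "$(sed -n 3p "log-$1.stat")" = "$(tail -n 1 "log-$1.dat")" ]; then
        echo "LAST RECORD MATCHED"
    else
        echo "LAST RECORD DIDN'T MATCH"
        rc=1
    fi
    return $rc
}

check_ends 20061202
```

Comparing whole quoted strings avoids the word splitting that happens when a record with spaces is used unquoted in a `for` loop or a `[ ... ]` test.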

sabercats · February 21, 2006, 2:08pm

Hi Anubhav,

If you set RUNDATE=`date +%Y%m%d`
then it only checks today's date. If the script is run against a directory to compare yyyymmdd.stat with yyyymmdd.dat for other days, for example yesterday or the day before yesterday, it will not work.

Thanks for sharing,

Anubhav · February 21, 2006, 2:17pm

Then it would be best to pass the date in as a parameter.

Run the script as follows:

e.g. abc.ksh <YYYYMMDD>

and in the code read $1 into RUNDATE:

RUNDATE=$1
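
A script that takes the date as a parameter may also want to reject a missing or malformed argument before touching any files. A small sketch of such a check follows; the `valid_rundate` function name is hypothetical, and it only checks for eight digits, not for a real calendar date.

```shell
# valid_rundate STRING: succeed only if STRING is exactly eight digits.
valid_rundate() {
    case $1 in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# ${1:-} expands to an empty string when no argument was given.
RUNDATE=${1:-}
if valid_rundate "$RUNDATE"; then
    echo "Checking log-${RUNDATE}.stat against log-${RUNDATE}.dat"
else
    echo "usage: abc.ksh YYYYMMDD" >&2
fi
```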

Hope that helps!!

sabercats · February 21, 2006, 2:26pm

The problem I have is that
in the .dat file the DATE field is $2 and has the format mm-dd-yyyy hh:mm:ss,
like 02-21-2006 12:05:06. How do I get only 20060221?

Can you show me the code one more time so I can learn it? Thanks

vgersh99 · February 21, 2006, 2:51pm

echo 'Mike|02-12-2006 12:05:06|23 Clayton Rd|San Francisco|94127|6666|PHY' | nawk -F'|' '{split(substr($2, 1, index($2, " ")-1), t, "-"); print t[3] t[1] t[2]}'

or better yet:

echo 'Mike|02-12-2006 12:05:06|23 Clayton Rd|San Francisco|94127|6666|PHY' | nawk -F'|' '{split($2, t, "[ -]"); print t[3] t[1] t[2]}'
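
The second form works because `split` accepts a regular expression as its separator: `[ -]` splits on either a space or a hyphen, so t[1], t[2] and t[3] hold the month, day and year, and the time ends up in t[4]. To see which dates a whole file contains, the same `split` can run on every record with the results de-duplicated. This is a sketch on made-up sample data; it calls `awk`, on the assumption that it is a POSIX awk (use `nawk` where plain `awk` is the old version).

```shell
# Hypothetical sample file in a scratch directory.
WORK=$(mktemp -d)
cat > "$WORK/sample.dat" <<'EOF'
Tom|02-12-2006 09:00:01|1600 W.Santa Clara|SanJose|95123|1001|ENG
Mike|02-12-2006 12:05:06|23 Clayton Rd|San Francisco|94127|6666|PHY
Ann|02-13-2006 08:30:00|5 Main St|Oakland|94601|2002|BIO
EOF

# One YYYYMMDD per record, then sort -u keeps each distinct date once.
DATES=$(awk -F'|' '{split($2, t, "[ -]"); print t[3] t[1] t[2]}' "$WORK/sample.dat" | sort -u)
echo "$DATES"
```

Note that without the `sort -u` the command prints one date per record, so a variable assigned from it holds many lines rather than a single date.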

sabercats · February 21, 2006, 3:23pm

So you mean
for file in *dat; do
YYYYMMDD=grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") | print t[3] t[1] t[2]}'

It did not work

vgersh99 · February 21, 2006, 5:01pm

what I meant was related to the original question:

Anubhav · February 22, 2006, 2:04pm

It did not work
since your variable assignment wasn't correct: you left out the opening backquote before the grep.

YYYYMMDD=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") | print t[3] t[1] t[2]}`

vgersh99 · February 22, 2006, 2:19pm

anubhav:

It did not work
since your variable assignment wasn't correct: you left out the opening backquote before the grep.
YYYYMMDD=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") | print t[3] t[1] t[2]}`

I don't think you copied the original 'nawk' suggestion correctly. Pay closer attention to the matching ' characters AND to the ';' character in the 'nawk' statement.

Suggestion:
copy/paste the original suggestion and see if it works as is. Then try to adapt it to your specific needs.

sabercats · February 22, 2006, 3:41pm

I tried it with a single line, but when I put it in the script it gives me this error :( That's why I asked what I did wrong.

Syntax Error The source line is 1.
The error context is
{split(substr($2, 1, index($2, " ")-1), t, "-") | >>> print <<< t[3] t[1] t[2]}
awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.

sabercats · February 22, 2006, 3:57pm

Here is my code for my problem; please help me by pointing out the incorrect code.

#!/bin/ksh -x

cd /DATA/
for file in *.dat; do
# Try to get date to create file name.
RUNDATE=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") | print t[3] t[1] t[2]}`

ALINES=$(cat ${RUNDATE}.dat | wc -l)
BFIRST=$(head -1 ${RUNDATE}.stat)

IFS=\'
set -- $BFIRST
BLINES=$2

if (( ALINES == BLINES ))
then
print "Records count is matched"
else
print "Records count is not matched"
fi

AFIRST=$(head -1 ${RUNDATE}.dat)
BTHREE=$(cat ${RUNDATE}.stat | sed -n 2p)
if [ ${AFIRST} = ${BTHREE}];
then
print "First record is matched"
else
print "First record is not matched"
fi

ALAST=$(tail -1 ${RUNDATE}.dat)
BFOUR=$(cat ${RUNDATE}.stat | sed -n 3p)
if [ ${ALAST} = ${BFOUR} ]
then
print "Last record is matched"
else
print "Last record is not matched"
fi

grep "|" ${RUNDATE}.dat |

sort -t"|" +1 -2|

awk -F"|" '{
if ($2 != "") {
name = $1;
datetime = $2;
studentid = $6
class = $7;
printf"%s|%s|%s\n",name,studentid,class;
}

}' |

sort -t, +0 -1 >> /NEW/DATA/log-${RUNDATE}.dat

done

Thanks,

vgersh99 · February 22, 2006, 4:00pm

as i said previously......

Simply 'copy/paste' my ORIGINAL suggestion.

sabercats · February 22, 2006, 6:08pm

I am lost. I added the ; but it did not work... how dumb am I

vgersh99 · February 22, 2006, 6:36pm

do me a favor, pls!

can you copy/paste the line below onto your shell window and see if it works, PLS - do NOT edit it, do NOT retype it - just a simple copy/paste!!!

echo 'Mike|02-12-2006 12:05:06|23 Clayton Rd|San Francisco|94127|6666|PHY' | nawk -F'|' '{split($2, t, "[ -]"); print t[3] t[1] t[2]}'

sabercats · February 22, 2006, 7:46pm

If I copy and paste it, it works, but if I write it in the script it does not work; something is wrong with my $RUNDATE.
Sorry if I made you angry; I know you are trying to help me.

vgersh99 · February 22, 2006, 11:44pm

well...... then you have to revise your script.
how about this and you take it from there, eh?

RUNDATE=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") ; print t[3] t[1] t[2]}'`

sabercats · February 23, 2006, 11:59am

Oh! My mistake: I did put the ; in my code but did not remove the pipe, because I thought it was another command, so it did not work. Thanks for being patient with me. I am slow, but at least I am better now than yesterday.

log-20061202.stat contains the first and last lines of log-20061202.dat, along with the record count for that day.
Example:
Total record = 240
Tom|02-12-2006|1600 W.Santa Clara|SanJose|95123|1001|ENG <--first
Mike|02-12-2006|23 Clayton Rd|San Francisco|94127|6666|PHY <-- last

For each record, the output file is controlled by the DATETIME of the data record. If I want all the records with DATETIME 02-12-2006 to go to log.20060212.dat and those with 02-13-2006 to go to log.20060213.dat, what should I do? Here is my broken code:

#!/bin/ksh -x

cd /DATA/
for file in *.dat; do
# Try to get date to create file name.
RUNDATE=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") ; print t[3] t[1] t[2]}`

ALINES=$(cat ${RUNDATE}.dat | wc -l)
BFIRST=$(head -1 ${RUNDATE}.stat)

IFS=\'
set -- $BFIRST
BLINES=$2

if (( ALINES == BLINES ))
then
print "Records count is matched"
else
print "Records count is not matched"
fi

AFIRST=$(head -1 ${RUNDATE}.dat)
BTHREE=$(cat ${RUNDATE}.stat | sed -n 2p)
if [ ${AFIRST} = ${BTHREE}];
then
print "First record is matched"
else
print "First record is not matched"
fi

ALAST=$(tail -1 ${RUNDATE}.dat)
BFOUR=$(cat ${RUNDATE}.stat | sed -n 3p)
if [ ${ALAST} = ${BFOUR} ]
then
print "Last record is matched"
else
print "Last record is not matched"
fi

grep "|" ${RUNDATE}.dat |

sort -t"|" +1 -2|

awk -F"|" '{
if ($2 != "") {
name = $1;
datetime = $2;
studentid = $6
class = $7;
printf"%s|%s|%s\n",name,studentid,class;
}

}' |

sort -t, +0 -1 >> /NEW/DATA/log-${RUNDATE}.dat

done

and this is where I run into trouble:

RUNDATE=20060212
20060212
20060213
20060212
....
+ + wc -l
+ cat log.20060212 20060212 20060213 20060212.dat
cat: 0652-050 Cannot open log.20060212.
cat: 0652-050 Cannot open 20060212.
cat: 0652-050 Cannot open 20060213.
cat: 0652-050 Cannot open 20060212.

What would you do?
Thanks for sharing your knowledge.
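
The trace shows that RUNDATE holds one date per record, because the nawk command prints a line for every record in the file; `${RUNDATE}.dat` then expands to several words and `cat` tries to open each of them. One possible way around this is to let awk choose the output file record by record, so that no single RUNDATE value is needed for the split. The sketch below is an assumption-laden illustration, not a finished solution: the scratch directories stand in for /DATA/ and /NEW/DATA/, the sample records are invented, the output is named yyyymmdd.dat, and `awk` is assumed to be a POSIX awk (use `nawk` where plain `awk` is the old version).

```shell
# Hypothetical input and output directories with sample data.
WORK=$(mktemp -d)
mkdir "$WORK/in" "$WORK/out"
cat > "$WORK/in/log-20061202.dat" <<'EOF'
Tom|02-12-2006 09:00:01|1600 W.Santa Clara|SanJose|95123|1001|ENG
Ann|02-13-2006 08:30:00|5 Main St|Oakland|94601|2002|BIO
Mike|02-12-2006 12:05:06|23 Clayton Rd|San Francisco|94127|6666|PHY
EOF

cd "$WORK/in" || exit 1
for file in log-*.dat; do
    awk -F'|' -v outdir="$WORK/out" '
        $2 != "" {
            # mm-dd-yyyy hh:mm:ss -> t[1]=mm, t[2]=dd, t[3]=yyyy
            split($2, t, "[ -]")
            out = outdir "/" t[3] t[1] t[2] ".dat"
            # Append NAME|STUDENTID|CLASS to the file for this record date.
            print $1 "|" $6 "|" $7 >> out
            # Close after each write so many dates cannot exhaust open files.
            close(out)
        }' "$file"
done

cat "$WORK/out/20060212.dat"
```

Because the records are appended with `>>`, running the loop twice on the same input duplicates the output, so the output directory should start empty or the script should track which files it has already processed.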

vgersh99 · February 23, 2006, 1:37pm

sorry, I cannot spend more time right now.
it seems [at least to me] that you'll need to understand what your [???] code is supposed to do AND debug it from there - putting the 'set -x' should be a good start to be able to see the flow of control.

Maybe others have more bandwidth than I do right now.

Good luck.