Newbie problem with ksh script

Hi all,
I have a directory that holds all of the .stat and .dat files.
They are pipe-separated flat files.
log-20061202.stat contains the record count for that day plus the first and last lines of log-20061202.dat.
Example:
Total record = 240
Tom|02-12-2006|1600 W.Santa Clara|SanJose|95123|1001|ENG <--first
Mike|02-12-2006|23 Clayton Rd|San Francisco|94127|6666|PHY <-- last

log-20061202.dat has:
NAME|DATETIME|Address|City|Zip|StudentID|Class
Tom|02-12-2006|1600 W.Santa Clara|SanJose|95123|1001|ENG <-- first
John|02-13-2006|234 Wlliam Rd|Oakland|94321|2324|MATH
..............................................
Mike|02-12-2006|23 Clayton Rd|San Francisco|94127|6666|PHY <--last

For each new log-yyyymmdd.dat file, compare the record count in the .stat file with the record count of the .dat file. Raise an error if they do not match.

Each log-yyyymmdd.stat file holds the first and last records of the .dat file. Confirm that the first and last records of the .dat file match the ones in the .stat file. Raise an error if they do not match.

For each record written to the output, the output file is determined by the DATE field of the record in the .dat file, not by the date in the file name.

The output file will be named yyyymmdd.dat and contain
NAME|STUDENTID|CLASS

This is a hard problem for me. Please help me learn how to solve it with the ksh shell. Thanks for sharing.

Your problem seems to be divided into two questions.

The first is checking the record count. For that you can simply do this:


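RUNDATE=`date +%Y%m%d`
COUNTSTAT=`grep "Total record" log-${RUNDATE}.stat | awk -F"=" '{print $2}'`
# Redirect the file into wc so it prints only the number, not the file name.
# If the .dat file starts with a header line, subtract 1 from COUNTLOG.
COUNTLOG=`wc -l < log-${RUNDATE}.dat`

if [ ${COUNTSTAT} -ne ${COUNTLOG} ]
then
     echo "Count of dat ${COUNTLOG} doesn't match count of stat file ${COUNTSTAT}"
fi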
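
One portability note on the count check: some wc implementations pad the number with leading blanks, and the text after the = in the .stat file starts with a space. A minimal sketch that turns both into plain numbers is shown here. The function names stat_count and dat_count are made up for illustration, and dat_count assumes the first line of the .dat file is a header, which the example listing suggests but does not confirm.

```shell
# Hypothetical helpers, not part of the original script.

# Print the number after "=" on the "Total record" line of a .stat file.
# Adding 0 makes awk treat the field as a number, which drops the blanks.
stat_count() {
    awk -F'=' '/Total record/ { print $2 + 0; exit }' "$1"
}

# Print the number of data records in a .dat file.
# Assumption: line 1 is the NAME|DATETIME|... header, so it is not counted.
dat_count() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}
```

The two results can then be compared with -ne exactly as in the test above.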

Now for your second question:

The idea here is to store the first and last records in separate variables and then search for them in the .dat file.

FIRSTLINE=`head -2 log-${RUNDATE}.stat | tail -1`
LASTLINE=`tail -1 log-${RUNDATE}.stat`
COUNTLOG=`wc -l < log-${RUNDATE}.dat`

# Line number of the first record in the .dat file (-F fixed string, -x whole line).
# Expect 1 here; expect 2 instead if the .dat file starts with a header line.
LINECOUNT=`grep -n -F -x "${FIRSTLINE}" log-${RUNDATE}.dat | head -1 | awk -F":" '{print $1}'`
if [ "${LINECOUNT:-0}" -eq 1 ]
then
     echo "FIRST RECORD MATCHED"
else
     echo "FIRST RECORD DIDN'T MATCH"
     exit 1
fi

# The last record must sit on the last line of the .dat file.
LINECOUNT=`grep -n -F -x "${LASTLINE}" log-${RUNDATE}.dat | tail -1 | awk -F":" '{print $1}'`
if [ "${LINECOUNT:-0}" -eq "${COUNTLOG}" ]
then
     echo "LAST RECORD MATCHED"
else
     echo "LAST RECORD DIDN'T MATCH"
     exit 1
fi

Hi Anubhav,

If you set RUNDATE=`date +%Y%m%d`
then it only checks today's date. If it has to go through the directory and compare each yyyymmdd.stat with its yyyymmdd.dat, it cannot do that. For example, if the files to check are from yesterday or the day before yesterday, it will not work.

Thanks for sharing,

Then it is best to pass the date in as a parameter. Run the script as follows:

eg abc.ksh <YYYYMMDD>

and in the code read $1 into RUNDATE

RUNDATE=$1
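
If the script has to check every pair in the directory rather than one date, another option is to loop over the .stat files and take the date from each file name. The sketch below assumes the log-yyyymmdd.stat / log-yyyymmdd.dat naming from the first post. The names dates_in_dir and check_all are made up for illustration.

```shell
# Hypothetical helper: print the yyyymmdd part of every log-*.stat file
# in the directory given as $1, one date per line.
dates_in_dir() {
    for f in "$1"/log-*.stat
    do
        [ -f "$f" ] || continue      # the pattern matched nothing
        d=${f##*/}                   # strip the directory
        d=${d#log-}                  # strip the "log-" prefix
        d=${d%.stat}                 # strip the ".stat" suffix
        echo "$d"
    done
}

# Run the checks once per date instead of only for today's date.
check_all() {
    for RUNDATE in `dates_in_dir "$1"`
    do
        if [ -f "$1/log-${RUNDATE}.dat" ]
        then
            echo "checking log-${RUNDATE}"   # the count and first/last checks go here
        else
            echo "log-${RUNDATE}.dat is missing"
        fi
    done
}
```

Calling check_all /DATA would then visit every date that has a .stat file, whatever day the script is run on.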

Hope that helps!!

The problem I have is this:
In the .dat file the DATE field is $2 and has the format mm-dd-yyyy hh:mm:ss,
like 02-21-2006 12:05:06. How do I get only 20060221?

Can you show me the code one more time so I can learn from it? Thanks

echo 'Mike|02-12-2006 12:05:06|23 Clayton Rd|San Francisco|94127|6666|PHY' | nawk -F'|' '{split(substr($2, 1, index($2, " ")-1), t, "-"); print t[3] t[1] t[2]}'

or better yet:

echo 'Mike|02-12-2006 12:05:06|23 Clayton Rd|San Francisco|94127|6666|PHY' | nawk -F'|' '{split($2, t, "[ -]"); print t[3] t[1] t[2]}'
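
For use inside a script, the same split can be wrapped in a small function. This is only a sketch: to_yyyymmdd is a made-up name, and plain awk stands in for nawk, which assumes an awk that accepts a regular expression as the split separator (any POSIX awk should).

```shell
# Hypothetical helper: turn "mm-dd-yyyy" or "mm-dd-yyyy hh:mm:ss" into yyyymmdd.
to_yyyymmdd() {
    # Split on either a blank or a dash: t[1]=mm, t[2]=dd, t[3]=yyyy.
    echo "$1" | awk '{ split($0, t, "[ -]"); print t[3] t[1] t[2] }'
}

to_yyyymmdd '02-21-2006 12:05:06'    # prints 20060221
```

Because the separator class covers both the blank and the dash, the time part simply lands in t[4] and is ignored.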

So you mean
for file in *dat; do
YYYYMMDD=grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") | print t[3] t[1] t[2]}'

It did not work.

what I meant was related to the original question:

"It did not work"
because your variable assignment wasn't correct: you left out the opening backquote before the grep.

YYYYMMDD=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") | print t[3] t[1] t[2]}`

I don't think you copied the original 'nawk' suggestion correctly. Pay closer attention to the matching ' characters AND to the ';' character in the 'nawk' statement.

Suggestion:
Copy/paste the original suggestion and see if it works as is. Then try to adapt it to your specific needs.

I tried it with a single line, but when I put it in the script it gives me the error below. That's why I asked what I did wrong.

Syntax Error The source line is 1.
The error context is
{split(substr($2, 1, index($2, " ")-1), t, "-") | >>> print <<< t[3] t[1] t[2]}
awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.

Here is my code for my problem. Please help me find the incorrect parts.

#!/bin/ksh -x

cd /DATA/
for file in *.dat; do
# Try to get date to create file name.
RUNDATE=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") | print t[3] t[1] t[2]}`

ALINES=$(cat ${RUNDATE}.dat | wc -l)
BFIRST=$(head -1 ${RUNDATE}.stat)

IFS=\'
set -- $BFIRST
BLINES=$2

if (( ALINES == BLINES ))
then
print "Records count is matched"
else
print "Records count is not matched"
fi

AFIRST=$(head -1 ${RUNDATE}.dat)
BTHREE=$(cat ${RUNDATE}.stat | sed -n 2p)
if [ ${AFIRST} = ${BTHREE}];
then
print "First record is matched"
else
print "First record is not matched"
fi

ALAST=$(tail -1 ${RUNDATE}.dat)
BFOUR=$(cat ${RUNDATE}.stat | sed -n 3p)
if [ ${ALAST} = ${BFOUR} ]
then
print "Last record is matched"
else
print "Last record is not matched"
fi

grep "|" ${RUNDATE}.dat |

sort -t"|" +1 -2|

awk -F"|" '{
if ($2 != "") {
name = $1;
datetime = $2;
studentid = $6
class = $7;
printf"%s|%s|%s\n",name,studentid,class;
}

}' |

sort -t, +0 -1 >> /NEW/DATA/log-${RUNDATE}.dat

done

Thanks,

as i said previously......

Simply 'copy/paste' my ORIGINAL suggestion.

I am lost. I added the ; but it did not work. How dumb am I?

Do me a favor, please!

can you copy/paste the line below onto your shell window and see if it works, PLS - do NOT edit it, do NOT retype it - just a simple copy/paste!!!

echo 'Mike|02-12-2006 12:05:06|23 Clayton Rd|San Francisco|94127|6666|PHY' | nawk -F'|' '{split($2, t, "[ -]"); print t[3] t[1] t[2]}'

If I copy and paste it, it works, but when I write it in the script it does not work. Something is wrong with my $RUNDATE.
Sorry if I make you angry. I know you are trying to help me.

well...... then you have to revise your script.
how about this and you take it from there, eh?

RUNDATE=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") ; print t[3] t[1] t[2]}'`

Oh! My mistake. I did put the ; in my code but did not remove the pipe, because I thought it was another command, so it did not work. Thanks for being patient with me. I am slow, but at least I am better now than yesterday.

log-20061202.stat contains the record count for that day plus the first and last lines of log-20061202.dat.
Example:
Total record = 240
Tom|02-12-2006|1600 W.Santa Clara|SanJose|95123|1001|ENG <--first
Mike|02-12-2006|23 Clayton Rd|San Francisco|94127|6666|PHY <-- last

log-20061202.dat has:
NAME|DATETIME|Address|City|Zip|StudentID|Class
Tom|02-12-2006|1600 W.Santa Clara|SanJose|95123|1001|ENG <-- first
John|02-13-2006|234 Wlliam Rd|Oakland|94321|2324|MATH
..............................................
Mike|02-12-2006|23 Clayton Rd|San Francisco|94127|6666|PHY <--last

For each record, the output file is controlled by the DATETIME of the data record. If I want all the records with DATETIME 02-12-2006 to go to log.20060212.dat and all the records with 02-13-2006 to go to log.20060213.dat, what should I do? Here is my broken code:

#!/bin/ksh -x

cd /DATA/
for file in *.dat; do
# Try to get date to create file name.
RUNDATE=`grep "|" $file | nawk -F"|" '{split(substr($2, 1, index($2, " ")-1), t, "-") ; print t[3] t[1] t[2]}`

ALINES=$(cat ${RUNDATE}.dat | wc -l)
BFIRST=$(head -1 ${RUNDATE}.stat)

IFS=\'
set -- $BFIRST
BLINES=$2

if (( ALINES == BLINES ))
then
print "Records count is matched"
else
print "Records count is not matched"
fi

AFIRST=$(head -1 ${RUNDATE}.dat)
BTHREE=$(cat ${RUNDATE}.stat | sed -n 2p)
if [ ${AFIRST} = ${BTHREE}];
then
print "First record is matched"
else
print "First record is not matched"
fi

ALAST=$(tail -1 ${RUNDATE}.dat)
BFOUR=$(cat ${RUNDATE}.stat | sed -n 3p)
if [ ${ALAST} = ${BFOUR} ]
then
print "Last record is matched"
else
print "Last record is not matched"
fi

grep "|" ${RUNDATE}.dat |

sort -t"|" +1 -2|

awk -F"|" '{
if ($2 != "") {
name = $1;
datetime = $2;
studentid = $6
class = $7;
printf"%s|%s|%s\n",name,studentid,class;
}

}' |

sort -t, +0 -1 >> /NEW/DATA/log-${RUNDATE}.dat

done

and I run into trouble here:

RUNDATE=20060212
20060212
20060213
20060212
....
+ + wc -l
+ cat log.20060212 20060212 20060213 20060212.dat
cat: 0652-050 Cannot open log.20060212.
cat: 0652-050 Cannot open 20060212.
cat: 0652-050 Cannot open 20060213.
cat: 0652-050 Cannot open 20060212.

What would you do?
Thanks for sharing your knowledge.
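
The trace above points at the cause: the nawk command prints one date for every record in the file, so RUNDATE ends up holding a whole list of dates, and ${RUNDATE}.dat expands to several words instead of one file name. If what is wanted is the list of distinct dates in a file, a sketch along these lines may help. dates_in_file is a made-up name, plain awk stands in for nawk, and the header test assumes that real date fields start with a digit.

```shell
# Hypothetical helper: print each distinct date found in field 2 of a
# pipe-separated file, as yyyymmdd, one per line.
dates_in_file() {
    awk -F'|' '$2 ~ /^[0-9]/ {               # skips the header and blank lines
        split($2, t, "[ -]")                 # mm, dd, yyyy (and time, if any)
        print t[3] t[1] t[2]
    }' "$1" | sort -u
}
```

A loop such as for d in `dates_in_file $file` then sees a single date on each pass, so a name built from $d is always one word.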

Sorry, I cannot spend more time on this right now.
It seems [at least to me] that you'll need to understand what your [???] code is supposed to do AND debug it from there. Putting in 'set -x' should be a good start for seeing the flow of control.

Maybe others have more bandwidth than I do right now.

Good luck.
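
For the question that was left open, splitting the records by their DATETIME field, one possible approach is to let awk choose the output file for each record. This is a sketch under assumptions, not something tested on the poster's system: split_by_date and its arguments are made-up names, plain awk stands in for nawk, the header test assumes that real date fields start with a digit, and the output name follows the log-yyyymmdd.dat pattern used in the script above. No header line is written to the output files.

```shell
# Hypothetical helper: split_by_date INPUT.dat OUTDIR
# Appends NAME|STUDENTID|CLASS for every record to OUTDIR/log-yyyymmdd.dat,
# where yyyymmdd comes from the record's own date field, not the file name.
split_by_date() {
    awk -F'|' -v outdir="$2" '$2 ~ /^[0-9]/ {    # skips the header and blank lines
        split($2, t, "[ -]")                     # mm, dd, yyyy (and time, if any)
        out = outdir "/log-" t[3] t[1] t[2] ".dat"
        print $1 "|" $6 "|" $7 >> out            # name, student id, class
        close(out)                               # keeps the number of open files low
    }' "$1"
}
```

Since each record is routed on its own, no RUNDATE variable is needed for this step, and a file that mixes 02-12-2006 and 02-13-2006 records ends up feeding two output files. Running it twice on the same input appends the records twice, so the output directory should start out empty.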