Table Cleanup Script

I need some help with a script to fetch and delete all records older than 3 days, connecting to Sybase from SunOS. I wrote the following script, but it's not working. Can someone please guide me with my code?

Thanks

#!/bin/ksh
##GET PREVIOUS DAY DATE
dt=`date | awk '{printf"%4d%2d%2d\n",$6,$2,($3-3)}' | sed 's/ /0/g'`

#GET THE RECORDS FROM 3 DAYS OLD
SQL_QUERY="select * from CLEANUPTABLE where convert(varchar(12), ORDER_DATE, 3) <= '${dt}'"
echo $SQL_QUERY
LOADER=`isql -U${dbuid} -P${dbpwd} -S${dbsrv} << EOF
${LOADER}
${SQL_QUERY} >>cleanup.out
go
EXIT;
EOF

if [ $? -ne 0 ]
then
echo "Failed to execute query: ${SQL_QUERY}">> $LOG_FILE
exit -10 ;
fi

Last I checked, truncate table removes everything, and only the table owner can run it.

My favorite archival delete is something like:

  1. select the delete target rows to a global temp table.
  2. bcp that table out.
  3. Verify the row and line counts match.
  4. Using a script or stored procedure or TSQL loop, for 128 lines/rows at a time in clustered index column order, delete original rows and commit.

This ensures minimal target table impact, and nothing is ever deleted without being archived, even if the unattended script has a problem. A rough sketch follows.
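Here is a sketch of steps 1, 2, and 4 in that spirit, assuming the dbuid/dbpwd/dbsrv variables from the script above, a datetime column ORDER_DATE, and a pre-created staging table ARCHDB..CLEANUP_ARCH (both names illustrative, not from the original post):

#!/bin/ksh

# 1. Copy the delete targets to the staging table.
echo "
insert ARCHDB..CLEANUP_ARCH
select * from CLEANUPTABLE
where ORDER_DATE < dateadd(dd, -3, getdate())
go
" | isql -U${dbuid} -P${dbpwd} -S${dbsrv}

# 2. bcp the staging table out to a flat file.
bcp ARCHDB..CLEANUP_ARCH out cleanup_arch.bcp -U${dbuid} -P${dbpwd} -S${dbsrv} -c

# 3. Verify the bcp line count against the staging row count before deleting.

# 4. Delete the originals 128 rows at a time, committing each batch.
echo "
set rowcount 128
declare @done int
select @done = 0
while @done = 0
begin
    begin tran
    delete CLEANUPTABLE
    where ORDER_DATE < dateadd(dd, -3, getdate())
    if @@rowcount = 0 select @done = 1
    commit tran
end
set rowcount 0
go
" | isql -U${dbuid} -P${dbpwd} -S${dbsrv}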

Missing the last go and the linefeed after EOF, the second ` is misplaced, and is the table name misspelled?
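To illustrate the ` placement: the closing backquote needs its own line after EOF, and the >>cleanup.out redirection belongs outside the here-document text (a sketch reusing the variables from the original post):

LOADER=`isql -U${dbuid} -P${dbpwd} -S${dbsrv} << EOF
${SQL_QUERY}
go
EOF
`
echo "$LOADER" >> cleanup.out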

Must be a code fragment: there is a fi but no matching if.

If your date column is a datetime type, you are cutting off at three days back to the runtime microsecond, removing partial days:
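For example, a sketch (style 112 converts to a CCYYMMDD string; datetime_col as in the script below):

-- cuts at the runtime microsecond three days back, removing a partial day:
where datetime_col < dateadd(dd, -3, getdate())
-- cuts at midnight instead, so only whole days are removed:
where datetime_col < dateadd(dd, -3, convert(datetime, convert(varchar(8), getdate(), 112)))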

Maybe you want distinct, but I always say: if you think distinct, reconsider it as count(*) with group by.

I am not a big fan of <<, as the behavior of echo '...' is more predictable and it flows better reading left to right. I haven't quantified whether the tmp file created for << is cheaper than the pipe and fork for echo; the piped stream layout is just cleaner. I found your second ` in the middle, when it needed to be on a line below EOF.

 
#!/bin/ksh

# write a header line to the output file
echo ' CLEANUPTABLE
' >cleanup.out

# delete rows older than 3 days, then report the remaining rows per date
results=$(
echo "
delete CLEANUPTABLE
where datetime_col < dateadd(dd, -3, getdate())
go
select datetime_col, count(*) ct
from CLEANUPTABLE
group by datetime_col
go
" | isql -U${dbuid} -P${dbpwd} -S${dbsrv} -w3000 -E vi
)

I'm able to run the query in Sybase and it works fine, but when I try to put it in the script it gives errors. Not sure what the problem is; I get the following error:

sunos-moe > cleanup.sh
select cleanup.out cleanup.sh cleanup.tmp test.sh truncateEOL.sh truncateEOL.sql from CLEANUPTABLE  where convert(varchar(12), ORDER_DATE, 3) <= '20100005'
cleanup.sh[6]: syntax error at line 6 : ``' unmatched
sunos-moe >

---------- Post updated at 10:51 AM ---------- Previous update was at 10:49 AM ----------


1> select count(*) from CLEANUPTABLE  where convert(varchar(12), ORDER_DATE, 3) <= '20100810'
2> go
             
 ----------- 
       97039 

(1 row affected)
1> 

Here is what I pulled from the database. I don't want to truncate anything unless I'm able to select; there are 130K records in there and I just want to clean up what I want. Let's do the select and deal with the truncate later :wink:

---------- Post updated at 01:37 PM ---------- Previous update was at 10:51 AM ----------

Okay I got it resolved.

#!/bin/ksh

dt=`date | awk '{printf"%4d%2d%2d\n",$6,$2,($3-3)}' | sed 's/ /0/g'`
isql -U${dbuid} -P${dbpwd} -S${dbsrv} << EOF
select * from CLEANUPTABLE where convert(varchar(12), ORDER_DATE, 3) <= '${dt}' 
go
output to cleanup.out
EXIT;
EOF

At the low level, see the man page for date's + format option.
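For instance (a sketch; the fallback assumes perl is installed, since SunOS date has no built-in day arithmetic):

# today as CCYYMMDD via the date + format option
dt=$(date +%Y%m%d)

# three days ago needs help; one option is perl
dt=$(perl -e '@t = localtime(time - 3*86400); printf "%04d%02d%02d\n", $t[5]+1900, $t[4]+1, $t[3];')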

The problem with date and getdate() is that you do not want to be tied to the clock at any low level, ever, except when writing history rows and similar real-time activities. You should be able to test with arbitrary data/dates, or, if there is an outage, clean up on a later day.

You probably want each archive file to hold all of one day. You could select each distinct date old enough to archive and loop over each date at a high level to archive and delete, so it automatically catches up on more days if it fails to run on some days; a sketch follows.
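A sketch of that high-level loop, assuming a hypothetical archive_one_day function that does the bcp-out, verify, and delete for a single CCYYMMDD value:

days=$(echo "
set nocount on
select distinct convert(varchar(8), ORDER_DATE, 112)
from CLEANUPTABLE
where ORDER_DATE < dateadd(dd, -3, getdate())
go" | isql -U${dbuid} -P${dbpwd} -S${dbsrv} -b)

for day in $days
do
    archive_one_day "$day"    # hypothetical: bcp out the day's rows, verify, delete
done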

T-SQL can convert date strings using a select without a from, either inline or in a scalar subquery, to load a variable.
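For example, a sketch loading a cutoff variable from a CCYYMMDD string:

declare @cutoff datetime
select @cutoff = convert(datetime, '20100810')    -- select without from
select count(*) from CLEANUPTABLE where ORDER_DATE < @cutoff
go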

You do not want to convert every row in the table if you can avoid it; you want to convert your bound constant(s), especially if there is a useful index. BTW, a table clustered on time is lock-prone for non-batch use, as all users are churning at the end of the table.
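Applied to the select earlier in the thread (a sketch; it assumes an index on ORDER_DATE is what you want to preserve):

-- converts every row, so an index on ORDER_DATE cannot be used:
where convert(varchar(12), ORDER_DATE, 3) <= '${dt}'
-- converts only the constant bound, so the index stays usable:
where ORDER_DATE < convert(datetime, '${dt}')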

You want to ensure that either:

  • the date_time_col is a string, or
  • the date_time_col is always loaded with a date and no time, or
  • the date_time_col is processed in a range for the one day, e.g., "date_time_col >= day and date_time_col < (day + 1)" (sketched below).
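In T-SQL terms, a minimal sketch of that last range form, with @day assumed to hold midnight of the target day:

where date_time_col >= @day
  and date_time_col < dateadd(dd, 1, @day)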

Wasting a date-time datatype on just a date is confusing, too, so designers often use either a CCYYMMDD string, a CCYYMMDD literal int, or an int of days since an epoch date. We all have 8-byte-wide CPUs these days, so getting it down to 4 bytes is mostly useful for space savings.