File name and format validation

Hi Gurus,

I last used Unix a long time ago. I need help writing a Unix script that can be automated to run every day at a specific time.

1.) This is the actual functional requirement:
Informatica should reject incoming files that have invalid filenames or file formats.

2.) My file names will have the format below:

<interface_name>_<source_name>_<template-type>_<sequence_number>_<datetime-stamp>.csv 

e.g. VJA_AP_BRSIN_00001_2601201012316.csv

3.) For example, if I have the fields below in my file:

SiteID        Num(3)
ICOMS Action  TBD
SKU           Char(50)
Serial        Char(16)

I need to validate both the name of each file and its format, so a shell script has to be written. Every day I see 50 files in my incoming directory.

I think the field validation should cover:
A) The count of fields in a file should match the expected count
B) Datatypes and field value sizes should match
C) Check whether any NOT NULL fields arrive as NULL

I would appreciate it if someone could help me ASAP.

---------- Post updated at 07:54 AM ---------- Previous update was at 07:49 AM ----------

Hi All,

As part of the validation, I also need to check whether the file has a header, detail records and a footer.

We can't do the whole job for you, but here are some ideas.

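  1. Reject invalid filenames:
BASE=/Mydirectory
# Read the find output line by line so paths containing spaces survive.
find "$BASE" -type f | while IFS= read -r file
do
  FN=`basename "$file"`
  # 3 letters, 2 letters, 5 letters, 5-digit sequence, 13-digit timestamp.
  # The dot is escaped and the pattern is anchored at both ends.
  if echo "$FN" | grep '^[A-Z]\{3\}_[A-Z]\{2\}_[A-Z]\{5\}_[0-9]\{5\}_[0-9]\{13\}\.csv$' >/dev/null ; then
       echo "$file is valid file"
  else
       echo "$file is invalid file"
#     rm "$file"
   fi
done
  2. For the file format and the other requests, you need to provide some samples first.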
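On running this every day at a fixed time and actually rejecting the bad files: one possible way is a small wrapper started from cron. The sketch below is only an illustration. The function name reject_invalid_names, the wrapper path, the reject directory and the 06:00 schedule are all my assumptions, not something from your setup, and the name pattern is the one for the VJA example above. Unlike the find loop it does not descend into subdirectories.

```shell
# Hypothetical wrapper, e.g. saved as /path/to/validate_incoming.sh.
# A crontab entry like the one below (added with `crontab -e`) would run it
# every day at 06:00; the path and the time are placeholders:
#   0 6 * * * /path/to/validate_incoming.sh >> /path/to/validate.log 2>&1

# Name pattern taken from the VJA example: 3, 2 and 5 letters, a 5-digit
# sequence number and a 13-digit timestamp.
NAME_RE='^[A-Z]\{3\}_[A-Z]\{2\}_[A-Z]\{5\}_[0-9]\{5\}_[0-9]\{13\}\.csv$'

reject_invalid_names() {
    # $1 = incoming directory, $2 = reject directory (must already exist)
    for file in "$1"/*
    do
        [ -f "$file" ] || continue      # skip directories and an empty glob
        name=`basename "$file"`
        if echo "$name" | grep "$NAME_RE" >/dev/null; then
            echo "valid: $name"
        else
            echo "rejected: $name"
            mv "$file" "$2"/            # move instead of rm, so nothing is lost
        fi
    done
}
```

The wrapper would then end with a call such as reject_invalid_names /Mydirectory /Mydirectory_rejected (the second directory is made up). Moving the rejects rather than deleting them keeps the files around for whoever has to correct them.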

Hi rdcwayx,

Thanks for your reply.

I don't have the exact file with me right now, sorry about that.
But I know the details of the file.

File is a .csv file and it is comma separated with header, detail and footer values.

File name is VJA_AP_BRSIN_00001_2601201012316.csv

For example, the file has 6 fields, a mix of numeric, string and date fields.

HSiteID,ICOMS Action,SKU,serial,edate
01,ABC,Pending,23,4,19951227120556
02,DM,Pending,26,5,19951227120556
03,RP,delivered,28,,19951227120556
T3

The field validation is more involved, with 4 checks. My only experience is with small scripts a few years back, so I would like to ask for help with this.

A) After loading the source VJA_AP_BRSIN_00001_2601201012316.csv file to the target xxx.csv file, I need to check that the source record count matches the target record count. (The trailer says 3, so it should match the record count of the target xxx.csv file.)
B) Datatypes and field value sizes should match.
I mean the first field, site ID, should be numeric, the 2nd field (ICOMS) should be a string, the 3rd a string, the 4th numeric, the 5th numeric and the 6th a date.
C) Check whether any NOT NULL fields arrive as NULL.
If any of the required fields arrives as NULL, we should create a log and send a message saying that the file is invalid and must be corrected.
D) Check whether the file came with a header, detail records and a footer. If not, log it with a message such as "no footer" or "no header".

Thank you very much.

---------- Post updated at 11:59 AM ---------- Previous update was at 04:39 AM ----------

One more validation needs to be done:

E) The number of columns in the source file should match the expected number (for example 6, as in the file above),
and I should check 'n' files at a time, all placed in one folder: the Srcfile folder.

Thanks,
vsmeruga
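Checks B, D and E, plus the trailer count from A, could be done in one awk pass per file. What follows is only a rough sketch built on assumptions taken from the sample above: the first line starts with "H", the last line is "T" followed by the record count, every line in between is a detail record with 6 fields, fields 1, 4 and 5 are numeric and field 6 is a 14-digit timestamp. The function name validate_file is made up. Empty values are let through here, because the NOT NULL check is a separate requirement, and on older Solaris systems you may need nawk instead of awk for user-defined functions.

```shell
validate_file() {
    # $1 = file to check; prints one line per problem, returns 1 if any
    awk -v expected=6 '
        function err(msg) { print FILENAME ": " msg; bad = 1 }
        function is_num(v) { return v ~ /^[0-9]*$/ }   # empty passes here
        function check_detail(line, lineno,    f, n) {
            details++
            n = split(line, f, ",")
            if (n != expected) {
                err("line " lineno ": expected " expected " fields, got " n)
                return
            }
            if (!is_num(f[1]) || !is_num(f[4]) || !is_num(f[5]))
                err("line " lineno ": non-numeric value in field 1, 4 or 5")
            if (f[6] != "" && (length(f[6]) != 14 || f[6] !~ /^[0-9]+$/))
                err("line " lineno ": field 6 is not a 14-digit timestamp")
        }
        # Line 1 must be the header.
        NR == 1 { if ($0 !~ /^H/) err("no header"); next }
        # Each line is checked one step late, so the last line is never
        # treated as a detail record; it is examined as the footer in END.
        NR > 2  { check_detail(prev, NR - 1) }
                { prev = $0 }
        END {
            if (NR == 0) err("no header")
            if (prev !~ /^T[0-9]+$/) err("no footer")
            else if (substr(prev, 2) + 0 != details)
                err("footer count " substr(prev, 2) " does not match " details " detail records")
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

For 'n' files in one folder you could call it in a loop, for example for f in Srcfile/*.csv; do validate_file "$f" >> validation.log || echo "$f is invalid"; done, where validation.log is just a placeholder name.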

Hi, could someone help me with this?

Hi All

Can you help me find the NULLs in the existing list of columns?
In the data below, column 4 is NULL in the 2nd row and column 5 is NULL in the 3rd row.

My file data will be like below:

01,ABC,Pending,23,4,19951227120556
02,DM,Pending,,5,19951227120556
03,RP,delivered,28,,19951227120556

with file name as ATRPU_RP_ATU_00008_05022010125056.csv

Thanks,
vsmeruga

awk -F, '{
  for (i = 1; i <= NF; i++)
    if ($i == "") print "Row No " NR " and Column No " i " is null"
}' infile

Thanks. I will try it and let you know.

---------- Post updated at 05:01 AM ---------- Previous update was at 04:40 AM ----------

Hi Malcome

Thanks for the quick reply. Let me reframe my question.

Actually, 3 of the fields in the file are mandatory. I should never get NULL values for those 3 fields.

The mandatory fields are: Field1, Field4, Field5

If I find any NULL values in those 3 fields, I need to write this message to the log: "Mandatory field : Field Num coming as NULL. File cannot be processed"
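The awk one-liner above can be narrowed down to just the mandatory columns. Here is a minimal sketch, assuming the file holds only detail records like the three sample rows (a header or footer line would have to be skipped first). The function name check_mandatory and the log file argument are placeholders, and the row number is added to the message so the bad record can be found.

```shell
check_mandatory() {
    # $1 = data file, $2 = log file; returns 1 if a mandatory field is empty
    awk -F, -v fields="1,4,5" '
        BEGIN { n = split(fields, must, ",") }   # mandatory column numbers
        {
            for (k = 1; k <= n; k++)
                if ($(must[k]) == "") {
                    print FILENAME ": row " NR ": Mandatory field : Field " must[k] " coming as NULL. File cannot be processed"
                    bad = 1
                }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1" >> "$2"
}
```

A call could look like check_mandatory ATRPU_RP_ATU_00008_05022010125056.csv errors.log || echo "file rejected", with errors.log being whatever log you use. The list of mandatory columns sits in the fields variable, so changing it does not require touching the loop.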

---------- Post updated at 06:19 AM ---------- Previous update was at 05:01 AM ----------

Hi, I am trying to execute the script below, saved as interface_main_script.sh,
and I get the following error:

user prompt: ksh interface_main_script.sh
interface_main_script.sh[6]: syntax error at line 14 : `elif' unexpected

Please let me know my mistake.

interface_main_script.sh:

#!/bin/ksh
#set -x

BASE=/grid/PowerCenter/stage/velocity_r3/inbound/ATRPU

for file in `find $BASE -type f `
do
  FN=`basename $file`
  intname = $FN |grep "^[A-Z]\{5\}"  
  if $intname= 'ATRPU' then 
	echo "ATRPU Interface"
       #./AT_VELUNIX01_SRV01_inbound_ftp.sh ${intname}
  elif  $intname= 'ATRGI' then 
	echo "ATRGI Interface"
       #./AT_VELUNIX01_SRV01_inbound_ftp.sh ${intname}
  else
	echo "Not valid Interface Files"
   fi
done
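As far as I can tell there are three separate problems in that script. First, intname = $FN |grep ... has spaces around the = and no command substitution, so the shell tries to run a command called intname instead of assigning a variable. Second, the comparisons are not wrapped in a test ([ ... ]). Third, then sits on the same line as if without a ; before it, so it is read as one more argument and the shell never sees the then keyword, which is why it complains when it reaches elif. A possible corrected version is sketched below. It assumes the interface name is everything before the first underscore of the file name, and the helper name classify is mine. It should run under ksh or any POSIX shell.

```shell
# BASE can be overridden from the environment; the default is the path
# from the original script.
BASE=${BASE:-/grid/PowerCenter/stage/velocity_r3/inbound/ATRPU}

classify() {
    # $1 = file name without directory.
    # Strip everything from the first underscore on, leaving e.g. ATRPU.
    intname=${1%%_*}
    if [ "$intname" = "ATRPU" ]; then
        echo "ATRPU Interface"
        #./AT_VELUNIX01_SRV01_inbound_ftp.sh ${intname}
    elif [ "$intname" = "ATRGI" ]; then
        echo "ATRGI Interface"
        #./AT_VELUNIX01_SRV01_inbound_ftp.sh ${intname}
    else
        echo "Not valid Interface Files"
    fi
}

# Only walk the directory if it exists on this machine.
if [ -d "$BASE" ]; then
    find "$BASE" -type f | while IFS= read -r file
    do
        classify "`basename "$file"`"
    done
fi
```

With more than a handful of interfaces, a case "$intname" in ... esac block would probably read better than a chain of elif branches.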