Hi,
I am facing one scenario in which I need to extract exact position of date and time from the name of the files. For example, Below is the record in which I need to extract position of YYYYMMDD,HHMISS
and YYMMDD
. Date and time variables can come more than once. I need to use these position and rename ABC_123_YYYYMMDD_V01P1234YYMMDDHHMISS.txt
file in ABC_123_V01P.YYYYMMDD.HHMISS.txt
format i.e. If I receive file as ABC_123_20190401_V01P1234190401190242.txt
then I should be able to rename it as ABC_123_V01P.20190401.190242.txt.
$ cat RECORD
SOURCE|LOGIN|SERVER|ABC_123_YYYYMMDD_V01P1234YYMMDDHHMISS.txt|ABC_123_V01P.YYYYMMDD.HHMISS.txt|CATEGORY
$
I have tried below code but, I am not getting desired result.
My Code -
$ cat timestamp_format.csv | awk -F"," -v OFS="," '{print $1,$2,$3,$4}' | while IFS="," read FILE_TIMESTAMP FILE_TIMESTAMP_PARAM FILE_TIMESTAMP_REGEX EXP_FILE_TIMESTAMP_FORMAT
do
awk -v OFS="|" -v FILE_TIMESTAMP="${FILE_TIMESTAMP}" -v FILE_TIMESTAMP_REGEX="${FILE_TIMESTAMP_REGEX}" -v EXP_FILE_TIMESTAMP_FORMAT="${EXP_FILE_TIMESTAMP_FORMAT}" 'FNR==NR{a[NR]=$1;b[NR]=$3;c[NR]=$4;CNT=NR;next}
{
SOURCE_FILENAME=$4
TARGET_FILENAME=$5
for (i=1;i<=CNT;i++)
{
VAR=a;
DATE_FORMAT_REGEX=b;
S1=index($4,VAR);
L1=length(VAR);
S_FTP_FILE_TIMESTAMP1=index(SOURCE_FILENAME,FILE_TIMESTAMP);
L_FTP_FILE_TIMESTAMP1=length(FILE_TIMESTAMP);
TEMP_SOURCE_FILENAME=sub(VAR,DATE_FORMAT_REGEX,SOURCE_FILENAME);
S2=index(SOURCE_FILENAME,VAR);
L2=length(VAR);
S_FTP_FILE_TIMESTAMP2=index(SOURCE_FILENAME,FILE_TIMESTAMP);
L_FTP_FILE_TIMESTAMP2=length(FILE_TIMESTAMP);
for (j=1;j<=CNT;j++)
{
DVAR=a[j];
EXP_DATE_FORMAT=c[j];
DS1=index($5,DVAR);
DL1=length(DVAR);
DS_FILE_TIMESTAMP1=index(TARGET_FILENAME,FILE_TIMESTAMP);
DL_FILE_TIMESTAMP1=length(FILE_TIMESTAMP);
TEMP_TARGET_FILENAME=sub(VAR,DATE_FORMAT_REGEX,TARGET_FILENAME);
DS2=index(TARGET_FILENAME,DVAR);
DL2=length(DVAR);
DS_FILE_TIMESTAMP2=index(TARGET_FILENAME,FILE_TIMESTAMP);
DL_FILE_TIMESTAMP2=length(FILE_TIMESTAMP);
if ( S1 > 0 && DS1 > 0 && S_FTP_FILE_TIMESTAMP1 > 0 && DS_FILE_TIMESTAMP1 > 0 )
{
print $0,S1,L1,substr($4,S1,L1),DATE_FORMAT_REGEX,S_FTP_FILE_TIMESTAMP1,L_FTP_FILE_TIMESTAMP1,FILE_TIMESTAMP,FILE_TIMESTAMP_REGEX,S2,L2,substr($4,S2,L2),DATE_FORMAT_REGEX,S_FTP_FILE_TIMESTAMP2,L_FTP_FILE_TIMESTAMP2,FILE_TIMESTAMP,FILE_TIMESTAMP_REGEX,DS1,DL1,substr($5,DS1,DL1),EXP_DATE_FORMAT,DS_FILE_TIMESTAMP1,DL_FILE_TIMESTAMP1,FILE_TIMESTAMP,FILE_TIMESTAMP_REGEX,EXP_FILE_TIMESTAMP_FORMAT,DS2,DL2,substr($5,DS2,DL2),EXP_DATE_FORMAT,DS_FILE_TIMESTAMP2,DL_FILE_TIMESTAMP2,FILE_TIMESTAMP,FILE_TIMESTAMP_REGEX,EXP_FILE_TIMESTAMP_FORMAT
}
}
}
}' FS="," date_format.csv FS="|" RECORD
done
$
Mapping files used are -
$ cat date_format.csv
YYYYMMDD,[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9],........,%Y%m%d
YYYY-MM-DD,[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9],....-..-..,%Y-%m-%d
YYYY_MM_DD,[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9],...._.._..,%Y_%m_%d
YYMMDD,[0-9][0-9][0-9][0-9][0-9][0-9],......,%y%m%d
YY-MM-DD,[0-9][0-9]-[0-9][0-9]-[0-9][0-9],..-..-..,%y-%m-%d
DD-MMM-YYYY,[0-9][0-9]-[a-z|A-Z][a-z|A-Z][a-z|A-Z]-[0-9][0-9][0-9][0-9],..-...-....,%d-%b-%Y
DD_MMM_YYYY,[0-9][0-9]_[a-z|A-Z][a-z|A-Z][a-z|A-Z]_[0-9][0-9][0-9][0-9],.._..._....,%d_%b_%Y
D-MMM-YYYY,[0-9]-[a-z|A-Z][a-z|A-Z][a-z|A-Z]-[0-9][0-9][0-9][0-9],.-...-....,%e-%b-%Y
D_MMM_YYYY,[0-9]_[a-z|A-Z][a-z|A-Z][a-z|A-Z]_[0-9][0-9][0-9][0-9],._..._....,%e_%b_%Y
$
$ cat timestamp_format.csv
XXXXXXXXXX,[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9],..........,%s
HHMISS,[0-9][0-9][0-9][0-9][0-9][0-9],......,%H%M%S
$
Above code is giving me this output, which is wrong -
SOURCE|LOGIN|SERVER|ABC_123_YYYYMMDD_V01P1234YYMMDDHHMISS.txt|ABC_123_V01P.YYYYMMDD.HHMISS.txt|CATEGORY|9|8|YYYYMMDD|........|32|6|HHMISS|......|0|8|ABC_123_|........|32|6|HHMISS|......|14|8|YYYYMMDD|%Y%m%d|23|6|HHMISS|......|%H%M%S|0|8|ABC_123_|%Y%m%d|23|6|HHMISS|......|%H%M%S
SOURCE|LOGIN|SERVER|ABC_123_YYYYMMDD_V01P1234YYMMDDHHMISS.txt|ABC_123_V01P.YYYYMMDD.HHMISS.txt|CATEGORY|9|8|YYYYMMDD|........|32|6|HHMISS|......|0|8|ABC_123_|........|32|6|HHMISS|......|16|6|YYMMDD|%y%m%d|23|6|HHMISS|......|%H%M%S|0|6|ABC_12|%y%m%d|23|6|HHMISS|......|%H%M%S
SOURCE|LOGIN|SERVER|ABC_123_YYYYMMDD_V01P1234YYMMDDHHMISS.txt|ABC_123_V01P.YYYYMMDD.HHMISS.txt|CATEGORY|11|6|YYMMDD|......|32|6|HHMISS|......|0|6|ABC_12|......|32|6|HHMISS|......|14|8|YYYYMMDD|%Y%m%d|23|6|HHMISS|......|%H%M%S|0|8|ABC_123_|%Y%m%d|23|6|HHMISS|......|%H%M%S
SOURCE|LOGIN|SERVER|ABC_123_YYYYMMDD_V01P1234YYMMDDHHMISS.txt|ABC_123_V01P.YYYYMMDD.HHMISS.txt|CATEGORY|11|6|YYMMDD|......|32|6|HHMISS|......|0|6|ABC_12|......|32|6|HHMISS|......|16|6|YYMMDD|%y%m%d|23|6|HHMISS|......|%H%M%S|0|6|ABC_12|%y%m%d|23|6|HHMISS|......|%H%M%S
Expected Output is as follows -
SOURCE|LOGIN|SERVER|ABC_123_YYYYMMDD_V01P1234YYMMDDHHMISS.txt|ABC_123_V01P.YYYYMMDD.HHMISS.txt|CATEGORY|9|8|YYYYMMDD|........|32|6|HHMISS|......|26|6|YYMMDD|......|0|0|||14|8|YYYYMMDD|%Y%m%d|23|6|HHMISS|......|%H%M%S|0|0||||0|0||||
Can someone advise on this code or suggest better approach?