Differentiate two file name formats

yadavricky · April 9, 2015, 12:16pm

Hello All,

I have two types of files.

1. MYTEST001_RKP_DORALDO_20150402120000.zip
2. CMP001_STD001_MOGANO_RPSL_20150409_C.zip

I can receive both types of file at one location.

If I receive the second type of file

CMP001_STD001_MOGANO_RPSL_20150409_C.zip

I have to process it without connecting to the database, because I already have all the required input.

But if I receive the first kind of file

MYTEST001_RKP_DORALDO_20150402120000.zip

I need to connect to the database to fetch some values depending on MYTEST001.

Is there any way to differentiate these two types of file using the naming convention given above?

Thanks for your help.

Corona688 · April 9, 2015, 1:25pm

case "$FILENAME" in
M*)
        echo "MYTEST file"
        ;;
CMP*)
        echo "CMP file"
        ;;
esac

yadavricky · April 9, 2015, 3:43pm

Thanks for your reply, but the file names are randomly generated. Except for their naming convention, nothing is fixed. For example, the first one follows:
<std>_<lb>_<vend>_datewithdetails_c.zip

Corona688 · April 9, 2015, 5:16pm

And what format does the other kind follow?

yadavricky · April 10, 2015, 4:02am

Thanks for your reply.

The first file format would be as below:

<compound>_<study>_<agent>_<ana>_YYYYMMDD_(C or 1).zip

The second file format would be as below:

<study>_<agent>_<ana>_YYYYMMDD(randomconstant)_(C or 1).zip

One way I found to differentiate them is to count the _ in the filename. Is there any other option?

protocomm · April 10, 2015, 4:38am

try this:

if [ "$(ls file | awk -F_ '$5~/[0-9]{8}/')" != "" ];then echo GoodFile;fi
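Some awk implementations do not support interval expressions such as {8}, so here is a minimal sketch of the same fifth-field check with the digits spelled out and without the ls. The function name has_date_field5 is hypothetical and not from this thread; it takes the file name as its argument and reports the result through its exit status.

```shell
# Hypothetical helper, not from the thread: succeeds when the fifth
# "_"-separated field of the name contains eight consecutive digits.
has_date_field5() {
    printf '%s\n' "$1" |
    awk -F_ '$5 ~ /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/ { ok = 1 }
             END { exit !ok }'   # exit 0 on a match, 1 otherwise
}

if has_date_field5 "CMP001_STD001_MOGANO_RPSL_20150409_C.zip"; then
    echo GoodFile
fi
```

This only looks at the fifth field, so a name with fewer than five fields simply fails the test.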

RudiC · April 10, 2015, 4:51am

The file names in your first post don't follow your spec in post#5. There's no C nor 1 in file name 1.
protocomm's proposal is fine for your sample names, but it relies on the count of the "_" as well, so why not use that?

MadeInGermany · April 10, 2015, 12:23pm

Let awk split on _ and print if the number of fields is less than 6:

 ls *.zip | awk -F_ 'NF<6'
MYTEST001_RKP_DORALDO_20150402120000.zip
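If you would rather branch inside the script than filter a listing, the same field-count idea can be sketched with a plain case pattern, which needs neither ls nor awk. The function name field_class and the labels short and long are made up for illustration.

```shell
# Hypothetical helper, not from the thread: prints "short" for names with
# fewer than six "_"-separated fields (the NF<6 case), otherwise "long".
field_class() {
    case $1 in
        *_*_*_*_*_*) echo long ;;    # five or more underscores: NF >= 6
        *)           echo short ;;   # fewer than five underscores: NF < 6
    esac
}

field_class "MYTEST001_RKP_DORALDO_20150402120000.zip"    # prints: short
field_class "CMP001_STD001_MOGANO_RPSL_20150409_C.zip"    # prints: long
```

Like the awk version, this assumes the longer format always carries its trailing _C or _1.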

yadavricky · April 20, 2015, 6:53am

Thanks to all for the quick replies. But I am running into problems because of differences between the command options available on Unix and Linux.

I have 2 sets of files.

First set (these should be treated separately):

EFC10668_LAB_COVANCE_20150415120000.dat
EFC10668_LAB_COVANCE_20150415120000_1.dat
EFC10668_LAB_COVANCE_20150415120000_C.dat

Second set (these should be treated differently):

AVE0005_EFC10668_COVANCE_LABL_20150409.dat
AVE0005_EFC10668_COVANCE_LABL_20150409_C.dat
AVE0005_EFC10668_COVANCE_LABL_20150409_1.dat

On Linux I am counting the _ in the filename with the command below, but on HP-UX I am not able to find any such option.

grep -o

Currently I am using the if condition below to separate them, but I think it is not going to work properly.

if [ "$(ls $n_filezip_name | awk '{FS="_"}  $5~/[0-9]{8}/')" = "" ];

Can someone help with this?

agent.kgb · April 20, 2015, 7:35am

I believe counting the "_" is quite POSIX-ish:

echo "a_b_c_d" | awk -F_ '{print NF-1}'
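One catch with the sample names above: the raw count overlaps, because EFC10668_LAB_COVANCE_20150415120000_1.dat and AVE0005_EFC10668_COVANCE_LABL_20150409.dat both contain four underscores. A minimal sketch that drops the extension and an optional trailing _C or _1 before counting could look like this. The function name classify_name and the labels study and compound are hypothetical, and it assumes no other suffixes occur.

```shell
# Hypothetical helper, not from the thread: classify a file name by the
# number of "_" left after removing the extension and an optional _C/_1.
classify_name() {
    base=${1##*/}              # drop any directory part
    base=${base%.*}            # drop the extension (.zip, .dat)
    case $base in
        *_C|*_1) base=${base%_?} ;;   # drop the optional trailing _C or _1
    esac
    # NF - 1 is the number of "_" separators in the remaining name
    n=$(printf '%s\n' "$base" | awk -F_ '{print NF-1}')
    case $n in
        3) echo study ;;       # <study>_<agent>_<ana>_<timestamp>
        4) echo compound ;;    # <compound>_<study>_<agent>_<ana>_<date>
        *) echo unknown ;;
    esac
}

classify_name "EFC10668_LAB_COVANCE_20150415120000_1.dat"    # prints: study
classify_name "AVE0005_EFC10668_COVANCE_LABL_20150409.dat"   # prints: compound
```

It uses only parameter expansion, case and awk -F_, so it should not depend on grep -o.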