Extract file_id from file names and save it to a CSV file after looping through each folder
My files are located on a UNIX server. I want to extract the file_id and file_name from each file and save them in a CSV file. How do I do that?
I have folders in a UNIX environment, and the directory structure is as follows:
year folder -> 12 month folders inside -> 30/31 day folders inside
I ran the ls command in the top-level folder; it lists the year folders as follows:
2009 2010 2011 2012
I ran the cd command for the year 2012:
$ cd 2012
I ran the ls command in the 2012 folder:
$ ls
01 02 03 04 05 06 07 08 09
Then I ran the same commands for September:
$ cd 09
$ ls
01 02 03 04 05 06 07 08 09 10 11 12 13
$ cd 13
$ ls
sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz
sasmm_fsbc_durds_id00020513_t20120913003312.dat.trnsfr.gz
There is a folder for each year: 2009, 2010, 2011, and 2012.
Each year folder has 12 folders, one per month: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12.
Each month folder has up to 31 folders, one per day: 01, 02, 03, ... 29, 30, 31.
Each day folder contains the files.
The file names look like this:
sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz
sasmm_fsbc_durds_id00020513_t20120913003312.dat.trnsfr.gz
I want one CSV file with two columns: the first for file_id and the second for file_name.
To obtain the file_id value, loop through each folder and get each file name, then take the
substring between "sasmm_fsbc_durds_id000" and "_t" and store it in the file_id column, and store
the file name in the file_name column.
In the above example, for the file sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz:
read the file name sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz,
cut out 20532 and save it in the file_id column, and save the whole file name in the second column = sasmm_fsbc_durds_id00020532_t20100313192606.dat
The CSV file will look like this:
file_id file_name
20532 sasmm_fsbc_durds_id00020532_t20100313192606.dat
20513 sasmm_fsbc_durds_id00020513_t20120913003312.dat
The file_id is to be cut from the file name. If you look at the file names closely, you can see
that the file_id comes right after "id000"; in the examples above, the file_ids are 20532 and 20513.
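To make the extraction rule concrete, here is a minimal sketch using plain shell parameter expansion. The function name extract_file_id is only illustrative, and it assumes every file name starts with "sasmm_fsbc_durds_id000" and has "_t" right after the id:

```shell
#!/bin/sh
# extract_file_id is a hypothetical helper name, used only for illustration.
# Assumption: the name starts with "sasmm_fsbc_durds_id000" and the id is followed by "_t".
extract_file_id() {
    name=$1
    rest=${name#sasmm_fsbc_durds_id000}   # drop the fixed prefix
    printf '%s\n' "${rest%%_t*}"          # drop everything from the first "_t" onward
}

extract_file_id sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz
# prints 20532
```

If the number of zeros before the id is not always three, this sketch would keep the extra zeros, so the prefix would need adjusting.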
How do I loop through the year 2012, its 12 month folders, and the 31 day folders inside them, and
create a CSV file containing the data shown above?
I am very new to UNIX, so please help me out. If you could provide some code, that would be great.
Thanks.
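For reference, one possible shape of the whole job is sketched below. It is only a sketch under a few assumptions: build_csv and file_ids.csv are made-up names, the columns are separated by a comma, and the trailing ".trnsfr.gz" is stripped so that the second column ends in ".dat" as in the sample output. It relies on find, which descends through the month and day folders on its own.

```shell
#!/bin/sh
# build_csv is a hypothetical helper: build_csv <year_folder> <output_csv>
# Assumptions: comma-separated columns, and ".trnsfr.gz" is removed from the
# file name so the second column matches the sample output (ends in ".dat").
build_csv() {
    root=$1
    out=$2
    printf 'file_id,file_name\n' > "$out"            # header row
    find "$root" -type f -name 'sasmm_fsbc_durds_id000*_t*' | sort |
    while IFS= read -r path; do
        name=${path##*/}                             # strip the directory part
        id=${name#sasmm_fsbc_durds_id000}            # drop the fixed prefix
        id=${id%%_t*}                                # keep only the text before "_t"
        printf '%s,%s\n' "$id" "${name%.trnsfr.gz}"
    done >> "$out"
}

# Example call (run from the folder that contains 2012):
#   build_csv 2012 file_ids.csv
```

The rows come out sorted by path, which is a side effect of the sort step and not something the task requires.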
Do we need to search recursively to find the files in each folder, or go down to each day folder?
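On this last point, a recursive search may not be strictly necessary. If the layout is always exactly year/month/day/file, a glob with one * per level reaches the day folders directly. A minimal sketch, where list_day_files is a made-up name and the fixed depth is an assumption:

```shell
#!/bin/sh
# list_day_files is a hypothetical helper: list_day_files <year_folder>
# Assumption: files always sit exactly at year/month/day/file.
list_day_files() {
    year_dir=$1
    for path in "$year_dir"/*/*/sasmm_fsbc_durds_id000*; do
        [ -f "$path" ] || continue      # skip the unexpanded pattern when nothing matches
        printf '%s\n' "${path##*/}"     # print only the file name
    done
}

# Example call (run from the folder that contains 2012):
#   list_day_files 2012
```

A recursive search with find would be the safer choice if files can also appear at other depths.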