extract string portion from filename using sed

santam · May 22, 2009, 9:11am

Hi All,

I posted something similar before, but I now have another problem.

I have filenames as below

TOP_TABIN240_20090323.200903231830
TOP_TABIN235_1_20090323.200903231830

I need to extract the date that sits between the last underscore and the dot (20090323 in both names). I'm using bash v3.xx.

I'm trying to use sed's print command but I'm not getting the result I want:

source_file_issue_date=`echo $fname | sed '/_[0-9][0-9]*[.]/p'`
echo "source_file_issue_date: $source_file_issue_date"

Appreciate any ideas?!

Kind Regards
Satnam

santam · May 22, 2009, 9:27am

Realised I have to use substitute, but still no joy:

sed 's/^.*[a-zA-Z]*[.][0-9][0-9]$//'

Franklin52 · May 22, 2009, 9:31am

Try:

sed 's/.*_\(.*\)\..*/\1/'
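
For anyone working out the logic behind that expression, here is a rough breakdown, run against the two sample names from the first post. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
# .*_      greedy: runs up to the last underscore that still has a dot after it
# \(.*\)   group 1: whatever sits between that underscore and the dot
# \..*     a literal dot, then the rest of the line
# \1       the whole line is replaced by group 1 alone
for fname in TOP_TABIN240_20090323.200903231830 TOP_TABIN235_1_20090323.200903231830; do
    issue_date=$(printf '%s\n' "$fname" | sed 's/.*_\(.*\)\..*/\1/')
    echo "$fname -> $issue_date"
done
```

Because the first `.*` is greedy, the extra `_1_` in the second name does not matter: the match still ends at the underscore just before the date.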

santam · May 22, 2009, 9:47am

Works like a treat!

Now I'm trying to figure out the logic behind it!

Thanks again

Satnam

ghostdog74 · May 22, 2009, 9:53am

# ls -1 TOP*|awk 'BEGIN{FS="[_.]"}{print $(NF-1)}'
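
How this one works: with FS set to the bracket expression [_.], awk splits each name on every underscore and every dot, so the date is always the second-to-last field no matter how many underscores come before it. A small sketch that feeds the sample names with printf instead of ls, so it runs without the files being present (the variable name is made up):

```shell
#!/usr/bin/env bash
# -F'[_.]' is the command-line equivalent of BEGIN{FS="[_.]"}
dates=$(printf '%s\n' \
    TOP_TABIN240_20090323.200903231830 \
    TOP_TABIN235_1_20090323.200903231830 |
    awk -F'[_.]' '{ print $(NF-1) }')
echo "$dates"
```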

fpmurphy · May 22, 2009, 11:33am

BTW, there is no need to invoke sed to parse the filename. It can all be done within bash i.e.

$ fname="TOP_TABIN240_20090323.200903231830"
$ source_file_issue_date=$(tmp=${fname/*_}; echo ${tmp/\.*})
$ echo "source_file_issue_date: $source_file_issue_date"
source_file_issue_date: 20090323
$
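
To see what each expansion does, the same two steps can be written as plain assignments with the intermediate value printed. This is only a sketch of the idea above; it also skips the `$(...)`, which bash runs in a subshell.

```shell
#!/usr/bin/env bash
fname="TOP_TABIN240_20090323.200903231830"

# ${var/pattern} with no replacement deletes the first, longest match.
tmp=${fname/*_}        # "*_" matches from the start through the last underscore
issue_date=${tmp/.*}   # ".*" matches from the first dot to the end

echo "after first step:  $tmp"
echo "after second step: $issue_date"
```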

santam · May 22, 2009, 12:07pm

yup, I have to say this is all a learning experience!

Thanks to all for feedback!

Regards
Satnam

ghostdog74 · May 22, 2009, 1:08pm

fpmurphy:

BTW, there is no need to invoke sed to parse the filename. It can all be done within bash i.e.
$ fname="TOP_TABIN240_20090323.200903231830"
$ source_file_issue_date=$(tmp=${fname/*_}; echo ${tmp/\.*})
$ echo "source_file_issue_date: $source_file_issue_date"
source_file_issue_date: 20090323
$

Well, you would still need a loop to go through many such files. If it's a while loop, such as

ls TOP* | while read F
do
 ....
done

then you might as well pipe these files to sed (or awk), as it's faster this way.
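
To make the difference concrete, here is a sketch of both shapes: the whole list piped once into a single sed, and sed called once per file inside a loop. It uses printf with a glob rather than ls, and a scratch directory with empty files named like the samples; those details are assumptions for the demo, not part of the original problem.

```shell
#!/usr/bin/env bash
# Scratch directory with empty files named like the two samples.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch TOP_TABIN240_20090323.200903231830 TOP_TABIN235_1_20090323.200903231830

# One sed process handles the whole list.
all_dates=$(printf '%s\n' TOP* | sed 's/.*_\(.*\)\..*/\1/')

# One sed process per file: same output, one extra fork for every name.
per_file=$(for f in TOP*; do
    printf '%s\n' "$f" | sed 's/.*_\(.*\)\..*/\1/'
done)

echo "$all_dates"
```

Both give the same dates; the first form starts sed once regardless of how many files match.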

jim_mcnamara · May 22, 2009, 1:44pm

Are you sure? Each iteration of a loop with a pipe (if I got what you meant) means a new child process. 20000 files = 20000 processes. Lots of overhead.

The shell-only version does everything in a single process.

What IS regrettable (IMO) is that bash keeps changing: you can only do operation xx with version x.y and higher. The same thing happens elsewhere (ksh vs. ksh93, for example), but newbies often do not have a clue what bash version they have.
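
For anyone unsure which bash they are running, the shell reports it itself. A minimal sketch; the threshold of 3 is only an example taken from the "bash v 3.xx" in the first post, not a documented requirement of any solution in this thread.

```shell
#!/usr/bin/env bash
echo "bash version: $BASH_VERSION"

# BASH_VERSINFO is an array: [0] is the major version, [1] the minor.
major=${BASH_VERSINFO[0]}
minor=${BASH_VERSINFO[1]}
required_major=3   # hypothetical minimum, chosen for illustration

if (( major >= required_major )); then
    echo "bash $major.$minor meets the example minimum of $required_major"
else
    echo "bash $major.$minor is older than the example minimum of $required_major"
fi
```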

radoulov · May 22, 2009, 2:03pm

The OP uses bash so there is no need to loop or to use external commands (assuming no IFS characters in the filenames):

% ls TOP*
TOP_TABIN235_1_20090323.200903231830  TOP_TABIN240_20090323.200903231830
% files=(TOP*) files=(${files[@]%.*})
% printf '%s\n' "${files[@]##*_}"
20090323
20090323
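
If the names might contain spaces, quoting the array expansions removes the IFS caveat. A sketch under that assumption; the second file name (with a space and a different date) is made up purely to show the effect.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# The second name, with a space in it, is invented for this demo.
touch 'TOP_TABIN240_20090323.200903231830' 'TOP_TAB IN_20090324.200903241830'

files=(TOP*)
files=("${files[@]%.*}")     # drop the shortest trailing ".something"
dates=("${files[@]##*_}")    # drop the longest leading "something_"
printf '%s\n' "${dates[@]}"
```

Without the quotes, the name containing a space would be split into two array elements.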

maxim42 · May 22, 2009, 3:39pm

Hi everybody,
I tried something that may add value to this discussion:

tmp=TOP_TABIN240_20090323.200903231830
solve1=${tmp##*_} ; echo ${solve1%%.*}
20090323

ghostdog74 · May 22, 2009, 8:27pm

Yes, you are right, I am saying that.

I see that radoulov has given a pure bash solution. Well, even so, there is not much of a difference in performance whether one does it in pure bash syntax or pipes to tools like awk.
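
Rather than argue it, this is easy to measure. A rough sketch, assuming 2000 made-up names shaped like the samples; the function names are hypothetical and the timings will vary from machine to machine.

```shell
#!/usr/bin/env bash
# Build a list of invented names shaped like the samples.
names=()
for ((i = 0; i < 2000; i++)); do
    names[i]="TOP_TABIN${i}_20090323.200903231830"
done

# Pure bash: two parameter expansions applied to the whole array.
pure_bash() {
    local stripped=("${names[@]%.*}")
    printf '%s\n' "${stripped[@]##*_}"
}

# One awk process reading the whole list from a pipe.
one_awk() {
    printf '%s\n' "${names[@]}" | awk -F'[_.]' '{ print $(NF-1) }'
}

time pure_bash > /dev/null
time one_awk > /dev/null
```

Note that this compares pure bash against a single awk process for the whole list, not against one awk per file.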