Cutting out text from a specific portion of a filename

GermanJulian · May 12, 2010, 6:50am

Hi,

How do I cut out the first numeric characters after the word "access" in a filename like this one?

access1005101228.merged-00.15.17.86.d8.b8.log.gz

Scott · May 12, 2010, 6:52am

$ echo access1005101228.merged-00.15.17.86.d8.b8.log.gz | cut -c7-12
100510

GermanJulian · May 12, 2010, 7:01am

Oh, sorry.

I did not make it clear that I am running a find command, and there might be different directories before the actual filename, so a plain cut will not work.

Franklin52 · May 12, 2010, 7:06am

This command gives the 6 characters after the word access:

sed -e 's/.*access\(......\).*/\1/'
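
As a quick check of why this copes with directories: the leading `.*` is greedy, so it swallows whatever comes before "access". A minimal sketch, assuming a made-up path (the directory names below are hypothetical, not from the original post):

```shell
# Hypothetical path; only the file name comes from the question.
path='/var/log/app/access1005101228.merged-00.15.17.86.d8.b8.log.gz'

# The .* in front of "access" absorbs the leading directories.
date_part=$(printf '%s\n' "$path" | sed -e 's/.*access\(......\).*/\1/')
echo "$date_part"    # prints 100510
```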

vgersh99 · May 12, 2010, 7:19am

echo 'access1005101228.merged-00.15.17.86.d8.b8.log.gz' | sed 's/.*access\([^.][^.]*\).*/\1/'

ygemici · May 12, 2010, 7:25am

A Bash solution is simpler:

a="access1005101228.merged-00.15.17.86.d8.b8.log.gz"

echo ${a:6:6}
100510

Franklin52 · May 12, 2010, 7:33am

As mentioned by the OP:

there might be different directories before the actual filename

vgersh99 · May 12, 2010, 7:36am

... and the length of the stream of numbers following 'access' might vary as well (most likely).
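
If the length does vary, one option is to stop at the first non-digit instead of counting characters. A minimal sketch using only POSIX parameter expansion; `digits_after_access` is a made-up helper name, the directory is hypothetical, and it assumes the file name itself contains "access":

```shell
# Hypothetical helper: print the run of digits right after "access".
digits_after_access() {
    name=${1##*/}          # drop any leading directories
    rest=${name#*access}   # drop everything up to and including "access"
    printf '%s\n' "${rest%%[!0-9]*}"   # cut at the first non-digit
}

digits_after_access /some/dir/access1005101228.merged-00.15.17.86.d8.b8.log.gz
# prints 1005101228
```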

frans · May 12, 2010, 7:52am

The bash way:

A="access1005101228.merged-00.15.17.86.d8.b8.log.gz"
D="access"

A=${A#"$D"}    # strip the leading "access"
A=${A:0:6}     # keep the first 6 characters
echo "$A"

clx · May 12, 2010, 8:00am

try:

find .....  | awk -F "/" '{print substr($NF,7,6)}'
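
To see what the awk part does without running find, the same filter can be fed a couple of paths by hand. This is only a sketch: both paths are hypothetical stand-ins for find output, at different directory depths.

```shell
# Two made-up paths standing in for find output.
out=$(printf '%s\n' \
    './access1005101228.merged-00.15.17.86.d8.b8.log.gz' \
    './a/b/access1006011230.merged.log.gz' |
    awk -F "/" '{print substr($NF,7,6)}')   # $NF is the last "/"-separated field, i.e. the file name

echo "$out"
# prints:
# 100510
# 100601
```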

ygemici · May 12, 2010, 9:14am

Yes, you are right, I had forgotten this.

All right, then:

a=/root/test/sdasda/sdasdasd/access1005101228.merged-00.15.17.86.d8.b8.log.gz

a=${a#*access*} ; echo ${a:0:6}
100510
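
One caveat, as far as I can tell: `#` removes the shortest matching prefix, so if a directory name also contains "access" the expansion stops there and the first six characters come from the wrong place. A sketch that strips the directories first; the path is hypothetical, and `printf '%.6s'` is used instead of `${a:0:6}` only so that it also runs under plain sh:

```shell
# Hypothetical path whose directory name also contains "access".
a=/var/access_logs/access1005101228.merged-00.15.17.86.d8.b8.log.gz

base=${a##*/}         # longest match of "*/": keeps only the file name
rest=${base#access}   # strip the literal prefix "access"
first6=$(printf '%.6s' "$rest")   # first 6 characters
echo "$first6"        # prints 100510
```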

GermanJulian · May 12, 2010, 9:48am

Hi,

Thanks for all your help. I like the sed solution a lot, and my ueber script is nearly finished. However, I now have one more file with a different structure:

/dir1/dir2/maybeotherDIRS/http_log.SOMENAME.2010.05.01.00.45.00.230.gz

Again, I need to extract the date portion.

Franklin52 · May 12, 2010, 10:13am

One way:

sed 's/.*\([0-9]\{4\}\).\([0-9]\{2\}\).\([0-9]\{2\}\).*/\1\2\3/'
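
Applied to the path from the question, this should give year, month and day joined together. Note that the unescaped `.` between the digit groups matches any character; the second command below is a slightly stricter variant with literal dots, offered as a sketch rather than a replacement:

```shell
path='/dir1/dir2/maybeotherDIRS/http_log.SOMENAME.2010.05.01.00.45.00.230.gz'

# Command as posted: "." matches any single character.
ymd=$(printf '%s\n' "$path" |
    sed 's/.*\([0-9]\{4\}\).\([0-9]\{2\}\).\([0-9]\{2\}\).*/\1\2\3/')

# Stricter variant: [.] only matches a literal dot.
ymd_strict=$(printf '%s\n' "$path" |
    sed 's/.*\([0-9]\{4\}\)[.]\([0-9]\{2\}\)[.]\([0-9]\{2\}\)[.].*/\1\2\3/')

echo "$ymd $ymd_strict"    # prints 20100501 20100501
```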

vgersh99 · May 12, 2010, 11:00am

echo '/dir1/dir2/maybeotherDIRS/http_log.SOMENAME.2010.05.01.00.45.00.230.gz' | sed  -e 's#.*/##' -e 's#[^0-9][^0-9]*[.]\([0-9][0-9.][0-9.]*\)[.].*#\1#'

durden_tyler · May 12, 2010, 12:24pm

Or, with Perl -

$ echo '/dir1/dir2/maybeotherDIRS/http_log.SOMENAME.2010.05.01.00.45.00.230.gz' | perl -pne 's/^.*?\.([\d\.]+)\..*$/$1/'
2010.05.01.00.45.00.230

tyler_durden

GermanJulian · May 12, 2010, 12:24pm

thanks

ls -R1 /EMEA/*/*/http_log*.gz |sed  -e 's#.*/##' -e 's#[^0-9][^0-9]*[.]\([0-9][0-9.][0-9.]*\)[.].*#\1#' | cut -d. -f3,4,5 | tr -d "."

clx · May 12, 2010, 3:23pm

I don't think it's a good idea to invoke three external commands and pipes just to get a substring (at least for your requirement).

If your file name patterns are all the same:

ls -R1 /EMEA/*/*/http_log*.gz | awk -F"." '{print $5$6$7}'
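
When picking field numbers it can help to list them first. A small sketch on the bare file name from the question (no directories, which is an assumption: a dot in any directory name would shift the numbering when the full path is split):

```shell
name='http_log.SOMENAME.2010.05.01.00.45.00.230.gz'

# List every "."-separated field with its index:
# 1=http_log 2=SOMENAME 3=2010 4=05 5=01 6=00 7=45 8=00 9=230 10=gz
printf '%s\n' "$name" | awk -F. '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'

dhm=$(printf '%s\n' "$name" | awk -F. '{print $5$6$7}')   # day, hour, minute
ymd=$(printf '%s\n' "$name" | awk -F. '{print $3$4$5}')   # year, month, day
echo "$dhm $ymd"    # prints 010045 20100501
```

So on this name `$5$6$7` yields day, hour and minute, while `$3$4$5` would yield the calendar date; which one is wanted depends on the script.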