search & replace pattern

Hi,

My problem is that I have to search for a changing pattern and replace it with the wildcard character "*".

 
i/p: 99_*_YYYYMMDD_SRC.txt.tar.gz
o/p: 99_*_*_SRC.txt.tar.gz

The problem is that the YYYYMMDD pattern is not static. It could also be YYYYMMDDHHMI or YYYYMMDDHHMISS.

Can someone please help me here?

-dips

echo '99_*_YYYYMMDD_SRC.txt.tar.gz' | nawk -F_ '$3="*"' OFS=_
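
For systems without nawk, here is a minimal sketch of the same field-replacement idea in plain awk. The function name mask_third_field is made up for illustration, and it assumes the date placeholder is always the third underscore-separated field.

```shell
# Hypothetical helper: replace the third "_"-separated field with "*".
mask_third_field() {
    # FS splits the input on "_"; assigning to $3 makes awk rebuild
    # the record, joining the fields again with OFS ("_").
    printf '%s\n' "$1" | awk 'BEGIN { FS = OFS = "_" } { $3 = "*"; print }'
}

mask_third_field '99_*_YYYYMMDD_SRC.txt.tar.gz'   # prints 99_*_*_SRC.txt.tar.gz
```

The single quotes around the file name keep the shell from treating the * as a glob.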

Hi vgersh,

Thanks for the quick solution!! It worked with this kind of pattern. (I had to replace nawk with awk; it seems nawk is not available on my Linux system.)

But it slipped my mind to mention that I first have to search for the YYYYMMDD-like pattern, meaning it is not guaranteed to be in the 3rd position. It could be anywhere in the string, for example:

 
instead of 99_*_YYYYMMDD_SRC.txt.tar.gz
it could be 99_*_SRC_YYYYMMDD.txt.tar.gz
or YYYYMMDD_99_*_SRC.txt.tar.gz
 

I mean to say it could be anywhere in the string. Sorry for the confusion.

-dips

assuming the minimum number of digits in the pattern to be 'blanked out' is 4:

nawk '{sub("[0-9][0-9][0-9][0-9][0-9]*","*")}1' myFile
OR
sed 's#[0-9]\{4,\}#*#' myFile
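
A quick way to check the digit-run approach is to feed it names that contain real digits. The sketch below wraps the sed command in a function (mask_digits is a hypothetical name) and assumes, as stated above, that any run of four or more digits is the timestamp.

```shell
# Hypothetical helper: replace the first run of 4+ digits with "*".
mask_digits() {
    printf '%s\n' "$1" | sed 's#[0-9]\{4,\}#*#'
}

# The leading "99" has only two digits, so it is left alone.
mask_digits '99_*_20111019_SRC.txt.tar.gz'        # prints 99_*_*_SRC.txt.tar.gz
mask_digits '99_*_SRC_201110192359.txt.tar.gz'    # prints 99_*_SRC_*.txt.tar.gz
mask_digits '20111019235959_99_*_SRC.txt.tar.gz'  # prints *_99_*_SRC.txt.tar.gz
```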

Hi vgersh,

I tried your code

 
echo '45_*_YYYYMMDD_SRC.txt.tar.gz' | sed 's#[0-9]\{4,\}#*#'

but it returned the same string

 
45_*_YYYYMMDD_SRC.txt.tar.gz

-dips

I assumed 'YYYYMMDD' was a mnemonic for a numeric date/time value, which is why the digit-matching commands left your literal string unchanged.
If you want to take 'YYYYMMDD' literally, that's even easier:

sed 's#YYYY[^_.]*#*#' myFile

Hi Vgersh,

Thanks for the solution :)!!

Can you please explain this sed command?

-dips

sed 's#YYYY[^_.]*#*#'

'YYYY' followed by any run of characters other than '_' or '.' (the run may be empty) is replaced with a single '*'. Only the first such match on each line is replaced, since there is no g flag.
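
To see how the bracket expression stops at the next '_' or '.', here is a small sketch that runs the command on the placeholder in three different positions. The function name mask_placeholder and the sample names are only for illustration.

```shell
# Hypothetical helper: replace the literal YYYY... placeholder with "*".
mask_placeholder() {
    # [^_.]* consumes everything up to (not including) the next "_" or ".".
    printf '%s\n' "$1" | sed 's#YYYY[^_.]*#*#'
}

mask_placeholder '99_*_YYYYMMDD_SRC.txt.tar.gz'        # prints 99_*_*_SRC.txt.tar.gz
mask_placeholder '99_*_SRC_YYYYMMDDHHMISS.txt.tar.gz'  # prints 99_*_SRC_*.txt.tar.gz
mask_placeholder 'YYYYMMDDHHMI_99_*_SRC.txt.tar.gz'    # prints *_99_*_SRC.txt.tar.gz
```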

Hi Vgersh,

I have an additional requirement: extracting the YYYYMMDD pattern. What you assumed earlier, that this is a mnemonic for date/time data, is now true. Can you help me out again?

 
For example:
i/p1: 45_*_20111019_SRC.txt.tar.gz -> o/p: 20111019
i/p2: 201110192359_45_*_SRC.txt.tar.gz -> o/p: 201110192359
i/p3: 45_*_SRC_2011101923.txt.tar.gz -> o/p: 2011101923

I attempted the code below:

echo '45_*_20111019_SRC.txt.tar.gz' | sed 's/.....\(.\{8\}\)\(.*\)/\1/'

but I know this is in vain because it is tied too closely to the position of the pattern in the string and to the pattern's length!!
-dips

You have to assume a minimum number of digits to decide whether a run of digits is a date or not. As your dates may differ in length, I assumed a minimum length of 8: \{8,\}. Notice the trailing comma in the spec. From man ed:

       *    An RE followed by:
              \{m\}
                   Matches exactly m occurrences of the character matched by
                   the RE.
              \{m,\}
                   Matches at least m occurrences of the character matched by
                   the RE.
              \{m,n\}
                   Matches any number of occurrences of the character matched
                   by the RE from m to n inclusive.
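
The difference between the three interval forms is easy to see on a ten-digit sample. This is just an illustrative sketch; the function names and the replacement letter X are arbitrary.

```shell
sample='2011101923'   # ten digits

# \{8\}: exactly eight digits are consumed, the last two are left over.
exactly_eight()  { printf '%s\n' "$1" | sed 's/[0-9]\{8\}/X/'; }
# \{8,\}: eight or more, so the whole ten-digit run is consumed.
at_least_eight() { printf '%s\n' "$1" | sed 's/[0-9]\{8,\}/X/'; }
# \{4,6\}: four to six; the match is greedy and takes six digits.
four_to_six()    { printf '%s\n' "$1" | sed 's/[0-9]\{4,6\}/X/'; }

exactly_eight  "$sample"   # prints X23
at_least_eight "$sample"   # prints X
four_to_six    "$sample"   # prints X1923
```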

considering file dip.txt:

45_*_20111019_SRC.txt.tar.gz
201110192359_45_*_SRC.txt.tar.gz
45_*_SRC_2011101923.txt.tar.gz

the following:

sed 's#^.*_\([0-9]\{8,\}\)[_.]*.*#\1#;s#_.*##' dip.txt

produces:

20111019
201110192359
2011101923
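
If the timestamp might not be preceded by an underscore, one alternative is to pull out the first run of eight or more digits directly. This is only a sketch using awk's match() function, not the sed approach above; extract_stamp is a hypothetical name, and the eight-digit minimum is the same assumption as before.

```shell
# Hypothetical helper: print the first run of 8 or more digits, if any.
extract_stamp() {
    # Eight explicit digit classes plus [0-9]* avoids relying on
    # interval expressions, which some older awk versions lack.
    printf '%s\n' "$1" | awk '
        match($0, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*/) {
            print substr($0, RSTART, RLENGTH)
        }'
}

extract_stamp '45_*_20111019_SRC.txt.tar.gz'       # prints 20111019
extract_stamp '201110192359_45_*_SRC.txt.tar.gz'   # prints 201110192359
extract_stamp '45_*_SRC_2011101923.txt.tar.gz'     # prints 2011101923
```

Lines without such a run print nothing, so names that carry no timestamp are silently skipped.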


Hi Vgersh,

Simply superb!! Thanks again.

-dips