search & replace pattern

dips_ag · October 18, 2011, 1:25pm

Hi,

I need to search for a changing pattern and replace it with the wildcard character "*".

 
i/p: 99_*_YYYYMMDD_SRC.txt.tar.gz
o/p: 99_*_*_SRC.txt.tar.gz

The problem is that the YYYYMMDD pattern is not fixed: it could also be YYYYMMDDHHMI or YYYYMMDDHHMISS.

Can someone please help me here?

-dips

vgersh99 · October 18, 2011, 1:36pm

echo '99_*_YYYYMMDD_SRC.txt.tar.gz' | nawk -F_ '$3="*"' OFS=_

dips_ag · October 18, 2011, 1:58pm

Hi vgersh,

Thanks for your quick solution! It worked with this kind of pattern (I had to replace nawk with awk, since nawk does not seem to be available on my Linux system).

But I forgot to mention that I first have to search for a YYYYMMDD-like pattern, meaning it is not guaranteed to be in the 3rd position. It could be anywhere in the string, for example:

 
instead of 99_*_YYYYMMDD_SRC.txt.tar.gz
it could be 99_*_SRC_YYYYMMDD.txt.tar.gz
or YYYYMMDD_99_*_SRC.txt.tar.gz

In other words, it could be anywhere in the string. Sorry for the confusion.

-dips

vgersh99 · October 18, 2011, 2:16pm

Assuming the pattern to be 'blanked out' is a run of at least 4 digits:

nawk '{sub("[0-9][0-9][0-9][0-9][0-9]*","*")}1' myFile
OR
sed 's#[0-9]\{4,\}#*#' myFile
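
A quick way to check the sed variant, as a minimal sketch that assumes the date really is a run of digits. The helper name `mask_digits` and the sample file names are made up for illustration; they are not from the original post.

```shell
# Hypothetical helper: replace the first run of 4 or more digits on each
# line with '*'. Shorter runs such as the leading "99" are left alone.
mask_digits() {
    sed 's#[0-9]\{4,\}#*#'
}

# Illustrative sample names; they are quoted so the shell does not
# try to expand the '*' as a glob.
printf '%s\n' \
    '99_*_20111018_SRC.txt.tar.gz' \
    '99_*_SRC_201110181325.txt.tar.gz' \
    '20111018132500_99_*_SRC.txt.tar.gz' | mask_digits
```

Only the first qualifying run on each line is replaced, because the s command has no g flag.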

dips_ag · October 18, 2011, 2:24pm

Hi vgersh,

I tried your code

 
echo '45_*_YYYYMMDD_SRC.txt.tar.gz' | sed 's#[0-9]\{4,\}#*#'

but it returned the same string

 
45_*_YYYYMMDD_SRC.txt.tar.gz

-dips

vgersh99 · October 18, 2011, 2:29pm

I assumed 'YYYYMMDD' was a placeholder for an actual numeric date/time value.
If you want to match 'YYYYMMDD' literally, that's even easier:

sed 's#YYYY[^_.]*#*#' myFile

dips_ag · October 18, 2011, 3:34pm

Hi Vgersh,

Thanks for the solution :)!!

Can you please explain this sed command ?

-dips

vgersh99 · October 18, 2011, 3:39pm

sed 's#YYYY[^_.]*#*#'

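The pattern matches the literal 'YYYY' followed by any run of characters other than '_' or '.', and the first such match on each line is replaced with '*'. The '#' is just the delimiter of the s command, used in place of the usual '/'.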
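
To see why the bracket expression handles the longer variants as well, here is a small sketch. The helper name `mask_token` is hypothetical, and the sample names are the ones from earlier in the thread with the longer masks swapped in.

```shell
# Hypothetical helper: replace a literal 'YYYY...' token with '*'.
# [^_.]* stops at the next '_' or '.', so YYYYMMDDHHMI and
# YYYYMMDDHHMISS are consumed in full, wherever they appear.
mask_token() {
    sed 's#YYYY[^_.]*#*#'
}

printf '%s\n' \
    '99_*_YYYYMMDD_SRC.txt.tar.gz' \
    '99_*_SRC_YYYYMMDDHHMI.txt.tar.gz' \
    'YYYYMMDDHHMISS_99_*_SRC.txt.tar.gz' | mask_token
```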

dips_ag · October 19, 2011, 2:18pm

Hi Vgersh,

I have an additional requirement: extracting the YYYYMMDD pattern. This time, what you assumed earlier about it being a mnemonic for numeric date/time data is true. Can you help me out again?

 
For example:
i/p1: 45_*_20111019_SRC.txt.tar.gz -> o/p: 20111019
i/p2: 201110192359_45_*_SRC.txt.tar.gz -> o/p: 201110192359
i/p3: 45_*_SRC_2011101923.txt.tar.gz -> o/p: 2011101923

I tried the code below:

echo '45_*_20111019_SRC.txt.tar.gz' | sed 's/.....\(.\{8\}\)\(.*\)/\1/'

but I know this is no good, because it is tied too closely to the length of the string and of the pattern.
-dips

vgersh99 · October 19, 2011, 3:22pm

You have to assume a minimum number of digits to decide whether a run is a date or not. As your dates differ in length, I assumed a minimum length of 8: \{8,\}. Note the trailing comma in the interval. From man ed:

       *    An RE followed by:
              \{m\}
                   Matches exactly m occurrences of the character matched by
                   the RE.
              \{m,\}
                   Matches at least m occurrences of the character matched by
                   the RE.
              \{m,n\}
                   Matches any number of occurrences of the character matched
                   by the RE from m to n inclusive.
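
The three interval forms can be compared side by side with a small sketch. The `samples` function and its three test strings are made up for illustration; `grep -c` simply counts the matching lines.

```shell
# Hypothetical sample data: runs of 4, 8 and 12 digits.
samples() {
    printf '%s\n' '2011' '20111019' '201110192359'
}

samples | grep -c '^[0-9]\{8\}$'     # exactly 8 digits        -> 1 line
samples | grep -c '^[0-9]\{8,\}$'    # at least 8 digits       -> 2 lines
samples | grep -c '^[0-9]\{4,8\}$'   # between 4 and 8 digits  -> 2 lines
```

The anchors ^ and $ are only there so that each count reflects the length of the whole line; without them, \{8\} would also match inside a longer run.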

considering file dip.txt:

45_*_20111019_SRC.txt.tar.gz
201110192359_45_*_SRC.txt.tar.gz
45_*_SRC_2011101923.txt.tar.gz

the following:

sed 's#^.*_\([0-9]\{8,\}\)[_.]*.*#\1#;s#_.*##' dip.txt

produces:

20111019
201110192359
2011101923
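
Note that the sed command above relies on the date being delimited by '_'. As an alternative sketch, not part of the original answer, awk's match() can pull out the first run of 8 or more digits regardless of what surrounds it. The helper name `extract_stamp` is hypothetical, and the digit class is spelled out eight times because not every awk supports interval expressions.

```shell
# Hypothetical helper: print the first run of 8 or more digits per line.
# match() sets RSTART and RLENGTH to the leftmost-longest match, so a
# 12-digit stamp is returned whole rather than cut to 8 digits.
extract_stamp() {
    awk 'match($0, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*/) {
        print substr($0, RSTART, RLENGTH)
    }'
}

printf '%s\n' \
    '45_*_20111019_SRC.txt.tar.gz' \
    '201110192359_45_*_SRC.txt.tar.gz' \
    '45_*_SRC_2011101923.txt.tar.gz' | extract_stamp
```

Lines without such a run print nothing, which may or may not be what you want when processing a mixed list of names.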

dips_ag · October 19, 2011, 3:54pm

Hi Vgersh,

Simply Superb !! Thanks again.

-dips