pattern searching

Hi,

Can you please help me out here? I am trying to develop a search pattern to extract certain words from the two strings below.

I want to extract ericsson_msc_live from the 2 strings and then the date, which is a part of the filename just before the .jar extension.

ericsson_msc_live_bln_western1_20120809.jar

ericsson_msc_live_bln_western_20120711.jar

I am able to extract ericsson_msc from the strings using awk with '_live' as the delimiter, but that doesn't give me the full name, so I have to append '_live' in the print statement. Using sed I am able to extract only the numbers from the strings, but that doesn't give me the correct output either, because western1 is also part of the file name.

Looking forward to some advice.

Thanks

Please provide details about the expected output and its format when the file names are given as input.
ls -1 *.jar|awk -F "_live" '{print $1"_live"}'

gives me output as ericsson_msc_live

and the below gives me the date (or only numbers in the string)

ls -1 *.jar|awk -F "_live" '{print $2}'|sed "s/[^0-9]//g"|awk '{print substr($0,1,8)}'

The problem is that the above command also gives me 12012072, which is not the date, because the string 'western1_20120809.jar' contains a digit before the date as well.
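One way around the stray digit in western1, as a rough sketch: instead of stripping every non-digit, split the name on '_' and '.' and keep only the piece that is exactly eight digits long. The function name date_field is made up for illustration, and the sketch assumes the date is the only all-digit, 8-character piece of the name.

```shell
# Hypothetical helper: print the first field that consists of exactly 8 digits.
date_field() {
    printf '%s\n' "$1" |
    awk -F'[_.]' '{
        for (i = 1; i <= NF; i++)
            if (length($i) == 8 && $i ~ /^[0-9]+$/) { print $i; exit }
    }'
}

date_field ericsson_msc_live_bln_western1_20120809.jar
date_field ericsson_msc_live_bln_western_20120711.jar
```

Note that western1 is also 8 characters long, which is why the check requires digits only and not just the length.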

hope that helps

If all the files in the directory are in the same format, you can try:

 
for filename in *.jar ; do
basename "$filename" | sed -e 's/_bln_western1//g'
done

Dear Vidya,

That's the problem: not all files are in the same format. I mentioned it in the previous post:

ericsson_msc_live_bln_western1_20120809.jar

ericsson_msc_live_bln_western_20120711.jar

Then you could try something like:

ls -1 *.jar|awk -F"[_.]" '{print $1"_"$2"_"$3"_"$(NF-1)}'

Hope this is what you are expecting.
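To see why $(NF-1) lands on the date, it may help to print how awk numbers the pieces when both '_' and '.' are field separators. This is only an illustrative sketch; show_fields is a made-up name.

```shell
# Hypothetical helper: list every field awk sees when splitting on "_" or ".".
show_fields() {
    printf '%s\n' "$1" |
    awk -F'[_.]' '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

show_fields ericsson_msc_live_bln_western_20120711.jar
```

The last field is always "jar", so $(NF-1) is whatever sits directly in front of the extension. That is the date only as long as nothing else follows it in the name.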

for file in ericsson_msc_live_bln_western1_20120809.jar ericsson_msc_live_bln_western_20120711.jar
do
echo "$file" | sed -n 's/\(ericsson_msc_live\).*\([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*/\1\2/p'
done
ericsson_msc_live20120809
ericsson_msc_live20120711

No, that doesn't give me the output I need. Basically, what I am trying to do is this: using the first command, I will check whether the directory named by outputstring1 exists, and if it is present I will create a directory with the value of outputstring2.

I guess what you asked for and what you want to achieve are two different things :confused: The command I gave will get you the file names the way you wanted, but how you use them is entirely up to you.

If I get it right, you could do something like this

Referencing solution from #6,

for file in *.jar
do
 outstring1=$(echo "$file" | awk -F"[_.]" '{print $1,$2,$3}' OFS=_)
 outstring2=$(echo "$file" | awk -F"[_.]" '{print $(NF-1)}') 
done
 

Efficiency depends heavily on what you are actually trying to achieve. It's possible that everything could be done in a single command, without so many external calls and pipes.
Please post your requirements in detail, with sample input and expected output.
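As a rough illustration of avoiding external calls, both pieces can be cut out with POSIX parameter expansion alone. The names name_prefix and name_date are made up, and the handling of a trailing single-digit counter such as _1 is an assumption about how the names might vary.

```shell
# Hypothetical helpers using only shell parameter expansion (no sed/awk).
name_prefix() {
    # Remove everything from the first "_live" onward, then put "_live" back.
    # Assumes the name really contains "_live".
    printf '%s_live\n' "${1%%_live*}"
}

name_date() {
    base=${1%.jar}                      # strip the extension
    case $base in
        *_[0-9]) base=${base%_*} ;;     # drop a trailing "_1"-style counter
    esac
    printf '%s\n' "${base##*_}"         # keep what follows the last "_"
}

name_prefix ericsson_msc_live_bln_western1_20120809.jar
name_date   ericsson_msc_live_bln_western1_20120809.jar
```

Inside a loop over many files this saves two processes per variable, though for a few dozen files the difference is unlikely to matter.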

Basically, I have tens of files like the ones below:
$ ls -1 *.jar
antispam_live_dec_20120727.jar
credit_transfer_live_bln_20120711.jar
credit_transfer_live_bln_20120711_1.jar
credit_transfer_live_bln_20120712.jar
credit_transfer_live_bln_20120712_1.jar
credit_transfer_live_bln_20120713.jar
credit_transfer_live_bln_20120713_1.jar
credit_transfer_live_bln_20120714.jar
credit_transfer_live_bln_20120714_1.jar
credit_transfer_live_bln_20120715.jar
credit_transfer_live_bln_20120715_1.jar
credit_transfer_live_bln_20120716.jar
credit_transfer_live_bln_20120716_1.jar
credit_transfer_live_bln_20120717.jar
credit_transfer_live_bln_20120717_1.jar
credit_transfer_live_bln_20120718.jar
credit_transfer_live_bln_20120718_1.jar
credit_transfer_live_bln_20120719.jar
credit_transfer_live_bln_20120719_1.jar
credit_transfer_live_bln_20120720.jar
credit_transfer_live_bln_20120720_1.jar
credit_transfer_live_bln_20120721.jar
credit_transfer_live_bln_20120721_1.jar
credit_transfer_live_bln_20120722.jar
credit_transfer_live_bln_20120722_1.jar
credit_transfer_live_bln_20120723.jar
credit_transfer_live_bln_20120723_1.jar
credit_transfer_live_bln_20120724.jar
credit_transfer_live_bln_20120724_1.jar
credit_transfer_live_bln_20120725.jar
credit_transfer_live_bln_20120725_1.jar
credit_transfer_live_bln_20120726.jar
ericsson_msc_live_bln_central1_20120725_1.jar
ericsson_msc_live_bln_central1_20120726.jar
ericsson_msc_live_bln_central1_20120726_1.jar
ericsson_msc_live_bln_central1_20120727.jar
ericsson_msc_live_bln_central1_20120727_1.jar
ericsson_msc_live_bln_central1_20120728.jar

......

what i want is:

1 to extract xx_xx_live from the string

2 store it in a variable A

3 check whether a directory named after that variable already exists; if not, then create it, and then go inside

4 extract the date part from the original string, that would be the following part: (credit_transfer_live_bln_20120726.jar) and store it in a variable B

5 check whether a directory named after the date in B is present under A; if not, then create it and move the original file under that date directory.

If it helps, I can provide the whole list of file names

Thanks a lot everyone

#!/bin/ksh

## Operate on .jar files in the current directory.
for filename in *.jar
do
  ## Extract the directory name and date using regular expressions.
  dirname=$( print "$filename" | sed -n "s/^\(.*_live\).*/\1/p" )
  dirdate=$( print "$filename" | sed -n "s/.*_\([0-9]\{8\}.*\).jar$/\1/p" )

  ## If directory does not exist, create it.
  if [[ ! -d "$dirname" ]]; then
    mkdir "$dirname"
  fi

  ## Directory exists so test for date directory.
  if [[ ! -d "$dirname/$dirdate" ]]; then
    mkdir "$dirname/$dirdate"
  fi

  ## date dir exists so move file.
  mv "$filename" "$dirname/$dirdate"

done

exit 0
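For comparison, here is a shorter sketch of the same idea in plain sh, where mkdir -p replaces the two existence tests. The function name file_jar and the scratch directory are made up for the demo, and it assumes the date directory should hold the 8 digits only, without any trailing counter.

```shell
# Hypothetical helper: move one .jar into <prefix>_live/<yyyymmdd>/.
file_jar() {
    jar=$1
    dir=$(printf '%s\n' "$jar" | sed -n 's/^\(.*_live\).*/\1/p')
    day=$(printf '%s\n' "$jar" | sed -n 's/.*_\([0-9]\{8\}\).*\.jar$/\1/p')

    # Skip names that do not match the expected pattern.
    if [ -z "$dir" ] || [ -z "$day" ]; then
        printf 'skipping %s\n' "$jar" >&2
        return 1
    fi

    # mkdir -p creates both levels and succeeds if they already exist.
    mkdir -p "$dir/$day" && mv "$jar" "$dir/$day/"
}

# Demo in a scratch directory so no real files are touched.
demo=${TMPDIR:-/tmp}/jar_demo.$$
mkdir -p "$demo" && cd "$demo" || exit 1
: > credit_transfer_live_bln_20120711_1.jar
file_jar credit_transfer_live_bln_20120711_1.jar
ls credit_transfer_live/20120711
```

The scratch directory can be removed once you are happy with the result. On the real files the function would be called from a loop such as: for f in *.jar; do file_jar "$f"; done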

@gary_w

I tried out the individual sed expressions on the command line just to see the output, and here is what I am getting for the date part:

$ for i in `ls -1 *.jar`;do echo $i|sed -n "s/.*_\([0-9]*\).jar$/\1/p" ;done
20120727
20120711
1
20120712
1
20120713
1
20120714
1
20120715
1
20120716
1
20120717
1
20120718
1
20120719
1
20120720
1
20120721
1
20120722
1
20120723
1
20120724
1
20120725
1
20120726
1
20120727
1
20120728
1
20120729
1
20120730
1
20120731
1
20120801
1
20120802
1
20120803
20120804
1
20120805
1
20120806
20120807
1
20120808
1
20120809
20120721
20120729
20120801
20120804
20120711
1
20120712
1
20120713

:frowning:

Check out the amended version that allows for the different naming convention.

sed -n "s/.*_\([0-9]\{8\}.*\).jar$/\1/p"

Now it prints out extra characters after the date:

$ for i in `ls -1 *.jar`;do echo $i|sed -n "s/.*_\([0-9]\{8\}.*\).jar$/\1/p" ;done|more
20120727
20120711
20120711_1
20120712
20120712_1
20120713
20120713_1
20120714
20120714_1
20120715
20120715_1
20120716
20120716_1
20120717
20120717_1
20120718
20120718_1
20120719
20120719_1
20120720
20120720_1
20120721
20120721_1
20120722
20120722_1
20120723
20120723_1
20120724
20120724_1
20120725
20120725_1
20120726

sorry for troubling you :o

---------- Post updated at 10:17 PM ---------- Previous update was at 10:12 PM ----------

Can't I just extract the date part from the last characters of the string?

awk lets me take a substring counted from the start of the string. How about taking a substring counted from the end of the string, with the last character as the first character?
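For what it's worth, awk can count from the end if the start position is computed from length(). A small sketch, with last_chars as a made-up name and the .jar extension assumed to be stripped beforehand:

```shell
# Hypothetical helper: print the last N characters of a string using awk.
last_chars() {
    printf '%s\n' "$1" |
    awk -v n="$2" '{ print substr($0, length($0) - n + 1) }'
}

last_chars ericsson_msc_live_bln_western1_20120809 8
last_chars credit_transfer_live_bln_20120711_1 8
```

The second call shows the catch: when the name carries a trailing _1, the last eight characters are no longer the date, so matching on the pattern is more robust than counting positions.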

I thought you wanted that extra character. Well all you have to do is tweak the regular expression:

sed -n "s/.*_\([0-9]\{8\}\).*\.jar$/\1/p"

Thank you, thank you, thank you! Please explain the regular expression to me. I have been trying to get the hang of this for a long time, but have only managed to bang my head.

From left to right:

sed -n "s/.*_\([0-9]\{8\}\).*\.jar$/\1/p"
sed   - Stream editor
-n    - Suppress the default action, which is to print every line
"s/   - Start of the substitute command and its search pattern
.     - Match any single character
*     - repeated zero or more times (so .* matches as long a run of characters as possible)
_     - followed by an underscore
\(    - Start first group (this portion of the search pattern, if found, can be referred to later)
[0-9] - a single digit in the range of 0-9
\{8\} - repeated exactly 8 times (applies to the preceding [0-9])
\)    - End the first group
.*    - followed by any run of characters, possibly empty
\.    - followed by a period (escaped with a backslash since the period
        has special meaning in a regex)
jar   - followed by the string "jar"
$     - followed by the end of the line
/     - End of the search pattern, start of the replacement
\1    - Refers to the first group as defined inside the \(  \)
/     - End replace string
p     - Print the line if the substitution succeeded (the whole line has
        been replaced by \1, so only the date is printed)
"     - End the sed pattern
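The interplay of -n and the p flag is easiest to see side by side. A small sketch with two sample lines, one of which (notes.txt) is a made-up name that does not match the pattern:

```shell
pattern='s/.*_\([0-9]\{8\}\).*\.jar$/\1/'

# Without -n: every line is printed, whether it was changed or not.
printf '%s\n' notes.txt credit_transfer_live_bln_20120711_1.jar | sed "$pattern"

# With -n and the p flag: only lines where the substitution succeeded.
printf '%s\n' notes.txt credit_transfer_live_bln_20120711_1.jar | sed -n "${pattern}p"
```

The first command passes notes.txt through untouched, while the second stays silent about it, which is what makes the -n ... p form convenient for extraction.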

OK, so basically whatever matches inside the \( \) gets printed on the screen. Great, thanks for laying it out so clearly. Thanks a lot.

Technically it gets printed to standard output (STDOUT). That's how you are able to capture it in a variable.
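A minimal sketch of that capture, using command substitution and one of the file names from the listing. The variable names are only illustrative, and the empty-string check is an extra safeguard assumed here for names that do not match.

```shell
filename=ericsson_msc_live_bln_central1_20120726_1.jar

# $( ... ) collects whatever the pipeline writes to standard output.
dirdate=$(printf '%s\n' "$filename" | sed -n 's/.*_\([0-9]\{8\}\).*\.jar$/\1/p')

# With -n, a non-matching name produces no output, so the variable is empty.
if [ -n "$dirdate" ]; then
    printf 'date part: %s\n' "$dirdate"
else
    printf 'no date found in %s\n' "$filename" >&2
fi
```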

You're welcome!