Parsing filenames and grabbing specific string patterns

Hi guys. I had just composed a long post and it got erased when I was logged out automatically.
Anyway, I hope someone can help me out here.
The task I'm working on is this:
I have a bunch of files that I care about sitting in a directory, say $HOME/files.

My job is to loop over all these files, grab part of each filename, and check whether the parsed string occurs in a parameter that is passed to the script, say $4.

The filename parsing follows these rules:

1: testfile_string1_string2_20100232to201130203.csv
parsed value should be "string1_string2"

2: testfile_string3_string4.csv
parsed value should be "string3_string4.csv"

3: testfile_string5_string6_string7_20113203to20110423.csv
parsed value should be "string5_string6_string7"

So the idea is to grab the part of the filename after the first "_", up to the "_" that precedes the numeric part such as "20103232" (examples 1 and 3),
or up to the end of the string when there is no numeric part (example 2).

I've been trying all day but haven't been able to find an answer for this.
I'd appreciate it if one of you experts could help me out.

Thanks
Ruka

Try:

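cd "$HOME/files" || exit 1
for i in *.csv
do
  i=${i#testfile_}    # strip the "testfile_" prefix
  i=${i%%_[0-9]*}     # strip everything from the first "_<digit>" onward
  i=${i%.csv}         # strip ".csv" if there was no numeric part
  case $4 in
    *"$i"*) echo "$i" ;;   # print the name if it occurs anywhere in $4
  esac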
done

$ echo testfile_string1_string2_string3_20100232to201130203.csv |sed 's/^[^_]*_\(.*\)_[^_]*/\1.csv/'

string1_string2_string3.csv
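
To see what each parameter expansion in the loop does, here is a minimal sketch that runs them over the three sample names from the question. The function name parse_name is made up for illustration and is not part of the original script.

```shell
# parse_name is a hypothetical helper used only for this illustration.
parse_name() {
  name=$1
  name=${name#testfile_}    # drop the leading "testfile_"
  name=${name%%_[0-9]*}     # drop the longest suffix that starts with "_<digit>"
  name=${name%.csv}         # drop ".csv" if it is still there
  printf '%s\n' "$name"
}

for f in \
  testfile_string1_string2_20100232to201130203.csv \
  testfile_string3_string4.csv \
  testfile_string5_string6_string7_20113203to20110423.csv
do
  parse_name "$f"
done
```

This prints string1_string2, string3_string4 and string5_string6_string7. Note that for the second name the ".csv" is removed as well, which differs from the "string3_string4.csv" given in rule 2 of the question.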

Hi Scrutinizer, rdcwayx

Thanks for the input, guys :) I really appreciate it.

Scrutinizer

I think the way you hacked through the string pattern was awesome.
My only question is about the use of case. The variable $4 is actually a superset of $i, so I'm not sure how the case $4 check works. It does work, I tried it, but isn't it like checking whether a superset is present in its subset? How does that work?
For example, $4 = "string1_string2_string3_string4_string5_string6_string7"
and the values of i, that is, the filenames after parsing, are e.g.
$i = "string1_string2"
or $i = "string3_string4", etc. So how does the case on $4 work?

Thanks for the solution. It works; I'm just trying to understand it.

rdcwayx

The solution works for the example you gave but doesn't work for the other cases. Thanks a lot though :)

You guys are great.
I hope I can help someone like this someday.
Ruk

Fixed.

$ echo testfile_string3_string4.csv |sed 's/^[^_]*_//;s/_[0-9][^_]*\./\./'
string3_string4.csv

$ echo testfile_string1_string2_string3_20100232to201130203.csv |sed 's/^[^_]*_//;s/_[0-9][^_]*\./\./'
string1_string2_string3.csv
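
As a quick check, the same two substitutions can be wrapped in a small function and run against all three sample names from the question. This is only a sketch: strip_name is a made-up name, and the replacement is written as a plain "." rather than "\.", which should behave the same way.

```shell
# strip_name is a hypothetical wrapper around the two sed substitutions.
strip_name() {
  # 1st s///: remove everything up to and including the first "_"
  # 2nd s///: remove "_<digit>...", up to the dot before the extension
  printf '%s\n' "$1" | sed 's/^[^_]*_//;s/_[0-9][^_]*\././'
}

strip_name testfile_string1_string2_20100232to201130203.csv
strip_name testfile_string3_string4.csv
strip_name testfile_string5_string6_string7_20113203to20110423.csv
```

This prints string1_string2.csv, string3_string4.csv and string5_string6_string7.csv, so the ".csv" extension is kept in every case.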

The pattern *string* means any number of characters (the first *), followed by string, followed by any number of characters (the second *). So if $4 matches *string*, then string occurs somewhere inside $4; in other words, $4 is a superset of string.
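
The substring test can be tried on its own with a small sketch like the one below. The helper name contains and the variable names are made up for illustration; the list value is the example $4 from the question.

```shell
# contains is a hypothetical helper: it succeeds when $2 occurs anywhere in $1.
contains() {
  case $1 in
    *"$2"*) return 0 ;;   # $1 is "anything, then $2, then anything"
    *)      return 1 ;;
  esac
}

list="string1_string2_string3_string4_string5_string6_string7"
for part in string1_string2 string3_string4 string9
do
  if contains "$list" "$part"; then
    echo "$part: found"
  else
    echo "$part: not found"
  fi
done
```

This reports the first two values as found and string9 as not found. Because it is a plain substring test, a value such as string2_string3 would also be reported as found, even though it straddles two of the names in the list.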

Thanks for explaining.

That made it much simpler. I'm used to other programming languages, so it felt odd to me :)

Thanks for your help.

@rdcwayx

Thanks for your help too.