Extract variables from filenames and output to file

malandisa · April 13, 2011, 9:36am

I need some help. I have a list of files (thousands of them) and would like to extract some variables from each file name and save them to a file.

The list of files looks like:

I am trying to write the following script, but I am stuck on how to get the variables 'doy' and 'yr' from each file name, combine them into two columns ('yr doy'), and write that to a file.

And the required output is

Any help and ideas will be highly appreciated.
Thank you

kevintse · April 13, 2011, 9:45am

A single line of Perl will do. Let's say you save your file names to a file named 'data.txt':

 perl -pe  's/^.(\d{4})(\d{3}).*$/$1 $2/' data.txt

malandisa · April 13, 2011, 9:58am

Hi kevintse,

Thank you! That does indeed do what I need. I don't fully understand the fine detail of how this line does the job, though.

I have one extra question: what if I want to add another column to the output, holding some value extracted from inside each file? Could that be done within a command like this?

Thank you

kevintse · April 13, 2011, 10:24am

What I used in the one-liner is called a regular expression.

perl -pe  's/^.(\d{4})(\d{3}).*$/$1 $2/' data.txt

The slashes are just separators; the syntax is s/regex/replacement/
^ matches the very start of the string (here, a line in data.txt).
. matches any single character (here, the first character of the file name).
(\d{4}): \d matches a digit from 0 to 9, and the 4 in curly braces means exactly 4 digits. The surrounding parentheses capture those 4 digits (the yr you want); this is called a group.
(\d{3}) works the same way and captures the next 3 digits (the doy).
.*$ matches everything up to the end of the line, so the whole line gets replaced.
$1 $2 in the replacement stand for the 1st and 2nd captured groups, separated by a space. The -p switch makes perl apply the substitution to every line and print the result.

For your last question: yes, Perl can do that easily. It will not be as short as the previous command, but it shouldn't be complicated.

malandisa · April 13, 2011, 2:06pm

Hi kevintse,

Thank you for that explanation and the time you have taken to help. Much appreciated. I am still figuring out how to add another column. One way I am thinking of is to put the file name directly into the command, so that the command does not have to read the file names from data.txt.

I thought something like this would work...

so that the output is

then I can add another variable as a third column

Any advice will be appreciated
Thank you

kevintse · April 14, 2011, 1:37am

You may pipe the filename to the perl script:

echo 'e20012110129_xform_azisum_mlt_2.25_0.50.txt' | perl -pe 's/^.(\d{4})(\d{3}).*$/$1 $2/'