Understanding an example of perl map() function

yifangt · September 15, 2017, 12:18pm

Hello,
I have many folders, each containing a file with the same name that holds the data I need to process later. I borrowed this Perl one-liner

perl -e 'print "gene_id\t", join("\t", map {/(.*)\//; $1} @ARGV),"\n";' *_test.trim/level.csv

to make a header in which each column is named after its folder, so that the identically named files can be told apart in later processing. The directory structure looks like this:

1_test.trim/level.csv
15_test.trim/level.csv
17_test.trim/level.csv
30_test.trim/level.csv
34_test.trim/level.csv
8_test.trim/level.csv

The output is:

gene_id    1_test    15_test    17_test    30_test    34_test    8_test

I had a hard time understanding the $1 within the map() block in the one-liner.
I think I understand what the map() and join() functions do in Perl, but this $1 tripped me up badly.
(.*)\/ is the regex that gets rid of the .trim/ part, I believe, but then comes the $1. Maybe the whole map {/(.*)\//; $1} part is doing something that I did not catch.
I would appreciate any explanation.

Corona688 · September 15, 2017, 1:36pm

map is confusing because it's actually a kind of loop. $1 is the bracketed part of /(.*)\// in this context.

The whole thing means, "For each item in @ARGV[N], do { /(.*)\//; output[N]=$1 }"

Then the whole thing is crammed into a "join" which returns them tab-separated.

I have no idea why it removes the ".trim". The .*\/ is a regex meaning "any number of any character, followed by a forward slash". It stops at the last forward slash in the string (not the first, because of greedy matching).
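
As a rough sketch of the "kind of loop" reading above, the map can be written out as an explicit foreach. The sample paths come from the first post; the nested path 'a/b/c.csv' is a made-up example added only to show the greedy match reaching the last slash.

```perl
use strict;
use warnings;

# Sample paths copied from the directory listing in the first post.
my @paths = ('1_test.trim/level.csv', '15_test.trim/level.csv');

# The map written out as an explicit loop: collect $1 for each element.
my @output;
foreach my $path (@paths) {
    $path =~ /(.*)\//;      # capture everything before the last slash
    push @output, $1;
}
print join("\t", @output), "\n";

# Greedy matching: with several slashes, (.*) runs up to the LAST one.
my $nested = 'a/b/c.csv';   # hypothetical path, not from the thread
$nested =~ /(.*)\//;
my $greedy = $1;
print "$greedy\n";          # prints "a/b", not "a"
```

Note that the captured text still contains ".trim", because the regex only cuts at the slash.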

yifangt · September 15, 2017, 2:28pm

Thanks!
Let's skip the @ARGV and the foreach loop which seems not that confusing to me.
I am aware that Perl normally omits the default $_, so I read the map{} as a block of statements, because map{/(.*)\//, $_; $1} uses curly brackets {}, not round ones ().
Take the first folder "1_test.trim/level.csv" as an example. I tried to understand it in two steps.
I am not quite clear about your output[N] = $1.
The first step is map(/(.*)\//, $_). I thought
$0 = 1_test.trim/level.csv # similar to $0 in awk; not accurate, but it gives some idea of the parsing. So I changed it to give the correct part
$1 = 1_test.trim/ # not "1_test" as in the original post; after struggling with this part I see that level.csv is skipped/omitted!
Is this correct?

Corona688 · September 15, 2017, 2:58pm

Yes. Perl allows you to call things without the () if you really want to, and they did so here. How it would look with them is:

map( {code block}, @ARGV)

So @ARGV is the input array, and {code block} is what it does to every element of the array in turn. It's given @ARGV[N] as its argument, and returns whatever you want to transform it into.

I'm not sure that's valid syntax for perl in general - shoving entire code blocks wherever - especially since that code block isn't executed immediately, but repeatedly called by map(). This seems like special behavior.
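
For what it is worth, perldoc -f map lists two call forms, map BLOCK LIST and map EXPR,LIST, so passing a block is part of map's documented interface. A small sketch comparing the two, using sample paths from the first post (the variable names are only illustrative):

```perl
use strict;
use warnings;

my @paths = ('1_test.trim/level.csv', '8_test.trim/level.csv');

# Form 1: map BLOCK LIST (no comma after the block), as in the one-liner.
my @from_block = map { /(.*)\//; $1 } @paths;

# Form 2: map EXPR, LIST (comma required). In list context a successful
# match returns its captures, so the regex alone yields what $1 would hold.
my @from_expr = map(/(.*)\//, @paths);

print "@from_block\n";
print "@from_expr\n";
```

Both lines should print the same two directory names.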

The steps in the { } block are:

Match $_ (i.e. @ARGV[N]) against /(.*)\//
Return $1, i.e. the bracketed section matched by the regex
Assign the returned value to output[N] ( implied, done internally by map() )

Once every element is parsed, it's returned as a list into join().
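
The steps above could be sketched roughly as follows. The ternary guard is an addition that is not in the original one-liner: it returns an empty list when the regex does not match, so such elements are skipped instead of relying on whatever $1 happens to hold. The second list item is a made-up input used only to show that case.

```perl
use strict;
use warnings;

# First item is from the thread; second is hypothetical (no slash in it).
my @args = ('1_test.trim/level.csv', 'no_slash_here');

# Step 1: match $_ (the current element) against the regex.
# Step 2: return $1 on success, or an empty list to skip the element.
# Step 3: map collects the returned values into the result list.
my @dirs = map { /(.*)\// ? $1 : () } @args;

# The collected list is then handed to join(), as in the one-liner.
print join("\t", "gene_id", @dirs), "\n";
```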

Corona688 · September 15, 2017, 3:01pm

You know what? When I execute this, it does return the ".trim" part. So that's one mystery solved. That trimming is either done before or after perl.

$ perl -e 'print "gene_id\t", join("\t", map {/(.*)\//; $1} @ARGV),"\n";' 1_test.trim/level.csv
gene_id 1_test.trim

$

yifangt · September 15, 2017, 3:56pm

It does print ".trim" ---That's my bad!
Thank you so much!!!!!! $1 is the back reference from the regex.
The modified version gives the expected output shown in post #1.

perl -e 'print "gene_id\t", join("\t", map {/(.*_test).trim\//; $1} @ARGV),"\n";' *_test.trim/level.csv
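
A slightly stricter variant of the same idea, as a sketch: in the regex above the unescaped "." before "trim" matches any character, so escaping it as \. pins the match to a literal dot. The paths are hard-coded here in place of the shell glob, and the guard that skips non-matching entries is an assumption of this sketch rather than part of the original command.

```perl
use strict;
use warnings;

# Paths from the first post, listed by hand instead of via *_test.trim/level.csv.
my @paths = qw(
    1_test.trim/level.csv 15_test.trim/level.csv 17_test.trim/level.csv
    30_test.trim/level.csv 34_test.trim/level.csv 8_test.trim/level.csv
);

# Capture only the part before ".trim/"; skip anything that does not match.
my @cols = map { /(.*_test)\.trim\// ? $1 : () } @paths;

my $header = join("\t", "gene_id", @cols) . "\n";
print $header;
```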

Wish perldoc had explicitly addressed this.