Unix Cut or Awk from 'Right TO Left'

limamichelle · December 13, 2010, 10:56am

Hello,
I want to get the User Name details of a user from a file list.

This list can be in the format:

FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_MiddleName1_MiddleName2_LastName_ID

What I want it to return is the FirstName_MiddleName1_LastName part of a user, i.e. everything except the ID.

I know that in awk I can, for instance, get the ID this way:

 echo FirstName_MiddleName1_LastName_ID | awk -F_ '{print $(NF -0)}'

Is there a way to get the user-name details (without the ID?) - maybe a reverse cut?

methyl · December 13, 2010, 11:00am

One way:

echo FirstName_MiddleName1_LastName_ID | sed -e "s/_ID\$//g"

FirstName_MiddleName1_LastName

vgersh99 · December 13, 2010, 11:04am

echo FirstName_MiddleName1_LastName_ID | sed 's/_[^_][^_]*$//'
OR
echo FirstName_MiddleName1_LastName_ID | nawk -F_ 'NF--&&$1=$1' OFS=_
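
For reference, the literal "reverse cut" the question asks about can be sketched with rev and cut, or with plain shell parameter expansion. This is only a minimal sketch: it assumes rev is installed (it is a util-linux tool, not part of POSIX) and that the ID itself contains no underscore. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Reverse the line, drop the first "_"-separated field (the reversed ID),
# then reverse the remainder back.
strip_id_rev() {
    printf '%s\n' "$1" | rev | cut -d_ -f2- | rev
}

# POSIX parameter expansion: remove the shortest suffix matching "_*".
strip_id_expand() {
    printf '%s\n' "${1%_*}"
}

strip_id_rev FirstName_MiddleName1_LastName_ID
strip_id_expand FirstName_MiddleName1_MiddleName2_LastName_ID
```

The parameter expansion form needs no external process, which may matter when the value is already in a shell variable.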

limamichelle · December 13, 2010, 11:26am

Hello, thanks for your help, but unfortunately I'm adding this to a Perl script and I'm getting syntax errors.

Is there any way to convert this to Perl?
$ARGV[0] would be FirstName_MiddleName1...etc_LastName_ID

I tried:

my $tgt_user = print $ARGV[0] | sed 's/_[^_][^_]*$//;

Thanks

durden_tyler · December 13, 2010, 12:14pm

Assuming that your data looks like this -

$
$ cat f35
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
$

If you want to remove the "_ID" at the end of each line, then you could do this -

$
$ perl -plne 's/_ID$//' f35
FirstName_MiddleName1_LastName
FirstName_LastName
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName
$

But that doesn't work if you have something other than "ID" at the end, for example "_XY".

To remove the "_" followed by (any) two characters at the end, do this -

$
$ perl -plne 's/_..$//' f35
FirstName_MiddleName1_LastName
FirstName_LastName
FirstName_LastName
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName
$

That doesn't work if you have more than two characters after the "_" at the end, for example "_XYZ".

To remove "_" followed by any number of characters other than "_" at the end, do this -

$
$ perl -plne 's/_[^_]*$//' f35
FirstName_MiddleName1_LastName
FirstName_LastName
FirstName_LastName
FirstName_LastName
FirstName_MiddleName1_MiddleName2_LastName
$
$

This is a general substitution of which "_ID" is a special case.

Assuming that you want to go for the third one-liner, your Perl program would change like so -

...
my $tgt_user = $ARGV[0];
$tgt_user =~ s/_[^_]*$//;
...

HTH,
tyler_durden

limamichelle · December 13, 2010, 1:30pm

Thanks again,
One last thing: in Perl, how do I get the ID alone?

i.e.

echo FirstName_LastName_ID | awk -F_ '{print $(NF +0)}'

tx

---------- Post updated at 07:04 PM ---------- Previous update was at 06:48 PM ----------

I solved it!

my $ID = $ARGV[0];
$ID =~ s/[$_]*_//;

---------- Post updated at 07:30 PM ---------- Previous update was at 07:04 PM ----------

It's not solved.

This is the error I get:
Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]*_/ at user_spl.pl line 39.

It works on the command line when I type:

perl -plne 's/[$_]*_//' users.txt

but not when I run the program:

my $ID = $ARGV[0];
$ID =~ s/[$_]*_//;

durden_tyler · December 13, 2010, 5:41pm

limamichelle:

...
This is the Error i get:
Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]*_/ at user_spl.pl line 39.

it works in the command line when i type:
perl -plne 's/[$_]*_//' users.txt
but not when i run the program:
my $ID = $ARGV[0];
$ID =~ s/[$_]*_//;

You really should have a look at the "Special Variables" section of the (FREE!!) online Perl documentation -

http://perldoc.perl.org/perlvar.html

Or have a look at how file processing is done in the book "Learning Perl".

First, the one-liner:

perl -plne 's/[$_]*_//' users.txt

$_ is Perl's default input and pattern-searching space. When you run the perl command-line interpreter with a file name and the -n or -p option (both are part of -plne), it opens the file for you, loops through each record and assigns each record to "$_".

So, with a "users.txt" file like the following -

$
$ cat users.txt
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
$

a "print $_" will print each record as expected:

$
$ perl -lne 'print $_' users.txt
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
$
$

The option "p" will always print the line, so you can avoid typing "print". Like so -

$
$ perl -plne '$_' users.txt
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
$

But using it with the "s///" operator may give you unexpected results -

$
$ perl -plne 's/$_//' users.txt
 
 
 
 
 
$

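Which is why your regular expression works, but in an unintuitive way. A pair of (square) brackets always matches exactly one character: either one of the literals listed inside it or one character from a range inside it.

So, [9] matches the digit "9", and [w-z] matches a single ASCII character in the range "w" through "z". In [$_], Perl interpolates the current record into the brackets, so for records like these the class contains every character that occurs in the record, and the very first character of the record always matches. So, this -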

$
$ perl -plne 's/[$_]//' users.txt
irstName_MiddleName1_LastName_ID
irstName_LastName_ID
irstName_LastName_XY
irstName_LastName_XYZ
irstName_MiddleName1_MiddleName2_LastName_ID
$

replaces the first character with a zero-length string. You do not need [$_] to match a single character. A dot "." is a regular expression that matches any single character. So, this -

$
$ perl -plne 's/.//' users.txt
irstName_MiddleName1_LastName_ID
irstName_LastName_ID
irstName_LastName_XY
irstName_LastName_XYZ
irstName_MiddleName1_MiddleName2_LastName_ID
$

works exactly the same way.

And so, the regex [$_]*_ is essentially the same as .*_ and both mean zero or more characters all the way up to the last underscore character ("_").

Thus, your one-liner should've been like so -

$
$ perl -plne 's/.*_//' users.txt
ID
ID
XY
XYZ
ID
$
$
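
The same greedy match up to the last underscore can be tried outside Perl. Here is a minimal sketch with sed and POSIX parameter expansion, assuming the value is already in a shell variable (the names `user` and `id` are made up for illustration):

```shell
user=FirstName_MiddleName1_LastName_ID

# Greedy ".*_" consumes everything up to and including the LAST underscore.
printf '%s\n' "$user" | sed 's/.*_//'

# Parameter expansion: "##*_" removes the longest prefix that ends in "_".
id=${user##*_}
printf '%s\n' "$id"
```

Both forms leave only the text after the final underscore, whatever its length.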

In a Perl program, @ARGV is an array of all input parameters, and if you pass a file name to it, then $ARGV[0] is the name of that file. This is not the same as $_.

Unlike the command-line perl interpreter, file handling (opening, looping, assigning to $_) is not done for you over here. You will have to do all that explicitly.

So, if your Perl program looks like this -

$
$ cat users.pl
#!perl -w
print "\$_ = |",$_,"|\n";
$

You won't see the file name or its contents in $_ when you pass the file name as a parameter -

$
$ perl users.pl users.txt
Use of uninitialized value $_ in print at users.pl line 2.
$_ = ||
$

That's because the file name "users.txt" is assigned to $ARGV[0] and that has nothing to do with $_.

You could check the value of the argument array (@ARGV) to get a better idea of what's happening:

$
$
$ cat users.pl
#!perl -w
print "My argument array \@ARGV is ==>|@ARGV|<==\n";
$
$
$ perl users.pl users.txt
My argument array @ARGV is ==>|users.txt|<==
$
$ perl users.pl users.txt users1.txt users2.txt users3.txt
My argument array @ARGV is ==>|users.txt users1.txt users2.txt users3.txt|<==
$
$

Interpolating @ARGV inside a double-quoted string joins its elements with single spaces (the default list separator), which is why they show up as one string. But you should know that in the second case, $ARGV[0] = "users.txt", $ARGV[1] = "users1.txt", $ARGV[2] = "users2.txt" and $ARGV[3] = "users3.txt".

Maybe printing it in a loop is clearer -

$
$ cat users.pl
#!perl -w
print "My argument array \@ARGV is as follows:\n";
for ($i=0; $i<=$#ARGV; $i++) {
  print "Element $i = $ARGV[$i]\n";
}
$
$ perl users.pl users.txt users1.txt users2.txt users3.txt
My argument array @ARGV is as follows:
Element 0 = users.txt
Element 1 = users1.txt
Element 2 = users2.txt
Element 3 = users3.txt
$
$

So this piece of code -

my $ID = $ARGV[0];
$ID =~ s/[$_]*_//;

assigns the first element of the array @ARGV, i.e. the file name parameter, to $ID, but the s/// operator on the next line interpolates the empty $_ and ends up trying to apply the following:

s/[]*_//

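$_ is uninitialized, remember? With nothing between the brackets, Perl takes the "]" right after "[" as a literal character, never finds the end of the class, and throws that "Unmatched [" error message.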

(You could initialize $_ and use it here, but that's beside the point.)
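
The effect of an empty variable inside brackets is not specific to Perl. As a rough shell analogue (not the poster's program), the sketch below interpolates a shell variable into a sed bracket expression. The variable names are made up for illustration. With a non-empty value the substitution runs; with an empty value the expression collapses to s/[]*_// and sed refuses it, much as Perl did.

```shell
line=FirstName_LastName_ID

# Non-empty value: the bracket expression lists every character of the line.
chars=$line
ok=$(printf '%s\n' "$line" | sed "s/[$chars]*_//")
printf 'with value: %s\n' "$ok"

# Empty value: the expression becomes s/[]*_//, whose "[" is never closed.
chars=
if printf '%s\n' "$line" | sed "s/[$chars]*_//" >/dev/null 2>&1; then
    status=accepted
else
    status=rejected
fi
printf 'with empty value: %s\n' "$status"
```

The exact error text differs between sed implementations, so the sketch only checks the exit status.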

So what you want to do after you obtain the file name is, open the file, loop through the records, let Perl assign the record value to $_ implicitly and then call the s/// operator. Like so -

$
$
$ cat users.pl
#!perl -w
my $ID = $ARGV[0];                                 # the file name is assigned to $ID now
open (DATA, "<", $ID) or die "Can't open $ID: $!"; # open the file for reading and associate it with the filehandle DATA
while (defined ($_ = <DATA>)) {                    # assign the next record to $_ and while it is defined, then
  $_ =~ s/.*_//;                                   # remove all characters up to the last "_" in current record
  print $_;                                        # and print the resultant current record
}                                                  # until there are no more records
close (DATA) or die "Can't close $ID: $!";         # good idea to clean up after ourselves
$
$

And then the Perl program will work as expected -

$
$
$ perl users.pl users.txt
ID
ID
XY
XYZ
ID
$
$

But Perl does a lot of things for you, without you asking for them. And that is true especially for Perl's default variable "$_". So when you say this -

while (<DATA>)

Perl knows you meant this -

while (defined ($_ = <DATA>))

And when you say this -

s/.*_//;

Perl knows you meant this -

$_ =~ s/.*_//;

and so on...

So your program can be shortened to this -

$
$
$ cat users.pl
#!perl -w
my $ID = $ARGV[0];                                 # the file name is assigned to $ID now
open (DATA, "<", $ID) or die "Can't open $ID: $!"; # open the file for reading and associate it with the filehandle DATA
while (<DATA>) {                                   # assign the next record to $_ and while it is defined, then
  s/.*_//;                                         # remove all characters up to the last "_" in current record
  print;                                           # and print the resultant current record
}                                                  # until there are no more records
close (DATA) or die "Can't close $ID: $!";         # good idea to clean up after ourselves
$
$
$
$ perl users.pl users.txt
ID
ID
XY
XYZ
ID
$
$
$
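
For comparison, the same open, loop and strip sequence can be sketched in plain shell. This is only an illustration under a few assumptions: the temporary file stands in for users.txt, mktemp is available (it is common but not POSIX), and the function name print_ids is made up.

```shell
#!/bin/sh
# Sample data copied from the thread; the temporary file stands in for users.txt.
users=$(mktemp)
cat > "$users" <<'EOF'
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
EOF

# Hypothetical helper: print the last "_"-separated field of each line in file $1.
print_ids() {
    while IFS= read -r line; do      # read one record at a time, like while (<DATA>)
        printf '%s\n' "${line##*_}"  # drop everything up to the last "_"
    done < "$1"
}

ids=$(print_ids "$users")
printf '%s\n' "$ids"
rm -f "$users"
```

Here the loop variable plays the role that $_ plays in the Perl version, except that the shell makes you name it explicitly.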

HTH,
tyler_durden