sort entire line based on part of the string

gurpal2000 · July 31, 2008, 11:01am

hey gurus,

my-build1-abc
my-build10-abc
my-build2-abc
my-build22-abc
my-build3-abc

Basically I want to sort the entire lines numerically by the build number. I don't zero-pad the numbers because that's "how it is".

sort -n won't work because it compares from the beginning of the line, and the lines don't start with a number.

So how would you do that in a one-liner? sed/awk maybe?

Cheers

aigles · July 31, 2008, 11:10am

sort -n -k1.9 inputfile

> cat gurpal.dat
my-build1-abc
my-build10-abc
my-build2-abc
my-build22-abc
my-build3-abc
> sort -n -k1.9 gurpal.dat
my-build1-abc
my-build2-abc
my-build3-abc
my-build10-abc
my-build22-abc
>

Jean-Pierre.

gurpal2000 · July 31, 2008, 11:48am

Great! So does the 1.9 mean start at position 9? What if the position is variable? Do we need some regex?

thanks
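
If the number does not start at a fixed column, a character offset such as 1.9 no longer fits. One possible way around that, sketched below, is to copy the first run of digits to the front of each line, sort on it, and cut it off again. The function name sort_by_first_number is made up for this illustration, and the sketch assumes every line contains the number to sort on as its first run of digits.

```shell
#!/bin/sh
# Hypothetical helper: sort lines numerically by the first run of digits in each line.
# Assumption: every input line contains at least one digit.
sort_by_first_number() {
  # 1. sed prefixes each line with its first number and a space ("10 my-build10-abc").
  # 2. sort -n orders the lines on that numeric prefix.
  # 3. cut removes the prefix again.
  sed 's/^[^0-9]*\([0-9][0-9]*\).*$/\1 &/' |
    sort -n -k1,1 |
    cut -d' ' -f2-
}

printf '%s\n' my-build1-abc my-build10-abc other-b2-abc my-build22-abc x3-abc |
  sort_by_first_number
# prints:
#   my-build1-abc
#   other-b2-abc
#   x3-abc
#   my-build10-abc
#   my-build22-abc
```

The prefixes here have different lengths, so no single -k1.N offset would cover them; the regex only decides where the number is, while sort still does the ordering.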

gurpal2000 · July 31, 2008, 4:02pm

OK, so here is the real question. Essentially I'm listing labels from source control, and the format looks like this:

project_subproject_major.minor.build

Example:

myproject_abcdef_0.0.1
myproject_abcdef_0.0.2
myproject_abcdef_0.0.9
myproject_abcdef_0.0.10
myproject_abcdef_1.1.1
myproject_abcdef_1.1.10
myproject_abcdef_1.1.2
myproject_abcdef_1.0.1
myproject_abcdef_1.0.10

The major, minor, and build numbers can be of any magnitude. How does sort -k deal with things like this? The output should be the same text, but ordered numerically by major, then minor, then build.

It would be excellent if you could give me a hint.

aigles · July 31, 2008, 5:24pm

I haven't found a simpler solution:

awk -F_ '{v=$NF; gsub(/\./," ",v); print v,$0}' inputfile |
sort -n -k1,1 -k2,2 -k3,3 |
cut -d' ' -f4-

Inputfile:

myproject_abcdef_0.0.1
myproject_abcdef_0.0.2
myproject_abcdef_0.0.9
myproject_abcdef_0.0.10
myproject_abcdef_1.1.1
myproject_abcdef_1.1.10
myproject_abcdef_1.1.2
myproject_abcdef_1.0.1
myproject_abcdef_1.0.10

Output:

myproject_abcdef_0.0.1
myproject_abcdef_0.0.2
myproject_abcdef_0.0.9
myproject_abcdef_0.0.10
myproject_abcdef_1.0.1
myproject_abcdef_1.0.10
myproject_abcdef_1.1.1
myproject_abcdef_1.1.2
myproject_abcdef_1.1.10

Jean-Pierre.
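
To make the three stages of that pipeline easier to follow, here is a small sketch that shows what the awk step produces before sort and cut run. The function names decorate and sort_labels are only illustrative, and the sketch assumes, as in the example data, that the version always has exactly three components and that the labels contain no spaces.

```shell
#!/bin/sh
# Step 1: prefix each line with its version numbers, dots replaced by spaces.
decorate() {
  awk -F_ '{ v = $NF; gsub(/\./, " ", v); print v, $0 }'
}

# Steps 1-3 together: decorate, sort on the three numeric columns,
# then drop those columns so only the original label remains.
sort_labels() {
  decorate | sort -n -k1,1 -k2,2 -k3,3 | cut -d' ' -f4-
}

printf '%s\n' myproject_abcdef_1.1.10 myproject_abcdef_0.0.9 | decorate
# prints:
#   1 1 10 myproject_abcdef_1.1.10
#   0 0 9 myproject_abcdef_0.0.9

printf '%s\n' myproject_abcdef_1.1.10 myproject_abcdef_1.1.2 myproject_abcdef_0.0.9 |
  sort_labels
# prints:
#   myproject_abcdef_0.0.9
#   myproject_abcdef_1.1.2
#   myproject_abcdef_1.1.10
```

Because the decorated columns are separated by whitespace, sort can treat major, minor, and build as ordinary numeric fields 1 to 3, and cut -f4- restores the original line.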

gurpal2000 · July 31, 2008, 6:49pm

now THAT is clever!
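
For completeness, GNU sort later gained a version-sort option, -V (--version-sort). As far as I know it arrived in coreutils 7.0, so it would not be available in an older release such as 6.9. Where it exists, a sketch like the following might replace the awk/cut wrapper. The function name sort_labels_v is invented here, and the key definition assumes exactly two underscores before the version, as in the example labels.

```shell
#!/bin/sh
# Assumptions: GNU sort with -V (version sort) support, and labels of the form
# project_subproject_major.minor.build, so the version is field 3 when splitting on "_".
sort_labels_v() {
  sort -t_ -k3,3V
}

printf '%s\n' \
  myproject_abcdef_0.0.10 \
  myproject_abcdef_1.1.2 \
  myproject_abcdef_1.0.10 \
  myproject_abcdef_0.0.9 \
  myproject_abcdef_1.1.10 |
  sort_labels_v
# prints:
#   myproject_abcdef_0.0.9
#   myproject_abcdef_0.0.10
#   myproject_abcdef_1.0.10
#   myproject_abcdef_1.1.2
#   myproject_abcdef_1.1.10
```

Unlike the three-column approach, this does not depend on the version having exactly three components, but it is a GNU extension rather than portable sort behaviour.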

tpltp · July 31, 2008, 11:35pm

Hello aigles,
Could you elaborate on the -k option of sort?
In your two solutions, the first one used -k1.9 and the second one used -k1,1. Is there any difference between them? I can't sort the file in the first example using -k1,9.

Also, in the sort manual, the -k option seems to take a comma, not a dot.

Annihilannic · August 1, 2008, 12:50am

From the sort man page:

           -k keydef   The keydef argument defines a restricted sort key.
                       The format of this definition is

                            field_start[type][,field_end[type]]
...
                       A field_start position specified by m.n is
                       interpreted to mean the nth character in the mth
                       field. 
...
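
The difference between the dot and the comma can be seen by running both forms on the build-number example from the start of the thread. This is only a sketch: the function names are made up, and LC_ALL=C is set in the second one so that the fallback comparison is a plain byte comparison regardless of the local collation rules.

```shell
#!/bin/sh
# m.n form: the key starts at character 9 of field 1, i.e. at the build number,
# and runs to the end of the line.
by_char_offset() {
  sort -n -k1.9
}

# m,n form: the key spans fields 1 through 9. It therefore starts at "my-build...",
# which has no leading number, so under -n every key compares as 0 and sort
# falls back to comparing the whole lines as text.
by_field_range() {
  LC_ALL=C sort -n -k1,9
}

sample() {
  printf '%s\n' my-build1-abc my-build10-abc my-build2-abc my-build22-abc my-build3-abc
}

sample | by_char_offset
# prints:
#   my-build1-abc
#   my-build2-abc
#   my-build3-abc
#   my-build10-abc
#   my-build22-abc

sample | by_field_range
# prints the lines in their original (textual) order:
#   my-build1-abc
#   my-build10-abc
#   my-build2-abc
#   my-build22-abc
#   my-build3-abc
```

So the comma separates the start and end of a key, while the dot selects a character position inside a field; the two can be combined, as in -k1.9,1.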

tpltp · August 1, 2008, 2:35am

Thank you, I missed the part after the option.

gurpal2000 · August 1, 2008, 2:58am

Strange, I'm using sort 6.9 on Mandriva 2008 and this doesn't appear in the man page:

[server~]$ sort --version
sort (GNU coreutils) 6.9
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.

gurpal2000 · August 1, 2008, 2:59am

Ah, I see it. It is just described a bit differently:

   POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are
   origin 1. If neither -t nor -b is in effect, characters in a field are counted from the beginning of
   the preceding whitespace. OPTS is one or more single-letter ordering options, which override global
   ordering options for that key. If no key is given, use the entire line as the key.
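
The sentence about -t and -b matters as soon as the character offset is in a field other than the first. The sketch below uses made-up two-field lines ("x build10", "x build2") and made-up function names to show the effect; it assumes GNU sort, and LC_ALL=C is set in the first function only so that the fallback text comparison is predictable.

```shell
#!/bin/sh
# Without -b (and without -t), field 2 begins at the blank before "build",
# so character 6 is the "d" and the key "d10" / "d2" is not numeric.
# All keys compare as 0 and the lines fall back to plain text order.
without_b() {
  LC_ALL=C sort -n -k2.6
}

# With -b, leading blanks are skipped before counting, so character 6 of
# field 2 is the first digit of the build number.
with_b() {
  sort -b -n -k2.6
}

printf '%s\n' 'x build10' 'x build2' | without_b
# prints:
#   x build10
#   x build2

printf '%s\n' 'x build10' 'x build2' | with_b
# prints:
#   x build2
#   x build10
```

In the original example the number sits in field 1, which has no preceding whitespace, so -k1.9 works there without -b.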