Sorting alphanumeric strings without a pattern

Good evening to all!!
I'm facing this problem:

I saved a list of file names in a .txt file (one .txt file per folder):

hello0.jpg
hello1.jpg
hello10.jpg
hello11.jpg
hello12.jpg
hello13.jpg
hello14.jpg
hello15.jpg
hello16.jpg
hello17.jpg
hello18.jpg
hello19.jpg
hello2.jpg
hello20.jpg
hello21.jpg
hello22.jpg
hello23.jpg
hello24.jpg
hello3.jpg
hello4.jpg
hello5.jpg
hello6.jpg
hello7.jpg
hello8.jpg
hello9.jpg

and I would like to sort them by their numeric part.
Using sort -n I get the order shown above.
The real problem is that the file names don't follow a fixed pattern (otherwise I could use sort with -k).
It could be:

hello0.jpg
helloagain0.jpg
hello1_0.jpg

The only thing I know is that all files in the same folder will follow the same pattern.

Any help is appreciated!!

One can extract the numeric part, paste it next to the original name, sort on the newly added column, and then remove that column from the output:

$ cat aa
qwe1.asd
zxc3.ewq
wer4.tre
qweqwe2.asd
$
$ cat aa|tr -d '[:alpha:]'|paste aa - | sort -k2,2n | cut -f1
qwe1.asd
qweqwe2.asd
zxc3.ewq
wer4.tre
$ 

Thanks a lot for your super fast reply!!
Does this also work with file names like these:

JonnyDouble01_pg001.jpg
JonnyDouble01_pg002.jpg
.....

where there is more than one numeric part (one is the same for all files, the other is the sort key...)

---------- Post updated 06-05-12 at 05:05 AM ---------- Previous update was 06-04-12 at 04:01 PM ----------

With these files, I can't get them to sort:

h0ello0.jpg
h1ello1.jpg
h2ello10.jpg
h3ello11.jpg
h4ello12.jpg
h5ello13.jpg
h6ello14.jpg
h7ello15.jpg
h8ello16.jpg
h9ello17.jpg
h10ello18.jpg
h1ello19.jpg
h12ello2.jpg
h13ello20.jpg
h14ello21.jpg
h15ello22.jpg
h16ello23.jpg
h17ello24.jpg
h18ello3.jpg
h19ello4.jpg
h20ello5.jpg
h21ello6.jpg
h22ello7.jpg
h23ello8.jpg
h24ello9.jpg

That's because they have more than one numeric part.
The only thing all the files have in common is that they are numbered at the end of the name (just before the extension).

The only way I've found to sort them is to remove the extension (.jpg or whatever), then read one character at a time from the end of each name and stop when the character isn't a digit.

That gives the position where the number starts, and then I can use sort's -k option...

Now I need to find the right commands! :smiley:
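A minimal sketch of that backward scan in plain POSIX sh (no bash or Perl), assuming one file name per line on standard input and taking the digits immediately before the extension as the key; names with no such digits get key 0. The function name `trailing_number_sort` is hypothetical, and `LC_ALL=C` is only there to make tie-breaking byte-wise:

```shell
#!/bin/sh
# Sketch only: sort file names by the digits that sit right before the
# extension, found by scanning backwards one character at a time.
# Assumptions: one name per line on stdin; no trailing digits -> key 0.
trailing_number_sort() {
  while IFS= read -r name; do
    stem=${name%.*}              # drop the extension (.jpg, .png, ...)
    num=
    while [ -n "$stem" ]; do
      last=${stem#"${stem%?}"}   # last character of $stem
      case $last in
        [0-9]) num=$last$num     # still a digit: keep it, move left
               stem=${stem%?} ;;
        *) break ;;              # first non-digit: the number starts here
      esac
    done
    printf '%s %s\n' "${num:-0}" "$name"   # key, a space, the original name
  done | LC_ALL=C sort -n | cut -d' ' -f2-
}
```

Used as `trailing_number_sort < list.txt` (with `list.txt` standing in for one of the per-folder lists), it puts h12ello2.jpg before h2ello10.jpg because only the trailing digits count, and `cut -f2-` keeps names containing spaces intact. Like the sed one-liners, it still interleaves names that follow different patterns in the same folder.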

A crude way:

sed 's/.*[^0-9]\([0-9]*\)\..*/\1/g' <filename>|paste - <filename>|sort -nk1|awk '{print $2}'

Slightly less crude :wink:

sed 's/.*[^0-9]\([0-9]*\)\./\1 &/' file | sort -n | cut -d" " -f2-

That really helped me!!!!
Thanks a lot to everyone.
Last question: are you aware of any limitations of this approach, i.e. any file names it can't sort correctly?

The file names need to have an extension, otherwise it will not work.
Here is an alternative sed that should work for any pattern; you could try:

sed 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1 &/'
# awk '{s=gensub("[^0-9]*([0-9][0-9]*).*","\\1",1);print s,$0}' infile|sort -n|awk '{print $2}'

@ygemici, that will not work for file names with spaces or with multiple number sequences in them. Also, gensub is gawk-only.

# awk '{x=$0;gsub("[^0-9]"," ");print $NF,x}' infile|sort -n|awk '{sub(/^[^ ]* /,"")}1'

Yup, that would not have those shortcomings, I think :slight_smile:

:slight_smile: :slight_smile:

Thanks again for your help!!!

I tried it with several sets of file names, but I ran into one problem!

With these files:

03_000a.jpg
03_000b.jpg
03_000c.jpg
03_001.png
03_002.png
03_003.png
03_004-005.png
03_006.png
03_007.png
03_008.png
03_009.png
03_010.png
Credits10.png
Credits11.png

I get this order:

03_000a.jpg
03_000b.jpg
03_000c.jpg
03_001.png
03_002.png
03_003.png
03_004-005.png
03_006.png
03_007.png
03_008.png
03_009.png
03_010.png
Credits10.png
03_011.png
Credits11.png
03_012.png
03_013.png

This happens because of the extracted sort keys (the first number on each line is the key):

000 03_000a.jpg
000 03_000b.jpg
000 03_000c.jpg
001 03_001.png
002 03_002.png
003 03_003.png
005 03_004-005.png
006 03_006.png
007 03_007.png
008 03_008.png
009 03_009.png
010 03_010.png
011 03_011.png
012 03_012.png
013 03_013.png
10 Credits10.png
11 Credits11.png

I think the problem is that the file names in this folder aren't consistent.
Is it possible to reproduce Windows' "sort by name" option?

Thanks again!!
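Windows Explorer's "sort by name" uses a so-called natural order, in which runs of digits are compared by their numeric value. The following is only an approximation of that idea, written against POSIX awk so it avoids gawk-only functions such as gensub. The function names `natsort` and `natcmp` are hypothetical, letters are compared ignoring case, and it assumes one name per line on standard input:

```shell
#!/bin/sh
# Sketch of a "natural" (Windows-Explorer-like) sort using only POSIX awk.
# Assumptions: one name per line on stdin; digit runs compare by value,
# other characters one at a time, ignoring case.
natsort() {
  awk '
  # Return <0, 0 or >0 depending on how a compares to b.
  function natcmp(a, b,    da, db, va, vb, ca, cb) {
    while (a != "" && b != "") {
      da = (match(a, /^[0-9]+/) ? RLENGTH : 0)
      db = (match(b, /^[0-9]+/) ? RLENGTH : 0)
      if (da > 0 && db > 0) {
        # Both chunks are numbers: compare their values, so 2 < 10.
        va = substr(a, 1, da) + 0
        vb = substr(b, 1, db) + 0
        if (va != vb) return (va < vb) ? -1 : 1
        a = substr(a, da + 1); b = substr(b, db + 1)
      } else {
        # Otherwise compare a single character, case-insensitively.
        ca = tolower(substr(a, 1, 1)); cb = tolower(substr(b, 1, 1))
        if (ca != cb) return (ca < cb) ? -1 : 1
        a = substr(a, 2); b = substr(b, 2)
      }
    }
    return length(a) - length(b)   # a shorter name sorts first
  }
  { line[NR] = $0 }
  END {
    # Insertion sort: O(n^2), but fine for one folder of file names.
    for (i = 2; i <= NR; i++) {
      v = line[i]; j = i - 1
      while (j > 0 && natcmp(line[j], v) > 0) { line[j + 1] = line[j]; j-- }
      line[j + 1] = v
    }
    for (i = 1; i <= NR; i++) print line[i]
  }'
}
```

For the 03_.../Credits list above this puts every 03_ name first (a digit sorts before a letter) and Credits10.png, Credits11.png last. Note that natural order is decided from the left, so for names like h2ello10.jpg the leading number wins; that is not the same as sorting on the trailing number.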

EDIT: here is a possible solution:

1) get a column containing the first character of each name, using

sed 's/\(.\{1\}\).*/\1/' file1

2) get a column of the sequence numbers, just as I already do, using

sed 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1 &/' file2

(although I only need the first column)
3) sort the lines using two keys: the first should come from file1, the second from file2

This way I can sort twice: first alphabetically, then, keeping the result of the first sort, by the sequence numbers in the file names...
I think this should work, but I don't know how to do it...
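Steps 1 to 3 can be sketched as a single pipeline, assuming only POSIX awk, sort and cut: awk prefixes each name with its first character and its last run of digits, separated by tabs, sort orders on those two keys, and cut drops them again. The function name `twokey_sort` is hypothetical, names without digits get key 0, and grouping on a single character is as crude as the original idea:

```shell
#!/bin/sh
# Sketch of the two-key idea: group by first character, then order by the
# last run of digits in the name. Uses only POSIX awk, sort and cut.
# Assumption: one file name per line on stdin; no digits -> key 0.
twokey_sort() {
  tab=$(printf '\t')
  awk '{
    num = 0
    # The last digit run is the one followed only by non-digits to the end.
    if (match($0, /[0-9]+[^0-9]*$/)) {
      num = substr($0, RSTART)
      sub(/[^0-9]*$/, "", num)     # strip the trailing ".jpg" etc.
    }
    # key 1 <TAB> key 2 <TAB> original name
    printf "%s\t%s\t%s\n", substr($0, 1, 1), num, $0
  }' | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n | cut -f3-
}
```

On the 03_.../Credits list this keeps every 03_ name (first character 0) ahead of the Credits names (first character C) and orders each group by its trailing number, so 03_004-005.png is keyed on 005. Folders where different patterns share the same first character would still interleave.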

Do you know Perl? With Perl you can split the file names and store them in a hash, then sort them by the key (the name), by the value (the numeric part of the name), or by both, so you have a lot of options to play with.

Sadly, I need plain Unix, as I'm running this on a Kindle!

What is "plain unix"?

What kind of "plain unix" runs on a Kindle?

There is no actual operating system named "UNIX" any more, in the same way that IEEE doesn't actually manufacture electrical sockets; it just defines what they ought to look like.

As I stated before, I'm new to all of this...
All I can tell you is that I can't use perl or bash, only "standard" shell commands.

I'm not trying to be difficult; my question is actually relevant. If you don't know, you should find out, because it's important. Even the 'standard' commands aren't quite the same everywhere, and knowing which extensions you do and don't have can make the difference between a simple, fast solution and a slow, difficult one.

uname -a should tell you.

Here's the output:

Linux kindle 2.6.31-rt11-lab126 #1 Wed Apr 4 20:41:38 PDT 2012 armv7l GNU/Linux

Okay, you have Linux on an embedded system, which probably means that nearly all of your utilities are just BusyBox (a Swiss-army-knife program that can act as dozens or hundreds of different minimal shell commands). This isn't generic UNIX at all; it's quite far from it, really.

BusyBox comes with a half-decent version of awk, at least, so I'll take a look...