Sorting alphanumeric strings without a pattern

Thanks a lot for everything!!

I don't know how to do it, but sorting with two keys should work...

You know, Windows doesn't seem to have the magical ability to consistently handle inconsistently named files either. I'm always running into corners where it guesses wrong too.

Could you not just rename the inconsistent files and solve the problem permanently?

I'm not sure what you mean by that.

I can think of ways that may end up working with the filenames you've posted, but it will choke on lots of other inconsistent filenames. A general solution would need the ability to actually analyze the names for the pattern, and figure out what to do with exceptions...

All I have now is a way of sorting files IF the folder contains only one name pattern.

Sure I can provide user input to guide the sorting process, but I'd prefer not to do it.

Why isn't sorting with two keys a good idea?
Taking the first character and using it as a sort key means I can sort the files in a two-step process.
(This assumes the file names differ from the first character on: 03_XXX, 10_XXX, a_XXX... But since 99.5% of the files I need to sort are like that, this could work...)

Sorting by two keys:

sed 's/\(.*[^0-9]\)\([0-9][0-9]*\)/\1 \2 &/' infile | sort -k1,1 -k2,2n|cut -d" " -f3-
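
To spell out what that pipeline does, here is a minimal sketch of the same three steps wrapped in a function. The name sort_two_keys and the sample file names are only for illustration, and LC_ALL=C is an assumption so that the text key is compared byte by byte.

```shell
#!/bin/sh
# sort_two_keys is a hypothetical helper name, not part of the thread.
sort_two_keys() {
    # Decorate: put "<text before the last number> <last number> " in front of each line.
    sed 's/\(.*[^0-9]\)\([0-9][0-9]*\)/\1 \2 &/' |
    # Sort on the text key first, then numerically on the number key.
    LC_ALL=C sort -k1,1 -k2,2n |
    # Undecorate: drop the two key fields and keep the original name.
    cut -d" " -f3-
}

printf '%s\n' 03_010.png 03_002.png Credits11.png Credits10.png 03_001.png |
sort_two_keys
```

The keys only exist between sed and cut, so the file names themselves come out unchanged.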

Thanks a lot for your reply!
I came to the very same solution!
Now, I need something more...
Is there a way to dynamically add keys to sort?
I mean:
I can count the number of columns in a file. I need, then, to sort that file using:

where N is the number of columns.
Is there a way, maybe a while loop, to add keys to sort command?
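
One way this might be done with a while loop is to build the option string first and then hand it to sort. This is only a sketch: build_sort_keys is a made-up name, and it assumes the first column is compared as text and columns 2 to N numerically, which may not be the key layout you actually need.

```shell
#!/bin/sh
# build_sort_keys is a hypothetical helper: for N=3 it prints "-k1,1 -k2,2n -k3,3n".
# Assumption: column 1 is a text key, columns 2..N are numeric keys.
build_sort_keys() {
    n=$1
    keys="-k1,1"
    i=2
    while [ "$i" -le "$n" ]; do
        keys="$keys -k$i,${i}n"
        i=$((i + 1))
    done
    printf '%s\n' "$keys"
}

# The command substitution is left unquoted on purpose,
# so that each -k option reaches sort as a separate argument.
printf '%s\n' 'x 02 10 b' 'x 02 9 a' | sort $(build_sort_keys 3)
```

With a real file, N could come from something like awk '{print NF; exit}' infile, assuming every line has the same number of columns.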

Thanks!!!

Hi.

This comparison of solutions will not be useful if you must use busybox or similar, but for people who can obtain and install additional programs, the msort solution seems simple and produces a result that appeals to me:

#!/usr/bin/env bash

# @(#) s3	Demonstrate sed-sort-cut with msort-hybrid.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C sed sort cut msort

FILE=${1-data5}
pl " Input file $FILE:"
head -n 20 "$FILE"

# Results sed, sort, cut.
sed 's/\(.*[^0-9]\)\([0-9][0-9]*\)/\1 \2 &/' "$FILE" |
sort -k1,1 -k2,2n |
cut -d" " -f3- > f1

# Results, msort hybrid.
msort -q -l -n 1,1 -c hybrid "$FILE" > f2

pl " Compare (sed,sort,cut) and msort-hybrid:"
paste f1 f2 |
align -g5

exit 0

producing:

% ./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
sed GNU sed version 4.1.5
sort (GNU coreutils) 6.10
cut (GNU coreutils) 6.10
msort 8.44

-----
 Input file data5:
03_003.png
03_009.png
03_007.png
03_006.png
03_004-005.png
03_010.png
03_000a.jpg
03_002.png
03_001.png
03_000b.jpg
Credits10.png
03_008.png
03_000c.jpg
Credits11.png

-----
 Compare (sed,sort,cut) and msort-hybrid:
03_000a.jpg		03_000a.jpg
03_000b.jpg		03_000b.jpg
03_000c.jpg		03_000c.jpg
03_001.png		03_001.png
03_002.png		03_002.png
03_003.png		03_003.png
03_006.png		03_004-005.png
03_007.png		03_006.png
03_008.png		03_007.png
03_009.png		03_008.png
03_010.png		03_009.png
03_004-005.png		03_010.png
Credits10.png		Credits10.png
Credits11.png		Credits11.png
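
The left column puts 03_004-005.png after 03_010.png. A quick way to see why is to look at the keys the sed step generates before sort runs. This is just a sketch; show_keys is a made-up name.

```shell
#!/bin/sh
# show_keys is a hypothetical helper: it prints the decorated lines that sort would see.
show_keys() {
    sed 's/\(.*[^0-9]\)\([0-9][0-9]*\)/\1 \2 &/'
}

printf '%s\n' 03_010.png 03_004-005.png | show_keys
# prints:
# 03_ 010 03_010.png
# 03_004- 005 03_004-005.png
```

The first key is "03_" for the plain names but "03_004-" for the range name, so with LC_ALL=C the range name sorts after every name whose first key is just "03_".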

See link in previous post for some details about msort.

Personally, I think I would seriously consider renaming files consistently (possibly assisted by symbolic links) and be done with it.

Best wishes ... cheers, drl

Thanks a lot, but, unfortunately, I must use busybox...:wall:

I'm now trying to do something different...

This is the input:

xxxholic02_c07_000.png
xxxholic02_c07_001.png
xxxholic02_c07_002.png
xxxholic02_c07_003.png
xxxholic02_c07_004.png

And I would like to have this output:

x 02 07 000 xxxholic02_c07_000.png
x 02 07 001 xxxholic02_c07_001.png
x 02 07 002 xxxholic02_c07_002.png
x 02 07 003 xxxholic02_c07_003.png
x 02 07 004 xxxholic02_c07_004.png

This means:

  • getting the first character of each line
  • getting every numeric field of each line

I managed to do it this way:

sed 's/\(.\{1\}\).*/\1/' infile > outfile1
sed "s/[^0-9]*\([0-9][0-9]*\)[^0-9]*/ \1/g;s/^[^0-9][^0-9]*$/-1/" infile | sed 's/^ *//' > outfile2

Then I used the paste command to join outfile1 and outfile2...
The problem is, I don't have paste on the Kindle...

Is there any way to run those sed commands so that the results are added to the original lines, so that I don't need paste?

Thanks!!!:slight_smile:

... Withdrawn (my sed skills aren't quite up to this).

Try this in this particular case.

sed 's/^\(.\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)/\1 \2 \3 \4 &/' infile

If it is really fixed format you could get away with this:

sed 's/\(.\).......\(..\)..\(..\)_\(...\)/\1 \2 \3 \4 &/' infile

It works fine, but it gets only 3 numeric fields.
Is there a way to get every numeric field separated by a space (after getting the first character)?
Thanks again and sorry for all this mess!

Try:

awk '{p=$0; gsub(/[^0-9]+/,FS); print substr(p,1,1),$0,p}' infile
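
For anyone reading along, here is the same idea unpacked with comments. The decorate name is made up, and the $1 = $1 line is an addition that is not in the one-liner above: it is there on the assumption that you want exactly one space between fields.

```shell
#!/bin/sh
# decorate is a hypothetical helper name.
decorate() {
    awk '{
        p = $0                  # keep the original line
        gsub(/[^0-9]+/, " ")    # replace every run of non-digits with one space
        $1 = $1                 # addition: rebuild $0 with single spaces between fields
        print substr(p, 1, 1), $0, p
    }'
}

printf '%s\n' xxxholic02_c07_000.png 03_004-005.png | decorate
# prints:
# x 02 07 000 xxxholic02_c07_000.png
# 0 03 004 005 03_004-005.png
```

It picks up however many numeric fields a line has, so the number of columns can differ from line to line.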

OK, I've been working on my sed solution. Try this (assuming there is no ":" or "@" in your filenames):

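e.sed:

# Build "<first char>@<line>:<line>"; the copy between @ and : is consumed below.
s/^\(.\).*/\1@&:&/
:a
# Replace the next run of non-digits after @ with a space, keeping the digits that follow it.
s/@\([0-9 ]*\)[^0-9: ][^0-9: ]*\([0-9]*\)/@\1 \2 /
# Loop for as long as the substitution above still matches.
ta
# No digits found at all: use -1 as the only numeric field.
s/@[^0-9]*:/@ -1:/
# Turn the markers into spaces, then squeeze repeated spaces.
s/[@:]/ /g
s/  */ /g

$ sed -f e.sed infile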
x 02 07 000 xxxholic02_c07_000.png
x 02 07 001 xxxholic02_c07_001.png
x 02 07 002 xxxholic02_c07_002.png
x 02 07 003 xxxholic02_c07_003.png
x 02 07 004 xxxholic02_c07_004.png