Sorting alphanumeric strings without a pattern

silver18 · June 22, 2012, 1:29pm

Thanks a lot for everything!!

I don't know how to do it, but sorting with two keys should work...

Corona688 · June 22, 2012, 1:29pm

You know, Windows doesn't seem to have the magical ability to consistently handle inconsistently named files either. I'm always running into corners where it guesses wrong too.

Could you not just rename the inconsistent files and solve the problem permanently?

Corona688 · June 22, 2012, 1:32pm

I'm not sure what you mean by that.

I can think of ways that may end up working with the filenames you've posted, but it will choke on lots of other inconsistent filenames. A general solution would need the ability to actually analyze the names for the pattern, and figure out what to do with exceptions...

silver18 · June 22, 2012, 2:48pm

All I have now is a way of sorting files IF the folder contains only one name pattern.

Sure I can provide user input to guide the sorting process, but I'd prefer not to do it.

Why isn't sorting with two keys a good idea?
Taking the first character and using it as the first sort key means I can sort the files in a two-step process.
(This assumes the file names differ in their first character: 03_XXX, 10_XXX, a_XXX... But since 99.5% of the files I need to sort are like that, this could work...)

Chubler_XL · June 24, 2012, 11:57pm

Sorting by two keys:

sed 's/\(.*[^0-9]\)\([0-9][0-9]*\)/\1 \2 &/' infile | sort -k1,1 -k2,2n|cut -d" " -f3-
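
For anyone following along, here is a minimal sketch of the same pipeline wrapped in a function, with each stage commented. The function name sort_two_keys and the sample names are only for illustration, and it assumes the names contain no spaces.

```shell
#!/bin/sh
# sort_two_keys: hypothetical wrapper name, not from the thread.
# Reads names on stdin, one per line; assumes they contain no spaces.
#
# Stage 1 (sed):  decorate each line as
#                 "<text before the last number> <last number> <original name>"
# Stage 2 (sort): compare key 1 as text and key 2 as a number
#                 (LC_ALL=C makes the text comparison plain byte order)
# Stage 3 (cut):  drop the two helper fields, keeping the original name
sort_two_keys() {
    sed 's/\(.*[^0-9]\)\([0-9][0-9]*\)/\1 \2 &/' |
    LC_ALL=C sort -k1,1 -k2,2n |
    cut -d" " -f3-
}

printf '%s\n' 03_010.png 03_002.png Credits10.png 03_001.png | sort_two_keys
# prints:
# 03_001.png
# 03_002.png
# 03_010.png
# Credits10.png
```

Because .* is greedy, the number used as the second key is the last run of digits that follows a non-digit character.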

silver18 · June 25, 2012, 9:21am

Thanks a lot for your reply!
I came to the very same solution!
Now, I need something more...
Is there a way of dynamically adding keys to sort?
I mean:
I can count the number of columns in a file. I need, then, to sort that file using:

where N is the number of columns.
Is there a way, maybe a while loop, to add keys to the sort command?

Thanks!!!
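
One possible approach, as a rough sketch: build the key options in a while loop and hand the resulting string to sort. The function name build_sort_keys is made up for this example, and the key layout (first column compared as text, every later column as a number) is an assumption modelled on the two-key command earlier in the thread.

```shell
#!/bin/sh
# build_sort_keys: hypothetical helper, not from the thread.
# Prints "-k1,1 -k2,2n ... -kN,Nn" for the column count given as $1.
# Assumption: column 1 is compared as text, all later columns as numbers.
build_sort_keys() {
    n=$1
    keys="-k1,1"
    i=2
    while [ "$i" -le "$n" ]; do
        keys="$keys -k${i},${i}n"
        i=$((i + 1))
    done
    printf '%s\n' "$keys"
}

build_sort_keys 4
# prints: -k1,1 -k2,2n -k3,3n -k4,4n
```

It could then be used as sort $(build_sort_keys "$N") infile, leaving the command substitution unquoted so that the shell splits it into separate options.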

drl · June 25, 2012, 9:50am

Hi.

This comparison of solutions will not be useful if one must use busybox or similar, but for people who can install additional tools, the msort solution seems simple and produces a result that appeals to me:

#!/usr/bin/env bash

# @(#) s3	Demonstrate sed-sort-cut with msort-hybrid.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C sed sort cut msort

FILE=${1-data5}
pl " Input file $FILE:"
head -20 $FILE

# Results sed, sort, cut.
sed 's/\(.*[^0-9]\)\([0-9][0-9]*\)/\1 \2 &/' $FILE |
sort -k1,1 -k2,2n |
cut -d" " -f3- > f1

# Results, msort hybrid.
msort -q -l -n 1,1 -c hybrid $FILE > f2

pl " Compare (sed,sort,cut) and msort-hybrid:"
paste <( cat f1 ) <( cat f2 ) |
align -g5

exit 0

producing:

% ./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
sed GNU sed version 4.1.5
sort (GNU coreutils) 6.10
cut (GNU coreutils) 6.10
msort 8.44

-----
 Input file data5:
03_003.png
03_009.png
03_007.png
03_006.png
03_004-005.png
03_010.png
03_000a.jpg
03_002.png
03_001.png
03_000b.jpg
Credits10.png
03_008.png
03_000c.jpg
Credits11.png

-----
 Compare (sed,sort,cut) and msort-hybrid:
03_000a.jpg		03_000a.jpg
03_000b.jpg		03_000b.jpg
03_000c.jpg		03_000c.jpg
03_001.png		03_001.png
03_002.png		03_002.png
03_003.png		03_003.png
03_006.png		03_004-005.png
03_007.png		03_006.png
03_008.png		03_007.png
03_009.png		03_008.png
03_010.png		03_009.png
03_004-005.png		03_010.png
Credits10.png		Credits10.png
Credits11.png		Credits11.png

See link in previous post for some details about msort.

Personally, I think I would seriously consider renaming files consistently (possibly assisted by symbolic links) and be done with it.

Best wishes ... cheers, drl

silver18 · June 25, 2012, 3:35pm

Thanks a lot, but unfortunately I must use busybox...

I'm now trying to do something different...

This is the input:

xxxholic02_c07_000.png
xxxholic02_c07_001.png
xxxholic02_c07_002.png
xxxholic02_c07_003.png
xxxholic02_c07_004.png

And I would like to have this output:

x 02 07 000 xxxholic02_c07_000.png
x 02 07 001 xxxholic02_c07_001.png
x 02 07 002 xxxholic02_c07_002.png
x 02 07 003 xxxholic02_c07_003.png
x 02 07 004 xxxholic02_c07_004.png

This means:

getting the first character of each line
getting every numeric field of each line

I managed to do it this way:

sed 's/\(.\{1\}\).*/\1/' infile > outfile1
sed "s/[^0-9]*\([0-9][0-9]*\)[^0-9]*/ \1/g;s/^[^0-9][^0-9]*$/-1/" infile | sed 's/^ *//' > outfile2

Then I used the paste command to join outfile1 and outfile2...
The problem is, I don't have paste on the Kindle...

Is there any way to run those sed commands so that the results are joined to the original lines, without needing paste?

Thanks!!!

Chubler_XL · June 25, 2012, 11:58pm

... Withdrawn (my sed skills aren't quite up to this).

Scrutinizer · June 26, 2012, 1:08am

Try this in this particular case.

sed 's/^\(.\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)/\1 \2 \3 \4 &/' infile

If it is really fixed format you could get away with this:

sed 's/\(.\).......\(..\)..\(..\)_\(...\)/\1 \2 \3 \4 &/' infile

silver18 · June 26, 2012, 4:26am

It works fine, but it gets only 3 numeric fields.
Is there a way to get every numeric field separated by a space (after getting the first character)?
Thanks again and sorry for all this mess!

Scrutinizer · June 26, 2012, 4:34am

Try:

awk '{p=$0; gsub(/[^0-9]+/,FS); print substr(p,1,1),$0,p}' infile
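
Building on that idea, a sketch of a complete decorate, sort, undecorate pipeline that needs only awk and sort. The function name sort_by_numeric_fields is made up, the limit of three numeric keys is an arbitrary assumption, missing fields are padded with -1 (the placeholder used elsewhere in this thread), and names are assumed to contain no whitespace.

```shell
#!/bin/sh
# sort_by_numeric_fields: hypothetical name, not from the thread.
# Reads names on stdin, one per line; assumes no whitespace in names.
# Assumption: only the first three numeric fields are used as sort keys.
sort_by_numeric_fields() {
    awk '{
        p = $0                      # keep the original name
        gsub(/[^0-9]+/, " ")        # turn every non-digit run into a space
        n = split($0, f, " ")       # f[1..n] now holds the numeric fields
        line = substr(p, 1, 1)      # key 1: first character of the name
        for (i = 1; i <= 3; i++)    # keys 2-4: numbers, padded with -1
            line = line " " (i <= n ? f[i] : -1)
        print line, p               # decorated line ends with the name
    }' |
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n -k4,4n |
    awk '{ print $NF }'             # undecorate: keep the last field only
}

printf '%s\n' 03_010.png 03_004-005.png Credits10.png 03_002.png a_001.png |
sort_by_numeric_fields
# prints:
# 03_002.png
# 03_004-005.png
# 03_010.png
# Credits10.png
# a_001.png
```

Padding every line to the same number of fields keeps the original name from being read as a numeric key on lines that have fewer numbers.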

Chubler_XL · June 26, 2012, 6:55pm

OK, I've been working on my sed solution. Try this (assuming there is no ":" or "@" in your filenames):

e.sed:

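# Build "<first char>@<name>:<name>"; @ and : are temporary markers
s/^\(.\).*/\1@&:&/
:a
# Drop the next non-digit run after @, keeping the digits that follow it
s/@\([0-9 ]*\)[^0-9: ][^0-9: ]*\([0-9]*\)/@\1 \2 /
ta
# No digits found at all: use -1 as the numeric key
s/@[^0-9]*:/@ -1:/
# Turn the markers into spaces, then squeeze repeated spaces
s/[@:]/ /g
s/  */ /g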

$ sed -f e.sed infile
x 02 07 000 xxxholic02_c07_000.png
x 02 07 001 xxxholic02_c07_001.png
x 02 07 002 xxxholic02_c07_002.png
x 02 07 003 xxxholic02_c07_003.png
x 02 07 004 xxxholic02_c07_004.png