Sort data file by case

Hello,
I'm trying to sort a large data file by the 3rd column so that every line whose first word in the 3rd column is all uppercase appears before (or after) the lines where that word is not all uppercase. For example,

Data file:

xxx	12345		Rat in the house
xxx	12345		CAT in the hat
xxx	12345		Dog in the yard
xxx	12345		BAT in the belfry
xxx	12345		Duck in the pond
xxx	12345		ANT in the garage

Desired output:

xxx	12345		CAT in the hat
xxx	12345		BAT in the belfry
xxx	12345		ANT in the garage
xxx	12345		Rat in the house
xxx	12345		Dog in the yard
xxx	12345		Duck in the pond

In this example, the order within the first three lines of the output file (and within the last three) doesn't matter, as long as the two groups are separated by case.
Thanks very much!

You could try something like:

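tmpfile="NC$$"		# scratch file for the lines that are not all uppercase
> "$tmpfile"
awk -v f="$tmpfile" '
# Field 3 contains a character that is not uppercase:
# hold the line back in the scratch file.
match($3, /[^[:upper:]]/) {
	print > f
	next
}
# Everything else (field 3 all uppercase) goes straight to standard output.
1' file
cat "$tmpfile"		# now append the held-back lines
rm -f "$tmpfile"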

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.

If file contains the data in your sample input file, this writes exactly the output you said you wanted to standard output.
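
If you would rather not create a scratch file, the same test on field 3 can be run twice over the input, once for each group. This is only a sketch: sort_by_case is a made-up function name, it assumes the input is a regular file that can be read twice (not a pipe), and it assumes your awk understands [[:upper:]].

```shell
# sort_by_case is a hypothetical helper name, not something from the post above.
# It assumes $1 names a regular file, because the file is read twice.
sort_by_case() {
    # Pass 1: lines whose third field has no non-uppercase character.
    awk '$3 !~ /[^[:upper:]]/' "$1"
    # Pass 2: all remaining lines, in their original order.
    awk '$3 ~ /[^[:upper:]]/' "$1"
}
```

Call it as sort_by_case file. The trade-off is reading the data twice instead of writing part of it to a temporary file; lines keep their original relative order within each group either way.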

Hi.

       -d, --dictionary-order
              consider only blanks and alphanumeric characters

-- man sort

From your sample input file, saved as data1:

xxx	12345		Rat in the house
xxx	12345		CAT in the hat
xxx	12345		Dog in the yard
xxx	12345		BAT in the belfry
xxx	12345		Duck in the pond
xxx	12345		ANT in the garage

via:
sort -d -k3,3 data1
to:

xxx	12345		ANT in the garage
xxx	12345		BAT in the belfry
xxx	12345		CAT in the hat
xxx	12345		Dog in the yard
xxx	12345		Duck in the pond
xxx	12345		Rat in the house

Best wishes ... cheers, drl

Using sort -d works with the sample data given only because ANT, BAT, and CAT (starting with A, B, and C) come before Dog, Duck, and Rat (starting with D and R). If ANT had been Ant and Dog had been DOG, sort -d would not put DOG before Ant.
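
A quick way to see this is to feed sort -d two lines with the case swapped. The two input lines are made up for illustration, and LC_ALL=C is set here only to make the collation order predictable.

```shell
# Two made-up lines: a mixed-case word starting with A, an uppercase word starting with D.
result=$(printf 'xxx\t12345\t\tDOG in the yard\nxxx\t12345\t\tAnt in the garage\n' |
    LC_ALL=C sort -d -k3,3)
# The Ant line comes out first, because sort compares letters, not case groups.
printf '%s\n' "$result"
```

So the mixed-case line lands ahead of the all-uppercase one, which is not the grouping that was asked for.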

Hi, Don.

Agreed ... cheers, drl