Sorting based on Multiple columns

Hi,

I have a requirement whereby I have to sort a flat file based on Multiple Columns (similar to ORDER BY Clause of Oracle). I am getting 10 columns in the flat file and I want the file to be sorted on 1st, 3rd, 4th, 7th and 9th columns in ascending order. The flat file is pipe seperated.
Any help will be highly appreciated.

Thanks,
Dharmesh

There is a good man-page for your question :

man sort

It does it all :slight_smile:

Regs David

Hi !

Try to use this command :

sort +1 +3 +4 +7 +9 file_to_be_sorted

witt

From witt

Hi !

Try to use this command :

sort +1 +3 +4 +7 +9 file_to_be_sorted
------------------------------------------------------------------------------------

here sort +1 means sort from the 2nd field, we need to use something like

sort +0 -1 +2 -3 +4 -5 +6 -7 +8 -9 filename

Just try out and let me know the result, I am also telling based on the property of the sort utility, not too sure.

Cheers
JK

hi ,
the commands
sort +1 +3 +4 +7 +9 file_to_be_sorted
and
sort +0 -1 +2 -3 +4 -5 +6 -7 +8 -9 filename
sort on the forst field only

Well I am not too sure what you exactly want the output to be but sort +1 -2 +3 -4 will sort the output based on 2nd field and if second field is same it will see the next specified field in this case it is 4th field.

I had given the examples below

# cat junk [This is a junk file which contains data]
5 3 2 9
3 4 1 7
5 2 3 1
6 1 3 6
1 2 4 5
3 1 2 3
5 2 2 3
# sort +1 -2 +3 -4 junk [sort based on the 2nd and 4th field]
3 1 2 3
6 1 3 6
5 2 3 1
5 2 2 3
1 2 4 5
5 3 2 9
3 4 1 7
# sort +1 -2 junk [sort based on the 2nd field]
3 1 2 3
6 1 3 6
1 2 4 5
5 2 2 3
5 2 3 1
5 3 2 9
3 4 1 7
# sort +1 +3 junk [See here it is same as sort +1 it checks for the 3rd field if the 2nd field is same, it does not go for the 4the field ]
3 1 2 3
6 1 3 6
5 2 2 3
5 2 3 1
1 2 4 5
5 3 2 9
3 4 1 7
# sort +1 junk
3 1 2 3
6 1 3 6
5 2 2 3
5 2 3 1
1 2 4 5
5 3 2 9
3 4 1 7
#

If this is not what you wanted, post your data and the expected output whatever you want, we will look into that

Cheers
JK

1705614|FNP|2|99|U|1/20/2009 0:00:00
1705614|MSN|1|99|C|1/2/2009 0:00:00

I want the above data to be sorted based on 0 / 4 / 3 / 2 fields (assume above data in abc.txt file)

sort -t '|' +0n +4 +3n +2nr abc.txt
1705614|MSN|1|99|C|1/2/2009 0:00:00
1705614|FNP|2|99|U|1/20/2009 0:00:00

in case my 4 and 3 fields are same I want to sort by field 2 in descending order, that means if the data is

1705614|MSN|1|99|C|1/2/2009 0:00:00
1705614|FNP|2|99|C|1/20/2009 0:00:00

I want this to be flipped like

1705614|FNP|2|99|C|1/20/2009 0:00:00
1705614|MSN|1|99|C|1/2/2009 0:00:00

with the above sort command it doesn't work...please help

sort -t '|' -k1n -k4n -k3rn -k2nr abc.txt

vgersh,

I'm sorry to say that it doesn't work...I'm trying to be clear here

1705614|FNP|2|99|C|1/20/2009 0:00:00
1705614|MSN|1|88|U|1/2/2009 0:00:00

should be sorted on 0th field i.e 1705614
1705614
Then on 4th field i.e C ( C is should come first)
U
Should both be the 'C' s,
sort on 3rd field i.e 99 (Ascending)
88
Should both be the same numbers,
sort on 2nd field i.e 2 (descending)
1
I hope I am clear...Thanks

Fields and characters within fields are numbered starting with 1

sort -t '|' -k1n,1 -k5,5 -k4n,4 -k3n,3r myFile

Vgersh,

I appreciate your help...Seems working...Thank you so much

vgersh,

Hope I'm not bothering you...one last question for now

1705614|FNP|2|99|C|1/2/2009 0:00:00
1705614|MSN|2|99|C|1/22/2009 0:00:00

Assuming all those previously sorted fields equal < can i sort the above
to get the second row up since the date is recent...

Thanks again...

nawk -F'|' '{n=split($NF, a, "[ /]"); gsub(":", "", a[n]);printf("%d%02d%02d%s%s\n", a[3], a[1], a[2], a[n], OFS $0)}' OFS='|' myFile | sort -rn -k1,1 | cut -d '|' -f2-

sample.txt file
1873072|PAC|2|3|C|1/4/2009 0:00:00
1873072|PA|2|3|U|1/9/2009 0:00:00

nawk -F'|' '{n=split($NF, a, "[ /]"); gsub(":", "", a[n]);print a[3] a[1] a[2] a[n] OFS $0}' OFS='|' sample.txt | sort -rn -k1,1 | cut -d '|' -f2-

gives

1873072|PA|2|3|U|1/9/2009 0:00:00
1873072|PAC|2|3|C|1/4/2009 0:00:00

but In the above case I need the data in reverse, because column 5 should be preferred for 'C', only in case if both are 'C's or 'U's then i want this to be sorted on the last column i.e date

Thanks
-venkat

so why don't you combine the 2 proposed solutions?
You understand what both of them do, don't you?

nawk -F'|' '{n=split($NF, a, "[ /]"); gsub(":", "", a[n]);printf("%d%02d%02d%s%s\n", a[3], a[1], a[2], a[n], OFS $0)}' OFS='|' myFile | sort -t '|' -k2n,2 -k6,6 -k5n,5 -k4n,4r -k1n,1r | cut -d '|' -f2-

Good luck.

vgersh,

that works...Thanks a lot...

Honestly i still do not completely understand the code which is working totally fine....Trying to learn...

I appreciate it...

Not sure If i can contact you for any help needed in this area...:slight_smile:

Thanks
-venkat