sorting multi dimensional array

phoeberunner · August 5, 2010, 1:19am

Hi there,

Can someone let me know how to sort the 2 dimensional array below by column 1 then by column 2?

22 55
2222 2230
33 66
44 58
222 240
11 25
22 60
33 45

output:
11 25
22 55
22 60
33 45
33 66
44 58
222 240
2222 2230

Thanks,
Phoebe

clx · August 5, 2010, 2:29am

Do you have this data in a file?

sort -kn1 -kn2 file

frans · August 5, 2010, 2:49am

a simple

sort -n file

does the job too

phoeberunner · August 5, 2010, 3:49pm

This is a small part of my awk script. The input data is not in a file; it is produced inside the script.

Could you suggest some code to solve this?

Thanks!

---------- Post updated at 09:38 AM ---------- Previous update was at 07:34 AM ----------

Is there any other approach that can solve this?

---------- Post updated at 12:49 PM ---------- Previous update was at 09:38 AM ----------

Has nobody run into this kind of problem? Please advise.

clx · August 6, 2010, 2:35am

Please explain the complete scenario.
if possible, show your awk script.

phoeberunner · August 9, 2010, 1:55pm

I have an input file (more than 20K records) like the one below. The information I want to manipulate is in columns 10, 11 and 13.

Column 13: The item name; an item name may appear more than once in the table.
Column 10: A string of "start position" values separated by commas.
Column 11: A string of "end position" values separated by commas.

Start                        End                          Item
90098643,90152028,90178267   90098881,90152170,90185093   B1
76540388,76779489,76877692   76540569,76779684,76878102   B2
76540388,76779489,76877692   76540569,76779684,76878102   B2
90098643,90178260            90098890,90185093            B1

I would like to find the overlapping regions for each item.

Output:

Item   Start                        End
B1     90098643,90152028,90178260   90098890,90152170,90185093
B2     76540388,76779489,76877692   76540569,76779684,76878102

Spelled out, the overlapping regions of B1 are (90098643-90098890,90152028-90152170,90178260-90185093)

The following is my script,

# Needs gawk: asort() and length(array) are gawk extensions.
{
        a[$13]++
        # Join lists from different records with a comma so they never run
        # together; the empty fields this creates are skipped in the END block.
        start[$13]=start[$13] "," $10
        end[$13]=end[$13] "," $11
}
END {
for (i in a){
        # Split the lists of this item (start[i]), not the whole array.
        mylen=split(start[i],split_start,",")
        split(end[i],split_end,",")
        mystring=""
        for (k=1;k<=mylen;k++){
                if (split_start[k]=="") continue
                pair=split_start[k]"+"split_end[k]
                if (pair in mypair) continue    # skip duplicate regions
                mypair[pair]++
                mystring=(mystring=="" ? pair : mystring","pair)
        }
        split(mystring,mylist,",")
        asort(mylist)           # compares the "start+end" strings as text
        count=length(mylist)

#---------finding overlapping regions
        ind=0
        for (z=1;z<=count;z++){
                split(mylist[z],item,"+")
                if (z==1){
                        ind+=1
                        unionlist[ind]=mylist[z]
                }else{
                        split(unionlist[ind],old,"+")
                        aa=old[1]
                        bb=old[2]
                        cc=item[1]
                        dd=item[2]
                        if (cc>bb){
                                ind+=1
                                unionlist[ind]=cc"+"dd
                        }else if (cc>=aa && cc<bb){
                                if (dd>bb){ unionlist[ind]=aa"+"dd}
                                else {unionlist[ind]=aa"+"bb}
                        }
                }
        }
        # Build the output list without a leading comma.
        for (j=1;j<=ind;j++) mystring3=(j==1 ? "" : mystring3",") unionlist[j]
        print i,length(unionlist),mystring3
        delete mypair
        delete unionlist
        delete mylist
        mystring3=""
}}

As the script above shows, I store the regions as a string in this format: (90098643+90098890,90152028+90152170,90178260+90185093).
I want to sort them in ascending order to make finding the overlapping regions easier. The function asort() is not appropriate in the following case,

The sorted output would be,

The order is incorrect, as 222+456 should have been positioned last.

I'm sure that the part that finds the overlapping regions is correct; I tested it in another programming language. The only problem I have now is the sorting part.

Could anyone suggest how to sort a two-dimensional array?

Thanks,
phoebe
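
For reference, here is a minimal sketch of one way to get the numeric order inside awk. asort() compares "start+end" values as text, character by character, which is why 222+456 lands before entries that start with a larger digit. The sketch splits each pair into two numbers and runs an insertion sort on start, then end. The function name sort_pairs and the arrays s[] and e[] are made up for illustration and are not part of the script above; it uses only plain awk features.

```shell
#!/bin/sh
# Hypothetical helper: sort a comma-separated list of "start+end" pairs
# numerically by start, then by end, without relying on asort().
sort_pairs() {
    printf '%s\n' "$1" | awk '
    {
        n = split($0, list, ",")
        for (i = 1; i <= n; i++) {
            split(list[i], p, "+")
            s[i] = p[1] + 0      # adding 0 forces a numeric value
            e[i] = p[2] + 0
        }
        # Insertion sort on the parallel arrays s[] and e[].
        for (i = 2; i <= n; i++) {
            cs = s[i]; ce = e[i]
            for (j = i - 1; j >= 1 && (s[j] > cs || (s[j] == cs && e[j] > ce)); j--) {
                s[j + 1] = s[j]; e[j + 1] = e[j]
            }
            s[j + 1] = cs; e[j + 1] = ce
        }
        # Rebuild the comma-separated list in sorted order.
        out = ""
        for (i = 1; i <= n; i++) {
            sep = (i > 1 ? "," : "")
            out = out sep s[i] "+" e[i]
        }
        print out
    }'
}

sort_pairs "22+55,2222+2230,33+66,44+58,222+240,11+25,22+60,33+45"
```

With the sample from the first post this prints 11+25,22+55,22+60,33+45,33+66,44+58,222+240,2222+2230. Insertion sort is quadratic, which should be acceptable if each item has only a handful of regions; for long lists, piping the pairs through an external sort may be the better choice.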

alister · August 9, 2010, 3:04pm

Those key definitions should be:

 sort -k1n -k2n file

or the equivalent but simpler:

sort -n -k1 -k2 file

---------- Post updated at 03:04 PM ---------- Previous update was at 03:02 PM ----------

No, it does not. It may give you the correct result with this particular dataset and your particular sort implementation, but it is definitely not a correct solution.
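
A small sketch of the difference, assuming a sort that follows the usual POSIX behaviour: with plain -n only the leading number of each line is compared numerically, and ties are broken by comparing the whole lines as text. The two rows below are invented for the demonstration, and the keys are written with explicit end fields (-k1,1n -k2,2n) so that each column is its own key.

```shell
#!/bin/sh
# Two rows that tie on column 1 but differ on column 2.
data='22 100
22 55'

# Plain numeric sort: the tie on 22 falls back to a text comparison of the
# full lines, and in the C locale "22 1..." sorts before "22 5...".
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -n)

# Keyed numeric sort: column 1 first, then column 2, both as numbers.
keyed=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1n -k2,2n)

printf 'sort -n:\n%s\n\nsort -k1,1n -k2,2n:\n%s\n' "$plain" "$keyed"
```

Here sort -n leaves 22 100 ahead of 22 55, while the keyed form puts 22 55 first.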