Another transposing issue

stevesmith · September 15, 2006, 2:13pm

Hello

I need to process a file with data such as the following, so that it breaks on column 1 and all the column-2 values for each unique column-1 value end up on a single row:
1 5
1 6
1 7
2 3
2 4
3 7
3 0
3 9

So it comes out as:
1 5 6 7
2 3 4
3 7 0 9

I've tried many iterations of nawk but can't get it working!!!

Thanks in advance!
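A minimal Python sketch of the requested transformation, assuming whitespace-separated two-column input. `group_in_order` is a hypothetical helper name. It relies on Python 3.7+ dicts preserving insertion order, so rows come out in first-seen key order and each row keeps its values in input order, which matches the requested output:

```python
def group_in_order(lines):
    """Collect column-2 values under their column-1 key, keeping input order."""
    groups = {}  # Python 3.7+ dicts remember insertion order
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or short lines
        key, value = fields[0], fields[1]
        groups.setdefault(key, []).append(value)
    # one output row per unique key: "key v1 v2 ..."
    return [" ".join([key] + values) for key, values in groups.items()]


sample = ["1 5", "1 6", "1 7", "2 3", "2 4", "3 7", "3 0", "3 9"]
for row in group_in_order(sample):
    print(row)
```

This prints `1 5 6 7`, `2 3 4` and `3 7 0 9`, one per line.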

vgersh99 · September 15, 2006, 2:20pm

what have you tried so far?

anbu23 · September 15, 2006, 2:22pm

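awk '
{
   # first time a key is seen, start its row with the key; otherwise append $2
   if ( arr[$1] == "" )
        arr[$1]=$1 " " $2
   else
        arr[$1]=arr[$1] " " $2
}
END{
   # note: for (key in arr) visits keys in an unspecified order
   for( key in arr)
        print arr[key]
}
' file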
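The `for (key in arr)` loop above walks the array in an unspecified order, so the rows can come out shuffled, as the replies below point out. A common awk fix is to record each key in a separate numerically indexed array the first time it is seen (e.g. `if (!($1 in arr)) keys[++n] = $1`) and loop over that in END. The same idea, sketched in Python with an explicit `order` list instead of relying on the table's own iteration order (names are illustrative, not from the thread):

```python
def group_with_key_list(lines):
    """Group column 2 by column 1, tracking first-seen key order explicitly."""
    arr = {}    # key -> "key v1 v2 ..." string, like arr[$1] in the awk script
    order = []  # keys in first-seen order, like keys[1..n] in the awk idiom
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        key, value = fields[0], fields[1]
        if key not in arr:
            order.append(key)
            arr[key] = key + " " + value
        else:
            arr[key] = arr[key] + " " + value
    # END block: walk the explicit key list, not the lookup table
    return [arr[key] for key in order]


sample = ["1 5", "1 6", "1 7", "2 3", "2 4", "3 7", "3 0", "3 9"]
print("\n".join(group_with_key_list(sample)))
```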

vgersh99 · September 15, 2006, 2:39pm

and how does this accomplish what the OP wants?

vgersh99 · September 15, 2006, 2:46pm

here's one way of sorting by both rows [by $1] and columns within the rows:

nawk -f steve.awk steve.txt | sort -k 1n,1

steve.awk:

function isort(A,n,     i,j,t) {
    for (i = 2; i <= n; i++)
        for (j = i; j > 1 && A[j-1] > A[j]; j--) {
              # swap A[j-1] and A[j]
              t = A[j-1]; A[j-1] = A[j]; A[j] = t
        }
}

{
  arr[$1] = ($1 in arr) ? arr[$1] OFS $2 : $2
}

END {
  for ( i in arr ) {
    n=split(arr[i], tmpA, OFS)
    isort(tmpA, n)
    printf("%s%s", i, OFS )
    for(j=1; j <= n; j++)
       printf("%s%s", tmpA[j], (j==n) ? "\n" : OFS)
  }
}
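The `isort` function above is a plain insertion sort over a 1-based awk array. A Python sketch of the same loop (0-based here), applied to one row; converting the values with `int()` so the comparison is numeric is an assumption about the data:

```python
def isort(values):
    """Insertion sort with the same loop structure as the awk isort() (0-based)."""
    a = list(values)  # sort a copy; the awk version sorts the array in place
    for i in range(1, len(a)):
        j = i
        # move a[j] left while its left neighbour is larger
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]  # swap a[j-1] and a[j]
            j -= 1
    return a


row = "3 7 0 9".split()
key, values = row[0], [int(v) for v in row[1:]]
print(key, " ".join(str(v) for v in isort(values)))  # 3 0 7 9
```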

anbu23 · September 15, 2006, 3:00pm

Did you try running this code?
This code gives the output he posted.
If he wants the second column to be sorted:
sort -kn1 -kn2 file | awk ' ...'
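Sorting the input on both columns first only helps if the grouping step keeps the input order. A sketch of sort-then-group in Python, using `itertools.groupby`, which merges only *adjacent* items with equal keys, so the pre-sort is what makes it correct. It assumes both columns are integers:

```python
from itertools import groupby


def sort_then_group(lines):
    """Sort rows numerically on column 1, then column 2, then merge adjacent keys."""
    rows = [line.split() for line in lines if line.strip()]
    rows.sort(key=lambda r: (int(r[0]), int(r[1])))  # like sort -n -k 1,1 -k 2,2
    out = []
    # groupby merges only adjacent rows with equal keys, hence the sort above
    for key, group in groupby(rows, key=lambda r: r[0]):
        out.append(" ".join([key] + [r[1] for r in group]))
    return out


sample = ["1 5", "1 6", "1 7", "2 3", "2 4", "3 7", "3 0", "3 9"]
print("\n".join(sort_then_group(sample)))
```

Because the output is built in one streaming pass over sorted rows, no hash iteration order is involved.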

vgersh99 · September 15, 2006, 3:27pm

well..... even after fixing the 'sort' options, this syntax is illegal on Sun/Solaris....
Also, using the OP's sample input file [steve.txt]...

$ sort -n -k 1,1 -k 2,2 steve.txt | nawk -f steve1.awk 
2 3 4
3 0 7 9
1 5 6 7

are you seeing different results?

anbu23 · September 15, 2006, 3:32pm

I don't know the syntax on Sun/Solaris. I got the same result as yours.

vgersh99 · September 15, 2006, 3:34pm

you mean these results?

2 3 4
3 0 7 9
1 5 6 7

if that's the case, do you think that's what the OP wanted?

ghostdog74 · September 15, 2006, 9:15pm

alternatively in Python:

store = {}
with open("datafile.txt") as f:
    for line in f:
        key, value = line.split()
        if key in store:
            store[key].append(value)
        else:
            store[key] = [value]

for key in sorted(store):
    print(key, " ".join(store[key]))

Output:

1 5 6 7
2 3 4
3 7 0 9
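`sorted(store)` compares the keys as strings, which gives 1, 2, 3 for this sample but would put a key such as `11` before `2`. If column 1 is always an integer (an assumption), passing `key=int` orders the rows numerically, as `sort -k 1n,1` does in the nawk pipeline. `format_rows` is a hypothetical helper:

```python
def format_rows(store, numeric=True):
    """Build "key v1 v2 ..." rows from a {key: [values]} dict, ordered by key."""
    order = sorted(store, key=int) if numeric else sorted(store)
    return [" ".join([k] + store[k]) for k in order]


store = {"1": ["5", "6", "7"], "2": ["3", "4"], "11": ["2", "0"]}
print(format_rows(store, numeric=False))  # ['1 5 6 7', '11 2 0', '2 3 4']
print(format_rows(store))                 # ['1 5 6 7', '2 3 4', '11 2 0']
```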

futurelet · September 15, 2006, 11:34pm

Ruby:

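h = Hash.new{[]}         # a missing key reads as a fresh empty array (not stored)
while line = gets
  k,v = line.split
  h[k] <<= v             # h[k] = h[k] << v, so the array is stored back
end
puts h.sort.map{|a| a.join " "}   # sort by key; join flattens [k, [v, ...]]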

sayonm · September 16, 2006, 3:52am

for i in `awk '{print $1}' file | uniq`
do
        echo $i `grep $i file | awk '{print $2}'`
done

hope this helps.........

cheers,
sayon
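The loop above first lists the distinct column-1 values with `awk ... | uniq`, then pulls each key's values out of the file. `uniq` collapses only *adjacent* duplicate lines, so a key that appears in two separate runs is listed, and printed, twice. A Python sketch contrasting that behaviour (emulated with `itertools.groupby`) with an order-preserving de-duplication via `dict.fromkeys`; the key list is made up for illustration:

```python
from itertools import groupby


def uniq(items):
    """Like uniq(1): drop only consecutive repeats."""
    return [k for k, _ in groupby(items)]


def distinct(items):
    """Drop all repeats, keeping first-seen order (dict keys keep insertion order)."""
    return list(dict.fromkeys(items))


keys = ["1", "1", "2", "1", "3"]  # hypothetical input where key 1 is not contiguous
print(uniq(keys))      # ['1', '2', '1', '3']: key 1 would be processed twice
print(distinct(keys))  # ['1', '2', '3']
```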

matrixmadhan · September 16, 2006, 4:22am

I tried it with a modified sample input and it fails:

>echo 1 `grep 1 s | awk '{print $2}'`
5 6 7 2 0

a slight modification to the script:

for i in `awk '{print $1}' s | uniq`
do
  awk -F" " 'BEGIN{x='$i'} { if( $1=='$i' ) x=x" "$2} END{print x}' s
done

>newoutput
1 5 6 7
2 3 4
3 7 0 9
11 2 0
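The failure above comes from `grep 1` matching the digit anywhere on the line, including in `11 2`. The modified script instead compares the first field with the key. A Python sketch of the two kinds of match; the exact modified input is not shown in the thread, so the two `11` lines are assumed from the output above. Keys are compared as strings here, whereas the awk `$1==...` test compares them as numbers:

```python
def values_substring(lines, key):
    """What grep KEY does: any line containing the text KEY anywhere."""
    return [line.split()[1] for line in lines if key in line]


def values_field(lines, key):
    """What $1 == key does: only lines whose first field equals KEY."""
    return [line.split()[1] for line in lines if line.split()[0] == key]


lines = ["1 5", "1 6", "1 7", "2 3", "2 4", "3 7", "3 0", "3 9", "11 2", "11 0"]
print(values_substring(lines, "1"))  # ['5', '6', '7', '2', '0']
print(values_field(lines, "1"))      # ['5', '6', '7']
```

The same substring effect explains the extra `3` in the `3 3 7 0 9` row: `grep 3` also matches the line `2 3`.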

ghostdog74 · September 16, 2006, 4:30am

your script will be slower than one done in memory, since it rereads the whole file once for every key. Just my opinion.

anyway, for the sample input data, i got

1 5 6 7
2 3 4
3 3 7 0 9
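The speed point can be made concrete by counting line reads instead of timing anything: the loop-per-key scripts scan the whole file once to list the keys and then once more for every distinct key, while a single in-memory pass reads each line once. A rough sketch with hypothetical helper names:

```python
def reads_per_key_scan(lines):
    """Lines visited when the file is rescanned once per distinct key."""
    keys = list(dict.fromkeys(line.split()[0] for line in lines))
    visited = len(lines)  # first pass: list the keys
    for _ in keys:
        visited += len(lines)  # one full scan per key (grep or awk in the loop)
    return visited


def reads_single_pass(lines):
    """Lines visited by an in-memory grouping: each line is read once."""
    return len(lines)


lines = ["1 5", "1 6", "1 7", "2 3", "2 4", "3 7", "3 0", "3 9"]
print(reads_per_key_scan(lines))  # 8 + 3 * 8 = 32
print(reads_single_pass(lines))   # 8
```

The gap grows with the number of distinct keys, which is why the per-key loop slows down on large files.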

sayonm · September 16, 2006, 4:48am

a small modification to the previous code:-

for i in `awk '{print $1}' file | uniq`
do
        echo $i `grep "^$i " file | awk '{print $2}'`
done

the output is:-

[sayonm@zion ~]$ sh script.sh
1 5 6 7
2 3 4
3 7 0 9
11 2 0
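The fix works because `^$i ` matches only when the key is the whole first field: it must start the line and be followed by a space. The same anchored match with Python's `re` module is sketched below; `re.escape` guards against keys containing regex metacharacters, and, like the grep pattern, it assumes a single space separator (tab-separated input would not match):

```python
import re


def values_for_key(lines, key):
    """Column-2 values from lines starting with KEY plus a space (like grep "^KEY ")."""
    pattern = re.compile("^" + re.escape(key) + " ")
    return [line.split()[1] for line in lines if pattern.match(line)]


lines = ["1 5", "1 6", "1 7", "2 3", "2 4", "3 7", "3 0", "3 9", "11 2", "11 0"]
for key in ["1", "2", "3", "11"]:
    print(key, " ".join(values_for_key(lines, key)))
```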

now it's working fine... thanks ghostdog74 and matrixmadhan for pointing out the mistake :o
about the memory usage... yes, I agree this will be a bit heavy, but if the file isn't too big that won't be much of a problem, and nowadays almost everyone has >= 256 MB of memory, so I don't think that is a problem...
cheers,
sayon