Rows to columns

jimmyf · January 5, 2017, 2:45pm

Trying to sort an array; only the 4th column is unique.

have this:

dog cat bird house1
dog cat bird house2
dog cat bird house3
rose daisy tulip house1
rose daisy tulip house3

would like this:

dog cat bird house1 house2 house3
rose daisy tulip house1 missing house3

Any help would be greatly appreciated.


Apologies, it should say columns to rows.

RudiC · January 5, 2017, 3:26pm

Do you think you can adapt this solution from the lower left of this page (under "More UNIX and Linux Forum Topics You Might Find Helpful")?

vgersh99 · January 5, 2017, 3:29pm

A bit on the verbose side, but...
awk -f jim.awk myFile, where jim.awk is:

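# Record each distinct 4th field, each full input line, and each distinct key (fields 1-3).
{f4A[$4]; pA[$0]; a[$1 OFS $2 OFS $3]}
END {
  for (i in a) {
    printf("%s", i)
    # Print the 4th-field value if this key/value line was seen, else "missing".
    for (j in f4A)
      printf("%s%s", OFS, ((i OFS j) in pA)? j:"missing")
    printf("%s", ORS)
  }
}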

jimmyf · January 5, 2017, 4:54pm

Brilliant thank you!

Don_Cragun · January 5, 2017, 8:02pm

Note that with the script suggested by vgersh99, the output rows need not appear in the same order as the input lines, and the 4th through last output columns need not appear in the order in which their values first appear in the input.

For a given version of awk, a for loop of the form for(index_variable in array) will process the elements of array[] in a consistent order as long as no elements are added to or removed from array[] between invocations of that loop. Different implementations of awk, however, may return the values of index_variable in different arbitrary orders, and within a given version of awk different values stored in an array might produce different output orders relative to the order of appearance of those values in the input.
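
As a small illustration of the point above (a sketch only, not part of either script in this thread): if sorted order is acceptable, piping the output of a for(... in ...) loop through sort gives the same order under any awk implementation. The function name distinct_f4_sorted is made up for this example, and it assumes whitespace-separated input on standard input.

```shell
# Hypothetical helper: print the distinct 4th-field values, one per line, sorted.
distinct_f4_sorted() {
	awk '
	{	f4A[$4] }		# collect the distinct 4th-field values
	END {	for(j in f4A)		# traversal order is unspecified here
			print j
	}' | sort			# sort makes the final order reproducible
}
```

Called as distinct_f4_sorted < file on the sample data in the first post, this should print house1, house2 and house3 on separate lines.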

If it is important to keep the order of the columns in the output the same as the order of appearance of different values in the 4th field in the input file, you could try something like:

awk '
{	if(!($4 in f4A)) {
		f4A[$4]
		of4a[++cf4A] = $4
	}
	pA[$0]
	a[$1 OFS $2 OFS $3]
}
END {	for (i in a) {
		printf("%s", i)
		for(j = 1; j <= cf4A; j++)
			printf("%s%s", OFS, ((i OFS of4a[j]) in pA) ? \
			    of4a[j] : "missing")
		print ""
	}
}' file

which uses the of4a[] array to keep track of the order in which entries were added to the f4A[] array. Another array could be defined to keep track of the input order of the elements of the a[] array if the order of the output rows is important in addition to the order of the output columns.
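
A minimal sketch of that second idea, assuming the same four-field input: the array oa[] and counter ca (names invented here, as is the function name pivot_keep_order) record the order in which row keys first appear, just as of4a[] does for the columns. One small departure from the scripts above: pA[] is keyed on the rebuilt fields instead of $0, so input separated by tabs or runs of spaces should still match.

```shell
# Hypothetical helper: reads the 4-field data on standard input.
pivot_keep_order() {
	awk '
	{	key = $1 OFS $2 OFS $3
		if(!(key in a)) {		# first time this row key is seen
			a[key]
			oa[++ca] = key		# oa[] records row order (assumed name)
		}
		if(!($4 in f4A)) {		# first time this 4th-field value is seen
			f4A[$4]
			of4a[++cf4A] = $4	# of4a[] records column order
		}
		pA[key OFS $4]			# remember which row/column pairs exist
	}
	END {	for(i = 1; i <= ca; i++) {
			printf("%s", oa[i])
			for(j = 1; j <= cf4A; j++)
				printf("%s%s", OFS, ((oa[i] OFS of4a[j]) in pA) ? of4a[j] : "missing")
			print ""
		}
	}'
}
```

Run as pivot_keep_order < file on the sample data from the first post, this should give:

dog cat bird house1 house2 house3
rose daisy tulip house1 missing house3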

vgersh99 · January 6, 2017, 9:13am

That's absolutely on the money, Don!

jimmyf · January 9, 2017, 1:56pm

I am in awe of you both.