Rearranging into new columns (awk?)

coryvp · July 26, 2012, 12:38pm

Hi experts,

I've used several solutions from this forum to delete nonsense and rearrange data in the project file I'm working on. I'm hoping you guys can give me some tips on further rearranging the data (I've seen a few solutions by searching, but one specific item has me stumped, which is only using the date-time fields once).

I have a text file that looks like this:

2MTEMP
2012071512
73.28
2MTEMP
2012071513
75.65
MSLP
2012071512
1014.60
MSLP
2012071513
1015.11
2MDEW
2012071512
65.52
2MDEW
2012071513
67.33

I'd like to rearrange it into columns to look like:

TIME           2MTEMP      MSLP     2MDEW  
2012071512     73.28      1014.60   65.52
2012071513     75.65      1015.11   67.33

Each field contains its own separate date/time field (eg 2012071512) but I only need it to strip out the valid times once and then use only the field name (2MTEMP,MSLP,etc) and the value following to line up as given above.

I'd really appreciate your help!

Scrutinizer · July 26, 2012, 5:21pm

Hi, try:

awk '
  {
    h=$1
    getline d
    getline v
    Date[d]
    Header[h]
    Value[h,d] = v
  }

  END {
    s="TIME\t"
    for (h in Header) {
      s = s "\t" h
    }
    print s
    for (d in Date) {
      s = d
      for (h in Header) {
        s = s "\t" Value[h,d]
      }
      print s
    }
  }
' infile

TIME		MSLP	2MDEW	2MTEMP
2012071512	1014.60	65.52	73.28
2012071513	1015.11	67.33	75.65

coryvp · July 28, 2012, 8:17am

Thanks for your help!! I have one question; I noticed when this operated on a longer dataset, it confused some of the date-time groups (for instance, it inserted 2012071618 after 2012071612, while skipping those in between--the last two digits of each group are the UTC hour, by the way) rather than going in chronological order. Is there something I could add to tell it the specific date-time format so it will keep them in order? Thanks again!

Scrutinizer · July 28, 2012, 1:33pm

To preserve the correct order:

If you are using gawk:

WHINY_USERS=1 gawk ' ....

Or you could pipe everything through sort -n and add the header later.
Otherwise a change would be required in the script...

RudiC · July 28, 2012, 1:43pm

--- lovely!

painless · July 29, 2012, 7:59am

if there are only these 3 columns of values: 2MTEMP, MSLP, 2MDEM, try these:

list="2MTEMP
2012071512
73.28
2MTEMP
2012071513
75.65
MSLP
2012071512
1014.60
MSLP
2012071513
1015.11
2MDEW
2012071512
65.52
2MDEW
2012071513
67.33"
echo "${list}"|awk '{
  header=$1;
  getline datetime
  getline value
  date_list[datetime]
  value_list[datetime,header]=value
}
END{
    for (t in date_list)
        print t"\t"value_list[t,"MSLP"]"\t"value_list[t,"2MDEW"]"\t"value_list[t,"2MTEMP"]
}'|sort