Data formatting

srikanth38 · January 4, 2013, 5:20am

I have data like the following, and I need to format it as shown below. Could you please help me with this?

Input:

aa|3|1
aa|4|2
bb|3|1
bb|4|1
cc|3|26
cc|4|1

Output:

aa|3|1|4|2
bb|3|1|4|1
cc|3|26|4|1

Thanks,
Srikanth

Don_Cragun · January 4, 2013, 8:55am

Please clarify:

Do you want sets of two adjacent lines with the same value in the 1st field combined into a single line?
Do you want sets of two adjacent lines to be combined no matter what is in the 1st field?
Do you want all lines in your input with the same first field combined into a single line (even if they are not adjacent)?
Are there always 3 fields in input lines, or do you want all fields combined no matter how many fields there are?

srikanth38 · January 7, 2013, 7:22am

Please find my answers.

Do you want sets of two adjacent lines with the same value in the 1st field combined into a single line? Yes
Do you want sets of two adjacent lines to be combined no matter what is in the 1st field? No
Do you want all lines in your input with the same first field combined into a single line (even if they are not adjacent)? No
Are there always 3 fields in input lines, or do you want all fields combined no matter how many fields there are? Yes

Don_Cragun · January 7, 2013, 8:04am

Answering "Yes" to the last question doesn't say which of the two behaviors you want implemented. The following should work either way, but it is more complex than needed if there are always 3 input fields per line.

awk 'BEGIN {FS = OFS = "|"}
NR % 2 {o = $0          # This is the 1st line in a pair of lines.
        key = $1
        next
}
{       # This is the 2nd line in a pair of lines.
        if(key != $1) { # The 1st fields differ: report it and give up.
            printf("1st field on line %d (%s) != 1st field on line %d (%s)\n",
                NR - 1, key, NR, $1)
            exit 1
        }
        for(i = 2; i <= NF; i++) o = o OFS $i
        print o
}' input_file

NOTE: On Solaris systems, use /usr/xpg4/bin/awk or nawk instead of awk.
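
If every input line really does have exactly 3 fields, the pairing can be written more compactly. The following is only a minimal sketch under that assumption; the function name merge_pairs3 is made up for illustration, and sending the error message to stderr through "cat 1>&2" is my choice rather than something from the script above.

```shell
# Hypothetical helper: merge each pair of adjacent 3-field lines.
# Reads the files named as arguments, or stdin if there are none.
merge_pairs3() {
    awk 'BEGIN { FS = OFS = "|" }
    # Odd-numbered line: remember it and wait for its partner.
    NR % 2 { first = $0; key = $1; next }
    # Even-numbered line whose 1st field differs: complain and stop.
    $1 != key {
        printf("1st field on line %d (%s) != 1st field on line %d (%s)\n",
            NR - 1, key, NR, $1) | "cat 1>&2"
        exit 1
    }
    # Even-numbered line with a matching key: append its 2nd and 3rd fields.
    { print first, $2, $3 }' "$@"
}

printf '%s\n' 'aa|3|1' 'aa|4|2' 'bb|3|1' 'bb|4|1' | merge_pairs3
# prints:
# aa|3|1|4|2
# bb|3|1|4|1
```

Like the longer script, this silently drops a trailing line that has no partner.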

Fundix · January 7, 2013, 10:53am

A Perl script (maybe a little too long), but it does the job:

Input file:

aa|3|1
aa|4|2
bb|3|1
bb|4|1
cc|3|26
cc|4|1
aa|8|5
aa|7|9

Script:

#!/usr/bin/perl -w
use strict;

my $cur_dir = $ENV{PWD};
my $filename = $cur_dir."/file008";
my ($record,@fields,%hash,$key,$flg);

open(FILE,"<$filename") or die"open: $!";

while( defined( $record = <FILE> ) ) {
  chomp $record;
  @fields=split(/\|/,$record);

  # New key and not the 1st one ?
  if( (defined($key)) && ($key ne $fields[0]) ) {
    $flg=1;
  } else {
    $flg=0;
  }

  # Processing new key or same key
  if( $flg == 1) {
    # new key printing previous one and processing new one
    print "$key$hash{$key}\n";
    delete( $hash{$key} );
    $key=$fields[0];
    $hash{$key}="\|$fields[1]\|$fields[2]";
  } else {
    # processing existing key
    $key=$fields[0];
    $hash{$key}.="\|$fields[1]\|$fields[2]";
  }
}

# Printing the last key (also when it appeared on the final line only)
if( defined($key) ) {
  print "$key$hash{$key}\n";
}

close(FILE);

Output:

aa|3|1|4|2
bb|3|1|4|1
cc|3|26|4|1
aa|8|5|7|9
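
For comparison, the same idea (keep appending fields while the 1st field of adjacent lines stays the same, and print when it changes) can probably be sketched in awk as well. This is an untuned sketch, not a replacement for the script above; the function name merge_runs is made up, and it assumes "|"-separated input with the key in the 1st field.

```shell
# Hypothetical helper: merge every run of adjacent lines sharing a 1st field.
# Reads the files named as arguments, or stdin if there are none.
merge_runs() {
    awk 'BEGIN { FS = OFS = "|" }
    # 1st line, or the key changed: print the finished run, start a new one.
    NR == 1 || $1 != key {
        if (NR > 1) print out
        out = $0
        key = $1
        next
    }
    # Same key as the previous line: append every field after the 1st.
    { for (i = 2; i <= NF; i++) out = out OFS $i }
    # Print the last run, unless the input was empty.
    END { if (NR) print out }' "$@"
}

printf '%s\n' 'aa|3|1' 'aa|4|2' 'bb|3|1' 'bb|4|1' 'aa|8|5' 'aa|7|9' | merge_runs
# prints:
# aa|3|1|4|2
# bb|3|1|4|1
# aa|8|5|7|9
```

Because it only compares each line with the previous one, a key that comes back later (aa here) starts a new output line instead of being merged with its earlier run.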

Corona688 · January 7, 2013, 2:06pm

awk -F"|" '{ T=$1 ; sub(/^[^|]+[|]/, ""); A[T]=A[T]"|"$0 } END { for(X in A) print X A[X]; }' inputfile
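
Two things to keep in mind with an array keyed on the 1st field: it merges all lines with the same key whether or not they are adjacent, and "for(X in A)" visits the keys in an unspecified order. If the output should follow the order in which keys first appear, something like the following sketch might do. The function name merge_by_key and the helper arrays acc and order are made up for illustration.

```shell
# Hypothetical helper: merge ALL lines with the same 1st field,
# printing keys in the order they were first seen.
merge_by_key() {
    awk -F'|' '{
        key = $1
        rest = substr($0, length(key) + 1)    # e.g. "|3|1": the line minus its key
        if (!(key in acc)) order[++n] = key   # remember first-seen order
        acc[key] = acc[key] rest
    }
    END { for (i = 1; i <= n; i++) print order[i] acc[order[i]] }' "$@"
}

printf '%s\n' 'aa|3|1' 'bb|3|1' 'aa|4|2' | merge_by_key
# prints:
# aa|3|1|4|2
# bb|3|1
```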