awk/nawk question to format a file

kumar04 · March 1, 2009, 11:51am

Hi,

I am new to awk/nawk and need help.

I want to merge the rows that have the same emplid into a single row in the following file. In practice, this kind of file will have around 50k rows.

Here is my input file:

id|emplid|firstname|dep|lastname
1|001234|test|1001|1
2|002345|test|1032|2
3|001234|test|1020|1
4|123456|test|1044|4

If you look closely, lines 1 and 3 have the same emplid (001234) but different dep values.
I want to merge lines 1 and 3 into one line (renumbering the id column), like this:

id|emplid|firstname|dep|lastname
1|001234|test|1001 1020|1
2|002345|test|1032|2
3|123456|test|1044|4

Essentially, I am trying to combine rows that share the same emplid into one row, concatenating their dep values.

Can you please help me do this in awk/nawk?

Please don't hesitate to ask if it needs more explanation.
Thanks in advance for your help.

Regards,

vgersh99 · March 1, 2009, 1:36pm

There are plenty of similar threads - please use the Search function first.
Please do come back if you have a specific implementation question.

summer_cherry · March 1, 2009, 10:05pm

Hi, Perl seems a little easier for this problem:

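use strict;
use warnings;

my ($n, %hash) = (1);
open my $fh, '<', 'a.txt' or die "Cannot open a.txt: $!";
my $header = <$fh>;    # print the header line unchanged instead of numbering it
print $header;
while (<$fh>) {
  chomp;
  my @tmp = split /[|]/, $_;
  if ( not exists $hash{$tmp[1]}){
    # first row for this emplid: remember its position and the whole row
    $hash{$tmp[1]}->{SEQ} = $n;
    $hash{$tmp[1]}->{VAL} = $_;
    $n++;
  }
  else {
    # emplid seen before: append this row's dep to the stored row's dep
    my @t = split /[|]/, $hash{$tmp[1]}->{VAL};
    my $pre  = join "|", @t[0..2];
    my $mid  = $t[3] . " " . $tmp[3];
    my $post = $t[4];
    $hash{$tmp[1]}->{VAL} = $pre . "|" . $mid . "|" . $post;
  }
}
close $fh;
# print rows in first-seen order, replacing the old id with the sequence number
for my $key (sort { $hash{$a}->{SEQ} <=> $hash{$b}->{SEQ} } keys %hash){
  print $hash{$key}->{SEQ}."|".substr($hash{$key}->{VAL},index($hash{$key}->{VAL},"|")+1),"\n";
}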

kumar04 · March 2, 2009, 4:50pm

Thank you for your valuable input. I'll try the scripts. Regards

kumar04 · March 3, 2009, 1:12pm

Hello Summer Cherry,

It's really a great help! I appreciate it from my heart. You are great!

-Kumar

ripat · March 3, 2009, 2:22pm

I don't know about performance, but an awk solution seems terser than Perl:

awk -F'|' -v OFS='|' 'NR > 1 {a[$2] = ($2 in a) ? a[$2] " " $4 : $4} END {for (i in a) {nb += 1; print nb, i, a[i]}}' file

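Note, though, that the id numbering may not be what you expect: for (i in a) visits the keys in an unspecified order, so the rows can come out in any order. It also prints only the id, emplid and dep columns.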
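If the original row order and all the columns matter, one way around the unspecified for (i in a) order is to remember the order in which each emplid first appears. This is a minimal sketch of that idea in the same awk style; merge_by_emplid is a hypothetical helper name, and it assumes, as in the sample file, that the first line is a header, fields are separated by '|', emplid is the 2nd field and dep the 4th.

```shell
# merge_by_emplid: hypothetical helper. Reads pipe-delimited rows from the
# files given as arguments (or stdin) and merges rows sharing an emplid.
merge_by_emplid() {
  awk -F'|' -v OFS='|' '
    NR == 1 { print; next }                  # pass the header through unchanged
    !($2 in dep) {                           # first time this emplid is seen
      order[++n] = $2                        # remember first-seen order
      row[$2] = $0                           # keep the whole first row
      dep[$2] = $4
      next
    }
    { dep[$2] = dep[$2] " " $4 }             # later rows: append their dep
    END {
      for (k = 1; k <= n; k++) {
        $0 = row[order[k]]                   # re-split the stored row into fields
        $1 = k                               # renumber the id column
        $4 = dep[order[k]]                   # replace dep with the merged list
        print
      }
    }
  ' "$@"
}
```

For example: merge_by_emplid file > merged.txt. It keeps one row per emplid in memory, which should be manageable for around 50k rows, and it reads the input only once, so it also works on a pipe.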

Franklin52 · March 3, 2009, 2:35pm

Another approach:

awk -F "|" '
NR==FNR{a[$2]=a[$2]?a[$2]" "$4:$4;next}
FNR==1{print;next}
a[$2]{$4=a[$2];a[$2]="";$1=++c;print}
' OFS="|" file file
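Franklin52's script names the file twice. While NR==FNR is true (the first pass) it only collects the dep values of each emplid in a[]. In the second pass it prints the header, then prints the first row of each emplid with the merged dep list and a new id, and empties a[$2] so the later duplicates are skipped. The small demo below, using a hypothetical show_passes helper, shows why NR==FNR marks the first pass: FNR restarts at 1 for every file, while NR keeps counting across files. The idiom misfires if the first file is empty, because NR==FNR then also holds while reading the second file.

```shell
# show_passes: hypothetical helper. Reads the same file twice, like the
# script above, and labels each line with the pass it belongs to.
show_passes() {
  awk '
    NR == FNR { print "pass1", FNR; next }   # true only while reading the 1st file
              { print "pass2", FNR }         # FNR restarted at 1, NR kept counting
  ' "$1" "$1"
}
```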

Regards

kumar04 · March 10, 2009, 1:53pm

Hi,

The Perl code given above works fine. Can anyone please explain the following lines from it?

if ( not exists $hash{$tmp[1]}){
for my $key (sort { $hash{$a}->{SEQ} <=> $hash{$b}->{SEQ} } keys %hash){
print $hash{$key}->{SEQ}."|".substr($hash{$key}->{VAL},index($hash{$key}->{VAL},"|")+1),"\n";

I appreciate your help,

Regards
-Kumar
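A short explanation of those three Perl lines, with small awk counterparts for comparison. "if ( not exists $hash{$tmp[1]})" is true only the first time an emplid is seen ($tmp[1] is the 2nd field, since Perl arrays start at 0), so that row is stored and given the next SEQ number; later rows with the same emplid go to the else branch. The sort line is needed because Perl returns hash keys in no particular order: it sorts the emplid keys numerically (<=>) by their SEQ value, which restores the input order. The print line prints SEQ, a '|', and then the stored row starting just after its first '|', i.e. the row with its old id replaced by SEQ. The two awk functions below are hypothetical helpers that mirror these ideas; note that awk's index() and substr() count from 1 while Perl's count from 0, yet "+1" skips the first '|' in both.

```shell
# first_seen_order: hypothetical helper. Prints each distinct emplid ($2) once,
# numbered in the order it first appears. The "!($2 in seq)" test plays the role
# of Perl's "not exists", and walking key[1..n] replaces the sort by SEQ.
first_seen_order() {
  awk -F'|' '
    !($2 in seq) { seq[$2] = ++n; key[n] = $2 }    # first sighting gets the next number
    END { for (i = 1; i <= n; i++) print i, key[i] }  # ascending first-seen order
  '
}

# drop_first_field: hypothetical helper mirroring
# substr($val, index($val, "|") + 1), i.e. everything after the first "|".
drop_first_field() {
  awk '{ print substr($0, index($0, "|") + 1) }'
}
```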