File manipulation

ajay41aj · January 13, 2009, 12:22pm

Here is the sample input file:

6A subject1 Bldg1
6A subject1 Bldg2 Yes
6A subject1 Bldg3
8D subject2 Bldg1
8D subject2 Bldg2
8D subject2 Bldg3 Yes
E4 subject3 Bldg1
E4 subject3 Bldg2
E4 subject3 Bldg3
4F subject4 Bldg1
4F subject4 Bldg2
4F subject4 Bldg3 Yes
9F subject5 Bldg1 Yes
9F subject5 Bldg2
9F subject5 Bldg3 Yes
32 subject6 Bldg1
32 subject6 Bldg2

From this, I want two output files:

First, the entries that have "Yes" (only the first three fields).

The output must look like this:

6A subject1 Bldg2
8D subject2 Bldg3
4F subject4 Bldg3
9F subject5 Bldg1
9F subject5 Bldg3

This will be straightforward, I think.

Second, the entries whose field 1 never appears on any "Yes" line (columns 1 and 2 only, one line per entry).

The output must look like this:

E4 subject3
32 subject6

(All entries with 6A, 8D, 4F, and 9F are dropped because each of them has at least one "Yes".)

How can I do this with a shell script?

Thanks,
Ajay

Ikon · January 13, 2009, 12:41pm

What have you tried so far?

This looks like schoolwork, no?

ajay41aj · January 13, 2009, 1:00pm

No, the data is a sample only, but that is what I am trying to achieve.

I could do the first one with no problem (it is straightforward). I played around with awk, sed, etc., but had no success with the second part.

funksen · January 14, 2009, 8:19am

I'm sure this can be done with a one-liner, but this one works. It is not really performance-optimized and it uses a lot of temp files.

Put your input in input.txt:

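cp input.txt list.txt
# Save the "Yes" lines in yes.txt, then remove every line that
# contains their second field from list.txt
grep Yes list.txt | tee yes.txt | while read -r ONE TWO THREE
do
    grep -v -- "$TWO" list.txt > list1.txt
    cp list1.txt list.txt
done

# Append the remaining lines, drop the last field of every line
# (the "Yes" flag or the building) and collapse adjacent duplicates
cat list.txt >> yes.txt
awk '{ $NF = ""; print }' yes.txt | uniq
rm list.txt yes.txt list1.txt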

cfajohnson · January 14, 2009, 9:01am

awk '/Yes/ { print $1, $2, $3 > "yesfile"; ++x[$1]; next }
!x[$1] { print $1, $2 > "nofile" }' "$INPUTFILE"

funksen · January 14, 2009, 9:25am

nofile contains the wrong output:

6A subject1
8D subject2
8D subject2
E4 subject3
E4 subject3
E4 subject3
4F subject4
4F subject4
32 subject6
32 subject6

That is with the input from the first post.

cfajohnson · January 14, 2009, 9:34am

awk '/Yes/ { print $1, $2, $3 > "yesfile"; ++x[$1]; next }
!x[$1]++ {print $1, $2 }' "$FILE"

ajay41aj · January 14, 2009, 1:30pm

Thanks a lot, guys.

It is a great help.

cheers,

ajay

ajay41aj · January 14, 2009, 1:38pm

I get the following output from the awk script:

6A subject1
8D subject2
E4 subject3
4F subject4
32 subject6
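
The 6A, 8D and 4F lines survive because a single pass reaches their first line before it has seen the "Yes" line for the same key. One possible way around that is to let awk read the input twice: the first pass records which field 1 values carry a "Yes", and the second pass writes both files. This is only a minimal sketch. The names input.txt, yesfile and nofile are assumptions borrowed from the posts above, and it assumes the flag is always the fourth field.

```shell
#!/bin/sh
# Sample data from the first post (input.txt is an assumed file name).
cat > input.txt <<'EOF'
6A subject1 Bldg1
6A subject1 Bldg2 Yes
6A subject1 Bldg3
8D subject2 Bldg1
8D subject2 Bldg2
8D subject2 Bldg3 Yes
E4 subject3 Bldg1
E4 subject3 Bldg2
E4 subject3 Bldg3
4F subject4 Bldg1
4F subject4 Bldg2
4F subject4 Bldg3 Yes
9F subject5 Bldg1 Yes
9F subject5 Bldg2
9F subject5 Bldg3 Yes
32 subject6 Bldg1
32 subject6 Bldg2
EOF

# Pass 1 (NR == FNR): remember every field 1 value that has a "Yes" line.
# Pass 2: write the "Yes" rows to yesfile, and one "field1 field2" line
#         to nofile for each key that never had a "Yes".
awk '
NR == FNR   { if ($4 == "Yes") yes[$1] = 1; next }
$4 == "Yes" { print $1, $2, $3 > "yesfile"; next }
!($1 in yes) && !seen[$1]++ { print $1, $2 > "nofile" }
' input.txt input.txt
```

Naming the file twice on the command line is what gives awk its two passes, so this variant needs a regular file rather than a pipe.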

summer_cherry · January 15, 2009, 5:22am

Perl may be a good choice:

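#!/usr/bin/perl
use strict;
use warnings;

my (@first, @second, @lines, %has_yes, %seen);

# First pass: collect the "Yes" entries and remember their first field.
open my $in, '<', 'a.txt' or die "Cannot open a.txt: $!";
while (my $line = <$in>) {
	my @f = split ' ', $line;
	next unless @f;    # skip blank lines
	if (defined $f[3] && $f[3] eq 'Yes') {
		$has_yes{$f[0]} = 1;
		push @first, join ' ', @f[0, 1, 2];
	}
	push @lines, [@f];
}
close $in;

# Second pass: keep fields 1 and 2, once each, for keys with no "Yes" entry.
for my $f (@lines) {
	my $pair = join ' ', @{$f}[0, 1];
	push @second, $pair unless $has_yes{$f->[0]} || $seen{$pair}++;
}

open my $out, '>', 'first.txt' or die "Cannot open first.txt: $!";
print {$out} map { "$_\n" } @first;
close $out;
open $out, '>', 'second.txt' or die "Cannot open second.txt: $!";
print {$out} map { "$_\n" } @second;
close $out;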
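
The same store-then-filter idea can probably also be written as a single awk pass, which may help when the input arrives on a pipe and cannot be read twice. The following is a rough sketch, not a tested replacement for the Perl script. It reuses the output names first.txt and second.txt from above, assumes the flag is always the fourth field, and keeps one line per field 1 value rather than per field 1 and field 2 pair.

```shell
#!/bin/sh
# "Yes" rows are written as they are read. All other keys are remembered
# in input order, and the END block decides which of them to keep.
awk '
$4 == "Yes"   { print $1, $2, $3 > "first.txt"; yes[$1] = 1; next }
!($1 in pair) { order[++n] = $1; pair[$1] = $1 " " $2 }
END {
    for (i = 1; i <= n; i++)
        if (!(order[i] in yes))
            print pair[order[i]] > "second.txt"
}
' <<'EOF'
6A subject1 Bldg1
6A subject1 Bldg2 Yes
6A subject1 Bldg3
8D subject2 Bldg1
8D subject2 Bldg2
8D subject2 Bldg3 Yes
E4 subject3 Bldg1
E4 subject3 Bldg2
E4 subject3 Bldg3
4F subject4 Bldg1
4F subject4 Bldg2
4F subject4 Bldg3 Yes
9F subject5 Bldg1 Yes
9F subject5 Bldg2
9F subject5 Bldg3 Yes
32 subject6 Bldg1
32 subject6 Bldg2
EOF
```

The trade-off is memory: one entry per distinct field 1 value is held until the end of the input.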