Hi Masters! I know this problem is quite difficult.
I have two files that look like this:
File1
mary a b d
anne d e
jane g h
sam a
File2
role1 a d c d
role2 e f g h
role3 a b d
role4 a d e
role5 g h
For each entry in file1, the script should compare its values against every entry in file2, regardless of the order of the values.
The output should be:
mary role1 role3
anne role4
jane role2 role5
sam role1 role3 role4
I've tried other ways to do this without success. Can it be done in UNIX? Please help. Thanks!
reborg
March 10, 2007, 10:45am
This looks a lot like homework, so I will not give an answer.
Post what you have tried already and maybe someone will point out where you are going wrong.
It's not homework. I tried using Excel to manipulate the files, but I'm not familiar with macros, so I hope someone can help me do it in UNIX.
This will be a repeated process for me, and having the right code will really help me a lot. Unfortunately, my limited knowledge of UNIX won't get me there on my own.
What are the criteria? This script looks for all lines in File2 that contain any of the letters after the name, but the output doesn't match yours:
while read name a b c d e
do
    result=
    # grep File2 for lines containing any of the (up to five) values after the name
    [ -n "$a$b$c$d$e" ] &&
    result=$( grep ${a:+-e " $a"} ${b:+-e " $b"} ${c:+-e " $c"} \
                   ${d:+-e " $d"} ${e:+-e " $e"} File2 | cut -d ' ' -f1 )
    echo "$name" $result
done < File1
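For comparison, here is a minimal Python sketch of the "any value" criterion that the script above tests. It assumes whitespace-separated fields and whole-word values (the grep patterns such as `" a"` would also match a longer value like `ab`). The sample data is embedded instead of being read from File1 and File2, and `parse` and `match_any` are hypothetical helper names:

```python
def parse(lines):
    """Turn 'name v1 v2 ...' lines into {name: set of values}, skipping blank lines."""
    table = {}
    for line in lines:
        fields = line.split()
        if fields:
            table[fields[0]] = set(fields[1:])
    return table


def match_any(people, roles):
    """For each person, list the roles sharing at least one value (set intersection)."""
    return {
        name: [role for role, values in roles.items() if wanted & values]
        for name, wanted in people.items()
    }


FILE1 = """mary a b d
anne d e
jane g h
sam a"""

FILE2 = """role1 a d c d
role2 e f g h
role3 a b d
role4 a d e
role5 g h"""

if __name__ == "__main__":
    found = match_any(parse(FILE1.splitlines()), parse(FILE2.splitlines()))
    for name, roles in found.items():
        print(name, *roles)
```

Under this rule anne matches role1 through role4, far more than the expected output lists, so "any" is not the criterion being asked for.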
Both file1 and file2 can run to many rows, and each line can have any number of columns. For example, mary has a, b and d, so the script should search file2 for the lines that contain a, b and d (all of them, not any), which in this case are role1 and role3. Then it moves on to the second entry in file1, anne, and so on.
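The "all, not any" rule is a subset test: a role matches a person when every value listed for the person also appears on the role's line, in any order. A minimal Python sketch, assuming whitespace-separated fields and whole-word values, with the sample data embedded and hypothetical names `parse` and `match_all`:

```python
def parse(lines):
    """Turn 'name v1 v2 ...' lines into {name: set of values}, skipping blank lines."""
    table = {}
    for line in lines:
        fields = line.split()
        if fields:
            table[fields[0]] = set(fields[1:])
    return table


def match_all(people, roles):
    """For each person, list the roles containing every one of the person's values."""
    return {
        # wanted <= values is True when wanted is a subset of values
        name: [role for role, values in roles.items() if wanted <= values]
        for name, wanted in people.items()
    }


FILE1 = """mary a b d
anne d e
jane g h
sam a"""

FILE2 = """role1 a d c d
role2 e f g h
role3 a b d
role4 a d e
role5 g h"""

if __name__ == "__main__":
    found = match_all(parse(FILE1.splitlines()), parse(FILE2.splitlines()))
    for name, roles in found.items():
        print(name, *roles)
```

On the sample data this reports mary with role3 only, because role1 has no `b`; the other three lines agree with the expected output.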
#! /opt/third-party/bin/perl
# Load the second file ("b"): role name => its values joined into one string
open(FILE, "<", "b") || die ("Unable to open file. <$!>\n");
while( <FILE> ) {
    chomp;
    @split_arr = split(/ /, $_);
    my $dump;
    for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
        $dump .= $split_arr[$i];
    }
    $fileHash{$split_arr[0]} = $dump;
}
close(FILE);
# Scan the first file ("a"): print each name, then every role whose
# joined string contains the name's joined values
open(FILE, "<", "a") || die ("Unable to open file. <$!>\n");
while( <FILE> ) {
    chomp;
    @split_arr = split(/ /, $_);
    my $dump;
    for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
        $dump .= $split_arr[$i];
    }
    print "$split_arr[0] ";
    foreach my $key ( sort keys %fileHash ) {    # sorted for a stable output order
        if( $fileHash{$key} =~ m/$dump/ ) {
            print "$key ";
        }
    }
    print "\n";
}
close(FILE);
exit 0;
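The script above joins each line's values into one string (for example, `a b d` becomes `abd`) and checks whether the file1 string occurs inside the file2 string. A small Python illustration of that idea, with a hypothetical `joined` helper and hypothetical sample lines for kim, sue, pat and role9, suggests it works on the posted data but depends on the values being adjacent and in the same order, and that joining without a separator can create false matches:

```python
def joined(line):
    """Mimic the Perl loop: drop the first field and concatenate the rest."""
    return "".join(line.split()[1:])


def concat_match(person_line, role_line):
    """Substring test, like `$fileHash{$key} =~ m/$dump/` for plain letters."""
    return joined(person_line) in joined(role_line)


if __name__ == "__main__":
    # Works on the sample: "abd" is inside "abd" but not inside "adcd".
    print(concat_match("mary a b d", "role3 a b d"))    # True
    print(concat_match("mary a b d", "role1 a d c d"))  # False
    # Same values in a different order are missed.
    print(concat_match("kim b a", "role3 a b d"))       # False
    # Values that are present but not adjacent are missed.
    print(concat_match("sue a d", "role3 a b d"))       # False
    # Gluing values together can create a false match.
    print(concat_match("pat ab", "role9 a b"))          # True
```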
I have one thing to clarify: from the examples provided, only role3 would match for mary, not both role3 and role1.
Could you please check and confirm that?
Yes, you're right.
Can you tell me where in the code the first file and the second file are?
Oops!
I should have made it clear!
#! /opt/third-party/bin/perl
# Load the second file (roles): role name => its values joined into one string
open(FILE, "<", "secondfile") || die ("Unable to open secondfile. <$!>\n");
while( <FILE> ) {
    chomp;
    @split_arr = split(/ /, $_);
    my $dump;
    for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
        $dump .= $split_arr[$i];
    }
    $fileHash{$split_arr[0]} = $dump;
}
close(FILE);
# Scan the first file (names): print each name, then every role whose
# joined string contains the name's joined values
open(FILE, "<", "firstfile") || die ("Unable to open firstfile. <$!>\n");
while( <FILE> ) {
    chomp;
    @split_arr = split(/ /, $_);
    my $dump;
    for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
        $dump .= $split_arr[$i];
    }
    print "$split_arr[0] ";
    foreach my $key ( sort keys %fileHash ) {    # sorted for a stable output order
        if( $fileHash{$key} =~ m/$dump/ ) {
            print "$key ";
        }
    }
    print "\n";
}
close(FILE);
exit 0;
Wow! You're a genius! It worked! It will save me a lot of time in my work.
Thanks a lot!
matrixmadhan -
I found a minor problem with the code: it should match exact words only.
For example:
record1
mary MI_AP
anne MI_RC
record2
role1 MI_AP_REC
role2 MI_AP MI_RC
Output of the current code:
mary role1 role2
anne role2
Is it possible to match exact words only, so that the output is:
mary role2
anne role2
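The difference being asked for here is substring matching versus whole-word matching. A minimal Python sketch using the record data above, assuming values are whitespace-separated; `substring_match`, `word_match` and `roles_for` are hypothetical names:

```python
RECORD1 = {"mary": ["MI_AP"], "anne": ["MI_RC"]}
RECORD2 = {"role1": ["MI_AP_REC"], "role2": ["MI_AP", "MI_RC"]}


def substring_match(wanted, values):
    """Join both sides and look for one string inside the other (regex-style)."""
    return "".join(wanted) in "".join(values)


def word_match(wanted, values):
    """Whole values only: every wanted value must be one of the role's values."""
    return set(wanted) <= set(values)


def roles_for(wanted, roles, test):
    """List the roles for which test(wanted, values) is true."""
    return [role for role, values in roles.items() if test(wanted, values)]


if __name__ == "__main__":
    for name, wanted in RECORD1.items():
        print(name,
              "substring:", roles_for(wanted, RECORD2, substring_match),
              "word:", roles_for(wanted, RECORD2, word_match))
```

With substring matching, MI_AP is found inside MI_AP_REC, so mary picks up role1; comparing whole values gives the requested output.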
If it has to match the exact string, change the line above to:
if( $fileHash{$key} =~ $dump ) {
Try that!
I have modified the code.
Run the code below as-is and let us know the results:
#! /opt/third-party/bin/perl
# Load the second file: role name => its values joined with ":" separators
open(FILE, "<", "secondfile") || die ("Unable to open secondfile. <$!>\n");
while( <FILE> ) {
    chomp;
    @split_arr = split(/ /, $_);
    my $dump;
    for( my $i = 1; $i <= $#split_arr; $i++ ) {
        $dump .= ( $split_arr[$i] . ":");
    }
    $dump =~ s/:$//;    # drop the trailing separator
    $fileHash{$split_arr[0]} = $dump;
}
close(FILE);
# Scan the first file: join each name's values, then print every role
# that has a value occurring in that joined string
open(FILE, "<", "firstfile") || die ("Unable to open firstfile. <$!>\n");
while( <FILE> ) {
    chomp;
    @split_arr = split(/ /, $_);
    my $dump;
    for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
        $dump .= $split_arr[$i];
    }
    print "$split_arr[0] ";
    foreach my $key ( sort keys %fileHash ) {    # sorted for a stable output order
        @diff_arr = split(/:/, $fileHash{$key});
        for( my $i = 0; $i <= $#diff_arr; $i++ ) {
            if( $dump =~ $diff_arr[$i] ) {
                print "$key ";
            }
        }
    }
    print "\n";
}
close(FILE);
exit 0;
Yes it works!!! Thank you so much.
Here's a Python alternative:
for line1 in open("file1"):
    line1 = line1.strip().split(" ", 1)
    f1col = line1[1:][0].split()
    print()
    print(line1[0], end=" ")
    for line2 in open("file2"):
        count = 0
        line2 = line2.strip().split(" ", 1)
        # count how many of this person's values occur on the role's line
        for item1 in f1col:
            for item2 in line2[1:][0].split():
                if item1 == item2:
                    count += 1
        if count == len(f1col):
            print(line2[0], end=" ")
print()
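One thing to be aware of in the counting approach above: when a file2 line repeats a value (role1 lists `d` twice), every repeat is counted, so `count` can reach `len(f1col)` even though a value is missing. That is why the output that follows still reports mary with role1, although role1 has no `b`. A minimal sketch of the effect and of a set-based check, with hypothetical function names:

```python
def count_match(wanted, values):
    """Reproduce the nested-loop count from the script above."""
    count = 0
    for item1 in wanted:
        for item2 in values:
            if item1 == item2:
                count += 1
    return count == len(wanted)


def set_match(wanted, values):
    """Every wanted value must be present; repeated values are ignored."""
    return set(wanted) <= set(values)


if __name__ == "__main__":
    mary = ["a", "b", "d"]
    role1 = ["a", "d", "c", "d"]
    print(count_match(mary, role1))  # True: the second "d" makes up for the missing "b"
    print(set_match(mary, role1))    # False: "b" is not in role1
```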
output:
# ./test.py
mary role1 role3
anne role4
jane role2 role5
sam role1 role3 role4
and, for the MI_AP/MI_RC example:
# ./test.py
mary role2
anne role2
What if I want to compare the files the other way around? I could swap the two files, but that produces too many output fields for my current file, and I would still have to rearrange the data to get my desired output (file1 has almost 23,000 rows). This time, all values of the file2 entry should be present in the file1 entry (the previous request was for all values of the file1 entry to be present in file2). I need both to get the desired output for my records.
File1
mary a b c d
anne e f g h
jane a d e
sam g h
File2
role1 a b
role2 a b c
role3 g h
role4 a e
role5 e f g
Output
mary role1 role2
anne role3 role5
jane role4
sam role3
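The reverse rule is the same subset test with the sides swapped: a role matches a person when every value on the role's line also appears on the person's line. A minimal Python sketch, assuming whitespace-separated fields and whole-word values; the sample data is embedded and `parse` and `match_reverse` are hypothetical names. Because each role is tested once per person, a role cannot be printed twice:

```python
def parse(lines):
    """Turn 'name v1 v2 ...' lines into {name: set of values}, skipping blank lines."""
    table = {}
    for line in lines:
        fields = line.split()
        if fields:
            table[fields[0]] = set(fields[1:])
    return table


def match_reverse(people, roles):
    """A role matches when all of ITS values are among the person's values."""
    return {
        name: [role for role, values in roles.items() if values <= have]
        for name, have in people.items()
    }


FILE1 = """mary a b c d
anne e f g h
jane a d e
sam g h"""

FILE2 = """role1 a b
role2 a b c
role3 g h
role4 a e
role5 e f g"""

if __name__ == "__main__":
    found = match_reverse(parse(FILE1.splitlines()), parse(FILE2.splitlines()))
    for name, roles in found.items():
        print(name, *roles)
```

Each person is compared with every role, so the running time grows with the product of the two files' row counts.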
I would appreciate it if the code were in Korn shell or Perl. Thanks in advance...
Tested, and it works fine! Try this:
#! /opt/third-party/bin/perl
# Load the second file: role name => its values joined with ":" separators
open(FILE, "<", "secondfile") || die ("Unable to open secondfile. <$!>\n");
while( <FILE> ) {
    chomp;
    @split_arr = split(/ /, $_);
    my $dump;
    for( my $i = 1; $i <= $#split_arr; $i++ ) {
        $dump .= ( $split_arr[$i] . ":");
    }
    $dump =~ s/:$//;
    $fileHash{$split_arr[0]} = $dump;
}
close(FILE);
# Scan the first file: print a role only if every one of its values
# is among the name's values
open(FILE, "<", "firstfile") || die ("Unable to open firstfile. <$!>\n");
while( <FILE> ) {
    chomp;
    @first_arr = split(/ /, $_);
    print "$first_arr[0] ";
    foreach my $key ( sort keys %fileHash ) {    # sorted for a stable output order
        @second_arr = split(/:/, $fileHash{$key});
        for( $i = 0; $i <= $#second_arr; $i++ ) {
            $set = 0;
            for( $j = 1; $j <= $#first_arr; $j++ ) {
                # exact comparison, so MI_AP does not match MI_AP_REC
                if( $first_arr[$j] eq $second_arr[$i] ) {
                    $set = 1;
                    last;
                }
            }
            last if( $set == 0 );    # this role value is missing: reject the role
        }
        print "$key " if( $set == 1 );
    }
    print "\n";
}
close(FILE);
exit 0;
I found a problem with the modified code (the one for the reverse of the other): the output repeats.
File1
mary MI_AP MI_RC
anne MI_RC
File2
role1 MI_AP_REC
role2 MI_AP MI_RC
Output of this code:
mary role2 role2
anne role2
Needed output:
mary role2
anne role2
Am I giving you too much trouble? Perl is really new to me.
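The repeat comes from printing `$key` once for every value of the role that matches, so role2 is printed twice for mary (both MI_AP and MI_RC are found). In the modified Perl script, adding `last;` right after `print "$key ";` should leave the inner `for` loop as soon as the role has been printed once. The Python sketch below reproduces both behaviors on the example data; `roles_for` and its `dedupe` flag are hypothetical names, and like the Perl version it joins the person's values and does a substring check:

```python
ROLES = {"role1": ["MI_AP_REC"], "role2": ["MI_AP", "MI_RC"]}


def roles_for(person_values, roles, dedupe=True):
    """Report roles that have a value occurring in the person's joined values.

    With dedupe=False a role is reported once per matching value, which
    reproduces the repeated output; with dedupe=True it is reported once.
    """
    joined = "".join(person_values)
    found = []
    for role, values in roles.items():
        hits = sum(1 for value in values if value in joined)
        if dedupe and hits:
            found.append(role)
        elif not dedupe:
            found.extend([role] * hits)
    return found


if __name__ == "__main__":
    print(roles_for(["MI_AP", "MI_RC"], ROLES, dedupe=False))  # ['role2', 'role2']
    print(roles_for(["MI_AP", "MI_RC"], ROLES))                # ['role2']
    print(roles_for(["MI_RC"], ROLES))                         # ['role2']
```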
Sorry about the confusing username above; my friend was still logged in on my PC, and I forgot to switch users before replying.