xshang
October 10, 2012, 2:59pm
1
I have more than 100 files like this:
SVEAVLTGPYGYT 2
SVEGNFEETQY 10
SVELGQGYEQY 28
SVERTGTGYT 6
SVGLADYNEQF 21
SVGQGYEQY 32
SVKTVLGYEQF 2
SVNNEQF 12
SVRDGLTNSPLH 3
SVRRDREGLEQF 11
SVRTSGSYEQY 17
SVSVSGSPLQETQY 78
SVVHSTSPEAF 59
SVVPGNGYT 75
There is a string in $1 and its frequency in $2.
I have two questions. How can I merge these files into one file that contains every string from $1, with each file's frequency in a separate field? And how can I find the strings that appear in all 100 files and output each of their frequencies?
I can do this using awk between two files, but I failed when dealing with so many.
Thank you!
What is your desired output and what have you tried so far? Why did it work with two files, but not with 100 files?
Something along these lines to search for a string (not tested) should get you started:
awk -v str2find='SVELGQGYEQY' '{
fileA[$1]=($1 in fileA)?fileA[$1] FS FILENAME:FILENAME
freq[$1,FILENAME]=$2
}
END {
if ( str2find in fileA) {
print "string", "file", "frequency"
n=split(fileA[str2find], tmp, FS)
for (i=1;i<=n;i++)
print str2find, tmp[i], freq[str2find,tmp[i]]
}
}' my100filesGohere
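To show the idea end-to-end, here is a runnable demonstration of the same script against two small sample files (the file names `sample1` and `sample2` are made up for the example):

```shell
# Build two tiny sample files in the same two-column format.
printf 'SVELGQGYEQY 28\nSVNNEQF 12\n' > sample1
printf 'SVELGQGYEQY 30\nSVNNEQF 6\n'  > sample2

# For each string, record the list of files it appears in (fileA)
# and its per-file frequency (freq); then report the requested string.
awk -v str2find='SVELGQGYEQY' '{
    fileA[$1] = ($1 in fileA) ? fileA[$1] FS FILENAME : FILENAME
    freq[$1, FILENAME] = $2
}
END {
    if (str2find in fileA) {
        print "string", "file", "frequency"
        n = split(fileA[str2find], tmp, FS)
        for (i = 1; i <= n; i++)
            print str2find, tmp[i], freq[str2find, tmp[i]]
    }
}' sample1 sample2
```

This prints one line per file the string was found in, with the frequency from that file.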
xshang
October 10, 2012, 3:19pm
4
I use the following code when dealing with two files.
awk 'NR==FNR{A[$1]=$0; next} $1=A[$1]' file1 FS=, OFS='\t' file2
A file like this is what I want (where the strings shown are shared between the files):
string file1 file2 file3 file4 file5 ...............
SVERTGTGYT 6 4 5
SVGLADYNEQF 21 3 7
SVGQGYEQY 32 5 6
SVKTVLGYEQF 2 4 9
SVNNEQF 12 4 6
rdrtx1
October 10, 2012, 4:45pm
5
awk '
{if (length(fns[FILENAME])<1) {
fn[fc++]=FILENAME;
fns[FILENAME]=FILENAME;
}
wd[$1]=$1;
ws[$1 FILENAME]=$2;
}
END{
printf("%-20s", "string");
for (i=0; i<fc; i++) {
printf("%-20s", fn[i]);
}
print;
for (i in wd) {
printf("%-20s", i);
for (j=0; j<fc; j++) {
printf("%-20s", ws[i fn[j]]);
}
print
}
}' file1 file2 file3 ...
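A slightly more compact variant of the same idea, using `FNR==1` to detect each new file instead of checking an array of seen names (a sketch; the sample file names `f1` and `f2` are hypothetical):

```shell
# Two small sample files in the string/frequency format.
printf 'SVNNEQF 12\nSVGQGYEQY 32\n' > f1
printf 'SVNNEQF 4\nSVGQGYEQY 5\n'  > f2

awk '
FNR == 1 { fn[fc++] = FILENAME }       # remember file order
{ wd[$1]; ws[$1, FILENAME] = $2 }      # mark string; store its per-file frequency
END {
    printf "%-20s", "string"
    for (i = 0; i < fc; i++) printf "%-20s", fn[i]
    print ""
    for (w in wd) {
        printf "%-20s", w
        for (i = 0; i < fc; i++) printf "%-20s", ws[w, fn[i]]
        print ""
    }
}' f1 f2
```

Strings missing from a given file simply leave that column blank; note that `for (w in wd)` does not guarantee output order, so pipe through `sort` if a sorted table is wanted.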