Find duplicate strings across many different files

I have more than 100 files like this:

SVEAVLTGPYGYT	2	
SVEGNFEETQY	10	
SVELGQGYEQY	28	
SVERTGTGYT	6	
SVGLADYNEQF	21	
SVGQGYEQY	32	
SVKTVLGYEQF	2	
SVNNEQF	       12	
SVRDGLTNSPLH	3	
SVRRDREGLEQF	11	
SVRTSGSYEQY	17	
SVSVSGSPLQETQY	78	
SVVHSTSPEAF     59
SVVPGNGYT	75	


Each line has a string in $1 and its frequency in $2.
I have two questions. First, how can I merge these files into one file that lists every string from $1 with its frequency from each file in a separate field?

Second, how can I find the strings that occur in all 100 files and output the frequency from each file?

I can do this with awk for two files, but I could not extend it to this many.

Thank you!

What is your desired output and what have you tried so far? Why did it work with two files, but not with 100 files?

Something along these lines (not tested) looks up one string across all the files and should get you started:

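awk -v str2find='SVELGQGYEQY' '{
  # for each string, collect the names of the files it occurs in
  fileA[$1]=($1 in fileA) ? (fileA[$1] FS FILENAME) : FILENAME
  # frequency of the string in the current file
  freq[$1,FILENAME]=$2
}
END {
  if (str2find in fileA) {
     print "string", "file", "frequency"
     # split the collected file names back into a list
     n=split(fileA[str2find], tmp, FS)
     for (i=1;i<=n;i++)
        print str2find, tmp[i], freq[str2find,tmp[i]]
  }
}' my100filesGohere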

I use the following code when dealing with two files.

awk 'NR==FNR{A[$1]=$0; next} $1=A[$1]' file1 FS=, OFS='\t' file2

This is the output I want (assuming the strings shown here occur in more than one of the files):

string         file1    file2    file3    file4    file5   ...............
SVERTGTGYT     6        4        5
SVGLADYNEQF    21       3        7
SVGQGYEQY      32       5        6
SVKTVLGYEQF    2        4        9
SVNNEQF        12       4        6
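awk '
{
 # record each input file once, in command-line order
 if (!(FILENAME in fns)) {
  fn[fc++]=FILENAME;
  fns[FILENAME]=FILENAME;
 }
 wd[$1]=$1;
 # comma (SUBSEP) key keeps the string and the file name from running together
 ws[$1,FILENAME]=$2;
}
END{
 # header row: one column per file
 printf("%-20s", "string");
 for (i=0; i<fc; i++) {
  printf("%-20s", fn[i]);
 }
 print "";
 # one row per string; the cell stays empty where a file lacks the string
 for (i in wd) {
  printf("%-20s", i);
  for (j=0; j<fc; j++) {
   printf("%-20s", ws[i,fn[j]]);
  }
  print "";
 }
}' file1 file2 file3 ...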
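
For the second question (only the strings that occur in every file), here is a minimal sketch, not tested on your real data. It assumes whitespace-separated fields and no empty input files; if a string repeats within one file, the last frequency wins. The function name common_freq is made up for illustration.

```shell
# Sketch: print the strings found in every input file,
# with one tab-separated frequency column per file.
common_freq() {
  awk '
  FNR == 1 { files[++nf] = FILENAME }           # remember files in command-line order
  NF >= 2 {
    if (!(($1, FILENAME) in freq)) seen[$1]++   # count each file only once per string
    freq[$1, FILENAME] = $2
  }
  END {
    # header row: "string" followed by the file names
    line = "string"
    for (j = 1; j <= nf; j++) line = line "\t" files[j]
    print line
    for (s in seen) {
      if (seen[s] != nf) continue                # string is missing from at least one file
      line = s
      for (j = 1; j <= nf; j++) line = line "\t" freq[s, files[j]]
      print line
    }
  }' "$@"
}
```

Call it as common_freq file1 file2 file3 ... (or with a glob that matches your 100 files). The order of the data rows is whatever awk's for-in loop yields, so sort them afterwards if you need a stable order.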