I am using awk to search $5 of the "input" file, using the "list" file as the search criteria. If an id in "list" is found in "input", it is counted in the ids found. However, if an id in "list" is not found in "input", it is output as missing. The awk below runs and works for most ids, but the ids that sit in a $5 containing a ; are reported as missing even though they can be found manually in the file. I am not sure where to add handling for the ; though. Thank you :).
input
chrX 48933012 48933134 chrX:48933012-48933134 PRAF2;WDR45
chrX 48934078 48934193 chrX:48934078-48934193 PRAF2;WDR45
chrX 48934293 48934422 chrX:48934293-48934422 PRAF2;WDR45
chr17 42426522 42426680 chr17:42426522-42426680 GRN;L01117
chr17 42426783 42426929 chr17:42426783-42426929 GRN;L01117
chr17 30814628 30815572 chr17:30814628-30815572 AK307275;CDK5R1
chr2 234668923 234669807 chr2:234668923-234669807 UGT1A1;UGT1A10;UGT1A3;UGT1A4;UGT1A5;UGT1A6;UGT1A7;UGT1A8;UGT1A9
chr2 234675669 234675821 chr2:234675669-234675821 UGT1A1;UGT1A10;UGT1A3;UGT1A4;UGT1A5;UGT1A6;UGT1A7;UGT1A8;UGT1A9
chr12 9221325 9221448 chr12:9221325-9221448 A2M
chr12 9222330 9222419 chr12:9222330-9222419 A2M
list
PRAF
GRN
CDK5R1
UGT1A1
A2M
current output
1 ids found
CDK5R1 is missing
PRAF is missing
GRN is missing
UGT1A1 is missing
desired output
5 ids found
awk '
NR==FNR { lookup[$0]++; next }    # first file (list): store each id
($5 in lookup) { seen[$5]++ }     # second file (input): exact match on the whole of $5
END {
    print length(seen) " ids found"
    for (id in seen) delete lookup[id]
    for (id in lookup) print id " is missing"
}' list input > count
awk with error
awk '
> NR==FNR { lookup[$0]+|;++; next }
> ($5 in lookup) { seen[$5]++ }
> END {
> print length(seen)" ids found";
> for (id in seen) delete lookup[id];
> for (id in lookup) print id " is missing"
> }' list2 input > count
awk: cmd. line:2: NR==FNR { lookup[$0]+|;++; next }
awk: cmd. line:2: ^ syntax error
awk: cmd. line:2: NR==FNR { lookup[$0]+|;++; next }
awk: cmd. line:2: ^ syntax error
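One possible direction, offered as a sketch rather than a definitive fix: the test ($5 in lookup) compares the whole field, for example GRN;L01117, against the list, so only rows whose $5 holds a single id can match. Splitting $5 on ; and testing each piece avoids that. The snippet below recreates the sample files from this post so it can be run on its own. The array name ids and the counter found are illustrative names, not part of the original script. It uses exact matching, so with the sample data PRAF is still reported as missing, because the input holds PRAF2. Reaching 5 would need PRAF2 in the list or a looser match.

```shell
# Recreate the sample files from the post (whitespace separated, ids in field 5).
cat > input <<'EOF'
chrX 48933012 48933134 chrX:48933012-48933134 PRAF2;WDR45
chrX 48934078 48934193 chrX:48934078-48934193 PRAF2;WDR45
chrX 48934293 48934422 chrX:48934293-48934422 PRAF2;WDR45
chr17 42426522 42426680 chr17:42426522-42426680 GRN;L01117
chr17 42426783 42426929 chr17:42426783-42426929 GRN;L01117
chr17 30814628 30815572 chr17:30814628-30815572 AK307275;CDK5R1
chr2 234668923 234669807 chr2:234668923-234669807 UGT1A1;UGT1A10;UGT1A3;UGT1A4;UGT1A5;UGT1A6;UGT1A7;UGT1A8;UGT1A9
chr2 234675669 234675821 chr2:234675669-234675821 UGT1A1;UGT1A10;UGT1A3;UGT1A4;UGT1A5;UGT1A6;UGT1A7;UGT1A8;UGT1A9
chr12 9221325 9221448 chr12:9221325-9221448 A2M
chr12 9222330 9222419 chr12:9222330-9222419 A2M
EOF
cat > list <<'EOF'
PRAF
GRN
CDK5R1
UGT1A1
A2M
EOF

awk '
NR==FNR { lookup[$0]; next }            # first file (list): remember every id
{
    n = split($5, ids, ";")             # "GRN;L01117" becomes ids[1]="GRN", ids[2]="L01117"
    for (i = 1; i <= n; i++)
        if (ids[i] in lookup) seen[ids[i]]   # exact match on each individual id
}
END {
    found = 0
    for (id in seen) { found++; delete lookup[id] }   # count hits, drop them from the list
    print found " ids found"
    for (id in lookup) print id " is missing"         # whatever is left was never seen
}' list input > count

cat count
```

Counting in a loop instead of calling length(seen) is a portability choice, since length on an array is not supported by every awk.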