I am trying to create a cron job that runs on startup and looks at list.txt to see if there is a later version of a database, using database.txt as the source. The matching lines are written to output.
$1 of database.txt appears in $1 of list.txt as a partial match, and $2 of database.txt also appears in $2 of list.txt.
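A minimal sketch of producing that output file, assuming the dates are YYYYMMDD (so they compare numerically) and that list.txt has the file name in $1 and the date in $2. For each name in database.txt it keeps the newest $2 among the list.txt lines whose $1 contains that name. The function name write_latest and the "none" marker for a name with no match are hypothetical choices, not part of the original setup.

```shell
#!/bin/sh
# write_latest is a hypothetical helper name.
# Usage: write_latest database.txt list.txt > output
# For every "name date" line in the first file, print "name newest_date",
# where newest_date is the largest $2 among lines of the second file whose
# $1 contains the name. Default awk field splitting handles both the
# space-separated database.txt and the tab-delimited list.txt.
write_latest() {
    awk '
        # First file (database.txt): record the names in their original order.
        FNR == NR {
            if (NF) name[++n] = $1
            next
        }
        # Second file (list.txt): keep the newest date seen for each name.
        {
            for (i = 1; i <= n; i++)
                if (index($1, name[i]) && (!(i in newest) || $2 + 0 > newest[i] + 0))
                    newest[i] = $2
        }
        # One output line per database entry; "none" marks a name with no match.
        END {
            for (i = 1; i <= n; i++)
                print name[i], ((i in newest) ? newest[i] : "none")
        }
    ' "$1" "$2"
}
```

With the sample files below, write_latest database.txt list.txt > output should reproduce the desired output, because the newest $2 for every name equals the date already recorded in database.txt.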
If the output file and database.txt match, the result should be "all are current"; if one or more lines differ between the two files, it should be "newer version of line available".
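The comparison step could be sketched like this, assuming the output file holds one "name date" pair per line separated by a single space. database.txt is normalized through awk before the byte-for-byte cmp, so extra spaces, blank lines, or a missing final newline in it do not register as a difference. compare_output is a hypothetical function name.

```shell
#!/bin/sh
# compare_output is a hypothetical helper name.
# Usage: compare_output database.txt output
# Prints "all are current" when both files hold the same "name date" lines
# in the same order, otherwise "newer version of line available".
compare_output() {
    # Rebuild each non-blank database.txt line as "$1 $2", then compare the
    # result (read from stdin via "-") with the output file.
    if awk 'NF { print $1, $2 }' "$1" | cmp -s - "$2"; then
        echo "all are current"
    else
        echo "newer version of line available"
    fi
}
```

Usage: compare_output database.txt output. To run the check at startup, cron implementations that support it (for example cronie and Vixie cron) accept a crontab line such as "@reboot /path/to/check_versions.sh", where the script path is hypothetical.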
Using the first line of database.txt as an example, refGene is a partial match to $1 of the three hg19_refGene lines in list.txt (hg19_refGeneMrna.fa.gz, hg19_refGene.txt.gz and hg19_refGeneVersion.txt.gz). $2 is the same in both files (20151211). There may be multiple matching lines, as in this case, but their dates will always match.
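The matching rule in that example (name contained in $1, identical date in $2) can be expressed as a single awk pattern. This is a sketch; matching_lines is a hypothetical function name, and it assumes names contain no backslashes, since awk -v processes escape sequences.

```shell
#!/bin/sh
# matching_lines is a hypothetical helper name.
# Usage: matching_lines NAME DATE list.txt
# Prints the list.txt lines whose $1 contains NAME as a substring and whose
# $2 equals DATE, i.e. the lines that confirm a database.txt entry is current.
matching_lines() {
    awk -v name="$1" -v date="$2" 'index($1, name) && $2 == date' "$3"
}
```

For example, matching_lines refGene 20151211 list.txt should print the three hg19_refGene lines, while an empty result means no line in list.txt carries that exact date for that name.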
The awk command below seems to find the partial matches, but that is as far as I get. Thank you :)
database.txt (always two fields separated by a space; the first field contains the name and the second field is the date)
refGene 20151211
clinvar 20170215
popfreq_all 20150413
dbnsfp 20170123
spidex 20150827
list.txt (variable length, tab-delimited; the name is a partial match in $1 and the date is in $2)
hg19_clinvar_20130905.txt.gz 20140527 415781
hg19_clinvar_20130905.txt.idx.gz 20140527 73218
hg19_clinvar_20131105.txt.gz 20140527 580838
hg19_clinvar_20131105.txt.idx.gz 20140527 167090
hg19_clinvar_20140211.txt.gz 20140527 694067
hg19_clinvar_20140211.txt.idx.gz 20140527 181049
hg19_clinvar_20140303.txt.gz 20140527 773948
hg19_clinvar_20140303.txt.idx.gz 20140527 182842
hg19_clinvar_20140702.txt.gz 20140712 1111503
hg19_clinvar_20140702.txt.idx.gz 20140712 367271
hg19_clinvar_20140902.txt.gz 20140911 1503198
hg19_clinvar_20140902.txt.idx.gz 20140911 389069
hg19_clinvar_20140929.txt.gz 20141002 1521398
hg19_clinvar_20140929.txt.idx.gz 20141002 389735
hg19_clinvar_20150330.txt.gz 20150413 1988285
hg19_clinvar_20150330.txt.idx.gz 20150413 426235
hg19_clinvar_20150629.txt.gz 20150724 2211904
hg19_clinvar_20150629.txt.idx.gz 20150724 428773
hg19_clinvar_20151201.txt.gz 20160303 1978309
hg19_clinvar_20151201.txt.idx.gz 20160303 188549
hg19_clinvar_20160302.txt.gz 20160303 2070491
hg19_clinvar_20160302.txt.idx.gz 20160303 195824
hg19_clinvar_20161128.txt.gz 20161205 2762808
hg19_clinvar_20161128.txt.idx.gz 20161205 239561
hg19_clinvar_20170130.txt.gz 20170215 4756134
hg19_clinvar_20170130.txt.idx.gz 20170215 312735
hg19_dbnsfp30a.txt.gz 20151015 2916074880
hg19_dbnsfp30a.txt.idx.gz 20151015 4981998
hg19_dbnsfp31a_interpro.txt.gz 20151223 147102844
hg19_dbnsfp31a_interpro.txt.idx.gz 20151223 2445036
hg19_dbnsfp33a.txt.gz 20170123 3610182452
hg19_dbnsfp33a.txt.idx.gz 20170123 5034641
hg19_popfreq_all_20150413.txt.gz 20150413 1059027804
hg19_popfreq_all_20150413.txt.idx.gz 20150413 212518299
hg19_refGeneMrna.fa.gz 20151211 41379833
hg19_refGene.txt.gz 20151211 5304233
hg19_refGeneVersion.txt.gz 20151211 131417
hg19_spidex.zip 20150827 2991981619
desired output
refGene 20151211
clinvar 20170215
popfreq_all 20150413
dbnsfp 20170123
spidex 20150827
awk used to generate list.txt
awk 'FNR==NR{a[$1]; next} {for (i in a) if (index($0, i)) {print; break}}' database hg19_avdblist.txt > list