I am trying to use the two files shown below to either remove or rename contents in one of those files. If $5 in file1.txt matches $5 of file2.txt and the value in $1 of file1.txt is not "No Match", then that value is substituted for all values in $5 and $1 of file2.txt. If, however, the value in $1 of file1.txt is "No Match", then the row in file2.txt with that value in it and the one below it are removed. Thank you :).
Contents of file1.txt
file1.txt
No Match chr1 35696 36106 DTE3504500000004
PXL-A0000005 chr1 69066 69311 DTE3504500000005
Contents of file2.txt
RefPrimer ref antiref omosome PrimerSet SeqRxn
AntirefPrimer antiref ref omosome
DTE3504500000001ref 34529 35031 1 DTE3504500000001 SeqRxn4
DTE3504500000001antiref 35031 34529 1
DTE3504500000002ref 35032 35283 1 DTE3504500000002 SeqRxn4
DTE3504500000002antiref 35283 35032 1
DTE3504500000003ref 35284 35506 1 DTE3504500000003 SeqRxn4
DTE3504500000003antiref 35506 35284 1
DTE3504500000004ref 35696 36106 1 DTE3504500000004 SeqRxn4
DTE3504500000004antiref 36106 35696 1
DTE3504500000004ref 69066 69311 1 DTE3504500000004 SeqRxn4
DTE3504500000004antiref 69311 69066 1
For example,
"DTE3504500000004" is the value of $5 in file1.txt and it matches $5 in row 3 of file2.txt. Since the value in $1 of file1.txt is "No Match", rows 3 and 4 are removed from file2.txt.
"DTE3504500000005" is the value of $5 in file1.txt and it matches $5 in row 9 of file2.txt. Since the value in $1 of file1.txt is not "No Match" but rather "PXL-A0000005", that new value is used to replace all occurrences of the old value.
Desired output:
RefPrimer ref antiref omosome PrimerSet SeqRxn
AntirefPrimer antiref ref omosome
(rows 3 and 4 removed)
DTE3504500000002ref 35032 35283 1 DTE3504500000002 SeqRxn4
DTE3504500000002antiref 35283 35032 1
DTE3504500000003ref 35284 35506 1 DTE3504500000003 SeqRxn4
DTE3504500000003antiref 35506 35284 1
DTE3504500000004ref 35696 36106 1 DTE3504500000004 SeqRxn4
PXL-A0000005ref 69066 69311 1 PXL-A0000005 SeqRxn4
PXL-A0000005antiref 69311 69066 1
I'm lost.
The first line in file1.txt has DTE3504500000004 in field 5. From your description (with the 1st field on that line being No Match), the last four lines of file2.txt should have been removed, not the 3rd and 4th lines.
The second line in file1.txt has DTE3504500000005 in field 5. Since that string does not appear in file2.txt, why should anything in file2.txt be changed because of that line?
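A quick way to check this is to list every line of file2.txt whose field 5 is one of the keys in field 5 of file1.txt. The sketch below assumes the files are tab-separated (the post shows them with spaces), and the `pair` helper and the scratch file matches.txt are hypothetical, only there to rebuild the posted sample data.

```shell
#!/bin/sh
# ASSUMPTION: TAB-separated copies of the first post's sample files.
printf 'No Match\tchr1\t35696\t36106\tDTE3504500000004\n' > file1.txt
printf 'PXL-A0000005\tchr1\t69066\t69311\tDTE3504500000005\n' >> file1.txt

# Hypothetical helper: write one ref/antiref line pair for an ID.
pair() {
	printf '%sref\t%s\t%s\t1\t%s\tSeqRxn4\n' "$1" "$2" "$3" "$1"
	printf '%santiref\t%s\t%s\t1\n' "$1" "$3" "$2"
}

{
	printf 'RefPrimer\tref\tantiref\tomosome\tPrimerSet\tSeqRxn\n'
	printf 'AntirefPrimer\tantiref\tref\tomosome\n'
	pair DTE3504500000001 34529 35031
	pair DTE3504500000002 35032 35283
	pair DTE3504500000003 35284 35506
	pair DTE3504500000004 35696 36106
	pair DTE3504500000004 69066 69311   # as posted: 0004 again, not 0005
} > file2.txt

# For every file2.txt line whose field 5 is a key from field 5 of file1.txt,
# print its line number, the key, and the file1.txt value for that key.
awk -F'\t' '
NR == FNR { key[$5] = $1; next }
$5 in key { print FNR, $5, key[$5] }
' file1.txt file2.txt > matches.txt
cat matches.txt
```

With the data as posted this reports lines 9 and 11, both keyed on DTE3504500000004 with "No Match" (so both pairs, i.e. the last four lines, would be dropped), and nothing at all for DTE3504500000005.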
I hope this is clearer:
I am trying to use the two files shown below to either remove or rename contents in one of those files. If $5 in combine.txt matches $5 of output.txt and the value in $1 of combine.txt is not "No Match", then that value is substituted for all values in $5 and $1 of output.txt. If, however, the value in $1 of combine.txt is "No Match", then the row in output.txt with that $5 value in it and the one below it are removed. Thank you :).
For example,
"DTE3504500000004" is the value of $5 in combine.txt and it matches $5 in row 9 of output.txt. Since the value in $1 of combine.txt is "No Match", rows 9 and 10 are removed from output.txt.
"DTE3504500000005" is the value of $5 in combine.txt and it matches $5 in row 11 of output.txt. Since the value in $1 of combine.txt is not "No Match" but rather "PXL-A0000005", that new value is used to replace all occurrences of the old value in output.txt.
file1.txt
No Match chr1 35696 36106 DTE3504500000004
PXL-A0000005 chr1 69066 69311 DTE3504500000005
Initial output.txt:
RefPrimer ref antiref omosome PrimerSet SeqRxn
AntirefPrimer antiref ref omosome
DTE3504500000001ref 34529 35031 1 DTE3504500000001 SeqRxn4
DTE3504500000001antiref 35031 34529 1
DTE3504500000002ref 35032 35283 1 DTE3504500000002 SeqRxn4
DTE3504500000002antiref 35283 35032 1
DTE3504500000003ref 35284 35506 1 DTE3504500000003 SeqRxn4
DTE3504500000003antiref 35506 35284 1
DTE3504500000004ref 35696 36106 1 DTE3504500000004 SeqRxn4
DTE3504500000004antiref 36106 35696 1
DTE3504500000005ref 69066 69311 1 DTE3504500000005 SeqRxn4
DTE3504500000005antiref 69311 69066 1
Desired output.txt:
RefPrimer ref antiref omosome PrimerSet SeqRxn
AntirefPrimer antiref ref omosome
DTE3504500000001ref 34529 35031 1 DTE3504500000001 SeqRxn4
DTE3504500000001antiref 35031 34529 1
DTE3504500000002ref 35032 35283 1 DTE3504500000002 SeqRxn4
DTE3504500000002antiref 35283 35032 1
DTE3504500000003ref 35284 35506 1 DTE3504500000003 SeqRxn4
DTE3504500000003antiref 35506 35284 1
PXL-A0000005ref 69066 69311 1 PXL-A0000005 SeqRxn4
PXL-A0000005antiref 69311 69066 1
Assuming that when you said:
file1.txt
No Match chr1 35696 36106 DTE3504500000004
PXL-A0000005 chr1 69066 69311 DTE3504500000005
is what is in combine.txt, you really meant that the file you referred to as combine.txt is really named file1.txt (rather than the first line of combine.txt containing the line file1.txt), then maybe something like:
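awk -F'\t' '
# First file (file1.txt): map each key in field 5 to its value in field 1.
NR == FNR {
	r[$5] = $1
	next
}
# Second file (output.txt): skip the two header lines; when a data line has a
# known key in field 5, remember the key and flag this line plus the next one.
FNR > 2 && m <= 0 && $5 in r {
	p = $5
	m = 2
}
# For each flagged line: drop it when the key maps to "No Match",
# otherwise replace every occurrence of the old key with the new value.
m-- > 0 {
	if(r[p] == "No Match")
		next
	gsub(p, r[p])
}
# Print every line that was not skipped.
1' file1.txt output.txt > output.$$ && cp output.$$ output.txt && rm -f output.$$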
will do what you want.
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
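To try the script end to end on the sample data from the thread, the input files have to really be tab-separated: -F'\t' is what keeps the two-word value "No Match" together in $1. The post shows the files with spaces, so the sketch below assumes tabs and rebuilds file1.txt and the initial output.txt with printf; the `pair` helper is hypothetical and only exists to generate the sample lines.

```shell
#!/bin/sh
# ASSUMPTION: the real files are TAB-separated (the forum post shows spaces).

# file1.txt: new value or "No Match" in $1, key in $5.
printf 'No Match\tchr1\t35696\t36106\tDTE3504500000004\n' > file1.txt
printf 'PXL-A0000005\tchr1\t69066\t69311\tDTE3504500000005\n' >> file1.txt

# Hypothetical helper: write one ref/antiref line pair for an ID.
pair() {
	printf '%sref\t%s\t%s\t1\t%s\tSeqRxn4\n' "$1" "$2" "$3" "$1"
	printf '%santiref\t%s\t%s\t1\n' "$1" "$3" "$2"
}

# Initial output.txt from the second post.
{
	printf 'RefPrimer\tref\tantiref\tomosome\tPrimerSet\tSeqRxn\n'
	printf 'AntirefPrimer\tantiref\tref\tomosome\n'
	pair DTE3504500000001 34529 35031
	pair DTE3504500000002 35032 35283
	pair DTE3504500000003 35284 35506
	pair DTE3504500000004 35696 36106
	pair DTE3504500000005 69066 69311
} > output.txt

# The script from the answer above, unchanged apart from layout.
awk -F'\t' '
NR == FNR { r[$5] = $1; next }
FNR > 2 && m <= 0 && $5 in r { p = $5; m = 2 }
m-- > 0 {
	if (r[p] == "No Match")
		next
	gsub(p, r[p])
}
1' file1.txt output.txt > output.$$ && cp output.$$ output.txt && rm -f output.$$

cat output.txt
```

The result should be the desired output.txt from the second post (tab-separated): the DTE3504500000004 pair is gone and the DTE3504500000005 pair now reads PXL-A0000005. Note that gsub() treats the key as a regular expression, which is harmless here because the IDs contain only letters and digits.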
Works great... thank you :).