Can't you apply what you learned in your thread "Cut & awk" four days ago to this thread? You have exactly the same problem: assuming that some elements of a pipeline will only process some of the lines they are fed, or that lines thrown away by some element of a pipeline will still magically appear in your output.
You didn't ask any questions about the suggestions you were given there, so we assume that you understand how those suggestions work.
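A minimal illustration of both misconceptions, using a made-up four-line file (demo.txt and its contents are hypothetical, and tr stands in for any downstream filter): awk discards the even-numbered lines, so tr only ever sees the odd ones, and tr transforms every line it does receive.

```shell
#!/bin/sh
# Hypothetical input: four short lines.
printf '%s\n' one two three four > demo.txt

# awk passes through only odd-numbered lines (NR % 2 is nonzero).
# Lines two and four are dropped here; nothing later in the
# pipeline can bring them back.
# tr then upper-cases EVERY line it is fed; it has no idea which
# original line number each one came from.
awk 'NR % 2' demo.txt | tr '[:lower:]' '[:upper:]'
```

Running it prints only ONE and THREE. If odd and even lines need different treatment and both must appear in the output, handling both inside a single awk script is the simplest way to keep them together.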
Hi Xterra,
You can use system() to run shell commands inside awk, but invoking a shell to run rev and tr once for each even-numbered line in your file will take at least two orders of magnitude longer than building equivalent functionality into your awk script. If we write an awk script to print odd-numbered lines and feed even-numbered lines through rev and tr:
it is easy to understand and, with an input file containing 10,000 copies of your sample input file, the average time over 10 runs (with output redirected to a file) is about:
real 1m5.37s
user 0m41.09s
sys 0m49.33s
A similar awk script that builds the rev and tr functionality into an internal function:
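#!/bin/ksh
awk '
# Complement table (replaces tr): A<->T, C<->G
BEGIN { c["A"] = "T"; c["C"] = "G"; c["G"] = "C"; c["T"] = "A" }
# Reverse-complement $0 (replaces rev | tr); i and o are local variables
function revcomp(i, o) {
	o = ""
	for(i = length($0); i > 0; i--)
		o = o c[substr($0, i, 1)]
	return(o)
}
# Even-numbered lines are replaced by their reverse complement
!(FNR % 2) {$0 = revcomp()}
# Print every line (odd-numbered lines unchanged)
1' "${1:-infile}"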
produces exactly the same output and takes about:
real 0m0.16s
user 0m0.15s
sys 0m0.00s
In other words, this awk script processes a little more than 800 lines in the time it takes the system()-based script to process 2 lines (1m5.37s / 0.16s is roughly 409, and every pair of lines includes one even-numbered line that fires up a pipeline).
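To sanity-check the revcomp() logic, here is a quick run on a tiny made-up input. The file name sample.txt, the >seqN labels, the sequences, and the wrapper name revcomp_even are all hypothetical, not the original poster's data. One design consequence worth knowing: any character not in the c[] table (for example N or a lowercase base) looks up as the empty string and is silently dropped.

```shell
#!/bin/sh
# Hypothetical wrapper around the same awk logic as the script above:
# reverse-complement every even-numbered line, print all lines.
revcomp_even() {
	awk '
	BEGIN { c["A"] = "T"; c["C"] = "G"; c["G"] = "C"; c["T"] = "A" }
	function revcomp(i, o) {
		o = ""
		for (i = length($0); i > 0; i--)   # walk $0 from last char to first
			o = o c[substr($0, i, 1)]      # append complement of each char
		return o
	}
	!(FNR % 2) { $0 = revcomp() }
	1' "$@"
}

# Hypothetical two-record sample: odd lines are labels, even lines are sequences.
printf '%s\n' '>seq1' 'AACG' '>seq2' 'GATTACA' > sample.txt

revcomp_even sample.txt
# Prints, one per line: >seq1  CGTT  >seq2  TGTAATC
```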
The average timing for Aia's perl suggestion was:
real 0m0.03s
user 0m0.02s
sys 0m0.01s
For some reason, the BSD-based sed on OS X produced the wrong output when running RudiC's sed script, without producing any diagnostics: even-numbered lines had leading and trailing X characters, and they had been translated but not reversed. But an equivalent command (splitting the script at each semicolon into separate sed editing commands):