The awk script below, improved by @MadeInGermany, works great as long as the input
file contains data in the following format:
input
chrX 25031028 25031925 chrX:25031028-25031925 ARX 631 18
chrX 25031028 25031925 chrX:25031028-25031925 ARX 632 14
chrX 25031028 25031925 chrX:25031028-25031925 ARX 633 14
chrX 25031028 25031925 chrX:25031028-25031925 ARX 634 13
chrX 25031028 25031925 chrX:25031028-25031925 ARX 635 12
awk
awk '
# print from stored values
function prt(){
print p1 ":" (p6start==1 ? p2 : p2+p6start) "-" p2+p6, "\t" p5
}
($4!=p4 || $6!=p6+1) {
# new sequence, print the previous sequence
if (NR>1) prt()
p6start=$6
}
{
# store the values that we need later
p1=$1
p2=$2
p4=$4
p5=$5
p6=$6
}
END { prt() }
' input | awk -F"[:-]" ' { print $1 "\t" $2 "\t" $3 "\t" $4}' > out
out
chrX 25031659 25031663 ARX
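For reference, this is what the first awk stage emits before the second awk splits it on : and -. The sketch wraps the original script, with its logic unchanged and only reflowed, in a hypothetical first_stage function and feeds it the five sample rows with printf instead of reading the input file.

```shell
# Hypothetical wrapper around the first awk stage from the question (logic unchanged).
first_stage() {
  awk '
  function prt(){
    print p1 ":" (p6start==1 ? p2 : p2+p6start) "-" p2+p6, "\t" p5
  }
  ($4!=p4 || $6!=p6+1) { if (NR>1) prt(); p6start=$6 }
  { p1=$1; p2=$2; p4=$4; p5=$5; p6=$6 }
  END { prt() }
  ' "$@"
}

# Positions 631..635 in column 6 are consecutive, so they collapse into one
# range: start = 25031028 + 631, end = 25031028 + 635.
printf '%s\n' \
  'chrX 25031028 25031925 chrX:25031028-25031925 ARX 631 18' \
  'chrX 25031028 25031925 chrX:25031028-25031925 ARX 632 14' \
  'chrX 25031028 25031925 chrX:25031028-25031925 ARX 633 14' \
  'chrX 25031028 25031925 chrX:25031028-25031925 ARX 634 13' \
  'chrX 25031028 25031925 chrX:25031028-25031925 ARX 635 12' |
first_stage
```

This prints chrX:25031659-25031663, then a space, a tab and ARX. The space comes from the comma before "\t" p5 in prt(), which inserts awk's default output field separator.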
However, when input is an empty file:
out
0 0
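The 0 0 comes from the END block: awk executes END even when it read no records, so prt() runs with p1, p2, p5, p6 and p6start never assigned. Uninitialized awk variables are the empty string in string context and 0 in numeric context, so the first awk emits an empty chromosome, a start and end of 0 and an empty gene name. A one-liner to confirm the behaviour:

```shell
# END runs even on empty input; NR is 0 and unset variables act as 0 or "".
printf '' | awk 'END { print NR, p2 + p6, "[" p1 "]" }'
# prints: 0 0 []
```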
I am trying to add a condition to the awk that checks whether the file is empty (0 bytes). If the file is not empty, the output should stay the same; if it is empty, the output should instead be four tab-delimited zeros. My attempt is the # check if empty block at the end of the script below. I need this check because the step after this awk expects four tab-delimited fields. Thank you :).
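One way to put the check inside awk itself: NR is 0 in the END block when no record was read, which is the case for a 0-byte file. The sketch below is a variation, not the original script: it sets OFS to a tab and prints the four fields directly, so the second awk -F"[:-]" pass is no longer needed, and it also avoids the stray space that the comma in print ..., "\t" p5 adds before the tab. The function name collapse is hypothetical.

```shell
# Hypothetical helper: collapse runs of consecutive positions (column 6) into
# chrom <tab> start <tab> end <tab> gene, or print four zeros for empty input.
collapse() {
  awk -v OFS='\t' '
  function prt() {
    # same arithmetic as the original prt(), but fields separated by OFS
    print p1, (p6start == 1 ? p2 : p2 + p6start), p2 + p6, p5
  }
  ($4 != p4 || $6 != p6 + 1) {
    # new run: print the previous one (if any) and remember where this one starts
    if (NR > 1) prt()
    p6start = $6
  }
  { p1 = $1; p2 = $2; p4 = $4; p5 = $5; p6 = $6 }
  END {
    if (NR == 0) print 0, 0, 0, 0   # no records read: four tab-separated zeros
    else prt()
  }
  ' "$@"
}
```

Usage would be collapse file > out. Testing NR checks for records rather than bytes; for a 0-byte file both give the same answer, whereas a shell [ -s file ] test decides before awk runs at all.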
awk '
# print from stored values
function prt(){
print p1 ":" (p6start==1 ? p2 : p2+p6start) "-" p2+p6, "\t" p5
}
($4!=p4 || $6!=p6+1) {
# new sequence, print the previous sequence
if (NR>1) prt()
p6start=$6
}
{
# store the values that we need later
p1=$1
p2=$2
p4=$4
p5=$5
p6=$6
}
END { prt() }
' file | awk -F"[:-]" '
{ print $1 "\t" $2 "\t" $3 "\t" $4}
' > out # make low coverage
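# check if empty: [ -s file ] is true when file exists and is larger than 0 bytes
if [ -s file ]; then
    :   # file has data: keep the awk output already written to out
else
    printf '0\t0\t0\t0\n' > out   # empty file: four tab-separated zeros
fi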
desired out (four zeros separated by tabs):
0 0 0 0