Run a program, print parameters to output file, replace output file contents with max of 4th column

Hi Friends,

This is the only approach I can see for my task, so any help is highly appreciated.

I have a file

cat input1.bed

chr1 100 200 abc
chr1 120 300 def
chr1 145 226 ghi
chr2 567 600 unix

Now, I have another file named input2.bed (this is a binary file, not readable in the terminal).

But there is a program in our field that takes input2.bed as its input and is run like this:

program input_file -chrom -start -end output_file

Now, my task is this:

  1. Read each record of input1.bed

  2. Feed each record to the program as shown below, so that the program runs once per record in input1.bed and names each output file after that record

program input2.bed -chrom=chr1 -start=100 -end=200 chr1_100_200_op.bed
program input2.bed -chrom=chr1 -start=120 -end=300 chr1_120_300_op.bed
program input2.bed -chrom=chr1 -start=145 -end=226 chr1_145_226_op.bed
program input2.bed -chrom=chr2 -start=567 -end=600 chr2_567_600_op.bed
  3. For example, consider the first output file, chr1_100_200_op.bed:

cat chr1_100_200_op.bed

chr1 110 120 45.67
chr1 177 189 98.50
chr1 195 200 111.11
  4. Now, ignore the first three columns of the above output file, take the maximum value of the fourth column (111.11 here), and replace the entire contents of chr1_100_200_op.bed with just the file name (without the _op.bed suffix) followed by that maximum, like this:
cat chr1_100_200_op.bed

chr1_100_200 111.11

This is it. Please ask me as many questions as you have for a better solution. Thanks a ton for all your time.

while read CHROM START END NAME
do
        # Create the bed file
        program input2.bed -chrom=$CHROM -start=$START -end=$END ${CHROM}_${START}_${END}_op.bed

        # Replace column 1 with filename,
        # column 2 with the last column,
        # reduce it to 2 columns,
        # and print all lines.
        awk '{$1=F ; $2=$NF; NF=2 } 1' F="${CHROM}_${START}_${END}" ${CHROM}_${START}_${END}_op.bed > /tmp/$$
        cat /tmp/$$ > ${CHROM}_${START}_${END}_op.bed
done < input1.bed
# Remove temporary file
rm -f /tmp/$$
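
If you want to check which command lines the loop will build before running the real program, a dry run that only prints them may help. This is a minimal sketch, assuming the four sample records from the question; the temporary file stands in for input1.bed, and nothing is actually executed.

```shell
# Temporary stand-in for input1.bed (sample records from the question).
input=$(mktemp)
cat > "$input" <<'EOF'
chr1 100 200 abc
chr1 120 300 def
chr1 145 226 ghi
chr2 567 600 unix
EOF

# Dry run: build the command line for each record and print it instead of running it.
commands=$(
while read -r CHROM START END NAME
do
        OUT="${CHROM}_${START}_${END}_op.bed"
        printf 'program input2.bed -chrom=%s -start=%s -end=%s %s\n' \
                "$CHROM" "$START" "$END" "$OUT"
done < "$input"
)
printf '%s\n' "$commands"
rm -f "$input"
```

Once the printed lines look right, the printf can be swapped back for the real program call.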

For 3 and 4, you start with 3 lines and end with 1 line. Is this intended? I've assumed it's not, that you want 3 lines out for 3 lines in.

Hi Corona,

Thanks for your time.

For 3 and 4, the output file usually has thousands of records. But I want to take the maximum value of the fourth column and print the filename as another column.

So, the three records will go out and only one record will remain, as in the example.

while read CHROM START END NAME
do
        # Create the bed file
        program input2.bed -chrom=$CHROM -start=$START -end=$END ${CHROM}_${START}_${END}_op.bed

        # Track the largest value in the last column (M),
        # starting from the first line so zero or negative values work,
        # and print the filename prefix (F) and that maximum
        # as a single line at the end.
        awk 'NR==1 || M<$NF { M=$NF } END { print F, M }' F="${CHROM}_${START}_${END}" "${CHROM}_${START}_${END}_op.bed" > /tmp/$$
        cat /tmp/$$ > ${CHROM}_${START}_${END}_op.bed
done < input1.bed
# Remove temporary file
rm -f /tmp/$$
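
To try the reduction step on its own, without calling the real program, here is a minimal sketch that runs an awk maximum over the three sample lines from the question. The temporary file and the variable names are placeholders for illustration. The test NR==1 seeds the maximum from the first line, which is an assumption about how zero or negative scores should be treated.

```shell
# Sample output of one program run (values from the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
chr1 110 120 45.67
chr1 177 189 98.50
chr1 195 200 111.11
EOF

# Keep the largest last-column value, then print the label (F) and that value once.
result=$(awk 'NR==1 || $NF > M { M = $NF } END { print F, M }' F="chr1_100_200" "$sample")
printf '%s\n' "$result"     # chr1_100_200 111.11
rm -f "$sample"
```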

Thanks, Corona, for your quick solution. It took me a while to make my input files and cross-check the output files.

The only problem I am having is that, for some combinations of start and end, there is no data in my input2.bed.

So the output file has a blank in the fourth column, for example:

cat output.bed
chr1 100 200 45.09999
chr1 120 130 
chr1 145 178 78.999

How do I replace that empty space on column 4 with "ND"?

My output would be

cat output.bed
chr1 100 200 45.09999
chr1 120 130 ND
chr1 145 178 78.999

Try sed -i 's/ $/ ND/' output.bed
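
That sed pattern only matches lines that end in a space. If some blank records might have no trailing space at all, testing the field count in awk is one alternative. This is a sketch using the sample lines from the question, assuming every complete record has exactly four whitespace-separated fields; output_nd.bed is a hypothetical name for the new file.

```shell
workdir=$(mktemp -d)
# Sample data; the second record ends in a space and has no fourth field.
printf 'chr1 100 200 45.09999\nchr1 120 130 \nchr1 145 178 78.999\n' > "$workdir/output.bed"

# Any record with fewer than four fields gets "ND" as its fourth field;
# the trailing 1 prints every record, changed or not.
awk 'NF < 4 { $4 = "ND" } 1' "$workdir/output.bed" > "$workdir/output_nd.bed"
fixed=$(cat "$workdir/output_nd.bed")
printf '%s\n' "$fixed"
rm -rf "$workdir"
```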

It's not generating any output.

It won't: -i tells GNU sed to edit the original file in place, so nothing is printed to the terminal.

If you want a new file, leave off the -i and redirect the output to a new file.
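
For instance, the same substitution without -i prints the edited lines to standard output, where they can be redirected. A minimal sketch with the sample lines from the question; output_nd.bed is a hypothetical file name.

```shell
workdir=$(mktemp -d)
printf 'chr1 100 200 45.09999\nchr1 120 130 \nchr1 145 178 78.999\n' > "$workdir/output.bed"

# Without -i, sed leaves output.bed untouched and writes the result to stdout,
# which is redirected into a new file here.
sed 's/ $/ ND/' "$workdir/output.bed" > "$workdir/output_nd.bed"

# Compare the second record in the original and in the new file.
original_line=$(sed -n '2p' "$workdir/output.bed")
new_line=$(sed -n '2p' "$workdir/output_nd.bed")
printf '%s\n' "$new_line"
rm -rf "$workdir"
```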
