Size Selecting rows

I have a rather convoluted script that I use to extract information from CSV files:

sed '1d' PeakTable.txt | awk -F ',' '!/Ladder/{ if ( $4 > 430 && $4 < 490 && $5 > 45 ) print $2, $5; else print $2, 0 }' | awk '{a[$1]+=$2}END{for(i in a){print i, a}}' | sed 's/\([A-z]\)\([1-9]\) /\10\2 /' | sort | awk '{if ($2 > 10) print $1, $2, 800 / $2; else print $0, 0}' | awk '{print "PCR_Plate", NR, $3, 1, "Deadpool" }' | sed '1 i\Source Well_Source Volume Destination_Well Destination' | sed 's/ /,/g' > output.txt

Here is the breakdown:

sed '1d'                                                                                             # delete first line
awk -F ',' '!/Ladder/{ if ( $4 > 430 && $4 < 490 && $5 > 45 ) print $2, $5; else print $2, 0 }'      # Exclude rows including "Ladder" and selecting the ones with values between 430-490 in clumn 4; and greather than 45 in column 5. 
awk '{a[$1]+=$2}END{for(i in a){print i, a}}'                                                     # collapsing rows with identical values in column 1
sed 's/\([A-z]\)\([1-9]\) /\10\2 /' | sort                                                           # adding 0 to single digits and sorting
awk '{if ($2 > 10) print $1, $2, 800 / $2; else print $0, 0}'                                        # if column 2 > 10, divide 800 by value in column 2 and add it to the 3rd column. Otherwise print the line and add a 0 in 3rd column
awk '{print "PCR_Plate", NR, $3, 1, "Deadpool" }'                                                    # add constants to columns 1, 2 (consecutive numbers), value in column 3, 1, Deadpool 
sed '1 i\Source Well_Source Volume Destination_Well Destination'                                     # add headers
sed 's/ /,/g'                                                                                        # replace blank spaces with commas -CSV format

I would like to have one well structured awk script handling the entire process. Any help will be greatly appreciated.

PS. I am attaching both the infile and outfiles.

Unfortunately I'm not at my own *nix system but on a - hrrrrgh - windows machine, so I can't test what I'm proposing here. It might still be a pointer in a desireable direction. Read and apply to taste:

awk -F, '
NR == 1    {print "Source,Well_Source,Volume,Destination_Well,Destination"
            OFS = ","
            next
           }
 
!/Ladder/  {A[sprintf ("%s%02d", substr ($2, 1, 1), substr ($2, 2))] += ( $4 > 430 && $4 < 490 && $5 > 45 ) ? $5 : 0
           }
END        {for (a in A) print "PCR_Plate", ++CNT, A[a] > 10)?800/A[a]:0, 1, "Deadpool"
           }
 ' peaktable.txt

and sort to taste as my awk doesn't offer a sort function.
I'm not sure I caught all subtleties of your request, please come back with any questions that might arise.

EDIT: Well, back at my desk, able to test, and I modified above to satisfy the request:

sort -t, -k2.1,2.1 -k2.2,2n /tmp/PeakTable.txt |
awk -F, '
NR == 1         {print "Source,Well_Source,Volume,Destination_Well,Destination"
                 OFS = ","
                 next
                }

/Ladder/        {next
                }

LAST2 && 
$2 != LAST2     {print  "PCR_Plate", ++CNT, (SUM > 10)?800/SUM:0, 1, "Deadpool"
                 SUM = 0
                }

                {SUM   += ( $4 > 430 && $4 < 490 && $5 > 45 ) ? $5 : 0
                 LAST2  = $2
                }

 ' > /tmp/output.txt

yields exactly the output desired. No END section needed as the header is sort ed last and prints the H12 entry...

1 Like

Rudi
Thanks a TON! One more favor, could you please explain me how

LAST2 && 
$2 != LAST2

works in your script?

Appreciate it!

It means
IF LAST2 has a value which means we're at least in input line 2 (remember variables are unset thus "empty" thus FALSE at the beginning of the script; minor risk that $2 will be 0 or empty doesn't exist with your data)

AND $2 differs from LAST2 which means we have a new $2 value,
THEN execute the print and reset SUM.

1 Like

Thanks !