awk to find maximum and minimum from column and store in other column

Need your support for below. Please help to get required output

If column 5 is INV then only consider column1 and take out duplicates/identical rows/values from column1 and then put minimum value of column6 in column7 and put maximum value in column 8 and then need to do subtract values of column 7 from column 8 and then plus one to result and store final result in column 9 and column 11. Also need to ommit/delete column(column5 and column 6) in output file

For example in input file 27AAACH1458C1ZZ is 8 times, but need to consider only 7 rows as one row in column 5 has value REVERSED which we need to be discard(need to consider rows where column 5 has values "INV").

So among 7 rows, in column 6 minimum value will be "IN27201800023182" and maxmimum value will be "IN27201800024289", so need to put minimum in column7 and maximum in column8.

Thereafter need to consider last 8 digits from both column(7 and 8) and subtract column 7 from column 8. Then need to add/plus 1 to subtraction result and store in column 9 and column 11. Need to skip/delete column 5 and column 6 in output file.

    a|b|c|d|e|f|g|h|i|j|k
    27AAAC8C1ZZ|042018|||INV|IN27201800023521|||||
    27AAAC8C1ZZ|042018|||INV|IN27201800024289|||||
    27AAAC8C1ZZ|042018|||INV|IN27201800023356|||||
    27AAAC8C1ZZ|032018|||REVERSED|IN27201800022431|||||
    27AAAC8C1ZZ|042018|||INV|IN27201800023400|||||
    27AAAC8C1ZZ|042018|||INV|IN27201800023182|||||
    

output should be

a|b|c|d|g|h|i|j|k
27AAAC8C1ZZ|042018|||N27201800023182|IN27201800024289|1108||1108

Code im trying to find maximum and minimum value, but getting wrong output.

 awk 'BEGIN{OFS=FS="|"} {if ($5=="INV"){ getline; min=$6;max=$6}}
          {(min>$6)?min=$7:"";(max>$6)?"":max=$8}
     END{print min, max}' input.txt

Wrong output

IN546474DGDGD00|

Did you read and consider the comments in your other recent thread(s)? A specification that doesn't need reading thrice or even more often helps people help you.
Why the leading white space in the input, and why is that removed in the output?
Are the key values ($1) in contiguous order, or are they scattered through the file? Is that order to be retained?
Which $2 value to retain; should they differ?
How would you define a minimum and / or maximum of the last 8 chars of IN546474DGDGD00 , or their difference?
What to do with the values that have just one record (the last two in the sample)?
Why assign values to fields 7 and 8, and then remove fields 5 and 6 resulting in the new fields being 5 and 6?

Hi Rudic,

I posted my response long back, but now i found out it did not went.
so posting again..
please help and let me know in case have any query

There are no leading spaces, i think it came in post when ..sorry
Values in ($1) are scattered through file.
Consider the last eight digit in column 6 to find max and min.
No need to consider last 2 records as one dnt have INV in column 5 in input and also both are unique and no duplicate, we dnt need to print those in output file.
we required, max and min in seperate column, so want them in separate cloumn.

Did you read my questions, and relate them to your sample data?
WHAT are the last 8 digits in IN546474DGDGD00 , and their min, max, and difference?
HOW to handle $2?

Hi Rudic,

Have edited my post and removed last two rows as they were creating confusion.
also there nothing to be done with $2, i will hard code $2 value..later..

We need to check column 5 and need to consider value of $1 only when we have values as "INV" in column 5 and this will take care $2 as well.

Try

awk -F\| -vOFS=\| '
NR==1           {sub ("\\"FS $5 "\\"FS $6 "\\"FS, FS)
                 print
                 next
                }
$5 == "INV"     {if (MIN[$1] > $6 || MIN[$1] == "") MIN[$1] = $6
                 if (MAX[$1] < $6 || MAX[$1] == "") MAX[$1] = $6
                 F2[$1] = $2
                }
END             {for (m in MIN)         {MN = substr (MIN[m], length(MIN[m])-7)
                                         MX = substr (MAX[m], length(MAX[m])-7)
                                         TMP = MX - MN + 1
                                         if (TMP != 1) print m, F2[m], _, _, MIN[m], MAX[m], TMP, _, TMP
                                        }
                }
' file
    a|b|c|d|g|h|i|j|k
    27AAAC8C1ZZ|042018|||IN27201800023182|IN27201800024289|1108||1108
1 Like

Hi Rudic,

Just one more help,
need to handle one more thing
if value in $1 is unique then need to show output as like below:
In $5 and $6 in ouput file have to make min and max value same as in input file in $6 in input and then do subtraction and then add/plus 1 to result to make it 1

a|b|c|d|e|f|g|h|i|j|k

27AAAC8C1ZZ|042018|||INV|IN27201800023521|||||
    27AAAC8C1ZZ|042018|||INV|IN27201800024289|||||
    27AAAC8C1ZZ|042018|||INV|IN27201800023356|||||
    27AAAC8C1ZZ|032018|||REVERSED|IN27201800022431|||||
    27AAAC8C1ZZ|042018|||INV|IN27201800023400|||||
    27AAAC8C1ZZ|042018|||INV|IN27201800023182|||||
    27PPPC5C1PP|042018|||INV|IN27201565757575|||||

output

a|b|c|d|g|h|i|j|k

27AAAC8C1ZZ|042018|||N27201800023182|IN27201800024289|1108||1108
27PPPC5C1PP|042018|||IN27201565757575|IN27201565757575|1||1|

---------- Post updated at 11:30 AM ---------- Previous update was at 06:39 AM ----------

Hi Rudic,

Sorry for bothering you.
Can you please send me some web link to study and learn these kind of awk scenarios.
It would be really great help.

To you last request: these forums are an excellent source of examples on how to apply awk (and other tools) to solve tricky, special problems.

In general: I think you need to learn to exercise due care when posting a request, like correctly collecting and formatting representative sample data, describing logical relations and context, and phrasing conditions. And, don't ignore questions to unclear specifications and / or data.
All that done, you won't need several other posts to hop around correcting / modifying sample data, and explaining (or even correcting) statements.

A very small adaption of my post#6 proposal will make the output

    a|b|c|d|g|h|i|j|k
    27AUFDDJD45474|032018|||IN546474DGDGD00|IN546474DGDGD00|1||1
    27AAAC8C1ZZ|042018|||IN27201800023182|IN27201800024289|1108||1108
    27AAAC87PP|042018|||IN272094674547|IN272094674547|1||1

when applied to the ORIGINAL data in post#1, before you modified it (which, BTW, is regarded impolite at least as it makes later references look silly...). You are wholeheartly invited to find and apply the adaption should the output come close to what you need.