Need to print duplicate rows along with the highest-numbered original

Some values in the Description column are duplicated. For each duplicate row, I want to print that row together with the row that has the highest number for the same description.

file1.txt
number   Description
===     ============
34567  nl21a00is-centerdb001:ncdbareq:Error in loading init
34577  nl21a00is-centerdb001:ncdbareq:Error in loading init
45678  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing
43567  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info
24578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn
45890  nl21a00is-centerdb001:testingQA:FSFO has configuration errors
45698  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing
43599  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info
25578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn
51890  nl21a00is-centerdb001:ncdbareq:Error in loading init

out.txt
34567  nl21a00is-centerdb001:ncdbareq:Error in loading init  IS DUPLICATE OF "51890  nl21a00is-centerdb001:ncdbareq:Error in loading init"
34577  nl21a00is-centerdb001:ncdbareq:Error in loading init  IS DUPLICATE OF "51890  nl21a00is-centerdb001:ncdbareq:Error in loading init"
45678  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing IS DUPLICATE OF "45698  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing"
43567  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info IS DUPLICATE OF "43599  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info"
24578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn IS DUPLICATE OF "25578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn"

An awk approach:

awk '
        NR > 2 {
                V = $0
                sub ( $1, X, V )
                gsub ( /^[ ]*|[ ]*$/, X, V )
                R[++c] = $1 "," V

                if ( V in A )
                {
                        if ( M[V] < $1 )
                        {
                                M[V] = $1
                        }
                }
                else
                {
                        A[V] = $1
                }
        }
        END {
                for ( i = 1; i <= c; i++ )
                {
                        n = split ( R[i], T, "," )
                        if ( M[T[n]] != T[1] && M[T[n]] )
                                print A[T[n]], T[n], "IS DUPLICATE OF \"" M[T[n]], T[n] "\""
                }
        }
' file

The second duplicate entry, 34577, is not reported. This is the output of the script as given:

34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
45678 nl21a00is-centerdb001:ncdbareq:Error in loading Sizing IS DUPLICATE OF "45698 nl21a00is-centerdb001:ncdbareq:Error in loading Sizing"
43567 nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info IS DUPLICATE OF "43599 nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info"
24578 nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn IS DUPLICATE OF "25578 nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn"

The second line shows 34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"

instead of

34577 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"

Change

print A[T[n]], T[n], "IS DUPLICATE OF \"" M[T[n]], T[n] "\""

To

print T[1], T[n], "IS DUPLICATE OF \"" M[T[n]], T[n] "\""

This is perfect. Can you please explain this code? Thanks.

Here is a brief explanation:

awk '
        # Skip first two records of input file
        NR > 2 {
                # Set variable V = $0 (current record)
                V = $0
                # Remove first field to get the description in variable: V value
                sub ( $1, X, V )
                # Remove leading and trailing space from description in variable: V value
                gsub ( /^[ ]*|[ ]*$/, X, V )
                # Create indexed array: R with 1st and 2nd field separated by comma
                R[++c] = $1 "," V
                # Check if associative array: A contains a record indexed by variable: V value
                if ( V in A )
                {
                        # If yes, check whether the highest value stored so far in M is less than the 1st field value
                        if ( M[V] < $1 )
                        {
                                # Set associative array: M indexed by V = $1 (maximum value)
                                M[V] = $1
                        }
                }
                # If associative array: A does not contain record indexed by V value
                else
                {
                        # Set associative array: A indexed by V = $1
                        A[V] = $1
                        # Set associative array: M indexed by V = $1
                        M[V] = $1
                }
                }
        }
        # END Block
        END {
                # For each element in indexed array: R
                for ( i = 1; i <= c; i++ )
                {
                        # Split record separated by comma into array: T
                        n = split ( R[i], T, "," )
                        # Print records that are having duplicates and not having maximum value
                        if ( M[T[n]] != T[1] && M[T[n]] )
                                print T[1], T[n], "IS DUPLICATE OF \"" M[T[n]], T[n] "\""
                }
        }
' file
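
For comparison, here is a minimal sketch of a shorter variant of the same idea. It is not the script explained above: it assumes the input can be read twice (so it needs a regular file, not a pipe), finds the highest number per description on the first pass, and reports the lower-numbered rows on the second. The function name report_duplicates and the variable names num, desc and max are made up for this illustration.

```shell
# Hypothetical helper: report_duplicates FILE
# Reads FILE twice, so FILE must be a regular file rather than a pipe.
report_duplicates() {
    awk '
        FNR <= 2 { next }                          # skip the two header lines in each pass
        {
            num = $1
            desc = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", desc)  # drop the number and the gap after it
            sub(/[ \t]+$/, "", desc)               # drop trailing blanks
        }
        NR == FNR {                                # first pass: keep the highest number per description
            if (!(desc in max) || num + 0 > max[desc] + 0)
                max[desc] = num
            next
        }
        num + 0 != max[desc] + 0 {                 # second pass: report every row below the maximum
            print num, desc, "IS DUPLICATE OF \"" max[desc], desc "\""
        }
    ' "$1" "$1"
}
```

It would be called as report_duplicates file1.txt > out.txt. Because the description is used directly as the array key, there is no need to join and split on a comma, so a description that itself contains a comma should not be a problem.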

Hi

I want to print the output file below in tabular format.

output

34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
45678 nl21a00is-centerdb001:ncdbareq:Error in loading Sizing IS DUPLICATE OF "45698 nl21a00is-centerdb001:ncdbareq:Error in loading Sizing"
43567 nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info IS DUPLICATE OF "43599 nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info"
24578 nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn IS DUPLICATE OF "25578 nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn"

That is, I need to put the whole output file into an HTML table.
Can you please help me?

Desired output file

DUPLICATE ENTRY	NEWLY GENERATED TICKET

34567  nl21a00is-centerdb001:ncdbareq:Error in loading init  	51890  nl21a00is-centerdb001:ncdbareq:Error in loading init
34577  nl21a00is-centerdb001:ncdbareq:Error in loading init 	51890  nl21a00is-centerdb001:ncdbareq:Error in loading init
45678  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing	45698  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing
43567  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info	43599  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info
24578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn	25578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn

Hi, what have you tried yourself? Have a look at what Yoda took the trouble to explain. Look for the print statement and try out a couple of modifications. Just experiment.
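
As a starting point for that experiment, here is a minimal sketch that turns report lines of the form shown above into an HTML table. It assumes every report line contains the text IS DUPLICATE OF followed by a double-quoted entry, and it skips any line that does not. The function name report_to_html and the border attribute are choices made for this illustration only; the column headings are taken from the desired output above.

```shell
# Hypothetical helper: reads report lines on stdin, writes an HTML table on stdout.
report_to_html() {
    awk '
        function esc(s) {                  # escape characters that are special in HTML
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        BEGIN {
            print "<table border=\"1\">"
            print "<tr><th>DUPLICATE ENTRY</th><th>NEWLY GENERATED TICKET</th></tr>"
        }
        {
            sep = " IS DUPLICATE OF \""
            p = index($0, sep)
            if (p == 0) next               # ignore lines that are not report lines
            dup = substr($0, 1, p - 1)     # text before the separator
            orig = substr($0, p + length(sep))
            sub(/"[ \t]*$/, "", orig)      # drop the closing quote
            print "<tr><td>" esc(dup) "</td><td>" esc(orig) "</td></tr>"
        }
        END { print "</table>" }
    '
}
```

With the report saved as out.txt, it could be run as report_to_html < out.txt > out.html. The same effect can also be had by changing the print statement in the original script to emit the row markup directly.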