Duplicate check by passing external parameter

ginrkf · January 3, 2017, 6:37am

I have code that finds duplicate lines in a file based on a column. Below is the code I use to find duplicates in my file based on column 1:

awk -F'|' '{if (x[$1]) { x_count[$1]++; print $0; if (x_count[$1] == 1) { print x[$1] } } x[$1] = $0}' FileName >Dup_File.txt

But my requirement is to make this command generic, so that I can pass the column position(s) as an external parameter.
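For a single key column, awk's standard -v option is one way to pass the position in from the shell. The following is a minimal sketch, not the poster's code: the function name dup_by_col and taking the file as the second argument are assumptions for illustration.

```shell
# Hypothetical helper: print every line whose key column repeats an earlier
# line, together with the first line that had that key.
# Usage: dup_by_col COLUMN FILE   e.g.  dup_by_col 1 FileName > Dup_File.txt
dup_by_col() {
    awk -F'|' -v col="$1" '
        seen[$col]++ == 1 { print first[$col] }  # 2nd hit: emit the stored first line once
        seen[$col] > 1    { print; next }         # 2nd and later hits: emit the repeat
                          { first[$col] = $0 }    # 1st hit: remember the line
    ' "$2"
}
```

Unlike the one-liner above, this prints the first occurrence before its repeats. It still handles only one column; a comma-separated list such as "2,3" needs the key to be built from several fields.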

If we pass the external parameter to check columns two and three, the code should look like this:

awk -F'|' '{if (x[$1]) { x_count[$2,$3]++; print $0; if (x_count[$2,$3] == 1) { print x[$2,$3] } } x[$2,$3] = $0}' FileName >Dup_File.txt

RudiC · January 3, 2017, 6:51am

A few questions:

WHAT is your request? A malfunction? An error? A non-satisfactory result?
Do you know how to pass variables to awk ?
Why do you consistently use $2,$3 as the array index except for the first x[$1] ?
Why do you print $0 from its second occurrence and again for the exact second time?
Some sample data might help...

ginrkf · January 3, 2017, 7:05am

OK, you can ignore my previous post. Let me state my requirement here.

I have a pipe-delimited file. I am trying to create a UNIX script that can be used across different files to print the duplicate lines in a file.

For example, I have two files, A.txt and B.txt.

A.txt

123|345|asd
122|ASD|DEF
123|ASW|231

For A.txt, I need to print the duplicate records based on the first column, so my expected output is:

123|345|asd
123|ASW|231

B.txt

34|aw|asd
33|aq|qw
54|aq|qw

For B.txt, I need to print the duplicate records based on the 2nd and 3rd columns. My expected output is:

33|aq|qw
54|aq|qw

So I need to write one script to which the column positions used for the duplicate check can be passed as an external parameter, something like this:

For the first file: sh test_script.sh 1
For the second file: sh test_script.sh "2,3"

RudiC · January 3, 2017, 8:38am

Try

awk -F'|' '
NR == 1 {n = split (FLDS, ARRX, ",")
        }

        {IX = ""
         for (i=1; i<=n; i++) IX = IX FS $ARRX[i]
        }

!x[IX]  {x[IX] = $0
         L[IX] = 1
         next
        }
L[IX]   {print x[IX]
         L[IX] = 0
        }
1
' FLDS="2,3" file
33|aq|qw
54|aq|qw
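The test_script.sh asked for above could wrap this program. Below is a sketch under stated assumptions, not RudiC's exact code: the function name print_dups is hypothetical, the input file is taken as an optional second argument (standard input otherwise), and the first-occurrence test is written as !(IX in x) instead of !x[IX], so that a stored line such as 0 or an empty line still counts as already seen.

```shell
#!/bin/sh
# Sketch of test_script.sh wrapping the awk program above.
# Usage (assumed): sh test_script.sh FIELDS [FILE]
#   FIELDS: comma-separated column numbers, e.g. 1 or "2,3"
#   FILE:   pipe-delimited input; standard input is read if omitted

print_dups() {
    awk -F'|' '
    NR == 1     { n = split(FLDS, ARRX, ",") }       # "2,3" -> ARRX[1]=2, ARRX[2]=3
                { IX = ""
                  for (i = 1; i <= n; i++) IX = IX FS $(ARRX[i])   # composite key
                }
    !(IX in x)  { x[IX] = $0; L[IX] = 1; next }      # first occurrence: store it
    L[IX]       { print x[IX]; L[IX] = 0 }           # second occurrence: emit stored line once
    1                                                # print every repeat
    ' FLDS="$1" "${2:--}"
}

# Only run when given arguments, so the function can also be sourced.
if [ $# -ge 1 ]; then
    print_dups "$@"
fi
```

With this wrapper, sh test_script.sh 1 A.txt and sh test_script.sh "2,3" B.txt > Dup_File.txt should produce the expected outputs listed earlier in the thread.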