Delete duplicated fields in a line

Hi,

I have files with this kind of format (separator is space):

A1 B1 C1 D1 E1 F1 D1 C1 G1 H1
A2 B2 C2 D2 E2 F2 D2 C2 G2 H2 
A3 B3 C3 D3 E3 F3 G3 D3 C3 H3
A4 B4 C4 D4 E4 F4 G4 D4 C4 H4

I want the output to be:

A1 B1 E1 F1 G1 H1
A2 B2 E2 F2 G2 H2
A3 B3 E3 F3 G3 H3
A4 B4 E4 F4 G4 H4 

Any clue? Can I use awk for this?

Try:

awk '{for (i=1;i<=NF;i++) a[$i]++;for (i=1;i<=NF;i++) if (a[$i]==1) printf "%s ", $i;printf "\n"}' file

Try:

$ awk '{delete B;for(i=1;i<=NF;i++){if($i in B){$i=$(B[$i])=x}B[$i]=i};$0=$0;$1=$1}1' file

A1 B1 E1 F1 G1 H1
A2 B2 E2 F2 G2 H2
A3 B3 E3 F3 G3 H3
A4 B4 E4 F4 G4 H4

Hi Bartus and Akshay,

Both scripts work, but only for the first line; the rest are not handled correctly.

Maybe they need a few modifications? The fields contain strings in different formats (characters, numbers, etc.).

Another approach that should work for the posted data:

awk '
        {
                for ( i = 1; i <= NF; i++ )
                {
                        n = gsub ( "\\<"$i"\\>", "&", $0 )
                        if ( n > 1 )
                                gsub ( "\\<"$i"\\>", X, $0 )
                }
                $1 = $1
                print $0
        }
' file

Hi Scrutinizer,

It doesn't work.

This is the input format:

SEKK101 1C23.delay multiLink=0 dtx=0 sequence=1 >>> dtx=0 multiLink=0 sequence=0 >>>done.                      
SEKK106 1C22.delay multiLink=0 dtx=0 sequence=1 >>> dtx=0 multiLink=0 sequence=0 >>>done.                      
SEKK102 1C24.delay multiLink=0 dtx=0 sequence=1 >>> dtx=0 multiLink=0 sequence=0 >>>done.                      
SEKK101 1C20.delay multiLink=0 dtx=0 sequence=1 >>> dtx=0 multiLink=0 sequence=0 >>>done.                      
SEKK104 1C10.delay multiLink=0 dtx=0 sequence=1 >>> dtx=0 multiLink=0 sequence=0 >>>done.                      
SEKK104 1C11.delay multiLink=0 dtx=0 sequence=1 >>> dtx=0 multiLink=0 sequence=0 >>>done.                      
SEKK101 1C12.delay algoRithm=0 thresHold=10 upThresh=10 >>> upThresh=11 thresHold=10 algoRithm=0 >>>done.      
SEKK101 1C15.delay algoRithm=0 thresHold=10 upThresh=11 >>> upThresh=11 thresHold=11 algoRithm=0 >>>done.      
SEKK106 1C16.delay algoRithm=0 thresHold=10 upThresh=10 >>> upThresh=11 thresHold=10 algoRithm=0 >>>done.      
SEKK106 1C17.delay algoRithm=0 thresHold=10 upThresh=11 >>> upThresh=11 thresHold=11 algoRithm=0 >>>done.      
SEKK102 1C18.delay algoRithm=0 thresHold=10 upThresh=10 >>> upThresh=11 thresHold=10 algoRithm=0 >>>done.

Hi Gr4wk, I had already deleted my post since it wasn't foolproof anyway. Try this one instead:

awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $i=$j=x; $0=$0; $1=$1}1' file

Thanks Scrutinizer, it works!

Can you explain what is the meaning of the code?

awk '{delete a; delete b; for(i = 1; i <= NF; i++) {a[i] = $i; b[$i]++}; for(i = 1; i <= length(a); i++) {if(b[$i] == 1) {printf "%s%s", a[i], FS}}; print ""}' file

Good job SriniShoo, your code works too. Can you explain it, please?

delete a; delete b

to clear arrays a & b

for(i = 1; i <= NF; i++) {a[i] = $i; b[$i]++}

Parse through the line and store each field value in two different arrays:
a - to print the output in the original order
b - to check for duplicates

for(i = 1; i <= length(a); i++) {if(b[$i] == 1) {printf "%s%s", a[i], FS}}

After reading the whole line, I print the values from array a whose count in array b is 1, i.e. the values that are not duplicated.

printf "%s%s", a[i], FS

for formatting the output: each kept value is printed followed by the field separator FS (a single space by default)
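A minimal sketch of the two-array idea explained above, wrapped in a shell function (the name `uniq_fields` is hypothetical, not from the thread). It differs from the posted one-liner in small ways: it empties the arrays with `split("", a)`, the traditional portable idiom, instead of `delete a`; it loops up to `NF` (the number of stored elements) instead of calling `length(a)`; and it prints a separator only between kept values, so there is no trailing space.

```shell
#!/bin/sh
# Hypothetical helper: print, for each input line, only the fields
# whose value occurs exactly once on that line, in their original order.
uniq_fields() {
  awk '
  {
    split("", a); split("", b)    # start every line with empty arrays
    for (i = 1; i <= NF; i++) {
      a[i] = $i                   # a: field values in their original order
      b[$i]++                     # b: how often each value occurs on this line
    }
    sep = ""
    for (i = 1; i <= NF; i++)
      if (b[a[i]] == 1) {         # keep only values that occur exactly once
        printf "%s%s", sep, a[i]
        sep = OFS                 # separator only between kept values
      }
    print ""                      # end the output line
  }' "$@"
}

printf '%s\n' 'A1 B1 C1 D1 E1 F1 D1 C1 G1 H1' 'A3 B3 C3 D3 E3 F3 G3 D3 C3 H3' | uniq_fields
```

With the two sample lines this prints `A1 B1 E1 F1 G1 H1` and `A3 B3 E3 F3 G3 H3`. Passing file names works too, e.g. `uniq_fields file`.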


Small addition to my old code, which I missed yesterday:

$ awk '{delete B;for(i=1;i<=NF;i++){if($i in B){$i=$(B[$i])=x}B[$i]=i}$0=$0;$1=$1}1' file

SEKK101 1C23.delay sequence=1 >>> sequence=0 >>>done.
SEKK106 1C22.delay sequence=1 >>> sequence=0 >>>done.
SEKK102 1C24.delay sequence=1 >>> sequence=0 >>>done.
SEKK101 1C20.delay sequence=1 >>> sequence=0 >>>done.
SEKK104 1C10.delay sequence=1 >>> sequence=0 >>>done.
SEKK104 1C11.delay sequence=1 >>> sequence=0 >>>done.
SEKK101 1C12.delay upThresh=10 >>> upThresh=11 >>>done.
SEKK101 1C15.delay thresHold=10 >>> thresHold=11 >>>done.
SEKK106 1C16.delay upThresh=10 >>> upThresh=11 >>>done.
SEKK106 1C17.delay thresHold=10 >>> thresHold=11 >>>done.
SEKK102 1C18.delay upThresh=10 >>> upThresh=11 >>>done.


If you add delete a to bartus11's approach, it works. Here is the modified version of bartus11's code:

$ awk '{delete a;for (i=1;i<=NF;i++) a[$i]++;for (i=1;i<=NF;i++) if (a[$i]==1) printf "%s ", $i;printf "\n"}' file
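Why the missing `delete a` broke only the lines after the first: the counts in the array survive from one input line to the next, so a value that already appeared on an earlier line (for example `SEKK101` or `dtx=0`) starts with a count of 1 or more and is treated as a duplicate. The sketch below contrasts both behaviours on a made-up two-line sample; the helper names `keep_unique_leaky` and `keep_unique` are hypothetical. It prints with `printf "%s%s", sep, $i` rather than `printf $i" "`, because a field used as the format string would have any `%` in the data interpreted as a conversion.

```shell
#!/bin/sh
# Hypothetical helpers: both print, for each line, the fields that occur
# exactly once on that line.

keep_unique_leaky() {   # counts are never reset (the bug reported in the thread)
  awk '{
    for (i = 1; i <= NF; i++) a[$i]++
    sep = ""
    for (i = 1; i <= NF; i++) if (a[$i] == 1) { printf "%s%s", sep, $i; sep = OFS }
    print ""
  }' "$@"
}

keep_unique() {         # counts are reset at the start of every line
  awk '{
    split("", a)        # empty the array; delete a does the same in common awks
    for (i = 1; i <= NF; i++) a[$i]++
    sep = ""
    for (i = 1; i <= NF; i++) if (a[$i] == 1) { printf "%s%s", sep, $i; sep = OFS }
    print ""
  }' "$@"
}

sample='SEKK101 dtx=0 sequence=1 dtx=0
SEKK101 dtx=0 sequence=0'

printf '%s\n' "$sample" | keep_unique_leaky   # second line loses SEKK101 and dtx=0
printf '%s\n' "$sample" | keep_unique
```

The leaky version prints `SEKK101 sequence=1` and then only `sequence=0`; the reset version prints `SEKK101 sequence=1` and `SEKK101 dtx=0 sequence=0`.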

Sure:

awk '
{                              # For every line in file "file"
  for(i=1; i<NF; i++)          # Let i run from 1 to NF-1
    for(j=i+1; j<=NF; j++)     # Let j run from i+1 to NF
      if($i==$j) $i=$j=x       # If two fields are equal, set both to "" (x is never assigned)
  $0=$0                        # Re-split the record; fields that were set to "" disappear,
                               # so there are now fewer fields
  $1=$1                        # Rebuild the record from its fields joined by OFS (a single
                               # space), which collapses the leftover runs of spaces
}
1                              # Print the record
' file                         # Read the file "file"

Hope this helps.
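The effect of the `$0=$0` and `$1=$1` steps can be checked in isolation. The first three commands below blank one field of a throwaway record `a b c` and print `NF` together with the bracketed record after each step. The last part wraps the pairwise one-liner in a function (the name `pairwise` is hypothetical) and shows a case the posted data never triggers: because fields are blanked in pairs, a value that occurs three times on one line keeps its last copy, whereas the count-based scripts in this thread drop every copy.

```shell
#!/bin/sh
# 1) Blanking a field leaves NF unchanged and a double space in $0.
echo 'a b c' | awk '{ $2 = ""; print NF ": [" $0 "]" }'                   # 3: [a  c]
# 2) $0 = $0 re-splits the record, so the empty field no longer counts.
echo 'a b c' | awk '{ $2 = ""; $0 = $0; print NF ": [" $0 "]" }'          # 2: [a  c]
# 3) $1 = $1 rebuilds $0 from the fields joined by OFS (one space).
echo 'a b c' | awk '{ $2 = ""; $0 = $0; $1 = $1; print NF ": [" $0 "]" }' # 2: [a c]

# The pairwise one-liner from this thread, wrapped in a hypothetical function.
pairwise() {
  awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $i=$j=x; $0=$0; $1=$1}1' "$@"
}
echo 'X Y X Z X' | pairwise   # Y Z X  (only two of the three X fields are blanked)
```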
