Delete duplicate strings in a line

redse171 · December 12, 2013, 5:47pm

Hi,

I need help removing duplicates in my file. The problem is that I need to delete only one duplicate per line. The input file is as follows (it is not tab-delimited):

The output needs to remove the 2nd word (in red) when it duplicates the 1st word (in blue). Other duplicates should remain unchanged. My output should look like this:

I don't know how to do this. I did try, but it deleted all the other duplicates in those lines as well. I tried Google too, but most of the results seem to be about duplicate lines. Please kindly help. Thanks

mjf · December 12, 2013, 6:19pm

Here is an awk solution. Note that the 3rd record in your output file does not match your input or your requirement, as the fields do not match. I am also assuming that the '&' at the beginning of field 1 is not included when matching against field 2, even though you highlighted it in blue.

awk '{if (substr($1,2,length($1)-1)==$2) print $1,$3,$4,$5; else print $1,$2,$3,$4,$5 }' file.txt
&aff2g0440 aspl2221 nos:scad1 blablablabla
&aff2g0740 aspl5221 nos:scad1 blablablabla
&aff4g0160 aff4g01600 aspl2251 nos:scad1 blablablabla
&aff9g0020 aspl3391 nos:scad2 blablablabla
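The one-liner above prints only $1 through $5, so any line with more than five fields loses its trailing strings; that may be why some strings went missing on the larger files mentioned later in the thread. A minimal sketch that keeps every remaining field, assuming the same rule (field 2 is dropped only when it equals field 1 without its leading '&'). The function name drop_dup and the sample lines are hypothetical:

```shell
#!/bin/sh
# Drop field 2 when it equals field 1 without its leading "&",
# keeping every field that follows (not only $3..$5).
drop_dup() {
  awk '{
    if (substr($1, 2) == $2) {
      out = $1
      for (i = 3; i <= NF; i++) {
        out = out " " $i      # append each remaining field
      }
      print out
    } else {
      print                   # other lines pass through unchanged
    }
  }'
}

# Hypothetical sample: the second line has seven fields.
printf '%s\n' \
  '&aff2g0440 aff2g0440 aspl2221 nos:scad1 blablablabla' \
  '&aff9g0020 aff9g0020 aspl3391 nos:scad2 bla more1 more2' | drop_dup
```

Building the output string in a loop avoids relying on decrementing NF, whose effect on the record is not guaranteed by POSIX.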

Yoda · December 12, 2013, 6:29pm

awk '$1~$2{$2=x}1' file
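In this one-liner, $1 ~ $2 treats field 2 as a regular expression and succeeds if it matches anywhere inside field 1; x is an unset variable (an empty string), so $2 = x empties field 2, and the trailing 1 prints every line. Because this is a substring/regex match rather than an equality test, a field 2 that is only part of field 1 (or contains regex characters such as '.') can also be removed. A small sketch contrasting it with an exact comparison; the sample line and the function names yoda and exact are hypothetical:

```shell
#!/bin/sh
# Yoda's test: $2 is used as a regular expression matched anywhere in $1.
yoda() { awk '$1 ~ $2 { $2 = x } 1'; }

# Exact test: compare $2 with a copy of $1 minus its leading "&".
exact() { awk '{ w = $1; sub(/^&/, "", w) } w == $2 { $2 = x } 1'; }

# Hypothetical line where field 2 is only a prefix of field 1.
line='&aff2g0440 aff2g aspl2221'
printf '%s\n' "$line" | yoda    # prints: &aff2g0440  aspl2221
printf '%s\n' "$line" | exact   # prints: &aff2g0440 aff2g aspl2221
```

Both versions leave two spaces where field 2 used to be when they do remove it.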

Scrutinizer · December 12, 2013, 6:57pm

$ sed 's/&\([^ ]* \)\1/\&\1/' file
&aff2g0440 aspl2221 nos:scad1 blablablabla
&aff2g0740 aspl5221 nos:scad1 blablablabla 
&aff4g0160 aff4g01600 aspl2251 nos:scad1 blablablabla
&aff9g0020 aspl3391 nos:scad2 blablablabla
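How the substitution works: \([^ ]* \) captures the word after the & together with its trailing space, \1 requires the identical text to follow immediately, and in the replacement \& is a literal ampersand (a bare & would insert the whole match). Two consequences: the expression is not anchored to the start of the line, and because the captured text includes a trailing space, a line whose duplicate is the last word is left unchanged. A hedged sketch of an anchored variant using two portable BRE substitutions; the function name dedup_sed and the sample lines are hypothetical:

```shell
#!/bin/sh
# Capture the word after a leading "&", then require the identical word
# to follow. The first expression handles a duplicate followed by a
# space, the second a duplicate that ends the line.
dedup_sed() {
  sed -e 's/^&\([^ ]*\) \1 /\&\1 /' \
      -e 's/^&\([^ ]*\) \1$/\&\1/'
}

printf '%s\n' '&aff2g0440 aff2g0440 aspl2221' '&aff2g0440 aff2g0440' \
  '&aff4g0160 aff4g01600 aspl2251' | dedup_sed
```

The third sample is left alone because the captured word must be followed by a space or the end of the line, so aff4g0160 does not match the start of aff4g01600.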

redse171 · December 12, 2013, 9:10pm

Hi guys,

Thanks so much for your fast responses. I tried all of your codes, and Yoda codes perfectly solved my problem. mjf, your codes worked too but it deleted some of the strings that i have in my file. I have a huge files that has many weird things, and i tried changing your codes to see it how it goes. There are still strings missing though i managed to get some. and Scrutinizer, i have a problem with your codes too. But, i really appreciate your ideas on this. Thanks a lot guys!

---------- Post updated at 09:10 PM ---------- Previous update was at 09:09 PM ----------

Hi Yoda,

If possible, can you explain your code here? Thanks

Akshay_Hegde · December 12, 2013, 9:25pm

OR

$ awk 'substr($1,2) == $2{$2=x;$0=$0;$1=$1}1' file

&aff2g0440 aspl2221 nos:scad1 blablablabla
&aff2g0740 aspl5221 nos:scad1 blablablabla
&aff4g0160 aff4g01600 aspl2251 nos:scad1 blablablabla
&aff9g0020 aspl3391 nos:scad2 blablablabla

redse171 · December 12, 2013, 9:42pm

Hi Akshay,

Your code works perfectly, thanks! Can you please explain it? Thanks

Akshay_Hegde · December 12, 2013, 9:55pm

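substr($1,2) ---> if your input is &aff2g0440, substr($1,2) returns aff2g0440 (column 1 from the second character onwards), which is compared with column 2 for an exact match. If the condition is true:

$2 = x ---> x is never set, so it is an empty string; field 2 is emptied (the record now has two spaces in a row)

$0 = $0 ---> re-split the record into fields; the empty field disappears, so NF drops by one

$1 = $1 ---> rebuild the record from its fields joined by OFS, which removes the extra space

finally

}1 ---> 1 is a pattern that is always true (non-zero), and a pattern with no action prints the record, so every line is printed, changed or not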
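To see why both $0=$0 and $1=$1 are needed, this sketch prints NF and the record (in brackets, so the spaces are visible) after each assignment for one sample line. The function name trace is hypothetical:

```shell
#!/bin/sh
# Print NF and $0 after each step of the one-liner, then the final line.
trace() {
  awk 'substr($1, 2) == $2 {
    $2 = x;  printf "$2 = x  -> NF=%d [%s]\n", NF, $0
    $0 = $0; printf "$0 = $0 -> NF=%d [%s]\n", NF, $0
    $1 = $1; printf "$1 = $1 -> NF=%d [%s]\n", NF, $0
  } 1'
}

printf '%s\n' '&aff2g0440 aff2g0440 aspl2221' | trace
```

Without $0 = $0, NF would stay at 3 and $1 = $1 would rebuild the record including the empty field, so the double space would remain.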

redse171 · December 12, 2013, 9:57pm

Hi Akshay,

Thanks so much! Your explanation is simple and clear.

RavinderSingh13 · January 23, 2014, 3:52pm

One more approach with awk:

cat <<eof | awk '{gsub(/\&/,X)}  $1==$2 {$2=X;$1="&"$1}1'
&aff2g0440 aff2g0440 aspl2221 nos:scad1 blablablabla
&aff2g0740 aff2g0740 aspl5221 nos:scad1 blablablabla
&aff4g0160 aff4g01600 aspl2251 nos:scad1 blablablabla
&aff9g0020 aff9g0020 aspl3391 nos:scad2 blablablabla
eof

Output will be as follows.

&aff2g0440  aspl2221 nos:scad1 blablablabla
&aff2g0740  aspl5221 nos:scad1 blablablabla
aff4g0160 aff4g01600 aspl2251 nos:scad1 blablablabla
&aff9g0020  aspl3391 nos:scad2 blablablabla

Thanks,
R. Singh

Scrutinizer · January 23, 2014, 4:03pm

@ravinder, that is a UUOC (Useless Use of Cat) ... compare:

awk ... <<eof
[...]

--
The gsub in that awk deletes every ampersand on the line (not just the leading one in $1), which only happens to work with the given input.
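In the output of that approach, the third line has lost its leading &: gsub removed every & from the record, and only lines where $1==$2 had it put back. A minimal sketch that strips the & only from a copy of $1 used for the comparison, and feeds the here-document straight to awk instead of through cat. The variable w and the function name fixed are hypothetical:

```shell
#!/bin/sh
fixed() {
  # w holds field 1 without its leading "&"; only w is compared with $2,
  # so the record itself keeps every "&". On a match: empty $2, re-split
  # ($0 = $0), rebuild with single spaces ($1 = $1); then print every line.
  awk '{ w = $1; sub(/^&/, "", w) }
       w == $2 { $2 = ""; $0 = $0; $1 = $1 }
       1'
}

# The here-document is read by awk directly; no cat is needed.
fixed <<'eof'
&aff2g0440 aff2g0440 aspl2221 nos:scad1 blablablabla
&aff4g0160 aff4g01600 aspl2251 nos:scad1 blablablabla
eof
```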

RavinderSingh13 · January 23, 2014, 4:35pm

Thank you Scrutinizer for correcting me.

Akshay_Hegde · January 23, 2014, 9:40pm

@RavinderSingh13

The output you have shown is not the desired one. Please read what this thread is about, and all the answers already given, before you reply.