I need help removing duplicates in my file. The problem is that I need to delete only one duplicate on each line. The input file is as follows, and it is not tab-delimited:
The output needs to drop the 2nd word (in red) when it duplicates the 1st word (in blue). Other duplicates should remain unchanged. My output should look like this:
I don't know how to do this. I tried, but my attempt deleted all the other duplicates on those lines as well. I also searched Google, but most results deal with duplicate lines rather than duplicate words within a line. Please help. Thanks.
Here is an awk solution. Note that the 3rd record in your output file does not match your input and requirement, as the fields do not match. I am assuming that the '&' at the beginning of field 1 is not included when matching against field 2, even though you highlighted it in blue.
Thanks so much for your fast responses. I tried all of your code, and Yoda's code solved my problem perfectly. mjf, your code worked too, but it deleted some of the strings that I have in my file. I have huge files that contain many odd things, and I tried changing your code to see how it goes. Some strings are still missing, though I managed to recover some. Scrutinizer, I have a problem with your code too, but I really appreciate your ideas on this. Thanks a lot, guys!
substr($1,2) ---> if your input is &aff2g0440, substr($1,2) gives aff2g0440, i.e. column 1 from the second character onwards. This is compared for an exact match with column 2. If the condition is true:
$2 = x ---> since x is not set, it is an empty string, so field 2 is blanked out here
$0 = $0 ---> re-splits the record into fields, so the now-empty field 2 is dropped
$1 = $1 ---> rebuilds the record from its fields, joined by single spaces
finally
}1 ---> 1 is a pattern that is always true (0 would be false), and with no action given awk prints the line, so every line is printed
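For reference, the steps explained above can be put together roughly like this. It is only a sketch reconstructed from the explanation, not necessarily the exact code that was posted; the function name dedupe_second_field and the two sample lines are made up for illustration.

```shell
# Sketch only: the function name and the sample input lines are hypothetical.
dedupe_second_field() {
  awk '
    substr($1, 2) == $2 {   # field 1 minus its first character equals field 2
      $2 = x                # x is unset, so field 2 becomes the empty string
      $0 = $0               # re-split the record; the empty field disappears
      $1 = $1               # rebuild the record with single-space separators
    }
    1                       # always-true pattern: print every record
  ' "$@"
}

printf '%s\n' \
  '&aff2g0440 aff2g0440 aff2g0440 xyz' \
  '&abc1 def2 abc1' | dedupe_second_field
```

With this made-up input, the first line loses only its second word and the later duplicate stays, while the second line is printed unchanged because field 2 does not match field 1.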
--
The awk gsub will delete all ampersands on the line (instead of only the leading ampersand in $1), which happens to work with the given input.
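To illustrate the difference, here is a small sketch. The two function names and the sample line with an ampersand inside a later field are assumptions made up for the demonstration; they do not come from the original input.

```shell
# Sketch only: function names and the sample line are hypothetical.

# gsub with no target works on the whole record: every ampersand is removed.
strip_all_amps() { awk '{ gsub(/&/, ""); print }' "$@"; }

# sub anchored with ^ on field 1 removes only a leading ampersand there.
# Assigning to $1 also rebuilds the record with single-space separators.
strip_leading_amp() { awk '{ sub(/^&/, "", $1); print }' "$@"; }

printf '%s\n' '&id1 id1 a&b' | strip_all_amps
printf '%s\n' '&id1 id1 a&b' | strip_leading_amp
```

The first call also removes the ampersand inside the last field, while the second leaves it alone, so the two only agree when the leading ampersand is the only one on the line.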
The desired output is not the one you have shown. Please read what the thread is about, and all the answers already given, before you reply.