Kindly check: remove duplicates that have similar data in front of them

Hi all,

I have 2 files containing data like this:

So if the same entry is repeated in column 1 (like 1,2,3,4), I have to check whether there are different entries in column 2 (like 2,4) but similar entries for the duplicates in column 2 (like 1,3).

The output should look like this for the first file:

Please let me know how to script this.

In the same way for the second file: if the data in column 2 is different, print the duplicate entries arranged like this:

Hi,

Try this one,

awk '{t=$0;r=$1" ";sub(r,"",t);if(a[$1]!~t){a[$1]=a[$1]" "t;}else{if(!a[$1]){a[$1]=t;}}}END{for(i in a){print i,a[i];}}' file1

It should work for both files, though I have not tested it yet.
Do you want to combine these two files and then do the rest?
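A rough sketch of what the one-liner does, using made-up sample data (the thread's original files were not preserved): collect each unique column-2 value under its column-1 key. This version compares literally with index() instead of a regex match, which is a safer variant of the same idea.

```shell
# Hypothetical sample data -- an assumption, not the thread's real input.
cat > file1 <<'EOF'
1 abc
2 xyz
1 def
3 abc
1 abc
EOF

# Group unique column-2 values per column-1 key. index() does a literal
# substring test, so metacharacters in the data cannot cause false matches.
awk '{
    t = $2
    if (index(" " a[$1] " ", " " t " ") == 0)   # value not yet seen for this key
        a[$1] = (a[$1] == "" ? t : a[$1] " " t)
}
END { for (i in a) print i, a[i] }' file1
# prints each key with its unique values, e.g. "1 abc def" (key order unspecified)
```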
Cheers,
Ranga:-)

Hi

Thanks a lot Ranga

It worked with the first file but not with the second file.

I don't have to combine both files; I ran the script on each separately. It worked with the first file but not with the second,

and it shows an error like this. You might not be able to understand it because the real values are not 1,2,3 and xyz as mentioned in the input, but they follow the same pattern. There seems to be a little error. Kindly check it.

Hi,
The input file contains regex metacharacters like []. I have not tested the code below; give it a try.

awk '{$0=gensub(/([][(){}])/,"\\\\\\1","g",$0);t=$0;r=$1" ";sub(r,"",t);if(a[$1]!~t){a[$1]=a[$1]" "t;}else{if(!a[$1]){a[$1]=t;}}}END{for(i in a){print i,a[i];}}' file1

You have to escape the special characters before using them in a regex. You could also use the quotemeta function in Perl and then pass the output lines to awk.
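The escaping step can be sketched in awk itself with gsub. The character class below is my own assumption about which ERE metacharacters to cover (Perl's quotemeta handles this more completely):

```shell
# Prepend a backslash to each regex metacharacter in the line, so the
# line can later be used safely on the pattern side of a match.
printf '%s\n' 'a[1] (x)' | awk '{
    gsub(/[][(){}.^$*+?\\|]/, "\\\\&")   # "&" is the matched character
    print
}'
# → a\[1\] \(x\)
```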
Cheers,
Ranga:-)

Thank you very much! :) :) :) :)
I want to write many more thanks!

Please use code tags to wrap your posts so that future users will benefit. :-)
Cheers,
Ranga:-)