Hi,
I have a file which looks like this:
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11131583 11131618 chr1 11127067 11132181 89 chr1 11131908 11132010 chr1 11130990 11131025 chr1 11127067 11132181 89 chr1 11131908 11132010 chr1 11131583 11131618 chr1 11127067 11132181 89 chr1 11130992 11131108 chr1 11130990 11131025 chr1 11127067 11132181 89 chr1 11130992 11131108 chr1 11131583 11131618
The expected output should look like this:
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025
I want all duplicate entries to be removed from every column.
Please use code tags to preserve formatting in the data samples. Your input is barely readable.
That one line? or multiple lines?
What's the expected output?
The duplicate entries in all the columns should be removed. Sorry for the bad post.
Can you look at the post below and use CODE tags?
Like this?
sed 's/chr1/\
&/g' file | awk 'NF {
    a = ""
    for (i = 1; i <= NF; i++)
        a = a " " $i
    if (!(a in exists)) {
        print
        exists[a]
    }
}' | paste -sd\\0 -
Thanks, but this doesn't work.
All the duplicate entries in all the columns should be removed.
Amit,
Can you post your data with code tags? Otherwise we all have to base our solutions on assumptions.
Sorry for that.
I've edited my earlier post. Check it. Should work.
Thanks for your effort, but it still does not work.
What do you mean by "duplicate entries"? Please define clearly in the context of your example.
I should've asked you this first (and paid heed to itkamaraj's implicit advice) and shouldn't have made an effort in the first place.
Scott
October 29, 2012, 11:14am
11
Maybe you should repost your data using code tags. Then everyone could reasonably expect to know exactly what they're working with.
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131908 11132010 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131908 11132010 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130992 11131108 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130992 11131108 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11128311 11128447 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11128311 11128447 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130630 11130711 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130630 11130711 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130729 11130979 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130729 11130979 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131263 11131553 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131263 11131553 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131587 11131709 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131587 11131709 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11132034 11132488 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11132034 11132488 chr1 11131583 11131618 1
And the output should look like this for all the lines. Sorry for the trouble, friends; I am not an expert with computers.
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 5
chr1 11131908 11132010
chr1 11130992 11131108 chr1 11131583 11131618 1
Hmm, it would be better to attach your file (saved in .txt format). Click "Go Advanced",
choose "Manage Attachments", and attach your file.
Hi,
I have attached the txt file...it contains all the details
Save the code below as a.awk:
# Each rule prints a value only the first time it is seen in its column.
!($2 in a){printf("%s %s ",$1,$2);a[$2]}
!($3 in b){printf("%s ",$3);b[$3]}
# Column 4 is printed on its own; the chr1 in column 5 goes out with column 6.
!($4 in c){printf("%s ",$4);c[$4]}
!($6 in d){printf("%s %s ",$5,$6);d[$6]}
!($7 in e){printf("%s ",$7);e[$7]}
!($9 in f){printf("%s %s ",$8,$9);f[$9]}
!($10 in g){printf("%s %s",$10,$11);g[$10]}
{printf("\n")}
Then run it with:
awk -f a.awk input.txt
$
$ cat input
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131908 11132010 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131908 11132010 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130992 11131108 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130992 11131108 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11128311 11128447 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11128311 11128447 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130630 11130711 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130630 11130711 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130729 11130979 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130729 11130979 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131263 11131553 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131263 11131553 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131587 11131709 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131587 11131709 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11132034 11132488 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11132034 11132488 chr1 11131583 11131618 1
$
$
$ perl -F"\t" -lane 'for ($i=0; $i<=$#F; $i++) {
if (not defined $tokens{$i.":".$F[$i]}) {push @x, $F[$i]}
else {push @x, ""}
}
if (join("",@x) ne "") {
for ($i=0; $i<=$#x; $i++) { $line .= sprintf ("%-10s", $x[$i]) }
print $line;
}
for ($i=0; $i<=$#F; $i++) { $tokens{$i.":".$F[$i]}++ };
$line = ""; @x = ();
' input
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 5
11131583 11131618 1
11131908 11132010
11130992 11131108
11128311 11128447
11130630 11130711
11130729 11130979
11131263 11131553
11131587 11131709
11132034 11132488
$
$
$
tyler_durden
pamu
October 30, 2012, 10:02am
18
try
awk '{for(i=1;i<=NF;i++){if(X[$i,i]++){$i=""}}}1' OFS="\t" file
Doesn't work...I have attached the result as output.txt. Kindly have a look.
pamu
October 30, 2012, 10:20am
20
Your expected output doesn't match what you describe.
Please look:
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 5
chr1 11131908 11132010 chr1 11131583 11131618 1
chr1 11130992 11131108
chr1 11128311 11128447
"duplicate lines in columns 1, 2, 3, 5, 6, 7, 8, 9, 10 should be removed while those that are not duplicate lines should be retained."
1) From columns 6 and 7, only 4 lines are printed in the expected output (you can see there are a few more).
2) See the chr1 entries marked in red: these are duplicates too, so why are they printed?
3) And if you don't want to consider column 4, it should be present on all the lines, right?
Assuming you don't want to consider column 4 for duplicates:
$ cat file
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131908 11132010 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131908 11132010 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130992 11131108 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130992 11131108 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11128311 11128447 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11128311 11128447 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130630 11130711 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130630 11130711 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11130729 11130979 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11130729 11130979 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131263 11131553 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131263 11131553 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11131587 11131709 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11131587 11131709 chr1 11131583 11131618 1
chr1 11127067 11132181 89 chr1 11132034 11132488 chr1 11130990 11131025 5
chr1 11127067 11132181 89 chr1 11132034 11132488 chr1 11131583 11131618 1
$ awk '{for(i=1;i<=NF;i++){if((X[$i,i]++) && i!=4){$i=""}}}1' OFS="\t" file
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 5
89 11131583 11131618 1
89 11131908 11132010
89
89 11130992 11131108
89
89 11128311 11128447
89
89 11130630 11130711
89
89 11130729 11130979
89
89 11131263 11131553
89
89 11131587 11131709
89
89 11132034 11132488
89
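The hard-coded `i!=4` test above can be generalized to any set of exempt columns. Here is a minimal sketch, assuming a hypothetical `keep` variable that holds a comma-separated list of column numbers to leave alone; the function name `dedupe_cols` is also made up for illustration and is not part of the thread's solution.

```shell
#!/bin/sh
# Blank out values already seen in the same column, except in the
# columns listed in "keep" (hypothetical name; comma-separated numbers).
dedupe_cols() {
    awk -v keep="$1" '
        BEGIN {
            OFS = "\t"
            n = split(keep, k, ",")
            for (j = 1; j <= n; j++) exempt[k[j]]   # mark exempt columns
        }
        {
            $1 = $1   # force awk to rebuild the line with OFS on every row
            for (i = 1; i <= NF; i++)
                # seen[i, $i] counts earlier occurrences of this value in column i
                if (seen[i, $i]++ && !(i in exempt))
                    $i = ""
            print
        }
    '
}

# Usage: de-duplicate every column except column 4
printf 'a 1 x 89\na 2 x 89\nb 2 y 89\n' | dedupe_cols 4
```

The `$1 = $1` assignment is there because awk only applies `OFS` when a field is assigned; without it, a line with no duplicates would keep its original separators.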
And considering all the columns:
$ awk '{for(i=1;i<=NF;i++){if(X[$i,i]++){$i=""}}}1' OFS="\t" file
chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 5
11131583 11131618 1
11131908 11132010
11130992 11131108
11128311 11128447
11130630 11130711
11130729 11130979
11131263 11131553
11131587 11131709
11132034 11132488
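When every column is considered, some input lines contain nothing new and the one-liner prints them as empty rows. The Perl version earlier in the thread skips such rows; a rough awk equivalent might look like the sketch below. The function name `dedupe_skip_blank` and the counter `kept` are illustrative names only.

```shell
#!/bin/sh
# Per-column de-duplication that skips rows left with no new values.
dedupe_skip_blank() {
    awk '
        BEGIN { OFS = "\t" }
        {
            $1 = $1          # rebuild the line so OFS is used on every row
            kept = 0         # number of first-time values on this row
            for (i = 1; i <= NF; i++) {
                if (seen[i, $i]++) { $i = "" } else { kept++ }
            }
            if (kept) print  # rows made up only of repeats are dropped
        }
    '
}

# Usage: the third line repeats earlier values in both columns and is dropped
printf 'a 1\na 2\na 1\nb 1\n' | dedupe_skip_blank
```

Whether dropping those rows is wanted depends on what the expected output really is, which the thread never settles.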
Hope this helps you :)
pamu