hbar
1
I want to find which patterns or strings occur more than once in a file so that I can remove unnecessary redundancy.
For example:
If I have the sentence:
A quick brown brown fox jumps jumps jumps over the lazy dog
in a file, then I want to know that
- the word "brown" has occurred 2 times.
- the word "jumps" has occurred 3 times.
in the above mentioned sentence.
Note that I have no idea which words have been repeated.
So I cannot make a pattern match search.
So I just need to know which texts/strings are redundant in a file. Is it possible?
Thanks.
Try:
perl -0ne 'while (/(\w+ )\1+/g){@x=split / /,$&;print "$x[0]: " . ($#x+1) . " times\n"}' file
hbar
3
Sorry, I didn't get any output!
Suppose I have a file called test.sh
cat test.sh
gives
abc dfg
ecd xkl mno
abc
dfg asj kllll
jkl p
dfg
o
As you can see, 'abc' is repeated on the 1st and 3rd lines.
'dfg' is repeated on the 1st, 4th, and 6th lines.
I would expect 'abc' and 'dfg' to be printed on the screen, with the corresponding lines highlighted or something similar.
I have attached the sample file.
Thanks.
I thought you needed only consecutive repetitions. Try this:
perl -ne 'while (/\w+/g){$c{$&}++};END{for $i (keys %c){print "$i: $c{$i}\n" if $c{$i}>1}}' file
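If the line numbers are wanted as well, something along these lines might do. It is only a rough sketch: it assumes words are separated by whitespace, and the function name repeated_words_with_lines is made up for illustration. A word that occurs twice on the same line gets that line number listed twice.

```shell
# Hypothetical helper (the name is an assumption, not from the thread):
# print every whitespace-separated word that occurs more than once,
# together with its count and the line numbers where it was seen.
repeated_words_with_lines() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        count[$i]++
        # Append the current line number to the list kept for this word
        lines[$i] = (lines[$i] == "" ? NR : lines[$i] "," NR)
      }
    }
    END {
      for (w in count)
        if (count[w] > 1)
          print w ": " count[w] " times (lines " lines[w] ")"
    }
  ' "$@" | sort   # awk array order is unspecified, so sort the report
}
```

Run as `repeated_words_with_lines test.sh` on the sample file above, it should print:

abc: 2 times (lines 1,3)
dfg: 3 times (lines 1,4,6)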
hbar
5
Thanks. What if a file contains names like this:
Bat:Ball
Bat:Wicket
Bat:Ball
Bat:Bat
Wicket:Bat
I wish to get "Bat:Ball" to be printed, not the "Bat" or "Ball" individually.
Thanks.
hbar
6
Could someone please reply? This is quite important to me. Thanks.
Try this...
awk '{for(i=1;i<=NF;i++){a[$i]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' input_file
--ahamed
perl -ne 'while (/[\w:]+/g){$c{$&}++};END{for $i (keys %c){print "$i: $c{$i}\n" if $c{$i}>1}}' file
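If each name sits on a line of its own, as in the Bat:Ball example, another option might be to compare whole lines instead of words. A minimal sketch, assuming one entry per line; the function name repeated_lines is only illustrative:

```shell
# Hypothetical helper (the name is an assumption, not from the thread):
# print each line that occurs more than once, one copy per repeated line.
repeated_lines() {
  # uniq only compares adjacent lines, so the input has to be sorted first
  sort "$@" | uniq -d
}
```

With the five names above saved in a file, `repeated_lines file` should print only Bat:Ball. Note that this treats "Bat:Ball" and "Ball:Bat" as different entries.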