Highlighting duplicate string on a line

brighty · September 11, 2014, 10:30am

Hi all

I have a grep written to pull out values; below (in the code snippet) is an example of the output.
What I'm struggling to do, and looking for assistance on, is identifying the lines that have duplicate strings.
For example, 74859915K74859915K below is 74859915K repeated twice, but 32575310100014 is not a whole repeated value, so I don't want to see it.

In my head (and what I'm unable to do) I want to do something like count its length, split it in half and confirm the first half matches the second half... I'm open to suggestions as there may be a better way to do it.

Background - these values are in multiple files within an xml tag <foo></foo>. My grep is extracting them and removing the xml tags with sed leaving just the below output... it's the next step where I want to only have the true dupes.

Many thanks in advance.

74859915K74859915K
0B153858340B15385834
MUNS0-0000000001MUNS0-0000000001
10594556C10594556C
0B982730630B98273063
Q1818002FQ1818002F
78883385D78883385D
44871376D44871376D
B14513386B14513386
016797265C016797265C
0A120861950A12086195
025691290Z025691290Z
31262294G31262294G
B57312068B57312068
16803742B16803742B
723029268723029268
A50470772A50470772
B64841927B64841927
32575310100014
50836566B50836566B
499984
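
A minimal sketch of the "split it in half and compare" idea described above, assuming the extracted values arrive one per line on standard input or in a file. The function name only_doubled is hypothetical, and awk is a swapped-in tool here, not part of the original grep/sed pipeline.

```shell
# Hypothetical helper: print only lines whose first half equals their second half.
only_doubled() {
    awk '{
        n = length($0)
        # Empty or odd-length lines cannot consist of two equal halves.
        if (n == 0 || n % 2 != 0) next
        half = n / 2
        # substr() returns strings, so this is a plain string comparison.
        if (substr($0, 1, half) == substr($0, half + 1)) print
    }' "$@"
}

# Example with three of the values from the sample output.
printf '%s\n' 74859915K74859915K 32575310100014 499984 | only_doubled
```

Of the three values fed in, only 74859915K74859915K is printed; 32575310100014 and 499984 are dropped because their halves differ. A backreference pattern such as grep '^\(..*\)\1$' is another option that should select the same lines.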

shamrock · September 11, 2014, 12:01pm

Post the original xml file, as it may be easier to extract dupes out of it directly instead of parsing it first with grep and sed... an xml file delimited with tags like <foo> may be easier to parse for dupes.

brighty · September 11, 2014, 12:04pm

Apologies, it seems I have double posted somehow. This is now solved in the other thread. I can't post the link as I'm not allowed to post links yet.

Edit: that post tipped me over the 5-post limit for posting links: http://www.unix.com/shell-programming-and-scripting/250963-highlighting-duplicate-string-line.html#post302916707

Neo · September 11, 2014, 12:07pm

Duplicate post. Continue here.