Extract lines that appear twice

I have a text file that looks like this:

root/user/usr1/0001/abab1*
root/user/usr1/0001/abab2*
root/user/usr1/0002/acac1*
root/user/usr1/0002/acac2*
root/user/usr1/0003/adad1*
root/user/usr1/0004/aeae1*
root/user/usr1/0004/aeae2*

How could I extract just the subjects that appear twice? I originally assumed every subject would appear twice, so I planned to just use awk 'NR % 2 == 0', but that is no longer the case, and now I don't know where to start. Help is much appreciated!

Which field is the subject?

The 000* series.

awk -F'/' ' ++arr[$4] == 2 { print $4 } ' file

I tried it and it didn't seem to work... it just says Unmatched ' ?

You might have missed a quote.

$ awk -F'/' ' ++arr[$4] == 2 { print $4 } ' file
0001
0002
0004

OK, I realized I forgot to type the closing ' after -F (dumb), but now it just returns usr1, printed once?

What did you try? Are you trying to get subjects or users?

Do the lines in your test file actually begin with a leading slash? Because then $4 would be the /-delimited column containing usr1, and you would need /-delimited column 5 instead.
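
If that is the case, one way to sidestep the field numbering (assuming the subject is always the next-to-last /-separated component, as in your sample) is to count on $(NF-1) instead of a fixed field number:

awk -F'/' ' ++arr[$(NF-1)] == 2 { print $(NF-1) } ' file

That prints each subject the second time it is seen, whether or not the lines start with a slash.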

On another note: do you need anything that appears at least 2 times, or exactly 2 times? The example code posted does the former.
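
If you need exactly twice, a sketch (still assuming the subject is the 4th /-delimited field, as in the sample) is to count everything first and only filter in the END block:

awk -F'/' ' { cnt[$4]++ } END { for (s in cnt) if (cnt[s] == 2) print s } ' file

With your sample this still prints 0001, 0002 and 0004 (in awk's unspecified array order), but a subject occurring three or more times would now be excluded.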

See if this awk script works for you...

awk -F/ '{x[$4] = (x[$4] ? x[$4] RS $0 : $0); y[$4]++} END {for (i in y) if (y[i] == 2) print x[i]}' file
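
It collects the full lines for each subject and, at the end, prints only the groups whose subject appears exactly twice. With the sample file above that would be the lines for 0001, 0002 and 0004 grouped together (0003 appears only once); the order of the groups depends on awk's unspecified for-in ordering. If your real file has a leading slash, change $4 to $5 (or $(NF-1)) as noted earlier.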