uniq and grep help

Linux-wannabe · November 16, 2009, 10:37pm


Sorry if this seems easy to some here, but my experiments are not working.

Say I have a file with duplicate records. How can I grep every occurrence of those duplicate records out into another file? I have tried grep -F -f output original > answer, and this fails.

Here is what my original file looks like:

A10001,BOXA01
A10002,BOXA01
A10003,BOXA01
A10004,BOXA01
A10001,BOXA01
A10003,BOXA04

The database program I am using removes duplicates (records with the same first field), so I will be missing one label each for A10001 and A10003.
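Assuming the database keeps the first record for each volser and discards later ones (which fits what is said later in the thread about the program dropping the second A10003 record), a minimal awk sketch can list exactly which records it will throw away. The file names original and dropped are placeholders.

```shell
# Sample data from the post (volser,box), written to a placeholder file.
printf '%s\n' A10001,BOXA01 A10002,BOXA01 A10003,BOXA01 \
  A10004,BOXA01 A10001,BOXA01 A10003,BOXA04 > original

# -F, splits on commas, so $1 is the volser.
# seen[$1]++ is 0 (false) the first time a volser appears and >0 afterwards,
# so awk prints only the 2nd, 3rd, ... record per volser: the ones a
# keep-first de-duplication would drop.
awk -F, 'seen[$1]++' original > dropped
cat dropped
```

On the sample this should list the second A10001,BOXA01 and A10003,BOXA04, i.e. the two missing labels.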

(We audit tapes with a scan gun, so how do we know whether it is a true duplicate tape or a duplicated scan? Duplicates only come into play every once in a while, but more often than not when destroying tapes or moving them to another location; hence the box number, which records the location.)

I used to do something like this:

cut -c1-6 original | sort > output    # uniq -d only compares adjacent lines, so sort first
uniq -d output > duplicates
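Because uniq -d only compares adjacent lines, the volser list has to be sorted before uniq sees it; on the unsorted sample, the two A10001 records are not neighbours and would not be reported. A minimal sketch of the whole step as one pipeline, assuming comma-separated records (cut -d, -f1 takes the field before the comma instead of relying on a fixed 6-character width):

```shell
# Sample data from the post (volser,box).
printf '%s\n' A10001,BOXA01 A10002,BOXA01 A10003,BOXA01 \
  A10004,BOXA01 A10001,BOXA01 A10003,BOXA04 > original

# Extract volsers, sort so repeats become adjacent, keep one copy of each repeat.
cut -d, -f1 original | sort | uniq -d > duplicates
cat duplicates
```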

grep 'A10001' original
produces this result on screen (grep returned 0):
A10001,BOXA01
A10001,BOXA01
or
grep 'A10003' original
A10003,BOXA01
A10003,BOXA03
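The one-grep-per-volser checks above can be turned into a batch loop that reads the volser list line by line. A minimal sketch, assuming a file named duplicates with one volser per line (the file names are placeholders). It rereads original once per volser, so 1,000 duplicates means 1,000 passes over the file; that works, but a single grep -f or awk pass is likely to be faster.

```shell
# Sample data from the post, plus a list of known duplicate volsers.
printf '%s\n' A10001,BOXA01 A10002,BOXA01 A10003,BOXA01 \
  A10004,BOXA01 A10001,BOXA01 A10003,BOXA04 > original
printf '%s\n' A10001 A10003 > duplicates

# One grep per volser, all results collected into one file.
# "^$volser," anchors the match to the first field, so A10001 cannot match A100010.
while IFS= read -r volser; do
  grep "^$volser," original
done < duplicates > answer
cat answer
```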

Or use Agent Ransack, a free Windows GUI from Mythicsoft, to search folders for files containing some text. Try it.

I actually use the Berkeley utilities for DOS, which include uniq, grep, cut, wc, etc., but if need be I will make another live CD for the road.
However, this approach does not scale to a large file, say 10,000 records in the original with 1,000 duplicates, when I need to find the duplicates and their locations for auditing purposes.

Is there a way to take my duplicates file of known duplicate volsers and make a batch command that runs it through grep and then redirects the result to a file?

For example:

grep 'known duplicate volser' original > output

but have grep take the entire list of volsers at once?
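grep can already take the entire list at once: with -f, every line of the named file is a separate pattern, and the input is scanned in a single pass. Note that in the earlier commands, output holds every volser cut from original, not only the repeated ones, so grep -F -f output original would be expected to match every record; the list to pass is duplicates. A sketch under those assumptions. The carriage-return stripping is only a guess at one possible cause of the failure: a pattern file written by DOS tools may end each line with an invisible \r, and such a pattern never matches.

```shell
# Sample data from the post, plus the list of duplicate volsers.
printf '%s\n' A10001,BOXA01 A10002,BOXA01 A10003,BOXA01 \
  A10004,BOXA01 A10001,BOXA01 A10003,BOXA04 > original
printf '%s\n' A10001 A10003 > duplicates

# Strip any DOS carriage returns from the pattern list (harmless if there are none).
tr -d '\r' < duplicates > patterns
# -F: fixed strings, -w: whole-word matches only, -f: one pattern per line of patterns.
grep -F -w -f patterns original > answer
cat answer
```

Matching records come out in file order, not grouped by volser; piping answer through sort groups them.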

Any suggestions?

Like I said, I'm a Linux wannabe: I know a little bit about Linux, and that 8 bits is a byte!
But knowing how the magic works, and how gcc works, takes more effort!

:)

rdcwayx · November 16, 2009, 11:53pm

Before I provide a solution, I need to know: if you have these duplicate records, which one do you need to keep? The first one, the last one, or one chosen by some other rule?

A10003,BOXA01
A10003,BOXA03

dennis.jacob · November 16, 2009, 11:56pm

Try this:

awk '{ if (arr[$0]==$0) print; else arr[$0]=$0; }'  < original > duplicate
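The awk one-liner above compares whole lines ($0) and prints only the second and later copies, so on the sample it would report A10001,BOXA01 once and miss the A10003 pair, whose box numbers differ. A minimal variant keyed on the first field instead, printing every record whose volser occurs more than once, in original order. It assumes comma-separated records; the file names are placeholders.

```shell
# Sample data from the post (volser,box).
printf '%s\n' A10001,BOXA01 A10002,BOXA01 A10003,BOXA01 \
  A10004,BOXA01 A10001,BOXA01 A10003,BOXA04 > original

# The file is named twice. While reading the first copy (NR == FNR), awk only
# counts each volser ($1); on the second pass it prints every record whose
# volser was counted more than once.
awk -F, 'NR == FNR { count[$1]++; next } count[$1] > 1' original original > answer
cat answer
```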

Linux-wannabe · November 17, 2009, 7:25am

rdcwayx:

Before I provide a solution, I need to know: if you have these duplicate records, which one do you need to keep? The first one, the last one, or one chosen by some other rule?
A10003,BOXA01
A10003,BOXA03

I need to keep both. The software program I use will drop
A10003,BOXA03

but let's treat it as a real object that needs to stay.

When I do an inventory, I keep a count:
50 boxes with 50 tapes = 2500
Once we run the data through the software, it becomes 2309 tapes (hence duplicate tapes with the same volser), but I can confirm that the original count is true with wc -l, or by opening the original text file (for example in Excel).
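For the inventory above, 2500 - 2309 = 191, so the software removed 191 records as repeated volsers. The same reconciliation can be checked before import: the number of records a keep-first de-duplication drops is the line count minus the number of distinct volsers. A minimal sketch, assuming one comma-separated record per line; the variable names are hypothetical.

```shell
# Sample data from the post (volser,box).
printf '%s\n' A10001,BOXA01 A10002,BOXA01 A10003,BOXA01 \
  A10004,BOXA01 A10001,BOXA01 A10003,BOXA04 > original

total=$(wc -l < original)                          # records scanned
unique=$(cut -d, -f1 original | sort -u | wc -l)   # distinct volsers kept
dropped=$((total - unique))                        # records a keep-first dedupe removes
echo "scanned=$((total)) unique=$((unique)) dropped=$dropped"
```

On the sample this should report 6 scanned, 4 unique and 2 dropped; on the real file, dropped should match the gap between the manual count and the software's count.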

Thanks.

dennis.jacob · November 17, 2009, 11:02am

One more approach, using sort and uniq:

sort file | uniq -D
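sort file | uniq -D compares whole lines, so on the sample data it reports the two identical A10001,BOXA01 records but not the A10003 pair, whose box numbers differ. With GNU uniq, -w limits the comparison to the first N characters; the sketch below assumes fixed 6-character volsers as in the sample. -D and -w are GNU coreutils options and may not exist in the DOS ports. LC_ALL=C gives a plain byte-order sort.

```shell
# Sample data from the post (volser,box).
printf '%s\n' A10001,BOXA01 A10002,BOXA01 A10003,BOXA01 \
  A10004,BOXA01 A10001,BOXA01 A10003,BOXA04 > original

# uniq -D prints every line of each duplicate group;
# -w 6 compares only the first 6 characters, i.e. the volser.
LC_ALL=C sort original | uniq -D -w 6 > answer
cat answer
```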