Bash - remove duplicates without sort

I need to use bash to remove duplicates without using sort first.

I cannot use:

cat file | sort | uniq 

But when I use only

cat file | uniq 

some duplicates are not removed.
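(The reason is that uniq only collapses *consecutive* duplicate lines, so copies that are not adjacent survive. A quick demonstration:)

```shell
# uniq removes only adjacent duplicates; the second non-adjacent 'a' survives
printf 'a\na\nb\na\n' | uniq
# a
# b
# a
```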

Obviously, the simple way to do this is:

sort -u file

but you tell us we can't do that without saying why. Is there a requirement to output lines in the same order they appeared in the input file? If so, is it important to keep a particular one of the duplicated lines in the output? Or do you want every line that had one or more duplicates removed from the output entirely?

In the 2.5 years you've been a member of this forum, there have been dozens of examples using awk to do this where the 1st duplicated input line is kept, the last duplicated input line is kept, or all duplicated lines are removed. If keeping the same order is important, it is more difficult to keep the last duplicate than it is to keep the 1st.
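For instance, one way to keep the *last* occurrence of each line while preserving input order is a two-pass awk over the same file (a sketch, assuming a POSIX awk and a seekable file named `file`):

```shell
# Pass 1 (NR==FNR): record the line number of each line's last occurrence.
# Pass 2: print a line only when we are at that recorded last position.
awk 'NR==FNR { last[$0]=FNR; next } FNR==last[$0]' file file
```

Keeping the 1st occurrence instead needs only a single pass, which is why it is the easier case.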

So, what are the real requirements?

Try

awk '!a[$0]++' file

Hi.
I tried:

echo -e "prova\012zappa\012prova\012quadro\012cesto\012zappa" | awk '!a[$0]++'

and it works :).

Please explain to me what this means:

'!a[$0]++'

Use Google.
Or read section 43 here: Famous Awk One-Liners Explained, Part II: Text Conversion and Substitution - good coders code, great reuse
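In short: `!a[$0]++` is an awk pattern with no action, so the default action (print the line) runs whenever the pattern is true. `a[$0]` starts at 0 for an unseen line, `!0` is true, so the line prints the first time; the post-increment then marks it as seen. A long-hand equivalent (my sketch of the same logic):

```shell
# Long-hand equivalent of awk '!a[$0]++':
# 'seen' counts occurrences of each whole input line ($0);
# a line is printed only when its count is still zero.
awk '{ if (seen[$0] == 0) print $0; seen[$0]++ }'
```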
