Bash - remove duplicates without sort

I need to use bash to remove duplicates without using sort first.

I cannot use:

cat file | sort | uniq 

But when I use only

cat file | uniq 

some duplicates are not removed.
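(The reason is that uniq only collapses *consecutive* duplicate lines, so copies that are not adjacent survive. A quick demonstration:)

```shell
# uniq removes only adjacent duplicates; the second non-adjacent 'a' survives
printf 'a\na\nb\na\n' | uniq
# a
# b
# a
```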

Obviously, the simple way to do this is:

sort -u file

but you tell us we can't do that without saying why. Is there a requirement to output lines in the same order they appeared in the input file? If so, is it important to keep a particular one of the duplicated lines in the output? Or do you want every line that had one or more duplicates removed from the output entirely?

In the 2.5 years you've been a member of this forum, there have been dozens of examples using awk to do this where the 1st duplicated input line is kept, the last duplicated input line is kept, or all duplicated lines are removed. If keeping the same order is important, it is more difficult to keep the last duplicate than it is to keep the 1st.
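For instance, one way to keep the *last* occurrence of each line while preserving input order is a two-pass awk over the same file (a sketch, assuming a POSIX awk and a seekable file named `file`):

```shell
# Pass 1 (NR==FNR): record the line number of each line's last occurrence.
# Pass 2: print a line only when we are at that recorded last position.
awk 'NR==FNR { last[$0]=FNR; next } FNR==last[$0]' file file
```

Keeping the 1st occurrence instead needs only a single pass, which is why it is the easier case.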

So, what are the real requirements?

Try

awk '!a[$0]++' file

Hi.
I tried:

echo -e "prova\012zappa\012prova\012quadro\012cesto\012zappa" | awk '!a[$0]++'

and it works :).

Please explain to me what this means:

'!a[$0]++'

Use Google.
Or read section 43 here: Famous Awk One-Liners Explained, Part II: Text Conversion and Substitution - good coders code, great reuse
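In short: `!a[$0]++` is an awk pattern with no action, so the default action (print the line) runs whenever the pattern is true. `a[$0]` starts at 0 for an unseen line, `!0` is true, so the line prints the first time; the post-increment then marks it as seen. A long-hand equivalent (my sketch of the same logic):

```shell
# Long-hand equivalent of awk '!a[$0]++':
# 'seen' counts occurrences of each whole input line ($0);
# a line is printed only when its count is still zero.
awk '{ if (seen[$0] == 0) print $0; seen[$0]++ }'
```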
