I need to use bash to remove duplicates without using sort first.
I cannot use:
cat file | sort | uniq
But when I use only
cat file | uniq
some duplicates are not removed.
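(For reference, this appears to be because uniq only collapses *adjacent* duplicate lines; repeats separated by other lines survive. A quick demonstration, using made-up sample lines:)

```shell
# uniq compares each line only with the one immediately before it,
# so non-adjacent duplicates are kept.
printf 'prova\nzappa\nprova\n' | uniq
# "prova" still appears twice, because the two copies
# are separated by "zappa".
```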
Obviously, the simple way to do this is:
sort -u file
but you tell us you can't do that without saying why. Is there a requirement to output lines in the same order they appeared in the input file? If so, is it important to keep a particular one of the duplicated lines in the output? Or do you want every line that had one or more duplicates removed from the output entirely?
In the 2.5 years you've been a member of this forum, we have seen dozens of examples using awk to do this where the 1st duplicated input line is kept, the last duplicated input line is kept, or all duplicated lines are removed. If keeping the same order is important, it is more difficult to keep the last duplicate than it is to keep the 1st.
So, what are the real requirements?
Try
awk '!a[$0]++' file
Hi.
I tried:
echo -e "prova\012zappa\012prova\012quadro\012cesto\012zappa" | awk '!a[$0]++'
and it works :).
Please explain what it means:
'!a[$0]++'
Use Google.
Or read section 43 here: Famous Awk One-Liners Explained, Part II: Text Conversion and Substitution - good coders code, great reuse
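(In short, a commented sketch of what the one-liner does, under the usual awk semantics:)

```shell
# a is an associative array keyed by the whole input line ($0).
# a[$0]++ evaluates to the current count for that line (0 the first
# time, since unset awk variables are 0) and then increments it.
# !0 is true, so the pattern is true only the first time a line is
# seen; awk's default action for a true pattern is to print the line.
# Result: first occurrences are printed, in input order; later
# duplicates are suppressed.
printf 'prova\nzappa\nprova\nquadro\n' | awk '!a[$0]++'
# prints: prova, zappa, quadro
```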