Command to remove duplicate lines with perl, sed, awk

cola · October 15, 2010, 7:53pm

Input:

hello hello
hello hello
monkey
donkey
hello hello
drink
dance
drink

Output should be:

hello hello
monkey
donkey
drink
dance

Scott · October 15, 2010, 7:58pm

Hi.

I'm sure we must have gone over this before:

awk '!_[$0]++' file
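
For anyone wondering how that one-liner works: `_` is just an array name, indexed by the whole line (`$0`). The post-increment returns the old count, so the pattern is true only the first time a line is seen, and awk's default action prints it. A more verbose sketch of the same idea (the array name `seen` and the `dedupe_awk` wrapper are my own, not part of the command above):

```shell
# Hypothetical wrapper; 'seen' is an illustrative array name.
dedupe_awk() {
  awk '{
    if (!($0 in seen)) {   # first time this exact line appears
      print                # keep it, preserving input order
    }
    seen[$0] = 1           # remember the line for later
  }' "$@"
}

printf '%s\n' 'hello hello' 'hello hello' monkey donkey 'hello hello' drink dance drink | dedupe_awk
```

Unlike `sort -u`, this keeps the lines in their original order, at the cost of holding every distinct line in memory.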

kurumi · October 15, 2010, 10:37pm

$ ruby -00 -ne 'puts $_.split("\n").uniq' file
hello hello
monkey
donkey
drink
dance

cola · October 16, 2010, 1:16am

Does anybody know a solution with sed or perl?

bartus11 · October 16, 2010, 3:57am

perl -ne 'print unless $a{$_}++' file

cola · October 16, 2010, 8:53am

Nice.

fpmurphy · October 16, 2010, 10:14am

There is no general solution for sed unless the file is sorted, so that duplicate lines end up next to each other. If it is sorted, the following deletes the duplicate lines:

sed '$!N; /^\(.*\)\n\1$/!P; D'
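
To see why the sorting matters: the sed script `$!N; /^\(.*\)\n\1$/!P; D` only ever compares a line with the one right after it. A quick sketch using the sample input from the first post (the variable names `input`, `unsorted` and `sorted` are only for illustration):

```shell
input=$(printf '%s\n' 'hello hello' 'hello hello' monkey donkey 'hello hello' drink dance drink)

# Unsorted: only the adjacent pair of "hello hello" is collapsed;
# the later "hello hello" and the second "drink" survive.
unsorted=$(printf '%s\n' "$input" | sed '$!N; /^\(.*\)\n\1$/!P; D')

# Sorted first: duplicates become adjacent, so all of them are removed,
# but the original line order is lost.
sorted=$(printf '%s\n' "$input" | sort | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$unsorted"
printf '%s\n' '---'
printf '%s\n' "$sorted"
```

So on the original input this does not give the requested output, since that output keeps the first-seen order.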

EAGL · October 18, 2010, 3:22am

Hello Murphy,

Could you recommend a page that explains how the N, P and D commands work in this sed command?

sed '$!N; /^\(.*\)\n\1$/!P; D'

thanks in advance

kurumi · October 18, 2010, 4:04am

man page
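
In short, for the command quoted above: `N` appends the next input line to the pattern space (separated by a newline), `P` prints the pattern space up to the first newline, and `D` deletes up to the first newline and restarts the cycle without reading new input. Here is the same script spread over several lines with comments, as a sketch (the `uniq_adjacent` function name is mine):

```shell
# Hypothetical wrapper around the same three commands.
uniq_adjacent() {
  sed '
    # N: append the next input line to the pattern space ("line1\nline2").
    # The $! address skips this on the last line, where nothing follows.
    $!N
    # If both halves are identical, skip the print. Otherwise
    # P prints the pattern space up to the first newline.
    /^\(.*\)\n\1$/!P
    # D deletes up to the first newline and restarts the cycle with
    # what is left, without reading a new line first.
    D
  ' "$@"
}

printf '%s\n' a a b c c c | uniq_adjacent
```

The effect is a two-line sliding window: the first line of the window is printed only when the second one differs from it.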

ygemici · October 19, 2010, 4:57am

Actually, sed has no ready-made function for this kind of problem.

But for me, sed is still more powerful than the others.
I can try to write something sed-specific for sed lovers.

# cat file
hello hello
hello hello
monkey
donkey
hello hello
drink
vay
dance
drink

# ./fsed.sedv1.uniq file
hello hello
monkey
donkey
drink
vay
dance

# ## fsed-Sedv1-Uniq ##
 
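#!/bin/bash
xsed="";sedarr=""
# For every line of the file, find the number of the first line that contains it.
# Only one character of the line number is captured (see \(.\) below).
while read -r l
 do
  x=( $( echo $(sed '=' "$1" | sed -n 'N;s/\n/ /;p' | sed -n "s/^\(.\).*$l/\1/p") | sed 's/ .*//') );
  xsed=("$xsed $x" )
 done <"$1"
 
# Start the list of line numbers to print with line 1.
fsed=( $(echo ${xsed[@]}|sed 's/ /\n/g' | sed -n '/^1/p'|sed -n '1p') )
sedarr=("$fsed" )
 
# Add each first-occurrence line number that is not in the list yet.
for i in ${xsed[@]}
 do
  sedarr=( "$sedarr $( echo ${xsed[@]}|sed 's/ /\n/g' | sed -ne "/^$i/p"| sed -n '1p' | sed -e "/[${sedarr[@]}]/d" )" )
 done
 
# Print the selected lines in order.
for i in ${sedarr[@]}
 do
  sed -n "$i p" "$1"
 done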
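
One thing to watch in the first loop: `\(.\)` captures exactly one character, so only the first digit of a line number is kept. A small sketch of the effect on a numbered line of the kind produced by `sed '='` joined with `N;s/\n/ /` (the sample line `12 drink` and the variable names are made up for illustration):

```shell
numbered='12 drink'

# One-character capture, as in the script: keeps only "1".
one_char=$(printf '%s\n' "$numbered" | sed -n 's/^\(.\).*drink/\1/p')

# Capturing a run of digits keeps the full line number "12".
all_digits=$(printf '%s\n' "$numbered" | sed -n 's/^\([0-9]*\) drink$/\1/p')

printf '%s %s\n' "$one_char" "$all_digits"   # prints: 1 12
```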

A small (or big) correction:
I discovered that this cannot process a file that has 10 or more lines.
I will try to rewrite it to fix this problem.
Let's try this:

# cat newfile
hello hello
hello hello
monkey
donkey
hello hello4
drink
dance2
dance
drink4
hello hello1
donkey2
hello hello1
hello hello2
hello hello5
donkey3
donkey2
hello hello3
hello hello3
hello hello5
monkey3
dance3
dance3
monkey3
dance3

# ./fsed.sedv2.uniq newfile
hello hello
monkey
donkey
hello hello4
drink
dance2
dance
drink4
hello hello1
donkey2
hello hello2
hello hello5
donkey3
hello hello3
monkey3
dance3

# ## fsed-Sedv2-Uniq ##
 
#!/bin/bash
xsed="" ;uniq="" ;sedarr="" ;fsed=""
while read -r l
 do
  x=( $( echo $(sed '=' "$1" | sed -n 'N;s/\n/ /;p' | sed -n "s/\(.*\) \b$l\b/\1/p")  ) );
  xsed=("$xsed ${x}\b\|" )
 done <"$1"
 
fsed=( $(echo ${xsed[@]}|sed 's/ /\n/g' | sed -n '/^1/p'|sed -n '1p') )
sedar=("\b$fsed" )
 
for i in ${xsed[@]}
 do
  newi=$(echo $i | sed 's/..$//')
  sedar=( $(echo $sedar|sed 's/..$//') )
  sedax=$(echo "${xsed[@]}"|sed 's/ /\n/g' | sed -ne "/^${newi}/p"| sed -n '1p'|sed -e "/${sedar[@]}/d" )
  x=("$(echo ${sedar[@]}|sed 's/\\|/\\b&\\b/g')" )
  sedar=("${x}\|${sedax}" )
 done
 
for i in $(echo ${sedar[@]} | sed 's/[^0-9]/ /g')
 do
  sed -n "$i p" "$1"
 done

PS: There may be some bugs! I don't guarantee that it works very well (it may be slow, for example).

Regards
ygemici

---------- Post updated at 11:57 AM ---------- Previous update was at 11:54 AM ----------

This resource is very useful and excellent for sed lovers.
Thank you, Bruce Barnett, for this:

Sed - An Introduction and Tutorial