Extract duplicate fields in rows

anhtt · November 28, 2007, 5:07am

I have an input file with the following format:

6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

The fields are separated by semicolons. Sometimes the second field is duplicated, so I'd like to extract all the lines that have a duplicated second field to a new file. Example output file:

2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

How can I do this with awk or a shell script?
I don't know any other programming languages.

Franklin52 · November 28, 2007, 7:43am

Try:

sort -t ';' -k 2,2 file | awk -F ';' 'NR>1&&$2==dat{if(!n++)print prev;print;next}{n=0;dat=$2;prev=$0}'

Regards

radoulov · November 28, 2007, 7:46am

Cannot find a better solution right now

awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file

Use nawk or /usr/xpg4/bin/awk on Solaris.
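For readers who find the one-liner dense, here is a spelled-out sketch of the same two-pass idea: the file is read once to count each second field, then read again to print the lines whose field was counted more than once. The function name find_dups and the scratch file are assumptions made for illustration, not part of the original answer.

```shell
# Sample data from the thread, written to a scratch file (hypothetical path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
EOF

# find_dups FILE: print every line whose second ';'-separated field
# occurs more than once anywhere in FILE. The file is read twice.
find_dups() {
    awk -F';' '
        NR == FNR { count[$2]++; next }   # pass 1: count each second field
        count[$2] > 1                     # pass 2: print lines with a repeated field
    ' "$1" "$1"
}

find_dups "$sample"
```

Because the second pass walks the file in its original order, the output keeps the input order and the duplicates do not need to be adjacent.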

summer_cherry · November 29, 2007, 12:47am

hi

code:

awk 'BEGIN{FS=" ;"}
{
if (temp=="")
{
	temp=$2
	t_line=$0
}
else if (temp==$2)
{
	print t_line
	print $0
	temp=""
	t_line=""
}
else
{
	temp=$2
	t_line=$0
}
}' filename

radoulov · November 29, 2007, 3:40am

summer_cherry,
I meant this (nonconsecutive duplicates):

$ cat file
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
$ awk 'BEGIN{FS=" ;"}                                       
{
if (temp=="")
{
temp=$2
t_line=$0
}
else if (temp==$2)
{
print t_line
print $0
temp=""
t_line=""
}
else
{
temp=$2
t_line=$0
}
}' file
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
$ awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
6000000901 ;36200103 ;h3a01f496 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
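The difference above comes from the first script comparing only adjacent lines. If reading the file twice is not possible (for example when the data arrives on a pipe), a single-pass variant can remember the first line seen for each second field and release it when that field shows up again. This is only a sketch; the function name dups_one_pass is made up for illustration, and the output is grouped at the point where each duplicate is detected rather than kept in input order.

```shell
# dups_one_pass: read lines on stdin and print every line whose second
# ';'-separated field occurs more than once, in a single pass.
dups_one_pass() {
    awk -F';' '
        { n = ++seen[$2] }             # how many times this field has been seen so far
        n == 1 { first[$2] = $0 }      # remember the first line for this field
        n == 2 { print first[$2] }     # second sighting: release the remembered line
        n >= 2 { print }               # print the current duplicate line
    '
}

dups_one_pass <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
EOF
```

The trade-off is memory: one line is held for every distinct second field until the input ends.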

anhtt · December 1, 2007, 12:36pm

This is my script.
It's a very simple shell script:

# List the second-field values that occur more than once.
cut -f2 -d";" "$1" | sort | uniq -d > /tmp/mdn2
while read -r x
do
echo "$x"
# read strips the trailing blank, so allow optional blanks before the next ";".
grep "^[^;]*;$x *;" "$1" >> duplicate
done < /tmp/mdn2
rm -f /tmp/mdn2

$1 is the input file and duplicate is the output file.
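The same cut, sort and uniq -d idea can also be run without temporary files by piping the list of repeated values into awk, which then filters the input file against that list. The sketch below is one possible arrangement, not the script above; the function name dup_lines and the scratch file are assumptions made for illustration.

```shell
# Sample data, including a nonconsecutive duplicate (hypothetical scratch file).
sample=$(mktemp)
cat > "$sample" <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
EOF

# dup_lines FILE: print the lines of FILE whose second ';'-separated
# field appears more than once, using uniq -d to find the repeated values.
dup_lines() {
    cut -d';' -f2 "$1" | sort | uniq -d |
    awk -F';' '
        NR == FNR { want[$0]; next }   # stdin ("-"): the repeated second-field values
        $2 in want                     # FILE: keep lines whose second field is listed
    ' - "$1"
}

dup_lines "$sample" > duplicate
cat duplicate
```

Comparing the whole second field in awk, rather than grepping for the value anywhere on the line, avoids false matches when the same digits happen to appear in another column.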

summer_cherry · December 2, 2007, 8:58pm

Hi,

I am not quite sure whether your output needs to be in the same order as the original file.

If not, why not sort it first and then use my awk code?

6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;

sort -k 2 file:

6000000901 ;36200103 ;h3a01f496 ;
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

Then it is ok to use my awk code.