anhtt
November 28, 2007, 5:07am
1
I have a input file with formating:
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
Each fields is seperated by semi-comma. Sometime, the second files is duplicated. So I'd like to extract all the lines which have duplicated second field to a new file. Example for output file:
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
How to do this with awk or shell script ?
Other programming, I don't know.
Try:
sort -t ';' -k 2,2 | awk 'dat==$2{print $0}{dat=$2}' file
Regards
Cannot find a better solution right now
awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
Use nawk or /usr/xpg4/bin/awk on Solaris.
hi
code:
awk 'BEGIN{FS=" ;"}
{
if (temp=="")
{
temp=$2
t_line=$0
}
else if (temp==$2)
{
print t_line
print $0
temp=""
t_line=""
}
else
{
temp=$2
t_line=$0
}
}' filename
summer_cherry,
I meant this (nonconsecutive duplicates):
$ cat file
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
$ awk 'BEGIN{FS=" ;"}
{
if (temp=="")
{
temp=$2
t_line=$0
}
else if (temp==$2)
{
print t_line
print $0
temp=""
t_line=""
}
else
{
temp=$2
t_line=$0
}
}' file
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
$ awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
6000000901 ;36200103 ;h3a01f496 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
anhtt
December 1, 2007, 12:36pm
6
This is my script code
It's very simple shell script
cut -f2 -d";" $1 > /tmp/mdn1
sort /tmp/mdn1 | uniq -d > /tmp/mdn2
cat /tmp/mdn2 | while read line;
do
echo $line > /tmp/mdn3
x=`cut -f1 -d" " /tmp/mdn3`
echo $x
y=`grep "$x" "$1"`
echo $y >> duplicate
done
rm -f /tmp/mdn*
$1 is the input file and duplicate is the output file.
Hi,
I am not quiet sure whether your output should be in the same sequence as original file.
If not, why not sorting it first and then use my awk code.
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
sort +1 file:
6000000901 ;36200103 ;h3a01f496 ;
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
Then it is ok to use my awk code.