Hi Experts,
Please check the following new requirement. I have data like the following in a file:
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbc275d2c122| 3478234| WORK| 1
01cbbe4362743da| 3496386| Rich Spare| 1
01cbc275d2c122| 3478234| WORK| 1
This is a pipe-separated file with columns 2 and 3 as the key columns. The file should be split into the following output files:
1) All records other than the duplicates
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbbe4362743da| 3496386| Rich Spare| 1
2) The duplicate-key file
3478234| WORK
Any thoughts on this?
Note: the 'FILE_HEADER' line should be present in the first file.
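In other words: any record whose (column 2, column 3) key occurs more than once goes to the duplicate-key file (key only, printed once), and every remaining record, plus the header, goes to the first file. A minimal two-pass awk sketch of that rule, using the sample data (the names test_file, uniq.txt and dups.txt are illustrative):

```shell
# Sample input from the post
cat > test_file <<'EOF'
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbc275d2c122| 3478234| WORK| 1
01cbbe4362743da| 3496386| Rich Spare| 1
01cbc275d2c122| 3478234| WORK| 1
EOF

# Pass 1 counts each (col2, col3) key; pass 2 routes the records.
awk -F'|' '
NR == FNR { if (FNR > 1) cnt[$2, $3]++; next }       # pass 1: count keys (skip header)
FNR == 1  { print > "uniq.txt"; next }               # header goes to the first file only
cnt[$2, $3] == 1 { print > "uniq.txt"; next }        # key occurs once: unique record
!seen[$2, $3]++  { print $2 "|" $3 > "dups.txt" }    # duplicated key, printed once
' test_file test_file
```

Note that the fields keep the spaces that follow each pipe in the input, so the duplicate-key line comes out as " 3478234| WORK".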
jimmymj
2
Hi Tinu,
Please try one of the commands below:
sort -t'|' -k2,3 test_file | uniq -u
or
sed '1d' test_file | sort -t'|' -k2,3 | uniq -u (removing the header line first)
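For a quick check against the sample data — note that uniq -u compares whole lines, so this works here only because the duplicate rows are byte-identical:

```shell
# Sample input from the post (test_file is the name used in the commands above)
cat > test_file <<'EOF'
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbc275d2c122| 3478234| WORK| 1
01cbbe4362743da| 3496386| Rich Spare| 1
01cbc275d2c122| 3478234| WORK| 1
EOF

# Drop the header, sort on the key columns, keep only non-repeated lines
sed '1d' test_file | sort -t'|' -k2,3 | uniq -u
# Expected:
# 01cbbfde7898410| 3477945| home| 1
# 01cbbe4362743da| 3496386| Rich Spare| 1
```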
~jimmy
jimmymj
3
Here is a better solution for extracting the unique and duplicate records from a file:
sed '1d' "$FILE1" | sort -t'|' -k2,3 > temp1   # drop the header, sort by the key columns
awk -F'|' '{ a++; b[a] = $2 $3; c[a] = $0 }
END { for (i = 1; i < a; ++i) if (b[i+1] == b[i]) print c[i] "\n" c[i+1] }' temp1 | uniq > temp2
cat temp1 temp2 > temp3   # each duplicated record now occurs three times in temp3
sort temp3 | uniq -u > temp4   # so uniq -u keeps only the truly unique records
echo "$HEADER" > "$FILE1"   # $HEADER is assumed to hold the original header line
cat temp4 >> "$FILE1"
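A self-contained run of that temp-file approach against the sample data. The assignments of FILE1 and HEADER are assumptions — the post never shows where they are set:

```shell
FILE1=test_file
HEADER=FILE_HEADER   # assumed: the header line of the input file
cat > "$FILE1" <<'EOF'
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbc275d2c122| 3478234| WORK| 1
01cbbe4362743da| 3496386| Rich Spare| 1
01cbc275d2c122| 3478234| WORK| 1
EOF

sed '1d' "$FILE1" | sort -t'|' -k2,3 > temp1   # drop header, sort by key
# Print both members of every adjacent pair sharing the (col2, col3) key
awk -F'|' '{ b[NR] = $2 $3; c[NR] = $0 }
END { for (i = 1; i < NR; ++i) if (b[i+1] == b[i]) print c[i] "\n" c[i+1] }' temp1 |
uniq > temp2
cat temp1 temp2 > temp3          # duplicated records now occur three times
sort temp3 | uniq -u > temp4     # uniq -u drops them, keeping unique records
echo "$HEADER" > "$FILE1"
cat temp4 >> "$FILE1"
```

Afterwards $FILE1 holds the header plus the two non-duplicated records, and temp2 holds the duplicated record (once).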
Another one with awk:
awk -F\| 'END {
  for (i = 1; ++i <= NR;) {            # records were stored from line 2 onward
    split(d[i], t)                     # re-split the saved record on FS ("|")
    if (c[t[2], t[3]] > 1) {           # key occurs more than once
      if (!s[t[2], t[3]]++)            # print each duplicate key only once
        print t[2], t[3] > dups
    }
    else
      print d[i] > uniq
  }
}
NR == 1 {                              # header line goes to the unique file only
  print > uniq
  next
}
{
  c[$2, $3]++; d[NR] = $0              # count each key; save the record for END
}' OFS=\| dups=dups.txt uniq=uniq.txt infile
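For reference, a self-contained run of this single-awk approach against the sample data (infile, dups.txt and uniq.txt keep the names from the post):

```shell
# Sample input from the post
cat > infile <<'EOF'
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbc275d2c122| 3478234| WORK| 1
01cbbe4362743da| 3496386| Rich Spare| 1
01cbc275d2c122| 3478234| WORK| 1
EOF

# Count keys while reading, save every record, route them in END
awk -F'|' 'END {
  for (i = 1; ++i <= NR;) {
    split(d[i], t)
    if (c[t[2], t[3]] > 1) {           # duplicated key
      if (!s[t[2], t[3]]++)
        print t[2], t[3] > dups        # key only, printed once
    }
    else
      print d[i] > uniq                # unique record, in full
  }
}
NR == 1 { print > uniq; next }         # header into the unique file only
{ c[$2, $3]++; d[NR] = $0 }' OFS='|' dups=dups.txt uniq=uniq.txt infile
```

This keeps the original record order in uniq.txt, which the sort-based variants do not.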