Hi,
I have a requirement to identify duplicates in a file based on the first 6 characters (it is a fixed-width file with 12-character records). Whenever a duplicate row is found, the last 2 characters of both the original and the duplicate row should be set to 00 if they are not the same. (That is, the last 2 digits of the original and the duplicate row should match; if they do not, default them to 00, otherwise keep them as is.)
I thought of using
sort -u -k 1.1,1.6 inputfile
and then manipulating the output but I am stuck...
Here is the sample input and output:
input:
1251233Y1234
1221249N8821
1231116Y9945
1231113Y2123
1231109Y3212
1231123N1214
1231126N1214
output should be:
1251233Y1234
1221249N8821
1231116Y9900
1231113Y2100
1231109Y3212
1231123N1214
1231126N1214 (Since the last 2 digits are the same, nothing changed)
Any help in achieving the above result using either awk or sed will be greatly appreciated.
Thanks,
Faraway
binlib
June 15, 2012, 1:26pm
2
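# Note: sorting brings equal keys together but changes the line order.
sort -k1.1,1.6 inputfile | awk '
{
    # Same 6-char key as the previous line but different last 2 chars:
    # zero the last 2 chars of both lines.
    if (substr($0,1,6) == substr(x,1,6) &&
        substr($0,11,2) != substr(x,11,2)) {
        sub(/..$/, "00", x)
        sub(/..$/, "00")
    }
    if (NR > 1) print x    # emit the previous line, which is now final
    x = $0
}
END { if (NR > 0) print x }'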
Alternatively try:
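awk '{k=substr($0,1,6); v=substr($0,11,2)} NR==FNR{if((k in A) && A[k]!=v) D[k]=1; A[k]=v; next} (k in D){sub(/..$/,"00")}1' input input
(The input file is deliberately given twice: the first pass records the keys whose last 2 characters differ, the second pass rewrites those lines.)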
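If reading the file twice is not convenient (for example when the data arrives on a pipe), a single-pass variant that buffers the lines in memory and keeps the original order could be sketched like this. The function name fix_dups is made up for illustration, and the sketch assumes that a key group counts as mismatched as soon as any two of its lines end in different characters, whatever the group size.

```shell
# fix_dups is a hypothetical helper name; it reads fixed-width records on stdin.
fix_dups() {
    awk '
    {
        line[NR] = $0                       # buffer every line in input order
        k = substr($0, 1, 6)                # key: first 6 characters
        v = substr($0, length($0) - 1)      # value: last 2 characters
        if ((k in last) && last[k] != v)    # differs from the previous line of this group
            mixed[k] = 1
        last[k] = v
    }
    END {
        for (i = 1; i <= NR; i++) {
            if (substr(line[i], 1, 6) in mixed)
                sub(/..$/, "00", line[i])   # zero the last 2 chars of mixed groups
            print line[i]
        }
    }'
}

# Small demonstration on four of the sample records
printf '%s\n' 1231116Y9945 1231113Y2123 1231123N1214 1231126N1214 | fix_dups
```

For these four records the first two lines come out ending in 00 and the last two are left unchanged. The whole file is held in memory, so this is only a reasonable choice for files that fit comfortably in RAM.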
# awk '{x0=$0;;x=split($1,a,"");xlast=substr($1,x-1,2);x1=substr($1,1,6);if(x2==x1){if(xlast!=x2last){
if(zz<1){print substr(x00,1,x-2)"00" RS substr(x0,1,x-2)"00";z=0;zz++;}}else{zz=0;if(z==0)print x00;
if(z==1)print x00 RS x0;z=1}}else{if(z!=0)print x00;z=1;zz=0};x00=$0;x2last=substr(x00,x-1,2);x2=substr($1,1,6);}' infile
1251233Y1234
1221249N8821
1231116Y9900
1231113Y2100
1231109Y3212
1231123N1214
1231126N1214
another file
# cat try2
1251233Y1234
1251234Y1235
1221249N8821
1231116Y9945
1231113Y2123
1231109Y3212
1231123N1214
1231126N1215
1231127N1216
1231128N1216
1231129N1218
12311X7N1217
12311X8N1217
# awk '{x0=$0;;x=split($1,a,"");xlast=substr($1,x-1,2);x1=substr($1,1,6);if(x2==x1){if(xlast!=x2last){
if(zz<1){print substr(x00,1,x-2)"00" RS substr(x0,1,x-2)"00";z=0;zz++;}}else{zz=0;if(z==0)print x00;
if(z==1)print x00 RS x0;z=1}}else{if(z!=0)print x00;z=1;zz=0};x00=$0;x2last=substr(x00,x-1,2);x2=substr($1,1,6);}' try2
1251233Y1200
1251234Y1200
1221249N8821
1231116Y9900
1231113Y2100
1231109Y3212
1231123N1200
1231126N1200
1231127N1216
1231128N1200
1231129N1200
12311X7N1217
12311X8N1217
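Note: the code compares each line only with the line immediately before it (a one-to-one check), so within a run of more than two lines sharing a key the result depends on adjacent pairs rather than on the group as a whole. A record already printed as part of a "00" pair is not printed again.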
regards
ygemici
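Since the answers in this thread treat groups of more than two lines differently, a small checker can help compare their results. The sketch below assumes the rule is meant group-wise (every line sharing the first 6 characters must end in the same 2 characters), which the thread itself does not settle. The name check_groups is made up for illustration.

```shell
# check_groups is a hypothetical helper name; it reads records on stdin,
# reports lines whose last 2 chars differ from the first line of their
# key group, and returns non-zero if it reported anything.
check_groups() {
    awk '
    {
        k = substr($0, 1, 6)                # key: first 6 characters
        v = substr($0, length($0) - 1)      # value: last 2 characters
        if ((k in seen) && seen[k] != v) {
            printf "line %d: key %s ends in %s, expected %s\n", NR, k, v, seen[k]
            bad = 1
        }
        if (!(k in seen)) seen[k] = v       # remember the first value per key
    }
    END { exit (bad ? 1 : 0) }'
}

# A consistent result passes silently
printf '%s\n' 1231116Y9900 1231113Y2100 1231123N1214 1231126N1214 | check_groups && echo "consistent"
```

Under that group-wise reading, the try2 output above would be reported once, because 1231127N1216 keeps 16 while the other lines with key 123112 end in 00.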