Remove duplicates and update last 2 digits of the original row with 0's

Hi,

I need to remove duplicates from a file based on the first 8 characters of each row (it is a fixed-width file with rows 10 characters long). Whenever a duplicate is found, the last 2 characters of the original row should be set to 0's.

I thought of using

sort -u -k 1.1,1.8 inputfile

but that only removes the duplicates: the row that survives keeps its original last 2 digits instead of getting 00.

here is the sample input and output

Any help in achieving the above result using either awk/sed will be greatly appreciated.

Thanks,
Faraway

Try:

awk '{a[substr($0,1,8)]++;b[substr($0,1,8)]=$0}END{for (i in a){if (a[i]>1) {print i"00"}else print b[i]}}' file

Hi,

Try this one,

awk '{k=substr($0,1,8);if(k in a){a[k]=k"00";next;}a[k]=$0;}END{for(i in a)print a[i];}' file

Cheers,
Ranga:-)
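
One thing to watch: `for (i in a)` walks the array in an unspecified order in awk, so both awk one-liners above may print the rows in a different order than the input. If the original order matters, here is a minimal sketch of the same idea that remembers the order in which each key was first seen. The function name `dedupe_zero` and the array names are only illustrative, and the sample rows in the usage lines are made up, not taken from the original post.

```shell
# dedupe_zero: hypothetical helper name, not from the thread.
# Reads the named files (or stdin) and prints one row per 8-char key,
# in first-seen order; keys seen more than once get "00" as suffix.
dedupe_zero() {
  awk '
    {
      k = substr($0, 1, 8)        # key: first 8 characters of the row
      if (!(k in first)) {        # first time this key shows up
        order[++n] = k            # remember input order
        first[k] = $0             # keep the original row
      }
      count[k]++                  # how many rows share this key
    }
    END {
      for (j = 1; j <= n; j++) {
        k = order[j]
        if (count[k] > 1)
          print k "00"            # duplicated key: zero the last 2 chars
        else
          print first[k]          # unique key: row is printed unchanged
      }
    }
  ' "$@"
}

# Usage with made-up sample rows:
printf '%s\n' 1234567811 1234567822 8765432155 | dedupe_zero
# prints:
# 1234567800
# 8765432155
```

It costs an extra array compared to the one-liners, but the output order is then predictable.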


Assuming that whitespace does not occur in those 10 characters:

sed 's/\(.*\)\(..\)/\2 \1/' file | sort -k2,2 | uniq -cf1 | awk '$1>1 {$2="00"} {print $3$2}'

Regards,
Alister
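
For anyone reading along, the pipeline above can be taken apart stage by stage. The following is the same command wrapped in a function with one comment per stage; the wrapper name `dedupe_pipeline` is hypothetical and the sample rows are made up for illustration. Note that the output comes out sorted by the 8-character key, not in input order.

```shell
# dedupe_pipeline: hypothetical wrapper name; the commands are the ones posted above.
dedupe_pipeline() {
  sed 's/\(.*\)\(..\)/\2 \1/' "$@" |   # "1234567811" becomes "11 12345678"
    sort -k2,2 |                       # bring rows with the same 8-char key together
    uniq -cf1 |                        # ignore the 2-char field, count rows per key
    awk '$1>1 {$2="00"} {print $3$2}'  # count > 1: zero the suffix; print key + suffix
}

# Usage with made-up sample rows:
printf '%s\n' 8765432155 1234567811 1234567822 | dedupe_pipeline
# prints:
# 1234567800
# 8765432155
```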

Thank you, it works as expected.