Remove duplicates and update last 2 digits of the original row with 0's

farawaydsky · June 13, 2012, 2:57pm

Hi,

I have a requirement to remove duplicates from a file based on the first 8 chars (it is a fixed-width file with rows 10 chars long). Whenever a duplicate row is found, the last 2 chars of the original row should be updated to all 0's.

I thought of using

sort -u -k 1.1,1.8 inputfile

but that only removes the duplicates; the row that is kept for a duplicated key still has its original last 2 digits.
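
To make the problem concrete, here is a small sketch of what sort -u does with such a file. The three sample rows are made up for illustration (they are not the sample data from the original post), and which of the two duplicate rows survives may depend on the sort implementation.

```shell
# Hypothetical sample data: 10-character rows, first 8 characters are the key.
printf '%s\n' 1234567801 1234567802 8765432105 > inputfile

# Keeps one row per 8-character key, but the surviving row for key 12345678
# still ends in its original digits rather than 00.
sort -u -k 1.1,1.8 inputfile
```

The output has two rows, one per key, and neither ends in 00, which is why sort -u alone does not meet the requirement.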

here is the sample input and output

Any help in achieving the above result using either awk/sed will be greatly appreciated.

Thanks,
Faraway

bartus11 · June 13, 2012, 3:02pm

Try:

awk '{a[substr($0,1,8)]++;b[substr($0,1,8)]=$0}END{for (i in a){if (a[i]>1) {print i"00"}else print b[i]}}' file

rangarasan · June 13, 2012, 3:32pm

Hi,

Try this one,

awk '{k=substr($0,1,8);if(a[k]){a[k]=k"00";next;}a[k]=$0;}END{for(i in a)print a[i];}' file

Cheers,
Ranga:-)
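
A note on ordering: the awk answers above print from a for (i in a) loop, and awk does not specify the order of that traversal, so the rows may not come out in input order. If input order matters, one possible approach is to read the file twice. This is only a sketch under that assumption; the function name dedupe_zero and the sample rows are made up for illustration.

```shell
# Hypothetical sample data: key 12345678 appears twice, key 87654321 once.
printf '%s\n' 1234567801 8765432105 1234567802 > file

# dedupe_zero is an illustrative name, not part of any tool.
# It passes the same file to awk twice: first to count keys, then to print.
dedupe_zero() {
  awk '
    { k = substr($0, 1, 8) }              # key = first 8 characters
    NR == FNR { count[k]++; next }        # first pass: count rows per key
    seen[k]++ { next }                    # second pass: skip repeated keys
    {
      if (count[k] > 1) print k "00"      # duplicated key: zero the last 2 chars
      else print $0                       # unique key: print the row unchanged
    }
  ' "$1" "$1"
}

dedupe_zero file
```

With the sample rows this prints 1234567800 followed by 8765432105, in the order the keys first appear in the file.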

alister · June 13, 2012, 5:00pm

Assuming that whitespace does not occur in those 10 characters:

sed 's/\(.*\)\(..\)/\2 \1/' file | sort -k2,2 | uniq -cf1 | awk '$1>1 {$2="00"} {print $3$2}'

Regards,
Alister
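
For anyone reading the pipeline above, here is the same command split one stage per line with comments. The sample rows are hypothetical, chosen only so that one key repeats.

```shell
# Hypothetical sample data: key 12345678 appears twice, key 87654321 once.
printf '%s\n' 1234567801 8765432105 1234567802 > file

sed 's/\(.*\)\(..\)/\2 \1/' file |   # "1234567801" becomes "01 12345678"
  sort -k2,2 |                       # bring rows with the same key (field 2) together
  uniq -c -f1 |                      # one row per key with a count in front; -f1 skips the 2-digit field when comparing
  awk '$1>1 {$2="00"} {print $3$2}'  # count > 1: replace the digits with 00, then rejoin key and digits
```

With the sample rows this prints 1234567800 and 8765432105. The output is ordered by key because of the sort step, not by position in the input file.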

farawaydsky · June 14, 2012, 3:52pm

Thank you....it works as expected