Help with replacing duplicate content

Input file:

CCNI	data564_input1	264
CORO1A	data564_input2	155
ABC-B	data17_input1	3466
ABC-B	data17_input2	1133
ABC-B	data17_input3	2162
ABC-B	data17_input4	2019
HNRNPA2B1	data95_input1	101
HNRNPA2B1	data95_input2	340
IFITM1	data105_input2	291
IFITM2	data105_input1	505
MYL12A	data352_input2	212
MYL12B	data352_input1	131
MYL12B	data352_input3	76

Desired output file:

CCNI	data564_input1	264
CORO1A	data564_input2	155
ABC-B	data17_input1	3466
	data17_input2	1133
	data17_input3	2162
	data17_input4	2019
HNRNPA2B1	data95_input1	101
	data95_input2	340
IFITM1	data105_input2	291
IFITM2	data105_input1	505
MYL12A	data352_input2	212
MYL12B	data352_input1	131
	data352_input3	76

The columns are separated by a tab ("\t").
I would like to replace the duplicate values in column 1 with an empty field.
Thanks for any advice.

$ nawk '{print $1}' test | sort -u | while read a; do grep $a test | nawk '{if(NR>1){printf("\t%s\t%s\n",$2,$3)}else{print $0}}'; done            
ABC-B   data17_input1   3466
        data17_input2   1133
        data17_input3   2162
        data17_input4   2019
CCNI    data564_input1  264
CORO1A  data564_input2  155
HNRNPA2B1       data95_input1   101
        data95_input2   340
IFITM1  data105_input2  291
IFITM2  data105_input1  505
MYL12A  data352_input2  212
MYL12B  data352_input1  131
        data352_input3  76

In the above example, "test" is the input file.
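Two things to be aware of with the pipeline above: sort -u reorders the keys (ABC-B comes out first, unlike the desired output), and grep matches the key anywhere on the line, so a key that is a substring of another would pull in extra lines. A minimal single-pass sketch that keeps the grouping idea but preserves first-seen order might look like the following. The function name blank_grouped is made up for illustration, and plain awk is assumed (on Solaris, nawk should behave the same). It holds the whole file in memory, so it is only meant for modest inputs.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): group lines by column 1,
# print the key only on the first line of each group, and keep groups
# in the order their key first appears in the input.
blank_grouped() {
    awk 'BEGIN { FS = "\t" }
    {
        key  = $1
        # Everything after column 1, including the leading tab
        rest = substr($0, length(key) + 1)
        if (!(key in seen)) {
            seen[key]  = 1
            order[++n] = key          # remember first-seen order
            block[key] = key rest
        } else {
            block[key] = block[key] "\n" rest
        }
    }
    END { for (i = 1; i <= n; i++) print block[order[i]] }' "$@"
}

# Small demo on inline data (non-adjacent duplicates of key A)
printf 'A\tx\t1\nB\ty\t2\nA\tz\t3\n' | blank_grouped
```

Usage on the file from this thread would be: blank_grouped test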

Here is your code :)

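first="   "                     # sentinel: should not equal any real column-1 value
while IFS= read -r line
do
        # Column 1 of the current line (fields are tab-separated)
        first2=$( printf '%s\n' "$line" | awk -F'\t' '{print $1}' )
        if [[ "$first" == "$first2" ]]
        then
                # Same key as the previous line: print an empty column 1, then columns 2 and 3
                gg=$( printf '%s\n' "$line" | awk -F'\t' '{print $2"\t"$3}' )
                printf '\t%s\n' "$gg"
        else
                printf '%s\n' "$line"
        fi
        first=$first2
done < infile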

Try this:

nawk '{y=x;x=$1}x==y{sub($1,"")}1' yourfile
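The one-liner above remembers the previous line's column 1 in y and, when the current one matches, deletes it with sub(). Note that sub() treats $1 as a regular expression, so a key containing characters such as + or [ may not match itself. A slightly more defensive sketch, assuming tab-separated input as in the question, compares the field as a plain string and blanks it by assignment instead. The function name blank_repeats is only illustrative, and awk can be swapped for nawk where needed.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): blank column 1 whenever it
# repeats the value on the immediately preceding line.
blank_repeats() {
    awk 'BEGIN { FS = OFS = "\t" }          # read and write tab-separated fields
         { key = $1 }                       # save column 1 before changing it
         NR > 1 && key == prev { $1 = "" }  # plain string comparison, no regex
         { prev = key; print }' "$@"
}

# Small demo on inline data, including a key with a regex metacharacter
printf 'ABC-B\ta\t1\nABC-B\tb\t2\nA+B\tc\t3\nA+B\td\t4\n' | blank_repeats
```

Usage on a real file would be: blank_repeats yourfile
Unlike a grouping approach, this only blanks keys that repeat on adjacent lines, which matches the sample input in the question.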