This means that the first column of each line should be followed by the first column of the next line, and after those two the words common to both lines should be printed. In each line, the first whitespace is a tab and all subsequent separators are single spaces. A common word may appear anywhere in the line; for example, ctg_6843 is in the 5th column of the 3rd line.
Compare line 1 with line 2 and extract the common words
Compare line 2 with line 3 and extract the common words
Compare line 3 with line 4 and extract the common words
...
Compare line (n-1) with line n and extract the common words
The first field of every line is unique and is separated from the rest of the line by a tab, so in awk you can build an array a[$1]=$2 with FS="\t". The only remaining problem is comparing $2 of two adjacent lines.
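The split-then-compare idea can be sketched in Python (a swap for the awk approach described, not awk itself): split each record once on the tab to get the unique key and the remaining words, then keep the words of one line that also occur in the next. The function names and the sample records are hypothetical, invented only to illustrate; the ordering rule (common words listed in the order they appear in the earlier line) is an assumption.

```python
def split_record(line):
    """Split a record into its unique first field and the words after the tab.

    Mirrors the awk idea of FS="\t": the key plays the role of $1, the word list of $2.
    """
    key, _, rest = line.rstrip("\n").partition("\t")
    return key, rest.split()  # remaining words are separated by single spaces


def common_words(prev_words, curr_words):
    """Return words present in both lists, in the order they appear in prev_words."""
    curr_set = set(curr_words)
    # dict.fromkeys drops repeated words while keeping first-seen order
    return list(dict.fromkeys(w for w in prev_words if w in curr_set))


# Hypothetical sample records, made up for illustration
key1, words1 = split_record("PFA0165c\tctg_6843 ctg_7001 ctg_7002")
key2, words2 = split_record("PFA0335w\tctg_6843 ctg_6871 ctg_7003")
print(key1, key2, *common_words(words1, words2))  # PFA0165c PFA0335w ctg_6843
```

Using a set for the second line keeps each comparison linear in the number of words, which matters if lines are long.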
Now I want to print:
the first field of line 1, the first field of line 2, and their common words
the first field of line 2, the first field of line 3, and their common words
...
the first field of line (n-1), the first field of line n, and their common words
Hence the output will look like this:
PFA0165c PFA0335w ctg_6843
PFA0335w PFA0155c ctg_6843 ctg_6871
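A minimal end-to-end sketch in Python (again standing in for awk) that walks the file once, remembers only the previous line, and emits one output line per adjacent pair. It assumes the tab-then-single-spaces layout described above and lists common words in the order they appear in the earlier line; the function name `pair_adjacent` is hypothetical. The actual input file is not shown here, so the test data below is invented to reproduce the expected output.

```python
import sys


def pair_adjacent(lines):
    """Yield one line per adjacent pair: key(n-1), key(n), then their common words."""
    prev_key, prev_words = None, None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, _, rest = line.rstrip("\n").partition("\t")  # key is tab-separated
        words = rest.split()
        if prev_key is not None:
            curr_set = set(words)
            # keep the previous line's order and drop repeated words
            common = dict.fromkeys(w for w in prev_words if w in curr_set)
            yield " ".join([prev_key, key, *common])
        prev_key, prev_words = key, words


if __name__ == "__main__":
    for out in pair_adjacent(sys.stdin):
        print(out)
```

Run it as, for example, `python pair_common.py < input.txt` (both file names are placeholders). If two adjacent lines share nothing, the line still appears with just the two keys.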
I think I understand now: for any given line, you want to print its first element, followed by the first element of the line below, followed by any items common to both lines. Right?