Repeat a row's column values until the next non-empty row appears

Hi all

I have a file with columns:

F3       pathway      CPS
F2
H2
H4
H5
H6       no pathway    CMP
H7
H8
H9
H10

My expected output is

F3       pathway      CPS
F2        pathway      CPS
H2        pathway      CPS
H4       pathway      CPS
H5       pathway      CPS
H6       no pathway    CMP
H7       no pathway    CMP
H8       no pathway    CMP
H9       no pathway    CMP
H10      no pathway    CMP

Kindly guide me

awk ' { if($2 !~ /^ *$/) { a=substr($0,length($1)+1,index($0,$NF)-length($NF)); b=$NF; } print $1, a, b } ' file

Hi

awk 'NF>1{x=substr($0,index($0," "));}{print $1 x;}' file

Guru.

Hi

Thanks for reply.

It seems my file is more complex, so I am attaching it here. In this file, the values in the 2nd and 3rd columns should be replicated down the rows.

u represents the fourth column, and t represents the fifth column onwards and others.

Please guide me

Hi
With the solutions provided to you so far, please show us what you have tried.

Guru.

Hi

My output is in the attached file; it is not as expected.

Thanks, I got it working myself now.

Hi all

I am getting a slight error in my output.

My input file is

CALR    Antigen processing and presentation    CPSab
KIR2DL5A        
KIR2DS1        
KIR2DS2        
KIR2DS3        
KIR2DS5        
PSME1        
PSME2        
PTK2    Aspirin Blocks Signaling Pathway Involved in Platelet Activation    CPSab
SYK        
PIK3C2G    CCR3 signaling in Eosinophils    CPSab
PTK2        
CHUK    CD40L Signaling Pathway    CPSab
DUSP1        
IKBKAP        
MAP3K1        
TRAF6        
CCNE1    CDK Regulation of DNA Replication    CPSab
KITLG        
MCM5        
ORC4L        
PIK3C2G    CXCR4 Signaling Pathway    CPSab
PTK2        
CCNE1    Cyclin E Destruction Pathway    CPSab
CDC34        
TFDP1        
CCNE1    Cyclins and Cell Cycle Regulation    CPSab
CCNH        
CDC2        
TFDP1        
ACVR2A    Cytokine-cytokine receptor interaction    CPSab
AMH        

But my output is

CALR     Antigen processing and presentation     CPSab
KIR2DL5A     Antigen processing and presentation     CPSab
KIR2DS1     Antigen processing and presentation     CPSab
KIR2DS2     Antigen processing and presentation     CPSab
KIR2DS3     Antigen processing and presentation     CPSab
KIR2DS5     Antigen processing and presentation     CPSab
PSME1     Antigen processing and presentation     CPSab
PSME2     Antigen processing and presentation     CPSab
PTK2     Aspirin Blocks Signaling Pathway Involved in Platelet Activation     CPSab
SYK     Aspirin Blocks Signaling Pathway Involved in Platelet Activation     CPSab
PIK3C2G     CCR3 signaling in Eosinophils    CPS CPSab
PTK2     CCR3 signaling in Eosinophils    CPS CPSab
CHUK     CD40L Signaling Pathway     CPSab
DUSP1     CD40L Signaling Pathway     CPSab
IKBKAP     CD40L Signaling Pathway     CPSab
MAP3K1     CD40L Signaling Pathway     CPSab
TRAF6     CD40L Signaling Pathway     CPSab
CCNE1     CDK Regulation of DNA Replication    C CPSab
KITLG     CDK Regulation of DNA Replication    C CPSab
MCM5     CDK Regulation of DNA Replication    C CPSab
ORC4L     CDK Regulation of DNA Replication    C CPSab
PIK3C2G     CXCR4 Signaling Pathway    CPS CPSab
PTK2     CXCR4 Signaling Pathway    CPS CPSab
CCNE1     Cyclin E Destruction Pathway    C CPSab
CDC34     Cyclin E Destruction Pathway    C CPSab
TFDP1     Cyclin E Destruction Pathway    C CPSab
CCNE1     Cyclins and Cell Cycle Regulation    C CPSab
CCNH     Cyclins and Cell Cycle Regulation    C CPSab
CDC2     Cyclins and Cell Cycle Regulation    C CPSab
TFDP1     Cyclins and Cell Cycle Regulation    C CPSab

using the code

awk ' { if($2 !~ /^ *$/) { a=substr($0,length($1)+1,index($0,$NF)-length($NF)); b=$NF; } print $1, a, b } ' BDchangeoutfile.txt >BDchangeoutfile2.txt

In the output, the repetition of the third column is a bit weird: the value is not copied properly from the row above until the next filled row appears. Please check it.
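The stray "C" and "CPS" fragments appear to come from the third argument of substr(). That argument is a length, and index($0,$NF)-length($NF) only gives the right length when the first field is four characters long (CALR, PTK2, CHUK). For a five-character name such as CCNE1 the slice runs one character into the last field, and for PIK3C2G three characters. Below is a minimal sketch of a corrected version. The function name fill_down is hypothetical, the sample lines are taken from the input above, and it assumes the text of the last field does not also occur earlier on the same line, because index() returns the first match.

```shell
# fill_down: hypothetical helper that carries the middle and last fields
# down to the rows that only have a first field.
fill_down() {
    awk '
    NF > 1 {
        # Position where the last field starts (first occurrence of its text).
        p = index($0, $NF)
        # Everything between the first field and the last field,
        # including the original separators on both sides.
        a = substr($0, length($1) + 1, p - length($1) - 1)
        b = $NF
    }
    # Print without OFS so the original separators in "a" are kept as they are.
    { print $1 a b }
    ' "$@"
}

# Demo on three lines from the input file in this thread.
printf '%s\n' \
    'CCNE1    CDK Regulation of DNA Replication    CPSab' \
    'KITLG        ' \
    'MCM5        ' | fill_down
```

With this change the three sample lines all end in a single CPSab, with no extra fragment in front of it.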

Looking more closely at the input file you provided in message #8 in this thread, I see that your field separators are exactly four spaces and that the separators are present in all input lines, even when the contents of fields 2 and 3 are empty strings. This makes the logic needed to get the output you want much simpler:

awk -F "    " 'BEGIN {  OFS = "    "}
{       if($2)  f2 = $2
        else    $2 = f2
        if($3)  f3 = $3
        else    $3 = f3
        print
}' BDchangeoutfile.txt >BDchangeoutfile2.txt

and produces the output:

CALR    Antigen processing and presentation    CPSab
KIR2DL5A    Antigen processing and presentation    CPSab
KIR2DS1    Antigen processing and presentation    CPSab
KIR2DS2    Antigen processing and presentation    CPSab
KIR2DS3    Antigen processing and presentation    CPSab
KIR2DS5    Antigen processing and presentation    CPSab
PSME1    Antigen processing and presentation    CPSab
PSME2    Antigen processing and presentation    CPSab
PTK2    Aspirin Blocks Signaling Pathway Involved in Platelet Activation    CPSab
SYK    Aspirin Blocks Signaling Pathway Involved in Platelet Activation    CPSab
PIK3C2G    CCR3 signaling in Eosinophils    CPSab
PTK2    CCR3 signaling in Eosinophils    CPSab
CHUK    CD40L Signaling Pathway    CPSab
DUSP1    CD40L Signaling Pathway    CPSab
IKBKAP    CD40L Signaling Pathway    CPSab
MAP3K1    CD40L Signaling Pathway    CPSab
TRAF6    CD40L Signaling Pathway    CPSab
CCNE1    CDK Regulation of DNA Replication    CPSab
KITLG    CDK Regulation of DNA Replication    CPSab
MCM5    CDK Regulation of DNA Replication    CPSab
ORC4L    CDK Regulation of DNA Replication    CPSab
PIK3C2G    CXCR4 Signaling Pathway    CPSab
PTK2    CXCR4 Signaling Pathway    CPSab
CCNE1    Cyclin E Destruction Pathway    CPSab
CDC34    Cyclin E Destruction Pathway    CPSab
TFDP1    Cyclin E Destruction Pathway    CPSab
CCNE1    Cyclins and Cell Cycle Regulation    CPSab
CCNH    Cyclins and Cell Cycle Regulation    CPSab
CDC2    Cyclins and Cell Cycle Regulation    CPSab
TFDP1    Cyclins and Cell Cycle Regulation    CPSab
ACVR2A    Cytokine-cytokine receptor interaction    CPSab
AMH    Cytokine-cytokine receptor interaction    CPSab
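The four-space approach relies on every line carrying both separators, so it may be worth confirming that before trusting the result. A rough sketch of such a check follows; the function name count_fields is hypothetical, and the sample lines are taken from the input file posted above. It prints each field count followed by the number of lines that have it, so a single line of output starting with 3 suggests the file is consistent.

```shell
# count_fields: hypothetical helper that tallies how many lines have
# each number of fields when the separator is exactly four spaces.
count_fields() {
    awk -F "    " '
    { count[NF]++ }
    END { for (n in count) print n, count[n] }
    ' "$@"
}

# Demo on three lines from the input file in this thread.
printf '%s\n' \
    'CCNE1    CDK Regulation of DNA Replication    CPSab' \
    'KITLG        ' \
    'MCM5        ' | count_fields
```

If other field counts show up, some lines are probably missing their trailing spaces or use tabs, and the whitespace-based one-liners from earlier in the thread may be a safer choice.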

Can you try these? They should work in the worst case too.

perl -alne '{if($#F>0){@a=@F[1..$#F];} print "$F[0] @a";}' input
awk 'NF==1{sub($1,$1 x)}NF>1{x=$0;sub($1,z,x)}1' yourfile