Hi all
I have a file with the following columns:
F3 pathway CPS
F2
H2
H4
H5
H6 no pathway CMP
H7
H8
H9
H10
My expected output is:
F3 pathway CPS
F2 pathway CPS
H2 pathway CPS
H4 pathway CPS
H5 pathway CPS
H6 no pathway CMP
H7 no pathway CMP
H8 no pathway CMP
H9 no pathway CMP
H10 no pathway CMP
Kindly guide me
Yoda
November 8, 2012, 11:38pm
2
awk ' { if($2 !~ /^ *$/) { a=substr($0,length($1)+1,index($0,$NF)-length($NF)); b=$NF; } print $1, a, b } ' file
Hi
awk 'NF>1{x=substr($0,index($0," "));}{print $1 x;}' file
Guru.
Hi
Thanks for reply.
It seems my file is more complex, so I am attaching it here. In this file, the rows of the 2nd and 3rd columns should be replicated.
u represents the fourth column, and t represents the fifth column onwards and the others.
Please guide me.
Hi
With the solutions provided so far, can you show us what you have tried?
Guru.
Hi
My output is in the attached file; it is not as expected.
Thanks, I have worked it out myself now.
Hi all
I am getting a slight error in my output.
My input file is:
CALR Antigen processing and presentation CPSab
KIR2DL5A
KIR2DS1
KIR2DS2
KIR2DS3
KIR2DS5
PSME1
PSME2
PTK2 Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab
SYK
PIK3C2G CCR3 signaling in Eosinophils CPSab
PTK2
CHUK CD40L Signaling Pathway CPSab
DUSP1
IKBKAP
MAP3K1
TRAF6
CCNE1 CDK Regulation of DNA Replication CPSab
KITLG
MCM5
ORC4L
PIK3C2G CXCR4 Signaling Pathway CPSab
PTK2
CCNE1 Cyclin E Destruction Pathway CPSab
CDC34
TFDP1
CCNE1 Cyclins and Cell Cycle Regulation CPSab
CCNH
CDC2
TFDP1
ACVR2A Cytokine-cytokine receptor interaction CPSab
AMH
But my output is:
CALR Antigen processing and presentation CPSab
KIR2DL5A Antigen processing and presentation CPSab
KIR2DS1 Antigen processing and presentation CPSab
KIR2DS2 Antigen processing and presentation CPSab
KIR2DS3 Antigen processing and presentation CPSab
KIR2DS5 Antigen processing and presentation CPSab
PSME1 Antigen processing and presentation CPSab
PSME2 Antigen processing and presentation CPSab
PTK2 Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab
SYK Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab
PIK3C2G CCR3 signaling in Eosinophils CPS CPSab
PTK2 CCR3 signaling in Eosinophils CPS CPSab
CHUK CD40L Signaling Pathway CPSab
DUSP1 CD40L Signaling Pathway CPSab
IKBKAP CD40L Signaling Pathway CPSab
MAP3K1 CD40L Signaling Pathway CPSab
TRAF6 CD40L Signaling Pathway CPSab
CCNE1 CDK Regulation of DNA Replication C CPSab
KITLG CDK Regulation of DNA Replication C CPSab
MCM5 CDK Regulation of DNA Replication C CPSab
ORC4L CDK Regulation of DNA Replication C CPSab
PIK3C2G CXCR4 Signaling Pathway CPS CPSab
PTK2 CXCR4 Signaling Pathway CPS CPSab
CCNE1 Cyclin E Destruction Pathway C CPSab
CDC34 Cyclin E Destruction Pathway C CPSab
TFDP1 Cyclin E Destruction Pathway C CPSab
CCNE1 Cyclins and Cell Cycle Regulation C CPSab
CCNH Cyclins and Cell Cycle Regulation C CPSab
CDC2 Cyclins and Cell Cycle Regulation C CPSab
TFDP1 Cyclins and Cell Cycle Regulation C CPSab
using this code:
awk ' { if($2 !~ /^ *$/) { a=substr($0,length($1)+1,index($0,$NF)-length($NF)); b=$NF; } print $1, a, b } ' BDchangeoutfile.txt >BDchangeoutfile2.txt
In the output, the repetition of the third column is a bit weird: it is not copied properly from the row above down to the next row that has its own values. Please check it.
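Judging from the output above, a likely cause is the third argument of substr(). It is a length, and index($0,$NF)-length($NF) only equals the length of the middle text when the first field is one character shorter than the last one. Rows starting with a 4-character name (CALR, PTK2, CHUK) come out right, PIK3C2G picks up three extra characters ("CPS") and CCNE1 one ("C"). Below is a minimal sketch of a corrected length calculation, assuming lines have no leading blanks and the text of the last field does not occur earlier in the line. The names fill_middle and demo.txt are made up for illustration.

```shell
fill_middle() {
    awk '
    NF > 1 {
        # Text between the first and last field: it starts right after $1
        # and stops just before $NF, so its length is
        # index($0, $NF) - length($1) - 1.
        mid = substr($0, length($1) + 1, index($0, $NF) - length($1) - 1)
        gsub(/^ +| +$/, "", mid)    # trim the separator padding
        last = $NF
    }
    { print $1, mid, last }
    ' "$@"
}

# Two lines taken from the input above (four-space separators assumed).
printf '%s\n' \
  'PIK3C2G    CCR3 signaling in Eosinophils    CPSab' \
  'PTK2' > demo.txt

fill_middle demo.txt
```

With this change the length no longer depends on how long the last field happens to be.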
Looking more closely at the input file you provided in message #8 in this thread, I see that your field separators are exactly four spaces and that the separators are present in all input lines, even if the contents of fields 2 and 3 are empty strings. This makes the logic needed to get the output you want much simpler:
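awk -F "    " 'BEGIN { OFS = "    " }   # separator is exactly four spaces
{   if($2) f2 = $2      # remember a non-empty field 2 ...
    else $2 = f2        # ... or fill an empty one with the last value seen
    if($3) f3 = $3
    else $3 = f3
    print
}' BDchangeoutfile.txt >BDchangeoutfile2.txt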
and produces the output:
KIR2DL5A Antigen processing and presentation CPSab
KIR2DS1 Antigen processing and presentation CPSab
KIR2DS2 Antigen processing and presentation CPSab
KIR2DS3 Antigen processing and presentation CPSab
KIR2DS5 Antigen processing and presentation CPSab
PSME1 Antigen processing and presentation CPSab
PSME2 Antigen processing and presentation CPSab
PTK2 Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab
SYK Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab
PIK3C2G CCR3 signaling in Eosinophils CPSab
PTK2 CCR3 signaling in Eosinophils CPSab
CHUK CD40L Signaling Pathway CPSab
DUSP1 CD40L Signaling Pathway CPSab
IKBKAP CD40L Signaling Pathway CPSab
MAP3K1 CD40L Signaling Pathway CPSab
TRAF6 CD40L Signaling Pathway CPSab
CCNE1 CDK Regulation of DNA Replication CPSab
KITLG CDK Regulation of DNA Replication CPSab
MCM5 CDK Regulation of DNA Replication CPSab
ORC4L CDK Regulation of DNA Replication CPSab
PIK3C2G CXCR4 Signaling Pathway CPSab
PTK2 CXCR4 Signaling Pathway CPSab
CCNE1 Cyclin E Destruction Pathway CPSab
CDC34 Cyclin E Destruction Pathway CPSab
TFDP1 Cyclin E Destruction Pathway CPSab
CCNE1 Cyclins and Cell Cycle Regulation CPSab
CCNH Cyclins and Cell Cycle Regulation CPSab
CDC2 Cyclins and Cell Cycle Regulation CPSab
TFDP1 Cyclins and Cell Cycle Regulation CPSab
ACVR2A Cytokine-cytokine receptor interaction CPSab
AMH Cytokine-cytokine receptor interaction CPSab
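The same carry-forward idea can be extended to any number of columns, which may help with the wider file mentioned earlier in the thread (fourth column, fifth column and so on). This is only a sketch, assuming every line carries all of its four-space separators as described above. The function name fill_down and the file sample4.txt are illustrative, not part of the original files.

```shell
fill_down() {
    awk -F '    ' 'BEGIN { OFS = "    " }
    {
        # For every column after the first: remember a non-empty value,
        # or reuse the last remembered one when the cell is empty.
        # Comparing against "" keeps a literal 0 from counting as empty.
        for (i = 2; i <= NF; i++) {
            if ($i != "") last[i] = $i
            else          $i = last[i]
        }
        print
    }' "$@"
}

# Rows with empty 2nd and 3rd fields still contain both separators.
printf '%s\n' \
  'CHUK    CD40L Signaling Pathway    CPSab' \
  'DUSP1        ' \
  'CCNE1    Cyclin E Destruction Pathway    CPSab' \
  'CDC34        ' > sample4.txt

fill_down sample4.txt
```

Looping over the fields avoids writing one if/else pair per column, at the cost of keeping one remembered value per column in the array.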
msabhi
November 17, 2012, 2:32pm
10
Can you try this? It should work even in the worst case.
perl -alne '{if($#F>0){@a=@F[1..$#F];} print "$F[0] @a";}' input
ctsgnb
November 17, 2012, 5:18pm
11
awk 'NF==1{sub($1,$1 x)}NF>1{x=$0;sub($1,z,x)}1' yourfile