How to append strings with whitespace?

Hi,

Need help. This seems simple, but I have tried many things and failed to get the result I want. Below is the input file:

Chr1	lnci	exon	83801516	83803251	.	-	.	gene_id"LINC01725";	transcript_id"LINC01725:44";	gene_alias_1"ENSG00000233008";	gene_alias_2"RP11-475O6.1";	gene_alias_3"ENSG00000233008.1";	gene_alias_4"OTTHUMG00000009930.1";	gene_alias_5"ENSG00000233008.5";	gene_alias_6"LINC01725";	gene_alias_7"LOC101927560";	transcript_alias_1"ENST00000457273";	transcript_alias_2"ENST00000457273.1";	transcript_alias_3"RP11-475O6.1-005";	transcript_alias_4"OTTHUMT00000027496.1";	transcript_alias_5"NONHSAT004171";	transcript_alias_6"NR_119374";	transcript_alias_7"ENST00000457273.5";	transcript_alias_8"NR_119374.1";
chr16	lnci	exon	83849907	83850022	.	-	.	gene_id"LINC01725";	transcript_id"LINC01725:44";	gene_alias_1"ENSG00000233008";	gene_alias_2"RP11-475O6.1";	gene_alias_3"ENSG00000233008.1";	gene_alias_4"OTTHUMG00000009930.1";

I need to modify each row by inserting a single space after the field name (that is, before the opening double quote) in every field from column 9 onwards. The output should look like this:

Chr1	lnci	exon	83801516	83803251	.	-	.	gene_id "LINC01725";	transcript_id "LINC01725:44";	gene_alias_1 "ENSG00000233008";	gene_alias_2 "RP11-475O6.1";	gene_alias_3 "ENSG00000233008.1";	gene_alias_4 "OTTHUMG00000009930.1";	gene_alias_5 "ENSG00000233008.5";	gene_alias_6 "LINC01725";	gene_alias_7 "LOC101927560";	transcript_alias_1 "ENST00000457273";	transcript_alias_2 "ENST00000457273.1";	transcript_alias_3 "RP11-475O6.1-005";	transcript_alias_4 "OTTHUMT00000027496.1";	transcript_alias_5 "NONHSAT004171";	transcript_alias_6 "NR_119374";	transcript_alias_7 "ENST00000457273.5";	transcript_alias_8 "NR_119374.1";
chr16	lnci	exon	83849907	83850022	.	-	.	gene_id "LINC01725";	transcript_id "LINC01725:44";	gene_alias_1 "ENSG00000233008";	gene_alias_2 "RP11-475O6.1";	gene_alias_3 "ENSG00000233008.1";	gene_alias_4 "OTTHUMG00000009930.1";

Really appreciate your kind help. Thanks

Can you post what you tried?

Also note that I do not see any difference between your input and output examples. It is hard to help without an idea of what you tried. It is also extremely helpful for good answers if you include your OS and shell. Thank you.

One of the commands I tried:

sed -e 's/\.*id/& \ /' -e 's/\.*alias_./& \ /' inputfile

It worked in certain columns only.

--- Post updated at 02:33 PM ---

The difference is the "whitespace" before the quote. For instance, in column 9,


gene_id"LINC01725";  ---> gene_id "LINC01725";

I am using macOS.

$ sed -e 's/[^\t]*id/& \ /g' -e 's/[^\t]*alias_./& \ /g' file
Chr1    lnci    exon    83801516        83803251        .       -       .       gene_id  "LINC01725";   transcript_id  "LINC01725:44";    gene_alias_1  "ENSG00000233008";        gene_alias_2  "RP11-475O6.1";   gene_alias_3  "ENSG00000233008.1";gene_alias_4  "OTTHUMG00000009930.1";   gene_alias_5  "ENSG00000233008.5";      gene_alias_6  "LINC01725";      gene_alias_7  "LOC101927560";     transcript_alias_1  "ENST00000457273";  transcript_alias_2  "ENST00000457273.1";        transcript_alias_3  "RP11-475O6.1-005";   transcript_alias_4  "OTTHUMT00000027496.1";     transcript_alias_5  "NONHSAT004171";    transcript_alias_6  "NR_119374";  transcript_alias_7  "ENST00000457273.5";        transcript_alias_8  "NR_119374.1";
chr16   lnci    exon    83849907        83850022        .       -       .       gene_id  "LINC01725";   transcript_id  "LINC01725:44";    gene_alias_1  "ENSG00000233008";        gene_alias_2  "RP11-475O6.1";   gene_alias_3  "ENSG00000233008.1";gene_alias_4  "OTTHUMG00000009930.1";

Sorry, it still did not work on my actual data. It worked for "transcript_alias_#", but it inserted two spaces where I need just one, and it did not work for "gene_alias_#" at all. It also inserted whitespace in the wrong place. For instance, I get

gene_alias_1  0"LOC101928035";   when it is supposed to be gene_alias_10 "LOC101928035";

thanks
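
The misplaced space can be traced to `alias_.` in the pattern: the dot matches exactly one character, so only the first digit of `10` falls inside the match and the space lands right after it. The doubled space most likely comes from the replacement `& \ `, which holds a plain space followed by an escaped one. Below is a small sketch of the first effect on a single made-up field; the variable names are only for illustration, and the replacement is reduced to `& ` so that one space is inserted.

```shell
# One field taken from the example above, kept in a shell variable.
field='gene_alias_10"LOC101928035";'

# '.' matches a single character, so the space ends up after the first digit.
broken=$(printf '%s\n' "$field" | sed 's/alias_./& /')

# Matching the whole run of digits keeps the number in one piece.
fixed=$(printf '%s\n' "$field" | sed 's/alias_[0-9][0-9]*/& /')

printf '%s\n%s\n' "$broken" "$fixed"
```

The first line printed shows the split number, the second the intended result.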

Looks like you want to prefix every double-quoted string with a space. How far would

sed 's/"[^"]*"/ &/g' file

get you, provided the double quotes reliably appear in pairs?
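
To see what that command does, here is a minimal sketch run on a shortened sample line. The wrapper function `add_space_before_quotes` and the variable names are made up for this illustration; the sed expression itself is the one quoted above. It assumes every double quote has a partner on the same line.

```shell
# Hypothetical wrapper around the one-liner; reads files given as
# arguments, or standard input when there are none.
add_space_before_quotes() {
  sed 's/"[^"]*"/ &/g' "$@"
}

# Sample line built with printf so the separators are real tab characters.
sample=$(printf 'chr16\tlnci\texon\t83849907\t83850022\t.\t-\t.\tgene_id"LINC01725";\tgene_alias_10"LOC101928035";')

result=$(printf '%s\n' "$sample" | add_space_before_quotes)
printf '%s\n' "$result"
```

Because the pattern anchors on the quotes rather than on the field names, multi-digit names such as gene_alias_10 stay intact. Note that running it a second time on its own output would add a second space.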

It worked like a charm!! Thanks a million. :)
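
For anyone whose first eight columns might themselves contain double quotes, a possible variant is to touch only column 9 onwards, as the original question asked. This is a sketch with awk rather than the sed solution from the thread; the function name `add_space_from_col9` is made up, and it assumes tab-separated input with at most one quoted string per field. The quote in column 2 of the sample is deliberately contrived to show that it is left alone.

```shell
# Hypothetical helper: insert a space before the first double quote
# of every tab-separated field from column 9 onwards.
add_space_from_col9() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         for (i = 9; i <= NF; i++)
           sub(/"/, " \"", $i)   # only the first quote in the field
         print
       }' "$@"
}

# Contrived sample: column 2 contains a quote that must stay unchanged.
col9_sample=$(printf 'chr16\tln"ci\texon\t83849907\t83850022\t.\t-\t.\tgene_id"LINC01725";\tgene_alias_10"LOC101928035";')

col9_result=$(printf '%s\n' "$col9_sample" | add_space_from_col9)
printf '%s\n' "$col9_result"
```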