Concatenate text file

cmccabe · October 27, 2014, 10:54am

I have a text file in the below format:

chr1	10002681	10002826	LZIC
chr1	10002980	10003083	NMNAT1
chr1	10003485	10003573	NMNAT1
chr1	100111430	100111918	PALMD
chr1	100127874	100127955	PALMD
chr1	100133197	100133322	PALMD
chr1	100152231	100152346	PALMD
chr1	100152485	100152519	PALMD
chr1	100152631	100152745	PALMD
chr1	100154330	100155428	PALMD

Is it possible to concatenate $1":"$2"-"$3 into one column, with the gene name next to it, for each row?

 
chr1:10002681-10002826	LZIC
chr1:10002980-10003083	NMNAT1
chr1:10003485-10003573	NMNAT1
chr1:100111430-100111918	PALMD
chr1:100127874-100127955	PALMD
chr1:100133197-100133322	PALMD
chr1:100152231-100152346	PALMD
chr1:100152485-100152519	PALMD
chr1:100152631-100152745	PALMD
chr1:100154330-100155428	PALMD

Thanks :).

Akshay_Hegde · October 27, 2014, 11:18am

awk '{print $1 ":" $2 "-" $3 , $4}' file

cmccabe · October 27, 2014, 11:54am

 awk '{print $1 ":" $2 "-" $3 , OFS= /t$4}' file

Would the above concatenate $1,$2,$3 in column 1 and $4 in column 2? Thanks :).

RavinderSingh13 · October 27, 2014, 11:59am

Hello cmccabe,

The following will do what you are asking for now; Akshay's solution only prints the separators between the fields.

awk '{$1=$1":"$2"-"$3;$2=$NF;$3=$NF="";print $0}'  Input_file

Thanks,
R. Singh

Akshay_Hegde · October 27, 2014, 12:03pm

Your syntax is wrong. Try it like this instead:

awk 'NF{$1=sprintf("%s:%s-%s",$1,$2,$3); $2=$4; NF-=2}1'  file

ravindersingh13:

Hello cmccabe,

The following will do what you are asking for now; Akshay's solution only prints the separators between the fields.
awk '{$1=$1":"$2"-"$3;$2=$NF;$3=$NF="";print $0}'  Input_file
Thanks,
R. Singh

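@Ravinder: $3=$NF="" does not actually delete the fields. It only sets them to empty strings, so the separators around them are still printed. Try your command with OFS=',' to see this.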
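
To make that visible, here is a minimal sketch that runs the command on one record from the original question with OFS set to a comma. The variable names `sample` and `blanked` are only for illustration.

```shell
# One tab-separated record copied from the question (illustrative sample).
sample=$(printf 'chr1\t10002681\t10002826\tLZIC')

# Blanking $3 and $NF leaves NF at 4, so the two empty fields
# still contribute their separators to the rebuilt record.
blanked=$(printf '%s\n' "$sample" |
  awk -v OFS=',' '{$1=$1":"$2"-"$3;$2=$NF;$3=$NF="";print $0}')

printf '%s\n' "$blanked"   # chr1:10002681-10002826,LZIC,,
```

The two trailing commas are the empty third and fourth fields. Decrementing NF, as in the command above, drops them in gawk and mawk, but not every awk implementation rebuilds the record when NF is reduced.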

RudiC · October 27, 2014, 12:46pm

What you call columns are called fields in awk terminology, and fields are separated by the field separator (FS). What is interpreted as a field therefore depends on how FS is defined. In awk it defaults to whitespace (space, tab, newline). With the default, your input has four fields. Should FS be set to e.g. "#" or ",", your input would have just one single field.

So the answer to your question above is "yes" if the default FS is used, but might be "possibly" or "no" if you define a different FS.
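
As a rough illustration of the point about FS, the sketch below counts the fields awk sees in one record from the question, first with the default FS and then with FS set to a comma. The variable names are made up for the example.

```shell
# One tab-separated record copied from the question (illustrative sample).
sample=$(printf 'chr1\t10002681\t10002826\tLZIC')

# Default FS: runs of spaces and tabs separate the fields.
default_nf=$(printf '%s\n' "$sample" | awk '{print NF}')

# FS set to a comma: the line contains no comma, so it is one field.
comma_nf=$(printf '%s\n' "$sample" | awk -F',' '{print NF}')

printf 'default FS: %s, FS=",": %s\n' "$default_nf" "$comma_nf"
# default FS: 4, FS=",": 1
```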

cmccabe · October 27, 2014, 5:38pm

 awk 'NF{$1=sprintf("%s:%s-%s",$1,$2,$3); $2=$4; NF-=2}1'  file

chr1:10002681-10002826 LZIC
chr1:10002980-10003083 NMNAT1
chr1:10003485-10003573 NMNAT1

Seems to be printing a continuous string of text. Thanks :).

bakunin · October 27, 2014, 6:20pm

Yes, because the "printf" family of functions does not terminate (lines of) output with a newline by default. You have to explicitly state that:

 awk 'NF{$1=sprintf("%s:%s-%s\n",$1,$2,$3); $2=$4; NF-=2}1'  file

I hope this helps.

bakunin

Don_Cragun · October 27, 2014, 9:05pm

I must be missing the point here. Other than printing a space (instead of a tab) between the two output fields, I don't see what was wrong with Akshay Hegde's original suggestion.

To change the space in the output to a tab, any of the following would work:

awk '{print $1 ":" $2 "-" $3 "\t" $4}' file.txt
                    or
awk '{print $1 ":" $2 "-" $3 OFS $4}' OFS="\t" file.txt
                    or
awk '{print $1 ":" $2 "-" $3, $4}' OFS="\t" file.txt

or, if there are empty lines in your input (not shown in your sample input) that are to be printed without change:

awk 'NF == 0 {print; next}
{print $1 ":" $2 "-" $3 "\t" $4}' file.txt
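
For anyone unsure why the comma matters in these commands, here is a small sketch on one record from the sample input. A comma between print arguments makes awk insert OFS, while operands written side by side are simply concatenated. The variable names are only for illustration.

```shell
# One tab-separated record copied from the question (illustrative sample).
sample=$(printf 'chr1\t10002681\t10002826\tLZIC')

# Comma before $4: awk puts OFS (a tab here) between the two print arguments.
with_comma=$(printf '%s\n' "$sample" |
  awk '{print $1 ":" $2 "-" $3, $4}' OFS='\t')

# No comma: $4 is concatenated directly, and OFS is never used.
no_comma=$(printf '%s\n' "$sample" |
  awk '{print $1 ":" $2 "-" $3 $4}' OFS='\t')

printf '%s\n' "$with_comma"   # chr1:10002681-10002826<TAB>LZIC
printf '%s\n' "$no_comma"     # chr1:10002681-10002826LZIC
```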