cmccabe
October 27, 2014, 10:54am
1
I have a text file in the below format:
chr1 10002681 10002826 LZIC
chr1 10002980 10003083 NMNAT1
chr1 10003485 10003573 NMNAT1
chr1 100111430 100111918 PALMD
chr1 100127874 100127955 PALMD
chr1 100133197 100133322 PALMD
chr1 100152231 100152346 PALMD
chr1 100152485 100152519 PALMD
chr1 100152631 100152745 PALMD
chr1 100154330 100155428 PALMD
Is it possible to concatenate $1":"$2"-"$3 in one column, with the gene name next to it, for each row?
chr1:10002681-10002826 LZIC
chr1:10002980-10003083 NMNAT1
chr1:10003485-10003573 NMNAT1
chr1:100111430-100111918 PALMD
chr1:100127874-100127955 PALMD
chr1:100133197-100133322 PALMD
chr1:100152231-100152346 PALMD
chr1:100152485-100152519 PALMD
chr1:100152631-100152745 PALMD
chr1:100154330-100155428 PALMD
Thanks :).
awk '{print $1 ":" $2 "-" $3 , $4}' file
cmccabe
October 27, 2014, 11:54am
3
awk '{print $1 ":" $2 "-" $3 , OFS= /t$4}' file
Would the above concatenate $1,$2,$3 in column 1 and $4 in column 2? Thanks :).
Hello cmccabe,
The following will do what you have asked for now; Akshay's solution only prints the separators in between the fields.
awk '{$1=$1":"$2"-"$3;$2=$NF;$3=$NF="";print $0}' Input_file
Thanks,
R. Singh
Your syntax is wrong; try it like this instead:
awk 'NF{$1=sprintf("%s:%s-%s",$1,$2,$3); $2=$4; NF-=2}1' file
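The one-liner above is dense, so here it is spread over several lines with comments. This is only a sketch: the `sample` and `expanded` shell variables are made up for illustration, and it assumes an awk that rebuilds the record when NF is lowered (gawk and mawk do).

```shell
# Two sample records from the thread, held in a variable for illustration.
sample='chr1 10002681 10002826 LZIC
chr1 10002980 10003083 NMNAT1'

expanded=$(printf '%s\n' "$sample" | awk '
NF {                                       # only modify lines that have fields
    $1 = sprintf("%s:%s-%s", $1, $2, $3)   # build chr:start-end in field 1
    $2 = $4                                # move the gene name to field 2
    NF -= 2                                # drop the two now-unused fields
}
1                                          # always-true pattern: print the record
')

printf '%s\n' "$expanded"
```

Empty lines have NF equal to 0, so the first block is skipped for them and the trailing `1` prints them unchanged.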
---------- Post updated at 10:33 PM ---------- Previous update was at 10:30 PM ----------
@Ravinder: $3=$NF="" does not actually delete the fields, it only empties them. Try your command with OFS=',' and you will see the leftover separators.
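To see the difference between emptying fields and removing them, the two approaches can be run side by side with OFS=','. This is a sketch on one sample line; the shell variable names are made up for illustration, and the second command assumes an awk that honors assignments to NF (gawk and mawk do).

```shell
line='chr1 10002681 10002826 LZIC'

# Emptying fields 3 and 4 keeps them in the record, so their separators remain.
emptied=$(printf '%s\n' "$line" |
    awk -v OFS=',' '{$1=$1":"$2"-"$3; $2=$NF; $3=$NF=""; print $0}')

# Lowering NF removes the trailing fields before the record is rebuilt.
trimmed=$(printf '%s\n' "$line" |
    awk -v OFS=',' '{$1=$1":"$2"-"$3; $2=$4; NF=2; print $0}')

printf 'emptied: %s\ntrimmed: %s\n' "$emptied" "$trimmed"
```

With the default OFS the two extra separators are trailing spaces, which is why they are easy to miss on a terminal.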
RudiC
October 27, 2014, 12:46pm
6
What you call columns are called fields in awk etc. terms, which in turn are separated by field separators (FS). So what will be interpreted as a field depends on the definition of the FS. It defaults to whitespace (space, tab, newline) in awk. With the defaults, your input will have four fields. Should FS be set to e.g. "#" or ",", your input will have just one single field.
So, the answer to your above question is "yes" if the default FS is used, but might be "possible" or "no" if you define a different FS.
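A quick way to check how many fields awk sees is to print NF for one of the sample lines under different FS settings. This is just a sketch; the shell variable names are made up for illustration.

```shell
line='chr1 10002681 10002826 LZIC'

# Default FS: runs of whitespace separate the fields.
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')

# FS set to a comma: the line contains none, so it is one single field.
comma_nf=$(printf '%s\n' "$line" | awk -F',' '{print NF}')

printf 'default FS: %s fields\ncomma FS:   %s field\n' "$default_nf" "$comma_nf"
```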
awk 'NF{$1=sprintf("%s:%s-%s",$1,$2,$3); $2=$4; NF-=2}1' file
chr1:10002681-10002826 LZIC
chr1:10002980-10003083 NMNAT1
chr1:10003485-10003573 NMNAT1
Seems to be printing a continuous string of text. Thanks :).
cmccabe:
awk 'NF{$1=sprintf("%s:%s-%s",$1,$2,$3); $2=$4; NF-=2}1' file
Seems to be printing a continuous string of text.
Yes, because the "printf" family of functions does not terminate (lines of) output with a newline by default. You have to state that explicitly:
awk 'NF{$1=sprintf("%s:%s-%s\n",$1,$2,$3); $2=$4; NF-=2}1' file
I hope this helps.
bakunin
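The two variants of the command can be compared directly on one sample line. This is only a sketch, assuming gawk or mawk, with made-up shell variable names. Keep in mind that sprintf() returns a string rather than writing any output, and that the trailing `1` prints each record followed by ORS (a newline by default), so in this sketch the added \n ends up inside field 1.

```shell
line='chr1 10002681 10002826 LZIC'

# Original command: the format string has no newline.
without_nl=$(printf '%s\n' "$line" |
    awk 'NF{$1=sprintf("%s:%s-%s",$1,$2,$3); $2=$4; NF-=2}1')

# Modified command: the newline becomes part of field 1.
with_nl=$(printf '%s\n' "$line" |
    awk 'NF{$1=sprintf("%s:%s-%s\n",$1,$2,$3); $2=$4; NF-=2}1')

printf 'without \\n in the format:\n%s\n' "$without_nl"
printf 'with \\n in the format:\n%s\n' "$with_nl"
```

If the first variant still shows up as one continuous string, the input file's line endings are worth checking, since awk splits records on newlines by default.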
I must be missing the point here. Other than printing a space (instead of a tab) between the two output fields, I don't see what was wrong with Akshay Hegde's original suggestion.
To change the space in the output to a tab, any of the following would work:
awk '{print $1 ":" $2 "-" $3 "\t" $4}' file.txt
or
awk '{print $1 ":" $2 "-" $3 OFS $4}' OFS="\t" file.txt
or
awk '{print $1 ":" $2 "-" $3, $4}' OFS="\t" file.txt
or, if there are empty lines in your input (not shown in your sample input) that are to be printed without change:
awk 'NF == 0 {print; next}
{print $1 ":" $2 "-" $3 "\t" $4}' file.txt
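As a sanity check, the first three variants above can be run on the same one-line input and compared. This is a sketch only: the temporary file stands in for file.txt, and the shell variable names are made up for illustration.

```shell
# Temporary stand-in for file.txt with one sample record.
tmp=$(mktemp)
printf '%s\n' 'chr1 10002681 10002826 LZIC' > "$tmp"

a=$(awk '{print $1 ":" $2 "-" $3 "\t" $4}' "$tmp")           # literal tab
b=$(awk '{print $1 ":" $2 "-" $3 OFS $4}' OFS="\t" "$tmp")   # OFS concatenated
c=$(awk '{print $1 ":" $2 "-" $3, $4}' OFS="\t" "$tmp")      # comma inserts OFS

rm -f "$tmp"

if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
    echo "all three variants agree"
fi
```

The OFS="\t" operand is an assignment that awk applies just before it opens the file that follows it, which is why it has to come before file.txt on the command line.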