genome
1
Hi,
I have data like this:
Gene1,Gene2 snp1
Gene3 snp2
Gene4 snp3
I'd like to split a line wherever the first field contains a comma, and then print the remaining information for each respective gene.
My code:
awk '{
    if ($1 ~ /,/) {
        n = split($0, t, ",")
        for (i = 0; ++i <= n;) {
            print t[i],$2
        }
    }
    else {
        print $0
    }
}' smalldata.txt
It gives me output:
Gene1 snp1
Gene2 snp1 snp1
Gene3 snp2
Gene4 snp3
I want output like this:
Gene1 snp1
Gene2 snp1
Gene3 snp2
Gene4 snp3
A line can have multiple commas.
Linux platform: 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64 GNU/Linux
Yoda
2
Try:
awk '
{
    if ( $0 ~ /,/ )
    {
        n = split ( $1, T, "," )
        for ( i = 1; i <= n; i++ )
            print T[i], $NF
    }
    else
        print $1, $NF
}
' file
rdrtx1
3
awk ' { for (i=1; i<=NF-1; i++) print $i, $NF } ' FS="[ ,]" file
Aia
4
In case you don't mind using Perl:
perl -pale 's/,/ $F[1]\n/' genome.file
Output:
Gene1 snp1
Gene2 snp1
Gene3 snp2
Gene4 snp3
rdrtx1
5
s/,/ $F[1]\n/g
for multiple commas.
Your original code needs to split on $1 (the first field), not $0.
...
n = split($1, t, ",")
...
RudiC
7
And you don't need the test for the existence of commas in $1; split will yield one single element in the absence of separators.
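Putting the two suggestions together (split on $1, drop the comma test), a minimal sketch could look like the following. The function name split_genes is made up for illustration, and it assumes two whitespace-separated columns as in the sample data, so $2 is the SNP column.

```shell
# split_genes is a hypothetical wrapper name; it reads the files given
# as arguments, or standard input when none are given.
split_genes() {
    awk '{
        # split() returns 1 when $1 contains no comma, so the loop
        # covers both cases and no separate test is needed.
        n = split($1, t, ",")
        for (i = 1; i <= n; i++)
            print t[i], $2    # assumes the SNP is the second field
    }' "$@"
}
```

Called as split_genes smalldata.txt, this should print one gene per line, each followed by its SNP, however many commas the first field holds.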