Help with replacing data content

Format of one input file:

# >length=1
seq	program	data	909	1992
seq	program	record	909	1190

Desired output result:

# >length=1
length=1	program	data	909	1992
length=1	program	record	909	1190

I want to replace the content of column 1 on every line (excluding lines that start with "#") with the text from the "#" line, minus the ">".
Commands I tried:

step 1:
awk '{print $2}' input_file.txt | fgrep '>' | sed 's/>//g' > tmp.txt
step 2:
cat tmp.txt | awk '{print "sed \047s/seq/"$1"/g\047 input_file.txt"}' > run.sh
step 3:
./run.sh

The commands I tried do generate my desired output.
Thanks for any advice to improve it.
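
For reference, the three steps above can probably be collapsed into two sed calls with no temporary files. This is only a sketch: the sample data and the names workdir, label, and result are made up for illustration, it assumes one header line per file, and it would break if the label contained "/" or "&".

```shell
# Assumption: a small sample file standing in for input_file.txt.
workdir=$(mktemp -d)
printf '# >length=1\nseq\tprogram\tdata\t909\t1992\nseq\tprogram\trecord\t909\t1190\n' > "$workdir/input_file.txt"

# Take the label from the first header line, dropping the leading "# >".
label=$(sed -n 's/^# >//p' "$workdir/input_file.txt" | head -n 1)

# On every line that does not start with "#", replace a leading "seq" with the label.
result=$(sed "/^#/! s/^seq/$label/" "$workdir/input_file.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Because sed only touches the leading "seq", the tabs and the header line pass through unchanged.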

$ a=$(head -1 inputfile.txt | cut -d\> -f2); nawk -v a="$a" 'NR>1{$1=a;print}' inputfile.txt
length=1 program data 909 1992
length=1 program record 909 1190

Any other advice is appreciated :)

Another one...

awk 'NR==1{split($0,a,">");next}{$1=a[2]}1' input_file
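
Note that assigning to $1 makes awk rebuild the line with the default output separator (a single space), and the "next" drops the header line. If the tabs and the header need to survive, a variant along these lines might do. It is a sketch run against made-up sample data; sampledir and tabbed are just illustrative names.

```shell
# Assumption: a small tab-separated sample standing in for input_file.
sampledir=$(mktemp -d)
printf '# >length=1\nseq\tprogram\tdata\t909\t1992\nseq\tprogram\trecord\t909\t1190\n' > "$sampledir/input_file"

# FS=OFS="\t" keeps the rebuilt record tab-separated after $1 is assigned.
# The header line is printed unchanged instead of being skipped.
tabbed=$(awk 'BEGIN{FS=OFS="\t"} NR==1{split($0,a,">");print;next}{$1=a[2]}1' "$sampledir/input_file")
printf '%s\n' "$tabbed"
rm -rf "$sampledir"
```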

--ahamed

A solution in Perl.

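#! /usr/bin/perl
use warnings;
use strict;

my $label = '';
open my $src, '<', 'source.txt' or die "Cannot open source.txt: $!";
# Open the output once; '>' truncates, so rerunning does not append duplicates.
open my $dst, '>', 'output.txt' or die "Cannot open output.txt: $!";
while (my $line = <$src>) {
    if ($line =~ /^# >(\S*)/) {
        # Header line: remember the text after ">" and copy the line through.
        $label = $1;
        print $dst $line;
        next;
    }
    # Data line: replace the first tab-separated field with the current label.
    my @fields = split /\t/, $line, -1;
    $fields[0] = $label;
    print $dst join "\t", @fields;
}
close $src;
close $dst or die "Cannot close output.txt: $!";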

source.txt:
------------------
# >length=1
seq program data 909 1992
seq program record 909 1190
# >length=2
seq program data 909 1992
seq program record 909 1190

output.txt:
-------------------
# >length=1
length=1 program data 909 1992
length=1 program record 909 1190
# >length=2
length=2 program data 909 1992
length=2 program record 909 1190

awk -F\> '/^#/{_=$NF}{sub(/^seq/,_)}_' file
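
Spelled out with a readable variable name, the one-liner above does roughly the following. This is a sketch against made-up sample data with two headers; demodir, multi, and label are illustrative names, and the explicit label != "" test stands in for the bare "_" pattern.

```shell
# Assumption: a small sample with two header lines standing in for "file".
demodir=$(mktemp -d)
printf '# >length=1\nseq\tprogram\tdata\t909\t1992\n# >length=2\nseq\tprogram\trecord\t909\t1190\n' > "$demodir/file"

multi=$(awk -F'>' '
    /^#/ { label = $NF }      # header line: remember the text after the last ">"
    { sub(/^seq/, label) }    # replace a leading "seq"; header lines do not match
    label != ""               # print every line once a header has been seen
' "$demodir/file")
printf '%s\n' "$multi"
rm -rf "$demodir"
```

Since sub() edits $0 in place rather than assigning to a field, the tabs are left alone and each block of lines picks up the label of the header above it.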