Suggestions for adding columns to text file

Good afternoon to everyone,

I have some input and output from various widgets that I am trying to get to play nicely together. Basically, I would like to stay out of Excel and automate the entire process. I have read some posts here about using awk, nawk, etc. for similar operations, but nothing quite the same.

One thing I need to do is to add a leading column with "index" as the value for the first row, and then sequentially number the remaining rows. I know how to number all rows in sed, but not how to skip the first row. I also need to remove column three at some later point, etc.

I think if I can get these two operations working, I can adapt for the rest.

Thanks a bunch for any suggestions, I think these operations are simple, but many things are simple once you know how.

...not so much otherwise

LMHmedchem

Based on your description where file1 has the original columns

sed -n '2,$p' file1 | awk '{ print NR" "$0 }' > file2
awk '
  NR == 1 { print "index", $0; next }
          { print NR - 1, $0 }
' "$file"

Hello, LMHmedchem:

Welcome to the forum. In the future, please provide a sample of the data being worked with (feel free to obfuscate sensitive info, so long as the format isn't affected). With nothing to go on, here's my shot in the dark (I've assumed the data is a comma-delimited file):

awk 'NR==1{print "index,"$0} NR>1{print NR-1","$0}' data

Regards,
Alister

awk '$0=((NR-1)?NR-1:"index")","$0'  infile
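The one-liner above leans on an awk idiom: an assignment used as a pattern counts as true when the assigned string is non-empty, so the default print action fires for every line. A more spelled-out sketch of the same idea follows. It assumes comma-delimited input unless another separator is passed, and the names add_index and sample.csv are only illustrative, not from the thread:

```shell
#!/bin/sh
# Sketch only: add_index and sample.csv are hypothetical names.
# Usage: add_index FILE [SEPARATOR]   (separator defaults to a comma)
add_index() {
    awk -v sep="${2:-,}" '
        NR == 1 { print "index" sep $0; next }            # header row gets the label
                { printf "%d%s%s\n", NR - 1, sep, $0 }    # data rows are numbered from 1
    ' "$1"
}

# Small made-up sample to try it on.
printf 'name,input1\n1972257,16.1762\n2803235,46.5833\n' > sample.csv
add_index sample.csv
```

For a tab-delimited file, the separator could be passed as "$(printf '\t')".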

Thanks for the responses so far,

This works for adding the index column,

   awk '$0=((NR-1)?NR-1:"index")"\t"$0'  $INFILE > $OUTFILE

This is a tab delimited file, so I made that change.

The input looks like,

name   input1   input2   input3   input4   input5
1972257   16.1762   3.38945   56.1511   58.7836   6.71878
2803235   46.5833   8.39445   28.0059   47.72   6.21068
17957224   32.5622   6.57671   40.5624   47.5545   7.44461
17957228   32.5622   6.57671   40.6967   47.5268   7.44241   

and for the first transformation, I need

index   name   ID   input1   input2   input3   input4   input5
1   1972257   1   16.1762   3.38945   56.1511   58.7836   6.71878
2   2803235   2   46.5833   8.39445   28.0059   47.72   6.21068
3   17957224   3   32.5622   6.57671   40.5624   47.5545   7.44461
4   17957228   4   32.5622   6.57671   40.6967   47.5268   7.44241

The code above gets me the index column. For the third column I tried,

   awk '$2=((NR-1)?NR-1:"ID")"\t"$2'  $INFILE > $OUTFILE

but that definitely didn't work.

LMHmedchem

AWK:

awk 'BEGIN{OFS="\t"} {$2=(NR>1?NR-1:"ID")"\t"$2; print (NR>1?NR-1:"index"),$0}' data
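Assigning to a field makes awk rebuild $0 with OFS between the fields, which is why the output separator has to be set to a tab here. A spelled-out sketch of the same transformation is below. It assumes tab-delimited input with a single header row, and add_index_and_id and sample.tsv are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Sketch only: add_index_and_id and sample.tsv are hypothetical names.
add_index_and_id() {
    awk 'BEGIN { FS = OFS = "\t" }              # split and rejoin on tabs only
    NR == 1 { idx = "index"; id = "ID" }        # labels for the header row
    NR > 1  { idx = id = NR - 1 }               # row number for data rows
    {
        $1 = $1 OFS id                          # new column goes right after column one
        print idx, $0                           # index column goes in front
    }' "$1"
}

# Small made-up sample to try it on.
printf 'name\tinput1\tinput2\n1972257\t16.1762\t3.38945\n2803235\t46.5833\t8.39445\n' > sample.tsv
add_index_and_id sample.tsv
```

Setting FS to a tab as well keeps fields that contain spaces intact, which the default whitespace splitting would not.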

SED (each ____ below stands for one literal tab, which can be inserted at the command line by typing control-v control-i or control-v tabkey):

sed -n 'p;=' data | sed '$d; 1{s/____/____ID____/; s/^/index____/;n;}; N; y/\n/____/; s/\(\([^____]*\)____[^____]*____\)/\1\2____/;'

Regards,
Alister

Well, I have it working more or less; thanks to everyone for the help. This is the script at the moment. Warning: there should be an ugly flag on this somewhere.

A friend of mine said you should never be too embarrassed about something that actually works, but I am not so sure about this one.

After several attempts to put the script in code tags, I have attached it. The zip contains the script, the output from the first bin called by the script, and the output from the second bin, in case anyone wants to hot wire this and view the ugliness. It also contains the two output files, which are formatted as needed.

I am interested in learning, so any comments as to better methods of getting from here to there would be appreciated.

Thanks again,

LMHmedchem

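And just for the heck of it, a sh solution:

#!/bin/sh

file="$1"

i=0
IFS=$(printf '\t')      # split fields on tabs only; no literal tab to type
set -f                  # no filename expansion when $line is split below
while IFS= read -r line; do
    set -- $line
    if [ "$i" -gt 0 ]; then
        a=$i b=$i
    else
        a=index b=ID
    fi
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$a" "$1" "$b" "$2" "$3" "$4" "$5" "$6"
    i=$((i + 1))        # portable; POSIX sh is not required to support ++i
done < "$file"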

Regards,
Alister

Try

awk '$1=(NR-1)?NR-1 OFS $1 OFS NR-1:"index" OFS $1 OFS "ID"' OFS="\t" infile
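The original question also mentions removing column three at a later point, which none of the replies cover. One possible sketch is below, assuming tab-delimited input. The names drop_column and sample.tsv are hypothetical and only there for illustration:

```shell
#!/bin/sh
# Sketch only: drop_column and sample.tsv are hypothetical names.
# Usage: drop_column FILE N   (N is the 1-based column to remove)
drop_column() {
    awk -v col="$2" 'BEGIN { FS = OFS = "\t" }
    {
        line = ""
        sep = ""
        for (i = 1; i <= NF; i++) {
            if (i == col) continue      # leave out the unwanted column
            line = line sep $i          # rebuild the row from the remaining fields
            sep = OFS
        }
        print line
    }' "$1"
}

# Small made-up sample to try it on.
printf 'a\tb\tc\td\n1\t2\t3\t4\n' > sample.tsv
drop_column sample.tsv 3
```

When the column number is fixed and the file is tab-delimited, cut -f1,2,4- infile should give the same result with less typing.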