Split a file using a 2-D indexing system

kristinu · March 4, 2013, 9:24pm

I have a file and want to split it using a 2-D index system.

For example, suppose the file is p.dat and contains 6 data sets separated by ">" lines.
With nx=3 and ny=2, I need to create the files

p.dat.1.1
p.dat.1.2
p.dat.1.3
p.dat.2.1
p.dat.2.2
p.dat.2.3

I have tried using a single index and want to modify it to use two indices:

awk -v ft=$ftmp '/>/{x=ft"."++i; next} {print > x;}' p.dat

mirni · March 4, 2013, 11:00pm

Try:

awk -v ft="$ftmp" '{i = i % 3 + 1; if (i == 1) i2++; print > (ft "." i2 "." i)}' RS="\n>\n" p.dat
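The index arithmetic behind a command like the one above can be spelled out with a running data-set counter. This is only a minimal sketch under assumptions: the sample p.dat, the counter k, and the variables nx and out are made up for illustration, and every data set is assumed to be introduced by a line starting with ">". It uses the default single-character RS, so it does not depend on multi-character RS support.

```shell
#!/bin/sh
# Run in an empty scratch directory: the example writes p.dat and p.dat.X.Y there.
# Hypothetical sample input: six data sets, each introduced by a ">" line.
printf '>\na\n>\nb\n>\nc\n>\nd\n>\ne\n>\nf\n' > p.dat

awk -v ft=p.dat -v nx=3 '
/^>/ {                                  # a separator line starts data set number k
        if (out != "") close(out)
        k++
        x = int((k - 1) / nx) + 1       # first index: advances after every nx data sets
        y = (k - 1) % nx + 1            # second index: cycles through 1..nx
        out = ft "." x "." y
        next
}
out != "" { print > out }               # lines before the first ">" are ignored
' p.dat

ls p.dat.*
```

With nx=3 the six data sets land in p.dat.1.1 through p.dat.2.3; ny is never needed because the first index simply grows with the number of data sets.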

Don_Cragun · March 5, 2013, 12:38am

If the awk on your system only supports a single-character RS, or if you'd like to base the output filenames on the input filenames, process more than one input file, and choose how many files are produced before the first number in the output filename is incremented, you could try the following script:

#!/bin/ksh
cnt=3
Usage="Usage: $(basename $0) [-n cnt] file..."
# Split input file(s) into files named file.X.Y where X and Y reset to 1
# and 1, respectively, for each file operand.  A new file is created
# when a line in an input file starts with a <greater-than> character
# (">").  Lines starting with a <greater-than> character are not
# included in any of the output files, but all other lines are copied 
# unchanged into the corresponding output file.  When a new file is
# created, Y is incremented until it exceeds cnt (which defaults to 3 if
# the -n option is not given on the command line).  When Y exceeds cnt, X
# is incremented and Y is reset to 1.
while getopts n: opt
do      case $opt in
        (n)     cnt="$OPTARG";;
        (?)     echo "$Usage" >&2
                exit 1
        esac
done
shift $(($OPTIND - 1))
if [ $# -lt 1 ]
then    echo "$(basename $0): At least one file operand is required." >&2
        echo "$Usage" >&2
        exit 2
fi
awk -v cnt="$cnt" '
FNR == 1 {
        # This is the first record of a new input file.
        # If this is not the first input file, close the last output file for
        # the previous input file.
        if(NR != FNR) close(fn)
        # Create output filename based on input filename.
        x = y = 1
        fn = FILENAME "." x "." y
}
/^>/ {  # Close current output file
        close(fn)
        if(y == cnt) {
                y = 1
                x++
        } else  y++
        fn = FILENAME "." x "." y
        next
}
{       print > fn
}' "$@"

It uses the Korn shell, but will also work with any other shell that supports the command substitution and arithmetic expansion forms specified by the POSIX standards (including bash).

Note that if the first line in an input file starts with ">", or if two or more adjacent lines start with ">", no empty files are created; the corresponding filenames are simply skipped.
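If empty placeholder files are preferred over skipped names, one possible adjustment is to open each output file as soon as its name is chosen. The following is only a sketch of that idea, not part of the script above: the helper newfile() and the sample q.dat are hypothetical, and it relies on awk creating a file when an output redirection to it is first opened.

```shell
#!/bin/sh
# Run in an empty scratch directory: the example writes q.dat and q.dat.X.Y there.
# Hypothetical sample: the file starts with ">" and has two adjacent ">" lines.
printf '>\na\n>\n>\nb\n' > q.dat

awk -v cnt=3 '
function newfile() {
        fn = FILENAME "." x "." y
        printf "" > fn          # opening the file creates it, even if nothing else is written
}
FNR == 1 { if (NR != FNR) close(fn); x = y = 1; newfile() }
/^>/ {
        close(fn)
        if (y == cnt) { y = 1; x++ } else y++
        newfile()
        next
}
{ print > fn }
' q.dat

ls -l q.dat.*
```

On this sample, q.dat.1.1 and q.dat.1.3 are created empty, while q.dat.1.2 and q.dat.2.1 hold the lines a and b.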

kristinu · March 5, 2013, 5:38am

I love your second version