Break a large file

jacobs.smith · March 8, 2013, 10:58am

Hi Friends,

I have the following file, and I would like to split it after every line that starts with done.

The file looks like this:

cat script

#!/bin/bash
#
# Name: name
# Task: name
#
#$ -N name
#$ -S /bin/bash
#$ -m be
#$ -M xxx
#$ -e xxx
#$ -o xxx

cd /path

#!/bin/bash

while read line
do
  	c=$(program ${line} | wc -l )
        line=${line/:/ }
        line=${line/-/ }
        echo "${line} ${c}"
done < input > output

#!/bin/bash
#
# Name: name
# Task: name
#
#$ -N name
#$ -S /bin/bash
#$ -m be
#$ -M xxx
#$ -e xxx
#$ -o xxx

cd /path

#!/bin/bash

while read line
do
  	c=$(program ${line} | wc -l )
        line=${line/:/ }
        line=${line/-/ }
        echo "${line} ${c}"
done < input1 > output1

After each done line there may be a blank line, or the next line may follow immediately. Either way, I would like to split the file at each done line. So, from the input above, I should get two output files:

cat script1

#!/bin/bash
#
# Name: name
# Task: name
#
#$ -N name
#$ -S /bin/bash
#$ -m be
#$ -M xxx
#$ -e xxx
#$ -o xxx

cd /path

#!/bin/bash

while read line
do
  	c=$(program ${line} | wc -l )
        line=${line/:/ }
        line=${line/-/ }
        echo "${line} ${c}"
done < input > output

cat script2

#!/bin/bash
#
# Name: name
# Task: name
#
#$ -N name
#$ -S /bin/bash
#$ -m be
#$ -M xxx
#$ -e xxx
#$ -o xxx

cd /path

#!/bin/bash

while read line
do
  	c=$(program ${line} | wc -l )
        line=${line/:/ }
        line=${line/-/ }
        echo "${line} ${c}"
done < input1 > output1

So far, I have tried this:

awk '/done/{x="F"++i;next}{print > x;}' script

But I get this error:

awk: null file name in print or getline
 input record number 1, file om
 source line number 1
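The error appears on input record 1 because x is only assigned when a done line is seen, so every line before the first done runs print > x with an empty file name. The next statement also throws away each done line, so it never reaches any output file. A minimal corrected sketch of the same idea keeps the F1, F2, ... names from the attempt above. It names the first chunk in BEGIN and prints every line before switching files. The file sample.txt is a hypothetical stand-in for the real input:

```shell
#!/bin/sh
# Hypothetical sample input standing in for the real "script" file.
printf '%s\n' '#!/bin/bash' 'cd /path' 'done < input > output' '' \
              '#!/bin/bash' 'cd /path' 'done < input1 > output1' > sample.txt

# Split after every line that starts with "done"; outputs are F1, F2, ...
awk '
BEGIN   { i = 1; x = "F" i }           # name the first chunk before any line is read
        { print > x }                  # every line, including the done line, goes to the current chunk
/^done/ { close(x); x = "F" (++i) }    # then close that chunk and switch to the next name
' sample.txt
```

awk only creates an output file on the first print to it, so no empty F3 is left behind after the last done line.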

Yoda · March 8, 2013, 11:04am

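awk '
BEGIN {
        i = 1
        F = "script" i          # first output file is script1
}
/^done/ {
        print $0 > F            # the done line still belongs to the current file
        close(F)                # close it before moving on
        i += 1
        F = "script" i          # next output file: script2, script3, ...
        next
}
{
        print $0 > F            # every other line goes to the current file
}
' script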

vgersh99 · March 8, 2013, 11:08am

awk '{print >f}/done/{close(f);f="script" ++n}' f='script1' n=1 myFile
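Two details of this one-liner are easy to miss. First, f='script1' n=1 are awk command-line assignments that are applied just before myFile is read, so f already holds a name when the first line is printed. Second, /done/ matches done anywhere in a line, not only at the start. The sketch below runs an anchored variant against a hypothetical sample.txt whose comment line contains done. Only lines that begin with done end a chunk:

```shell
#!/bin/sh
# Hypothetical input: the comment line contains "done" but does not start with it.
printf '%s\n' 'stuff1' '# loop until done' 'done < input > output' \
              'stuff2' 'done < input1 > output1' > sample.txt

# Anchored variant of the one-liner: only lines beginning with "done" end a chunk.
# f and n are operand assignments, applied just before sample.txt is read.
awk '{ print > f } /^done/ { close(f); f = "script" (++n) }' f=script1 n=1 sample.txt
```

With the unanchored /done/ pattern, the comment line would end script1 early and a third file would be produced.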

drl · March 9, 2013, 9:05am

Hi.

The csplit utility was designed for situations like this:

#!/usr/bin/env bash

# @(#) s1	Demonstrate context splitting with csplit.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C csplit

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

# Remove debris from previous runs.
rm -f xx*
pl " Results, split files:"
csplit -z -s "$FILE" '/^done/+1' '{*}'
head xx*

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
csplit (GNU coreutils) 6.10

-----
 Input data file data1:
stuff1
done other tokens1
stuff2
done other tokens2

-----
 Results, split files:
==> xx00 <==
stuff1
done other tokens1

==> xx01 <==
stuff2
done other tokens2

See man csplit for details.
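The same csplit call can also choose its own output names when applied to the layout from the question. The sketch below assumes GNU csplit. -f is standard, but -z, -b and the '{*}' repeat count are GNU extensions. The input sample.txt and the prefix part are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample input standing in for the question's "script" file.
printf '%s\n' '#!/bin/bash' 'cd /path' 'done < input > output' '' \
              '#!/bin/bash' 'cd /path' 'done < input1 > output1' > sample.txt

# Assumes GNU csplit:
#   -z           remove empty output files (the empty piece after the last done line)
#   -s           do not print the size of each piece
#   -f part      use "part" instead of "xx" as the output prefix
#   -b '%d'      number pieces part0, part1, ... instead of xx00, xx01, ...
#   '/^done/+1'  cut on the line after each line that starts with "done"
#   '{*}'        repeat the pattern as often as it matches (quoted so the shell leaves it alone)
csplit -z -s -f part -b '%d' sample.txt '/^done/+1' '{*}'
```

csplit numbers pieces from 0, so this gives part0 and part1 rather than script1 and script2. The blank line after the first done line becomes the first line of part1.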

Best wishes ... cheers, drl