Split a file into multiple files

Hi,

I have a file like this:

1|2|3|4|5|
1|2|8|4|6|
Trailer1|||||
1|2|3|
Trailer2|||
3|4|5|6|
3|4|5|7|
3|4|5|8|
Trailer2|||

I want to generate 3 files out of this, based on the trailer records. The trailer record string can be different for each file, or it may be the same for one or two of them.

The number of files to be generated varies with the number of trailer records in the input file.

Please suggest how to implement this as a shell script.

Hi,
Try this...

#!/usr/bin/perl

$i=1;
open (FH,">${i}.txt");
while (<>) {
        if (/Trailer/){
                print FH $_;
                close(FH);
                if (eof()){
                close ARGV ;
                exit;
                }
                $i = $i + 1;
                open (FH,">${i}.txt");
                }
        else {
                print FH $_ ;
             }
}

Hi pravin,

I need the solution as a Unix shell script.

Try this:

 
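#!/bin/bash
i=1
# Read the file line by line without word splitting or globbing.
while IFS= read -r line
do
  printf '%s\n' "$line" >> "${i}.txt"
  # A trailer line ends the current output file.
  if printf '%s\n' "$line" | grep -q Trailer ; then i=$((i+1)) ; fi
done < input_file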
 

Input is read from the file input_file.

The solution below should work with any POSIX awk (/usr/xpg4/bin/awk is the POSIX awk on Solaris).

/usr/xpg4/bin/awk  -F"|" -v n=0 '
/^Trailer[0-9]*/{ close("out"n) ; n++ ; print > "out"n ; next}
{ print >> "out"n}'  infile.txt


Try:

i=1
while IFS= read -r line
do
  printf '%s\n' "$line" >> "$i.out"
  case $line in
    Trailer*) i=$((i+1)) ;;
  esac
done < infile

awk equiv:

awk '{print > (i ".out")} /^Trailer/{close(i ".out"); ++i}' i=1 infile

Hi xoops,

In your script, the file is read again and again, which hurts performance. Besides that, the grep command returns all the matching lines. For example:

|1|2|3|
|T||||
|1|2|
|T||||
|1||2|3|4|5|
|T1||||

In this case, grep will always start from the first line.


Hi ahmed,

The trailer record format is different. It is not like Trailer1, Trailer2, and so on; it comes in as an input parameter.

Input parameters to the script are

-s file name to be split (filetoBeSplit.dat)
-f split file names (filesplit1.dat, filesplit2.dat, filesplit3.dat...)
-t search pattern for trailer record (|T| |F| |Z|)

How can we specify a different trailer record regexp in awk?

/usr/xpg4/bin/awk  -F"|" -v n=0 '
/^[TFZ]/{ close("filesplit"n".dat") ; n++ ; print > "filesplit"n".dat" ; next}
{ print >> "filesplit"n".dat"}'  filetoBeSplit.dat
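
For the -s, -f and -t parameters described above, something along these lines might serve as a starting point. It is only a sketch under assumptions that the thread does not confirm: the function name split_by_trailer is made up, -f is treated as an output file name prefix (not a list of names), and -t is treated as an awk regular expression matched against the whole line. Each line is written first and the trailer check comes second, so the trailer stays at the end of its own file.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread.
# Assumptions: -s is the input file, -f is an output file name PREFIX,
# -t is an awk regular expression that identifies a trailer line.
split_by_trailer() {
  src= prefix= pat=
  OPTIND=1
  while getopts s:f:t: opt; do
    case $opt in
      s) src=$OPTARG ;;
      f) prefix=$OPTARG ;;
      t) pat=$OPTARG ;;
      *) echo "usage: split_by_trailer -s file -f prefix -t regex" >&2
         return 2 ;;
    esac
  done
  if [ -z "$src" ] || [ -z "$prefix" ] || [ -z "$pat" ]; then
    echo "usage: split_by_trailer -s file -f prefix -t regex" >&2
    return 2
  fi
  # Write every line to the current file; a trailer line ends that file.
  awk -v prefix="$prefix" -v pat="$pat" '
    BEGIN { n = 1 }
    { out = prefix n ".dat"; print > out }
    $0 ~ pat { close(out); n++ }
  ' "$src"
}
```

It could be called as: split_by_trailer -s filetoBeSplit.dat -f filesplit -t '^[|][TFZ][|]'. The pattern uses the bracket expression [|] instead of \| because awk processes backslash escapes in -v values.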

perl:

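use strict;
use warnings;

my $n = 1;
my $fh;
while (<DATA>) {
  # Open the next output file lazily, so no empty file is left at the end.
  unless ($fh) {
    my $file = "file_" . $n . ".txt";
    open $fh, '>', $file or die "Cannot open $file: $!";
  }
  print $fh $_;
  if (/Trailer/) {
    # The trailer line is the last line of the current file.
    close $fh;
    undef $fh;
    $n++;
  }
}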
__DATA__
1|2|3|4|5|
1|2|8|4|6|
Trailer1|||||
1|2|3|
Trailer2|||
3|4|5|6|
3|4|5|7|
3|4|5|8|
Trailer2|||

Try:

pat=Trailer
i=1
while IFS= read -r line
do
  printf '%s\n' "$line" >> "$i.out"
  case $line in
    ${pat}*\|) i=$((i+1)) ;;
  esac
done < infile

awk equiv:

awk '{print > (i ".out")} $0 ~ pat {close(i ".out"); ++i}' pat="Trailer" i=1 infile

Hi ahmed,
I tried the following command:

awk  -F"|" -v n=0 '/^[TFZ]/{ close("filesplit"n".dat") ; n++ ; print > "filesplit"n".dat" ; next} { print >> "filesplit"n".dat"}'  test1.dat

on this file:

|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
|T|one||||
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
|4|
|5|
|6|
|Z|four||||

But it generates only one file, filesplit0.dat, containing all the data.

Modify the code as below:

/usr/xpg4/bin/awk  -F"|" -v n=0 '
($2 ~/^[TFZ]/){ close("filesplit"n".dat") ; n++ ; print > "filesplit"n".dat" ; next}
{ print >> "filesplit"n".dat"}'  filetoBeSplit.dat

This is needed because the first field is now empty ("") after putting "|" at the beginning of each line.

BR
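
To see why the trailer letter moves to $2, it may help to print the fields awk produces for a line that starts with the delimiter. This is a small illustration only; the function name show_fields is made up for the example.

```shell
# Print each field awk sees when the separator is "|".
show_fields() {
  printf '%s\n' "$1" |
    awk -F'|' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields '|T|one|'
# 1=[]
# 2=[T]
# 3=[one]
# 4=[]
```

The leading "|" produces an empty first field, so a test such as /^[TFZ]/ against the whole line never matches, while a test against $2 does.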

Hi,

It's not generating the files properly:

/testDir> cat filesplit0.dat
|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
/testDir> cat filesplit1.dat
|T|one||||
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
|4|
|5|
|6|
/testDir> cat filesplit2.dat
|Z|four||||

Is this what you want or not?

No, it's not splitting the file on the basis of the trailer record regular expression.
I need a split like this:

filetoBeSplit.dat

|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
|T|one||||
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
|4|
|5|
|6|
|Z|four||||

After split:

cat filesplit0.dat
 
|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
|T|one||||
 
cat filesplit1.dat

|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||

cat filesplit2.dat
 
|4|
|5|
|6|
|Z|four||||

Or simply:

while change specification; do
  generate new awk
  generate alternative new awk
done

output:

awk '{print > (i ".out")} $0 ~ pat {close(i ".out"); ++i}' pat='^[|][DTZF]' i=1 infile
awk -F '|' '{print > (i ".out")} $2 ~ /^[DTZF]$/ {close(i ".out"); ++i}' i=1 infile

OK, make the modification below. It just rearranges the order of the commands:

/usr/xpg4/bin/awk  -F"|" -v n=0 '
($2 ~/^[TFZ]/){
print > > "filesplit"n".dat"
close("filesplit"n".dat")
n++
next
}
{ print >> "filesplit"n".dat"}'  filetoBeSplit.dat

Hi,

I am not able to understand this part:
change specification; do
generate new awk
generate alternative new awk

Can't we do this in a single awk, without using a loop?

Is it right now, with the correct output?

No,

it's not generating the correct output:

/testDir> cat filesplit0.dat
|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
/testDir> cat filesplit1.dat
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
|4|
|5|
|6|
/testDir> cat filesplit1.dat0
|Z|four||||
/testDir> cat filesplit0.dat0
|T|one||||

This is wrong. There should be only three files, each with the correct output.
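
One way to get exactly three files from the sample above in a single awk pass is sketched below. It is not the code posted earlier in the thread: the function name split_on_trailer and the fixed "filesplit" prefix are illustrative, and it assumes that a trailer line is one whose second "|"-separated field is exactly T, D, Z or F. Every line is printed to the current file first; only afterwards does a trailer line close that file and advance the counter.

```shell
# Sketch: split "$1" into filesplit0.dat, filesplit1.dat, ...
# Assumption: a trailer is a line whose second field is T, D, Z or F.
split_on_trailer() {
  awk -F'|' '
    BEGIN { n = 0 }
    {
      out = "filesplit" n ".dat"
      print > out          # opened once per name, later prints append
    }
    $2 ~ /^[TDZF]$/ {      # trailer: finish this file, move to the next
      close(out)
      n++
    }
  ' "$1"
}
```

Running split_on_trailer filetoBeSplit.dat on the sample input should leave filesplit0.dat, filesplit1.dat and filesplit2.dat, each ending with its own trailer line. Using a single redirection expression (the variable out) also avoids the "> >" form and the unparenthesized concatenation after ">", either of which may be what produced the stray .dat0 files.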