Split a file into multiple files

Scrutinizer · December 30, 2009, 6:41am

awk -F '|' '{print > (i ".out")} $2 ~ /^[DTZF]/ {++i}' i=1 infile

$ grep '' *.out
1.out:|1|2|3|4|5|
1.out:|1|2|3|4|4|
1.out:|1|2|3|4|3|
1.out:|T|one||||
2.out:|1|2|3|4|5|6|7|8|9|
2.out:|2|3|4|5|6|7|8|9|1|
2.out:|D|three|||||
3.out:|4|
3.out:|5|
3.out:|6|
3.out:|Z|four||||

ahmad.diab · December 30, 2009, 6:45am

No, you are wrong. I already executed the code and got only 3 files, with the correct output.

See below:

cat filetoBeSplit.dat
|1|2|3|4|5|
|1|2|8|4|6|
|Trailer1|||||
|1|2|3|
|Trailer2|||
|3|4|5|6|
|3|4|5|7|
|3|4|5|8|
|Trailer2|||

cat filesplit0.dat
|1|2|3|4|5|
|1|2|8|4|6|
|Trailer1|||||

cat filesplit1.dat
|1|2|3|
|Trailer2|||

cat filesplit2.dat
|3|4|5|6|
|3|4|5|7|
|3|4|5|8|
|Trailer2|||

Why does it go wrong when you execute it? Did you copy and paste the code exactly as I wrote it?

Very important note: you need to delete the old files before executing the code again.
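
One way to see why that note matters, as a rough sketch rather than the exact command from this thread: if a later run produces fewer chunks than an earlier one, the higher-numbered files from the earlier run are still lying around. The helper name split_chunks and the sample records are made up for illustration, and plain awk is assumed instead of /usr/xpg4/bin/awk.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical helper: split stdin into filesplit0.dat, filesplit1.dat, ...
split_chunks() {
    awk -F'|' -v n=0 '
    { out = "filesplit" n ".dat"; print > out }
    # A trailer record (second field starts with T, F, Z or D) ends the chunk.
    $2 ~ /^[TFZD]/ { close(out); n++ }'
}

# First run: two chunks, so filesplit0.dat and filesplit1.dat are written.
printf '%s\n' '|1|2|' '|Trailer1||' '|3|4|' '|Trailer2||' | split_chunks

# Second run: only one chunk. Remove the old output first,
# otherwise filesplit1.dat from the first run would still be there.
rm -f filesplit*.dat
printf '%s\n' '|5|6|' '|Trailer1||' | split_chunks

ls filesplit*.dat
```

Without the rm, the second run would rewrite filesplit0.dat but leave the stale filesplit1.dat next to it.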

If you want to add a new regexp, add it to the code in the box below.

/usr/xpg4/bin/awk -F"|" -v n=0 '
($2 ~ /^[TFZD]/) {print >> ("filesplit" n ".dat"); close("filesplit" n ".dat"); n++; next}
{print > ("filesplit" n ".dat")}' filetoBeSplit.dat


pparthji · December 30, 2009, 7:58am

Hi Scrutinizer,

It works fine, but it generates files named 1.out, 2.out, and so on.
I actually have to generate files such as /test1/filename1.out and /test2/filename2.out (the names follow no specific format), and they may be created in different locations after the split. How can I specify the file names in awk?
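
For the naming question, one possible approach (a minimal sketch, not something posted in this thread) is to hand awk a comma-separated list of target paths and pick the next one each time a trailer record closes a chunk. The variable names names, target and out, the relative test1/test2/test3 paths, and the fallback name extraN.out are all assumptions made for illustration. Note that awk does not create missing directories, so they have to exist beforehand.

```shell
#!/bin/sh
# Sketch only: the directory and file names below are hypothetical.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir -p test1 test2 test3     # awk does not create missing directories

# Sample input taken from the thread.
cat > filetoBeSplit.dat <<'EOF'
|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
|T|one||||
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
|4|
|5|
|6|
|Z|four||||
EOF

awk -F'|' -v names='test1/filename1.out,test2/filename2.out,test3/filename3.out' '
BEGIN { count = split(names, target, ","); i = 1 }
{
    # Use the i-th supplied name; fall back to a numbered name if the list runs out.
    out = (i <= count) ? target[i] : ("extra" i ".out")
    print > out
}
# A trailer record (second field starts with D, T, Z or F) ends the current chunk.
$2 ~ /^[DTZF]/ { close(out); i++ }
' filetoBeSplit.dat
```

Because the list is passed with -v, the calling script can build it however it likes, for example from a configuration file, without touching the awk program itself.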

---------- Post updated at 07:09 AM ---------- Previous update was at 07:04 AM ----------

This is my filetobesplit.dat:

|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
|T|one||||
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
|4|
|5|
|6|
|Z|four||||

After the split, the following files are generated with the wrong data:

-rw-r--r--  1 r245347 fwsAPP      36 2009-12-30 07:07 filesplit0.dat
-rw-r--r--  1 r245347 fwsAPP      40 2009-12-30 07:07 filesplit1.dat
-rw-r--r--  1 r245347 fwsAPP      12 2009-12-30 07:07 filesplit2.dat
-rw-r--r--  1 r245347 fwsAPP      12 2009-12-30 07:07 filesplit2.dat0
-rw-r--r--  1 r245347 fwsAPP      14 2009-12-30 07:07 filesplit1.dat0
-rw-r--r--  1 r245347 fwsAPP      11 2009-12-30 07:07 filesplit0.dat0

============

/testDir> cat filesplit0.dat
|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
/testDir> cat filesplit1.dat
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
/testDir> cat filesplit2.dat
|4|
|5|
|6|
/testDir> cat filesplit2.dat0
|Z|four||||
/testDir> cat filesplit1.dat0
|D|three|||||
/testDir> cat filesplit0.dat0
|T|one||||

---------- Post updated at 07:58 AM ---------- Previous update was at 07:09 AM ----------

Hi Ahmad,

It is still not working at my end.

Can you try the following data file:

|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
|T|one||||
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
|4|
|5|
|6|
|Z|four||||

xoops · December 30, 2009, 8:16am

Hi pparthji,

As I understand it, the script reads the input file only once, one line at a time, so there should not be any performance issue with it.
Also, the grep command is run against only a single line of the file at a time, so I don't think it will always match.

pparthji:

Hi xoops,

In your script, the file is read again and again, which hurts performance. Besides that, the grep command returns all of the matched patterns. For example:
|1|2|3|
|T||||
|1|2|
|T||||
|1||2|3|4|5|
|T1||||
In this case, grep will always start from the beginning.

 
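#!/bin/bash
# Split "$1" into filesplit1.dat, filesplit2.dat, ...; a trailer line ends each chunk.
i=1
while IFS= read -r line
do
  printf '%s\n' "$line" >> "filesplit${i}.dat"
  # A trailer is a line whose first field starts with T, D, Z or F.
  if printf '%s\n' "$line" | grep -Eq '^\|[TDZF]' ; then i=$((i+1)) ; fi
done < "$1"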

 
 
 
>> cat filetoBeSplit
|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
|T|one||||
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
|4|
|5|
|6|
|Z|four||||
 
> sh script.sh filetoBeSplit
 
> more *.dat
::::::::::::::
filesplit1.dat
::::::::::::::
|1|2|3|4|5|
|1|2|3|4|4|
|1|2|3|4|3|
|T|one||||
::::::::::::::
filesplit2.dat
::::::::::::::
|1|2|3|4|5|6|7|8|9|
|2|3|4|5|6|7|8|9|1|
|D|three|||||
::::::::::::::
filesplit3.dat
::::::::::::::
|4|
|5|
|6|
|Z|four||||
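
On the performance point, a variation on the loop above (a sketch under assumptions, not the script as posted) can test each line with the shell's own case pattern matching, so no grep process is started per line. The sample records are the ones quoted earlier in this post; anchoring the pattern to the first field is an assumption about where the trailer letter sits.

```shell
#!/bin/sh
# Work in a scratch directory with the sample records from the quoted post.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '|1|2|3|' '|T||||' '|1|2|' '|T||||' '|1||2|3|4|5|' '|T1||||' > filetoBeSplit

i=1
while IFS= read -r line; do
    # Append the current line to the current chunk file.
    printf '%s\n' "$line" >> "filesplit${i}.dat"
    case $line in
        # Trailer: the line starts with "|" followed by T, D, Z or F.
        '|'[TDZF]*) i=$((i + 1)) ;;
    esac
done < filetoBeSplit
```

The input is still read only once, and each line is tested on its own, so an earlier trailer cannot affect how a later line is classified.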

ahmad.diab · December 30, 2009, 8:54am

With the data you provided, my code still works fine.

Code:

bash-3.00$ rm filesplit*.dat

bash-3.00$ /usr/xpg4/bin/awk -F"|" -v n=0 '($2 ~ /^[TFZD]/) {print >> ("filesplit" n ".dat"); close("filesplit" n ".dat"); n++; next} {print > ("filesplit" n ".dat")}' filetoBeSplit.dat

Just copy and paste the commands.