Splitting large file into multiple files in unix based on pattern

jimmy12 · July 2, 2011, 8:23am

I need to write a shell script for the scenario below.
My input file has data in this format:

qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28 
qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43  
qwerty0101CFG 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28 
qwerty0101CFG 12342 01022005 07022009 datainalc hitalbert 43

The records are tab separated.
I want to read the input file and, based on the last three characters of the first field
(for qwerty0101TWE that is TWE), put each record in a matching file: this record goes to TWE.txt,
then the next record (mXZ) goes to MXZ.txt.
Likewise, all TWE records go in one file and all MXZ records in another.
Kindly help me write a shell script for this, as I'm new to Unix.

bartus11 · July 2, 2011, 8:32am

perl -ne '/(.{3})\t/;open O,">>$1.txt";print O;close O' file

jayan_jay · July 2, 2011, 9:25am

try this..

 % awk ' { print $1 } ' input_file  | cut -c 11- | sort -u | awk ' { print "grep \"[0-9]"$0"\" input_file > "$0".txt" } ' | sh

Franklin52 · July 2, 2011, 10:28am

awk '{f=substr($1,length($1)-2)".txt";print >> f;close(f)}' file
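
For anyone trying this out, here is a minimal self-contained sketch of the same idea on a cut-down version of the sample data. The scratch directory, the file name sample.txt and the toupper() call are assumptions for illustration only. toupper() is there because the question asked for mXZ records to land in MXZ.txt; drop it to keep the suffix's case as it is. The sketch appends with >> because reopening a file with > after close() would truncate it and leave only the last record.

```shell
# Work in a scratch directory so no existing *.txt files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Cut-down, tab-separated version of the sample records (three fields each).
printf '%s\t%s\t%s\n' \
    qwerty0101TWE 12345 alanfernanded \
    qwerty0101mXZ 12349 johngalilo \
    qwerty0101TWE 12342 hitalbert \
    qwerty0101CFG 12345 alanfernanded \
    qwerty0101mXZ 12349 johngalilo \
    qwerty0101CFG 12342 hitalbert > sample.txt

awk -F'\t' '{
    key = toupper(substr($1, length($1) - 2))   # last three characters of field 1
    f = key ".txt"
    print >> f      # append, so a reopened file keeps its earlier records
    close(f)        # keep the number of simultaneously open files low
}' sample.txt
```

Note that running it twice in the same directory appends the records again, so remove the old output files first if that matters.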

jimmy12 · July 5, 2011, 4:00am

Thanks for the reply....

Also there are few variations plzz help

How to skip the first and last line of the file before splitting the file.

Also I want to keep the orignal file as it is

Franklin52 · July 5, 2011, 4:22am

awk -v ll=$(wc -l < file) 'NR>1 && NR<ll{f=substr($1,length($1)-2)".txt";print >> f;close(f)}' file > newfile

jimmy12 · July 5, 2011, 7:44am

It's giving me an error and I'm not able to figure it out. Please help.

$ awk -v ll=$(wc -l < test) 'NR>1 && NR<ll{f=substr($1,length($1)-2)".txt";print > f;close(f)}' test > newfile
syntax error: `(' unexpected
$

bartus11 · July 5, 2011, 7:54am

Try:

awk -v ll=`wc -l < file` 'NR>1 && NR<ll{f=substr($1,length($1)-2)".txt";print >> f;close(f)}' file > newfile
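
If the shell keeps rejecting the command substitution, one possible alternative is an awk call that needs no line count at all. This is only a sketch: it holds every line back by one, so the first line is skipped explicitly and the last line is never written. The scratch directory and the file name data.txt are made up for the demo, and the setup lines assume a shell that has mktemp and $(...); the awk call itself uses neither.

```shell
# Scratch directory and sample file are assumptions for this demo.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' \
    'txtytg09dfgdfg' \
    'qwerty0101TWE 12345' \
    'qwerty0101mXZ 12349' \
    'qwerty0101TWE 12342' \
    'byetekr 09' > data.txt

awk '
NR > 2 {
    # prev is line NR-1: never the first line (NR > 2) and never the last,
    # because the last line is only stored below and not written.
    print prev >> f     # f still names the file for the held-back line
    close(f)
}
{
    prev = $0                                  # hold the current line back
    f = substr($1, length($1) - 2) ".txt"      # target file for that line
}' data.txt
```

The input file is only read, so it stays as it is.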

ygemici · July 5, 2011, 8:04am

try with sed

# for((i=2;i<$(sed -n '$=' file);i++));do sed -n "$i p" file >> "$(sed -n "$i s/^.\{10\}\(...\).*/\1/p" file).txt";done

mirni · July 5, 2011, 9:02am

Use nawk on Solaris.

Franklin's code needs just a little modification:

nawk -v ll=$(wc -l < test) 'NR>1 && NR<ll{f=substr($1,length($1)-2)".txt"; print > f}' test

jimmy12 · July 6, 2011, 12:38am

Hi,
None of the commands are working; they all give the same error: `(' unexpected
Below is the requirement:

txtytg09dfgdfg
qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28 
qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43 
qwerty0101CFG 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28 
qwerty0101CFG 12342 01022005 07022009 datainalc hitalbert 43
byetekr 09

In the above file I have to skip the first and last lines, i.e. not process them.
Based on the last three characters of the first field of the second record
(qwerty0101TWE, i.e. TWE), I want to put this record in file TWE.txt,
then the next record (mXZ) in MXZ.txt.
With the above commands, files are also created for the first and last lines,
like dgf.txt and kr0.txt. I don't want these files to be created.
I also need to keep the input file as it is.
Please help, it's urgent!

itkamaraj · July 6, 2011, 12:47am

 
$ nawk '{f=substr($1,length($1)-2)".txt";print $2,$3,$4,$5,$6 >> f;close(f)}' test
$
$ cat TWE.txt
12345 01022005 01022005 datainala alanfernanded
12342 01022005 07022009 datainalc hitalbert
$ cat mXZ.txt
12349 01022005 06022008 datainalb johngalilo
12349 01022005 06022008 datainalb johngalilo
$ cat CFG.txt
12345 01022005 01022005 datainala alanfernanded
12342 01022005 07022009 datainalc hitalbert

jimmy12 · July 6, 2011, 1:08am

Sample file

$ cat test
nala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28
qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43
qwerty0101CFG 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28
qwerty0101CFG 12342 01022005 07022009 datainalc hitalbert 43
byetekr 09

Command:

nawk '{f=substr($1,length($1)-2)".txt";print $2,$3,$4,$5,$6 >> f;close(f)}' test

It's creating these files:

ala.txt -- created for the first line
ekr.txt -- created for the last line, plus the other files too
Actually I don't want files created for the first and last lines.
Also, there can be any number of records, not just 6, so I don't want to hard-code anything.

$ cat TWE.txt
12342 01022005 07022009 datainalc hitalbert

I do want the first field (e.g. qwerty0101mXZ) in the output file.
This command works fine:

awk '{f=substr($1,length($1)-2)".txt";print > f;close(f)}' test

but I want to modify the command to skip the first and last lines.

itkamaraj · July 6, 2011, 1:33am

 
$ lineno=`wc -l < test`; nawk -v lineno="$lineno" '{ if (NR>1 && NR < lineno){f=substr($1,length($1)-2)".txt";print >> f;close(f)}}' test

$ cat TWE.txt
qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43

$ cat mXZ.txt
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28

$ cat CFG.txt
qwerty0101CFG 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101CFG 12342 01022005 07022009 datainalc hitalbert 43

$ cat test
nala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28
qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43
qwerty0101CFG 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28
qwerty0101CFG 12342 01022005 07022009 datainalc hitalbert 43
byetekr 09

jimmy12 · July 6, 2011, 1:49am

Thanks for all your help and effort. Just one flaw, please help:

$ ls
CFG.txt  mXZ.txt  TWE.txt  ekr.txt  test

After running the command, ls shows 4 new files. I do not want a file to be created for the last line, i.e. ekr.txt is an unwanted file. It comes from the last line, which should not be processed.

itkamaraj · July 6, 2011, 1:59am

Make sure you don't have any blank lines in your main file.

Run

cat -n filename

If the last line is empty, the awk command above will create the file called ekr.txt, because wc -l counts the empty line as the last one and the real last record is no longer skipped.

It is not a flaw in the command; it is a problem with your input file.
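
A small sketch of the effect, and of one way to make the command tolerate trailing empty lines: take the number of the last non-blank line instead of the wc -l count. The scratch directory, the three-line sample and the variable names total and last are assumptions for the demo.

```shell
# Scratch directory and sample file are assumptions for this demo.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Header, one record, trailer, then an accidental empty line at the end.
printf '%s\n' 'header' 'qwerty0101TWE 12342' 'byetekr 09' '' > test

total=$(wc -l < test | tr -d ' ')                  # counts the empty line too
last=$(awk 'NF { n = NR } END { print n }' test)   # last line with any fields

echo "wc -l says $total, last non-blank line is $last"

# Use the last non-blank line as the upper bound, so the trailer is skipped.
awk -v last="$last" 'NR > 1 && NR < last {
    f = substr($1, length($1) - 2) ".txt"
    print >> f
    close(f)
}' test
```

With the wc -l count as the bound, the trailer line would have been treated as an ordinary record and written to ekr.txt.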

jimmy12 · July 6, 2011, 2:44am

Thanks a ton, I found my mistake.
All working fine now.

Need one more favor!

I need to write a shell script for this that reads the input file (i.e. test) from a particular location and then runs the awk command on it.

Please help me with this.

It may sound too basic, but I'm a beginner in Unix.

itkamaraj · July 6, 2011, 2:49am

which will read input file i.e test from particular location

Will the filename change every day?
What does the filename look like?

 
filename="/tmp/test"
lineno=`wc -l < "$filename"`
nawk -v lineno="$lineno" '{ if (NR>1 && NR < lineno){f=substr($1,length($1)-2)".txt";print >> f;close(f)}}' "$filename"

jimmy12 · July 6, 2011, 3:13am

Yes, the filename will change, but the file will always be placed in one location.

There is no specific format for the filename.

itkamaraj · July 6, 2011, 3:14am

Then pass the filename as an argument to your script and read it using $1:

filename=$1
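
Put together, the whole thing might look roughly like the sketch below. The function name split_by_suffix, the usage message and the demo file input.dat are made up for illustration. It counts lines with awk instead of wc -l, which is a swapped-in detail to avoid any padding in the count, and it uses plain awk where the earlier posts use nawk.

```shell
# split_by_suffix FILE: hypothetical helper. Writes one <suffix>.txt per key
# into the current directory; FILE itself is only read.
split_by_suffix() {
    filename=$1
    if [ -z "$filename" ] || [ ! -r "$filename" ]; then
        echo "usage: split_by_suffix FILE (FILE must be readable)" >&2
        return 1
    fi
    lineno=$(awk 'END { print NR }' "$filename")   # total number of lines
    awk -v lineno="$lineno" 'NR > 1 && NR < lineno {
        f = substr($1, length($1) - 2) ".txt"
        print >> f
        close(f)
    }' "$filename"
}

# Demo on a throwaway file; the path and contents are assumptions.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' \
    'txtytg09dfgdfg' \
    'qwerty0101TWE 12345' \
    'qwerty0101CFG 12345' \
    'qwerty0101TWE 12342' \
    'byetekr 09' > input.dat

split_by_suffix "$workdir/input.dat"
```

In a real script the demo part would go away and the last line would simply be split_by_suffix "$1".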