Split and Rename Split Files

techedipro · March 14, 2018, 1:06am

Hello,

I need to split a file by number of records and name each split file with the original filename prepended with a 3-digit split number.

This is the command I have tried, which gives a 2-digit numeric suffix:

split -l 3 -d abc.txt F    # produces split files F00, F01, F02

How can I produce split files named 001_abc.txt, 002_abc.txt, and so on?

That is, a 3-digit split number, an underscore, and the input file name that was passed.

Thanks

RudiC · March 14, 2018, 4:08am

Try

for FN in F[0-9][0-9]; do echo mv "$FN" "0${FN#F}_abc.txt"; done

Remove echo if happy with proposed results.

apmcd47 · March 14, 2018, 5:18am

This works for me on my Ubuntu 16.04 (xenial) system:

split -l3 -a3 --additional-suffix=_abc.txt -d abc.txt ''

Note that I have put prefix to be null and added an additional suffix so that the filenames that are generated have a suffix rather than a prefix.

Andrew

techedipro · March 14, 2018, 11:22pm

I am on AIX 7100-03

I am getting an error on the -d flag

admin@tst(/script)$ split -l 3 -a 1 -d abc.txt F
split: Not a recognized flag: d
Usage: split [-l Line_Count] [-a Suffix_Length] [File [Prefix]]
   or: split -b Number[k|m] [-a Suffix_Length] [File [Prefix]]

Also a minor change.

I need to run the split command in a script, with the number of lines per split file as parameter 1 and the input file name as parameter 2. Additionally, the script should reside in a directory other than the current working directory, as I will end up deleting the working directory after picking up the split files.

The split filenames should have a 3-digit prefix followed by the input file name, for example: 001_abc.txt, 002_abc.txt, etc.

Thanks

apmcd47 · March 15, 2018, 6:00am

techedipro:

I am on AIX 7100-03

I am getting an error on the -d flag

admin@tst(/script)$ split -l 3 -a 1 -d abc.txt F
split: Not a recognized flag: d
Usage: split [-l Line_Count] [-a Suffix_Length] [File [Prefix]]
   or: split -b Number[k|m] [-a Suffix_Length] [File [Prefix]]

Yes, sorry. I was using GNU extensions to the split command, which is why I mentioned the OS I tested it on. Had I known you are on AIX I would never have suggested it.

So, depending on whether you are writing this script for yourself or for multiple users, the script should be placed in $HOME/bin or /usr/local/bin, and your profile file (the one that is run at login) should ensure that directory is in your path ($PATH for sh, ksh or bash; $path for csh).

You can use $1 and $2 for your parameters, but assigning them to named variables makes their meaning clear.

lines=$1
fname=$2

Assuming you have the -a suffix_length option to split:

split -l${lines} -d -a3 ${fname} F

Then use RudiC's solution (replacing abc.txt with ${fname}).

I'll let you figure the rest out.

Andrew
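Editor's sketch: the AIX usage message quoted above lists only -l and -a, so a split-then-rename approach there has to cope with alphabetic suffixes (aaa, aab, ...) rather than numeric ones. The following is a minimal sketch of that idea, not a tested AIX solution. The scratch prefix "part.", the sample data, and the demo directory name are assumptions made for illustration; the prefix must not match any file already present in the directory.

```shell
#!/bin/sh
# Demo scaffolding (assumed, not from the thread): work in a scratch directory.
workdir=${TMPDIR:-/tmp}/splitdemo.$$
mkdir "$workdir" && cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 6 7 > abc.txt    # sample input with 7 lines

lines=3          # would normally come from "$1"
fname=abc.txt    # would normally come from "$2"

# Without -d, split writes alphabetic suffixes: part.aaa, part.aab, ...
# "part." is a hypothetical scratch prefix.
split -l "$lines" -a 3 "$fname" part.

# The glob expands in sorted order, which matches the order split wrote them.
n=0
for piece in part.[a-z][a-z][a-z]; do
    n=$((n + 1))
    mv "$piece" "$(printf '%03d_%s' "$n" "$fname")"
done
ls
```

Numbering starts at 001 here because the counter is kept in the shell instead of being derived from the split suffix.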

RudiC · March 15, 2018, 6:08am

How did your post#1 work, then?

techedipro · March 15, 2018, 1:55pm

RudiC & apmcd47,

Initially I had not tested the command with the -d flag. I realized only after running it on the AIX machine that -d is not an option.

The -d flag will not work.

Is there any other workaround?

Thanks for your time and patience on this!

Don_Cragun · March 15, 2018, 2:35pm

Yes. Look at post #5 and post #2 in this thread!

Of course, whether or not that will work in the directories in which you want to run this code will depend on what other files are already there. (But, of course that is information you haven't bothered to tell us. Just like you didn't bother to tell us you were using AIX. And you haven't bothered to tell us what shell you're using. And complaining about getting suggestions using split -d after you said you were successfully using it in post #1 in this thread is really disconcerting. If you don't tell us what other files are present in a directory where you want us to use commands that create a lot of filenames, there is a good chance we'll make suggestions that destroy some of your other files. If you don't tell us what shell you're using (including the version number of that shell), there is a good chance we'll make suggestions using the shell(s) we like to use even though the rest of your script may not support our suggestions. Please get into the habit of clearly specifying the environment you're working in or be prepared to translate the suggestions we make so that they will work in your environment. Leaving out information about your environment just wastes your time and ours.)

techedipro · March 15, 2018, 5:30pm

Understood. Going forward, I will provide enough information about the environment.

techedipro · March 17, 2018, 2:14pm

I am on AIX 7100-03

I tried splitting the input file using awk and I am still stuck in passing the input file name as variable

awk 'NR%3==1{x=sprintf("%03d_Test",++i);}{print > x}' abc.txt

The above command results in output files as 001_Test ,002_Test..

I want the output as 001_abc.txt , 002_abc.txt..

Please advise

Don_Cragun · March 17, 2018, 5:37pm

techedipro:

I am on AIX 7100-03

I tried splitting the input file using awk and I am still stuck in passing the input file name as variable
awk 'NR%3==1{x=sprintf("%03d_Test",++i);}{print > x}' abc.txt
The above command results in output files as 001_Test ,002_Test..

I want the output as 001_abc.txt , 002_abc.txt..

Please advise

So change:

awk 'NR%3==1{x=sprintf("%03d_Test",++i);}{print > x}' abc.txt

to:

awk 'NR%3==1{x=sprintf("%03d_%s",++i,FILENAME)}{print > x}' abc.txt

RudiC · March 17, 2018, 6:11pm

Why not

awk '{print > sprintf ("%03d_%s", 1+int((NR-1)/3), FILENAME)}'  abc.txt

, then?

Or even

awk '{print > sprintf ("%03d_%s", (NR+2)/3, FILENAME)}' abc.txt
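Editor's sketch: both expressions map an input line number to a chunk number; the second one relies on the %d conversion dropping the fractional part of (NR+2)/3. The loop below only prints that mapping for line numbers 1 to 7 with a chunk size of 3 and writes no files. The variable names nr and out are illustrative.

```shell
# Print: line number, index from the int() form, index from the (NR+2)/3 form.
out=$(awk 'BEGIN {
    for (nr = 1; nr <= 7; nr++)
        printf "%d %03d %03d\n", nr, 1 + int((nr - 1) / 3), (nr + 2) / 3
}')
printf '%s\n' "$out"
```

Lines 1 to 3 map to 001, lines 4 to 6 to 002, and line 7 to 003 in both columns.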

techedipro · March 23, 2018, 4:38pm

To extend this further: how can I run the awk command in the Korn shell, passing the input file name as a parameter to the script?

#!/usr/bin/ksh
File1=$1
awk 'NR%3==1{x=sprintf("%03d_%s",++i,FILENAME)}{print > x}' $1

I tried the above script but it did not work.

Please advise

RudiC · March 23, 2018, 7:44pm

Why shouldn't that work? If the first positional parameter holds the file name, it should. What errors do you get? Run with the -x option set and post the log.

techedipro · March 23, 2018, 9:11pm

@RudiC

Correct. It worked.

The last piece of the puzzle is how to parameterize the number of records (in my current code it is "3") in the script.


#!/usr/bin/ksh
File1=$1
awk 'NR%3==1{x=sprintf("%03d_%s",++i,FILENAME)}{print > x}' $1

RudiC · March 24, 2018, 6:23am

Use awk's -v option to pass variables from outside the awk script to the inside.
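Editor's sketch of the -v approach, under stated assumptions: the shell variable names lines and fname, the awk variable name n, the sample data, and the scratch directory are all illustrative. One deliberate change from the script above is the test (NR - 1) % n == 0 in place of NR % n == 1. The two agree for chunk sizes of 2 or more, but NR % 1 is always 0, so the original test would never fire with a chunk size of 1.

```shell
#!/bin/sh
# Demo scaffolding (assumed): work in a scratch directory with sample input.
workdir=${TMPDIR:-/tmp}/vdemo.$$
mkdir "$workdir" && cd "$workdir" || exit 1
printf '%s\n' a b c d e > abc.txt

lines=2          # would normally come from "$1"
fname=abc.txt    # would normally come from "$2"

# -v n="$lines" assigns the awk variable n before the program starts.
awk -v n="$lines" '
(NR - 1) % n == 0 { x = sprintf("%03d_%s", ++i, FILENAME) }
{ print > x }
' "$fname"
ls
```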

Don_Cragun · March 24, 2018, 4:49pm

In addition to what RudiC said about passing variables to awk, note also that the 2nd line in your script is unusual: it assigns File1, but the awk command still uses $1. One would usually expect either:

#!/usr/bin/ksh
awk 'NR%3==1{x=sprintf("%03d_%s",++i,FILENAME)}{print > x}' "$1"

or:

#!/usr/bin/ksh
File1=$1
awk 'NR%3==1{x=sprintf("%03d_%s",++i,FILENAME)}{print > x}' "$File1"

If one wanted to use this awk script to process multiple files, one might try the following instead (assuming that the 1st parameter to your script is the number of lines to put in each file and the remaining parameters are the names of the files you want to process):

#!/usr/bin/ksh
ChunkSize=$1
shift 1
awk -v s="$ChunkSize" 'FNR==1{i=0}FNR%s==1{close(x);x=sprintf("%03d_%s",++i,FILENAME)}{print > x}' "$@"

techedipro · March 26, 2018, 10:16am

Thanks Don.

Can you please explain in detail how the code works?

Don_Cragun · March 26, 2018, 7:53pm

Does the following help?

#!/usr/bin/ksh
# Usage: utility_name Chunk_Size Filename...
# where:	Chunk_Size is the number of lines to write in each output file.
#
#		Filename is the filename (not pathname) of a file to be split
#		into files each of which contains no more than Chunk_Size lines
#		with a name that is a leading zero-filled 3 digit decimal
#		sequence number of the file being split followed by an
#		underscore followed by Filename.

# Get the chunk size from the 1st operand.
ChunkSize=$1

# Discard the 1st command-line operand leaving just the filenames as the
# remaining operands.
shift 1

# Invoke awk to process the Filename operands given on the command line:
awk -v s="$ChunkSize" '				# Set s to the chunk size.
FNR == 1 {					# When we find the 1st line in
						# an input file...
	i = 0					#   reset the output file
						#   counter for this input file.
}
FNR%s == 1 {					# When we find the 1st input
						# line in a chunk size set of
						# lines in an input file...
	close(fn)				#   close the previous output
						#   file (if there was one)
						#   and...
	fn = sprintf("%03d_%s", ++i ,FILENAME)	#   set the output filename for
						#   the next output file.
}
{						# For every line read from an
						# input file...
	print > fn				#   write the input line into
						#   the current output file.
}' "$@"						# Mark end of awk script and
						# pass in the remaining
						# command-line arguments as the
						# names of the files to be
						# split.
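
Editor's sketch: a small run of the same awk program against two sample files, to show that the sequence number restarts at 001 for each input file. The function name split_files, the sample file names, and the scratch directory are assumptions made for this illustration; the awk program itself is the one from the commented script.

```shell
#!/bin/sh
# Demo scaffolding (assumed): scratch directory and two sample input files.
workdir=${TMPDIR:-/tmp}/multidemo.$$
mkdir "$workdir" && cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 > a.txt
printf '%s\n' x y > b.txt

# split_files is a hypothetical wrapper: first argument is the chunk size,
# remaining arguments are the files to split.
split_files() {
    chunk=$1
    shift
    awk -v s="$chunk" '
    FNR == 1     { i = 0 }                    # new input file: restart counter
    FNR % s == 1 { close(fn)                  # finish the previous output file
                   fn = sprintf("%03d_%s", ++i, FILENAME) }
                 { print > fn }' "$@"
}

split_files 2 a.txt b.txt
ls
```

With a chunk size of 2, a.txt yields 001_a.txt, 002_a.txt and 003_a.txt, and b.txt yields 001_b.txt.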

techedipro · March 29, 2018, 10:56am

That is a detailed explanation. Thanks, Don!