remove portion of file

methos · April 4, 2002, 7:19pm

Can anyone tell me how to split a large file into smaller ones? What I have is a large file that was created because several similar files were joined together. Each individual file starts with MSG_HEAD. I want to take everything from MSG_HEAD up to where it says MSG_HEAD again and output it to a new file, so that I have one file for every time MSG_HEAD appears, plus all the lines between occurrences. Thanks in advance for any help.

system · April 5, 2002, 2:47am

Try the "csplit" command (see man csplit). If that does not work, you will probably have to write a script using grep, tail, head, etc.

I hope this helps.

Witt

Kelam_Magnus · April 5, 2002, 11:46am

If they are at irregular segments, you can use vi.

First, figure out the line number of each occurrence of your keyword, so that you know each section you want to copy out runs from line x to line XX.

Then use the vi write command from within the file while you are editing it.

Once you have the file open in vi, type this line: " :x,XXw myfile.out "

This will copy lines x through XX to the file myfile.out.

It may be a little cumbersome, but it is a useful tool.

I believe there is even a way to write all of these vi commands in a file and execute them while you are editing the file. VI is a very powerful command; I am only scratching the surface of its capabilities.
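
For reference, the same two manual steps can be tried from the shell. This is only a sketch with a different tool (grep and sed rather than vi); the directory vi_demo and the sample contents of bigfile are made up for illustration. grep -n lists the line numbers where each section starts, and sed -n 'x,XXp' writes out a line range much as :x,XXw does inside vi.

```shell
# Hypothetical sample input: two sections, each starting with MSG_HEAD.
mkdir -p vi_demo
printf '%s\n' 'MSG_HEAD one' 'a' 'b' 'MSG_HEAD two' 'c' > vi_demo/bigfile

# Step 1: list the line numbers where each section starts.
grep -n '^MSG_HEAD' vi_demo/bigfile | cut -d: -f1

# Step 2: write lines 1 through 3 (the first section) to a new file,
# the shell counterpart of typing :1,3w myfile.out in vi.
sed -n '1,3p' vi_demo/bigfile > vi_demo/myfile.out
```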

sudojo · April 5, 2002, 2:53pm

If this is an ongoing issue you would like to automate, I'd write an awk script.

Basically

/^MSG_HEAD/ {
    if (NR != 1) close(filename)      # close the previous output file
    n++; filename = "out_file" n      # iterate the filename
}
{
    print $0 > filename
}
END {
    close(filename)
}

methos · April 5, 2002, 5:08pm

I am really not familiar with awk. Thanks for your suggestions. With awk and /^MSG_HEAD/, I understand that it will write the entire line, but will it write all the lines that follow until it hits MSG_HEAD again?

Thanks

sudojo · April 5, 2002, 5:33pm

Well, you'll probably need to read some documentation,
but if you put the following code in a file code.awk
and run it with
awk -f code.awk input_file
you should get the desired results.
Awk works by reading each line of the input file; if the line matches the criteria on the left, it executes the code within the {}.
BEGIN matches once at the beginning, END once at the end. /^MSG_HEAD/ matches a line beginning with MSG_HEAD, and a block with no criteria matches every line.
So the program below gets a new filename for every MSG_HEAD line, and writes out every line to that filename.

BEGIN { i = 1 }

/^MSG_HEAD/ {
    if (NR != 1) close(filename)   # close the previous output file
    filename = "out_file" i++      # out_file1, out_file2, ...
}
{
    print $0 > filename
}
END {
    close(filename)
}
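
To see the idea in action without touching real data, here is a small self-contained sketch. The directory awk_demo, the sample lines, and the prefix variable are all made up for illustration. It also guards against input that does not begin with MSG_HEAD by skipping lines until the first header is seen, which is an addition to the program above.

```shell
# Hypothetical sample input: two messages joined into one file.
mkdir -p awk_demo
printf '%s\n' 'MSG_HEAD first' 'line a' 'MSG_HEAD second' 'line b' 'line c' \
    > awk_demo/input_file

awk -v prefix=awk_demo/out_file '
/^MSG_HEAD/ {
    if (filename != "") close(filename)   # finish the previous piece
    i++
    filename = prefix i                   # out_file1, out_file2, ...
}
filename != "" { print > filename }       # ignore anything before the first header
' awk_demo/input_file

# Show how many lines ended up in each piece.
wc -l awk_demo/out_file*
```

Every line after a MSG_HEAD line goes to the same output file until the next MSG_HEAD line switches the filename, so each piece holds one header plus the lines that follow it.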

system · April 6, 2002, 11:41am

Well, as Harrison Ford would say, "All good suggestions".

csplit, as suggested by witt, seems the best solution here, assuming generic file names are OK:

csplit -f small. -n 4 bigfile '/^MSG_HEAD/' '{*}'

The above will split bigfile into:
small.0000
small.0001
etc

The first file will have anything prior to the first MSG_HEAD line, so it could be empty. To discard that first section (empty or not), you can use:

csplit -f small. -n 4 bigfile '%^MSG_HEAD%' '/^MSG_HEAD/' '{*}'

And awk would be the way to go if you wanted to control the new filenames, such as picking something out of the header line.
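
A quick way to try the second form on throwaway data is sketched below, using the MSG_HEAD marker from the original question. The directory csplit_demo and the sample lines are made up, and the '{*}' repeat is assumed to be available (it is in GNU csplit; other versions may not accept it). The -s option only suppresses the byte counts csplit normally prints.

```shell
# Hypothetical sample input: one junk line, then three messages.
mkdir -p csplit_demo
printf '%s\n' 'junk before first header' \
    'MSG_HEAD 1' 'a' 'MSG_HEAD 2' 'b' 'MSG_HEAD 3' 'c' > csplit_demo/bigfile

# %...% skips (discards) everything before the first header,
# /.../ starts a new piece at each following header,
# {*} repeats that pattern until the input runs out.
csplit -s -f csplit_demo/small. -n 4 csplit_demo/bigfile \
    '%^MSG_HEAD%' '/^MSG_HEAD/' '{*}'

ls csplit_demo
```

With this input there should be three pieces, small.0000 through small.0002, each starting with its own MSG_HEAD line, and the junk line should appear in none of them.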

system · April 6, 2002, 12:19pm

I know this is straying from the original topic a little, but I wanted to say that the functionality mentioned by Kelam_Magnus is really cool and powerful. The command is ! and like many vi commands, is followed by a movement to indicate how much text to operate on. In this case, the defined amount of text is passed to the OS for processing, and all the passed text is replaced by the output of the processing.

Say you have a paragraph of plain text comments, and some lines are too long, some too short. Put your cursor at start of paragraph and type !} which says process all text thru EOP, and when prompted at the colon prompt, type adjust or adjust -m66, and the text will be replaced by the output of /bin/adjust.

Or create a script called addhori.sh:
awk '{printf "%9d%9d%9d%9d\n",$1,$2,$3,$1+$2+$3}'

In vi, place your cursor on the first of the four lines:

5 5 5
3 4 5
22 22 22
9 9 9

and this time let's process the current line plus next 3 lines: !3j
and at the colon prompt, type: addhori.sh

The four lines will be replaced with the awk output, in this case it will be the same 4 lines but formatted and with a total column added. It does not have to be line-per-line replacement. All lines could be replaced with a single line, or the 4 lines above could become the same 4 lines plus a total line below them (addvert.sh), each line could become two lines, whatever.

And of course, just type "u" to undo.

And for added functionality, some of the scripts I write for vi external processing utilize passed parameters.
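
Since the ! command simply feeds the selected lines to a program on standard input and reads the replacement from standard output, a filter can be checked from the shell before using it inside vi. A minimal sketch with the same four lines and the same awk command follows; the directory addhori_demo and the file out.txt are made up for illustration.

```shell
mkdir -p addhori_demo

# Feed the four sample lines to the filter, exactly as vi would.
printf '%s\n' '5 5 5' '3 4 5' '22 22 22' '9 9 9' |
    awk '{printf "%9d%9d%9d%9d\n",$1,$2,$3,$1+$2+$3}' > addhori_demo/out.txt

cat addhori_demo/out.txt
```

Each output line has four right-aligned columns nine characters wide, with the row totals 15, 12, 66 and 27 in the last column.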

methos · April 6, 2002, 3:32pm

Thanks for all the help, it works really slick!

Kelam_Magnus · April 8, 2002, 3:26pm

I always take compliments!

Thanks for the kudos!

Yes, I am a firm believer that vi is greatly underused and vastly misunderstood as a tool for Admins.

Everyone should have a required course in vi. You can run scripts from within a file that you are editing, open another file in vi and copy in the changed text, :wq! the one you modified, and then save the changes that you copied into the original file.

VI is awesome!

methos · August 15, 2002, 6:33pm

Thanks to everyone for all the help. I do have a few questions. I am using a csplit command such as the following:

csplit -f small.head -n 4 MOORDERS '%^COHEAD%' '/^COHEAD/' '{4}'

My problem is that I don't know how many "COHEAD" lines there will be in the file: sometimes just one, other times many more. How can I get around having to specify how many times it should repeat? (i.e. "4")

Also, the file I am testing has six "COHEAD" lines in it, and in order to get them all into separate files I have to set the count to 4. Why is this and not 6?

Thanks, Methos

RTM · August 15, 2002, 8:36pm

The -n option on csplit is for the following:

-n number
Use number decimal digits to form filenames for the
file pieces. The default is 2.

RTM - it works.

methos · August 15, 2002, 9:03pm

Thanks, but I don't quite understand what you mean. The "4" I am referring to is the one at the end of the statement. When I change this value, it outputs more or fewer files, that is unless I am completely off the mark.

Is there some wild card I can use to make it more flexible?

Thanks

methos · August 16, 2002, 9:26am

I got it to where I can pass in a variable, but I still don't know why it is less than the number of occurrences. Thanks, everyone.

csplit -f small.head -n 4 MOORDERS '%^COHEAD%' '/^COHEAD/' {$1}

C-Ya, Methos
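
On the count question, a likely explanation, sketched here with made-up data: the '/^COHEAD/' pattern already produces one piece by itself, each repeat produces one more, and whatever remains after the last repeat becomes the final piece. So a file with N headers needs a repeat count of N - 2, which gives 4 for six headers. The sketch below assumes the file begins with a COHEAD line and has at least two of them; the directory count_demo and the record contents are hypothetical, and '{*}' is assumed to be supported (it is in GNU csplit).

```shell
# Hypothetical input: six COHEAD records, two lines each.
mkdir -p count_demo
for n in 1 2 3 4 5 6; do
    printf 'COHEAD %s\ndetail %s\n' "$n" "$n"
done > count_demo/MOORDERS

# Count the headers, then repeat the pattern (headers - 2) times.
heads=$(grep -c '^COHEAD' count_demo/MOORDERS)
csplit -s -f count_demo/fixed. -n 4 count_demo/MOORDERS \
    '%^COHEAD%' '/^COHEAD/' "{$((heads - 2))}"

# With the wildcard repeat, no counting is needed at all.
csplit -s -f count_demo/star. -n 4 count_demo/MOORDERS \
    '%^COHEAD%' '/^COHEAD/' '{*}'

ls count_demo
```

Both commands should leave six pieces each, numbered 0000 through 0005. Asking for more repeats than there are remaining headers makes csplit report that the pattern was not found, which is why a count of 6 fails on a six-header file.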