How can I do this in the vi editor?

version info:
vi available with RHEL 5.4
I have a text file with 10,000 lines. I want to copy lines from the 5000th to the 7000th and redirect them to a file. Any idea how I can do this?

Note:
The above scenario is just an example. In my actual requirement, the file has 14 million lines and I want to copy 2000 lines starting after the 12 millionth line :slight_smile:

BTW, I've just realised that the vi editor we use in RHEL is actually VIM.

 
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
$
$ which vi
/bin/vi

Sorry, but do you really want to do this with vi? Why?
If you're willing to use sed, I can help you.
Let me know.

Ok. How can I do this with sed or any other utility?

Not sure how sed will react to a 14-million-line file, but 10k lines should be no problem:

sed -ne '5000,7000p' /path/to/input >/path/to/output
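A head/tail pipeline is another standard way to pull out a line range (this alternative isn't from the thread, and the file names here are just for illustration). tail -n +5000 starts output at line 5000, and head then keeps the 2001 lines of the range:

```shell
# Build a 10,000-line sample file to stand in for the real input
seq 1 10000 > input.txt

# Start at line 5000, then keep 7000 - 5000 + 1 = 2001 lines
tail -n +5000 input.txt | head -n 2001 > output.txt
```

Like the q variants discussed below, this stops reading early: head exits after 2001 lines and the pipeline ends.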

If sed chokes on it perl may be a better choice.


q - Quit the sed script and avoid further processing

sed -ne '5000,7000{p;7000q;}' /path/to/input >/path/to/output

sed would never choke on this task, no matter how many lines. Even billions of lines. sed just reads one line at a time, so the number of lines is totally not an issue. If I'm wrong about this, I'd like to be educated.

Here is a similar (but a tad simpler) solution to the previous valid one posted:

sed -n "5000,7000 p; 7000 q" infile
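An awk equivalent, for comparison (not posted in the thread; NR is awk's record counter, and exit gives the same early-quit behavior as sed's q):

```shell
# Sample input standing in for the real file
seq 1 10000 > input.txt

# Print lines 5000 through 7000, then quit without reading further
awk 'NR >= 5000 { print } NR >= 7000 { exit }' input.txt > output.txt
```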

I don't have time at this very moment to consult an implementation's source or run some tests (I might in a few hours), but I would bet that you are wrong about this. The integral type with which line numbers (current input line number and command address line numbers) are implemented will eventually overflow.

Regards,
Alister

---------- Post updated at 10:57 AM ---------- Previous update was at 09:43 AM ----------

Current GNU sed source uses a function named compile_address() to parse command addresses. For numeric addresses (as opposed to regular expression addresses), this function then calls in_integer() to construct the corresponding integer. [1]

/* Read an integer value from the program.  */
static countT in_integer (int ch);
static countT
in_integer(ch)
  int ch;
{
  countT num = 0;

  while (ISDIGIT(ch))
    {
      num = num * 10 + ch - '0';
      ch = inchar();
    }
  savchar(ch);
  return num;
}

countT is an unsigned long integral type [2]. On 32-bit platforms where long is 32-bit, overflow is not implausible, but on a 64-bit UNIX platform, where the LP64 data model specifies a 64-bit long, even extremely large files shouldn't be an issue. That's not to say that there is no limit. There is. It's just much higher, on the order of billions of billions, 2^64 (1.8446744 x 10^19).

Since GNU tools are ported to Microsoft Windows, it's noteworthy that unsigned long remains only 32 bits even on 64 bit Windows (which uses the LLP64 data model, instead of the LP64 adopted by UNIX systems [3], [4]).

A demonstration of overflow on a 32-bit Debian Linux machine with GNU sed 4.1.5 (yes, this is an old version, but the code above, which matches the following behavior, is from the current repository):

$ echo hi | sed -n 4294967296p
sed: -e expression #1, char 11: invalid usage of line address 0
$ echo hi | sed -n 4294967297p
hi

A simple overflow check in in_integer() could detect all of these problems, generate a useful error message, abort compilation of the sed script before a single line of data has been read, and return a meaningful exit status.

There's a similar overflow issue with the current line number. It is also a countT integer and it is also not checked for overflow when incremented while reading in the next line of text into the pattern space [5].

++input->line_number;

This means that the 4294967297th line in a file is treated as a second instance of line 1. This will cause commands which were intended only for the first line of the file to match later lines as well. For example, the simple command 1d will delete multiple lines: specifically, every line for which the following expression is true: (real line number) % 4294967296 == 1
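The congruence is easy to sanity-check with shell arithmetic (4294967296 is 2^32, the range of a 32-bit unsigned long):

```shell
# Line 4294967297 wraps around to line number 1 in 32-bit unsigned arithmetic
echo $(( 4294967297 % 4294967296 ))   # prints 1
```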

To summarize, in general, the number of lines in a file and the magnitude of the addresses used in the sed script are both restricted by the underlying data type's range. In particular, GNU sed does a terrible job of handling out of range values. Other implementations may not be much better; I did not check.

Regards,
Alister

References:
[1] GNU sed compile.c
[2] GNU sed basicdefs.h
[3] 64-Bit Programming Models: Why LP64?
[4] Why did the Win64 team choose the LLP64 model?
[5] GNU sed execute.c


If you have your file open in Vim, and you want to copy lines 5000 to 7000, both inclusive, to a new file, then use the following editor command:

:5000,7000 w /path/to/new/file
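If you'd rather not open the file interactively, the same write can be scripted with ex, vi's line-editor mode (a standard ex usage, not something posted above; -s selects silent/batch mode, and the commands are fed on standard input):

```shell
# Sample input standing in for the real file
seq 1 10000 > input.txt

# Write lines 5000-7000 to a new file, then quit
ex -s input.txt <<'EOF'
5000,7000w output.txt
q
EOF
```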

Thank you for educating me. I appreciate the incredibly detailed research you did.

You are right: eventually the line count gets too high for that unsigned long, and the data overflows. So I was flat wrong in saying "no matter how many lines", or even "billions of lines". When I heard the word "choke", I thought the poster was thinking of running out of memory, and perhaps didn't understand sed's pattern space. I should have done a simple test before speaking.

I would suggest that for the vast majority of real-world situations sed will not "choke" on files with many lines. My guess is that very few situations involve billions of lines. The previous poster cero was worried about a file with 14 million lines; I hope I'm correct in saying sed would at least work with that file. But you are absolutely right that it screws up when that unsigned long overflows. And yes, as you pointed out, sed should abort with an error code before reading a single line.

Anyway, I apologize for the totally unrealistic assertion "no matter how many lines".