Grep Command

dwgi32 · July 29, 2009, 6:54am

Hi,

I have encountered a problem with the grep command. The maximum line length that grep supports is 2048 characters, as defined by LINE_MAX in HP-UX. I tried setting $TK_GREP_LINE_MAX, but this does not work on HP-UX. Has anyone had experience with changing the maximum number of characters supported by the grep command?

Well, grep can be replaced with sed, but out of curiosity, can anyone advise on how it can be done using grep?

Thanks a lot.

Regards

pileofrogs · July 29, 2009, 1:30pm

What are you actually trying to do? I don't know anything about LINE_MAX, but I might be able to help with the more general question?

If LINE_MAX is the limit of the pattern you can define, that doesn't surprise me, since a very long pattern would be a huge job to process. It does surprise me if it's the limit of a line it can process within a file, and in fact, that makes no sense at all. It can't skip to the next line without first finding the next \n, so it would have to process that far anyway... Maybe it just stops processing and reads to the next \n? That's just plain weird...

Hmmm... the man page for grep on Linux doesn't mention LINE_MAX, but (..google...) the one for Sun does... hmmm... undefined behaviour... nice

Another option would be a perl one-liner. See the -p switch in the perlrun man page.

dwgi32 · July 29, 2009, 9:11pm

Hi pileofrogs,

I am trying to select the lines in a file that begin with a matching prefix, i.e. [ABC]. I didn't know about the limit on the number of characters permitted by the grep command until I tried to grep a line with 7000 characters and found that it only returned 2048. So I searched Google and found that the command is limited by a macro defined in ulimit.h under the system include folder.

The man page does not mention it, but it does briefly touch on environment variables, so I am trying to find out whether anyone has experience with manipulating the environment variables for grep.

Cheers
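
For reference, a minimal sketch of the selection described above, using both grep and sed. It assumes the prefix is the literal text [ABC] at the start of the line; the file name sample.txt and its contents are made up for illustration. The opening bracket has to be escaped, because an unescaped [ABC] is a bracket expression that matches a single A, B or C.

```shell
# Hypothetical sample data; file name and contents are for illustration only.
printf '%s\n' '[ABC] first record' 'A not a match' '[XYZ] other record' > sample.txt

# grep: anchor at the start of the line and escape the opening bracket
# so it is matched literally (a lone closing bracket is already literal).
grep '^\[ABC]' sample.txt

# sed equivalent: -n suppresses the default output, p prints matching lines only.
sed -n '/^\[ABC]/p' sample.txt
```

Both commands should print only the line that starts with [ABC].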

methyl · July 30, 2009, 8:39am

Please post the script. Truncation to 2048 characters should not happen with grep in a modern HP-UX. Is this a clean plain text file with no extra control codes and a proper line-feed record terminator?

For those who are interested, LINE_MAX is mentioned in "man 5 limits" and /usr/include/limits.h .
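
To see the value on a given system, and to check how long the lines in a file really are, something like the following might help. This is only a sketch: getconf LINE_MAX is the standard POSIX query, but the file name sample.txt and the 3000-character test line are assumptions for illustration, and awk may have line-length limits of its own on older systems.

```shell
# Print the system's LINE_MAX (POSIX requires it to be at least 2048).
getconf LINE_MAX

# Hypothetical test file: one short line and one line of 3000 characters.
{
  echo short
  awk 'BEGIN { for (i = 0; i < 3000; i++) printf "x"; print "" }'
} > sample.txt

# Report the length of the longest line in the file.
awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' sample.txt
```

If the longest line is larger than LINE_MAX and grep still returns it whole, the limit is not being enforced by that grep.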

Beaknit · August 2, 2009, 11:45am

Perl is probably your best bet. It will consume as many system resources as are available, by design, so it'll handle strings as long as you want.

drl · August 2, 2009, 1:55pm

Hi.

I don't use HPUX much, but I keep a login for comparisons.

Here is a script that was run on an HP. I downloaded a version of grep that was written in perl. I created a 3-line data file. The first and last lines are quite short, adding up to 11 characters (with newlines), and the middle line is several thousands of characters long. That second line contains a "9", which will be the string for which we will search:

#!/usr/bin/env bash

# @(#) s1       Demonstrate perl version of grep.
# Found at:
# http://cpansearch.perl.org/src/CWEST/ppt-0.14/src/grep/tcgrep

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) grep ./tcgrep
set -o nounset
echo

FILE=${1-data1}

echo " Data file, first and last line, counts $FILE:"
head -1 "$FILE"
tail -1 "$FILE"
wc "$FILE"

echo
echo " Results standard grep:"
time grep 9 "$FILE" |
wc

echo
echo " Results perl grep:"
time ./tcgrep 9 "$FILE" |
wc

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: HP-UX, B.11.00, 9000/712
Distribution        : GenericSysName [HP Release B.11.00] (see /etc/issue)
GNU bash 2.05b.0
grep - ( /usr/bin/grep Nov 7 1997 )
./tcgrep - (local: ./tcgrep Jul 29 17:19 )

 Data file, first and last line, counts data1:
First
Last
3 3 6905 data1

 Results standard grep:
1 1 6894

real    0m0.091s
user    0m0.060s
sys     0m0.030s

 Results perl grep:
1 1 6894

real    0m1.349s
user    0m1.190s
sys     0m0.140s

There are several conclusions to be drawn here. The system grep has returned more than 2048 characters. Both the system and the perl grep extracted the same line, and the character count is the same (6894). I agree that perl can handle very long lines, but it uses more resources: about 1.3 seconds of real time here, against about 0.09 seconds for the system grep.

This is evidence that HPUX grep (for this combination of versions) did what was expected.
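
To repeat the check on another system, a data file of the same shape can be generated along these lines. This is a sketch and not necessarily how the original file was made: the awk loop and the filler character x are assumptions, chosen so that the byte counts match the ones shown above.

```shell
# Build a 3-line file: "First", one long line ending in "9", "Last".
{
  echo First
  # 6892 filler characters plus the "9" and a newline give a 6894-byte line.
  awk 'BEGIN { for (i = 0; i < 6892; i++) printf "x"; print "9" }'
  echo Last
} > data1

# Total size of the file in bytes (6 + 6894 + 5).
wc -c < data1

# Bytes returned by grep for the long line; a truncating grep would report fewer.
grep 9 data1 | wc -c
```

A grep that truncated at LINE_MAX would report something close to 2048 in the last step instead of the full line length.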

The URL for the perl version of grep points to CPAN, a large repository of perl code. The options in tcgrep do not necessarily match those of every system version of grep. However, it ran correctly "out of the box".

Best wishes ... cheers, drl