Hi SkySmart,
To avoid using any external utilities with most shells written since 1985 (including bash and ksh), I would use something more like:
#!/bin/ksh
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"
mea="i am on the first line"
meb="i am on the second line"
found=
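# ksh runs the last stage of a pipeline in the current shell, so the
# assignment to found and the exit below affect this script. bash runs
# the loop in a subshell by default, so found would be lost there.
printf '%s\n' "$MASSIVETEXT" | while IFS= read -r line
do
    case "$line" in
    (*"$mea"*)    # first line containing $mea: print it and stop
        echo "$line"
        exit;;
    (*"$meb"*)    # remember only the first line containing $meb
        [ -n "$found" ] || found=$line;;
    esac
done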
[ -n "$found" ] && printf '%s\n' "$found"
to do what I think you're trying to do, i.e., to search the entire variable contents for a match for $mea before looking for any match for $meb and to only print the first line in $MASSIVETEXT that matches the appropriate pattern.
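For shells such as bash that run every stage of a pipeline in a subshell, the same logic can be fed from a here-document instead of a pipe, which keeps the loop in the current shell and still needs no external utility. The following is only a sketch under that assumption; the function name first_match and the fm_ variable names are made up for illustration and do not come from the original script.

```shell
# first_match is a hypothetical helper name used only for this sketch.
# Usage: first_match TEXT PRIMARY FALLBACK
# Prints the first line of TEXT containing PRIMARY; if there is none,
# prints the first line containing FALLBACK. Returns non-zero if neither
# fixed string is found.
first_match() {
    fm_found=
    while IFS= read -r fm_line
    do
        case "$fm_line" in
        (*"$2"*)    # quoted, so $2 is matched as a fixed string
            printf '%s\n' "$fm_line"
            return 0;;
        (*"$3"*)    # keep only the first fallback match
            [ -n "$fm_found" ] || fm_found=$fm_line;;
        esac
    done <<EOF
$1
EOF
    [ -n "$fm_found" ] && printf '%s\n' "$fm_found"
}

text='alpha one
beta two
alpha three'
first_match "$text" "beta" "alpha"     # prints: beta two
first_match "$text" "gamma" "alpha"    # prints: alpha one
```

Because the loop is not part of a pipeline, the value saved for the fallback match survives the loop in any POSIX-style shell.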
Although echo and printf are almost always provided as built-ins in recently developed (i.e., since the 1970s) shells, they are required by the standards to also be available to be executed as stand-alone utilities in one of the directories listed in the POSIX-compliant default setting for the PATH environment variable. This is true for all utilities defined by the standards except for the special built-in utilities: break, :, continue, ., eval, exec, exit, export, readonly, return, set, shift, times, trap, and unset. (Beware, however, that not all systems conform to the POSIX requirements. You may find some systems that don't have all required utilities available as stand-alone utilities; read is an example of a utility that is missing on some non-conforming systems.) To determine whether a given utility is built into your shell, the standards say you can use:
type utility_name...
With ksh (version 93u+) on macOS 10.13.2, the command:
type echo printf set cat
produces the output:
echo is a shell builtin
printf is a shell builtin
set is a special shell builtin
cat is a tracked alias for /bin/cat
while with bash (version 3.2.57) on the same system, the output produced is:
echo is a shell builtin
printf is a shell builtin
set is a shell builtin
cat is /bin/cat
Note that bash doesn't distinguish between regular built-ins and special built-ins like ksh does. Both meet the required specifications in the standards for the output produced by type.
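Since the standards leave the exact wording printed by type unspecified, a script that needs to act on the answer cannot easily parse it. One possible alternative, sketched here, is command -v, which writes an absolute pathname for a utility found through PATH and something else (the bare name, or an alias definition) for built-ins, functions, aliases, and reserved words. The function name classify is hypothetical.

```shell
# classify is a hypothetical helper name used only for this sketch.
# For each operand, report whether the shell would run an external file
# or handle the name itself.
classify() {
    for util
    do
        where=$(command -v "$util") || {
            printf '%s: not found\n' "$util"
            continue
        }
        case $where in
        (/*) printf '%s: external utility at %s\n' "$util" "$where";;
        (*)  printf '%s: built-in, function, alias, or keyword\n' "$util";;
        esac
    done
}

classify echo printf set cat
```

This does not separate regular built-ins from special built-ins; it only tells you whether an external file would be executed.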
Hi bakunin,
Note that SkySmart seems to only want to print the 1st line in $MASSIVETEXT that is matched by $mea and, only if no match for that fixed string is found, then print the 1st line in $MASSIVETEXT that matches $meb. The code you suggested will print every line matching either fixed string. If you are limited to standard grep options, the above code may well be faster than fgrep even for medium-sized contents of the variable MASSIVETEXT, especially if neither fixed string is present in the text or if the 1st fixed string appears early in $MASSIVETEXT.
Since SkySmart hasn't told us what OS and shell are being used, we would need to just use standard options leading to something like:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"
mea="i am on the first line"
meb="i am on the second line"
{ printf '%s\n' "$MASSIVETEXT" | grep -F "$mea" ||
printf '%s\n' "$MASSIVETEXT" | grep -F "$meb"
} | { read -r line
[ -n "$line" ] && printf '%s\n' "$line"
}
which should work with any POSIX-conforming shell and grep utility. On many systems, the following would frequently be much faster, but it depends on the non-standard -m max_count option being supported by the user's grep utility:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"
mea="i am on the first line"
meb="i am on the second line"
printf '%s\n' "$MASSIVETEXT" | grep -F -m 1 "$mea" ||
printf '%s\n' "$MASSIVETEXT" | grep -F -m 1 "$meb"
Note that if neither string is present in the text, you have to invoke grep twice and read the entire "massive" text twice.
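When it isn't known in advance whether the local grep accepts -m, one option is to probe for it at run time and fall back to the standard-options form otherwise. The following is a sketch under that assumption; the function name first_match_grep and the variable fm_line are hypothetical.

```shell
# first_match_grep is a hypothetical helper name used only for this sketch.
# Usage: first_match_grep TEXT PRIMARY FALLBACK
first_match_grep() {
    # Probe: does this grep accept the non-standard -m option?
    if printf 'x\n' | grep -F -m 1 x >/dev/null 2>&1
    then
        # Yes: grep can stop reading at the first match.
        printf '%s\n' "$1" | grep -F -m 1 -e "$2" ||
        printf '%s\n' "$1" | grep -F -m 1 -e "$3"
    else
        # No: use standard options only and keep just the first line.
        { printf '%s\n' "$1" | grep -F -e "$2" ||
          printf '%s\n' "$1" | grep -F -e "$3"
        } | { IFS= read -r fm_line && printf '%s\n' "$fm_line"; }
    fi
}

text='alpha one
beta two
alpha three'
first_match_grep "$text" "beta" "alpha"     # prints: beta two
first_match_grep "$text" "gamma" "alpha"    # prints: alpha one
```

The -e option is standard and keeps a search string that starts with a hyphen from being taken as an option.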
And, the shell being used hasn't been specified either. With a recent bash or ksh, the above scripts could all avoid most of the invocations of printf by using here-strings:
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"
mea="i am on the first line"
meb="i am on the second line"
found=
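while IFS= read -r line
do
    case "$line" in
    (*"$mea"*)    # first line containing $mea: print it and stop
        echo "$line"
        exit;;
    (*"$meb"*)    # remember only the first line containing $meb
        [ -n "$found" ] || found=$line;;
    esac
# Quote the here-string so the text is passed through unchanged; some
# older bash releases word-split an unquoted here-string.
done <<<"$MASSIVETEXT"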
[ -n "$found" ] && printf '%s\n' "$found"
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"
mea="i am on the first line"
meb="i am on the second line"
{ grep -F "$mea" <<<"$MASSIVETEXT" ||
  grep -F "$meb" <<<"$MASSIVETEXT"
} | { read -r line
  [ -n "$line" ] && printf '%s\n' "$line"
}
MASSIVETEXT="i am on the first line
i am on the second line
I am on the third line"
mea="i am on the first line"
meb="i am on the second line"
grep -Fm1 "$mea" <<<"$MASSIVETEXT" || grep -Fm1 "$meb" <<<"$MASSIVETEXT"
Hi rovf,
On every system I've seen, for a large amount of data (which one might assume from a variable named MASSIVETEXT), there is a noticeable difference in performance between grep -F (fastest), grep without -E and without -F (slower), and grep -E (slower still). However, with fixed strings as REs, I don't usually see much difference between plain grep and grep -E. I don't have any experience with where grep -P fits into the speed spectrum on systems that include support for perl's RE extensions in grep.