When I remove the wc -l I see the output as below:
[casupport@docvlapph005 ca]$ grep '^' TheAgileApproach.dat
[casupport@docvlapph005 ca]$ grep '^' TheAgileApproach.dat
String Filter increment. The most basic implementation that can be used for all data types is shown on the far left. Each version to the right of that adds more functionality.
[casupport@docvlapph005 ca]$
As you can see in the output of point two, it returns the whole line, but that line also includes the word shown, which I thought the first command should have picked up?
I want to run it so that it reads the whole script and prints the words that start with S or s. What would I edit to do that? I assume I wouldn't use wc -l either?
If you are trying to learn what can be done with grep (others would suggest using sed or awk...):
As I only have my Mac laptop at the moment, this is what I would do if I were to use only grep.
If you give me the time to try, I will come back with the result.
I'm back. Result:
$ cat TheAgileApproach.dat
String Filter increment. The most basic implementation that can be used for all data types is shown on the far left. Each version to the right of that adds more functionality.
$ grep -ie"^s" -e" s" -o TheAgileApproach.dat|wc -l
2
Addendum
If it works for you, then to understand it, try a little bit of the command line at a time and look at its output...
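To illustrate trying one piece at a time, here is a minimal sketch. It assumes a grep that supports -o (GNU and BSD grep both do) and uses the one-line sample from this thread. The variable names are only for illustration.

```shell
# One-line sample copied from the TheAgileApproach.dat listing in this thread.
line='String Filter increment. The most basic implementation that can be used for all data types is shown on the far left. Each version to the right of that adds more functionality.'

# Stage 1: -o prints only the text each pattern matched, one match per line.
matches=$(printf '%s\n' "$line" | grep -i -o -e '^s' -e ' s')
printf '%s\n' "$matches"

# Stage 2: counting those lines gives the number of matches.
count=$(printf '%s\n' "$matches" | wc -l)
echo "count: $count"
```

Stage 1 prints just the matched text (an S, and a space followed by s), not the words themselves, because the patterns stop after the first letter.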
What if I want to print a list of the words that begin with S? When I run it without wc -l, it comes up with a list of s:
[casupport@docvlapph005 ca]$ grep -i -e"^s" -e" s" -o TheAgileApproach.dat
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
S
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
Here are the important parts of a script that seems to do what you wish (including a sample data file). It runs twice, once considering the underscore as a separator, then as a character:
# Utility functions: print-as-echo, print-line-with-visual-space.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
pl " Input data file $FILE:"
head $FILE
pl " Results:"
tr -s '[[:punct:][:space:]]' '\n' < $FILE |
tee t1 |
grep '^[sS]' |
tee t2 |
wc -l
pl "Content of intermediate files (columnized by local utility):"
for f in t?
do
pl " File: $f:"
my-columns $f
done
pl " Results, considering "_" as a character:"
# tr -s '[^\w\s_]' '\n' < $FILE |
grep -o -P '[\w_]+' $FILE |
tee t1 |
grep '^[sS]' |
tee t2 |
wc -l
pl "Content of intermediate files (columnized by local utility):"
for f in t?
do
pl " File: $f:"
my-columns $f
done
producing:
$ ./s1 data3
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-7-amd64, x86_64
Distribution : Debian 8.11 (jessie)
bash GNU bash 4.3.30
tr (GNU coreutils) 8.23
grep (GNU grep) 2.20
-----
Input data file data3:
(2) SHALL we see?
(3) Is Sheriff Nokill allowing us to Shoot on Sight.
(4) We are agin "Shoot on site" but OK with "shoot on Sight".
(1) Nothing here to See, move along.
(0) Go USA!
(1) un_Sharpened.
(1) un-Sharpened.
(1) Sharp
(13) total
-----
Results:
13
-----
Content of intermediate files (columnized by local utility):
-----
File: t1:
see Nokill Shoot We on with 1 See Go Sharpened 1
2 3 allowing on are site shoot Nothing move USA 1 Sharp
SHALL Is us Sight agin but on here along 1 un 13
we Sheriff to 4 Shoot OK Sight to 0 un Sharpened total
-----
File: t2:
SHALL Sheriff Sight site Sight Sharpened Sharp
see Shoot Shoot shoot See Sharpened
-----
Results, considering _ as a character:
12
-----
Content of intermediate files (columnized by local utility):
-----
File: t1:
2 Is to We site on to Go un total
SHALL Sheriff Shoot are but Sight See USA Sharpened
we Nokill on agin OK 1 move 1 1
see allowing Sight Shoot with Nothing along un_Sharpened Sharp
3 us 4 on shoot here 0 1 13
-----
File: t2:
SHALL see Sheriff Shoot Sight Shoot site shoot Sight See Sharpened Sharp
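The difference between the two counts (13 and 12) comes from the un_Sharpened line. A minimal sketch of just that effect, using two lines from the data3 listing and assuming a grep with -o and -E (used here instead of -P, which not every grep has):

```shell
# Two-line sample taken from the data3 listing above.
sample='(1) un_Sharpened.
(1) un-Sharpened.'

# Underscore as a separator: any run of punctuation or space ends a word.
as_separator=$(printf '%s\n' "$sample" |
  tr -s '[:punct:][:space:]' '\n' | grep -c '^[sS]')

# Underscore as a word character: keep runs of letters, digits, and "_".
as_character=$(printf '%s\n' "$sample" |
  grep -oE '[[:alnum:]_]+' | grep -c '^[sS]')

echo "separator: $as_separator, character: $as_character"
```

With the underscore as a separator, both lines yield a word Sharpened. With the underscore as a character, un_Sharpened is a single word that starts with u, so only the hyphenated line counts.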
How about this, given that your grep accepts "regular expression extensions" like \b, which are not necessarily available on all systems / versions:
grep -o "\b[sS][^ ]*" <<< "String Filter increment. The most basic implementation that can be used for all data types is shown on the far left. Each version to the right of that adds more functionality."
String
shown
Replace the "here string" with your input file when applying / testing it on your system.
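One thing to watch with [^ ]* is that it runs up to the next space, so trailing punctuation stays attached to the word. A small sketch of the difference, again assuming a grep that understands \b and -o (GNU grep does); the sample sentence is pieced together from the data3 lines shown earlier in the thread:

```shell
sample='Nothing here to See, move along. We are OK with "shoot on Sight".'

# Pattern from the post: everything up to the next space, punctuation included.
loose=$(printf '%s\n' "$sample" | grep -o '\b[sS][^ ]*')
printf '%s\n' "$loose"

# Variant: stop at the first character that is not a letter, digit, or underscore.
strict=$(printf '%s\n' "$sample" | grep -o '\b[sS][[:alnum:]_]*')
printf '%s\n' "$strict"
```

The first command prints See, with its comma and Sight". with the quote and period; the second prints See, shoot and Sight with nothing attached. Which one is right depends on what you count as part of a word.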
It helps to know what operating system and shell you're using since the utilities on various operating systems have options that might help with what you are trying to do that are not available on other operating systems. Whenever you start a new thread here, please always tell us what shell and operating system you're using.
We need a much clearer definition of what you consider to be a word starting with "s" or "S". Is a word just alphabetic characters? Can hyphens be included in a word (e.g., sub-sonic)? Can numeric characters be included in a word (e.g., "straight-6")? Can apostrophes be included in a word (e.g., "she's")? When apostrophes can be included in words, how are we to distinguish between a phrase surrounded by single-quotes and a word containing an apostrophe? (Note that regular expressions in grep can only look at a single line and quoted strings can cross many line boundaries.)
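To show how much the answer depends on that definition, here is a rough sketch. The function count_s_words and the sample sentence are made up for illustration; the argument is a tr character set naming the characters allowed inside a word.

```shell
# count_s_words WORDCHARS: count words on stdin that start with s or S,
# where a "word" is a run of the characters listed in WORDCHARS (tr syntax).
count_s_words() {
    tr -cs "$1" '\n' | grep -c '^[sS]'
}

sample="she's in a sub-sonic straight-6 'so there'"

# Words are letters only: she, s, sub, sonic, straight, so.
letters_only=$(printf '%s\n' "$sample" | count_s_words '[:alpha:]')

# Words may also hold digits, apostrophes, and hyphens: she's, sub-sonic, straight-6.
with_extras=$(printf '%s\n' "$sample" | count_s_words "[:alnum:]'-")

echo "letters only: $letters_only, with extras: $with_extras"
```

The same line gives 6 with the first definition and 3 with the second. Note that 'so is not counted in the second case, because the opening single-quote is taken as part of the word, which is exactly the quote-versus-apostrophe problem.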
And, if we can't see a sample of the data you're working with and the output you expect to get from it, we have no way to verify that anything we might suggest might work for you. Please give us a representative sample input and the corresponding exact output you hope to produce from that input.