How To Count Fields For Cut?

sharingsunshine · June 8, 2015, 9:05pm

I am new to cut and I want to use the field option with a space delimiter on an Apache log file.

For example, if I wanted to find the 200 HTTP code using cut in this manner on the file below

 cat access_abc.log | cut -d' ' -f7 | grep "200"

157.55.39.183 - - [08/Jun/2015:20:48:02 -0400] "GET /content/696-news041305 HTTP/1.1" 200 14574 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

When I look at the line and count the spaces, since that is the delimiter I am using, I don't get f7. I get f8. I'll use smileys to mark the spaces.

157.55.39.183 :)- - [08/Jun/2015:20:48:02:) -0400] :)"GET:) /content/696-news041305 :)HTTP/1.1" :)200 14574 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

There may be an easier or better way to do this, but what I want to know is how to count the fields. Right now I can find the field by trial and error, but I want to know the correct way to count them: the line above shows 8 spaces, which seems to me to mean f8, yet the command that works uses f7.

Thanks,

pilnet101 · June 8, 2015, 9:22pm

One of the better tools for this would be awk. The example below needs neither grep nor cat (cat is rarely necessary in a pipeline like this). For each line whose field 9 matches 200, it prints field 7 and then the number of fields:

awk '$9~/200/{print $7"\n"NF}' access_abc.log
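Note that `~ /200/` is a regular-expression match, so it accepts any field 9 that merely contains 200. A minimal sketch of an exact comparison follows, assuming the log is held in a shell variable named `log` (a hypothetical name). The second sample line is made up for illustration and is not from the real log:

```shell
# Two sample lines: the first is from the thread; the second is a made-up
# line with a 404 status, included for illustration only.
log='157.55.39.183 - - [08/Jun/2015:20:48:02 -0400] "GET /content/696-news041305 HTTP/1.1" 200 14574 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
192.0.2.1 - - [08/Jun/2015:20:49:00 -0400] "GET /missing HTTP/1.1" 404 512 "-" "ExampleAgent/1.0"'

# Print the request path ($7) only for lines whose status ($9) equals 200.
paths=$(printf '%s\n' "$log" | awk '$9 == 200 { print $7 }')
printf '%s\n' "$paths"
```

Unlike cut, awk by default splits on runs of whitespace, so repeated spaces do not create empty fields.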

Don_Cragun · June 8, 2015, 11:21pm

You can't look for 200 using grep after you throw away that text.

With your sample input:

157.55.39.183 - - [08/Jun/2015:20:48:02 -0400] "GET /content/696-news041305 HTTP/1.1" 200 14574 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

the fields recognized by cut with the field delimiter set to the <space> character are:

157.55.39.183
-
-
[08/Jun/2015:20:48:02
-0400]
"GET
/content/696-news041305
HTTP/1.1"
200
14574
"-"
"Mozilla/5.0
(compatible;
bingbot/2.0;
+http://www.bing.com/bingbot.htm)"
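Rather than counting by eye, the shell can number the fields itself. This is a minimal sketch, assuming the sample line is stored in a variable named `line` (a hypothetical name chosen for illustration):

```shell
# Sample log line from the thread, held in a variable for illustration.
line='157.55.39.183 - - [08/Jun/2015:20:48:02 -0400] "GET /content/696-news041305 HTTP/1.1" 200 14574 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'

# Put each space-delimited field on its own line, then number every line.
# nl -ba numbers empty lines too, so empty fields keep their position.
numbered=$(printf '%s\n' "$line" | tr ' ' '\n' | nl -ba)
printf '%s\n' "$numbered"

# Cross-check with cut: field 7 is the request path, field 9 the status.
path=$(printf '%s\n' "$line" | cut -d' ' -f7)
status=$(printf '%s\n' "$line" | cut -d' ' -f9)
printf 'f7=%s f9=%s\n' "$path" "$status"
```

The number that nl prints beside a value is the number to pass to `cut -f`.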

So, the command:

 cat access_abc.log | cut -d' ' -f7

and, the much more efficient, equivalent command:

cut -d' ' -f7 access_abc.log

will print:

/content/696-news041305

and, since that does not contain the string 200, there will be no output from the command:

cut -d' ' -f7 access_abc.log | grep "200"

I have no idea what you mean by the 200 HTTP code and you didn't give us the output you're trying to get. If you would give us sample log file lines matching the criteria you're trying to meet and lines that do not match your criteria (in CODE tags), show us the output you're trying to get from that input (in CODE tags), and clearly explain in English which input lines are to be selected and what output is to be produced from the selected lines, we can probably help you find an easy solution to your problem.

Scrutinizer · June 8, 2015, 11:35pm

If you use a space as the field separator with cut and there are n spaces, then there are n+1 fields, the first being the text before the first space. If the required field comes after 8 spaces, it is field number 9. So to get the value 200, you would need to use

cut -d ' ' -f9

Note that cut does not squeeze repeated occurrences of the delimiter, so every space starts a new field.
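A small sketch of that behavior, using a made-up two-space sample string (the variable names are hypothetical):

```shell
# Two spaces between "a" and "b": cut sees an empty field between them.
sample='a  b'

# Field 2 is the empty string between the two spaces; field 3 is "b".
field2=$(printf '%s\n' "$sample" | cut -d' ' -f2)
field3=$(printf '%s\n' "$sample" | cut -d' ' -f3)
printf 'field2=[%s] field3=[%s]\n' "$field2" "$field3"

# Squeezing repeated spaces with tr -s first makes "b" field 2.
squeezed2=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f2)
printf 'squeezed field2=[%s]\n' "$squeezed2"
```

If the input may contain runs of spaces, piping it through `tr -s ' '` before cut is one way to get the field numbering you would expect from counting words.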

sharingsunshine · June 10, 2015, 3:52pm

Thanks to all of you for getting me straight on cut and showing me what I was doing wrong.

I have been wanting to learn awk. If you know of a good tutorial, please shoot me a link.

I'll avoid the smiley use for spaces in the future. I like the way you did it so I will use that form if needed next time.

Randal