Grep line with all string in the lines and not space.

Junes · August 9, 2012, 9:26am

I want to write the syntax so that it does not count a line with no space.

So currently it is showing lines as 5, but I want it to show 4.


# cat /tmp/mediacheck | sort -u  | grep -vi " " | awk '{print $1}' | wc -l

BA7552
BAA002
BAA003
BAA004

jim_mcnamara · August 9, 2012, 9:58am

Does the line you do not want have blanks (ascii 32) in it?

Please post the output of

od -c /tmp/mediacheck

od makes a big block of data on the screen; we need to see your problem line and a good line, probably not the entire file.
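As a rough sketch of what to look for in that output, the following builds a made-up sample file (the real /tmp/mediacheck was never posted, so the contents here are an assumption) whose last line holds a single space, and dumps it with od -c. Each byte gets its own column: a newline is shown as \n, and the lone space appears as a blank column just before the final \n.

```shell
# Hypothetical sample data, not the poster's real file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'BA7552\nBAA002\n \n' > "$sample"

# od -c prints one column per byte: letters as themselves, newline as \n.
dump=$(od -c "$sample")
printf '%s\n' "$dump"
```

An empty line would instead show up as two \n markers in a row with nothing between them.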

Don_Cragun · August 9, 2012, 10:05am

Is the text below the commented command the contents of /tmp/mediacheck? I don't see any spaces on any line. There are no duplicate input lines in that file. Why do you need to sort it? If you just want a count of lines that aren't empty, this is much simpler than your current code:

awk '/./ {cnt++} END {print cnt}' /tmp/mediacheck

If there are more fields in your input and you really meant that you only want to count lines that contain at least one <space> character, please give us a real example of what is in /tmp/mediacheck.
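To make the distinction concrete, here is a small sketch on made-up data (the file contents are an assumption, since the real input was not posted). The pattern /./ counts any line holding at least one character, so a line containing only a space is still counted. Testing NF instead counts only lines that have at least one non-blank field. The +0 makes awk print 0 rather than an empty string when nothing matches.

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Four media IDs, one empty line, and one line holding a single space.
printf 'BA7552\nBAA002\nBAA003\nBAA004\n\n \n' > "$sample"

# Lines with at least one character of any kind (the space-only line counts).
nonempty=$(awk '/./ {cnt++} END {print cnt+0}' "$sample")
# Lines with at least one field, i.e. something other than blanks.
nonblank=$(awk 'NF {cnt++} END {print cnt+0}' "$sample")

echo "lines with any character:     $nonempty"
echo "lines with a non-blank field: $nonblank"
```

If the unwanted line turns out to hold only a space or a tab, the NF form may be the one that gives the expected count.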

Corona688 · August 9, 2012, 12:52pm

The output you have posted makes no sense for the line you have pasted. wc -l would print a number.

It's possible that a tab or something has convinced awk that the first field is blank, rather than a space...

If you're using awk, there's no point using grep too. awk can check if something's blank by itself and thereby shrink your pipe chain.

Also see Useless Use of Cat.

sort -u < /tmp/mediacheck | awk '$1 { print $1 }'
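A minimal sketch of how that pipeline behaves, run against invented input rather than the real file: the sample below contains a duplicate ID, an empty line, and a line holding only a tab, which is one guess at the kind of stray whitespace involved.

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Made-up input: a duplicate ID, an empty line, and a line holding only a tab.
printf 'BAA003\nBA7552\n\nBAA003\n\t\n' > "$sample"

# The pattern $1 is true only when the first field is non-empty (and is not
# a numeric zero), so empty and whitespace-only lines are dropped.
ids=$(sort -u < "$sample" | awk '$1 { print $1 }')
printf '%s\n' "$ids"
```

Note that a first field such as 0 would also be skipped by this pattern, which does not matter for IDs like the ones shown in this thread.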

Junes · August 10, 2012, 2:43am

The following worked for me to count the "/tmp/mediacheck" file.


awk '/./ {cnt++} END {print cnt}' /tmp/mediacheck

But what I was trying to achieve is this: when I enter a media string into the file "/tmp/mediacheck", it leaves a space at the end of the file. I want to have the script ignore the character space at the end of the file.

How can I have the script cat "/tmp/mediacheck" and ignore the EOF character space?

Corona688 · August 10, 2012, 9:26am

Have you tried my code? Does it ignore your space?

Don_Cragun · August 10, 2012, 9:28am

junes:

The following worked for me to count the "/tmp/mediacheck" file.
awk '/./ {cnt++} END {print cnt}' /tmp/mediacheck
But what I was trying to achieve is this: when I enter a media string into the file "/tmp/mediacheck", it leaves a space at the end of the file. I want to have the script ignore the character space at the end of the file.

How can I have the script cat "/tmp/mediacheck" and ignore the EOF character space?

Your terminology does not match the language used when writing documentation describing the behavior of UNIX and Linux systems. Therefore, we do not understand your question.

The cat utility never ignores any character. It has options (which vary from system to system) that allow it to transform certain characters, to add an indicator that visually displays non-printing characters, to add line numbers, or to squeeze sequences of multiple adjacent blank lines to a single empty line; but it never "ignore"s characters.

The EOF character and the space character are not the same. In ASCII and in Unicode's UTF-8 encoding, there is no EOF character (it is a condition encountered after reading the last byte from an input file), and <space> is an 8-bit character having the decimal value 32.
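This can be checked with a short sketch. The three-byte sample file below is invented for illustration: it stores an A, a <space>, and a <newline>, and nothing else. No extra byte marks the end of the file.

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'A \n' > "$sample"

# Three bytes are stored; end-of-file is not one of them.
bytes=$(wc -c < "$sample" | tr -d ' ')
echo "bytes in file: $bytes"

# Decimal value of each byte: 65 (A), 32 (<space>), 10 (<newline>).
# xargs with no command simply collapses od's padding into single spaces.
values=$(od -An -tu1 "$sample" | xargs)
echo "byte values: $values"
```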

You said that the command that I supplied earlier:

awk '/./ {cnt++} END {print cnt}' /tmp/mediacheck

is doing what you wanted. The UNIX terminology for what this command does is:

It uses the awk utility. It does not need the cat, sort, grep, or nl utility to get the job done; and the output it produces is nothing like the output you showed when you asked for help.

The title of this thread is "Grep line with all string in the lines and not space."
By definition a line is a string of one or more characters terminated by a <newline> character. By definition an empty line is a line whose only character is the <newline> character.
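These definitions explain what wc -l reports, since it counts <newline> characters. A short sketch with invented input (the IDs are borrowed from the thread, the layouts are assumptions):

```shell
# Each count below is the number of <newline> characters in the input.
complete=$(printf 'BA7552\nBAA002\n' | wc -l | tr -d ' ')
with_empty=$(printf 'BA7552\nBAA002\n\n' | wc -l | tr -d ' ')
unterminated=$(printf 'BA7552\nBAA002' | wc -l | tr -d ' ')

echo "two complete lines:            $complete"
echo "two lines plus an empty line:  $with_empty"
echo "last line lacks its <newline>: $unterminated"
```

A trailing empty line adds one to the count, while text that is not terminated by a <newline> is not counted as a line at all.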

The command:

grep -vi " "

in the pipeline in your question copies standard input to standard output, deleting input lines that contain an uppercase or lowercase <space> character.
(And since <space> is not an alphabetic character, the -i option has no effect in this command.) Since you said you were getting 5 when you wanted 4, and you showed 5 lines after the pipeline, I assumed that you were trying to ignore empty lines and print a count of the other lines. If this was your goal, another way to do it that looks more like your original pipeline would have been:

cat /tmp/mediacheck | sort -u  | grep -v "^$" | awk '{print $1}' | wc -l

The cat command is not needed. It creates an additional process and reads the input and copies it to standard out. This could have been done by making /tmp/mediacheck an operand to sort, grep, or awk or by using the shell to redirect the input of any of these commands.

The sort command transformed your input file to:

BA7552
BAA002
BAA003
BAA004

removing all duplicated lines from the output. But since it seems that there were no duplicate lines in your input file and the order of the lines in the file doesn't make any difference to anything in the rest of your pipeline, calling sort creates another process, causes the input to be read again, sorted, and written again but in no way affects the final output with the input shown above.

The grep command read the input again and wrote every line that did not contain a <space> character to its standard output. Since you show 5 input lines and you reported that wc -l displayed 5, we can assume that none of the lines in your input file contained a <space> character. Therefore, calling grep created another process, read the file contents again, and wrote the file contents again without affecting the results of the pipeline.

The awk command in your pipeline printed an output line for each input line that it read from its standard input. If there had been more than one input field on one or more of the input lines, it would have thrown away all fields following the first field. But, since none of your input lines had more than one field, all this command did was create another process, read the data again, and write the data again without affecting the result of invoking this pipeline.

So with the input you had, the pipeline you provided could have been replaced by the command:

wc -l /tmp/mediacheck

and produced the same count you got (wc also prints the file name when it is given as an operand).

Using the UNIX philosophy of providing filters in a pipeline to perform the steps needed to complete a task, you could have gotten the results you wanted with the command:

grep -v "^$" /tmp/mediacheck | wc -l

which starts two processes, reads most of the data twice, writes most of the data once, and provides a count of the non-empty lines. Your original pipeline started five processes, read all of the data five times, and wrote all of the data four times. A straight translation of this pipeline into an awk script would be:

awk '/^$/ {next}
        {cnt++}
END {print cnt}' /tmp/mediacheck

which only starts one process, reads the data once, and never writes anything but the desired count of non-empty lines. It does this by throwing away empty lines and counting the remaining lines. Using:

awk '/./ {cnt++}
END {print cnt}' /tmp/mediacheck

ignores lines that are empty and counts lines that are not empty using one less line of awk programming with the same results.
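The approaches discussed in this post can be compared side by side. The sketch below uses a stand-in file with four IDs followed by one empty line, which is a guess at the shape of the original input, not the actual file.

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Assumed input: four media IDs followed by one empty line.
printf 'BA7552\nBAA002\nBAA003\nBAA004\n\n' > "$sample"

# Every line, empty or not.
all_lines=$(wc -l < "$sample" | tr -d ' ')
# Two processes: drop empty lines, then count what is left.
via_grep=$(grep -v '^$' "$sample" | wc -l | tr -d ' ')
# One process: count lines that contain at least one character.
via_awk=$(awk '/./ {cnt++} END {print cnt+0}' "$sample")

echo "wc -l:        $all_lines"
echo "grep | wc -l: $via_grep"
echo "awk:          $via_awk"
```

With this input, wc -l alone reports one more line than the other two, which agree with each other.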

I hope this helps you understand why the solutions we provided didn't use cat or grep and why we tried to get you to explain where the <space> characters were that you were talking about (when you never showed us any <space> characters in your input).