How to remove comments from a bash script?

I would like to remove comments from a bash script. In addition, I would like to remove lines that consist only of whitespace, and to remove blank lines.

#!/bin/bash
perl -pe 's/ *#.*$//g' "$1" | grep -v '^[[:space:]]*$' | perl -pe 's/  +/ /g' > "$2"
#
# $1 INFILE
# $2 OUTFILE

The above code seemed to work at first. Unfortunately, I later found that it breaks the following two special parameters:

${#ARRAY[@]} the number of array elements
$# the number of shell arguments

A workaround is to replace "${#" and "$#" with words that do not appear in the input file before applying the above code.

sed 's/${#/__UNUSUALWORD1__/g ; s/$#/__UNUSUALWORD2__/g' in.txt | \
perl -pe 's/ *#.*$//g' | grep -v '^[[:space:]]*$' | perl -pe 's/  +/ /g' | \
sed 's/__UNUSUALWORD1__/${#/g ; s/__UNUSUALWORD2__/$#/g' > out.txt

However, the preparatory replacement is awkward. I would like to modify 's/ *#.*$//g' so that it will not match "${#" or "$#". Does anyone know a better solution?

Bash comments always start with #. The problem, however, is that bash has some exceptions where # does not start a comment, as shown below.

${#ARRAY[@]} the number of array elements
$# the number of shell arguments
\# escaped by a backslash.
'abcd#efgh' protected by quotes
"abcd#efgh" protected by quotes
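The exceptions in the list can be checked directly in bash. In this small demonstration the array contents and the two positional arguments are arbitrary values chosen for the example.

```bash
#!/bin/bash
# Each echo below contains a '#' that bash does not treat as a comment start.
ARRAY=(a b c)
set -- one two            # gives the script two positional arguments

out=$(
    echo ${#ARRAY[@]}     # number of array elements: 3
    echo $#               # number of positional arguments: 2
    echo \#escaped        # backslash-escaped: #escaped
    echo 'abcd#efgh'      # single-quoted: abcd#efgh
    echo "abcd#efgh"      # double-quoted: abcd#efgh
)
printf '%s\n' "$out"
```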

Does anyone know how to remove comments from bash scripts without destroying the exempted #'s that do not start comments? (In addition, I would like to remove lines that consist only of whitespace, and to remove blank lines.)

Many thanks in advance.

Can you please post your script, or some lines of your script?

@arrals_vl
It's the Perl program at the beginning of post #1.

grep -E -v '^#|^$' file

?

I mean the script you want to remove the comments from, not the script that removes them.

The problem is that, to know which # characters start comments and which don't, you have to understand the script. "#" and '#' should not be stripped, for instance, nor should any other # that is wrapped inside some construct rather than appearing bare on the line. A # can be wrapped inside fairly complex structures, and I'm not sure you can check for every possible case with regexes alone.

What's the best thing out there for understanding scripts? A shell, of course.

There is an option for the shell to print lines as it executes them (which strips out comments), and also an option to check lines for syntax without actually running them. Unfortunately, they seem to be mutually exclusive. I'm checking whether there's anything else relevant...
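The two options are presumably `-n` (read commands but do not execute them, i.e. a syntax check) and `-x` (xtrace: print each command, after expansion, as it runs). The sketch below tries both on a throwaway script in a temporary file. Note that `-x` really executes the script, so it is only safe on harmless input.

```bash
#!/bin/bash
# Write a small throwaway script to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# a full-line comment
echo "hello"   # a trailing comment
EOF

# -n: parse only, nothing is executed; exit status 0 means no syntax errors.
if bash -n "$tmp"; then syntax=ok; else syntax=bad; fi

# -x: execute and trace. The trace goes to stderr (stdout is discarded here)
# and shows the expanded command without any comment text.
trace=$(bash -x "$tmp" 2>&1 >/dev/null)

rm -f "$tmp"
printf 'syntax: %s\ntrace: %s\n' "$syntax" "$trace"
```

Combining them does not help: with `-n` nothing executes, so `-x` has nothing to trace.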

I'm curious. Why?

Regards,
Alister

---------- Post updated at 12:21 PM ---------- Previous update was at 12:11 PM ----------

Also, while your post makes it clear that you are aware that naively removing #.* can cause problems, the same is true for removing blank lines. If a blank line is part of a heredoc or some other quoted string, removing it alters the meaning of the script.


I find it sufficient to strip lines which consist totally of comments or white space when examining a script. This is only for visual inspection because it can still remove significant lines from Here Documents.
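For that kind of read-only inspection, a single `grep -E` is enough. This is a minimal sketch (the function name `view_code` is made up); besides the here-document problem mentioned above, it also hides the `#!` line.

```bash
#!/bin/bash
# Hide lines that are empty, all whitespace, or a comment (optionally indented).
# Lines with trailing comments are kept whole.
view_code() {
    grep -Ev '^[[:space:]]*(#|$)' "$@"
}

shown=$(printf '%s\n' '#!/bin/bash' '  # indented comment' '' '   ' 'echo "$#"  # trailing' | view_code)
printf '%s\n' "$shown"   # prints: echo "$#"  # trailing
```

Usage on a real file would be `view_code script.sh | less`.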

Stripping comments on a permanent basis is not advisable at all and could breach copyright in commercial code.

My main purpose in post #1 is to compare two copies (or two versions) of the script, ignoring comments and meaningless whitespace that do not affect execution.

If commands like diff or cmp had an option to ignore comments and meaningless whitespace, so that two copies (or two versions) of the script could be compared without permanently removing them, I would be partially satisfied. Does anyone know how to compare two versions of a bash script while ignoring comments and meaningless whitespace? I would appreciate it if you could show me such a technique.
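GNU diff has options that get fairly close without modifying either file: `-w` ignores all white space, `-B` ignores changes consisting only of blank lines, and `-I RE` ignores changes whose lines all match the regular expression RE. The sketch below assumes GNU diffutils (other diff implementations may lack `-I`), and the two file contents are made up. One limitation: `-I` only skips changes made entirely of matching lines, so a trailing comment added to a code line still shows up as a difference.

```bash
#!/bin/bash
# Two made-up versions of the same script: one commented, one terse.
a=$(mktemp); b=$(mktemp)
printf '%s\n' '#!/bin/bash' '# count the arguments' 'echo   "$#"' '' 'ls' > "$a"
printf '%s\n' '#!/bin/bash' 'echo "$#"' 'ls' > "$b"

# -w: ignore white space; -B: ignore blank-line changes;
# -I: ignore changes whose lines are all full-line comments.
if diff -w -B -I '^[[:space:]]*#' "$a" "$b"; then verdict=same; else verdict=different; fi

rm -f "$a" "$b"
echo "$verdict"
```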

Even if diff or cmp had such an option, permanently removing comments and meaningless whitespace would still be useful in the following situation: I could take the resulting copies to a GUI OS. I feel more comfortable performing the comparison in a graphical environment on a GUI OS, after doing the removal on a text-based Unix machine.

Again, a program that can tell the difference between comments and meaningful code in a shell script is probably a shell...
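One way to let bash itself do the parsing (a technique not proposed elsewhere in this thread) is to wrap the file in a function definition, let bash parse that definition without running it, and print bash's own reconstruction with `declare -f`. Bash regenerates the function text from its parsed form, so comments disappear and spacing is normalized, while quoted or escaped `#` characters and here-documents are kept. The names `normalize_bash` and `__stripped__` are made up. Caveats: the output is reformatted, so it is only useful when both files get the same treatment; the file content goes through `eval`, so a stray `}` in an untrusted file could make code outside the function run; an empty or comment-only file yields an empty function body, which is a syntax error; and comments nested inside `$( ... )` may survive, depending on the bash version.

```bash
#!/bin/bash
# Hypothetical helper: print a bash-normalized, comment-free version of a script.
# WARNING: the file content is passed to eval (only to define a function, not
# to run it), so use this only on trusted, syntactically valid bash scripts.
normalize_bash() {
    local body
    body=$(<"$1") || return 1
    # Define a throwaway function whose body is the whole file. Nothing runs.
    eval "__stripped__() {
$body
}" || return 1
    # Print bash's reconstruction of the function (comments are not kept).
    declare -f __stripped__
    unset -f __stripped__
}

tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#!/bin/bash
# a full-line comment
echo "${#ARRAY[@]}" $#   # trailing comment
echo 'abcd #efgh' \#
EOF
normalized=$(normalize_bash "$tmp")
rm -f "$tmp"
printf '%s\n' "$normalized"
```

To compare two versions, something like `diff <(normalize_bash old.sh) <(normalize_bash new.sh)` should work, since the process substitutions inherit the function.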

I recently learned that if a hash mark (#) that begins a shell comment is not placed at the beginning of a line, it must be preceded by horizontal whitespace. (The # mark is also called a "pound sign" in the United States, and its Unicode name is "number sign".)

I have never seen a Unix guide explicitly state that a space is needed before a hash mark (#) that begins a comment. C++ allows the comment-leading double slash (//) to immediately follow another token without any preceding space, so I assumed that bash would not require a space before a comment-leading # either.

I recently read the bash man page carefully. It says, "a word beginning with # causes that word and all remaining characters on that line to be ignored." The phrase "a word beginning with #" implies, though not obviously, that a space is required before # if the comment-leading # is not at the beginning of a line.
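A quick experiment confirms the "word beginning with #" reading. It also shows a subtlety: whitespace is not the only thing that starts a new word. After an unquoted operator such as `;`, a `#` also begins a comment.

```bash
#!/bin/bash
# '#' in the middle of a word is ordinary text; at the start of a word it
# begins a comment, whether the word follows a blank or an operator like ';'.
out=$(
    echo line03#word3      # prints line03#word3
    echo hello #world      # prints hello
    echo hello;#world      # prints hello
)
printf '%s\n' "$out"
```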

Now, let me go back to my initial problem. I need to compare two copies of a bash script. One copy is full of detailed comments. The other copy has few or no comments. The two copies differ also in spacing. In terms of command statements that affect execution, the two copies are almost identical. So, I would like to remove comments before comparing the two copies. The following script now gives me most of what I wanted. I named the script "rmcomment.sh".

#!/bin/bash

# SYNOPSIS
# rmcomment.sh INFILE OUTFILE
#
# DESCRIPTION
# This script performs the following:
#
# - removes comments
# - removes all trailing horizontal whitespace on each line
# - removes lines that consist only of horizontal whitespace
# - removes blank lines
# - collapses runs of two or more horizontal whitespace characters
#   into a single space
#
# CAVEAT
# Bash does not treat a quoted hash mark (for example, 'abcd #efgh')
# as the start of a comment, but this code is not smart enough to
# take quotes into account. Thus, if a quoted hash mark is preceded
# by horizontal whitespace, this code removes the hash mark and
# everything after it on the line.
#



perl -pe 's/(^|[[:blank:]]+)#.*$//g ; s/[[:blank:]]+$//g' "$1" | \
grep -v '^[[:space:]]*$' | perl -pe 's/[[:blank:]][[:blank:]]+/ /g' > "$2"

Here is a sample input.

# line01 comment1  
line02 # comment2  
line03# word3  
line04 followed by spaces  
    

line07     hello    world
line08 'abcd#efgh'
line09 "abcd#efgh"
line10 'abcd #efgh'
line11 "abcd #efgh"
Note: Lines 1-4 end with spaces.
Note: Line 5 consists of only spaces.
Note: Line 6 is a blank line.