Format your scripts with shfmt

I've been working on a tool to format (style) shell programs, much like gofmt for Go. After much testing and personal use, I'm throwing it out there to see if it's useful to anyone else.

GitHub - mvdan/sh: A shell parser and formatter in Go

You'll need Go to build and install it. If this is an issue for anyone, I can upload cross-compiled binaries somewhere.

The style itself is influenced both by what's most used in the wild and Google's shell style:

https://google.github.io/styleguide/shell.xml#Formatting

Feedback appreciated, and of course feel free to report any issues.

I will try to give your program a test run, but lacking Go it will take me a few days.

To be honest, I hadn't noticed this style guide at all until today. Several things in there are IMHO rather questionable, and my suggestion is that you shouldn't rely on it at all. Here are my main objections:

  • Bash is the only shell scripting language permitted for executables.
    Nonsense! Many applications prescribe a certain shell (SAP, for instance, enforces csh for system users), and to maintain a supported system you have to respect that, even if you are unhappy about it. Other systems (like the AIX I work on) use ksh, which IMHO is even better suited to shell programming than bash. But that is my opinion, and to say "use this shell" or "use that shell" is like saying "only walk with your left foot": ridiculous.
  • Shell should only be used for small utilities or simple wrapper scripts.
    ROFLMAO! Some of the scripts I maintain are several thousand lines long and call a shell function library that is itself several thousand lines of code.
  • [[ ... ]] is preferred over [, test and /usr/bin/[.
    It is exactly the other way round: use "[ ... ]" instead of "[[ ... ]]" if you want to write portable scripts.
  • Indent 2 spaces. No tabs.
    I can understand the "no tabs", but 2 spaces is uncomfortable for some. I use 5 spaces because it makes the indentation stand out clearly (IMHO), but again: this is personal preference. A reformatter should be configurable in this regard. The same goes for:
  • Put ; do and ; then on the same line as the while, for or if.
    I like that style better than putting do/then on the next line, but that too is personal preference! It should be configurable in a reformatter.
  • Use process substitution or for loops in preference to piping to while.
    This takes the biscuit! Consider "for foo in $(command) ; do": if command produces too many output words, your for-loop will crash with a "too many arguments" error, while the while-loop with the pipeline has no such limitation.
  • Naming Conventions
    This is just one more way to do things in a somewhat consistent way. Personally I am a fan of Hungarian-style notation, which I use in a variation suited to shell programming: "ch" as prefix for strings (char), "i" for integers, "f" for file-/directory names, "p" (procedure) for local functions, "f_" for library functions, etc. Any system will do, as long as you use it consistently. Personally I also find mixed case easier to read and write than underscore-separated words. Whether pReadAttributeFromItem() or read_attribute_from_item() is better is anybody's guess, but I think it is personal preference.
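
Leaving the argument-limit question aside, one difference between the two loop styles that is easy to check is how they split their input. A minimal sketch, assuming a made-up sample() function that prints two lines of two words each (the function names here are hypothetical, not from the style guide):

```shell
#!/bin/sh
# Hypothetical sample data: two lines, each holding two words.
sample() { printf 'a b\nc d\n'; }

# for + command substitution: the whole output is split into words.
count_words_with_for() {
    n=0
    for item in $(sample); do
        n=$((n + 1))
    done
    echo "$n"
}

# pipe + while read: one iteration per line, read as it arrives.
count_lines_with_while() {
    sample | {
        n=0
        while IFS= read -r line; do
            n=$((n + 1))
        done
        # Printed inside the braces, because some shells run this
        # side of the pipe in a subshell and n would be lost after it.
        echo "$n"
    }
}

echo "for iterations:   $(count_words_with_for)"
echo "while iterations: $(count_lines_with_while)"
```

With this sample the for loop runs four times and the while loop twice, so the two forms are not interchangeable whenever a line can contain whitespace.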

I hope this helps.

bakunin


Note that nearly all of the items you're commenting on do not concern formatting at all - like limiting to bash, what scripts should be used for or [[ over [.

Some comments:

  • shfmt (the parser) works with bash, which should work just fine for POSIX shell scripts
  • shfmt supports indentation with tabs or any number of spaces (see README)
  • It will not replace [ with [[. It does not modify any code - it just formats. Same goes for process substitution in for loops. And for naming conventions.

Regarding spacings and newlines being configurable - I'm trying to keep the number of options to a minimum. The way it keeps "do" and "then" on the same line is because that's what is most common and also found in the most popular style guides.

Hi bakunin...

Ha ha, I agree.
There is at least one large script on here, URL below. <wink>

Hi mvdan...

You might like to test your formatter with this:-

And have some fun...


wisecracker, that fails right now because I wasn't aware of $[. Apparently, it's a very old and deprecated form of $((:

In any case, will add support for it now.

First bug? <wink>

Inside the code are nothing but tabs for indentation.
Why? Because my tab is the equivalent of 8 spaces; just imagine the size of the file with all of those tabs replaced by useless spaces.
Also it contains for - do - done on separate lines.
Also it contains while - do - done and others on separate lines.
Also it contains if - then - else - fi and derivatives on separate lines.
Also it contains somefunction() - { - } on separate lines.
Also it contains ` somecommand ` , also deprecated.
Just because items are deprecated does not mean they are not in use - be aware.

EDIT:-
I forgot to add that this script works on current bash versions on various platforms, including Cygwin.
So replace the word 'deprecated' with 'hidden' until it is completely removed from the bash shell, which I suspect will not happen for a long while yet...

None of this matters. The original format is of little significance, since the program is written to change it.

Deprecated means "don't use, but still supported". I doubt they will ever remove support for these, as that would mean a breaking change.

How will your parser cope with this line and others in that script you have to test with?:-

printf "%b" "\x52\x49\x46\x46\x24\x00\x01\x00\x57\x41\x56\x45\x66\x6d\x74\x20\x10\x00\x00\x00\x01\x00\x01\x00\x40\x1f\x00\x00\x40\x1f\x00\x00\x01\x00\x08\x00\x64\x61\x74\x61\x00\x00\x01\x00" > /tmp/binary.bin

The same data is also in octal format in another script!
The binary is 44 bytes in size.

wisecracker, you can split long lines like that with a backslash:

$ echo "asdf\
> qwertyuiop\
> qe7158905189"
asdfqwertyuiopqe7158905189

$

This does not actually add newlines to the resulting string, as you see.

I hope I don't have code formatting practices imposed on my scripts... in my experience enforced formatting practices usually make things less readable and harder to edit.


> How will your parser cope with this line and others in that script you have to test with?

The parser of course has no problem with long lines.

The formatter does not do anything about long lines, as there is no good solution to split lines. This should be done by humans, if they wish to.
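
If someone does want to split such a line by hand, one option that avoids cutting through an escape is to break the data into several arguments at escape boundaries, since printf reuses its format for each extra argument. A rough sketch, assuming a hypothetical 4-byte payload and the octal \0ddd form that POSIX specifies for %b (not the \x form used in the script above):

```shell
#!/bin/sh
# Hypothetical short payload ("RIFF"), split across two arguments.
# printf applies the '%b' format once per argument, so the pieces
# are written back to back with nothing in between.
riff_header() {
    printf '%b' \
        '\0122\0111' \
        '\0106\0106'
}

# Count the bytes that were produced.
riff_header | wc -c
```

The backslashes at the ends of the lines sit outside the quotes here, so they only continue the command and never become part of the data.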

> in my experience enforced formatting practices usually make things less readable and harder to edit.

Less readable is subjective, and harder to edit only lasts until you get used to it. In my opinion, the tradeoff is worth it once multiple people maintain a piece of software. Otherwise, you can easily get to a point where parts of the software are written in wildly different formats.

Diffs become bigger than they need to be, merge conflicts are on the rise, readability decreases as no one is happy, etc. Those are simply irrelevant for scripts you develop on your own.


I am actually well aware of that. What I was wondering is whether the parser would chop at exactly 80 characters, in which case it could corrupt the data if the break landed in the middle of a 4-character '\x??' string.

I tried it and it does corrupt.

printf "%b" "\x52\x49\x46\x46\x24\x00\x01\x00\x57\x41\x56\x45\x66\x6d\x74\x20\\
x10\x00\x00\x00\x01\x00\x01\x00\x40\x1f\x00\x00\x40\x1f\x00\x00\x01\x00\x08\x0\
0\x64\x61\x74\x61\x00\x00\x01\x00" > /tmp/binary

Results:-

Last login: Fri Sep  9 19:22:55 on ttys000
AMIGA:barrywalker~> cd Desktop/code/Shell
AMIGA:barrywalker~/Desktop/code/Shell> chmod 755 80charperline.sh
AMIGA:barrywalker~/Desktop/code/Shell> ./80charperline.sh
AMIGA:barrywalker~/Desktop/code/Shell> ls -l /tmp/binary
-rw-r--r--  1 barrywalker  wheel  48  9 Sep 19:29 /tmp/binary
AMIGA:barrywalker~/Desktop/code/Shell> hexdump -C /tmp/binary
00000000  52 49 46 46 24 00 01 00  57 41 56 45 66 6d 74 20  |RIFF$...WAVEfmt |
00000010  5c 0a 78 31 30 00 00 00  01 00 01 00 40 1f 00 00  |\.x10.......@...|
00000020  40 1f 00 00 01 00 08 00  64 61 74 61 00 00 01 00  |@.......data....|
00000030
AMIGA:barrywalker~/Desktop/code/Shell> _

As it is, mvdan has adequately explained his position.

It does not. Not even when I abuse it by chopping in the middle of a hex code.

#!/bin/sh

printf "%b" "\x52\x49\x46\x46\x24\x00\x01\x00\x57\x41\x56\x45\x66\x6d\x74\x20\x10\x00\x00\x00\x01\x00\x01\x00\x40\x1f\x00\x00\x40\x1f\x00\x00\x01\x00\x08\x00\x64\x61\x74\x61\x00\x00\x01\x00" > bin1

printf "%b" "\x52\x49\x46\x46\x24\x00\x01\x00\x57\x41\x56\x45\x66\x6d\x74\
\x20\x10\x00\x00\x00\x01\x00\x01\x00\x40\x1f\x00\x00\x40\x1f\x00\x00\x01\
\x00\x08\x00\x64\x61\x74\x61\x00\x00\x01\x00" > bin2

printf "%b" "\x52\x49\x46\x46\x24\x00\x01\x00\x57\x41\x56\x45\x66\x6d\x7\
4\x20\x10\x00\x00\x00\x01\x00\x01\x00\x40\x1f\x00\x00\x40\x1f\x00\x00\x01\
\x00\x08\x00\x64\x61\x74\x61\x00\x00\x01\x00" > bin3

diff bin1 bin2 || echo "bin1 and bin2 differ"
diff bin2 bin3 || echo "bin2 and bin3 differ"
diff bin1 bin3 || echo "bin1 and bin3 differ"

You used two backslashes, and in doing so, inserted a literal backslash, followed by a literal newline.
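
A small sketch of the difference, using short made-up strings instead of the WAV header (the variable names are only for illustration):

```shell
#!/bin/sh
# One backslash before the newline: a line continuation.
# Both the backslash and the newline are removed.
joined="abc\
def"

# Two backslashes: the pair collapses to one literal backslash,
# and the newline that follows stays in the string as data.
literal="abc\\
def"

echo "joined length:  ${#joined}"    # 6
echo "literal length: ${#literal}"   # 8
```

The two extra characters in the second string are the backslash and the newline, the same 5c 0a pair that shows up in the hexdump above.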

You're correct, though, how we feel about coding standards has nothing to do with his script. I apologize.
