Naive coding...

wisecracker · February 18, 2017, 10:47am

"Naive coding."
(Apologies for any typos.)

I came across this phrase a couple of weeks ago, and it has prompted me to start a discussion.

I had never heard of it before, but I did some research and discovered that I probably fall into this category.

My motto is: "I code to work, not work to code!" So I guess that when pros view my code, they must think how primitive some of it looks.

So, what do those of you who code for a living think when you see others solve their coding problems using _brute force_ methods, for example, when there are probably more elegant solutions?

Also, what is wrong with a naive solution that works for the coder, even if it is nowhere near as fast as another method that would be more obvious to someone else?

The plotting and drawing routines in AudioScope, for example, can't seem to be done any other way except by _brute force_...

Comments......

Don_Cragun · February 18, 2017, 4:25pm

From the definition of naive, naive programming isn't necessarily bad programming; it just shows that the person writing the code lacks experience, judgement, or wisdom. In many cases naive programming produces code that is just slower than what a more experienced programmer might write, but sometimes the naive code fails miserably.

Suppose you write code to get the month, day, and year to put in a report that you're about to generate and to use as part of the filename for that report. That could naively be done with:

YYYY=$(date +%Y)
MM=$(date +%m)
DD=$(date +%d)
label="$MM/$DD/$YYYY"
filename="report_${YYYY}_${MM}_${DD}.txt"
...

A more experienced programmer might do the same thing with:

IAm=${0##*/}
read label filename hour <<-EOF
	$(date '+%m/%d/%Y report_%Y_%m_%d.txt %H')
EOF
if [ ${hour#0} -lt 8 ]
then	printf '%s: This program should not be run before 8am.\n' "$IAm" >&2
	exit 1
fi
...

If you run this program every day at noon, will the label and filename variables be set differently by these two code segments? Almost certainly not.

If you run this program every day in a cron job scheduled to run at one minute before midnight, what values might you get for those variables on December 31, 2016, when lots of long-running month-end and year-end reports are running at the same time? The naive code reads the clock three separate times, and midnight can pass between any two of those reads, so you could get any one of the following four sets of values for those variables:

label		filename
==========	========
12/31/2016	report_2016_12_31
12/01/2016	report_2016_12_01
01/01/2016	report_2016_01_01
01/01/2017	report_2017_01_01

Note that the middle two of those will overwrite the daily reports from the start of the previous month or year, and the last one will be overwritten by the report written at the end of the day on January 1st (unless the mistake is caught sometime during New Year's Day). With the more experienced programmer's code, the last three can never happen because the clock is read only once, but you might get a diagnostic message instead of the wanted report if the code is started late. When the naive programmer is bitten by this, (s)he learns from experience that the code needs to be made more robust and becomes less naive (and also learns not to schedule cron jobs at one minute before midnight if the code they run depends on finishing before midnight).
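The robust version works because every field comes from a single reading of the clock. As a minimal sketch, assuming bash 4.2 or later (its printf builtin accepts a %(format)T conversion, where the argument -1 means the current time), the same single snapshot can be taken without starting any external date process:

```shell
#!/bin/bash
# Take ONE snapshot of the clock so that year, month, day and hour can
# never straddle midnight. Assumes bash >= 4.2 for printf's %(...)T.
printf -v stamp '%(%Y %m %d %H)T' -1

# Split that single snapshot into its fields.
read -r YYYY MM DD hour <<< "$stamp"

label="$MM/$DD/$YYYY"
filename="report_${YYYY}_${MM}_${DD}.txt"
printf '%s %s %s\n' "$label" "$filename" "$hour"
```

The hour field can then feed the same "not before 8am" check as in the code above.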

Note that naive code can sometimes run faster than well written code, but fail miserably when unexpected (and unchecked) events occur.

Most of the code we write in this forum assumes that data is in the specified format and skips a lot of error checking that should appear in production code. Many of us should be more careful to point out that the sample code we suggest here should be strengthened with appropriate error checking if the code is going to be used in a production environment. :rolleyes:

MadeInGermany · February 19, 2017, 4:30am

I use date with eval instead of "unsafe" here-documents (which involve ugly temp files).

eval `date '+YYYY=%Y MM=%m DD=%d'`
label="$MM/$DD/$YYYY"
filename="report_${YYYY}_${MM}_${DD}.txt"

--
You code like you eat.
Fast food makes you fat, and you risk a heart attack.

wisecracker · February 19, 2017, 7:42am

I have this insane distrust of compilers and interpreters.
So, because of this distrust, I do what could be called naive coding in most languages that I know well enough.

This is one example of my naive code and IS actually inside AudioScope.sh.

read -r -p "Set timebase starting point. From 0 to $scan_end<CR> " -e tbinput
# Ensure the timebase values are set to default before changing.
scan_start=0
scan_jump=1
# Eliminate any keyboard error longhand...
# Ensure a NULL string does NOT exist.
if [ "$tbinput" == "" ]
then
	scan_start=0
	tbinput=0
fi
# Find the length of the inputted string and correct for subscript position.
str_len=$(( ${#tbinput} - 1 ))
# Now check for continuous numerical characters ONLY.
for count in $( seq 0 $str_len )
do
	# Reuse variable _number_ to obtain each character per loop.
	number=${tbinput:$count:1}
	# Now convert the character to a decimal number.
	number=$( printf "%d" \'$number )
	# IF ANY ASCII character exists that is not numerical then reset the scan start point.
	if [ $number -le 47 ]
	then
		scan_start=0
		tbinput=0
	fi
	if [ $number -ge 58 ]
	then
		scan_start=0
		tbinput=0
	fi
done

Derivatives of this have never failed under normal conditions in the languages I have used, so it seems idiot-proof.
Would professionals like yourselves consider this puerile coding?
I have no idea whether a buffer overrun is possible in bash scripting from a 'read' (input) statement.

While searching a few months ago I found the method below, and I do use it in small apps to do the same job, but I still use longhand derivatives of the above for more serious stuff.
This seems bulletproof, but due to my insane distrust of compilers and interpreters I am not sure...

read -r -p "Enter a number, between 1 and 10:- " -e LIMIT
# Error check!
case $LIMIT in
	''|*[!0-9]*)	LIMIT=10 ;;
esac
# If number is valid check boundaries here...
# More code...
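The boundary check that the comment mentions is not shown in the post. A minimal sketch of how it might look, assuming the 1 to 10 range from the prompt and assuming that out-of-range input falls back to 10 just like non-numeric input does (validate_limit is a hypothetical helper name):

```shell
#!/bin/bash
# Hypothetical helper: prints a LIMIT between 1 and 10.
# Assumption: invalid or out-of-range input falls back to 10.
validate_limit() {
	local limit=$1
	case $limit in
		''|*[!0-9]*)	limit=10 ;;
	esac
	# [ ] compares in decimal, so "08" is accepted here; (( )) and
	# [[ ... -le ... ]] would read it as octal and fail. If [ reports
	# an error, the value is also treated as out of range.
	if ! [ "$limit" -ge 1 ] 2>/dev/null || ! [ "$limit" -le 10 ] 2>/dev/null
	then
		limit=10
	fi
	printf '%s\n' "$limit"
}

validate_limit 7      # prints 7
validate_limit 42     # prints 10
validate_limit abc    # prints 10
```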

MadeInGermany · February 19, 2017, 2:06pm

I don't like the assumption of ASCII codes.
Without it, the code becomes more readable.
Also, you are missing a logical OR:

for count in $( seq 0 $str_len )
do
	# Reuse variable _number_ to obtain each character per loop.
	number=${tbinput:$count:1}
	# IF a character is not numerical then reset the scan start point.
	if [[ $number < 0 ]] || [[ $number > 9 ]]
	then
		scan_start=0
		tbinput=0
	fi
done

Then you can even optimize away the OR:

if [[ $number != [0-9] ]]

And then it's only a small step to realize that perhaps the whole loop can be replaced.
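Taking that small step: the same glob that tests one character can test the whole string at once, so the per-character loop (and seq) disappears. A sketch in bash reusing the variable names from the loop above; check_tbinput is a hypothetical wrapper, and the -z test is added here to cover the post's separate "NULL string" check:

```shell
#!/bin/bash
# Hypothetical wrapper: reset the timebase start if the input is empty
# or contains ANY character other than 0-9. No per-character loop.
check_tbinput() {
	if [[ -z $tbinput || $tbinput == *[!0-9]* ]]
	then
		scan_start=0
		tbinput=0
	fi
}

tbinput="12x4"; scan_start=5
check_tbinput
echo "$tbinput $scan_start"    # prints "0 0"
```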

Corona688 · February 22, 2017, 10:47am

For one-off scripts and things you only do once, nothing wrong with it. You're writing code to solve a problem, and if the problem's solved to your own satisfaction without severe side effects, who cares?

The problem is, coding that way, with no effort to learn further, builds bad habits. Naive methods are not applicable to all situations, or even most situations. If you only have the heavy-duty sledgehammer in your toolbox, you'll break all the smaller nails.

You're building audioscope.sh as a teaching tool. You've eschewed many modern features because you consider them hard to read.

I think reducing it to a quarter of its length could make it easier to read.

Saying so doesn't make it so. You've fought tooth-and-claw against anyone who tries to optimize it.

Corona688 · February 22, 2017, 10:58am

wisecracker:

I have this insane distrust of compilers and interpreters.
So, because of this distrust, I do what could be called naive coding in most languages that I know well enough.

This is one example of my naive code and IS actually inside AudioScope.sh.
Derivatives of this have never failed under normal conditions in the languages I have used, so it seems idiot-proof.
Would professionals like yourselves consider this puerile coding?

That's just about the most difficult way possible to solve the problem. I only resort to it when the language features just can't handle it (e.g. needing to build a recursive parser from scratch).

When you find yourself doing this for trivial things, you're definitely overthinking it. Try inverting the problem. What if you looked for a single non-numeric character? You only need to find one to prove the string is bad, and if you can't find any... fait accompli.

One way:

case "$STR" in 
) echo "Blank" ;;
*[^0-9]*)  echo "Contains non-numeric" ;;
*) echo "Valid" ;;
esac

This is portable across all Bourne shells. In bash, you could reduce it to a single statement.
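One possible bash single statement (an illustration, not necessarily the one meant here) uses the [[ ... =~ ... ]] regular-expression test, available in bash 3.0 and later. STR is the same variable as in the case statement above; digits_re is a name introduced for this sketch:

```shell
#!/bin/bash
# The regex is kept in a variable: quoting the right-hand side of =~
# would make bash match it as a literal string, not as a regex.
digits_re='^[0-9]+$'

for STR in "" "123" "12a"; do
	# The single statement: an empty string fails because + needs a digit.
	[[ $STR =~ $digits_re ]] && echo "Valid" || echo "Blank or non-numeric"
done
```

Unlike the case version, this lumps "Blank" and "Contains non-numeric" into one answer; that is enough when all you need is a yes/no validity check.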

MadeInGermany · February 22, 2017, 11:12am

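Two corrections: the empty string has to be matched with "", and POSIX shells negate a bracket expression with [!0-9], not [^0-9] (bash happens to accept both):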

case "$STR" in 
"") echo "Blank" ;;
*[!0-9]*)  echo "Contains non-numeric" ;;
*) echo "Valid" ;;
esac

bakunin · March 5, 2017, 8:16pm

A few observations about programming habits and programming in general:

I think the real difference between naive and non-naive (clever) programming is not so much whether you use a certain language construct, but which algorithms you employ. Corona688's example shows that very clearly. The difference in simplicity (and perhaps speed) comes not from using some "clever language feature" a non-expert might not know about, but from the ability to look at the problem from a different angle and draw the right conclusion.

Here is a story from my own practice: I once worked with a development team that had created a database application. At one point they had trouble with a daily import routine because it ran for approximately 32 hours. (For the non-experts: standard days come with only 24 of them.) The import script worked like this: first a huge SQL statement created a table. This table was fed into a loop where something was changed in each record, and then the whole thing was fed back into the DB. The script looked like this (structure only):

#! /bin/ksh

db2sql "...some 300 lines of SQL here...." |\
while read record ; do
     newrecord=$(echo $record | sed '....some changing of the record here....')
     db2sql "... import record from $newrecord here..."
done

exit 0

There is nothing "logically wrong" here, and when you test it with 5 records it is probably as fast as any other solution. Doing it with a table of some ten million records, though, will reveal that calling sed once for every record is slightly slower (by several orders of magnitude) than calling it once in a pipeline:

db2sql "...some 300 lines of SQL here...." |\
sed '....some changing of the records here....' |\
while read newrecord ; do
     db2sql "... import record from $newrecord here..."
done

So, this was my first instinctive comment, and (as we tested later) it would have brought the processing time down by ~3.5 hours. But I worked on this together with the DBA, and he noticed inside the 300-line SQL monster (which I had ignored, being SQL-ignorant) that most of it effectively created a left outer join of the whole database on itself and then dropped 99% of it. So, after selecting only what really needed to be selected, and finding out that the changes could be done inside the database instead of using the shell and an external program, we arrived at:

db2sql "select for update ...some 20 lines of SQL here...."

which took about 5 seconds. All the while, management had been calling for bigger hardware to "meet increasing needs" instead of just firing the incompetent click-boy who had confused a graphical representation of the DB within a tool (Informatica) with reality.

So, what is the difference between clever and naive programming? It is about the same as the difference between 32 hours and 5 seconds.

My two cents.

bakunin
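bakunin's first suggestion (one sed for the whole stream instead of one per record) can be reproduced without a database. In this sketch, seq stands in for the big db2sql query and a trivial sed substitution stands in for the record rewrite (both are assumptions, not the original transformation). The two versions produce identical output, but the first starts one sed process per line while the second starts exactly one:

```shell
#!/bin/bash
# Stand-in data: 200 "records" (seq replaces the db2sql query).
records=$(seq 1 200)

# Naive: one sed process per record.
naive() {
	while read -r record; do
		printf '%s\n' "$record" | sed 's/^/id=/'
	done <<< "$records"
}

# Pipeline: a single sed process for the whole stream.
piped() {
	printf '%s\n' "$records" | sed 's/^/id=/'
}

# Same output either way; only the number of processes started differs.
[ "$(naive)" = "$(piped)" ] && echo "identical output"
```

Prefixing each call with bash's time keyword (time naive >/dev/null, time piped >/dev/null) shows how the per-process cost grows with the number of records on a given system.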

cb88 · May 24, 2017, 9:26pm

I wrote a rather naive snake implementation... It logged each position in an array, so each time the snake moved, a for loop scanned the array for collisions, and every move took a little longer than the last. The target machine was a 50 MHz SPARCstation LX, so... I thought nothing of it when writing it, but it became obvious when running the game. After about 100 entries it would get so slow you could just about walk away and get coffee; perhaps I exaggerate a bit, but still! Rather than using an array of previous positions, I probably should have allocated an array the size of the play area and marked each cell as traversed or not.

Corona688 · May 25, 2017, 5:44pm

Why was it looped? If you know where the snake is and where it's going, there's only one cell to check.
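Both suggestions come down to the same idea: keep an occupancy grid so that the collision test is one lookup of the cell the head moves into, instead of a scan over every past position. A minimal bash sketch, assuming a hypothetical 20x10 play area stored in a flat indexed array (cell index y*W+x) plus a queue of body cells so the tail can be released as the snake moves; move, grid and body are names invented for this illustration:

```shell
#!/bin/bash
# Hypothetical play area of W x H cells, stored flat: index = y*W + x.
W=20 H=10
declare -a grid=()   # grid[i]=1 while a snake segment occupies cell i
declare -a body=()   # FIFO of occupied cell indices (oldest = tail)
head_i=0             # next free slot in body
tail_i=0             # slot holding the current tail

# move X Y [GROW]: put the head on cell (X,Y). GROW=1 (just ate) keeps the
# tail in place. Returns 1 on a collision with a wall or with the snake.
move() {
	local x=$1 y=$2 grow=${3:-0} cell
	if (( x < 0 || x >= W || y < 0 || y >= H )); then
		return 1
	fi
	cell=$(( y * W + x ))
	# Vacate the tail first, so the head may move into the cell it leaves.
	if [[ $grow != 1 ]] && (( tail_i < head_i )); then
		unset "grid[${body[tail_i]}]"
		unset "body[tail_i]"
		tail_i=$(( tail_i + 1 ))
	fi
	# The whole collision test: one lookup, however long the snake is.
	if [[ ${grid[cell]} == 1 ]]; then
		return 1
	fi
	grid[cell]=1
	body[head_i]=$cell
	head_i=$(( head_i + 1 ))
	return 0
}

move 5 5 1 && move 6 5 1 && move 7 5 && echo "no collision"
```

Releasing the tail before the lookup is a design choice: it lets the head follow directly behind its own tail, as most snake games allow.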