Use decimal value of array in bc ends with illegal character

Grille · September 17, 2015, 4:04am

hi all

I have to read a long csv file in groups of 4 columns; the numbers use a decimal comma, like "3,45".
The 9th row in this csv is the first line I need, so I use tail -n+9.
I use sed -e 's/,/./g' to get decimal values with . as the radix character.

So far no problem.

The goal is to get the two maximum negative forces in the ranges 56-66 degrees and 33-55 degrees.

Just for better understanding:
One test case is a set of two directions (open and close).
Open and close each have an angel value in degrees and a force value in newtons.

Here is the table header with three test cases. The original csv file has thousands of them and around 340 rows of angel/force combinations.

;test no 3;;;;test no 2;;;;test no 1;;;;
;process open;;process close;process open;;process close;process open;;process close
;angel;force;angel;force;angel;force;angel;force;angel;force;angel;force
;68,09;8,88;0,4;0,6;68,27;8,46;0,21;0,7;68,27;8,8;0,21;0,64
;67,44;0,19;0,12;3,41;67,72;0,35;0,03;2,65;67,72;0,26;0,03;2,2
;66,62;1,5;0,05;0,73;67,08;0,37;0,24;0,05;66,89;0,28;0,24;0,38
;60,46;-4,39;4,65;0,92;60,46;-4,79;5,02;0,82;60,55;-4,65;5,2;0,92
.....

What I need to get working is the commented if-statement.

#!/bin/bash

csv=${1:-"/home/my/test.csv"}

val1_angel_high=66
val1_angel_low=56

val2_angel_high=55
val2_angel_low=33

no_cols=$(head -n 1 "${csv}"  | awk -F";" '{ print NF}')
no_val_pairs=$((((no_cols--))/4))

while [ $no_cols -gt 1 ]; do
        angelCollection_process_1=( $(tail -n+9 "${csv}" | cut -d ';' -f "$no_cols" | sed -e 's/,/./g') )
        forceCollection_process_1=( $(tail -n+9 "${csv}" | cut -d ';' -f "$((no_cols+1))" | sed -e 's/,/./g') )
        #printf "Process 1 - Angel: %s Force: %s\n" "${angelCollection_process_1[@]}" "${forceCollection_process_1[@]}"

        position=0
        max_force=0.0;
        for t in "${angelCollection_process_1[@]}"; do
                if (( $(bc <<< "$t <= $val1_angel_high") && $(bc <<< "$t >= $val1_angel_low") )); then
                        #echo $t : $max_force">"${forceCollection_process_1[$position]}; #works
                        force="${forceCollection_process_1[$position]}";
                        echo $max_force $force; #works

                        force2=$(bc <<< "$force*100");
                        echo $force2; # not working (standard_in) 1: illegal character: ^M

                        #if (( $(bc <<< "${forceCollection_process_1[$position]} < $max_force") )); then
                        #       max_force="${forceCollection_process_1[$position]}";
                        #fi
                fi
                ((position++))
        done

        echo $max_force

        no_cols=$((no_cols-4))
done

Output when I disable force2:
0.0 0.9
0.0 0.96
0.0 1.04
0.0 1.16
0.0 1.1
0.0 1
0.0 0.94
0.0 0.94
0.0 0.98
0.0 1.02
0.0 1
0.0 0.98
0.0 1.04
0.0 0.96
0.0 0.9
0.0 0.72
0.0 0.56
0.0 0.38
0.0 0.22
0.0 0.18
0.0 0.08
0.0 0.04
0.0 0
0.0 0.05
0.0 0.09
0.0 0.15
0.0 0.39
0.0 0.47
0.0 0.77
0.0 -1.21
0.0 -1.53
0.0 -2.71
0.0 -1.87
0.0 -1.75
0.0 -1.91
0.0 -2.03
0.0 -1.81
0.0 -1.71
0.0 -1.53
0.0 -1.55
0.0 -1.57
0.0 -1.45
0.0 -1.45
0.0 -1.49
0.0 -1.45
0.0 -1.53
0.0 -1.31

I hope you can find a way for me to use a value from the array in bc.
Other solutions are also welcome. I also tried perl, but I failed at reading the csv in groups of 4 columns.

Example of csv content: ; 000X ; 000Y ; 001X ; 001Y ; 002X ; 002Y - Pastebin.com

RudiC · September 17, 2015, 4:54am

The ^M in the error message indicates that your input file has DOS line terminators (CR LF), which your script doesn't handle. Try something like dos2unix to remove them.
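One way to confirm this before converting anything is to count the lines that end in a carriage return. This is only a minimal sketch: the helper name count_cr_lines and the sample data are made up for illustration and are not part of the original script.

```shell
# Hypothetical helper: count the lines of a file that end with a CR (0x0D).
count_cr_lines() {
    cr=$(printf '\r')        # the literal CR byte; works in any POSIX shell
    grep -c "$cr\$" "$1"     # pattern: CR immediately before end of line
}

# Assumed sample data with DOS line endings (not the real csv).
sample=$(mktemp)
printf ';60,46;-4,39\r\n;59,10;-1,21\r\n' > "$sample"

count_cr_lines "$sample"     # prints 2
rm -f "$sample"
```

A count greater than zero means values cut from the end of a line will carry the CR into bc.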

As you are using awk anyhow, wouldn't it make sense to consider doing most - if not all - of the processing in awk ?

Grille · September 17, 2015, 6:09am

Thank you RudiC, dos2unix helps.

I had already tried to remove the ^M with sed, but without success:

sed -e 's/^M//g'
sed -e 's/^M$//'

I now get the values I need. It is slow, but it works for now.
I will speed things up with awk or another scripting language on the weekend.

Have a nice day and rest of the week.

RudiC · September 17, 2015, 6:21am

^M is the visible, readable representation of the <CR> (carriage return, 0x0D, \r) character and, typed as the two characters ^ and M, is not recognized as such by sed (there it means an M at the start of a line). Your sed might accept the \r representation?
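Since the CR ends up as the last character of the value taken from the array, it can also be stripped right before the value is handed to bc. A minimal sketch, assuming a POSIX-style shell; strip_cr is a hypothetical helper name, and the values are made up:

```shell
cr=$(printf '\r')    # the literal CR byte (0x0D); works in any POSIX shell

# Hypothetical helper: remove one trailing CR from a single value.
strip_cr() {
    printf '%s\n' "${1%"$cr"}"
}

force=$(printf '%s\r' '-4.39')            # simulate a value read from a DOS file
printf '[%s]\n' "$(strip_cr "$force")"    # prints [-4.39]

# The same variable can be spliced into a sed script when \r is not understood.
printf '1.5\r\n' | sed "s/$cr\$//"        # prints 1.5
```

Putting the real CR byte into the pattern avoids relying on a sed that interprets \r.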

Don_Cragun · September 17, 2015, 6:28am

If your sed doesn't remove carriage-returns with sed 's/\r$//' "$csv" , try changing:

csv=${1:-"/home/my/test.csv"}

early in your script to:

csv=${1:-"/home/my/test.csv"}
tr -d '\r' < "$csv" > "$csv.$$"
csv="$csv.$$"

and add:

rm -f "$csv"

to the end of your script. Or, if you don't need to keep the input file in DOS format, just change:

csv=${1:-"/home/my/test.csv"}

to:

csv=${1:-"/home/my/test.csv"}
tr -d '\r' < "$csv" > "$csv.$$" && cp "$csv.$$" "$csv" && rm "$csv.$$" || exit 1

Don_Cragun · September 17, 2015, 9:53pm

I am assuming that you are really dealing with angles (the slope between two lines) and forces (instead of angels (benevolent attendant spirits) and forces) and have changed variable names to match; but I have not changed the typos in your sample input file. I also assume that the val2_* variables are intended to select ranges of angles to be processed using the 3rd and 4th values in each set of 4 values (even though these variables are not referenced at all in your sample code).

The code you provided seemed to be saving a minimum value in the variable named max_force instead of a maximum value. The code below sets max_force1 to the maximum value found in the 2nd field in each test set with an angle in the inclusive range $val1_angle_low <= field1 <= $val1_angle_high and sets max_force2 to the maximum value found in the 4th field in each test set with an angle in the inclusive range $val2_angle_low <= field3 <= $val2_angle_high . Your code also multiplied force values by 100 for no obvious reason. The code below does not modify input values, but at the end it prints max_force1 , max_force2 , and the maximum of those two values, each multiplied by 100.
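If "maximum negative force" in the original question means the most negative value in the range, the same pattern applies with the comparison reversed. A minimal sketch for a single angle/force pair, assuming the angle is in field 2 and the force in field 3; the helper name min_force_in_range and the input lines are hypothetical:

```shell
# Hypothetical helper: print the smallest force whose angle lies in [low, high].
# Reads ';'-separated lines with the angle in field 2 and the force in field 3.
min_force_in_range() {
    awk -F';' -v low="$1" -v high="$2" '
    {
        gsub(",", ".")     # comma radix -> period
        gsub("\r", "")     # drop DOS carriage returns
    }
    $2 + 0 >= low && $2 + 0 <= high {
        # The first hit initializes min; later hits replace it only if smaller.
        if (!found++ || $3 + 0 < min)
            min = $3 + 0
    }
    END { if (found) print min; else print "undefined" }'
}

printf ';66,62;1,5\r\n;60,46;-4,39\r\n;58,00;-1,21\r\n;40,00;-9,99\r\n' |
    min_force_in_range 56 66    # prints -4.39
```

Adding 0 to each field forces a numeric comparison, so header text simply evaluates to 0 and falls outside the range.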

I do not know why the tail commands in your code start collecting data on line 9 of your input file. The sample data file given seems to have data starting on line 4 of the file. The code below assumes data starts on the line following the 1st line in the file where the 3rd semicolon-separated field is the string force (which appears to be the last line of the headers in your sample input).

;angel;force;angel;force;angel;force;angel;force;angel;force;angel;force

The code below makes wild guesses at what in your sample code was desired output and what was intended to be printed only as debugging information. With the unconditional output and the debugging output, I hope that you will find ways to print what you want. I don't know if you wanted a single maximum force value or one value for the data from the 1st two values in each set of values and one value for the data from the 2nd two values in each set. The following code prints individual and combined data.

The sample data you provided didn't have any column three values in the range 33 through 55, so no value was selected from the 3rd and 4th values in each set. The following code strips DOS <carriage-return> characters from the input and (when run in a locale where the LC_NUMERIC category has period as the radix character) can process input files that have period, comma, or a mixture of both as the radix character.
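If the script might run under a locale whose radix character is a comma, one conservative option is to force the C locale for the awk invocation only. A minimal sketch with made-up input; sum_pair is a hypothetical helper name:

```shell
# Hypothetical helper: add the two ';'-separated numbers on each input line.
sum_pair() {
    # LC_ALL=C applies to this awk process only, so "." is the radix
    # character regardless of the caller's LANG / LC_NUMERIC settings.
    LC_ALL=C awk -F';' '{ gsub(",", "."); printf("%.2f\n", $1 + $2) }'
}

printf '3,45;1,05\n' | sum_pair    # prints 4.50
```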

The following code was written and tested using the Korn shell, but should work with any shell that recognizes Bourne shell syntax (rather than csh shell syntax). It won't work with a pure Bourne shell because it needs basic POSIX parameter expansions (copied from your sample script). So, it should work with ash , bash , dash , ksh , zsh , and other shells that recognize the syntax used by these shells.

If you want to enable debugging printouts, switch the line just before the exit at the end of the script with the line just after the exit .
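The switch works because awk treats an operand of the form var=value as a variable assignment that takes effect before the next file operand is read. A minimal sketch of that mechanism on its own; show_lines is a hypothetical helper name:

```shell
# Hypothetical helper: count input lines, echoing each one when debug is set.
# An operand such as debug=1 is an awk variable assignment, not a file name;
# "-" names standard input.
show_lines() {
    awk '{ if (debug) printf("line %d: %s\n", FNR, $0) }
         END { print NR " lines" }' "$@" -
}

printf 'a\nb\n' | show_lines            # prints: 2 lines
printf 'a\nb\n' | show_lines debug=1    # prints each line, then: 2 lines
```

Because the assignment is an ordinary operand, debugging can be toggled without editing the awk program text.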

If someone wants to try this code on a Solaris/SunOS system, change awk in this script to /usr/xpg4/bin/awk or nawk .

#!/bin/ksh

csv=${1:-"/home/my/test.csv"}

val1_angle_high=66
val1_angle_low=56

val2_angle_high=55
val2_angle_low=33

awk -F';' -v v1low=$val1_angle_low -v v1high=$val1_angle_high \
    -v v2low=$val2_angle_low -v v2high=$val2_angle_high '
FNR == 1 {
	# Note that we are looking for the end of the header in this file.
	hdr = 1
}
hdr {	# Look for the last header line in this file.
	if($3 == "force")
		hdr = 0
	if(debug) printf("hdr %d deleted: %s\n", FNR, $0)
	next
}
{	# Process data lines...
	# Convert commas to periods (assume European data collector with
	# "," as radix character and American data processor with "." as
	# radix character) and get rid of DOS <carriage-return>s.
	gsub(",", ".")
	gsub("\r", "")
	if(debug) printf("line %d: %s\n", FNR, $0)
	# Process all test sets on the current line...
	for(col = 2; col < NF; col += 4) {
		# Look for val1 angle in range.
		if($col >= v1low && $col <= v1high)
			# Look for new maximum...
			if(found1++) {
				if($(col + 1) > max_force1)
					max_force1 = $(col + 1)
			} else	max_force1 = $(col + 1)
		# Look for val2 angle in range.
		if($(col + 2) >= v2low && $(col + 2) <= v2high)
			# Look for new maximum...
			if(found2++) {
				if($(col + 3) > max_force2)
					max_force2 = $(col + 3)
			} else	max_force2 = $(col + 3)
		if(debug) {
			printf("subscript: %d\n", col)
			printf("range1 cnt: %d, angle1=%s, max=%s\n",
			    found1, $col,
			    found1 ? max_force1 : "undefined")
			printf("range2 cnt: %d, angle2=%s, max=%s\n",
			    found2, $(col + 2),
			    found2 ? max_force2 : "undefined")
		}
	}
}
END {	# Print results.
	printf("# of values found in range1 (%f <= angle1 <= %f): %d\n%s ",
	    v1low, v1high, found1, "Maximum selected range1 value: ")
	if(found1)
		printf("%f\n", max_force1 * 100)
	else	print "undefined"
	printf("# of values found in range2 (%f <= angle2 <= %f): %d\n%s ",
	    v2low, v2high, found2, "Maximum selected range2 value: ")
	if(found2)
		printf("%f\n", max_force2 * 100)
	else	print "undefined"
	printf("# of values found in range: %d\nMaximum selected value: ",
	    found1 + found2)
	if(found1 + found2)
		printf("%f\n",
		    ((found1 && found2) ? \
			max_force1 > max_force2 ? max_force1 : max_force2 : \
			found1 ? max_force1 : max_force2) * 100)
	else	print "undefined"
}' "$csv"
exit
}' debug=1 "$csv"

With the sample data you provided (and debugging disabled), the above code produces the output:

# of values found in range1 (56.000000 <= angle1 <= 66.000000): 3
Maximum selected range1 value:  -439.000000
# of values found in range2 (33.000000 <= angle2 <= 55.000000): 0
Maximum selected range2 value:  undefined
# of values found in range: 3
Maximum selected value: -439.000000