Experimental awk audio converter for CygWin and AudioScope.sh

wisecracker · October 31, 2014, 5:39pm

Development machine:- Standard MBP 13 inch, OSX 10.7.5...
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin11)
Copyright (C) 2007 Free Software Foundation, Inc.

Scenario:- Audio capture for AudioScope.sh for CygWin without ANY third party installs.

I am trying my hardest to get a very fast CD quality to 8 bit mono converter for CygWin.
In the process I am learning awk with a difference.

I have a pure shell version but it is far too slow, making it totally unusable for CygWin.

I have a fully working embedded audio capture using SoundRecorder.exe for CygWin using Windows Vista and 7, and I assume 8.x too.
It autosaves as a .WAV file, stereo, 16 bit signed depth at 44100 Hz sampling rate.
There are NO default tools to convert the audio to my usable state so awk is my only way for CygWin.
(There are for OSX and Linux however.)

I needed to convert this to a raw file at quantised 8 bit unsigned depth.

My MBP awk does not have the function strtonum() so the code below is a workaround.
With my limited knowledge of awk, I could only do this in two separate awk stages.

As 'dd' and 'od' ARE part of a default CygWin install then creating a pseudo file was easy.

Using 'od' in the script creates a very large file formatted exactly like this:-

            31148    7375       4  -25735   10869  -24167   14517  -11227
            -6133   -2066    9627  -17687    4971    6743   29567    -635
            -6794  -25374   -2041   26752  -16123   -4752   13322  -30519
            23023  -16552    8628  -31734   -8601   13902    1350    4574
            27658  -16061  -24812    3982   18407   21855  -22704  -24993
            20391    1043   -1924  -32113   11721    -744    4990   28174
           -29418   -9503   26898   19716   -4177  -10850   -6189   -2099
            20053   10576  -25765    2636   11141   -8168  -10572  -15413
            25518   32717    9471  -32754    8971     411   30384   28994
           -22583  -22757    5525  -16048   29524   31623   25504  -17132
           -27344   31565    3412    5296   32543   22148   -7893   19038
            -5604    2076  -24990   27066   25595   -5440  -28169  -19214
            16594    5942   19707   18301  -12918  -31001   -3970    6721
            15475   -1516  -18608   23642   13658    8275   -7287    3707
            21163   -6393   15614   31905   -8726   26653    5264   31329
            27363   10634    1148    6455  -19502  -24530    9748  -22252
            13941  -21525  -25641   32089   13707  -24503  -24055    9336
           -26173  -12437    2745  -10283  -25705  -31137   -3881  -20048

This gave me $1 to $8 per line input where $1, $3, $5 and $7 are the signed 16 bit left hand channel.
$2, $4, $6 and $8 are the 16 bit signed right hand channel.
Thus the first awk script converts the signed 16 bit decimal to unsigned 8 bit decimal, left hand channel only.

The second awk script then converts the unsigned 8 bit decimal to a pure binary file 44100 bytes in size.

It all works and is seriously quick on this MBP but it looks seriously ugly too.

This WHOLE shell script takes around 0.5 seconds to complete:-

#!/bin/bash
# 16to8bit.sh
> /tmp/left
> /tmp/binary
> /tmp/signed16bit.txt
> /tmp/sample.raw
# Generate a raw pseudo stereo signed 16 bit per channel file, (CD quality).
dd if=/dev/urandom of=/tmp/sample.raw bs=1 count=176400 > /dev/null 2>&1
# Convert to signed decimal using the default format from 'od'.
od -td2 -An /tmp/sample.raw > /tmp/signed16bit.txt
# Convert, (quantise), the signed 16 bit decimal to 8 bit unsigned left hand channel.
awk 'BEGIN \
{
	FS=" ";
}
{
	if ($1=="")
		exit;
	# $1,$3,$5,$7 are/were the signed 16 bit depth left hand channel.
	# $2,$4,$6,$8 are/were the signed 16 bit depth right hand channel.
	$1=(int(($1+32768)/256));
	# $2=(int(($2+32768)/256));
	$3=(int(($3+32768)/256));
	# $4=(int(($4+32768)/256));
	$5=(int(($5+32768)/256));
	# $6=(int(($6+32768)/256));
	$7=(int(($7+32768)/256));
	# $8=(int(($8+32768)/256));
	printf $1" "$3" "$5" "$7" " > "/tmp/left";
}' < /tmp/signed16bit.txt
# Now create the 44100 byte raw 8 bit depth binary file.
awk --characters-as-bytes 'BEGIN \
{
	BINMODE=3;
	FS=" ";
	n=1;
}
{
	while (n<=44100) \
	{
		printf ("%c",$n) > "/tmp/binary";
		n=n+1;
	}
}' < /tmp/left

Ignore the '\' after BEGIN, etc, as this is my way of making it a little easier for me to read.
Please tear it apart and if there are better methods please point me in the right direction...

Many thanks guys...

I await the flak...

shamrock · October 31, 2014, 6:21pm

What's your rationale behind this magic formula... ($1+32768)/256)

wisecracker · November 1, 2014, 5:41am

Hi shamrock...
The signed 16 bit decimal equivalent cannot go lower than -32768 or higher than 32767.
The centreline is zero.

Add 32768 to the signed decimal to shift the centreline to 32768, the minimum now sits at zero and the maximum at 65535.

Dividing the result by 256, and keeping the integer part, gives a centreline of 128, minimum of zero and maximum of 255.
Try it and find out.

Nothing more sinister, and accurate enough for a further quantise to 4 bit depth for AudioScope.sh...
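
A quick way to check the mapping described above is to push a few boundary values through it. This is only a minimal sketch: the helper name quantise is hypothetical and not part of AudioScope.sh, and it assumes nothing more than a POSIX shell and awk.

```shell
#!/bin/sh
# Worked check of the int((x + 32768) / 256) mapping on boundary values.
# quantise is a hypothetical helper name used only for this illustration.
quantise() {
	awk -v x="$1" 'BEGIN { print int((x + 32768) / 256) }'
}

# Signed 16 bit inputs: minimum, just below centre, centre, two values
# either side of a 256 step, and maximum.
for sample in -32768 -1 0 255 256 32767
do
	printf '%6d -> %3d\n' "$sample" "$(quantise "$sample")"
done
```

The minimum lands on 0, the centreline on 128 and the maximum on 255, and every run of 256 adjacent 16 bit values collapses onto the same 8 bit value.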

HTH...

---------- Post updated 01-11-14 at 09:41 AM ---------- Previous update was 31-10-14 at 11:13 PM ----------

Got it into one block realising that the numbers already exist as numbers and not strings...
I am happy with the results now and shaved off another two tenths of a second execution time...

#!/bin/bash
# 16to8bit_new.sh
> /tmp/sample.raw
> /tmp/leftbinary
> /tmp/signed16bit.txt
dd if=/dev/urandom of=/tmp/sample.raw bs=1 count=176400 > /dev/null 2>&1
od -td2 -An /tmp/sample.raw > /tmp/signed16bit.txt
awk --characters-as-bytes 'BEGIN \
{
	BINMODE=3;
	FS=" ";
}
{
	if ($1=="")
		exit;
	# $1,$3,$5,$7 are the left hand channel.
	# $2,$4,$6,$8 are the right hand channel.
	$1=(int(($1+32768)/256));
	# $2=(int(($2+32768)/256));
	$3=(int(($3+32768)/256));
	# $4=(int(($4+32768)/256));
	$5=(int(($5+32768)/256));
	# $6=(int(($6+32768)/256));
	$7=(int(($7+32768)/256));
	# $8=(int(($8+32768)/256));
	printf("%c%c%c%c",$1,$3,$5,$7) > "/tmp/leftbinary";
	# printf("%c%c%c%c",$2,$4,$6,$8) > "/tmp/rightbinary";
}' < /tmp/signed16bit.txt

EDIT:
Now tested on CygWin and completes the cycle in around 1.5 seconds. Highly acceptable...

Corona688 · November 1, 2014, 12:29pm

I think you can simplify that awk code by telling it to use all whitespace as record separators, so every number becomes its own record. One statement instead of four. Then you just tell it to process the "odd" records -- 1, 3, 5, ...

You can get rid of the BEGIN block by feeding variables into awk on the commandline. This also lets you script the value of the outputfile.

I started adding pipes and stuff then saw the BINMODE, and realized that's probably why you were forced to use temp files. Oh well.

awk --characters-as-bytes 'NR%2 { printf("%c", ($1+32768)/256)) > OUT }' RS="[ \r\n\t]+" BINMODE=3 OUT="/tmp/leftbinary" /tmp/signed16bit.txt

wisecracker · November 2, 2014, 1:02pm

Hi Corona688...

Along with a few others, you have been extremely helpful with AudioScope.sh over its 21 month lifetime.

All I can say is many thanks.

I will try this baby of yours out and see how it goes on CygWin, as that is the platform that it is aimed at...

Bazza.

---------- Post updated 02-11-14 at 01:44 PM ---------- Previous update was 01-11-14 at 06:59 PM ----------

Hi Corona688...
Just this minute tried your code.
Had to add a '(' in front of ($1+32768) in the printf below to get it to run, but it only delivers 1 byte instead of 48000.

#!/bin/bash
# 16to8bit_new.sh
> /tmp/sample.raw
> /tmp/leftbinary
> /tmp/signed16bit.txt
dd if=/dev/urandom of=/tmp/sample.raw bs=1 count=192000 > /dev/null 2>&1
od -td2 -An /tmp/sample.raw > /tmp/signed16bit.txt
awk --characters-as-bytes 'NR%2 { printf("%c", (($1+32768)/256)) > OUT }' RS="[ \r\n\t]+" BINMODE=3 OUT="/tmp/leftbinary" /tmp/signed16bit.txt

Results:-

Last login: Sun Nov  2 13:30:37 on console
AMIGA:barrywalker~> cd Desktop
AMIGA:barrywalker~/Desktop> cd Code
AMIGA:barrywalker~/Desktop/Code> cd Awk
AMIGA:barrywalker~/Desktop/Code/Awk> ./16to8bit_new_c688.sh
awk: extra ) at source line 1
 context is
	NR%2 { printf("%c", >>>  ($1+32768)/256)) <<< 
awk: syntax error at source line 1
awk: illegal statement at source line 1
	extra )
AMIGA:barrywalker~/Desktop/Code/Awk> ./16to8bit_new_c688.sh
AMIGA:barrywalker~/Desktop/Code/Awk> ls -l /tmp/leftbinary
-rw-r--r--  1 barrywalker  wheel  1  2 Nov 13:39 /tmp/leftbinary
AMIGA:barrywalker~/Desktop/Code/Awk> _
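
A possible explanation for the single byte, offered as an assumption rather than something confirmed in this thread: an awk that honours only the first character of RS would take "[" as the separator, see the whole file as one record, and so run the action exactly once. The sketch below avoids RS altogether and walks the odd-numbered fields of each line instead. The function name left_channel is hypothetical, and LC_ALL=C is set on the assumption that it keeps %c to one byte per value in locale-aware awks.

```shell
#!/bin/sh
# Read 'od -An -td2' style text on stdin and write the left channel
# as unsigned 8 bit bytes on stdout.
# left_channel is a hypothetical name used only for this illustration.
left_channel() {
	# Fields 1, 3, 5, ... of every line are the left hand channel,
	# provided each line holds an even number of values.
	LC_ALL=C awk '{
		for (i = 1; i <= NF; i += 2)
			printf("%c", int(($i + 32768) / 256));
	}'
}

# Usage, assuming /tmp/sample.raw already exists:
# od -An -td2 /tmp/sample.raw | left_channel > /tmp/leftbinary
```

Because this relies only on default field splitting, it does not depend on how a particular awk interprets a multi-character RS. How a given awk writes a zero value with %c is not checked here, so the output size is worth confirming with 'ls -l' as above.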

---------- Post updated at 06:02 PM ---------- Previous update was at 01:44 PM ----------

This is an even smaller version for the LHS only and works a real treat in CygWin...
Thanks C688 for mentioning the removal of BEGIN ...

#!/bin/bash
# 16to8bit_new.sh
> /tmp/sample.raw
> /tmp/leftbinary
> /tmp/signed16bit.txt
dd if=/dev/urandom of=/tmp/sample.raw bs=1 count=192000 > /dev/null 2>&1
od -td2 -An /tmp/sample.raw > /tmp/signed16bit.txt
awk --characters-as-bytes '
{
	BINMODE=3;
	FS=" ";
}
{
	if ($1=="") exit(0);
	$1=(int(($1+32768)/256));
	$3=(int(($3+32768)/256));
	$5=(int(($5+32768)/256));
	$7=(int(($7+32768)/256));
	printf("%c%c%c%c",$1,$3,$5,$7) > "/tmp/leftbinary";
}' < /tmp/signed16bit.txt

Listening tests were more than good, now I have to attempt to obtain a conversion to 8000Hz sampling rate for the frequency counter part of AudioScope.sh...

I have a simple idea to test and will post here when done.

RudiC · November 2, 2014, 2:37pm

If I get you right, what you want to do per sample point is strip the low byte and then XOR with 0X80. Try this with mere shell (recent bash) builtins:

 hexdump -v -e '4/1 "0x%02X " "\n"' /tmp/sample.raw | while read _ LL _ RR; do printf "x%x x%x\n" $((LL^128)) $((RR^128)); done | while read LL RR; do printf "\\$LL\\$RR"; done
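
The claim that dropping the low byte and XORing with 0x80 matches the add-and-divide formula can be checked with shell arithmetic alone. This is a minimal sketch; the function names arith_form and xor_form are hypothetical, and it assumes a shell whose $(( )) arithmetic accepts hexadecimal constants and uses two's complement integers.

```shell
#!/bin/sh
# Compare the two ways of reducing a signed 16 bit sample to unsigned 8 bit.
# arith_form and xor_form are hypothetical names used only for illustration.

# Shift the centreline up by 32768, then divide by 256.
arith_form() { echo $(( ($1 + 32768) / 256 )); }

# Keep the high byte of the 16 bit pattern, then flip its top bit.
xor_form() { echo $(( (($1 & 0xFFFF) >> 8) ^ 0x80 )); }

for x in -32768 -257 -256 -1 0 1 255 256 32767
do
	printf '%6d  arith=%3d  xor=%3d\n' "$x" "$(arith_form "$x")" "$(xor_form "$x")"
done
```

Both columns agree for these values, which is why only the high byte of each sample is needed for the 8 bit result.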

wisecracker · November 2, 2014, 2:48pm

Hi RudiC...
Nice adaption of hexdump except....
Your code cannot be used for one reason.
CygWin does not have hexdump by default.
It only has 'od'.

Also OEM Windows installs have no audio conversion programs hence this hack.

The code has to be better than 2 seconds to complete inside a CygWin install...

Mine takes around 1.5(ish) seconds to complete using 'od'...

Thanks for your time however.

Bazza...