BASH: Break line, read, break again, read again...

SilversleevesX · May 6, 2011, 2:17am

...when the lines use both a colon and commas to separate the parts you want read as information.

The first version of this script made frequent use of cut and other externals rather than Bash builtins. It ran nice and zippy, with little more than average processor load, in GNOME Terminal but, predictably, slow as a sloth on Valium and heavy on the CPU in Cygwin.

while read -r line
do
    # Split "file:tags" on the colon (one external cut process per field).
    file=$(printf '%s' "$line" | cut -d: -f1)
    tags=$(printf '%s' "$line" | cut -d: -f2)
    # Split the tags on commas, again one cut per field.
    cate=$(printf '%s' "$tags" | cut -d, -f1)
    cred=$(printf '%s' "$tags" | cut -d, -f2)
    sour=$(printf '%s' "$tags" | cut -d, -f3)
    writ=$(printf '%s' "$tags" | cut -d, -f4)
    trans=$(printf '%s' "$tags" | cut -d, -f5)
    fixid=$(printf '%s' "$tags" | cut -d, -f6)
    objnm=$(printf '%s' "$tags" | cut -d, -f7)
    locn=$(printf '%s' "$tags" | cut -d, -f8)
    fname=$(basename "$file")
    echo "Working on file $fname now."
    # (rest of the loop body not shown)
done

I decided I wanted it to work faster, whatever it was run in.

Recently, I came across a line -- you might call it a command 'pair' -- in another script

exec 5<foo.txt
while IFS="," read -u5

for which I got help either here or on LQ. I've applied it to other scripts, and I've seen, yet again, the difference in speed a builtin often makes over an external binary. All those scripts had something in common, however: the data they read in was separated by a single delimiter, a colon or a comma, but not one of the former and several of the latter, as in my example above.
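Spelled out in full, that single-delimiter pattern might look like the sketch below. The temporary file, its sample contents and the field names a, b and c are made up for illustration; mktemp and rm are used only to set up and tear down the demo, while the loop itself runs on builtins.

```bash
#!/usr/bin/env bash
# Sketch only: the sample data and the names data, a, b, c are assumptions.
data=$(mktemp)
printf '%s\n' 'one,two,three' 'four,five,six' > "$data"

exec 5< "$data"                 # open the file for reading on descriptor 5
count=0
while IFS=',' read -r -u 5 a b c; do   # IFS is changed for this read only
    count=$((count + 1))
    last="$a|$b|$c"
    echo "line $count: a=$a b=$b c=$c"
done
exec 5<&-                       # close descriptor 5 when finished
rm -f "$data"
```

Because the loop reads from a file descriptor rather than from a pipe, it runs in the current shell, so variables set inside it (count and last here) are still available afterwards.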

Re-working it with as much as I can get my mind around in terms of the 'read' and 'exec' commands, with only the most modest looking about on coder/scripter bulletin-board sites etc., I still can't get it to split the data in the $tags variable.

It looks like my approach is all wrong. So what would be a more useful way to go about it, keeping in mind that I want to employ more (all, if possible) builtins and stay away from "/bin" externals?

BZT

pravin27 · May 6, 2011, 2:42am

Could this help you?

 echo 'example.jpg:QVC,honeywell-labs.com,COM,blah' | awk -F"[:,]" '{for(i=1;i<=NF;i++) { print "fld"i,"-",$i}}'
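A builtin-only counterpart to the awk line is possible too, since read splits on every character listed in IFS. What follows is a minimal sketch, assuming no field contains a literal colon or comma (a Windows-style path such as C:\pics would break it). The sample line is the one from the awk example, the variable names are borrowed from the original script, and the array name fields is made up.

```bash
#!/usr/bin/env bash
# Sketch: split one record on both ':' and ',' using only builtins.
# Assumes no field contains a literal colon or comma.
line='example.jpg:QVC,honeywell-labs.com,COM,blah'

# Both delimiters go into IFS for this one read; -a fills an array.
IFS=':,' read -r -a fields <<< "$line"

file=${fields[0]}
cate=${fields[1]}
cred=${fields[2]}
sour=${fields[3]}
writ=${fields[4]}
fname=${file##*/}      # builtin stand-in for basename

echo "file=$fname cate=$cate cred=$cred sour=$sour writ=$writ"
```

Inside a loop the same read can take its input from the file or descriptor instead of a here-string, so no extra process is started per line.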

SilversleevesX · May 6, 2011, 6:56am

You might say I kinda bailed. I decided that, since I was formatting the data to work with the older (and in Cygwin, slower) script, whether the lines had one delimiter or two didn't matter so much as getting the processing time down with the built-ins.

So I tried a string similar to the example I gave in my OP, removed one 'while IFS="" ' from inside another while/do/done loop (with which I'd been trying to break up the $tags variable), and on the single delimiter -- this time a comma, though it could just as easily have been a colon or a caret (^) -- it ran as fast in Cygwin as the other one has always done in GNOME Terminal, give or take a few dozen ticks.

Which is OK with me.
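For the two-delimiter format, breaking up $tags may not need an inner loop at all: a second read fed from a here-string can do it. This is only a sketch under assumptions. The sample record is invented, the variable names are the ones from the first script, and it assumes the file path itself contains no colon.

```bash
#!/usr/bin/env bash
# Sketch: two-step split with builtins only; the sample record is made up.
line='/pics/example.jpg:News,Staff,Wire,J. Doe,T001,F42,Sunset,Boston'

# Step 1: split at the first colon; everything after it lands in tags.
IFS=':' read -r file tags <<< "$line"

# Step 2: split the tags on commas, with no inner loop needed.
IFS=',' read -r cate cred sour writ trans fixid objnm locn <<< "$tags"

fname=${file##*/}    # strip the directory part, like basename
echo "Working on file $fname now."
```

Since only the comma is in IFS during the second read, spaces inside a field (as in "J. Doe") are left alone.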

The command string you suggested won't go to waste, though. I've found another old script that is also written to read text data delimited by two different bits of punctuation. I've got a slightly-faster version worked up now, but I think it could be made even zippier. I'll C&P your line and see if I can use it with this other script.

Thanks for the quick reply.

BZT