Is there a way to make this more efficient

gugs · August 11, 2008, 9:50am

I have the following code.

Basically, we are producing a report from messages taken off an MQ queue. The messages contain some characters that throw sed off, so it sometimes extracts the data I am after and sometimes does not. I got around this by cutting the portion that contains the information I need from each record (one record equates to one message taken off the queue) and passing it through a first sed command (sed -e 's/[^a-zA-Z0-9+_:-]//g') to keep only the characters of interest. The data is then put through a second sed command to extract the required information. From each record I need seven different pieces of information; I have shown four, and the process is the same for the other three.

Is there a way of improving this piece of code? I wanted to save the reformatted portion of the data (printf "%s\n" "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g') into a variable and then pass it to sed, but sed only accepts a file as input. This would get rid of the repetitive code. Please note, I cannot use Perl.

BMDan · August 11, 2008, 9:52am

Because the teacher will give you an F? Why? Even if you're not root, you can always install Perl in your home directory.

Why is it important to do the cut?

Can you provide some sample input?

Tytalus · August 11, 2008, 10:50am

#  echo "sed only accepts a file as input" | sed 's/ only\(.* a \).*\(as .*\)/\1stream \2/g' 
sed accepts a stream as input

gugs · August 11, 2008, 11:20am

I was thinking of passing sed a variable, for example:

I get a message stating it "cannot locate directory or file". I have tried passing ${record} too.

Tytalus · August 11, 2008, 11:38am

#  record="sed only accepts a file as input"

#  echo $record |sed 's/ only\(.* a \).*\(as .*\)/\1stream \2/g'
sed accepts a stream as input

aigles · August 11, 2008, 11:38am

while read line
do
   record=$(printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g')
   puttime=$(echo "$record" | sed 's|.*PutTime\(.*\)ApplOrgin.*|\1|')
done < temp01

Jean-Pierre.
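
To remove the repetition for the remaining fields, the same idea can be wrapped in two small functions. This is only a sketch: the names sanitize and extract are made up here, the here-document is invented sample data standing in for temp01, and the marker names are the ones quoted elsewhere in this thread.

```shell
#!/bin/sh
# sanitize LINE: keep the first 6000 characters, then drop every character
# outside the whitelist used in the thread.
sanitize() {
    printf '%s\n' "$1" | cut -c1-6000 | sed 's/[^a-zA-Z0-9+_:-]//g'
}

# extract START STOP: print what lies between the two markers in $record.
# Prints nothing when the markers are not found (sed -n plus the p flag).
extract() {
    printf '%s\n' "$record" | sed -n "s|.*$1\(.*\)$2.*|\1|p"
}

# The here-document below is hypothetical data, not real MQ output.
while IFS= read -r line
do
    record=$(sanitize "$line")
    printf 'Put Date = %s\n' "$(extract PutDate PutTime)"
    printf 'Put Time = %s\n' "$(extract PutTime ApplOrgin)"
done <<'EOF'
PutDate 20080811 PutTime 09500000 ApplOrgin x
PutDate 20080812 PutTime 10150000 ApplOrgin y
EOF
```

Each record is cleaned once and every further field costs one extra line. It still starts several processes per record, so for large inputs a single awk or sed pass over the whole file is likely to be faster.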

danmero · August 11, 2008, 1:03pm

Depends on your shell

#!/bin/bash
a=test
sed 's/t/b/' <<< $a

@OP you should provide a sample input.
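
The <<< here-string is a bash/ksh93/zsh feature. In a plain POSIX sh, a pipe or a here-document should give the same result; a minimal sketch, where the variable names piped and heredoc are only for illustration:

```shell
#!/bin/sh
a=test

# 1. Pipe: printf is more predictable than echo for arbitrary data.
piped=$(printf '%s\n' "$a" | sed 's/t/b/')

# 2. Here-document: the shell expands $a and hands the text to sed on stdin.
heredoc=$(sed 's/t/b/' <<EOF
$a
EOF
)

printf '%s\n' "$piped" "$heredoc"
```

Both lines of output read "best", since s/t/b/ without the g flag replaces only the first t.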

era · August 12, 2008, 2:58am

The reply I already posted in a related thread, "Passing a variable to sed or cut", can easily be extended to cope with this as well. Whether you use a single sed script or a single awk script is a matter of taste; I picked awk because it's less arcane.

awk '{ f=substr($0, 1, 6000); gsub (/[^a-zA-Z0-9+_:-]/, "", f);
    if (f ~ ... hit on ex:Msg ...) ... extract and print ex:Msg data ...;
    if (f ~ ... hit on PutDate ...) ...extract and print PutDate ...;
   ... etc' temp01

With better information about what the input is supposed to look like, the dotted parts could be made less speculative.
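
In the absence of real input, here is one way the dotted parts might be filled in. Everything about the sample record is an assumption made up for this sketch, as is the helper function between(); only the marker names and the character whitelist come from the thread.

```shell
#!/bin/sh
# Hypothetical record, invented for illustration; real MQ messages will differ.
sample='<ex:Msg>ID42</ex:Msg> PutDate 20080811 PutTime 09500000 ApplOrgin x <CreationTimeStamp>2008-08-11T09:50:00</CreationTimeStamp>'

report=$(printf '%s\n' "$sample" | awk '
# Return the text between the first "start" marker and the next "stop"
# marker, or an empty string if either marker is missing.
function between(s, start, stop,    i, j, rest) {
    i = index(s, start)
    if (i == 0) return ""
    rest = substr(s, i + length(start))
    j = index(rest, stop)
    if (j == 0) return ""
    return substr(rest, 1, j - 1)
}
{
    # Same preparation as the cut | sed pair: first 6000 chars, whitelist only.
    f = substr($0, 1, 6000)
    gsub(/[^a-zA-Z0-9+_:-]/, "", f)

    v = between(f, "ex:Msg", "ex:Msg");   if (v != "") print "MsgId = " v
    v = between(f, "PutDate", "PutTime"); if (v != "") print "Put Date = " v
    v = between(f, "PutTime", "ApplOrgin"); if (v != "") print "Put Time = " v
    v = between(f, "CreationTimeStamp", "CreationTimeStamp")
    if (v != "") print "Timestamp = " v
}')

printf '%s\n' "$report"
```

For real use, replace the printf pipe with the file name (temp01) as the last argument to awk. The function uses index() and substr() rather than regular expressions, so it stops at the first closing marker and works in any POSIX awk.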

era · August 12, 2008, 5:52am

... Actually with Perl it gets rather simple.

perl -ne '$f = substr($_, 0, 6000);
  if ($f =~ /ex:Msg(.*?)ex:Msg/) { print "MsgId = $1\n"; }
  if ($f =~ /PutDate(.*?)PutTime/) { print "Put Date = $1\n"; }
  if ($f =~ /PutTime(.*?)ApplOrgin/) { print "Put Time = $1\n"; }
  if ($f =~ /CreationTimeStamp(.*?)CreationTimeStamp/) { print "Timestamp = $1\n"; }' temp01

era · August 13, 2008, 2:38am

... And finally here's sed:

cut -c1-6000 temp01 | sed -n '
  # Copy data to hold space
  h
  # Substitute MsgId and print
  s/.*ex:Msg\(.*\)ex:Msg.*/MsgId = \1/p
  # Copy back from hold space
  g
  # Similar for Put Date, Put Time, and Timestamp
  s/.*PutDate\(.*\)PutTime.*/Put Date = \1/p
  g
  s/.*PutTime\(.*\)ApplOrgin.*/Put Time = \1/p
  g
  s/.*CreationTimeStamp\(.*\)CreationTimeStamp.*/Timestamp = \1/p'

The cut could be replaced by s/^\(.\{6000\}\).*/\1/ if your sed supports that syntax, or a similar expression with 6000 periods inside the parentheses, if your sed can cope with that. Or you could simply hope that every match on the expressions above will happen within the first 6,000 characters, and omit the cut altogether. Again, without sample data, it's hard to say.
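
A quick way to check whether a given sed accepts the interval form is to try it with a small count first. The sketch below uses n=4 purely to keep the demo readable; the variable names are illustrative.

```shell
#!/bin/sh
# Keep only the first n characters of each line using sed alone.
# n=4 is a demo value; the thread's value would be 6000.
n=4
truncated=$(printf '%s\n' 'abcdefghij' 'xyz' | sed "s/^\(.\{$n\}\).*/\1/")
printf '%s\n' "$truncated"
```

This prints "abcd" and then "xyz": lines shorter than n do not match and pass through unchanged, which is the same result cut would give. Note that POSIX only guarantees interval counts up to RE_DUP_MAX, which may be as low as 255, so a count of 6000 can be rejected by some sed implementations even when the small test works.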

Sorry for following up on my own posts; sleeping on this question brought up new ideas.