Is there a way to make this more efficient?

I have the following code.

printf 'Test Message Report\n' > report.txt
while IFS= read -r line
do
msgid=$(printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g' -e 's|.*ex:Msg\(.*\)ex:Msg.*|\1|')
putdate=$(printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g' -e 's|.*PutDate\(.*\)PutTime.*|\1|')
puttime=$(printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g' -e 's|.*PutTime\(.*\)ApplOrgin.*|\1|')
timestamp=$(printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g' -e 's|.*CreationTimeStamp\(.*\)CreationTimeStamp.*|\1|')
printf 'MsgId = %s\n' "$msgid" >> report.txt
printf 'Put Date = %s\n' "$putdate" >> report.txt
printf 'Put Time = %s\n' "$puttime" >> report.txt
printf 'Timestamp = %s\n' "$timestamp" >> report.txt
done < temp01

Basically, we are producing a report from the messages taken off an MQ queue. The messages contain some characters that throw sed, so it sometimes extracts the data I am after and sometimes does not. I got around this problem by cutting, from each record (one record equates to one message taken off the queue), the portion of the data that contains the information I am after, and passing it through the first sed expression (sed -e 's/[^a-zA-Z0-9+_:-]//g') to keep only the characters of interest. The data then goes through the second sed expression to extract the required information. From each record I need seven different bits of information; I have shown four, and the process is the same for the other three.

Is there a way of improving this piece of code? I wanted to save the reformatted portion of the data (printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g') in a variable and then pass it to sed, but sed only accepts a file as input. That would get rid of the repetitive code. Please note, I cannot use Perl.

Because the teacher will give you an F? Why? Even if you're not root, you can always install Perl in your home directory.

Why is it important to do the cut?

Can you provide some sample input?

#  echo "sed only accepts a file as input" | sed 's/ only\(.* a \).*\(as .*\)/\1stream \2/g' 
sed accepts a stream as input

I was thinking of passing sed a variable, for example:

while read line
do
record=$(printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g')
puttime=$(sed<$record 's/[^a-zA-Z0-9+_:-]//g' -e 's|.*PutTime\(.*\)ApplOrgin.*|\1|')
done < temp01

I get a message stating it "cannot locate directory or file". I have tried passing ${record} too.
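The error comes from the shell rather than from sed: `<` expects a file name, so `sed<$record` makes the shell try to open a file whose name is the cleaned record itself. A short sketch with an invented value standing in for a real record:

```shell
#!/bin/sh
# Invented value standing in for a cleaned record (no real data was posted).
record='PutTime10154200ApplOrgin'

# "<" treats the value as a file name; no such file exists, so the shell
# reports an error and sed is never run.
if ! sed 's|.*PutTime\(.*\)ApplOrgin.*|\1|' <"$record"; then
  echo "redirection failed: there is no file named $record" >&2
fi

# Sending the value itself to sed's standard input works.
printf '%s\n' "$record" | sed 's|.*PutTime\(.*\)ApplOrgin.*|\1|'   # 10154200
```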

#  record="sed only accepts a file as input"

#  echo $record |sed 's/ only\(.* a \).*\(as .*\)/\1stream \2/g'
sed accepts a stream as input
while IFS= read -r line
do
   record=$(printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g')
   puttime=$(echo "$record" | sed   's|.*PutTime\(.*\)ApplOrgin.*|\1|')
done < temp01
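Extending that loop to the four fields from the question gives something like the sketch below. The record is cut and cleaned once per line, so only the extraction step is repeated per field (each field still costs one sed process). The sample record is invented, since no real message was posted, and `make_report` is only an illustrative name.

```shell
#!/bin/sh
# Cut and clean each record once, then run only the extraction per field.
# Markers are copied from the question; make_report is a hypothetical name.
make_report() {
  printf 'Test Message Report\n'
  while IFS= read -r line
  do
    record=$(printf '%s\n' "$line" | cut -c1-6000 | sed -e 's/[^a-zA-Z0-9+_:-]//g')
    msgid=$(printf '%s\n' "$record" | sed -e 's|.*ex:Msg\(.*\)ex:Msg.*|\1|')
    putdate=$(printf '%s\n' "$record" | sed -e 's|.*PutDate\(.*\)PutTime.*|\1|')
    puttime=$(printf '%s\n' "$record" | sed -e 's|.*PutTime\(.*\)ApplOrgin.*|\1|')
    timestamp=$(printf '%s\n' "$record" | sed -e 's|.*CreationTimeStamp\(.*\)CreationTimeStamp.*|\1|')
    printf 'MsgId = %s\nPut Date = %s\nPut Time = %s\nTimestamp = %s\n' \
      "$msgid" "$putdate" "$puttime" "$timestamp"
  done
}

# Invented sample record; the real MQ message layout was never posted.
sample="PutDate '20090312' PutTime '10154200' ApplOrgin ' ' <ex:Msg>ABC123</ex:Msg> <CreationTimeStamp>2009-03-12T10:15:42</CreationTimeStamp>"
printf '%s\n' "$sample" | make_report

# With the real file: make_report < temp01 > report.txt
```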

Jean-Pierre.

Depends on your shell

#!/bin/bash
a=test
sed 's/t/b/' <<< $a
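`<<<` here-strings exist in bash, ksh93 and zsh but are not part of POSIX sh. On a shell without them, a here-document or a pipe does the same job; a minimal sketch:

```shell
#!/bin/sh
a=test

# Here-document: accepted by any POSIX shell, unlike the <<< here-string.
sed 's/t/b/' <<EOF
$a
EOF
# prints: best

# Pipe: printf '%s\n' avoids echo's differences between shells
# (option parsing, backslash handling).
printf '%s\n' "$a" | sed 's/t/b/'
# prints: best
```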

@OP you should provide a sample input.

The reply I already posted in a related thread, "Passing a variable to sed or cut", can easily be extended to cope with this case as well. Whether you use a single sed script or a single awk script is a matter of taste; I picked awk because it's less arcane.

awk '{ f=substr($0, 1, 6000); gsub (/[^a-zA-Z0-9+_:-]/, "", f);
    if (f ~ ... hit on ex:Msg ...) ... extract and print ex:Msg data ...;
    if (f ~ ... hit on PutDate ...) ...extract and print PutDate ...;
    ... etc. }' temp01

With better information about what the input is supposed to look like, the dotted parts could be made less speculative.
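A runnable version of that awk outline, under two assumptions: the sample record below is invented (no real input was posted), and each field is taken as the text after the first marker up to the next occurrence of the second marker, like `.*?` in Perl rather than the greedy `.*` of the sed commands. `awk_report` and `between` are illustrative names, not anything from the thread.

```shell
#!/bin/sh
# Single awk process for the whole file. awk_report/between are hypothetical names.
awk_report() {
  awk '
  # between(s, a, b): text after the first "a" up to the next "b"; "" if absent.
  function between(s, a, b,    i, rest, j) {
    i = index(s, a)
    if (i == 0) return ""
    rest = substr(s, i + length(a))
    j = index(rest, b)
    if (j == 0) return ""
    return substr(rest, 1, j - 1)
  }
  BEGIN { print "Test Message Report" }
  {
    f = substr($0, 1, 6000)            # roughly cut -c1-6000
    gsub(/[^a-zA-Z0-9+_:-]/, "", f)    # same character filter as the question
    print "MsgId = "     between(f, "ex:Msg", "ex:Msg")
    print "Put Date = "  between(f, "PutDate", "PutTime")
    print "Put Time = "  between(f, "PutTime", "ApplOrgin")
    print "Timestamp = " between(f, "CreationTimeStamp", "CreationTimeStamp")
  }' "$@"
}

# Invented sample record; with the real file: awk_report temp01 > report.txt
sample="PutDate '20090312' PutTime '10154200' ApplOrgin ' ' <ex:Msg>ABC123</ex:Msg> <CreationTimeStamp>2009-03-12T10:15:42</CreationTimeStamp>"
printf '%s\n' "$sample" | awk_report
```

The whole file is handled by one awk process instead of several printf/cut/sed processes per field per record, which is usually where loops like the original spend most of their time. A field whose markers are missing comes out empty.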

... Actually with Perl it gets rather simple.

perl -ne '$f = substr($_, 0, 6000);
  if ($f =~ /ex:Msg(.*?)ex:Msg/) { print "MsgId = $1\n"; }
  if ($f =~ /PutDate(.*?)PutTime/) { print "Put Date = $1\n"; }
  if ($f =~ /PutTime(.*?)ApplOrgin/) { print "Put Time = $1\n"; }
  if ($f =~ /CreationTimeStamp(.*?)CreationTimeStamp/) { print "Timestamp = $1\n"; }' temp01

... And finally here's sed:

cut -c1-6000 temp01 | sed -n '
  # Copy data to hold space
  h
  # Substitute MsgId and print
  s/.*ex:Msg\(.*\)ex:Msg.*/MsgId = \1/p
  # Copy back from hold space
  g
  # Similar for Put Date, Put Time, and Timestamp
  s/.*PutDate\(.*\)PutTime.*/Put Date = \1/p
  g
  s/.*PutTime\(.*\)ApplOrigin.*/Put Time = \1/p
  g
  s/.*CreationTimeStamp\(.*\)CreationTimeStamp.*/Timestamp = \1/p'

The cut could be replaced by s/^\(.\{6000\}\).*/\1/ if your sed supports an interval expression with a bound that large, or by a similar expression with 6000 periods inside the parentheses, if your sed can cope with that. Or you could simply hope that any match for the expressions above will occur within the first 6,000 characters, and omit the cut altogether. Again, without sample data, it's hard to say.
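The interval-expression form can be checked on a short string before trusting it with 6000. POSIX only guarantees repetition counts up to RE_DUP_MAX, which may be as low as 255, so whether \{6000\} is accepted depends on the sed at hand; `getconf RE_DUP_MAX` reports the limit on many systems. A small sketch, with N set to 4 purely for readability:

```shell
#!/bin/sh
# Keep only the first N characters of a line, two ways (N = 4 for the demo).
s=abcdefghij
printf '%s\n' "$s" | cut -c1-4                    # abcd
printf '%s\n' "$s" | sed 's/^\(.\{4\}\).*/\1/'    # abcd

# A line shorter than N does not match, so sed leaves it unchanged,
# just as cut does.
printf '%s\n' abc | sed 's/^\(.\{4\}\).*/\1/'     # abc
```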

Sorry for following up on my own posts; sleeping on this question brought up new ideas.