Another sed/awk search=>replace question

f77coder · August 28, 2014, 11:40am

Hello,

Need a little bit of help. Basically I need to replace lines in a file which were calculated wrong, as it would take 12 hours to regenerate the data. I need to calculate values based on other files, which I've managed to figure out with grep/cut, but now I am stuck on how to shove these new calculated values into another file.

#!/usr/local/bin/bash
array=(61329025
61333750
61348174
61368157
61371144
61374807
61406775
61413072
61416814
61418304
61421223
61423127
61432337)

file2mod=sub_c18.csv

for i in "${array[@]}"
do
  a=`grep $i sub_c11* |cut -f2 -d","`
  b=`grep $i sub_c13* |cut -f2 -d","`
  c=`grep $i sub_c7* |cut -f2 -d","`
  ratio=`echo "$a+$b+$c-$a*$b-$a*$c-$b*$c" | bc -l`
  printf "%s, %0.8f\n" $i $ratio>>sanitycheck.txt
  sed -i "s/^a$i*//$i , $ratio/" $file2mod
done
echo "done"

sub_c18.csv looks like this

.
.
61406774,0.553508428
61406775,a2f14cf2
61406776,0.169964852
.
.

but it returns

sed: 1: "sub_c18.csv": unterminated substitute pattern

Thanks for any help.

rbatte1 · August 28, 2014, 11:49am

Welcome f77coder,

Thanks for using CODE tags straight away. It's a pleasure to have you here.

At first glance, your code defines a variable array but with an open parenthesis ( and a close brace } (I think those are the right names), so there is a mismatch. Is this a typo or copy & paste error, or could this be your problem?

Regards,
Robin

f77coder · August 28, 2014, 11:53am

Thanks, but that's not the issue; in my code it has parentheses. I'll fix it.

Scrutinizer · August 28, 2014, 12:33pm

In your sed statement there are 2 slashes next to each other. Perhaps that should be one?

f77coder · August 28, 2014, 12:39pm

Thanks for reply.

I removed one of them but still get this error.

sed: 1: "sub_c18.csv": unterminated substitute pattern

From what I've read the double '//' are needed to end the wildcard???

shamrock · August 28, 2014, 12:45pm

Can you re-post your modified code...

f77coder · August 28, 2014, 12:57pm

Sure. Thanks for looking.

#!/usr/local/bin/bash
array=(61329025
61333750
61348174
61368157
61371144
61374807
61406775
61413072
61416814
61418304
61421223
61423127
61432337)

file2mod=sub_c18.csv

for i in "${array[@]}"
do
  a=`grep $i sub_c11* |cut -f2 -d","`
  b=`grep $i sub_c13* |cut -f2 -d","`
  c=`grep $i sub_c7* |cut -f2 -d","`
  ratio=`echo "$a+$b+$c-$a*$b-$a*$c-$b*$c" | bc -l`
  printf "%s, %0.8f\n" $i $ratio>>sanitycheck.txt
  sed -i "s/^a$i*//$i , $ratio/" $file2mod
done
echo "done"

Scrutinizer · August 28, 2014, 1:00pm

What do you mean by "end the wildcard"?

Also, that asterisk (regex repetition operator) is probably wrong there. What is its intended function?

Otherwise, can you show a sample line from the original file and what you want to replace it with?
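
To illustrate the point about the asterisk: in a regular expression, `*` repeats only the item directly before it, whereas `.*` matches any run of characters up to the end of the line. A small sketch, assuming a sample line in the same shape as the data file (the replacement text `X` and the variable names are made up for the demonstration):

```shell
#!/usr/bin/env bash
# Sample line modelled on the data file; trailing space included on purpose.
line='61406775,a2f14cf2 '

# "5*" means "zero or more 5s": only the id is matched, the rest of the line stays.
star_only=$(printf '%s\n' "$line" | sed 's/^61406775*/X/')

# ".*" means "any characters, any number of times": the whole line is matched.
dot_star=$(printf '%s\n' "$line" | sed 's/^61406775.*/X/')

# Brackets make leftover text and trailing spaces visible.
printf '[%s]\n[%s]\n' "$star_only" "$dot_star"
```

With the first form, everything after the id survives the substitution, so it cannot be used for whole-line replacement; the second form consumes the rest of the line.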

f77coder · August 28, 2014, 1:12pm

This data file sub_c18.csv looks like this

.
.
61406774,0.553508428
61406775,a2f14cf2
61406776,0.169964852
.
.

Where 61406775,a2f14cf2 is a bad line of data to be replaced with something like 61406775,0.7463839

a2f14cf2 hex number is random.

Don't I need the wildcard to match the complete line? I want to do whole line substitution.

Scrutinizer · August 28, 2014, 1:25pm

Try:

"s/^$i,.*/$i , $ratio/"

or

"s/^$i *,.*/$i , $ratio/"

if there can be zero or more spaces before the comma

Try it without the -i option first..

f77coder · August 28, 2014, 2:06pm

I tried both and they both started looping without stopping and not substituting.

After looking closely, there is a white space after the hex number.

Scrutinizer · August 28, 2014, 2:12pm

After the hex number? That should not make a difference, as .* should cover that..

What starts looping? What did you try exactly?
If I put your sample in a file, I get this:

$ cat somefile
.
.
61406774,0.553508428
61406775,a2f14cf2
61406776,0.169964852
.
.
$ i=61406775 ratio=0.7463839
$ sed "s/^$i *,.*/$i , $ratio/" somefile
.
.
61406774,0.553508428
61406775 , 0.7463839
61406776,0.169964852
.
.

f77coder · August 28, 2014, 2:25pm

This is the code I used.

#!/usr/local/bin/bash
array=(61329025
61333750
61348174
61368157
61371144
61374807
61406775
61413072
61416814
61418304
61421223
61423127
61432337
61449549
61449715
61451883
61453612
61457479)

file2mod=sub_c18.csv

for i in "${array[@]}"
do
  a=`grep $i sub_c11* |cut -f2 -d","`
  b=`grep $i sub_c13* |cut -f2 -d","`
  c=`grep $i sub_c7* |cut -f2 -d","`
  ratio=`echo "$a+$b+$c-$a*$b-$a*$c-$b*$c" | bc -l`
  sed "s/^$i *,.*/$i , $ratio/" $file2mod 
done
echo "done"

part of sub_c18.csv file

60000000,0.22370241
60000001,0.192148371
60000002,0.235125105
60000003,0.553508428
60000004,0.20143351
60000005,0.169964852
60000006,0.272900889
60000007,0.103461449
60000008,0.0722885972
60000009,0.300323534
60000010,0.169650792
60000011,0.169964852
60000012,0.256249637
60000013,0.551763018
60000014,0.192014202
60000015,0.103461449
60000016,0.103461449
60000017,0.445011978
60000018,0.433271354
60000019,0.230131215
60000020,0.218191459
60000021,0.213877195
60000022,0.0722885972
60000023,0.295587426
60000024,0.0472011667
60000025,0.0472011667
60000026,0.335213292
60000027,0.305288462
...

junior-helper · August 28, 2014, 2:41pm

I think it's got to be sed "s/^$i.*/$i , $ratio/"

---------- Post updated at 08:37 PM ---------- Previous update was at 08:35 PM ----------

Worked for me as I've tried to systematically trace it...

f77coder · August 28, 2014, 3:32pm

Where did you put the file name?

I tried this

  sed "s/^$i.*/$i , $ratio/" <$file2mod> test0

and the script goes into an infinite loop of re-writing the file, test0.

junior-helper · August 28, 2014, 4:02pm

Right after the sed command...

sed "s/^$i.*/$i , $ratio/" sub_c18.csv

I copied your original command, set the $i and $ratio variables manually and played with it until it worked...

---------- Post updated at 10:02 PM ---------- Previous update was at 09:56 PM ----------

I just tried Scrutinizer's code, it also works perfectly (I should have tried it before posting though)

f77coder · August 28, 2014, 4:25pm

Now when I add '-i' to do inline substitution, I still get my original error

sed -i "s/^$i.*/$i , $ratio/" sub_c18.csv

sed: 1: "sub_c18.csv": unterminated substitute pattern

Ugh...

junior-helper · August 28, 2014, 4:58pm

I think I just figured out what you mean by an infinite loop. I also tested it in a loop with the sample data (a csv file with just 3 lines), not just manually on the command line, and it did look like an infinite loop at first. Then I realized the csv is parsed 13 times, once for each of the 13 items in the array.

Here you can clearly see how 61406775 is matched in the 7th round.

$ ./sub.sh 
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775 , 0.324324234234
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
61406774,0.553508428
61406775,a12324
61406776,0.169964852
done
$

Note that the sed command is much faster with the -i option or even redirection into another file because the output is not printed to the (slow) stdout.
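
A minimal sketch of the redirection approach, assuming a small demo file (`demo_sub_c18.csv` and the values in it are made up for illustration). It writes sed's output to a temporary file and renames that over the original, so `-i` is not needed at all. This may matter here: BSD sed expects a backup suffix right after `-i`, and without one it takes the sed script as the suffix and then tries to parse the file name as the script, which yields an "unterminated substitute pattern" error naming the file. Whether that is what happened in this thread is an assumption, since nobody has said which sed is in use.

```shell
#!/usr/bin/env bash
# Demo input; the file name and values are hypothetical.
printf '%s\n' \
  '61406774,0.553508428' \
  '61406775,a2f14cf2 ' \
  '61406776,0.169964852' > demo_sub_c18.csv

file2mod=demo_sub_c18.csv
i=61406775
ratio=0.7463839

# Write the edited copy to a temp file, then rename it over the original
# only if sed succeeded. No -i involved, so GNU and BSD sed behave the same.
sed "s/^$i,.*/$i,$ratio/" "$file2mod" > "$file2mod.tmp" &&
  mv "$file2mod.tmp" "$file2mod"

cat "$file2mod"
```

On a system where `-i` does require a suffix, passing one explicitly (for example `-i.bak`) is the other common way around the problem; the temp-file form above just avoids depending on either behaviour.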

Can you post your current script?

---------- Post updated at 10:58 PM ---------- Previous update was at 10:54 PM ----------

I'd be happy if some "senior" person could confirm my assumption though.

f77coder · August 28, 2014, 5:14pm

I don't understand the looping problem. Shouldn't sed scan the array and find the value on the first loop?

for i in "${array[@]}"
do
  a=`grep $i sub_c11* |cut -f2 -d","`
  b=`grep $i sub_c13* |cut -f2 -d","`
  c=`grep $i sub_c7* |cut -f2 -d","`
  ratio=`echo "$a+$b+$c-$a*$b-$a*$c-$b*$c" | bc -l`
  sed -i "s/^$i.*/$i , $ratio/" sub_c18.csv
done
echo "done"

Appreciate the help btw.

junior-helper · August 28, 2014, 5:29pm

No, the for loop goes through the array and passes the values one by one, placed in the $i variable, to sed.
sed does the text transformation part only.
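
Because each pass of the loop makes sed read the whole file again, one alternative is to collect all the substitutions first and run sed a single time. A rough sketch, assuming the ratios have already been computed (the `ids` and `ratios` values and the `demo_loop*.csv` file names are invented for the example, not taken from the real data):

```shell
#!/usr/bin/env bash
# Demo input; names and values are hypothetical.
printf '%s\n' \
  '61406774,0.553508428' \
  '61406775,a2f14cf2 ' \
  '61406776,0.169964852' \
  '61413072,deadbeef' > demo_loop.csv

ids=(61406775 61413072)
ratios=(0.7463839 0.1234567)   # stand-ins for the values bc would produce

# Build one sed script with a substitute command per id, one per line.
script=""
for n in "${!ids[@]}"; do
  script+="s/^${ids[$n]},.*/${ids[$n]},${ratios[$n]}/"$'\n'
done

# A single pass over the file applies every substitution.
sed "$script" demo_loop.csv > demo_loop.fixed.csv
cat demo_loop.fixed.csv
```

The per-id loop in the script above is still fine for a handful of ids; a single pass mainly helps when the file is large or the list of ids is long.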

---------- Post updated at 11:29 PM ---------- Previous update was at 11:26 PM ----------

Your latest script looks good imho...