Interpolation of two values in two different files

Dear All,

I have two files which contain numerical data and strings. I want to create a new file in which only the numerical data are revised, by interpolating between the values in the two files. I guess AWK could do this, but I am new to AWK.

FileA.txt

.
.
index_2("0.1, 1, 2, 4, 8, 16, 32");
values("0.0330208, 0.0345557, 0.0409915" \
 "0.0329737, 0.0344924, 0.0483888", \
 "0.0336141, 0.0351873, 0.0370214");
.
.

FileB.txt

.
.
index_2("0.1, 1, 2, 4, 8, 16, 32");
 values("0.0624608, 0.0654286, 0.0684585", \
 "0.0707829, 0.0737413, 0.0767505", \
 "0.0880787, 0.0909583, 0.0939307");
.
.

FileNew.txt

.
.
index_2("0.1, 1, 2, 4, 8, 16, 32");
 values("(ValueA+ValueB)/2, (ValueA+ValueB)/2, (ValueA+ValueB)/2", \
 "(ValueA+ValueB)/2, (ValueA+ValueB)/2, (ValueA+ValueB)/2", \
 "(ValueA+ValueB)/2, (ValueA+ValueB)/2, (ValueA+ValueB)/2");
.
.

There are two tasks in my work:
1) Find "index_2(" using an if statement.
2) Create a new file in which only the values change (e.g., NewValue = (ValueA+ValueB)/2).

Any comments would be appreciated. Thank you in advance.

Best,

Jaeyoung

Is this a homework assignment? (Homework and coursework questions can only be posted in the Homework and Coursework forum under special homework rules listed here.)

The quoting in your sample data is very strange. Instead of:

index_2("0.1, 1, 2, 4, 8, 16, 32");
values("0.0330208, 0.0345557, 0.0409915" \
  Why is there no comma after the closing quote on the line above?
 "0.0329737, 0.0344924, 0.0483888", \
 "0.0336141, 0.0351873, 0.0370214");

we usually see quoting like:

index_2("0.1", "1", "2", "4", "8", "16", "32");
values("0.0330208", "0.0345557", "0.0409915", \
 "0.0329737", "0.0344924", "0.0483888", \
 "0.0336141", "0.0351873", "0.0370214");

or (especially when dealing with numeric values) no quotes:

index_2(0.1, 1, 2, 4, 8, 16, 32);
values(0.0330208, 0.0345557, 0.0409915, \
 0.0329737, 0.0344924, 0.0483888, \
 0.0336141, 0.0351873, 0.0370214);
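Whichever quoting style the real data use, awk can still reach the numbers as long as the layout is consistent. A minimal sketch (the helper name `show_fields` is made up for illustration) of how one line from the first post splits when the double quote is used as the field separator: all the comma-separated numbers land in `$2`.

```shell
#!/bin/sh
# Hypothetical helper: split one line on double quotes and show every field.
show_fields() {
    printf '%s\n' "$1" | awk -F'"' '{
        for (i = 1; i <= NF; i++)
            printf "$%d=[%s]\n", i, $i
    }'
}

show_fields 'values("0.0330208, 0.0345557, 0.0409915" \'
# $1=[values(]
# $2=[0.0330208, 0.0345557, 0.0409915]
# $3=[ \]
```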

Thank you for your reply. This is not homework.

The data format is given, so the commas and closing quotes are not the main problem.

Best,

Jaeyoung

---------- Post updated at 03:20 PM ---------- Previous update was at 03:14 PM ----------

Dear All,

Let me clarify my question. I have two files which contain numbers and text. The text in the two files is identical. I want to create a new file that contains the averages of the numbers from the two files.

FileA.txt (more than 10000 lines, with more than 1000 text and number entries)

textA
textB(10,2,2)
textC(2)
textD
.
.

FileB.txt (text identical to FileA.txt)

textA
textB(0,0,4)
textC(4)
textD
.
.

FileNew.txt (contains the averages of FileA.txt and FileB.txt)

textA
textB(5,1,3)
textC(3)
textD
.
.

One requirement is that no text may change. Only the numbers need to be changed.

I think AWK or diff could do this job.

Best,

Jaeyoung

Well, they usually are. Everything within quotes usually is considered ONE data point.
Are index_2 and values to be considered correlated? That would be difficult, as index_2 has 7 items while values has 9 (when all double quotes are removed).
Your leading and trailing dots imply that there is more content in either input file. Which of it should make it into the output file?
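The item count above can be checked mechanically. A rough sketch, assuming every continued statement ends in a backslash (the function name `count_items` is hypothetical): it joins the continuation lines, drops everything up to the first "(", and counts the numbers that remain.

```shell
#!/bin/sh
# Hypothetical helper: join backslash-continued lines, then print each
# statement name with the number of numeric items it contains.
count_items() {
    awk '{
        line = $0
        while (line ~ /\\[ \t]*$/ && (getline nxt) > 0)   # append continuation lines
            line = line nxt
        name = line
        sub(/\(.*/, "", name)                # text before the first "("
        gsub(/^[ \t]+/, "", name)            # drop indentation
        sub(/^[^(]*\(/, "", line)            # keep only what follows the first "("
        gsub(/[^0-9.]+/, " ", line)          # non-number characters become blanks
        print name, split(line, parts, " ")
    }'
}

count_items <<'EOF'
index_2("0.1, 1, 2, 4, 8, 16, 32");
values("0.0330208, 0.0345557, 0.0409915" \
 "0.0329737, 0.0344924, 0.0483888", \
 "0.0336141, 0.0351873, 0.0370214");
EOF
# index_2 7
# values 9
```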

Hi, jypark22

Your last try at clarifying your request has made it a bit more confusing, at least for me.

I think we need an example with a bit more information that would allow us to see what pattern the real file has. One block is not enough.

Could you confirm the pattern of the blocks? Is it like this in FileA.txt?

index_1("0.1, 1, 2, 4, 8, 16, 32");
values("0.0330207, 0.0345558, 0.0409915", \
 "0.0329737, 0.0344924, 0.0483888", \
 "0.0336141, 0.0351873, 0.0370214");
index_2("0.1, 1, 2, 4, 8, 16, 32");
values("0.0330208, 0.0345557, 0.0409915", \
 "0.0329737, 0.0344924, 0.0483888", \
 "0.0336141, 0.0351873, 0.0370214");
index_3("0.1, 1, 2, 4, 8, 16, 32");
values("0.0330203, 0.0345554, 0.0409915", \
 "0.0329737, 0.0344924, 0.0483888", \
 "0.0336141, 0.0351873, 0.0370214");

Would it ever have anything else in between block patterns?

Please use code tags as required by forum rules!

For your samples in post#3, this might work:

awk -F"(" '
FNR==NR         {if ($2) T[$1] = $2
                 next
                }
$1 in T         {n = split ($2, N, ",")
                 n = split (T[$1], M, ",")
                 $2 = "("
                 for (i=1; i<=n; i++) $2 = sprintf ("%s%.f,", $2, (N[i]+M[i])/2)
                 sub (/,$/, ")", $2)
                }
1
' OFS="" file[12]
textA
textB(5,1,3)
textC(3)
textD

Thank you for your comment, Aia and RudiC.

"index_2" is followed by numbers. There are no "index_1" and "index_2" in the data.

Also, there is no relationship between the texts. All text in the two files is identical; only the numbers differ between the two files.

Best,

Jaeyoung

With the samples in post#3 amended by your post#1 data, try

awk -F"(" '

FOUND           {while (/\\$/)  {getline X
                                 $0 = $0 X
                                }
                 FOUND = 0
                }

/^index_2/      {FOUND = 1
                }

                {gsub (/[ "\\;)]/, "", $2)
                }

FNR==NR         {if ($2) T[$1] = $2
                 next
                }

$1 in T         {n = split ($2, N, ",")
                 n = split (T[$1], M, ",")
                 $2 = "("
                 for (i=1; i<=n; i++) $2 = sprintf ("%s%f,", $2, (N[i]+M[i])/2)
                 sub (/,$/, ")", $2)
                }
1
' OFS="" file[12]
textA
index_2(0.100000,1.000000,2.000000,4.000000,8.000000,16.000000,32.000000)
values(0.047741,0.049992,0.054725,0.051878,0.054117,0.062570,0.060846,0.063073,0.065476)
textB(5.000000,1.000000,3.000000)
textC(3.000000)
textD

Correcting formatting issues is left as an exercise to the reader...
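One formatting detail worth knowing: `%f` always prints six decimals, whereas plain string concatenation converts a number with `CONVFMT` (default `"%.6g"`). That is why the later versions in this thread, which build `$2` by concatenation, print `0.1` where this one prints `0.100000`. A small comparison sketch (`fmt_demo` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: average two numbers and print the result twice,
# once with printf "%f" and once via string concatenation (CONVFMT).
fmt_demo() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        avg = (a + b) / 2
        printf "%%f: %f\n", avg
        s = "" avg                   # number-to-string conversion uses CONVFMT
        print "CONVFMT: " s
    }'
}

fmt_demo 0.1 0.1               # %f: 0.100000   CONVFMT: 0.1
fmt_demo 0.0330208 0.0624608   # %f: 0.047741   CONVFMT: 0.0477408
```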

Thank you, RudiC. I appreciate your help.

Could you explain your code in detail, line by line if possible?

Best,

Jaeyoung

---------- Post updated at 05:35 PM ---------- Previous update was at 02:22 PM ----------

Hi RudiC,

Your code works great.

However, it does not work if there is more than one array.

FileA.txt

.
.
index_2("0.1, 1, 2, 4, 8, 16, 32");
 values("0.0530208, 0.0345557, 0.0409915" \
 "0.0329737, 0.0344924, 0.0483888", \
 "0.0336141, 0.0351873, 0.0370214");
.
.
.
.
index_2("0.1, 1, 2, 4, 8, 16, 32");
 values("0.0330208, 0.0545557, 0.0409915" \
 "0.0329737, 0.0344924, 0.0483888", \
 "0.0336141, 0.0351873, 0.0370214");
.
.

FileB.txt

.
.
index_2("0.1, 1, 2, 4, 8, 16, 32");
 values("0.0624608, 0.0654286, 0.0684585", \
 "0.0707829, 0.0737413, 0.0767505", \
 "0.0880787, 0.0909583, 0.0939307");
.
.
.
.
index_2("0.1, 1, 2, 4, 8, 16, 32");
 values("0.0624608, 0.0654286, 0.0684585", \
 "0.0707829, 0.0737413, 0.0767505", \
 "0.0880787, 0.0909583, 0.0939307");
.
.

Result.txt

.
.
index_2(0.100000,1.000000,2.000000,4.000000,8.000000,16.000000,32.000000,0.000000)
 values(0.047741,0.059992,0.054725,0.052638,0.061065,0.055182,0.061633,0.063990,0.046965)
.
.
.
.
index_2(0.100000,1.000000,2.000000,4.000000,8.000000,16.000000,32.000000,0.000000)
 values(0.047741,0.059992,0.054725,0.052638,0.061065,0.055182,0.061633,0.063990,0.046965)
.
.

The result for the second array was not updated. Could you let me know how to revise your code so that it works correctly when there is more than one array?

Best,

Jaeyoung

awk -F"(" '                                     # make "(" the field separator

FOUND           {while (/\\$/)  {getline X      # if last line had "index_2", read and append
                                 $0 = $0 X      # all lines with a continuation char "\" to $0
                                }
                 FOUND = 0                      # done with the values
                }

/^index_2/      {FOUND = 1                      # remember that index_2 was found (used by the FOUND rule above)
                }

                {gsub (/[ "\\;)]/, "", $2)      # remove all occurrences of unwanted chars
                }

FNR==NR         {if ($2) T[$1] = $2             # for 1. file: store values (if present) in T arr
                 next                           # do not continue processing THIS line
                }

$1 in T         {n = split ($2, N, ",")         # split all values into N arr
                 n = split (T[$1], M, ",")      # split all 1. file values into M arr
                 $2 = "("                       # start new $2
                 for (i=1; i<=n; i++) $2 = $2 (N[i]+M[i])/2 ","   # add average values
                 sub (/,$/, ")", $2)            # to $2; remove last comma
                }
1                                               # print the actual line (with new values)
' OFS="" file[12]
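The `FNR==NR` test is the usual awk idiom for "read the first file into an array, then process the second". A stripped-down sketch of just that idiom on two one-column files (the file contents and the `pair_avg` function are hypothetical). One known caveat: if the first file is empty, `FNR==NR` is also true for the second file.

```shell
#!/bin/sh
# Minimal illustration of the two-file idiom: FNR (line number within the
# current file) equals NR (line number over all input) only for the first file.
pair_avg() {
    awk 'FNR == NR { T[FNR] = $1; next }          # first file: remember each value by line number
         { print T[FNR], $1, (T[FNR] + $1) / 2 }  # second file: old value, new value, average
    ' "$1" "$2"
}

dir=$(mktemp -d)
printf '10\n2\n' > "$dir/a.txt"
printf '0\n4\n'  > "$dir/b.txt"
pair_avg "$dir/a.txt" "$dir/b.txt"   # prints "10 0 5" and "2 4 3"
rm -r "$dir"
```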

---------- Post updated at 23:00 ---------- Previous update was at 22:50 ----------

A bit surprising, ain't it?

If the texts are IDENTICAL line by line except for the numerical values, try

awk -F"(" '

FOUND           {while (/\\$/)  {getline X
                                 $0 = $0 X
                                }
                 FOUND = 0
                }

/^index_2/      {FOUND = 1
                }

                {gsub (/[ "\\;)]/, "", $2)
                }

FNR==NR         {if ($2) T[FNR] = $2
                 next
                }

FNR in T         {n = split ($2, N, ",")
                 n = split (T[FNR], M, ",")
                 $2 = "("
                 for (i=1; i<=n; i++) $2 = $2 (N[i]+M[i])/2 ","
                 sub (/,$/, ")", $2)
                }
1
' OFS="" file[12]
textA
index_2(0.1,1,2,4,8,16,32)
values(0.0577408,0.0499921,0.054725,0.0526376,0.061065,0.0551823,0.061633,0.0639899)
textB(5,1,3)
index_2(0.1,1,2,4,8,16,32)
values(0.0477408,0.0599922,0.054725,0.0526376,0.061065,0.0551823,0.061633,0.0639899)
textC(3)
textD
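The reason the second block came out wrong in the earlier version: `T[$1]` uses the text before "(" as the key, so each `values` line overwrites the previous one and only the last survives, whereas `T[FNR]` keeps one entry per line. A tiny sketch (the `key_demo` function is hypothetical) makes the difference visible:

```shell
#!/bin/sh
# Hypothetical demo: the same two lines stored under two different keys.
key_demo() {
    printf 'index_2(1)\nindex_2(2)\n' | awk -F'[()]' '
        { byName[$1] = $2; byLine[FNR] = $2 }   # key by name vs. key by line number
        END {
            for (k in byName) nName++
            for (k in byLine) nLine++
            print "by $1:", nName, "entry, value", byName["index_2"]
            print "by FNR:", nLine, "entries"
        }'
}

key_demo
# by $1: 1 entry, value 2
# by FNR: 2 entries
```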

Thank you so much! That problem is resolved.

However, I encountered a strange problem with the new code: a 0 is inserted after "(".

I only posted part of my data the first time. My data look like the sample below; the 100s are the values that I want to change.

			.... 
			timing () {
 				related_pin : "A1";
 				timing_sense : "positive_unate";
 				cell_rise ("del_1_7_7") {
 					index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
 					index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
 					values("100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100");
 				}
 				rise_transition ("del_1_7_7") {
 					index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
 					index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
					values("100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100");
 				}
				cell_fall ("del_1_7_7") {

.....

Results are shown below. The values that I wanted to modify did not change; only a "0" was inserted after "(".

.... 
 			timing (0)
 				related_pin : "A1";
 				timing_sense : "positive_unate";
 				cell_rise (0)
 					index_1(0.016,0.032,0.064,0.128,0.256,0.512,1.024)
 					index_2(0.1,0.25,0.5,1,2,4,8)
 					values(100,100,100,100,100,100,100,0)
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100");
 				}
 				rise_transition (0)
 					index_1(0.016,0.032,0.064,0.128,0.256,0.512,1.024)
 					index_2(0.1,0.25,0.5,1,2,4,8)
 					values(100,100,100,100,100,100,100,0)
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100", \
 					"100, 100, 100, 100, 100, 100, 100");
 				}
 				cell_fall (0)
.... 

Any help is appreciated. I can share my data file with you if it would be helpful to provide comments.

Best,

Jaeyoung

That value is 0 because there is a text string inside the parentheses. Wouldn't it have been nice if the real data had been known from the beginning?
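This is easy to confirm: awk converts a string to a number using only its leading numeric part, so something like `"del_1_7_7"`, `"A1"` or a lone `{` becomes 0, and the average of two zeros is 0. A quick sketch (`to_num` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: show what awk does with a string in arithmetic.
to_num() {
    awk -v s="$1" 'BEGIN { print s + 0 }'
}

to_num 'del_1_7_7'   # 0
to_num 'A1'          # 0
to_num '{'           # 0
to_num '0.1, 0.25'   # 0.1 (only the leading number is used)
```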


Yes, I am sorry about that. I couldn't share my data files because of confidentiality issues. The data files come from a company for educational purposes, and I don't think it is appropriate to post them in the forum.

I would like to share them with you, but I currently cannot send private messages. If you are okay with it, could you send me a private message containing your email address? I will share the data files.

Best,

Jaeyoung

We don't need or want you to post sensitive data. But, for us to help you write code to process data, we absolutely require a clear description and/or a representative sample (with sensitive data scrubbed) that describes what needs to be processed and what needs to be ignored.

With your latest sample it appears that your problem lines could be handled by changing:

FOUND           {while (/\\$/)  {getline X      # if last line had "index_2", read and append

to something like:

/[{}]/          {print                          # copy non-data lines
                 next                           # and skip remaining steps
                }

FOUND           {while (/\\$/)  {getline X      # if last line had "index_2", read and append
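One caveat with this placement: the rule sits before the `FNR==NR` test, so it also fires while the first file is read and that file's non-data lines get printed as well. A minimal sketch of the effect (the `brace_demo` function and its `guard` switch are hypothetical; the averaging logic is left out), where `guard=1` restricts the copy to the second file with `FNR != NR`:

```shell
#!/bin/sh
# Hypothetical demo: a "copy and skip" rule for brace lines placed before the
# FNR==NR block. With guard=0 it fires for BOTH files; with guard=1 the extra
# FNR != NR test limits it to the second file. Averaging is left out.
brace_demo() {
    awk -v guard="$3" '
        /[{}]/ && (guard == 0 || FNR != NR) { print; next }   # copy non-data lines
        FNR == NR { next }                                      # first file: storage omitted
        { print }                                               # second file: other lines
    ' "$1" "$2"
}

dir=$(mktemp -d)
printf '%s\n' 'cell_rise ("del_1_7_7") {' '}' > "$dir/a.txt"
cp "$dir/a.txt" "$dir/b.txt"
brace_demo "$dir/a.txt" "$dir/b.txt" 0   # 4 lines: brace lines of both files
brace_demo "$dir/a.txt" "$dir/b.txt" 1   # 2 lines: second file only
rm -r "$dir"
```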

Thank you, Don.

I see your point. I will post my data clearly in the next post.

Best,

Jaeyoung

---------- Post updated 12-18-15 at 05:32 PM ---------- Previous update was 12-17-15 at 07:24 PM ----------

Thank you for all your help, Don and RudiC.

I have revised your elegant code a bit for my work.

Now the updated code successfully calculates the average values. I appreciate your help.

I have a question about skipping non-data lines. I tried Don's suggestion (/[{}]/), but it copied the non-data lines to the top while also keeping the original non-data lines. In addition, a few non-data strings were changed to '0' (see, e.g., the related_pin and timing_sense lines in the following result).

Result.txt

 	cell (AND2X1_RVT) {
			timing () {
 				cell_rise ("del_1_7_7") {
 				}
 				rise_transition ("del_1_7_7") {
 				}
				cell_fall ("del_1_7_7") {
}
.... 
 	cell (AND2X1_RVT) {
			timing () {
 				related_pin : "0, ";
 				timing_sense : "0, ";
 				cell_rise ("del_1_7_7") {
 					index_1("0.016,0.032,0.064,0.128,0.256,0.512,1.024, ");
 					index_2("0.1,0.25,0.5,1,2,4,8, ");
 					values("1.5,3,4.5,6,7.5,9,10.5, ", \
 					"12,13.5,15,16.5,18,19.5,21, ", \
 					"22.5,24,25.5,27,28,30,31.5, ", \
 					"33,34.5,36,37.5,39,40.5,42, ", \
 					"43.5,45,46.5,48,49.5,51,52.5, ", \
 					"54,55.5,57,58.5,60,61.5,63, ", \
 					"64.5,66,67.5,69,70.5,72,73.5, ");
 				}
 				rise_transition ("del_1_7_7") {
 					index_1("0.016,0.032,0.064,0.128,0.256,0.512,1.024, ");
 					index_2("0.1,0.25,0.5,1,2,4,8, ");
					values("75,76.5,78,79.5,81,82.5,84, ", \
 					"85.5,87,88.5,90,91.5,93,94.5, ", \
 					"96,97.5,99,100.5,102,103.5,105, ", \
 					"106.5,108,109.5,111,112.5,114,115.5, ", \
 					"117,118.5,120,121.5,123,124.5,126, ", \
 					"127.5,129,130.5,132,133.5,135,136.5, ", \
 					"138,139.5,141,142.5,144,145.5,147, ");
 				}
				cell_fall ("del_1_7_7") {
}

.....

AWKscript.s

awk -F"\"" '
/[{}]/          {print                     # copy non-data lines
                 next                      # and skip remaining steps
                }

FOUND           {while (/\\$/)  {getline X      # if last line had "index_2", read and append
                                 $0 = $0 X
                                }
                 FOUND = 0
                }

#/^index_2("0.1, 0.25, 0.5, 1, 2, 4, 8"); /      {FOUND = 1
/^cell_rise ("del_1_7_7") { /      {FOUND = 1
                }

FNR==NR         {if ($2) T[FNR] = $2
                 next
                }

FNR in T         {n = split ($2, N, ",")
                 n = split (T[FNR], M, ",")
                 $2 = "\""
                 for (i=1; i<=(n); i++) $2 = $2 (M[i]+(N[i]-M[i])*0.5) ","
                 sub (/,$/, ", \"", $2)
                }
1
' OFS="" FileA.txt FileB.txt

FileA.txt

.... 
 	cell (AND2X1_RVT) {
			timing () {
 				related_pin : "A1";
 				timing_sense : "positive_unate";
 				cell_rise ("del_1_7_7") {
 					index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
 					index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
 					values("1, 2, 3, 4, 5, 6, 7", \
 					"8, 9, 10, 11, 12, 13, 14", \
 					"15, 16, 17, 18, 18, 20, 21", \
 					"22, 23, 24, 25, 26, 27, 28", \
 					"29, 30, 31, 32, 33, 34, 35", \
 					"36, 37, 38, 39, 40, 41, 42", \
 					"43, 44, 45, 46, 47, 48, 49");
 				}
 				rise_transition ("del_1_7_7") {
 					index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
 					index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
					values("50, 51, 52, 53, 54, 55, 56", \
 					"57, 58, 59, 60, 61, 62, 63", \
 					"64, 65, 66, 67, 68, 69, 70", \
 					"71, 72, 73, 74, 75, 76, 77", \
 					"78, 79, 80, 81, 82, 83, 84", \
 					"85, 86, 87, 88, 89, 90, 91", \
 					"92, 93, 94, 95, 96, 97, 98");
 				}
				cell_fall ("del_1_7_7") {
}

.....

FileB.txt

.... 
 	cell (AND2X1_RVT) {
			timing () {
 				related_pin : "A1";
 				timing_sense : "positive_unate";
 				cell_rise ("del_1_7_7") {
 					index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
 					index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
 					values("2, 4, 6, 8, 10, 12, 14", \
 					"16, 18, 20, 22, 24, 26, 28", \
 					"30, 32, 34, 36, 38, 40, 42", \
 					"44, 46, 48, 50, 52, 54, 56", \
 					"58, 60, 62, 64, 66, 68, 70", \
 					"72, 74, 76, 78, 80, 82, 84", \
 					"86, 88, 90, 92, 94, 96, 98");
 				}
 				rise_transition ("del_1_7_7") {
 					index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
 					index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
					values("100, 102, 104, 106, 108, 110, 112", \
 					"114, 116, 118, 120, 122, 124, 126", \
 					"128, 130, 132, 134, 136, 138, 140", \
 					"142, 144, 146, 148, 150, 152, 154", \
 					"156, 158, 160, 162, 164, 166, 168", \
 					"170, 172, 174, 176, 178, 180, 182", \
 					"184, 186, 188, 190, 192, 194, 196");
 				}
				cell_fall ("del_1_7_7") {
}

.....

I have tried many times by myself, but I could not find a way to skip non-data lines correctly. Any suggestions? Thank you in advance.

Best,

Jaeyoung

Now that we have a better understanding of the format of the data you're processing, we can throw away a bunch of code that was added to handle mistaken assumptions. But, of course, you still haven't given us a clear description of the format of the data you are trying to process. The changed code below handles the additional idiosyncrasies you have now shown us, but we don't know whether other parts of your data will fail in a different way if they don't follow the format you have shown us.

Try this:

awk '
BEGIN {	FS = DQ = "\""
	OFS=""
}

FNR == NR {
	if ($2 && ! /[{}:]/)
		T[FNR] = $2
	next
}

FNR in T {
	n = split ($2, N, ",")
	n = split (T[FNR], M, ",")
	$2 = DQ
	for(i = 1; i <= n; i++)
		$2 = $2 (N[i] + M[i]) / 2 (i < n ? ", " : DQ)
}

1
' FileA.txt FileB.txt

which produces output very similar to the format of your input files. With the data you provided in post #7 in this thread, it produces the output:

.... 
 	cell (AND2X1_RVT) {
			timing () {
 				related_pin : "A1";
 				timing_sense : "positive_unate";
 				cell_rise ("del_1_7_7") {
 					index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
 					index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
 					values("1.5, 3, 4.5, 6, 7.5, 9, 10.5", \
 					"12, 13.5, 15, 16.5, 18, 19.5, 21", \
 					"22.5, 24, 25.5, 27, 28, 30, 31.5", \
 					"33, 34.5, 36, 37.5, 39, 40.5, 42", \
 					"43.5, 45, 46.5, 48, 49.5, 51, 52.5", \
 					"54, 55.5, 57, 58.5, 60, 61.5, 63", \
 					"64.5, 66, 67.5, 69, 70.5, 72, 73.5");
 				}
 				rise_transition ("del_1_7_7") {
 					index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
 					index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
					values("75, 76.5, 78, 79.5, 81, 82.5, 84", \
 					"85.5, 87, 88.5, 90, 91.5, 93, 94.5", \
 					"96, 97.5, 99, 100.5, 102, 103.5, 105", \
 					"106.5, 108, 109.5, 111, 112.5, 114, 115.5", \
 					"117, 118.5, 120, 121.5, 123, 124.5, 126", \
 					"127.5, 129, 130.5, 132, 133.5, 135, 136.5", \
 					"138, 139.5, 141, 142.5, 144, 145.5, 147");
 				}
				cell_fall ("del_1_7_7") {
}

.....
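The loop that rebuilds each quoted list can be tried on its own. A minimal sketch (the `avg_list` function is hypothetical) that averages the first rows of the two sample `values` tables and uses the same `(i < n ? ", " : DQ)` trick: ", " between items and the closing quote after the last one.

```shell
#!/bin/sh
# Hypothetical helper: average two comma-separated lists element by element
# and rebuild the result as a double-quoted, ", "-separated list.
avg_list() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        DQ = "\""
        n = split(a, M, ",")
        split(b, N, ",")
        out = DQ
        for (i = 1; i <= n; i++)
            out = out (N[i] + M[i]) / 2 (i < n ? ", " : DQ)   # separator or closing quote
        print out
    }'
}

avg_list '1, 2, 3, 4, 5, 6, 7' '2, 4, 6, 8, 10, 12, 14'
# "1.5, 3, 4.5, 6, 7.5, 9, 10.5"
```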

Thank you so much!

This works great. I appreciate your help. I will keep the forum's rules in mind the next time I post.

I have one more question, which will be my last: how do I skip a specific non-data line?

In my data file there is a non-data line that repeats more than 1000 times:

retention_pin("save",0);

After running the script, "save" becomes "0":

retention_pin("0",0);

Could you let me know how to skip a specific line?

Best,

Jaeyoung

The answer to your specific question is to change:

	if ($2 && ! /[{}:]/)

to:

	if ($2 && ! /[{}:]/ && ! /retention_pin[(]"save",0[)];/)

Note that any character in the "sentence" you're matching that has a special meaning in an extended regular expression (ERE) must be escaped so that it matches literally. I used bracket expressions to escape the parentheses, but there are other ways.
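A small sketch comparing the two common escaping styles with an unescaped pattern (`match_demo` is a hypothetical name). In the unescaped version the parentheses act as grouping, so it no longer matches the literal line:

```shell
#!/bin/sh
# Hypothetical demo: which patterns match the literal text retention_pin("save",0);
match_demo() {
    printf '%s\n' "$1" | awk '
        /retention_pin[(]"save",0[)];/   { print "bracket: match" }
        /retention_pin\("save",0\);/     { print "backslash: match" }
        /retention_pin("save",0);/       { print "unescaped: match" }'
}

match_demo 'retention_pin("save",0);'
# bracket: match
# backslash: match
```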

A more general way to avoid the problem would be to look for an alphabetic character in the second field and skip any line where one is found, as in:

	if ($2 && ! /[{}:]/ && $2 !~ /[[:alpha:]]/)
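A minimal sketch of the alphabetic-character check on a few lines from this thread (`numeric_only` is a hypothetical name; it tests `$2 != ""` rather than the truthiness of `$2`). Note that in awk `!` binds more tightly than `~`, so `! $2 ~ /re/` is read as `(! $2) ~ /re/`; the sketch uses the `!~` operator instead.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines whose first quoted field is non-empty
# and contains no alphabetic character.
numeric_only() {
    awk -F'"' '$2 != "" && $2 !~ /[[:alpha:]]/'
}

printf '%s\n' \
    'related_pin : "A1";' \
    'retention_pin("save",0);' \
    'index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");' \
    'cell_rise ("del_1_7_7") {' | numeric_only
# index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
```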