Awk: Assigning a variable to be the value of FNR at a certain line

Sorry for the probably strangely worded title but I don't really know how else to put it.

Background context: Post processing LAMMPS simulation data.
tl;dr: I'm making two spheres collide, every defined timestep the simulation outputs a bunch of data including total energy of the particles, their coordinates, etc...

Here is my problem:

I'm using awk to post-process this data. What I need right now is for it to find the point where two values come closest to each other, give me the closest approach distance (called overlap in the code, since it's actually two particles overlapping slightly), and give me the line number that happens on.

So X1 & X2 will first approach each other, reach a closest point then distance themselves from each other.
Finding the closest distance is not a problem.
I do it thus:
The relevant code is the part with 'overlap', 'max', 'dx', 'z1', 'z2', etc.

The other code is for a similar problem: once the energy reaches equilibrium, I want that equilibrium value and the line where it happens.
However in this case that's not a problem, since it will be the last line treated so FNR is correct.

BEGIN{
oldte=0
te2=1
max=0
OFS=","			# Changes Output Field Separator to a comma (default is space)
				# so it can be appended to a .csv file which excel opens without a prompt
printf("%s,%s,%s,%s,%s,%s,%s,\n","Change","Et1","Et2","ColDt","dt","steps","max")

}
{
	if((FNR+1)%11==0)
	{
		te1=$5
		z1=$1
		
	}
	
	if(FNR%11==0){
		te2=$5
		dt=$6
		z2=$1
		dx=(z2-z1)
		overlap=(0.05-dx)
		if(overlap>max)
		{
			max=overlap							# overlap calculations
		}
		if(oldte==te2)
		{
			steps=FNR/11
			colDt=dt*steps
			printf("%s,%s,%s,%1.8f,%s,%s,%1.8f\n",change,te1,te2,colDt,dt,steps,max)
			exit														
		}
		else
		{
		oldte=$5
		}
		
	}
		
}
END{

}

So the problem here is that I can't use

ClosestPoint=FNR/11

in the if expression for finding the max value of overlap, because when I call that variable to be printed along with the other data, it will be reevaluated and use the FNR at that point in the input file (which will be too far down, so to speak).

It's probably a lot more explanation than needed for my question, but I thought I'd give as much context as possible.

I guess it just comes down to this:

Can I set a variable X to equal the value of NFR at the time that X is set, and not have X be reevaluated when I print it?

I am having trouble understanding your problem..

In general I can say that if you are processing data and can only find out one or more lines later whether a certain line contained the right data, then you could record the data in a variable. For example, if you want to use printf(...) to print a value, you can instead use some_var=sprintf(...) to record an output line in a variable, and print this recorded variable ( print some_var ) one or more lines down, once you are able to conclude that it was indeed the right value.

Hope this helps and/or makes sense...

So f.e.

some_var=sprintf(NFR)

and then later

print(some_var)

Would give me the original value and some_var would not be reevaluated as being =NFR ?

Right. Of course, in this example you would simply record some_var=FNR; I only used sprintf() as an example because it lets you record the value together with its formatting.

This works if the proper line is the previous line. If it can be several lines back, you can use a "circular buffer" by using an array.

For example:

A[FNR%4]=sprintf(...) would record the last 4 lines, so you can select which one to print...
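
To make the circular buffer idea concrete, here is a minimal sketch. The input letters, the "%d:%s" format and the choice to look two lines back are all made up for illustration; they are not taken from the simulation data.

```shell
# Sketch only: keep the last 4 formatted lines in A[0..3] and, at line 6,
# print the entry recorded two lines earlier. Sample input is hypothetical.
result=$(printf '%s\n' a b c d e f | awk '
    { A[FNR % 4] = sprintf("%d:%s", FNR, $0) }   # each slot is overwritten every 4 lines
    FNR == 6 { print A[(FNR - 2) % 4]; exit }    # look two lines back from line 6
')
echo "$result"
```

The array never grows beyond four entries, so this only helps when you know an upper bound on how far back the wanted line can be.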

Ah yes see that's the problem, the 'overlap' NFR will always be several (100+) lines before the oldte==te2 (energy equilibrium) NFR, because of how the simulation works.

So if some_var still gets evaluated as =NFR when I print it at the end,
for example with print(some_var), then I'll have the wrong value. On the other hand, if it only gets evaluated as being = an integer (a previous NFR in this case), then I'm golden.

An array wouldn't help me since I can't know which line I need, that's the whole point of getting the NFR value at that point.

I'm not at all sure that I understand what you're trying to do, but if you change:

		if(overlap>max)
		{
			max=overlap							# overlap calculations
		}

to:

		if(overlap>max)
		{
			max=overlap							# overlap calculations
			max_line=FNR
		}

then when you get to the END section of your awk script, the variable max contains your largest overlap you found and the variable max_line contains the line number in your input file where you found that maximum.
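
A minimal sketch of this, using made-up numbers rather than real LAMMPS output: the assignment max_line=FNR copies the value FNR has at that moment, so printing max_line later shows the old line number even though FNR itself has moved on. Printing at line 5 here stands in for printing just before the exit in the original script.

```shell
# Sketch only: five hypothetical overlap values, maximum on line 3.
result=$(printf '%s\n' 0.010 0.030 0.050 0.020 0.010 | awk '
    $1 > max { max = $1; max_line = FNR }                       # snapshot of FNR at the maximum
    FNR == 5 { printf "%.3f %d %d\n", max, max_line, FNR; exit } # printed two lines later
')
echo "$result"
```

The last two numbers differ: max_line stays at the line of the maximum, while FNR is the line being processed when the printf runs.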

That's the thing: I did something similar and it didn't give me the correct FNR. Instead it seemed to reevaluate the variable (max_line in your example) to =NFR.

Although I'm not sure; maybe if I put that printf() in the END section, the variable would not get reevaluated.

Maybe if you post some more details we could find a way to help you better. Right now, you're working on two adjacent lines, each pair 10 lines apart. Must be a tribute to your simulation output? The nomenclature used doesn't help either.
A (reduced set of) sample data might help to interpret/find out what you really need...

First of all thanks for your help guys, sorry for my brief absence I had moved on from this to work on something else (also for my thesis).

Good news: the sprintf() did exactly what I needed it to do ! :smiley:

So my problem is solved.

For the sake of completeness, though, I'll try again to explain my problem and how it was solved, and I'll attach some data as a sample. It's quite messy (hence the awk processing), but it might be useful.

Context:
I'm getting about 20 000+ lines per output file, every 11 lines is 'one set of data'.
The simulation works in timesteps, where every X seconds (10^-7 seconds in my case) it will calculate what the particles should do next, output data and then repeat the process on the next timestep.
This awk script only needs a few fields, namely $1, $5 and $6, and only from every 11th line and the line just before it.

The output is generated by LAMMPS, an open source modeling software package. In this case I'm using it to perform a granular simulation of two identical particles colliding with each other.

(from here on out, my translation from Dutch to English might be lacking in certain scientific terms but I'll try to make it as clear as possible so bear with me)

In order to perform parametric studies (or scoping studies) I'm comparing how long it takes for the particles to reach a constant energy (this means the collision is over) and how much energy is left.
On top of that I want to determine the distance the particles overlap during the collision simulation and at what point they reach max overlap.

So back to awk.

Each particle's properties are output on a separate line.
This required me to compare values from one set of 11 lines to the next. That's what the if(NR%11==0) is for: every 11th line is compared to the 11th line of the next set, and the line before every 11th line is treated the same way, hence the if( (NR +1)%11==0).

When constant energy has been reached, I use the exit command to avoid unnecessary computing (in this case it's barely milliseconds that I save in computing time, but I'll soon be up-scaling this to simulations that take days to run).

Here comes the problem.

Initially I stored the NFR value of the maximum overlap point in a variable simply like so:

overlapPoint=NFR

Further along the input file when constant energy was reached, I printed this variable along with the others I had calculated but it turned out to have changed.
That is to say, when I used the printf() to print 'overlapPoint' it printed the current value of NFR, in other words it reevaluated the 'overlapPoint' variable before printing it.

Using:

overlapPoint=sprintf(NFR)

however, awk did not reevaluate the 'overlapPoint' variable, since the reference to the internal NFR variable is apparently lost thanks to the sprintf() command.

Which solved the problem for me :slight_smile:

If anyone has any suggestions to make my title clearer I'll gladly edit it if I can.

I'm not sure I understand what you are striving for, plus there are some inconsistencies in your spec (NFR variable? change variable?). Anyhow, looking at your sample file, I see it has the time step in each record, and also an indicator for the data you need. I tried to simplify your code snippet and came up with this:

awk '
BEGIN           {printf "Change,Et1,Et2,ColDt,dt,steps,max,maxtm\n"
                }

/ITEM: TIMESTEP/        {getline TMSTP
                        }

/ITEM: ATOMS/   {getline
                 te1 = $5
                 z1  = $1

                 getline
                 te2 = $5
                 dt  = $6
                 ol = 0.05 + z1 - $1
                 if (ol > max)  {max = ol               # overlap calculations
                                 maxtm = TMSTP
                                }
                 if (oldte == te2)
                        {colDt = dt * TMSTP
                         printf("%s,%s,%s,%1.8f,%s,%s,%1.8f,%d\n",change,te1,te2,colDt,dt,TMSTP,max,maxtm)
                         exit
                        }
                 oldte = $5
                }

' file
Change,Et1,Et2,ColDt,dt,steps,max,maxtm
,6.468976533,6.554952976,0.00000150,0.000000100,15,0.00004660,7

which is pretty similar to what your code yields (except for the TIMESTEP which is one less as it starts with 0). Please make sure the DOS line terminators (<CR>, 0x0D) are removed from the file before you try it.
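
One possible way to deal with the DOS line terminators inside awk itself, as a hedged sketch: strip a trailing carriage return from every record before using its fields. The two sample lines below are invented; on a real file, running it through tr -d '\r' first achieves the same thing.

```shell
# Sketch only: two hypothetical CRLF-terminated lines. Without the sub(),
# the last field would carry an invisible trailing "\r".
result=$(printf '1.5 2.5\r\n3.5 4.5\r\n' | awk '
    { sub(/\r$/, "") }             # drop a trailing carriage return, if any
    { print $2 + 0, length($2) }   # value and length of the last field
')
echo "$result"
```

Note that a line read with getline into a variable (like TMSTP above) does not pass through this rule, so cleaning the file beforehand is the simpler option for that script.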

My bad, I should've marked this as solved.