Data manipulation with Awk

Hello guys,

I'm a new member here and I need some help with Awk. I'm using it through the Terminal app on OS X (I'm a Mac user).

I have a huge file with a large amount of data (rows of 3D Cartesian coordinates). The data typically looks like the following example (the file actually contains several thousand "curves", each defined by a list of points in 3D):

I need Awk to read this input file and write out the number of lines for each "curve" (with some arbitrary symbol in front of it), followed by the list of its line numbers. From the data above, Awk should output something like this:

Note that the first line is numbered "0".
How can I do that? Does anyone have an idea? :confused::eek:

The only data manipulation I know how to do with Awk is something like this example:

Thanks, and sorry for my bad English.

Try this:

awk '
{s=n?s " " c++:c++; n++}
/# End/{print "NumberOfLines " n; print s "\n"; n=0; s =""}
' file
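A minimal sketch of trying the command above on a tiny, made-up input. The original sample data isn't shown here, so this assumes each curve is closed by a line containing "# End" (the pattern the command matches); count_curves is just an illustrative wrapper name. Note that the "# End" line itself is counted as part of its curve, because the first rule runs on every line.

```shell
#!/bin/sh
# count_curves: hypothetical wrapper around the awk program posted above.
# Assumption: each curve ends with a line matching "# End".
count_curves() {
  awk '
  {s=n?s " " c++:c++; n++}
  /# End/{print "NumberOfLines " n; print s "\n"; n=0; s =""}
  ' "$@"
}

# Made-up sample with two curves (not the original poster's data).
count_curves <<'EOF'
0.0 0.0 0.0
1.0 0.0 0.0
# End of curve 1
2.0 1.0 0.0
# End of curve 2
EOF
# Prints:
# NumberOfLines 3
# 0 1 2
#
# NumberOfLines 2
# 3 4
```

Line numbers keep counting across curves (c is never reset), which is why the second curve starts at 3.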

Hi!!

Franklin52, could you explain your command line? I don't understand this part:

{s=n?s " " c++:c++;

Thanks in advance.

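I've used the conditional operator; its form is:

expr ? expr1 : expr2

If expr is true (non-zero or non-empty), the result is expr1, otherwise it is expr2.

s=n?s " " c++:c++

Explanation:

if n != 0 then s = s " " c++ else s = c++

So on the first line of each curve (n is 0), s starts with the current line number, and on every later line a space and the next line number are appended. Because c++ returns the value of c before incrementing it, the numbering starts at 0.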

Regards
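For comparison, here is a sketch of the same program written out long-hand with if/else, with the c++ increment split into its own statement. count_curves_verbose is a hypothetical name; it is meant to behave like the one-liner. One design note: once c++ is split out, c has to be set to 0 explicitly in BEGIN, otherwise the first list would start from an uninitialized (empty) value instead of 0.

```shell
#!/bin/sh
# count_curves_verbose: hypothetical long-hand version of the one-liner.
count_curves_verbose() {
  awk '
  BEGIN { c = 0 }      # line numbers start at 0
  {
    if (n != 0)        # not the first line of the current curve:
      s = s " " c      #   append a space and the current line number
    else               # first line of a curve:
      s = c            #   start the list, with no leading space
    c++                # c numbers lines across the whole file
    n++                # n counts the lines of the current curve
  }
  /# End/ { print "NumberOfLines " n; print s "\n"; n = 0; s = "" }
  ' "$@"
}
```

It is called exactly like the one-liner, for example count_curves_verbose file.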

Great, thanks...

OMG!! It worked like a charm!

THANK YOU SO MUCH!!!

awk '/End of/{
  # pre holds the line number of the previous end marker (0 before the first one),
  # so NR-pre is the number of lines in the curve that ends here
  print "NumberOfLines " (NR-pre)
  for(i=pre+1;i<=NR;i++){
    printf "%d ", i-1    # line numbers start at 0
  }
  pre=NR
  print ""
}' file

I just noticed a problem with my data: some points are needlessly duplicated. Here's a list of long/lat coordinates (just a sample of a huge data file):

So I now need to remove all the duplicates (A', B', C' in the example above) while keeping the originals (A, B, C). Note that each duplicate always appears immediately after its original.

So how can I remove them using Awk? I'm pretty sure this should be easy, but I'm really not an Awk specialist. :frowning:

You can remove the duplicate lines first with uniq and pipe the result to the awk command:

uniq file | awk '
{s=n?s " " c++:c++; n++}
/# End/{print "NumberOfLines " n; print s "\n"; n=0; s =""}
'

Regards

OK, but there's a constraint: the curves are all loops (same START and END points), and I don't want to remove those. I just want to remove the useless duplicates, which always sit right next to each other (one immediately following the other, as in the example I gave above).
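With no options, uniq only collapses runs of identical consecutive lines, so a loop's closing point (a repeat of its first point, but not next to it) is kept. For reference, the same adjacent-only rule can be sketched in awk alone; dedup_adjacent is a hypothetical helper name and the long/lat values below are invented.

```shell
#!/bin/sh
# dedup_adjacent: hypothetical helper. Prints a line unless it is identical
# to the line immediately before it (the same rule uniq uses by default).
dedup_adjacent() {
  # Appending "" forces a string comparison, so "1" and "1.0" stay distinct.
  awk 'NR == 1 || ($0 "") != prev { print } { prev = $0 "" }' "$@"
}

# Invented long/lat loop: the second point is doubled, and the loop closes on the first point.
dedup_adjacent <<'EOF'
2.35 48.85
2.36 48.86
2.36 48.86
2.37 48.85
2.35 48.85
EOF
# Prints:
# 2.35 48.85
# 2.36 48.86
# 2.37 48.85
# 2.35 48.85
```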

---------- Post updated at 01:11 PM ---------- Previous update was at 12:10 PM ----------

Okay, the uniq command works very well.

Geez, I'm learning! :smiley:

Again, thank you very much for your help! :slight_smile:

Okay, I have one more problem to solve with Awk (or any other simple UNIX method):

I have to remove some lines from the data. The original data looks like this:

The first two lines after "END" should be removed. How can I do that? Any ideas?

nawk 'c&&c--{next} /END/ {c=2}1' myFile

Thanks. This works well, but only if I use awk (not nawk, which isn't recognized on my system).

What is the logic behind this code? What if we want to remove only one line, or three lines?

Sorry for being such a noob!

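Adjust the c=N part accordingly: c=1 removes one line after each END, c=3 removes three, and so on.

The logic: on a line matching END, c is set to N, and the line itself is printed by the final 1 (an always-true pattern whose default action is to print the line). On each following line, c&&c-- is true while c is still non-zero; it decrements c, and {next} skips the line without printing it. Once c reaches 0, lines are printed again.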
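A sketch of the same one-liner spelled out, with the count passed in from the shell via awk's standard -v option instead of being edited into the script. skip_after_end is a hypothetical name. Like the original, it keeps the END line and drops the N lines after it; an END line that falls inside a skipped window is dropped too and does not restart the count.

```shell
#!/bin/sh
# skip_after_end N [file...]: hypothetical helper. Prints every line except
# the N lines that immediately follow each line matching END.
skip_after_end() {
  count=$1
  shift
  awk -v n="$count" '
    c && c-- { next }      # inside a window: count down and skip this line
    /END/    { c = n + 0 } # marker line: arm the counter (+0 forces a number)
    1                      # print every line that was not skipped
  ' "$@"
}

printf 'a\nEND\nx\ny\nb\n' | skip_after_end 2   # prints a, END, b
printf 'a\nEND\nx\ny\nb\n' | skip_after_end 1   # prints a, END, y, b
```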