Data manipulation with Awk

Cham · August 24, 2009, 2:02pm

Hello guys,

I'm a new member here and I need some help with the Awk application. I'm using it through the Terminal app of OSX (I'm a Mac user).

I have a huge file containing rows of 3D Cartesian coordinates. The data typically looks like the following example (in reality it consists of several thousand "curves", each defined by a list of points in 3D):

I need Awk to read this input file, and write out the number of lines for each "curve" (with some arbitrary symbol in front of it), followed by a list of line numbers. From the data above, Awk should output something like this :

Note that the first line is numbered "0".
How can I do that? Does anyone have an idea?

The only data manipulation I know how to do with Awk is like this example:

Thanks, and sorry for my bad English.

Franklin52 · August 24, 2009, 2:39pm

Try this:

awk '
{s=n?s " " c++:c++; n++}
/# End/{print "NumberOfLines " n; print s "\n"; n=0; s =""}
' file

protocomm · August 24, 2009, 2:53pm

Hi!!

Franklin52, could you explain your command line? I don't understand this part:

{s=n?s " " c++:c++;

Thanks in advance.

Franklin52 · August 24, 2009, 3:08pm

I've used the conditional operator, which has the form:

expr ? value1 : value2

If expr is true, the whole expression evaluates to value1, otherwise to value2.

s=n?s " " c++:c++

Explanation:

if n != 0 then s = s " " c++ else s = c++
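
For readers who find the one-liner dense, here is a minimal sketch of the same logic with the conditional operator written out as an explicit if/else. The function name curve_index and the sample coordinates are made up for illustration; the BEGIN block only makes the starting values explicit (awk would default them anyway). As in the one-liner, the "# End" marker line is itself counted as a line of the curve.

```shell
# Hypothetical helper name; same behaviour as the one-liner above.
curve_index() {
  awk '
  BEGIN { c = 0; n = 0; s = "" }   # c: global line number, n: lines in this curve
  {
    if (n != 0) {
      s = s " " c                  # later lines of a curve: append the line number
    } else {
      s = c                        # first line of a curve: start a new list
    }
    c++
    n++
  }
  /# End/ {
    print "NumberOfLines " n       # the marker line is included in the count
    print s "\n"                   # the list, followed by an empty line
    n = 0; s = ""                  # reset for the next curve
  }' "$@"
}

# Made-up sample: two curves of 3 and 2 lines (marker lines included).
printf '%s\n' '1.0 2.0 3.0' '4.0 5.0 6.0' '# End of curve 1' \
              '7.0 8.0 9.0' '# End of curve 2' | curve_index
# NumberOfLines 3
# 0 1 2
#
# NumberOfLines 2
# 3 4
```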

Regards

protocomm · August 24, 2009, 3:10pm

Great, thanks...

Cham · August 24, 2009, 5:25pm

OMG !! It worked like a charm !

THANK YOU SO MUCH !!!

summer_cherry · August 24, 2009, 10:27pm

awk '/End of/{
# pre holds the record number of the previous "End of" line
if(flag==0){
  flag=1
  print "NumberOfLines " NR
  for(i=1;i<=NR;i++){
    printf "%d ", i-1
  }
  pre=NR
}
else{
  num=NR-pre
  print "NumberOfLines " num
  for(i=pre+1;i<=NR;i++){
    printf "%d ", i-1
  }
  pre=NR
}
print ""
}' file

Cham · August 25, 2009, 11:32am

I just noticed a problem with my data: some points are needlessly duplicated. Here's a list of Long/Lat coordinates (just a sample from a huge data file):

So I now need to remove all the duplicates (A', B', C' in the example above) while keeping the first occurrence (A, B, C). Note that a duplicate always comes immediately after the original.

So how can I remove them using Awk? I'm pretty sure this should be easy, but I'm really not an Awk specialist.

Franklin52 · August 25, 2009, 11:53am

You can remove the duplicate lines first with uniq and pipe the result to the awk command:

uniq file | awk '
{s=n?s " " c++:c++; n++}
/# End/{print "NumberOfLines " n; print s "\n"; n=0; s =""}
'

Regards

Cham · August 25, 2009, 1:11pm

Ok, but there's a constraint: the curves are all loops (same START and END points), and I don't want to remove those. I just want to remove the useless duplicates, which are always right next to each other (one following the other, as in the example I gave above).
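
Since uniq only compares each line with the one directly before it, a closing point that repeats the start of a loop survives as long as other points lie in between. A rough awk-only equivalent is sketched below; the function name drop_adjacent_dups and the coordinates are invented for the example, and the lines are compared as strings, which is an assumption about what counts as "the same point".

```shell
# Hypothetical helper: drop a line only when it equals the line directly before it.
drop_adjacent_dups() {
  awk 'NR == 1 || ($0 "") != (prev "") { print } { prev = $0 }' "$@"
}

# Made-up loop: the first point is doubled, and the curve closes on its start point.
printf '%s\n' '10.0 20.0' '10.0 20.0' '11.0 21.0' '12.0 22.0' '10.0 20.0' \
  | drop_adjacent_dups
# 10.0 20.0
# 11.0 21.0
# 12.0 22.0
# 10.0 20.0
```

The doubled first point is collapsed, while the final point, identical to the first but not adjacent to it, is kept.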

---------- Post updated at 01:11 PM ---------- Previous update was at 12:10 PM ----------

Okay, the uniq command works very well.

Geez, I'm learning !

Again, thank you very much for your help !

Cham · August 28, 2009, 4:13pm

Okay, I have one more problem to solve with Awk (or any other simple UNIX method):

I have to remove some lines from the data. The original data looks like this:

The first two lines after "END" should be removed. How can I do that? Any ideas?

vgersh99 · August 28, 2009, 5:20pm

nawk 'c&&c--{next} /END/ {c=2}1' myFile

Cham · August 28, 2009, 9:55pm

Thanks. This works well, but only if I use awk (nawk isn't recognized on my system).

What is the logic behind this code? What if we want to remove only one line, or three lines?

Sorry for being such a noob!

vgersh99 · August 29, 2009, 8:07am

Adjust the 'c=N' assignment accordingly: c=1 removes one line, c=3 removes three.
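
As for the logic: c counts how many lines are still to be dropped. The pattern c&&c-- is true while c is above zero and decrements it, so next skips that line. A line matching /END/ sets the counter, and the trailing 1 is an always-true pattern whose default action prints the line. The sketch below spells this out with the count passed in as a parameter; the function name skip_after_end, the variable names and the sample lines are hypothetical.

```shell
# Hypothetical helper: skip_after_end N [file...]
# Drops the N lines that follow each line matching END; the END line is kept.
skip_after_end() {
  count=$1
  shift
  awk -v n="$count" '
  skip > 0 { skip--; next }   # still inside the block to drop: count down and skip
  /END/    { skip = n }       # marker line: arm the counter
  { print }                   # any line that reaches this rule is printed
  ' "$@"
}

# Made-up sample: the two lines after END are dropped.
printf '%s\n' 'p1' 'END' 'junk1' 'junk2' 'p2' 'p3' | skip_after_end 2
# p1
# END
# p2
# p3
```

One consequence of this design is that an END line falling inside a skipped block is dropped without re-arming the counter.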