Reading multiple columns in C++

emily · March 11, 2013, 8:57pm

Dear all,
I am a novice in C++ programming and would like help with one of my programs. I have a txt file with data in the following layout:

 varA   varB
-21      0 
-21.2    3, 4, 5, 6
-21.4    45, 65, 87, 98, 98
-22.0    345677, 349887, 98766, 877654, 987543
-23.0   76549, 8764, 9873

I need to plot Log(varA) vs the average of varB on a graph. I am confused because the number of varB values per row is not fixed; how should I do that? I tried looking for help on Google, without much luck.

thanks in advance,
emily

hanson44 · March 12, 2013, 12:40am

Do you create the data file, or are you stuck with something? Can you change the data file? Whatever process creates it "knows" how many values are in the varB column, so it could write the count out as well:

varA   cnt varB
-21      1     0 
-21.2    4     3 4 5 6
-21.4    5     45 65 87 98 98
-22.0    5     345677 349887 98766 877654 987543
-23.0    3     76549 8764 9873

Then the programming to read the data would be pretty trivial, using fscanf.
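
A minimal sketch of reading that counted layout, using C++ stream extraction in place of fscanf since the original poster works in C++. The function name readCountedRows is hypothetical, and the code assumes the header line and the space-separated "varA cnt varB" rows shown above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// readCountedRows is a hypothetical helper name, not from the thread.
// It reads rows of the form "varA cnt v1 v2 ... vcnt" and returns
// (varA, mean of the cnt values) for each complete row.
std::vector<std::pair<double, double>> readCountedRows(std::istream& in)
{
    std::vector<std::pair<double, double>> points;
    std::string header;
    std::getline(in, header);            // skip the "varA cnt varB" header line

    double varA;
    int cnt;
    while (in >> varA >> cnt) {
        double sum = 0.0;
        bool ok = cnt > 0;               // a count of zero or less is treated as malformed
        for (int i = 0; i < cnt && ok; ++i) {
            double v;
            if (in >> v) sum += v; else ok = false;
        }
        if (!ok) break;                  // truncated or malformed row: stop reading
        points.emplace_back(varA, sum / cnt);
    }
    return points;
}
```

Because the count comes first, the reader never has to guess where a row ends; it can be called on a std::ifstream opened on the data file.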

emily · March 12, 2013, 5:20am

Hi,
Thanks for the reply.
I have some experimental measurements which I am supposed to put in a txt file and also need to plot. The number of values for varB can be at most 6 or 7.
If you could kindly write a small program for this, it would be useful.
I figured out that TGraph from ROOT is a good tool for plotting, but reading a txt file whose second column has a varying number of values confuses me.

thanks
emily

Corona688 · March 12, 2013, 11:22am

Unless you can tell us what the data file means, or at least what created it, we're going to be as puzzled as you.

emily · March 12, 2013, 11:29am

Hi Corona,
Sorry, I thought I had explained this before; let me try again.
These are a set of measurements which I took manually from a hardware setup.
I need to plot varA vs the average of the varB values.

  varA   varB
-21      0 
-21.2    3, 4, 5, 6
-21.4    45, 65, 87, 98, 98
-22.0    345677, 349887, 98766, 877654, 987543
-23.0   76549, 8764, 9873

So my (x,y) would be

x            y
-21         0
-21.2   (3+4+5+6)/4
... and so on

So yes, I know beforehand how many varB readings each row of the txt file has.

thanks again
emily

Corona688 · March 12, 2013, 11:38am

Does this have to be C++? This would be trivial in many languages but annoying in C/C++.

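#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strtok() */

int main(void)
{
        char buf[1024];
        char buf2[1024];

        while(fgets(buf, 1024, stdin) != NULL)
        {
                char *tok;
                float x, y=0.0;
                int n=0;
                /* Read varA, then the rest of the line (all of varB) into buf2.
                   The header line fails the %f conversion and is skipped. */
                if(sscanf(buf, "%f %1023[^\n]", &x, buf2) != 2) continue;

                tok=strtok(buf2, ", \r\n\t");
                while(tok != NULL)
                {
                        float z;
                        /* Count only the tokens that really are numbers */
                        if(sscanf(tok, "%f", &z) == 1) { y+=z; n++; }
                        tok=strtok(NULL, ", \r\n\t");
                }

                /* Print varA and the average of varB */
                if(n > 0) printf("%f %f\n", x, y/n);
        }
        return 0;
}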
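
For comparison, here is a rough C++ sketch of the same idea that returns the points instead of printing them, which may be handier if the values are going to a plotting tool afterwards. The name averageRows is hypothetical, and the code assumes the comma-separated layout from the first post.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// averageRows is a hypothetical helper name. It parses lines such as
// "-21.2    3, 4, 5, 6" and returns (varA, mean of the varB values).
std::vector<std::pair<double, double>> averageRows(std::istream& in)
{
    std::vector<std::pair<double, double>> points;
    std::string line;
    while (std::getline(in, line)) {
        // Turn commas into spaces so stream extraction sees plain separators
        std::replace(line.begin(), line.end(), ',', ' ');
        std::istringstream fields(line);
        double x;
        if (!(fields >> x)) continue;    // header or blank line: no leading number
        double sum = 0.0, v;
        int n = 0;
        while (fields >> v) { sum += v; ++n; }
        if (n > 0) points.emplace_back(x, sum / n);   // skip rows with no varB values
    }
    return points;
}
```

The x and y values in the returned vector can then be copied into whatever arrays the plotting step expects.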

emily · March 12, 2013, 11:45am

Thanks Corona,
Actually, I work with C++ and also with ROOT. I find the combination really good and effective.
It is just that I am a beginner at such tasks, so sometimes I get stuck!

I will see if I can use TGraph from ROOT here to plot a nice graph. Thanks for your kindness.

Out of curiosity, which other language would you recommend, where it would be really easy to plot it?

emily,

Corona688 · March 12, 2013, 11:48am

awk.

awk -F'[ ,]+' 'NR>1 {sub(/[ ,]+$/, ""); for(N=3; N<=NF; N++) $2+=$N; $2 /= (NF-1); NF=2} 1' inputfile

With explanation:

awk -F'[ ,]+' '        # Separate columns on runs of spaces and commas
NR > 1 {               # Do not try to do math on the first line with column names
        sub(/[ ,]+$/, "")      # Drop trailing separators so they do not count as an empty column
        for(N=3; N<=NF; N++)   # Loop over column 3 to the last column, adding each to column 2
                $2 += $N
        $2 /= (NF-1)           # Divide the sum by the number of columns minus 1
        NF=2                   # Strip off all columns beyond the second
}
1                      # Print every line
' inputfile

DGPickett · March 12, 2013, 2:13pm

I have never been a fan of strtok() or scanf() myself, probably from debugging too many core dumps, especially in C++ (stateful subroutines in OO, ooooh!). I would fgets() lines into a char array and then process them byte by byte. Test each byte with either strchr( " ,\t\n\r\f", this_char ) for separators or strchr( "1234567890-.", this_char ) for number parts; I like the positive test. If the byte is not part of a number, set it to null and reset the in-number flag. If it is, and the in-number flag is not set, record its address in a char*[] array, increment the index on that array, and set the in-number flag. When you are done, you have an array of pointers to null-terminated strings of numbers to atof() in the processing layer, with the index telling you the count of numbers found.
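
The scan described above could be sketched roughly as follows. The name splitNumbers is hypothetical, and a std::vector stands in for the fixed pointer array and index. Note that the positive character set has no 'e', so exponent notation would be split into two tokens.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// splitNumbers is a hypothetical name for the scan described above.
// It overwrites every non-number byte in buf with '\0' and returns a
// pointer to the start of each run of number characters.
std::vector<char*> splitNumbers(char* buf)
{
    std::vector<char*> starts;
    bool inNumber = false;
    for (char* p = buf; *p != '\0'; ++p) {
        if (std::strchr("1234567890-.", *p) != nullptr) {   // the positive test
            if (!inNumber) {
                starts.push_back(p);     // first byte of a new number
                inNumber = true;
            }
        } else {
            *p = '\0';                   // terminate the previous number, if any
            inNumber = false;
        }
    }
    return starts;
}
```

Each returned pointer can be handed to atof() or strtod(), and starts.size() is the count of numbers found on the line. A header line such as "varA varB" simply yields no tokens.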

I would not read 4k into a 1k buf. fgets( buf, sizeof( buf ), stdin ) ?

Every fgets() needs a "line too long" check to be stable. When this occurs, the last buffer byte is null and the preceding one is not a line feed. Seed the buffer tail before the call for a quick check, rather than pawing over the line an additional time. Put it all in a nice subroutine p_fgets() (private fgets) that, when fgets() fails, calls perror( "stdin" ) and exit( 1 ) if ferror() is set, and exit( 0 ) otherwise.
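
A rough sketch of the seeded-tail check follows. The wrapper name readLine and its return codes are assumptions made for illustration; it reports the outcome to the caller instead of calling exit() as p_fgets() would.

```cpp
#include <cstdio>

// readLine is a hypothetical wrapper around fgets().
// Returns 1 when a line was read, 0 at end of file or on a read error,
// and -1 when the line did not fit in buf.
int readLine(char* buf, std::size_t size, std::FILE* in)
{
    if (size < 2) return -1;
    buf[size - 2] = '\n';                // seed: stays '\n' unless fgets reaches this byte
    if (std::fgets(buf, static_cast<int>(size), in) == nullptr) return 0;
    // If the seeded byte is now neither '\n' nor '\0', fgets filled the
    // whole buffer without reaching the end of the line.
    if (buf[size - 2] != '\n' && buf[size - 2] != '\0') return -1;
    return 1;
}
```

One edge case to be aware of: a final line with no newline that fills the buffer exactly is also reported as too long. After a return of 0, the caller can use ferror() to tell a read error from end of file.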

Corona688 · March 12, 2013, 2:38pm

It definitely has its flaws. Doing further editing on the string after it's tokenized is a bad idea. It's not re-entrant. And other such gripes.

But seeing your description of how you'd do it, particularly the 'set it to null', that's exactly what strtok does, to the letter. I don't see what makes your way better -- bigger program, slower program, bigger chance for bugs.

Same goes for sscanf. It makes some really complicated problems simple, and vice versa.

If the problem wasn't this simple though, I certainly wouldn't have used those.

DGPickett · March 12, 2013, 2:52pm

When I strtok, I put it in a for! Less magic, I guess! The positive test is more robust, as the world is just line feeds, numbers and not numbers.

I guess it might go awry with "- 1". Does *scanf() have more states to handle that? Can atof() deal with spaces after the sign? If you have them, check! The man page for strtod says no.

hanson44 · March 12, 2013, 3:31pm

Sorry to repeat. But it sounds like you DO create the data file. Is that correct?

If so, what if you simply change the data file to something like the following, and then the programming to read the data would be trivial, using fscanf.

varA   cnt     varB 
-21      1     0  
-21.2    4     3 4 5 6 
-21.4    5     45 65 87 98 98 
-22.0    5     345677 349887 98766 877654 987543 
-23.0    3     76549 8764 9873

DGPickett · March 12, 2013, 3:46pm

CSV might be nice, too -- just read with excel. However, differentiating numbers from not is pretty trivial. The major challenge here is the variable number of fields. A more normal form would have one varB per line, and downstream, assuming you do not lose the sort, you process when the two keys change or EOF. Normal could be put in RDBMS and processed in SQL. You are writing in two dimensions, sometimes across and sometimes down. But that's OK.
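
To illustrate the normalized form, here is a minimal sketch that assumes one "varA varB" pair per line, no header, and rows for the same varA kept adjacent. The name averageByKey is hypothetical, and only a single key (varA) is considered.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// averageByKey is a hypothetical helper. Input is one "varA varB" pair per
// line, with rows for the same varA adjacent (that is, still sorted).
std::vector<std::pair<double, double>> averageByKey(std::istream& in)
{
    std::vector<std::pair<double, double>> points;
    double key = 0.0, sum = 0.0, a, b;
    int n = 0;
    while (in >> a >> b) {
        if (n > 0 && a != key) {         // key changed: emit the finished group
            points.emplace_back(key, sum / n);
            sum = 0.0;
            n = 0;
        }
        key = a;
        sum += b;
        ++n;
    }
    if (n > 0) points.emplace_back(key, sum / n);  // EOF: emit the last group
    return points;
}
```

The variable-width problem disappears because every line has exactly two fields; the cost is that varA is repeated on each row.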

Nohim_Ys · March 13, 2013, 12:06am

Thx for this code

achenle · March 13, 2013, 12:10pm

It's probably better to use "strtod()" or "atof()" to convert a string to a double, if only because "sscanf()" can and does do weird things when input is not as expected. "strtod()" also provides the ability to know the last character processed, so it's easier to tell if something went bad.
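
As a minimal sketch of that end-pointer check, the helper below accepts a token only when strtod() consumed all of it. The name parseDouble is hypothetical.

```cpp
#include <cerrno>
#include <cstdlib>

// parseDouble is a hypothetical helper. It accepts a token only if strtod()
// consumed every character of it and the value is in range.
bool parseDouble(const char* token, double& out)
{
    char* end = nullptr;
    errno = 0;
    const double value = std::strtod(token, &end);
    if (end == token) return false;      // nothing was converted
    if (*end != '\0') return false;      // trailing junk such as "12abc"
    if (errno == ERANGE) return false;   // overflow or underflow
    out = value;
    return true;
}
```

With this check, an empty token, "12abc", and a sign separated from its digits such as "- 1" are all rejected instead of being silently read as a number.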

DGPickett · March 13, 2013, 3:18pm

Can even be done in shell with bc for the floating point, something like:

# Assumes the "varA cnt varB" layout with space-separated values
while read a b c
do
 f=$(
  ( echo 'e=0
n=0'
    for d in $c
    do
     echo 'n=n+1
e=e+'"$d"
    done
    echo e/n
   ) | bc -l | sed -e '/\./s/0*$//' -e 's/\.$//' )
 echo $a $b $f
done <in_file >out_file

I second the sscanf() concerns. I like to validate my inputs -- never give inputs the benefit of the doubt. The man page says atof() is strtod() with less error checking. You get to choose. And that is UNIX in a nutshell: lots of options, support for quick, devil-may-care code, but the wise choose the more robust alternatives.

emily · March 13, 2013, 3:20pm

hi,
Thanks for the reply, but I am confused. I need to plot both columns; how am I supposed to plot them then?
So you propose using a shell script to reduce them to single-valued columns, and then calling C++ code to plot?

emily,

DGPickett · March 13, 2013, 3:58pm

I am sure there are shell-friendly graph programs, too. Heck, you can lay down images in text and then call utilities to make them into images! I am never shy about using a popen() to read shell output in C/C++! You can use system() to make a named pipe and sort it to itself, so all you have to do is write to the named pipe and then read it back. Unlike COBOL, no embedded sort!

system( "(mknod p p ; sort -o p p ; rm p )&" );
FILE *f = fopen( "p", "w" );
while (...) { ... fputs( buf, f ); ... }
fclose( f );
f = fopen( "p", "r" );
while ( fgets( buf, sizeof( buf ), f )){...}
fclose( f );