awk - filtering data by if --> into an array

Use and complete the template provided. The entire template must be completed. If you don't, your post may be deleted!

  1. The problem statement, all variables and given/known data:
    my data in csv-format ...
...
13/08/2012,16:30,303.30,5.10,3,2,2,1,9360.0,322
13/08/2012,16:40,305.50,5.00,3,2,2,1,12360.0,322
13/08/2012,16:50,319.90,3.80,3,2,1,0,2280.0
13/08/2012,17:00,326.10,3.50,3,2,1,1,4380.0,321
13/08/2012,17:10,333.00,3.80,3,3,1,0,2280.0
13/08/2012,17:20,330.20,4.50,3,3,2,0,2700.0
13/08/2012,17:30,332.00,4.20,3,3,2,1,5220.0,332
13/08/2012,17:40,334.90,3.20,3,3,1,0,1920.0
13/08/2012,17:50,337.30,3.10,3,3,1,1,3780.0,331
13/08/2012,18:00,343.20,2.20,3,3,1,1,5100.0,331
13/08/2012,18:10,345.20,1.30,3,3,1,1,5880.0,331
13/08/2012,18:20,340.70,1.60,3,3,1,1,6840.0,331
...

finally i want to display the three-number -index in the last collumn in a gnuplot-graf. Depending from the first number should be chosen a different color. Herefor i want do prepare the data with the awk-script, so that i achive ...

...
08/13/12-16:30  \t      \t      322    \t
08/13/12-16:40  \t      \t      \t      422
08/13/12-16:50  \t      \t      \t      \t
08/13/12-17:00  \t      \t      321    \t
08/13/12-17:10  \t      \t      \t      \t  
08/13/12-17:20  \t      \t      \t      \t  
08/13/12-17:30  \t      \t      322    \t
08/13/12-17:40  \t      \t      \t      \t  
08/13/12-17:50  \t      \t      321    \t
08/13/12-18:00  131    \t       \t      \t
08/13/12-18:10  \t      \t      322    \t
08/13/12-18:20  \t      231    \t      \t
...
  1. Relevant commands, code, scripts, algorithms:
    here a awk-script changes the date and time-format
cd $home
        if test -f "$f" 
        then
        awk '{FS=",";  print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2) "-" $2  "\t" $10}' $home/$f > index.txt
        else
        DAT=$(date +%Y%m%d)
        printf "$DAT - Datei $i nicht vorhanden" >> "$home/Logfiles/Log_$(date +%Y%m%d).log"
    fi
08/13/12-16:30  322
08/13/12-16:40  422
08/13/12-16:50  
08/13/12-17:00  321
08/13/12-17:10  
08/13/12-17:20  
08/13/12-17:30  332
08/13/12-17:40  
08/13/12-17:50  331
08/13/12-18:00  131
08/13/12-18:10  331
08/13/12-18:20  231
  1. The attempts at a solution (include all code and scripts):
    i tried to upgrade the script to seperate the triplet in column 3
#!/bin/bash
f=APS-Index_20120101_20120814.dat
home=$HOME/Desktop/Bachelor_Projekt/APS3321/Messdaten/APS_AUSWERTUNG_TRIPLET-Daten
cd $home
	if test -f "$f" 
	then
	awk -F"," '{ 
                      if($10 > 400)
                          print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2) "-" $2  "\t\t\t" $10       $home/$f > index.txt;
	              else if ($10 < 400 && $10 > 300)
                          print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2) "-" $2  "\t\t" $10 "\t"    $home/$f > index.txt;
                      else if ($10 < 300 && $10 > 200)  
                         print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9,  2) "-" $2  "\t" $10 "\t\t"    $home/$f > index.txt;
	              else if ($10 < 200 && $10 > 100)
                         print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2) "-" $2  "\t" $10 "\t\t\t"   $home/$f > index.txt;
	else
	print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2) "-" $2  ""   $home/$f > index.txt;
	}'
	else
	DAT=$(date +%Y%m%d)
        printf "$DAT - Datei $i nicht vorhanden" >> "$home/Logfiles/Log_$(date +%Y%m%d).log"
    fi

I tried to do it with the above named awk if-statement, but it will not work like this.
Can someone please help me to do it the right way.
Thanks in advance!
IMPe

  1. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):
    University of Applied Sciences Aachen, Germany, Prof.Dr.Helsper, Bachelor Thesis

Note: Without school/professor/course information, you will be banned if you post here! You must complete the entire template (not just parts of it).

1 Like

Where do the 422, 231, 131 come from? They're not in your input data...

If I understand you correctly -- and I'm not sure I do, given that discrepancy -- you can use a trick to do what you want.

awk doesn't just let you read columns -- it lets you assign columns, too. The output separator is controlled by the OFS variable, and can be different from the input.

So you can do something like this:

awk -v OFS="\t" '{ ...
        $0="date and timestamp" # Set entire line to this value, overwriting $1 $2 ... variables too
        N=5 # Fifth column
        $N="value" # Set Nth colum to "value"
        print # print entire buffer
        }' file

to set the fifth, fourth, or whatever column to 'value' depending on which number N is set to. It will insert the right number of tabs in front as appropriate.

1 Like

sorry, i've forgotten to mention this - the last three numbers in column $10 could be numbers from 111 up to 433. These triplets i use to classify several data conditions.

What relation does your output have to your input then? If your output doesn't match your input, we're only guessing.

My guess is, 122 would end up in the second column, 233 would be in the third, 344 would be in the fourth, and 455 would be in the fifth. But I can't really tell for sure.

1 Like

Hi corona688!
First of all, thank you for helping me to solving my problem..
the triplets should be sortet by there first number acording to the timestamp.
I would like the triplets printout in a gnuplot-grafik. there should be the 4**-Triplets in red, the 3** in green and so on. therefor i need the data-triplets in this strukture.

...
08/13/12-16:30  \t      \t      322    \t
08/13/12-16:40  \t      \t      \t      422
08/13/12-16:50  \t      \t      \t      \t
08/13/12-17:00  \t      \t      321    \t
08/13/12-17:10  \t      \t      \t      \t  
08/13/12-17:20  \t      \t      \t      \t  
08/13/12-17:30  \t      \t      322    \t
08/13/12-17:40  \t      \t      \t      \t  
08/13/12-17:50  \t      \t      321    \t
08/13/12-18:00  131    \t       \t      \t
08/13/12-18:10  \t      \t      322    \t
08/13/12-18:20  \t      231    \t      \t
...

The main problem i think, is the if-structure. I'm not quit famuiliar with this part in the awk. I think the script start struggeling by defining the Fieldseperator.

Thanks
IMPe

You haven't answered my question. What relation does your input have with your output? Does the column it ends up in, depend on the first digit of the triplet?

That is what I am assuming since you refuse to say, but in all probability I have guessed wrong...

You are making it way more complicated than you need to with if/else if/else if/else if. You are also making it way more complicated than you need to with all those substr things. Just use columns to get the data you want and assign it where you wish. N=5; $N="abcd" sets column 5 to abcd for instance without needing 5 if/else if/else if/else if's to count up to 5. The number could be 3, 7, or 99 and it would work the same way.

awk -F"," -v OFS="\t" '{
        L=$10
        N=substr(L, 1, 1)+1;
        $0=$1"-"$2;
        if(L) $N=L } 1' input > output
1 Like

Hi

Unfortunatelly i found out after all, that gnuplot do have problems displaying data in the created structure.

...
08/13/12-16:30  \t      \t      322    \t
08/13/12-16:40  \t      \t      \t      422
08/13/12-16:50  \t      \t      \t      \t
08/13/12-17:00  \t      \t      321    \t
...

it has to be an "-" on every missing number.

...
08/13/12-16:30  -      -      322   -
08/13/12-16:40  -      -      -      422
08/13/12-16:50  -      -      -      -
08/13/12-17:00  -      -      321   -
...

The OFS is still "\t". I try to do it on the based script, but i break down on it. CAn you please give me one more hint.

Thanks in advance!
IMPe

You haven't answered my question. What relation does your input have with your output? Does the column it ends up in, depend on the first digit of the triplet? Yes? No? Maybe?

This might do what you want... Maybe...

awk -F"," -v OFS="\t" '{
        L=$10
        N=substr(L, 1, 1)+1;
        $0=$1"-"$2";
        for(N=2; N<=5; N++) $N="-"
        if(L) $N=L } 1' input > output
1 Like

Hi Corona688!

Thanks for your help. First i will answer the question:
The first triplet-number should be the marker for the coresponding column.
for example:

  • triplet 233 according to the timestamp in column 3
  • triplet 123 according to the timestamp in column 2
  • triplet 433 according to the timestamp in column 5
  • triplet 311 according to the timestamp in column 4
    Sorry for the slovenly reaktion on your question.
    Finally there a two points:
  1. I change the code slightly, because i have to change the dateformat
 awk -F"," -v OFS="\t" '{
                L=$10
                N=substr(L, 1, 1)+1;
                $0=substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2)"-"$2;
                for(N=2; N<=4; N++) $N="-"
                if(L) $N=L } 1' input.dat > output.txt
  1. The code is working efficently and fast, but the result is
    every triplet is written into the last column ..
01/01/12-00:50  -       -       -       423
01/01/12-01:00  -       -       -       423
01/01/12-01:10  -       -       -       423
01/01/12-01:20  -       -       -       -
01/01/12-01:30  -       -       -       -
01/01/12-01:40  -       -       -       -
01/01/12-01:50  -       -       -       223
01/01/12-02:00  -       -       -       223
....

I try to solve it but without results - can you please help me.

Thanks in advance!
IMPe

Don't use N in the for loop; it's holding the col no. for your output.

i had made some changes to your awk command

awk -F"," '{ 
                      if($10 > 400)
                          print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2) "-" $2,"-","-","-",$10
	              else if ($10 < 400 && $10 > 300)
                          print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2) "-" $2,"-","-",$10,"-"
                      else if ($10 < 300 && $10 > 200)  
                         print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9,  2) "-" $2 ,"-",$10,"-","-"
	              else if ($10 < 200 && $10 > 100)
                         print  substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2) "-" $2,"-",$10,"-","-","-"}' OFS=\t  $home/$f > index.txt
1 Like

Fix for the code which probably doesn't do what you want:

awk -F"," -v OFS="\t" '{
                L=$10
                N=substr(L, 1, 1)+1;
                $0=substr($1, 4, 2) "/" substr($1, 1, 2) "/" substr($1, 9, 2)"-"$2;
                for(M=2; M<=4; M++) $M="-"
                if(L) $N=L } 1' input.dat > output.txt

But HOW do you get 5 from 433? :wall:

HOW do you get 4 from 311? :wall:

You have not shown representative input for your output, so we are only guessing.

We will be stuck guessing until you show representative input.

Please show representative input and output -- lines that have anything to do with each other! -- or I will close this thread.

1 Like

Pardon me, but i'm afraid i didnt understand you right - sorry for the confusion.
i will try to concentrate my specification.

  1. a snippet of the original input data [csv] i posted in #1
    there i've forgotten to mention, that the triplet number in column $10 varies from
    111 ... 433 [ #3 ]
  2. the awk-script should basically
  • read the input data [$1], time [$2] and the triplet-number [ $10 ]
  • change the dateformat and accomplish it with the time-value to one timestamp.
  1. the output >
    in column $1 write the timestamp and depending from the first tripletncipher the related tripletnumber in the aproppriate column - for example
$1                      $2    $3     $4     $5
...
02/04/12-06:10  131   -       -       -
02/04/12-06:20  -       213   -       -
02/04/12-06:30  -       -       311   -
02/04/12-06:40  -       -       -       421
02/04/12-06:50  -       -       331    -
...

.. so that a triblet-no. with first cipher 4 should be written in column $5
or a triplet-no. with first cipher 1 should be written in column $2
or a triplet-no. with first cipher 3 should be written in column $4
and a triplet-no. with first cipher 2 should be written in column $3
and the occurred, TAB-separated blank spaces should be filled up with "-"

But with the last changes, the script is very efficently and exactly generating output as desired.

Thank you very much!
IMPe

1 Like

A simple 'yes' pages ago would have been sufficient since my original guess, by pure luck, turned out to be correct.

Thank you.

1 Like