AWK - check column data

vasanth.vadalur · December 31, 2009, 12:55pm

Hi,

I have some data in a file.

Infile:

1 e 1.2  1.6 
5 f  2.3  3.6
3 g 1.2  2.6
6 i  2.3  3.6
8 o 1.2  3.6

output:

1 e 1.2  1.6 
5 f  2.3  3.6
3 g 1.1  2.6
6 i  2.2  3.5
8 o 1.0  3.4

In column #3, the first occurrence of a value is left unchanged. The next occurrence of the same value is reduced by 0.1, the one after that by 0.2, and so on for each further repeat. The sample output applies the same rule to column #4.

Does anybody have an answer?

Thanks in advance
Vasanth

radoulov · December 31, 2009, 1:44pm

awk '{ 
  for (i=2; ++i<=NF;)
    c[i,$i] = f[i,$i]++ ? c[i,$i] += 0.1 : 0 
  }
{ 
  for (i=2; ++i<=NF;) 
    $i = sprintf("%.1f",$i -= c[i,$i])
  }
 9' infile

gaurav1086 · December 31, 2009, 1:45pm

Hello,
First of all, Happy New Year.
Try using hashes (err, maybe in Perl).
I will get back to you.

Regards.

vasanth.vadalur · December 31, 2009, 2:19pm

Dear radoulov,

I am getting an error: ' unexpected. Could you please explain your program?

Dear gaurav1086,

Thanks, and the same wishes to all.
I can't use Perl. Is there an awk answer?

Thanks in advance,
Vasanth

radoulov · December 31, 2009, 2:21pm

What error do you get?
If you're on Solaris, you should use gawk (if available), nawk or /usr/xpg4/bin/awk.
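
A minimal sketch of how a script could pick one of those implementations automatically. The candidate order and the AWK variable name are assumptions for illustration, so adjust them for the system at hand.

```shell
# Pick the first awk implementation that is found.
# The candidate list and its order are assumptions, not a rule.
AWK=
for candidate in gawk nawk /usr/xpg4/bin/awk awk; do
  if command -v "$candidate" >/dev/null 2>&1; then
    AWK=$candidate
    break
  fi
done
echo "using: ${AWK:-none found}"
```

The rest of a script can then call "$AWK" instead of a hard-coded awk.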

vasanth.vadalur · December 31, 2009, 2:26pm

I am using Cygwin.

Could you please explain the above program?

I will try my best to solve it from my side.

Thanks
vasanth

radoulov · December 31, 2009, 2:27pm

It should work on Cygwin.
Please post the error.

gaurav1086 · December 31, 2009, 2:39pm

Hello
Here you go

gaurav@localhost:~$ perl -n -e 'split(/\s+/,$_);foreach $i (@_){if($i!~/[a-z]/){print $i-0.1*$n{$i}," ";$n{$i}++}else{print $i," ";}}print "\n"' infile
1 e 1.2 1.6 
5 f 2.3 3.6 
3 g 1.1 2.6 
6 i 2.2 3.5 
8 o 1 3.4 
gaurav@localhost:~$

Cheers,
Happy New Year.

radoulov · December 31, 2009, 2:50pm

It's not clear to me what the output should be for an input like this one:

1 e 1.2 1.6
5 f 1.6 1.2

This:

% perl -n -e 'split(/\s+/,$_);foreach $i (@_){if($i!~/[a-z]/){print $i-0.1*$n{$i}," ";$n{$i}++}else{print $i," ";}}print "\n"' infile
1 e 1.2 1.6 
5 f 1.5 1.1

Or this (unchanged):

% awk '{
  for (i=2; ++i<=NF;)
    c[i,$i] = f[i,$i]++ ? c[i,$i] += 0.1 : 0
  }
{
  for (i=2; ++i<=NF;)
    $i = sprintf("%.1f",$i -= c[i,$i])
  }
 9' infile
1 e 1.2 1.6
5 f 1.6 1.2

gaurav1086 · December 31, 2009, 2:54pm

I think he wants to reduce each value by 0.1 times the number of times it has already occurred.
Incidentally, no value occurs in more than one column in the dataset he provided.
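
Since Perl is not an option here, the same rule might be sketched in awk roughly like this. It keeps one counter per value, shared across all columns, as the Perl one-liner does. The array name seen and the function name reduce_global are assumptions made up for the example.

```shell
# Hypothetical helper: one counter per value, shared by every column.
reduce_global() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i !~ /[a-z]/) {       # skip fields that contain lowercase letters
        n = seen[$i]++           # n = earlier occurrences of this value anywhere
        $i -= 0.1 * n            # first occurrence: n is 0, so no change
      }
    print
  }' "$@"
}

printf '1 e 1.2 1.6\n5 f 1.6 1.2\n' | reduce_global
```

On the two-line input from the question above this prints 1 e 1.2 1.6 and 5 f 1.5 1.1, i.e. the first of the two interpretations.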

vasanth.vadalur · December 31, 2009, 3:12pm

Dear Radoulov,

Sorry, it's working fine. Thanks a lot.

Could you please add a comment on how this program works?

Thanks
vasanth

---------- Post updated at 12:12 PM ---------- Previous update was at 12:04 PM ----------

Dear Radoulov,

If the input file contains more than 4 columns, the extra columns are printed as -0.0, -0.1, etc.

They should be printed as they are.

Thanks in advance.
Vasanth

radoulov · December 31, 2009, 3:33pm

This should fix it.

awk '{ 
  for (i=2; ++i<=4;)
    c[i,$i] = f[i,$i]++ ? c[i,$i] += 0.1 : 0 
  }
{ 
  for (i=2; ++i<=4;) 
    $i = sprintf("%.1f",$i -= c[i,$i])
  }
 9' infile
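
For anyone following along, here is a sketch of the same idea written out with comments. It is not the exact program above: it uses a single counter array (called seen here, an assumption) and one loop instead of two, and the function name reduce_cols_3_4 is made up for the example.

```shell
# Hypothetical helper: adjust columns 3 and 4 only, each column counted on its own.
reduce_cols_3_4() {
  awk '{
    for (i = 3; i <= 4; i++) {
      # n = how many earlier rows had this value in this column
      n = seen[i, $i]++
      # first occurrence: n is 0, value unchanged; each repeat loses 0.1 more
      $i = sprintf("%.1f", $i - 0.1 * n)
    }
    print    # columns beyond the fourth are never touched
  }' "$@"
}

printf '1 e 1.2 1.6 x\n5 f 2.3 3.6 x\n3 g 1.2 2.6 x\n6 i 2.3 3.6 x\n8 o 1.2 3.6 x\n' |
  reduce_cols_3_4
```

The counter is read before the field is overwritten, so the lookup key is always the original value from the input.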

vasanth.vadalur · December 31, 2009, 3:52pm

Dear radoulov,

Yes, it worked fine.

Now, what if I need to check the pair of the third column ($3) and the fourth column ($4) against the following lines, up to the end of the file? What would the program be?

Eg:

Input file :

w e 4.2 5
t j 5 6
y u 4.2 5
y i 5 5

output:

w e 4.2 5
t j 5 6
y u 4.1 5
y i 5 5

Thanks in advance

radoulov · December 31, 2009, 4:22pm

awk '$3 -= 0.1 * c[$3,$4]++' infile

Thanks to gaurav1086 for the algorithm, I'm not good at math.

---------- Post updated at 10:22 PM ---------- Previous update was at 10:07 PM ----------

It should be:

awk '($3 -= 0.1 * c[$3,$4]++)||1' infile

Otherwise you'll lose the records where $3 ends up as 0.
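
To see the difference, here is a small sketch with a made-up two-line input where $3 is 0. The input and the function names without_guard and with_guard are assumptions for the demo.

```shell
# Both helpers read stdin; the names are hypothetical.
without_guard() { awk '$3 -= 0.1 * c[$3,$4]++'; }
with_guard()    { awk '($3 -= 0.1 * c[$3,$4]++) || 1'; }

echo "without || 1:"
printf 'a b 0 5\nc d 0 5\n' | without_guard

echo "with || 1:"
printf 'a b 0 5\nc d 0 5\n' | with_guard
```

Without the guard only c d -0.1 5 is printed, because the assignment evaluates to 0 on the first record and awk treats that as a false pattern. With || 1 both records come out.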

tene · January 2, 2010, 2:51am

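#!/usr/bin/awk -f

# Counts repeats in column 3 and in column 4 separately.
BEGIN {
   col3[""]=0;
   col4[""]=0;
}

{
   # First occurrence of this value in column 3: keep it as is.
   if(col3[$3] <= 0){
       col3[$3] = 1;
       data3 = $3;
   }
   # Repeat: subtract 0.1 for every earlier occurrence.
   else {
      col3[$3]++;
      data3 = ($3-0.1*(col3[$3]-1)); 
   }

   # Same logic for column 4.
   if(col4[$4] <= 0) {
      col4[$4] = 1;
      data4 = $4;
   }
   else {
      col4[$4]++;
      data4 = ($4-0.1*(col4[$4]-1));
   }

   # Commas put the output field separator between the values.
   print $1, $2, data3, data4;

}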

This is not tested..

aaiaz · January 3, 2010, 6:05am

Another approach:

nawk 'a[$3$4]++ { $3 -= 0.1}1' test

But radoulov, in your solution

awk '($3 -= 0.1 * c[$3,$4]++)||1' infile

why did you use multiplication instead of

awk '($3 -= 0.1 && c[$3,$4]++)||1' infile

ahmad.diab · January 3, 2010, 6:15am

aaiaz,

Multiplication is used instead of && because the amount to subtract grows with each repetition of the pattern, so you have to use
"*", not "&&".

By the way, your code will not work properly, because it subtracts
"0.1" every time, although on the second repetition you have to subtract "0.2".

BR

:D:D:D
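
A small sketch comparing the three variants on a made-up input where the same ($3, $4) pair occurs three times. The sample data and the function names scaled, fixed and logical are assumptions for the demo, and plain awk reading stdin is used in place of nawk and a file.

```shell
sample='w e 4.2 5
y u 4.2 5
y i 4.2 5'

# Repeat number k loses k * 0.1 (the multiplication version).
scaled()  { awk '($3 -= 0.1 * c[$3,$4]++) || 1'; }
# Every repeat loses the same 0.1 (the a[$3$4]++ version).
fixed()   { awk 'a[$3$4]++ { $3 -= 0.1 } 1'; }
# && binds tighter than -=, so the right side is a logical 0 or 1.
logical() { awk '($3 -= 0.1 && c[$3,$4]++) || 1'; }

for f in scaled fixed logical; do
  printf '%s: ' "$f"
  printf '%s\n' "$sample" | "$f" | awk '{ printf "%s ", $3 } END { print "" }'
done
```

The third column comes out as 4.2 4.1 4 for scaled, 4.2 4.1 4.1 for fixed, and 4.2 3.2 3.2 for logical, since the && version subtracts a whole 1 on every repeat.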

aaiaz · January 3, 2010, 7:53am

Thanks, man, for the details. I understand now.