I want to write a program that reads this database and tells me when the 3rd item of a group is smaller than the 2nd item of the same group, or when the 1st item is greater than the 2nd item of that group. Basically, item1 < item2 < item3 is the standard within each group. If this is not met within a group, the script should print all the groups that deviate. For example, the output would be:
However, it won't do it. With the commas in the format, your script does not work; with the equal sign, it works. Any idea what I am doing wrong?
---------- Post updated at 11:24 AM ---------- Previous update was at 09:46 AM ----------
Disregard my previous posting. I mistakenly entered a comma instead of a semicolon. That's why it was not working.
Thanks!
---------- Post updated at 11:31 AM ---------- Previous update was at 11:24 AM ----------
Now, if you could explain the script to me, that would help. I need to modify it somehow. Basically, what I am trying to do is the following:
The output data for all the groups has 6 entries. But while most of the groups have only three entries filled, some groups have all 6 entries filled. Whenever the script reads the file and sees a group with only 3 entries filled, it needs to print the three rows that have data but strip the other three empty rows; for the groups with 6 entries filled, it just needs to print them out. It would be fantastic if you or someone else could help.
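For what it's worth, one possible starting point is sketched below. This assumes groups are separated by blank lines, as in the original sample, and that an empty entry is a line with nothing after the "=":

BEGIN { RS = ""; FS = "\n" }           # paragraph mode: one record per group
{
    for (i = 1; i <= NF; i++)
        if ($i !~ /=[[:space:]]*$/)    # skip entries with no value after "="
            print $i
    print ""                           # keep a blank line between groups
}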
Thanks!
---------- Post updated at 07:56 PM ---------- Previous update was at 11:31 AM ----------
Hey rdcwayx,
I tried your script but got this error message.
awk: record `1.3=12
1.2=348
1.1=180
...' too long
For your info, my input file format was as follows:
Firstly, the Perl one-liner posted earlier will *not* work for groups of 6 lines. It works *only* for groups of 3 lines.
Secondly, it would help if you could post some test data for groups of 3 lines as well as 6 lines, along with the expected output that shows exactly what is to be done for each group.
The input you've posted above is difficult to comprehend because it does not have the blank line that separates one group from the other.
Okay, I have 6 groups. Now I want the script to go through these groups and look at the structure of each one: if .1 (within a group) is greater than .2 or .3, or if .2 is greater than .3 within a group, then output that group. In our case the output would be the groups where this happens.
BEGIN {
    RS = ""        # paragraph mode: records are blank-line-separated groups
    FS = "\n"      # each line of a group becomes one field
    ORS = "\n\n"   # separate printed groups with a blank line
}
{
    n = split($NF, t, "=");      one = t[n]    # .1 value (last line of the group)
    n = split($(NF-1), t, "=");  two = t[n]    # .2 value
    n = split($(NF-2), t, "=");  three = t[n]  # .3 value
    # print the whole group when the item1 < item2 < item3 ordering is violated
    if (one > two || one > three || two > three) print
}
The block of code above is the content of the file 'ernst.awk'.
Your input to the script is assumed to be in file 'inputFile'.
The calling sequence is:
nawk -f ernst.awk inputFile
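For example, assuming 'inputFile' holds the two-group sample shown later in this thread (group 1 satisfies 12 < 13 < 14, while group 230 has 148 > 147), the run would look like:

$ nawk -f ernst.awk inputFile
230.3=146
230.2=147
230.1=148

Group 1 is skipped because its values are in increasing order; group 230 is printed because .1=148 is greater than both .2=147 and .3=146.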
Interestingly enough, your script works fine with the sample that I posted (the short file) but does not work for a larger file, such as my database. When I use your script on my database, which contains hundreds of groupings, I get the following error message:
awk: record `1 .6=
1 .5=
1 ...' too long
---------- Post updated at 08:35 AM ---------- Previous update was at 08:23 AM ----------
vgersh99:
I do not get an error message when I run your script with my database, but the output file is empty. I do not get any data, which is wrong, because I know for sure that some of the groupings meet the conditions I listed.
Durden:
You are right: your scripts work fine for the small sample I provided you with. However, they do not work for my huge file.
With my huge file, I do not get an output file; whenever I cat the output, I do not get any data.
I did not get an error message either.
Your real data file is different from the sample data:
No blank line between groups.
The first column has a space, but in your sample data there is no space!
1 .6=
1 .5=
1 .4=
1 .3=12
1 .2=348
1 .1=180
Each group has 6 lines; some groups have no data in lines 4, 5, or 6.
That's why our scripts can't work on your real data.
---------- Post updated at 01:18 PM ---------- Previous update was at 12:36 PM ----------
Based on your real data, I updated the script:
awk -F "[.=]" '{a[$1]; b[$1,$2]=$3}
END {for (i in a) {if (b[i,1]>b[i,2]||b[i,2]>b[i,3]||b[i,1]>b[i,3])
for (j=6;j>=1;j--) printf "%3s.%s=%s\n", i,j,b[i,j]
}
} ' urfile
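To see what the field separator does with the new format (a quick check, not part of the post above): -F "[.=]" splits each line on both "." and "=", so the trailing space stays attached to the first field:

$ echo '1 .3=12' | awk -F "[.=]" '{ printf "f1=[%s] f2=[%s] f3=[%s]\n", $1, $2, $3 }'
f1=[1 ] f2=[3] f3=[12]

That space stays in $1 and is carried through to the printf output.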
Well, how huge is your huge file? 1,000 lines? 10,000 lines? 100,000 lines? 1 million lines?
Also, my suggested command does not create an output file. So if you executed the Perl one-liner exactly as I had posted it, you wouldn't see any output file either.
The Perl one-liner processes your input file ("input.dat" in my post) and spews the output on stdout - which is your Terminal screen by default.
If you mean displaying the output file with the "cat" command, then did you redirect the output to a file first?
If yes, then can you post exactly what you typed on your Terminal screen? (i.e., can you copy/paste the session from your Terminal screen?)
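For reference, here is the difference, using a trivial stand-in one-liner just for illustration (this is not the actual one-liner I posted earlier):

$ # output goes to stdout, i.e. the Terminal:
$ perl -lne 'print if /\.1=/' input.dat
$ # output redirected to a file first, then displayed:
$ perl -lne 'print if /\.1=/' input.dat > output.dat
$ cat output.dat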
As posted by others, your input files do not show consistent data. This is what you've posted earlier:
As you can see, the differences are listed below:
Difference #1: Your old input file did not have a space between "1" and ".", whereas your new file has the space.
# First line of old input file
1.6=176
# First line of new input file
1 .6=
Difference #2: Your old input file has a number to the right of every single "=" character. Your new input file does not have a number to the right of every single "=" character.
# First 5 lines of old input file
1.6=176
1.5=172
1.4=168
1.3=14
1.2=13
# First 5 lines of new input file
1 .6=
1 .5=
1 .4=
1 .3=12
1 .2=348
Difference #3: Your old input file has blank lines at the end of each "group". Your new input file does not have even a single blank line.
# First 10 lines of old input file; it has two "groups" with a blank line to separate them
1.6=176
1.5=172
1.4=168
1.3=14
1.2=13
1.1=12

230.3=146
230.2=147
230.1=148
# First 10 lines of new input file; it has no blank lines anywhere in the file
1 .6=
1 .5=
1 .4=
1 .3=12
1 .2=348
1 .1=180
10 .6=
10 .5=
10 .4=
10 .3=360
Needless to say, you shouldn't expect consistent solutions to inconsistent problems!
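That said, one hypothetical way to bring the new format closer to the old one is to delete the stray space before the "." (this sed line is only an illustration, not something posted in this thread, and it does not re-insert the missing blank lines between groups):

$ sed 's/ *\././' newfile > normalized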
Sure thing. Since you did not mention how huge your input file is, I'll assume it has 2 million lines.
Here's what I did: I took the input file "input.dat" and kept appending its content over and over to another file called "input.txt".
The final line count of "input.txt" is roughly 2 million lines.
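In case you want to reproduce the test, the repeated appending can be done with a simple loop like this (a sketch; the exact commands were not part of my post, and 250000 is just an arbitrary repetition count):

$ : > input.txt                      # start with an empty file
$ i=0
$ while [ $i -lt 250000 ]; do
    cat input.dat >> input.txt       # append the sample once more
    i=$((i + 1))
  done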
Here's some information about "input.txt".
$
$ # the line, word and character counts of "input.txt"; note that it has 2,062,500 lines
$ wc input.txt
2062500 1687500 14625000 input.txt
$
$ # the first 10 lines of "input.txt"
$ head input.txt
1.6=176
1.5=172
1.4=168
1.3=14
1.2=13
1.1=12

230.3=146
230.2=147
230.1=148
$
$ # the last 10 lines of "input.txt"
$ tail input.txt
100.5=172
100.4=168
100.3=20
100.2=12
100.1=16

117.3=24
117.2=82
117.1=79
$
And now, I run the Perl one-liner on the file "input.txt" and redirect the output to file "output.txt".
I also feed the entire one-liner to the "time" command.
$
$
$ time perl -lne 'chomp;
if (/^\s*$/) {
if ($x>$y or $x>$z or $y>$z) {print foreach (@a); print}
@a=(); $x=$y=$z="";
} else {
push @a,$_;
if (/^\d+\.1=(.*)$/) {$x = $1}
elsif (/^\d+\.2=(.*)$/) {$y = $1}
elsif (/^\d+\.3=(.*)$/) {$z = $1}
}
END {if ($x>$y or $x>$z or $y>$z) {print foreach (@a); print}}
' input.txt >output.txt
real 0m15.125s
user 0m0.015s
sys 0m0.031s
$
$
$ wc output.txt
937500 750000 8250000 output.txt
$
$ head output.txt
230.3=146
230.2=147
230.1=148

100.6=176
100.5=172
100.4=168
100.3=20
100.2=12
100.1=16
$
$ tail output.txt
100.5=172
100.4=168
100.3=20
100.2=12
100.1=16

117.3=24
117.2=82
117.1=79
$
$
And that's 15.125 seconds to process 2 million lines.