Something went awfully wrong in PHP+MySQL :(

When I run a few statements through the mysql console, they work just fine, but when I write some logic in PHP and try to execute it, it's awfully slow... It's performing perhaps 10 queries in 20 seconds... :frowning: Why is this happening?

My table structures are:

Name: asn_number
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| asn        | mediumint(9) |      | MUL | 0       |       |
| ip_address | varchar(18)  |      | MUL |         |       |
+------------+--------------+------+-----+---------+-------+

And

Name: trace
+-------+---------------+------+-----+---------+----------------+
| Field | Type          | Null | Key | Default | Extra          |
+-------+---------------+------+-----+---------+----------------+
| id    | double        |      | PRI | NULL    | auto_increment |
| gid   | int(11)       |      |     | 0       |                |
| sno   | tinyint(4)    |      |     | 0       |                |
| asn   | tinyint(4)    |      | MUL | 0       |                |
| ip    | varchar(18)   |      | MUL |         |                |
| r1    | decimal(10,3) |      |     | 0.000   |                |
| r2    | decimal(10,3) |      |     | 0.000   |                |
| r3    | decimal(10,3) |      |     | 0.000   |                |
+-------+---------------+------+-----+---------+----------------+

Can someone tell me what could've gone wrong?

The bottleneck can be at the database level, the web server level, the PHP level, or in your PHP script itself. Try to identify at which level the slowness occurs.

If all web pages at your web server are s-l-o-w, then you should check your web server configuration.

If other pages (such as HTML, Perl CGI, etc.) are fast, but only PHP pages are slow, then check what may be wrong with PHP.

If other PHP pages are fast and only this particular page is slow, then check the PHP code.

I am not a database expert, so I can't say much about that. In particular, check whether you can optimize the query that you are using in your PHP script.

If the number of records in the tables is huge, and your query results in an output of, say, 1000 records, then transferring that much data from the web server to the client (browser) may be slow.

Thank you... Actually, I tried running the same script locally (by the way, I am running all this on a dedicated Unix machine, not a web server as such) and the same problem repeated. I mean, it's doing everything with the same degree of slowness...

What I'm trying to do is an adjacent-row comparison, and if the rows are different, I copy them into a different table. The code goes something like this:

<?php

for ($i = 2; $i < 2749750; $i++) {
    // Get the previous row
    $sql_prev = "SELECT * FROM trace WHERE ID=".($i-1);
    $result_prev = mysql_query($sql_prev);
    $row_prev = mysql_fetch_assoc($result_prev);

    // Get the current row
    $sql_cur = "SELECT * FROM trace WHERE ID=".$i;
    $result_cur = mysql_query($sql_cur);
    $row_cur = mysql_fetch_assoc($result_cur);

    // Now select the corresponding ASN numbers from the other table
    $sql_prev_asn = "SELECT * FROM asn_number WHERE ip_address='$row_prev[ip]'";
    $result_prev_asn = mysql_query($sql_prev_asn);
    $row_prev_asn = mysql_fetch_assoc($result_prev_asn);

    $sql_cur_asn = "SELECT * FROM asn_number WHERE ip_address='$row_cur[ip]'";
    $result_cur_asn = mysql_query($sql_cur_asn);
    $row_cur_asn = mysql_fetch_assoc($result_cur_asn);

    if ($row_prev_asn['asn'] != $row_cur_asn['asn']) {
        $sql = "INSERT INTO br(asn1,ip1,asn2,ip2) VALUES('$row_prev_asn[asn]','$row_prev[ip]','$row_cur_asn[asn]','$row_cur[ip]')";
        $result = mysql_query($sql) or die(mysql_error());
    }
}

?>

As you can observe, there are many queries, and this is the only way I thought would work... Do you think this problem is occurring because I'm running "SELECT * FROM trace WHERE ID=" so many times and the trace table is big (around 200 MB)? But I've actually indexed the ID and ip fields...

You are running so many queries... that is the reason the page is slow.

Instead, you can run one query and get all the data in an array, then run the loop over that array to do the computations.
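A rough sketch of that idea, using the same mysql_* calls as the code above (only the id and ip columns, since those are what the comparison needs):

<?php
// Sketch only: run one query and collect every row into an array,
// then do all the comparisons in PHP instead of re-querying.
$result = mysql_query("SELECT id, ip FROM trace ORDER BY id") or die(mysql_error());

$rows = array();
while ($row = mysql_fetch_assoc($result)) {
    $rows[] = $row;
}

// now loop over $rows and do the computations ...
?>

With around 2.7 million rows that array will be large, so keep an eye on PHP's memory_limit.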

Or write a stored procedure that will do the computations for you. This will save running all those thousands of queries (I am not so good with stored procedures, so I can't tell you exactly how it would look).

A couple of questions and suggestions for you :slight_smile:

1)
Are you using all the fields from the table trace? I can see only the field 'ip' being used.
If I am right about that, then there is no need to select ' * ' from the table.
You don't need a ' * ' for that.

select * from table; 

is more expensive than

select specific_column from table;

So if you know the column that you are interested in, just use that column name in the query.

2)
The query is run again and again even where the data values are already known beforehand.

Consider this,

i=2; i<somenumber; i++

First iteration:

i=1 ( executed )
i=2 ( executed )

Next iteration:

i=2 ( executed )  // why should this be executed again, when its value is already known from the previous iteration?
i=3 ( executed )

Effectively, for 'n' iterations you have 'n' extra queries, which I feel can be completely avoided (dropped as such).
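In PHP terms, the fix is simply to carry the row fetched in one iteration over to the next, instead of querying it a second time; a sketch reusing the loop from your post:

<?php
// Sketch: each id is queried exactly once; the row fetched as
// "current" in one iteration is reused as "previous" in the next.
$row_prev = null;
for ($i = 1; $i < 2749750; $i++) {
    $result_cur = mysql_query("SELECT * FROM trace WHERE ID=$i");
    $row_cur = mysql_fetch_assoc($result_cur);
    if ($row_prev !== null) {
        // compare $row_prev with $row_cur here ...
    }
    $row_prev = $row_cur;
}
?>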

3)
If only a single column is extracted, and you are sure the values are definitely there, there is no need for two queries. That could be rewritten as something like

select ip from trace where trace_id >= $i and trace_id <= $i+1;

or, even better, if you have a running-number column in your table:

select column_with_running_numbers_sequence, ip from trace;

Use the above query to fetch all the values in one shot, and then you have an association like

record1 ip_value1
record2 ip_value2

The approach depends upon the table design.

4)
What about the commit status of your program? Is it autocommit? If so, that will definitely take more time, because each and every insert statement is followed by a commit, which brings down the performance.

If the inserts can be clubbed together, you can commit only after every 'n' inserts, choosing 'n' purely based on the criticality of the application.

insert
autocommit
insert
autocommit
insert
autocommit

is more expensive than

insert
insert
insert
explicit commit
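With PHP's mysql_* functions you can do that by issuing the statements yourself; a sketch, assuming the br table uses a transactional storage engine such as InnoDB (MyISAM silently ignores transactions), and where $pairs is a hypothetical array of rows gathered by the comparison loop:

<?php
// Sketch only: disable autocommit, run the whole batch of inserts,
// then commit once at the end. $pairs is a made-up variable here.
mysql_query("SET autocommit=0") or die(mysql_error());

foreach ($pairs as $p) {
    $sql = "INSERT INTO br(asn1,ip1,asn2,ip2)
            VALUES('$p[asn1]','$p[ip1]','$p[asn2]','$p[ip2]')";
    mysql_query($sql) or die(mysql_error());
}

mysql_query("COMMIT") or die(mysql_error());
?>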

Please post how it goes. All the best! :slight_smile:

@Yogesh

Thank you so much for the advice... I'll look into what stored procedures are now... Maybe that could help me out...

@matrixmadhan

I really appreciate the time you've spent framing that reply... Thanks a million... By receiving help from people like you, even we newbies can learn "something" :slight_smile: Thanks again...

I'm currently making the changes you suggested in 1 and 2... I think 3 is linked to 2.

Could you kindly elaborate on this one? I seem to have missed some simple SQL rules... Regarding the table design, I can still change it, because I'm just testing this on 2 million values... The actual data set contains 50 million values...

This 'id' column of trace is an auto_increment... So can I write something like:

SELECT id, ip FROM trace;

It's true that I will have all the values at once, but how am I going to compare adjacent values then?
Usually, to fetch the rows, I use something like:

while ($row = mysql_fetch_assoc($result)) {
    // Perform operations on the row. But how will I get the previous row or the next row in this loop?
}

That's my problem...

And if I'm not asking too much, I'd like to know about your 4th point too... I've never worked with autocommit... Maybe I'm doing that unknowingly... Is my code portraying that picture by any chance? How can I disable it? Really sorry for the trouble...

Yes, sort of.

If you have control of the table design, then it's relatively easy :slight_smile:

So what I have been trying to say is:
with column1 and an auto-increment column, you could build a hash of the auto-increment column and the other column's value,

something like,

autoincrement   other_column
1               10
2               30
3               5

With this, indexing the column and comparing become straightforward, and retrieval of the values is only O(1).
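In PHP that hash is just an associative array filled from one query; a sketch using the id and ip columns of trace, assuming the ids are contiguous:

<?php
// Sketch: build a hash keyed by the auto-increment id, so each
// adjacent value is a constant-time array lookup away.
$result = mysql_query("SELECT id, ip FROM trace ORDER BY id") or die(mysql_error());

$ipById = array();
while ($row = mysql_fetch_assoc($result)) {
    $ipById[$row['id']] = $row['ip'];
}

for ($i = 2; isset($ipById[$i]); $i++) {
    if ($ipById[$i] != $ipById[$i - 1]) {
        // adjacent rows differ; handle the pair here ...
    }
}
?>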

Do you have any explicit commit statements in your code? If so, we can be sure we are overriding autocommit and explicitly controlling the order and frequency in which the records are committed.

In the code segment that you posted, I don't see any explicit commit statement being used, so I guess it must be autocommit.

Try changing that; it will definitely improve the performance level :slight_smile: I am sure of that.

I'm really indebted to you... Your ideas prompted me to actually learn so many new things... :slight_smile: The culprit was in the trace table. When I was querying that table, it was returning many results, or should I say duplicate results... :slight_smile: And that was what was slowing everything down... The way I figured this out was to put a couple of echo statements in my script and observe where the program was slowing down... And that did the trick...!

I have one more doubt. Let's take the structure:
asn1 | ip1 | asn2 | ip2

All are integer fields... I want to delete all the duplicate rows in the table. For example, from:

11 | 12121 | 12 | 23232
34 | 21231 | 32 | 12312
11 | 12121 | 12 | 23232
32 | 23333 | 55 | 22323
11 | 12121 | 12 | 23232

Then I should get something like:
11 | 12121 | 12 | 23232
34 | 21231 | 32 | 12312
32 | 23333 | 55 | 22323

into another table, or any other way that's possible. I think DISTINCT can do this, but I don't know if it's possible with my table. I'm currently doing it as

CREATE TABLE temp AS SELECT * FROM br GROUP BY ip1;

Any suggestions?

So am I, to many :slight_smile: No problem about that.

To delete duplicate rows from the table, you could try something like,

delete from table1 where rowid not in ( select min(rowid) from table1 group by column1 );

What about the multiple insert statements with a single commit following them?
Did that improve performance?

Actually, I still didn't explicitly commit anything... I just used a single insert statement in a loop. Other than that, I haven't done anything special, and yeah, I improved the queries too... And regarding your suggestion, what do I do when there's no rowid in my table?

I'm sorry, I don't understand your statement.

Are you using
explicit commit

begin transaction
insert
insert
insert
commit

or implicit commit

insert ( auto commit )
insert ( auto commit )
insert ( auto commit )

I suppose you are using an Oracle DB; rowid is not a column, it's a pseudo-column that uniquely identifies a record in a table.

Sorry, that was a grammar typo... Corrected now... I'm doing it in the following way:

for ($i = 2; $i < 2749750; $i++) {
    if ($i == 2) {
        // First iteration: seed the previous row before any comparison
        $sql_prev = "SELECT ip,gid FROM trace WHERE ID=".($i-1);
        $result_prev = mysql_query($sql_prev) or die(mysql_error());
        $row_prev = mysql_fetch_assoc($result_prev);
    }
    // We already have the previous row, so fetch the current row and compare
    $sql_cur = "SELECT ip,gid FROM trace WHERE ID=".$i;
    $result_cur = mysql_query($sql_cur);
    $row_cur = mysql_fetch_assoc($result_cur);

    $sql_prev_asn = "SELECT asn FROM asn_number WHERE ip_address='$row_prev[ip]' LIMIT 1";
    $result_prev_asn = mysql_query($sql_prev_asn);
    $row_prev_asn = mysql_fetch_assoc($result_prev_asn);

    $sql_cur_asn = "SELECT asn FROM asn_number WHERE ip_address='$row_cur[ip]' LIMIT 1";
    $result_cur_asn = mysql_query($sql_cur_asn);
    $row_cur_asn = mysql_fetch_assoc($result_cur_asn);

    if (($row_prev_asn['asn'] != $row_cur_asn['asn']) && ($row_prev['gid'] == $row_cur['gid'])) {
        $sql = "INSERT INTO br(asn1,ip1,asn2,ip2) VALUES('$row_prev_asn[asn]','$row_prev[ip]','$row_cur_asn[asn]','$row_cur[ip]')";
        $result = mysql_query($sql) or die(mysql_error());
    }
    $row_prev = $row_cur;
}

So, I'm not using any explicit commit statements... I'm writing this code in PHP. And, well, I'm not using Oracle; I'm using MySQL...

Oops! I didn't notice from the title that you are using MySQL.

I will check the equivalent of rowid in MySQL and update!

Sorry for the confusion !
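For MySQL, which has no rowid pseudo-column, probably the closest simple equivalent is to copy only the distinct rows into a new table, close to the GROUP BY attempt you posted; a sketch (br_dedup is just a made-up name):

<?php
// Sketch: MySQL has no rowid, so instead of deleting in place,
// select the distinct rows into a fresh table and swap it in.
mysql_query("CREATE TABLE br_dedup AS
             SELECT DISTINCT asn1, ip1, asn2, ip2 FROM br") or die(mysql_error());
?>

You could then drop br and rename br_dedup in its place.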

#########

That pretty much means you are using a commit for every insert, which will drastically bring down the performance.

Combine 'n' inserts into one transaction and commit that single transaction alone; that would speed things up.

Again, this depends upon the criticality of the application you are working on; if the application is really critical, keep the value of 'n' really low.

Do you have any idea how to get the connected components in the database? I mean, referring to the old table, i.e.

asn1 ip1 asn2 ip2

I want to get all the connected components with reference to ip1 and ip2. For example, consider the following:

asn1 ip1 asn2 ip2
1 | 121 | 2 | 12121
1 | 123 | 2 | 121
1 | 643 | 2 | 344

I should get something like:
121, 12121, 123
643, 344

Is that possible?

It should definitely be possible! :slight_smile:

But I don't understand the relation you specified (sorry!).

Are two tables being referred to here?

Is the relation list one that contains elements from both tables?

Could you please explain that?

I'm sorry about that... Great to hear that it is possible... It's from a single table... For instance, consider two columns,

Column1 | Column2
A | B
B | C
C | D
A | F
E | G
G | H

X|Y => X is connected to Y (and Y to X). So the output should be something like:

A B C D F

and

E G H

I hope I was able to explain it this time. Please let me know if I'm still not clear, and I'll write a more detailed example :slight_smile:

Any suggestions please?

This would achieve the above. I know it's not optimized; this is just a pointer.

#! /opt/third-party/bin/perl
use strict;
use warnings;

my %firstHash;     # members of the first connected set
my %secondHash;    # members of the second connected set
my $first = 1;

open(FILE, "<", "filename") or die "cannot open input file: $!";

while(<FILE>) {
  chomp;
  s/ //g;                 # strip the spaces around the '|' separator
  my @arr = split(/\|/);
  # put the pair into the first set if either end already belongs to it
  # (or on the very first line); otherwise it goes into the second set
  $first == 1 || exists $firstHash{$arr[0]} || exists $firstHash{$arr[1]}
      ? pushValues(\%firstHash, \@arr, \$first)
      : pushValues(\%secondHash, \@arr, \$first);
}

close(FILE);

foreach my $k ( keys %firstHash ) {
  print "first set:$k\n";
}

foreach my $k ( keys %secondHash ) {
  print "second set:$k\n";
}

sub pushValues {
  my ($hash, $arr, $first) = @_;

  $$first = 0;
  foreach (@$arr) {
    $hash->{$_} = 1;
  }
}

exit 0;

Thank you... I'm not well versed in Perl, so I'll take some time converting the logic to PHP :slight_smile: Actually, I was trying something with one of the functions MySQL provides, GROUP_CONCAT... I wasn't particularly able to get the data in the format that I wanted... Using GROUP_CONCAT, do you know of a way of getting the sub-children too on a single line? As of now, I am able to get the following:

Parent -> A
Children -> B E F G

Parent -> B
Children-> C D

Don't bother about the children being the same... What I wanted was: if C and D are children of B, and B is a child of A, then I want the sub-children pulled up too, something like:

Parent -> A
Children -> B E F G C D

I see a hint of recursion here, but I might be wrong.

The statement that I framed was:

SELECT ip2 AS Parent, GROUP_CONCAT(ip1 SEPARATOR ', ') AS Children FROM br GROUP BY ip2 LIMIT 15;

Sorry, but could someone help, please? I'm afraid I couldn't adapt your Perl code to PHP. This database of mine contains millions of records, so I don't know how to write it... Any thoughts on my previous post?
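In case it helps, here is a minimal union-find (disjoint-set) sketch in PHP, assuming the pairs come from the br table's ip1/ip2 columns as in the earlier posts; the function names are made up for the example:

<?php
// Sketch only: connected components over the (ip1, ip2) pairs
// using union-find. The whole $parent map must fit in memory.

function find(array &$parent, $x) {
    while ($parent[$x] != $x) {
        $parent[$x] = $parent[$parent[$x]];   // path halving
        $x = $parent[$x];
    }
    return $x;
}

function union_pair(array &$parent, $a, $b) {
    if (!isset($parent[$a])) $parent[$a] = $a;
    if (!isset($parent[$b])) $parent[$b] = $b;
    $parent[find($parent, $a)] = find($parent, $b);
}

$parent = array();
$result = mysql_query("SELECT ip1, ip2 FROM br") or die(mysql_error());
while ($row = mysql_fetch_assoc($result)) {
    union_pair($parent, $row['ip1'], $row['ip2']);
}

// Group every value under the root of its component.
$components = array();
foreach (array_keys($parent) as $ip) {
    $components[find($parent, $ip)][] = $ip;
}

foreach ($components as $members) {
    echo implode(' ', $members), "\n";   // one connected component per line
}
?>

Each pair is processed in near-constant time, so millions of rows are bound mostly by how fast they can be fetched; the $parent array does have to fit within PHP's memory limit, though.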