Best match query

yahyaaa · September 10, 2016, 8:52am

Hi Forum,

I have the following table, around 60k records .. table name is "destinations"

BG,United Kingdom Mobile,004474572,0.20441,O 
BG,United Kingdom Mobile,004479,0.20441,O 
BG,United Kingdom Mobile,004479932,0.2222,O 
BG,United Kingdom Mobile,00447993,0.20,O 
BG,United Kingdom Mobile,004477,0.20441,O 
BG,United Kingdom Mobile,00447620,0.20441,O 
BG,United Kingdom Mobile,0044758,0.20441,O 
BG,United Kingdom Mobile,004475897,0.20441,O 
. 
. 
.

select * from destinations where prefix like '0044799326542%'

will not get any results since 0044799326542 does not exist,

what Im trying to achieve is,

if 0044799326542 is not found, try 004479932654,
select * from destinations where prefix like '004479932654%' -- no match

if no match, try 00447993265
select * from destinations where prefix like '00447993265%' -- no match

if no match, try 0044799326
select * from destinations where prefix like '0044799326%' -- no match

if no match, try 004479932
select * from destinations where prefix like '004479932%' -- match
BG,United Kingdom Mobile,004479932,0.2222,O

how to achieve this without having to go through all iterations ?

Thanks

joker · September 10, 2016, 9:29am

Hi,

maybe you create a function which figures out the count of equal numbers from the left. Let it for example be called: " get_equal_numbers(col_number,my_number) " See here fore documentation: CREATE FUNCTION examples

Then you may create a query like this:

SELECT col_number,rate,get_equal_numbers(col_number,my_number) AS equals 
FROM destinations
WHERE equals >= 1 
ORDER BY equals DESC

I'm not sure if this performes well.

The choice I normally would take is too read the whole table into an array and use a custom search function, which does the same as above.

Both solutions are not that efficient - especially when you have a lot of lookups to accomplish.

For speed and improving search effiency, I probably would build a hash like this(example in ruby-code):

#!/usr/bin/ruby

def get_routing_target(number)
   # number is a string
   routing_table = [
     "001" => "data1",
     "0012 => "data2",
     "00129" => "data3" ]
   for current_length in number.length .. 1
        if routing_table[number[0,current_length]] then
            return routing_table[number[0,current_length]
        end
   end
end

The last idea does not allow to have multiple routing targets for the same number. So this needs a bit adjustment to enable that("data" may be implemented as an array of multiple routing targets).

In general I think this may be a performance critical issue, which may cause very high load - depending on the number of requests that will be processed - on your server(DB or application server) if not designed well. So maybe it is better to have a program running as daemon, keeping the complete routing table in memory and reloading from time to time instead of reading it for every request.

UPDATE-1:

I tried it and created some test data(100k Records, with index on the number) and a mysql function:

DROP FUNCTION IF EXISTS equal_chars_from_left;
DELIMITER $$
CREATE FUNCTION equal_chars_from_left(wanted TEXT, current_field TEXT)
  RETURNS INT
BEGIN
  DECLARE current_length INT;
  SET current_length = LENGTH(wanted);

  count_loop: LOOP
    IF SUBSTRING(current_field,1,current_length) = SUBSTRING(wanted,1,current_length) THEN
        LEAVE count_loop;
    END IF;
    SET current_length = current_length - 1;

    IF current_length = 0 THEN
      LEAVE count_loop;
    END IF;
    
  END LOOP;
  RETURN current_length;
END;
$$
DELIMITER ;

The performance is as bad as expected.

mysql> select destination,rate,equal_chars_from_left("1212313",destination) as equals from test order by equals desc limit 10;
+----------------+------+--------+
| destination    | rate | equals |
+----------------+------+--------+
| 1212313893101  | 0.97 |      7 |
| 12123105591    | 0.09 |      6 |
| 12123          | 0.98 |      5 |
| 12123235501917 | 0.07 |      5 |
| 121283339945   | 0.19 |      4 |
| 121243075      | 0.53 |      4 |
| 121251         | 0.69 |      4 |
| 1212626040     | 0.23 |      4 |
| 1212914454156  | 0.72 |      4 |
| 121251042914   | 0.46 |      4 |
+----------------+------+--------+
10 rows in set (6,35 sec)

UPDATE 2:

This is a bit faster:

SELECT destination,equal_chars_from_left('1212313',destination) as equals from test where destination like '1212313%'  UNION
SELECT destination,equal_chars_from_left('1212313',destination) as equals from test where destination like '121231%'  UNION
SELECT destination,equal_chars_from_left('1212313',destination) as equals from test where destination like '12123%'  UNION
SELECT destination,equal_chars_from_left('1212313',destination) as equals from test where destination like '1212%'  UNION
SELECT destination,equal_chars_from_left('1212313',destination) as equals from test where destination like '121%'  UNION
SELECT destination,equal_chars_from_left('1212313',destination) as equals from test where destination like '12%'  UNION
SELECT destination,equal_chars_from_left('1212313',destination) as equals from test where destination like '1%' 
ORDER BY equals DESC LIMIT 10;

+----------------+--------+
| destination    | equals |
+----------------+--------+
| 1212313893101  |      7 |
| 12123105591    |      6 |
| 12123          |      5 |
| 12123235501917 |      5 |
| 12120247005    |      4 |
| 121283339945   |      4 |
| 12125241424986 |      4 |
| 1212426494209  |      4 |
| 1212116831     |      4 |
| 121284268      |      4 |
+----------------+--------+
10 rows in set (2,10 sec)

yahyaaa · September 10, 2016, 5:26pm

Big thankss....

joker · September 10, 2016, 5:52pm

Hmm. UNION is unnecessarily complicated. A simple OR does it too.

Here 's some PHP-code which takes the part of the MySQL-function, and is a lot faster:

$wanted=1212313;

function equal_chars_from_left($wanted,$current) {

        $len_w=strlen($wanted);
        $len_c=strlen($current);
        $max_len=($len_c<$len_w)?$len_c:$len_w;
        for($charcount=$max_len;$charcount>0;$charcount--) {
                if(substr($wanted,0,$charcount)==substr($current,0,$charcount)) {
                        return $charcount;
                }
        }
        return 0;
}

function sort_by_equals($a, $b)
{
        return (
                ($a[1]==$b[1])  ? 0 
                                : ( ( $a[1] > $b[1] ) ? -1 : 1 )
               );
}

/*
        SELECT destination FROM test
        WHERE (
                destination LIKE '1%'           OR
                destination LIKE '12%'          OR
                destination LIKE '121%'         OR
                destination LIKE '1212%'        OR
                destination LIKE '12123%'       OR
                destination LIKE '121231%'      OR
                destination LIKE '1212313%' 
         );

*/

function create_query ($wanted) {
        $query = "
        SELECT destination FROM test
        WHERE (
        ";
        $len = strlen($wanted);
        for($i=1;$i<=$len;$i++) {
                $or = $i<$len ? "OR" : "";
                $query.="destination LIKE '".substr($wanted,0,$i)."%' $or\n";
        }
        $query.=" );\n";
        return $query;
}


$mysqli = new mysqli("localhost", "root", "blub123", "test");
$res    = $mysqli->query(create_query($wanted));

$data = [];
while ($row = $res->fetch_assoc()) {
        array_push($data,[ $row["destination"], equal_chars_from_left($wanted,$row["destination"])]);
}
usort($data,"sort_by_equals");

for($i=0;$i<10;$i++) { printf("%-15s %2d\n", $data[$i][0],$data[$i][1]); }

1212313893101    7
12123105591      6
12123            5
12123235501917   5
121216           4
121251042914     4
12129164         4
1212914454156    4
121291002        4
121284268        4

real    0m0.269s
user    0m0.208s
sys     0m0.016s

UPDATE

PHP-Variant b) is even faster

<?PHP

$wanted=$argv[1];

function equal_chars_from_left($wanted,$current) {

        $len_w=strlen($wanted);
        $len_c=strlen($current);
        $max_len=($len_c<$len_w)?$len_c:$len_w;
        for($charcount=$max_len;$charcount>0;$charcount--) {
                if(substr($wanted,0,$charcount)==substr($current,0,$charcount)) {
                        return $charcount;
                }
        }
        return 0;
}

function create_query ($wanted) {
        $query = "
        SELECT destination FROM test
        WHERE (
        ";
        $len = strlen($wanted);
        for($i=1;$i<=$len;$i++) {
                $or = $i<$len ? "OR" : "";
                $query.="destination LIKE '".substr($wanted,0,$i)."%' $or\n";
        }
        $query.=" );\n";
        return $query;
}

$mysqli = new mysqli("localhost", "root", "blub123", "test");
$res    = $mysqli->query(create_query($wanted));

$data = [];
for($i=1;$i<=20;$i++) { $data["$i"]=[]; }
while ($row = $res->fetch_assoc()) {
        $eq = equal_chars_from_left($wanted,$row["destination"]);
        if($eq>0) {
        array_push($data["$eq"],$row["destination"]);
        }
}

for($z=0,$i=20;$i>0;$i--) {
        for($j=0;$j<count($data[$i]);$j++,$z++) {
                if($z>9) break(2);
                printf("%-20s %2d\n",$data[$i][$j],$i);
        }
}
?>

1212313893101         7
12123105591           6
12123                 5
12123235501917        5
12120247005           4
1212116831            4
1212120056008         4
121216                4
1212232               4
121242                4

real    0m0.132s
user    0m0.120s
sys     0m0.008s

yahyaaa · September 11, 2016, 4:38am

I tried the below precedure... but keep getting errors...

delimiter //

CREATE FUNCTION BestMatch (msisdn INT)
  RETURNS FLOAT(9,3)

BEGIN

    DECLARE var INT(6);
    DECLARE s FLOAT(9,3);

    DECLARE msisdn_substr3 VARCHAR(16);
    DECLARE msisdn_substr4 VARCHAR(16);
    DECLARE msisdn_substr5 VARCHAR(16);
    DECLARE msisdn_substr6 VARCHAR(16);
    DECLARE msisdn_substr7 VARCHAR(16);
    DECLARE msisdn_substr8 VARCHAR(16);
    DECLARE msisdn_substr9 VARCHAR(16);
    DECLARE msisdn_substr10 VARCHAR(16);
    DECLARE msisdn_substr11 VARCHAR(16);
    DECLARE msisdn_substr12 VARCHAR(16);
    DECLARE msisdn_substr13 VARCHAR(16);
    DECLARE msisdn_substr14 VARCHAR(16);

SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr8');

IF var = 1 THEN
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr8');
END IF ;

IF var > 1 THEN
    SET msisdn_substr9 = SUBSTRING ( msisdn ,1 , 9 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr9');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr9');
    end if;

    if var = 0 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr8');
    end if;

    SET msisdn_substr10 = SUBSTRING ( msisdn ,1 , 10 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr10');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr10');
    end if;

    if var = 0 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr9');
    end if;

    SET msisdn_substr11 = SUBSTRING ( msisdn ,1 , 11 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr11');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr11');
    end if;

    if var = 0 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr10');
    end if;

    SET msisdn_substr12 = SUBSTRING ( msisdn ,1 , 12 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr12');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr12');
    end if;

    if var = 0 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr11');
    end if;

    SET msisdn_substr13 = SUBSTRING ( msisdn ,1 , 13 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr13');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr13');
    end if;

    if var = 0 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr12');
    end if;

    SET msisdn_substr14 = SUBSTRING ( msisdn ,1 , 14 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr14');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr14');
    end if;

    if var = 0 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr13');
    end if;

END IF;

IF var = 0 THEN
    SET msisdn_substr7 = SUBSTRING ( msisdn ,1 , 7 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr7');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr10');
    end if;

    SET msisdn_substr6 = SUBSTRING ( msisdn ,1 , 6 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr6');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr6');
    end if;

    SET msisdn_substr5 = SUBSTRING ( msisdn ,1 , 5 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr5');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr5');
    end if;

    SET msisdn_substr4 = SUBSTRING ( msisdn ,1 , 4 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr4');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr4');
    end if;

    SET msisdn_substr3 = SUBSTRING ( msisdn ,1 , 3 )  ;
    SET var = (SELECT count(*) FROM rates WHERE prefix = 'msisdn_substr3');

    if var = 1 then
    SET s = (SELECT price FROM rates WHERE prefix = 'msisdn_substr3');
    end if;

END IF;

    RETURN s;

  END //

DELIMITER ;

can you help ?

one IF condition is met, I want to set the variable s and then leave procedure..
how can i exit the procedure once s varable is set ? IF condition is met
Thanks

joker · September 11, 2016, 11:11am

Some comments:

If you try to write MySQL-Functions, I suggest starting from a very minimal function and extend bit by bit and checking if the results are as expected at every step. So you can catch up for single syntax/programming errors which is easier than to be faced with all of them at once
The mysql function syntax seems to me a bit limited in comparison with every real programming language, so I myself would never write anything but really needed snippets(if at all) as mysql functions.
In did not review your code in whole. But as you saw from my small 20 Line MySQL-Function: It already used 6 Seconds to complete, saying which huge amount of computing resources it needed. E. g. it was a very inefficient way to get it done. When I look at your code, you perform a lot of queries which should be all be very fast though. But I just wanted to note that some queries with big data sets can very quickly create huge loads of computing resources needed to get it done.
In Backup MySQL-Functions are not saved by default with the mysqldump command. You need to keep that in mind and
[list=1]
change your backup procedure, to have them included -or-
tell your system administrator to have that done in case you are not responsible for backup -or-
save your functions as textfiles outside the mysql database to be sure to have them backupped up
[/list]

...and since my previous snippets where a whole lot more than you wanted, this search can be done very quick here.

UPDATE

Mmh. I detected that my solutions were not exactly what you wanted. But it should help nevertheless.

joker · September 18, 2016, 10:36am

The solution as a mysql function.

GOOD Very fast execution
BAD Debugging is far more difficult
BAD Programming Environment is inferior and limited compared to every other Programming language(including bash)
CAUTION Be sure to have your backups included user defined functions

DELIMITER $$

DROP FUNCTION IF EXISTS get_prefix;
DROP FUNCTION IF EXISTS get_rate;

/*

Usage from the os shell:

 echo "select get_prefix('$YOUR_NUMBER'),get_rate(get_prefix('$YOUR_NUMBER'));"  \
      | mysql -N -uMYSQL_USER -pMYSQL_PASSWORD DATABASE_NAME 

*/

CREATE FUNCTION get_prefix(wanted TEXT)
  RETURNS TEXT
BEGIN
  DECLARE current_length   INT;
  DECLARE prefix           TEXT;
  SET current_length       = LENGTH(wanted);

  count_loop: LOOP

    SET prefix = (
    SELECT destination
    FROM test 
    WHERE destination = SUBSTRING(wanted,1,current_length) 
    LIMIT 1);

    IF LENGTH(prefix) > 0 THEN LEAVE count_loop; END IF;
    SET current_length = current_length - 1;

    IF current_length = 0 THEN LEAVE count_loop; END IF;
  END LOOP;
  RETURN prefix;
    
END;

CREATE FUNCTION get_rate(prefix TEXT)
  RETURNS TEXT
BEGIN
  RETURN (SELECT rate FROM test WHERE destination = prefix LIMIT 1);
END;
$$
DELIMITER ;

The solution without a mysql function in bash:

GOOD Easy to write, Easy to Debug
GOOD Full Featured Bash Programming Environent
BAD A little slower than the mysql solution(~50-100ms here)

#!/bin/bash

# Usage: thisprog.sh <YOURPHONENUMBER>

target=$1 

function mysql_exec { 
    mysql -uroot -pblub123 --skip-column-names test "${@}"; }

for((c=${#target};c>0;c--)) ; do
    prefix=$(mysql_exec -e \
        "SELECT destination FROM test WHERE destination = '${target:0:$c}' LIMIT 1")
    [ -n "$prefix" ] && break
done

rate=$(mysql_exec -e "SELECT rate FROM test WHERE destination = '$prefix' LIMIT 1")
echo "$target $prefix $rate"

jameswarner · November 10, 2016, 4:29am

Here one of simple match query example

select 1
  from (
        select Q1
          from T1
         where C1 <= 'fxg87698x84'
         order by Q1 desc
         limit 1
       ) tmp
 where 'fxg87698x84' like concat (Q1, '%')