Script to test / benchmark SQL queries

I’m not particularly knowledgable on the subject of optimising SQL queries, so the easiest way to optimise a query for me is to write a few variations then test them against one another. To this end I’ve developed a PHP class to do the testing and benchmarking. I think that even if I was highly knowledgable about optimising queries, then I would still want to test my queries to ensure that my theory held true in practice.

For a useful benchmark you need to execute the queries using a range of data that simulates the real data the queries would be executed with. They also need to be executed in a random order and multiple times, to ensure results can be averaged and reasonably reliable. That’s what this class does, along with providing a summary of the results in CSV format.

It should be noted that this class does not set up or modify any tables for testing with – it just allows you to supply a range of data to be included within the queries themselves, such as testing with a range of different values in a WHERE clause.

If you’d like to make use of the class, you can download it here: queryTester.zip. The zip file contains 3 files:

queryTester.php
The base class (not usable by itself)
queryTester_mysqli.php
Class for use with a mysqli database object
queryTester_pdo.php
Class for use with a pdo database object

To use the class, you need to create an associative array of queries you want to test. The queries should be formatted using the PHP sprintf format for inserting parameters into the query.

You then also need to create an associative array of data for inserting into the queries. Because the class uses sprintf for inserting the data into the queries, it is important that you provide already escaped data.

An example script might look like this:

include('./dbConnect.php');
include('./queryTester.php');
include('./queryTester_pdo.php');

//data to test
$aData = array(
    'id61 limit0,24' => array(61, '0,24'),
    'id42 limit0,24' => array(42, '0,24'),
    'id21 limit0,24' => array(21, '0,24'),
    'id14 limit0,24' => array(14, '0,24'),
    'id14 limit1576,24' => array(14, '1576,24'),
    'id14 limit1576,9999' => array(14, '1576,9999')
);

//queries to test
$aQueries = array(
'orig' =>
    'SELECT images.id, images.Headline, images.filename, images.Rating, images.url_name
    FROM images
    WHERE images.id IN (SELECT img_categories.img_id FROM img_categories
        INNER JOIN categories AS node ON node.id = img_categories.categories_id,
        categories AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.id = %d)
    GROUP BY images.id
    ORDER BY images.Rating DESC, images.id DESC
    LIMIT %s',
'new1' =>
    'SELECT images.id, images.Headline, images.filename, images.Rating, images.url_name
    FROM images
    RIGHT JOIN (SELECT DISTINCT img_categories.img_id FROM img_categories
        INNER JOIN categories AS node ON node.id = img_categories.categories_id,
        categories AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.id = %d) AS img_ids ON images.id = img_ids.img_id
    GROUP BY images.id
    ORDER BY images.Rating DESC, images.id DESC
    LIMIT %s',
'new2' =>
    'SELECT images.id, images.Headline, images.filename, images.Rating, images.url_name
    FROM images,
    img_categories,
    categories AS parent,
    categories AS node
    WHERE parent.id = %d
    AND img_categories.categories_id = node.id
    AND node.lft BETWEEN parent.lft AND parent.rgt
    AND images.id = img_categories.img_id
    GROUP BY img_categories.img_id
    ORDER BY images.Rating DESC, images.id DESC
    LIMIT %s'
);

try {
    $queryTester = new queryTester_pdo($db);
    //set data and queries to test
    $queryTester->setTestData($aData)->setQueries($aQueries);
    //check our different queries return the same result sets
    $aResults = $queryTester->checkQueries();
    if (!empty($aResults['errors'])) {
        var_dump($aResults); exit();
    }
    //Test how quickly the queries execute & return results
    $aResults = $queryTester->runTests(2000);
    //var_dump($aResults);
    //save the results in a CSV file
    $queryTester->getResultsAsCSV('/path/to/results.csv');
} catch(Exception $e) {
    echo $e->getMessage();
}

When you create the queryTester object, you need to pass in a database object (either an instance of mysqli or PDO depending on your preference). You then need to set the queries and data to be tested by calling setTestData and setQueries, passing in the relevant associative array. Both these methods return the queryTester instance, and so can be chained.

An optional step is to next run the checkQueries method. This runs each of the queries and checks they all return exactly the same results. It returns an associative array with any errors listed under the ‘errors’ index. The ‘results’ index contains information on what queries gave the same or different results.

Further information can be included under the results index by passing two boolean parameters to checkQueries. The first parameter is whether to include the executed SQL statements in the results, the second parameter is whether to include the query results in the results. These options might be helpful when debugging why two queries are returning different results.

The runTests method does the work of benchmarking the queries. It takes one parameter, which is the total number of iterations to run. This is split between all queries and data tested. So if you test 2 queries with 2 different sets of data, and specify a value of 1000 when calling runTests, each query plus data combination will be run approximately 250 times (some will be run more than that, others less, but the total will be 1000).

runTests returns an associative array by query containing the total number of times, total time taken, and average time taken that the query was run with for each piece of data.

To easily compare the results, you can use the getResultsAsCSV method to get a CSV spreadsheet with the results. This method takes a single parameter, the path that the CSV file should be saved as. If you don’t provide this parameter (or pass null), then the CSV spreadsheet will be returned as the response (useful if running the script via a web browser). Alternatively, passing an empty string will return the CSV as a string.

I hope this class might be helpful to others, let me know if you have any problems or spot any mistakes I’ve made.

Posted on by xoogu, last updated

Leave a Reply