Optimising MySQL queries – JOIN vs IN (subquery)

When searching for some advice relating to an SQL query I was writing recently, I read some advice that you should try to use JOINs rather than IN with a subquery. I modified the particular query I was working on to use a JOIN instead of a subquery, and lo and behold, it was much faster.

So I decided to try and rewrite some other queries I’d written using IN with a subquery. Re-written to use JOINs instead, they should be much faster. But rather than just blindly rewriting the queries, I decided to run a few tests.

Continue reading

Space separated rel values – HTML vs. ATOM

Just a quick note regarding an issue I came across today. HTML allows space separated rel values to indicate link types. I was using this to indicate a URL was both self and canonical, i.e.

<link rel="self canonical" href="http://www.xoogu.com/2013/space-separated-rel-values-html-vs-atom/" />

I was using the same code in both the HTML and ATOM feed versions of my site, but it turns out that ATOM does not allow space separated rel values. So for ATOM you would require two links, like:

<link rel="self" href="http://www.xoogu.com/2013/space-separated-rel-values-html-vs-atom/" />
<link rel="canonical" href="http://www.xoogu.com/2013/space-separated-rel-values-html-vs-atom/" />

Or don’t include a canonical link in your ATOM feed.

Differences between windows batch and linux bash shell script syntax

Carrying on from my previous post about using scripts created in Windows on Linux, here are some comparisons to show the syntax differences between Windows batch scripts and bash scripts.

Comments

Windows batch script

REM some comment

or

:: some comment

Linux shell script

# some comment

For loop through files in directory

Windows batch script

FOR %%i IN (%1\*) DO (
REM do stuff here
)

Linux shell script

for i in "$1"/*
do
# do stuff here
done

Setting a variable to a number and then printing the variable

Windows batch script

SET /A varInteger=0
echo The value of varInteger is %varInteger%

Linux shell script

varInteger=0
echo "The value of varInteger is $varInteger"

Execute a command and store the output in a variable

Windows batch script

FOR %%x IN ('command to be executed') DO SET result=%%x

Linux shell script

result=`command to be executed`

If statement

Windows batch script

IF %somevar% GEQ %someothervar% (
REM do something
)

Linux shell script

if [ $somevar -ge $someothervar ]
then
# do something
fi

(GEQ and -ge is the greater than comparison operator, both bash and windows batch scripts will let you use a range of comparison operators.)

While loop

Windows batch script

:loop
IF !somevar! GEQ %someothervar% GOTO endofloop
REM do something that increases the value of !somevar!
GOTO loop
:endofloop

Linux shell script

while [ ! $somevar -ge $someothervar ]
do
# do something that increases the value of $somevar
done

(If you’re wondering why I used !somevar! instead of %somevar% in the windows script, this is because in a while loop you will be wanting to update the variable. See this article for more info: Windows batch scripting: EnableDelayedExpansion.)

Check last command executed OK, if not print error message and exit

Windows batch script

somecommand
IF %ERRORLEVEL% NEQ 0 (
	echo "somecommand failed, exiting"
	GOTO :EOF
)
REM the EOF label should be located at the end of the file
:EOF

Linux shell script

somecommand
if [ $? -ne 0 ]
then
	echo "somecommand failed, exiting"
	exit 1
fi

Zeropad a number to 4 digits

Windows batch script

SET zeropadded=000%somenumber%
REM Trim zeropadded to only four digits, from the end
echo "4 digit padded = %zeropadded:~-4%"

Linux shell script

zeropadded=`printf "%04d" $somenumber`
echo "4 digit padded = $zeropadded"

Using scripts created in Windows on Linux

A few weeks ago I tried converting a Windows batch script to a Linux shell (bash) script. The syntax between windows and linux shell scripts is quite different, but I also had a problem other than syntax differences.

After modifying the script to use linux syntax (e.g. proper loops instead of labels and GOTOs), I ran the script but it returned the following message:

/bin/bash^M: bad interpreter: No such file or directory

A quick google revealed the problem was the file using windows line endings (\r\n), while linux uses \n for line endings. Luckily there is a handy utility found in most linux distros that you can use to convert files containing windows line endings to linux line endings.

In Ubuntu I used fromdos, though other distros might use dos2unix. In terminal just type

fromdos myscriptfromwindows.sh

Easy, eh?

If you want to convert a file from linux to windows line endings instead, you can just use todos or unix2dos.

Selecting distinct merged results from multiple tables in MySQL

I had an issue recently where I needed to get a list of keywords for one of my websites. The website uses two databases, one for the blog, and a different one for the main part of the site. So I wanted to select the unique (distinct) keywords from the keyword tables on both databases.

This is the query I ended up with, which I will explain in a moment:

SELECT `name` COLLATE utf8_unicode_ci FROM `db_1`.`tags_table` UNION SELECT `Subject` COLLATE utf8_unicode_ci FROM `db_2`.`keywords_table` INTO OUTFILE '/tmp/tags.txt';

So, to start with, we need to get the results from both tables merged together. I did this by using two queries joined by a UNION.

SELECT `name` FROM `tags_table` UNION SELECT `Subject` FROM `keywords_table`

Because the tables are in different databases, we need to prefix the table name followed by a dot (.) when specifying the table to retrieve the records from. E.g. db_1.tags_table selects from the table tags_table in the database db_1. You will also need to be signed into mysql as a user that can access both databases (e.g. the root user) if you want to query across multiple databases.

SELECT `name` FROM `db_1`.`tags_table` UNION SELECT `Subject` FROM `db_2`.`keywords_table`

The two tables I needed to select from had different collations. So I also needed to specify the collation in the query to avoid an Illegal mix of collations error.

SELECT `name` COLLATE utf8_unicode_ci FROM `db_1`.`tags_table` UNION SELECT `Subject` COLLATE utf8_unicode_ci FROM `db_2`.`keywords_table`

The default behaviour of UNION is that duplicate rows are removed from the result. So there is no need to use UNION DISTINCT (though you can if you want). If you do want duplicate rows, use UNION ALL.

All that’s left to do is to get the result into a text file. To do this use INTO OUTFILE at the end of the query, followed by the filename the results should be written to. The file must not already exist, and the mysql user must have write permission for the directory where you want to save the folder. The easiest thing to do is to write the file to the /tmp directory, and then you can move the file from there to wherever you want manually.

SELECT `name` COLLATE utf8_unicode_ci FROM `db_1`.`tags_table` UNION SELECT `Subject` COLLATE utf8_unicode_ci FROM `db_2`.`keywords_table` INTO OUTFILE '/tmp/tags.txt';

How to set up a W3TC mirror CDN for WP multisite with domain mapping

WordPress Multisite with the domain mapping plugin is a good way to run multiple wordpress sites. It means you don’t have to worry about keeping WordPress and plugins up to date for each individual site.

The W3 Total Cache plugin is one of the most popular wordpress plugins for speeding up wordpess websites. One of the things it offers is serving static content from a CDN (Content Delivery Network). You don’t have to use a real CDN, but can instead a generic mirror CDN.

A generic mirror CDN doesn’t have all the benefits of a real CDN. However, because you use a different domain (or subdomain) for the CDN address, it does solve the problem of cookies being sent for static content requests. And unlike a real CDN, you don’t need to pay anything extra for it.

Contents

Setting up a static mirror

To start off, you’ll need to create a subdomain for your CDN. Using this site as an example, the main address is www.xoogu.com, so I would create a subdomain of static1.xoogu.com for use as the CDN. If your webhost uses cpanel, look for ‘Subdomains’ in the ‘Domains’ section.

When adding your subdomain, ensure that the document root is the same as the document root for the main website address.

If you use a main website address that is not prefixed by www. (or anything else), then you will need to use a completely different domain for the CDN. e.g. if my main website address was xoogu.com (no www.), using static1.xoogu.com as a mirror CDN would be pointless.

This is because cookies applied to the root domain are applied to all subdomains as well. So it is a good idea to always use the www. subdomain as your main website address to avoid this issue.

After setting up your static subdomain, you’ll need to edit the .htaccess file in your wordpress folder. Add the following:

#Set an environment variable if it is a static subdomain
SetEnvIfNoCase HOST ^static STATIC_DOMAIN
#Set long expires headers for static subdomains
Header set Cache-Control "max-age=29030400, public" env=STATIC_DOMAIN

What this does is it first sets an apache environment variable called ‘STATIC_DOMAIN’ if the domain starts with ‘static’.

Then the Header line sends a long expires cache header, but only if the ‘STATIC_DOMAIN’ variable is true (i.e. for all requests to domains / subdomains starting with ‘static’).

On my wordpress .htaccess I have the following as well:

#Rewrite any non existant files on www. subdomain to index.php
RewriteRule ^index\.php$ - [L]
RewriteCond %{ENV:STATIC_DOMAIN} !=1
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

#Serve 404 to anyone requesting PHP files from a static subdomain
RewriteCond %{ENV:STATIC_DOMAIN} =1
RewriteRule \.php$ non-existant-file [L]

Now in the W3TC options in the wordpress admin area, you can select to use a CDN and choose Generic Mirror. W3TC will automatically rewrite all your urls pointing to static content to point at your static subdomain instead. For example, if I included an image in this post with an src of /wp-content/uploads/some-image.jpg, W3TC would change that so when the page is served, the src points to http://static1.xoogu.com/wp-content/uploads/some-image.jpg.

Stop requests to /files/* being rewritten to ms-files.php

However, we’re not done quite yet as wordpress multisite usually serves static files via PHP. Yes, it is madness. Thankfully a ticket is on the wordpress trac for fixing this, with a milestone of version 3.4.

In the meantime, you’ll need to fix this yourself. First check your .htaccess and look for a line like

RewriteRule ^([_0-9a-zA-Z-]+/)?files/(.+) wp-includes/ms-files.php?file=$2 [L]

This is the line that causes static files (with the address containing /files/) to be served by PHP. If you know your site doesn’t use /files/ for anything, then you don’t need to do any of the following. Otherwise, comment that line out by adding a pound sign (#) in front of it:

#RewriteRule ^([_0-9a-zA-Z-]+/)?files/(.+) wp-includes/ms-files.php?file=$2 [L]

Now you need to find out the ids of each of your blogs. Look at the database for your wordpress site (e.g. using phpMyAdmin) and find the table called wp_blogs. Look at the blog_id column and make a note of the blog_ids and matching domains.

Map blog_ids using .htaccess

Unless you can setup a rewrite map, you’ll need to manually add a line to your .htaccess file for each domain:

SetEnvIfNoCase HOST xoogu.com$ BLOG_ID=1
SetEnvIfNoCase HOST some-other-domain.com$ BLOG_ID=2
SetEnvIfNoCase HOST another-domain.com$ BLOG_ID=3
RewriteRule ^files/ wp-content/blogs.dir/%{ENV:BLOG_ID}%{REQUEST_URI} [L]

The SetEnvIfNoCase lines map the domains to the wordpress blog_ids. Then the rewrite rule rewrites the request to correct directory.

Map blog_ids using a rewrite map

If your host allows rewrite maps, or you can have access to configure apache, then you can do the following instead:

Create a plain text file called domainToBlogIdMap.txt and add your domains. As far as I’m aware, apache only does exact matches for this, so you need to include both www. and static subdomains.

www.xoogu.com 1
static1.xoogu.com 1
www.some-other-domain.com 2
static1.some-other-domain.com 2
www.another-domain.com 3
static1.another-domain.com 3

In your virtualhost config for the site, add the rewrite map

RewriteMap domainToBlogIdMap txt:/home/xoogu/path/to/domainToBlogIdMap.txt

RewriteRule ^files/ wp-content/blogs.dir/${domainToBlogIdMap:%{HTTP_HOST}}%{REQUEST_URI} [L]

Setting up a static mirror and mapping blog_ids in Nginx

If you’re using nginx you can do the following:

map  $http_host  $name  {
  hostnames;
 
  *.xoogu.com     1;
  *.some-other-domain.com       2;
  *.another-domain.com			3;
  
}


server {
	listen		80;
	server_name 	static1.xoogu.com static1.some-other-domain.com another-domain.com;
	error_log  logs/wpms-error.log warn;
	access_log  logs/wpms-access.log  main;
	
	root  /home/xoogu/path/to/wordpress;
	
	location ~* \.php$ {
		deny all;
	}

	location / {
		try_files       $uri /wp-content/blogs.dir/$name$uri =404;
		expires         max;
		access_log      off;
	    }
}

In the first section we set up our rewrite map to map the domains to the wordpress blog_ids. Then in the second section we set up our static subdomains. Set the root to the same as the main site.

Then we have one location block to prevent people accessing php files via the static subdomain. The other location block serves the requested file if it exists, otherwise it tries to rewrite the request to the correct blog_id directory. If the file doesn’t exist there, it will serve a 404 error.

.htaccess rule to prompt file download

Here’s a quick .htaccess trick you can use to make a user’s browser prompt them to download an image rather than the image just opening in the browser.

RewriteEngine On
RewriteCond %{QUERY_STRING} =download
RewriteRule ([^/]*)$ - [L,E=download:$1]
Header onsuccess set Content-disposition "attachment; filename=%{download}e" env=download

What this does is that any links to images (or any other file) will be handled normally, but if you add ‘?download’ to the end of your link, the user will be prompted to download the file instead, e.g.

Normal link <a href="/filepath/some-image.jpg" >Link</a>
Download link <a href="/filepath/some-image.jpg?download" >Link</a>

Here’s an explanation of the rule:

  • RewriteEngine On We need to use a rewrite rule for checking the query string is ‘download’, so we need to make sure the RewriteEngine is switched on for this to work.

  • RewriteCond %{QUERY_STRING} =download We check the query string exactly matches ‘download’.

  • RewriteRule ([^/]*)$ The first part of the rewrite rule matches anything that is not a forward slash up to the end of the path, so this gives us the filename. e.g. for /path/to/some/file.jpg the match would be file.jpg. By wrapping the match in brackets we store the match in the variable $1.

  • - The dash that comes next means don’t rewrite the path. We are only using the rewrite condition and rule to check the query string contains ‘download’ and to capture the filename, we don’t want to actually rewrite anything.

  • [L,E=download:$1] The L flag means this is the last rewrite rule, the E flag is used to create an environment variable called ‘download’, and set its value to the filename, which we captured and stored in $1.

  • Header onsuccess To get the file download prompt we need to send a header. We only want to do this if the file was found okay and there weren’t any errors though, so we use onsuccess.

  • set Content-disposition "attachment; filename=%{download}e" The Content-disposition header is what makes the user’s browser prompt for download rather than displaying the file in the browser. We need to include the filename the file should be saved as, so we get this from the download environment variable we set in the rewrite rule.

  • env=download This means the Header will only be sent if the download environment variable is set. Since we only set the variable as part of our rewrite rule when the query string contains ‘download’, the header won’t be sent for any normal requests, only those where the query string is ‘download’.

Of course, you don’t have to use the query string ‘download’, you can use whatever you want, just amend the .htaccess rules accordingly.

Security concern

When using this rule, you must make sure that you either:

  • Place the .htaccess file containing the rule in a directory where you don’t mind users downloading any of the files
  • Or modify the rule so that the RewriteRule will only match the types of files that you don’t mind users downloading e.g. RewriteRule ([^/]*\.jpg)$ - [L,E=download:$1] will only match .jpg files.

Otherwise, if you placed this rule in an .htaccess file at the root of your site, a hacker could use the ?download query string to download any file at all. For example, on a wordpress website they could access /wp-config.php?download to download your wp-config.php file. They’d then have access to your database login details and could wreak havoc on your website.

So be sensible about what files you allow people to download. I use this rule on my wordpress site in the uploads directory. This only contains static files that I don’t mind people downloading.

Alternative method

An alternative to this method is using a scripting language, such as PHP to deal with file downloads. You would write a PHP script that takes the file as an input, and then sends the correct headers and the file content. However, using .htaccess instead means that your server doesn’t have to start up a PHP (or other scripting engine) process, saving time and memory, and leaving PHP (or whatever) to deal with your dynamic page requests.

If you think there’s anything wrong with the above, you have any suggestions for improvements, or comments, please leave a comment below.