WordPress: Htaccess In Detail


In this article, we are going to take a look at WordPress’ .htaccess file in detail.

Before I begin, I want to say that I am in no way proficient at mod_rewrite and regular expressions. This has been a complex and difficult topic for me to wrap my head around over the years, and I never really dug into it or had much need to. However, I feel it is important enough to talk about here, because when it comes to WordPress development, the .htaccess file is a very important beast that needs to be understood. With that said, let's begin.

The .htaccess file is a funny file. To Windows users it looks weird, because its name looks like a file extension with nothing in front of it. In the Linux world it is just a file, hidden by default because its name starts with a dot. It is a special file tied to the Apache HTTP server: a per-directory configuration file that Apache reads in addition to its main configuration. Apache HTTP is built from modules written in the C programming language, software components that plug into the server to add functionality. One of those abilities is rewriting and redirecting URLs, and the module that does it is called mod_rewrite.

If .htaccess files are allowed (via the AllowOverride setting) and mod_rewrite is installed and enabled, Apache reads the rules in .htaccess and uses them to map an incoming URL to an object that the running web application knows about. Notice that I said object. Typically this will be a file on the HTTP server, but it doesn't have to be. A URL can be transformed into a document that doesn't exist on the server as a file at all; it can exist as an object in a database, for example. What matters here is that each URL corresponds uniquely to an application object that can be created dynamically.

As the website owner, you write rules using Perl Compatible Regular Expressions (PCRE). Whenever a visitor makes an HTTP request, you can grab the URL and do all sorts of funky things that make it act as an alias to an object. Apache mod_rewrite is a bit of an esoteric topic and quite complex in nature. You need the proper mindset and experience with regular expressions, and for the most part it takes a bit of training to think this way. Both the Apache HTTP server and PHP use PCRE, so it's a good idea to at least be familiar with it.

WordPress’ Htaccess File

Let's take a look inside WordPress' .htaccess file to see what it is doing.

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/wordpress/
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/wordpress/index.php [L]
</IfModule>

The .htaccess file is placed in the root directory of your WordPress install (here, /blog/wordpress/). Directives in it apply to that directory and everything below it, and they can override settings from the main server configuration (httpd.conf) and virtual host settings, to the extent the server's AllowOverride setting permits. These rules don't have to live in this file, and it is not the fastest place for them: Apache has to look for, open, read, and parse .htaccess files on every request. A better place is the main server configuration (httpd.conf or the virtual host), which Apache reads once at startup and keeps in memory. The reason the rules live in a file anyway is that most WordPress users run their sites on shared hosting, where they have no root access to the server configuration.

Whenever a user agent requests a URL from your website, Apache HTTP breaks up the URL into parts.

http://www.site.com/category/title

In the above, Apache uses the host name, www.site.com, to pick the website (virtual host) and its document root folder. It then walks down the directory path toward the requested resource, checking each directory along the way for a .htaccess file: first the document root, then the category folder if it exists, and so on. Every .htaccess file it finds is read and merged, with files deeper in the tree able to override settings from the ones above them, and any rewrite directives in them are handed to mod_rewrite. As you can sense, it is a daisy chain walk.
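To make the walk concrete, here is a minimal Python sketch that lists the .htaccess files Apache would check for a request. It is a simplified model, not Apache's code: the function name and the document root /var/www/html are hypothetical, it assumes every directory in the path exists, and it assumes AllowOverride permits .htaccess files from the document root down (and not above it).

```python
from pathlib import PurePosixPath
from typing import List


def htaccess_candidates(document_root: str, url_path: str) -> List[str]:
    """List the .htaccess files checked for url_path, top-down.

    Simplified model: assumes every directory in url_path exists and that
    .htaccess files are allowed from the document root downward.
    """
    parts = [segment for segment in url_path.split("/") if segment]
    if not url_path.endswith("/"):
        parts = parts[:-1]  # the last segment names the resource, not a directory

    current = PurePosixPath(document_root)
    candidates = [str(current / ".htaccess")]
    for directory in parts:
        current = current / directory
        candidates.append(str(current / ".htaccess"))
    return candidates


if __name__ == "__main__":
    # Hypothetical document root: prints the root .htaccess, then category/.htaccess.
    for path in htaccess_candidates("/var/www/html", "/category/title"):
        print(path)
```

For /category/title the sketch yields /var/www/html/.htaccess followed by /var/www/html/category/.htaccess, mirroring the top-down order described above.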

The <IfModule> line is an Apache directive that tests whether the mod_rewrite module is loaded. This is done because, without mod_rewrite, Apache doesn't recognize the Rewrite* directives and would answer every request with a server error (500 Internal Server Error). The rewrite rules inside this block only take effect when the module is available.

RewriteEngine On

This first directive switches the rewriting engine on. RewriteEngine is off by default, so none of the rules that follow would run without it. It also works as a master switch: depending on the complexity, you can create mod_rewrite logic with hundreds of statements, and setting RewriteEngine Off disables all of them at once, so you don't have to comment out each directive.

RewriteBase /blog/wordpress/

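The RewriteBase directive sets the URL-path prefix that mod_rewrite puts in front of relative substitutions in per-directory rules like these. In this example, we set it to the blog/wordpress folder off our document root. If you installed WordPress in a folder simply named wordpress, it would be /wordpress/. But in this case (and recommended) I installed WordPress in a different folder (blog/wordpress). All this does is save you a little bit of typing: it lets you write a relative substitution such as index.php instead of the full path. Note that WordPress's final rule spells out the full path /blog/wordpress/index.php anyway.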
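Here is a minimal Python sketch of that behaviour, under simplifying assumptions: a substitution that starts with / or is a full URL is used as written, while a relative one gets the RewriteBase value in front. The function name is hypothetical, and real mod_rewrite has more cases (for example, falling back to the directory path when no RewriteBase is set).

```python
def resolve_substitution(substitution: str, rewrite_base: str) -> str:
    """Simplified model of how RewriteBase applies to a per-directory substitution."""
    if substitution == "-":
        return substitution  # "-" means "no substitution"; nothing to resolve
    if substitution.startswith("/") or "://" in substitution:
        return substitution  # absolute URL-path or full URL: used as written
    # Relative substitution: prefix it with RewriteBase.
    return rewrite_base.rstrip("/") + "/" + substitution


if __name__ == "__main__":
    print(resolve_substitution("index.php", "/blog/wordpress/"))
    print(resolve_substitution("/blog/wordpress/index.php", "/blog/wordpress/"))
```

Both calls print /blog/wordpress/index.php, which is why the relative and absolute forms end up at the same place once RewriteBase is set.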

OK, now we get to the meat. First, let's talk about the RewriteRule directive. This is the directive that produces the "output": it matches the requested URL against a pattern and, on a match, substitutes something else for it. Its format is as follows:

RewriteRule Pattern Substitution [flags]

In our first RewriteRule we see this:

RewriteRule ^index\.php$ - [L]

The ^ anchors the pattern to the start of the string and the $ anchors it to the end, so together they mean “the string is exactly index.php”. The backslash is an escape character that means treat the next character “as-is”. In regular expressions, the period is a special wildcard character that means “match any single character”. Since we want a literal period here, we precede it with a backslash. The hyphen “-” means no substitution takes place: we are only checking for index.php, not replacing it. The [L] flag means “last”: stop rewriting and don't apply any more rules in this set if the pattern matches.
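You can try these pattern rules with Python's re module. Python's regex engine is not PCRE, but for the constructs used here (^, $, and an escaped period) the two behave the same. The sample paths and the helper name are made up for illustration.

```python
import re

# The pattern from the first RewriteRule, and the same pattern without the backslash.
ESCAPED = re.compile(r"^index\.php$")
UNESCAPED = re.compile(r"^index.php$")  # bare "." matches any single character


def matches(pattern: re.Pattern, path: str) -> bool:
    """True if the pattern matches the path, like a RewriteRule pattern test."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    for path in ["index.php", "indexXphp", "foobar/index.php", "index.php/hahaha"]:
        print(f"{path:18} escaped={matches(ESCAPED, path)} unescaped={matches(UNESCAPED, path)}")
```

Only index.php passes the escaped pattern. indexXphp slips through the unescaped one because the bare period accepts the X, and the anchors reject the other two paths in both versions.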

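Putting it all together (whew!): inside a .htaccess file, the pattern is compared against the requested path with the .htaccess file's own directory stripped off the front. With the file in /blog/wordpress/, this line matches a request such as:

http://www.site.com/blog/wordpress/index.php

but not these, because ^ and $ require the remaining path to be exactly index.php:

http://www.site.com/blog/wordpress/foobar/index.php
http://www.site.com/blog/wordpress/index.php/hahaha/ohNoitsWrong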

When it matches, it stops the rewrite process from continuing, because the request already points at index.php, WordPress's gateway file. There is no point running the file and directory checks below for a file we know exists. So essentially, all we are doing here is letting requests for index.php slide through the system untouched and letting normal URL processing take hold. Do not continue to the next mod_rewrite rules. Stop here.
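Inside a .htaccess file, mod_rewrite compares the pattern against the requested path with the file's own directory stripped off the front. Here is a minimal Python sketch of that, assuming the .htaccess file sits in /blog/wordpress/. It is a simplification with hypothetical function names: Apache strips the directory prefix from a filesystem path rather than from the URL, but for a plain install the resulting relative path is the same.

```python
import re
from typing import Optional

FIRST_RULE = re.compile(r"^index\.php$")  # RewriteRule ^index\.php$ - [L]


def per_dir_subject(url_path: str, htaccess_dir: str) -> Optional[str]:
    """Return the string a .htaccess RewriteRule pattern is tested against.

    Simplified model on URL-paths: strip the directory holding the .htaccess
    file (with its trailing slash). Returns None when the request is outside
    that directory, so these rules never see it.
    """
    prefix = htaccess_dir.rstrip("/") + "/"
    if not url_path.startswith(prefix):
        return None
    return url_path[len(prefix):]


def first_rule_stops(url_path: str, htaccess_dir: str = "/blog/wordpress/") -> bool:
    """True if the first WordPress rule matches and stops rewriting."""
    subject = per_dir_subject(url_path, htaccess_dir)
    return subject is not None and FIRST_RULE.search(subject) is not None


if __name__ == "__main__":
    for url in ["/blog/wordpress/index.php", "/blog/wordpress/foobar/index.php", "/index.php"]:
        print(url, first_rule_stops(url))
```

Only /blog/wordpress/index.php stops at the first rule; /index.php is outside the WordPress folder, so this .htaccess file is never consulted for it.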

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

Now that we know the request is not for index.php itself, we proceed. The next two lines are conditions that must be true before the RewriteRule that follows them can be applied. Multiple RewriteCond lines are combined with a logical AND by default, and they only govern the one RewriteRule immediately after them; if they don't all hold, that rule is skipped. Yes, if you are used to programming in JavaScript, C, Pascal, or PHP, this looks weird, because there are no block statements (i.e. while, if, for, etc.) in mod_rewrite rules.

– %{REQUEST_FILENAME} is the full filesystem path the requested URL maps to. If that path is NOT an existing file (!-f) and NOT an existing directory (!-d), both conditions pass and the RewriteRule below them is applied.
– If the path matches a file or directory on the server, the conditions fail, the rule is skipped, and Apache serves that object as usual.
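The two conditions can be modelled in Python with os.path.isfile and os.path.isdir, which roughly correspond to mod_rewrite's -f and -d tests. This is a sketch, not Apache's implementation: the function name is hypothetical, and the demo builds a throwaway folder (with a made-up style.css file and a wp-content directory) to stand in for the WordPress install.

```python
import os
import tempfile


def conditions_pass(request_filename: str) -> bool:
    """Model of the two RewriteCond lines; both must hold (implicit AND)."""
    not_a_file = not os.path.isfile(request_filename)      # RewriteCond %{REQUEST_FILENAME} !-f
    not_a_directory = not os.path.isdir(request_filename)  # RewriteCond %{REQUEST_FILENAME} !-d
    return not_a_file and not_a_directory


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "wp-content"))
        with open(os.path.join(root, "style.css"), "w") as handle:
            handle.write("body {}")
        for name in ["style.css", "wp-content", "category/title"]:
            # Only the path that exists on disk neither as file nor directory passes.
            print(name, conditions_pass(os.path.join(root, name)))
```

Only category/title passes, so only that request would reach the final RewriteRule; style.css and wp-content are served by Apache directly.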

Finally, the rule we've all been waiting for:

RewriteRule . /blog/wordpress/index.php [L]

So after all the processing above we know:

  1. It's not a request for index.php
  2. It's not a file or directory that exists


So what could it be?

It's something that isn't physically available on the server as an object. It's a mirage. An alias for something! It's magic! WordPress magic!

Mooo hahahahaha!

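The pattern . matches any path containing at least one character, so every request that made it this far is rewritten to /blog/wordpress/index.php, the index.php file at the root of the WordPress install, for processing. The original URL is still available to WordPress (in PHP, through $_SERVER['REQUEST_URI']). Yes, it is WordPress that now takes the URL and breaks it up to choose the template, post, page, etc. to deliver to the calling user agent.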
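Putting all of the rules together, here is a minimal Python model of the whole rule set for an install at /blog/wordpress/. It is a sketch under stated assumptions, not how Apache is implemented: the route function is hypothetical, the file and directory sets are made-up stand-ins for the real disk, and matching works on URL-paths instead of filesystem paths.

```python
import re
from typing import Optional

BASE = "/blog/wordpress/"
# Hypothetical stand-ins for what exists on disk, relative to BASE.
EXISTING_FILES = {"index.php", "wp-login.php", "wp-content/themes/mytheme/style.css"}
EXISTING_DIRS = {"wp-content", "wp-content/themes", "wp-content/themes/mytheme"}


def route(url_path: str) -> Optional[str]:
    """Return the URL-path Apache ends up serving, or None if the request
    is outside the WordPress directory (these rules never see it)."""
    if not url_path.startswith(BASE):
        return None
    subject = url_path[len(BASE):]  # per-directory prefix stripped

    # RewriteRule ^index\.php$ - [L]: no substitution, stop.
    if re.search(r"^index\.php$", subject):
        return url_path

    # RewriteCond %{REQUEST_FILENAME} !-f and !-d (both must hold).
    is_file = subject in EXISTING_FILES
    is_dir = subject.rstrip("/") in EXISTING_DIRS

    # RewriteRule . /blog/wordpress/index.php [L]
    if not is_file and not is_dir and re.search(r".", subject):
        return "/blog/wordpress/index.php"
    return url_path  # left for Apache to serve as-is


if __name__ == "__main__":
    for url in ["/blog/wordpress/category/title", "/blog/wordpress/wp-login.php", "/about"]:
        print(url, "->", route(url))
```

A pretty permalink such as /blog/wordpress/category/title comes back as /blog/wordpress/index.php, while existing files such as wp-login.php are left alone. In real Apache, the rewritten request runs through the .htaccess rules again; on that pass the first rule matches index.php and stops.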

This index.php file basically starts up WordPress, as I described in a previous article.

Summary

By now, if you understood everything above, you have learned a little about how WordPress, by default, routes every URL that doesn't match a real file or directory to its gateway file, index.php. We saw that this is done through .htaccess, a per-directory configuration file whose rewrite rules redirect the processing to an object in the WordPress system. You learned a little about the conditions, rewriting, and PCRE expressions that get the job done. Most importantly, you now know what's going on under the hood, and how it helps "boot up" WordPress whenever a user agent hits your website.
