Some time ago I stumbled on the Pux routing library, which claims to implement a request router that is many orders of magnitude faster than the existing solutions. In order to accomplish this, the library makes use of a PHP extension written in C.
However, after a cursory look at the code I had the strong suspicion that the library was optimizing the wrong parts of the routing process and I could easily get better performance without resorting to a C extension. This suspicion was confirmed when I took a look at the benchmarking code and discovered that only the extremely realistic case of a single route was tested.
To investigate the issue further I wrote a small routing library: FastRoute. This library implements the dispatch process that I will describe below. To give some perspective upfront, here are the results of a small benchmark against the Pux library:
1 placeholder | Pux (no ext) | Pux (ext) | FastRoute
First route | 0.17 s | 0.13 s | 0.14 s
Last route | 2.51 s | 1.20 s | 0.49 s
Unknown route | 2.34 s | 1.10 s | 0.34 s
9 placeholders | Pux (no ext) | Pux (ext) | FastRoute
First route | 0.22 s | 0.19 s | 0.20 s
Last route | 2.65 s | 1.78 s | 0.59 s
Unknown route | 2.50 s | 1.49 s | 0.40 s
This micro benchmark makes use of one hundred routes and then dispatches against the first of those routes (best case), the last one (worst case) and an altogether unknown route. This is done in two variations: Once using routes containing only a single placeholder and then using routes with nine placeholders. The whole process was obviously iterated a few thousand times.
Before getting to the actual content, let me clarify one last point: While on the surface this is an article about routing, what I’m really talking about are regular expression based dispatch processes in general. In many ways this is a continuation of my post on [lexing performance in PHP][lexing_performance].
The routing problem
To make sure we’re on the same page, lets first define what “routing” refers to. In the most practical form, it is the process of taking a set of route definitions in a form similar to the following
$r->addRoute('GET', '/user/{name}/{id:\d+}', 'handler0');
$r->addRoute('GET', '/user/{id:\d+}', 'handler1');
$r->addRoute('GET', '/user/{name}', 'handler2');
…and then dispatch a URI based on them:
$d->dispatch('GET', '/user/nikic/42');
// => provides 'handler0' and ['name' => 'nikic', 'id' => '42']
To bring this to a more abstract level, we’ll dispense with the HTTP methods and any specific format for route definitions. The only thing I’ll be considering in this post is the dispatch stage - how the routes are parsed or the dispatcher data is generated will not be covered.
So, what is the slow part of the routing process? In a massively overengineered system it is likely the general overhead of instantiating a few dozen objects and calling a few hundred methods. Pux does a great job at reducing this overhead. However, at a more fundamental level, the slow part of the dispatch process is sequentially going through a list of dozens or hundreds or thousands of route expressions and matching them against the provided URI. How to make this fast (or at least faster) is the topic of this post.
Combined regular expressions
The basic idea for optimizing this kind of problem is to av
Truncated by Planet PHP, read more at the original (another 27008 bytes)
more
{ 0 comments... » Fast request routing using regular expressions read them below or add one }
Post a Comment