You can do the weirdest things with mod_rewrite: crazy regexes, load balancing in all flavors, dynamic content generation, and chaining all sorts of complex rules to your heart's content.
But as it turns out, it is horrible when you just want to do one simple thing. In my case, that one simple thing was, indeed, rewriting my URLs. But let's start at the beginning...
I'm running my homepage on uberspace.de, a nice little provider with an Apache webserver and FastCGI. With FastCGI you store your own programs in a dedicated directory, e.g. /fcgi-bin/myprogram.cgi, and Apache delegates the work to this program. Apache needs to know which program it should execute for which URL, and it makes a sensible default assumption: the URL contains the program name. Therefore, if I wanted to access my blog, I would go to http://fmutzel.de/fcgi-bin/myprogram.cgi/blog. If I wanted to access my main page, I'd have to go to http://fmutzel.de/fcgi-bin/myprogram.cgi/. That looks a bit ugly, though.
That's where mod_rewrite comes into play. It is designed to rewrite URLs to make them look nicer. Unfortunately, it can do a ton of things, and that makes it horrendously complicated. Essentially, all I wanted to do was map all requests to my custom-made script. The internet suggests putting the following in a .htaccess file to configure Apache's mod_rewrite:
This should do the trick, and it did until I had a question mark in a URL. Question marks are a weird thing in URLs. They stem from the time when URLs were paths to scripts plus options, and the question mark was used to signify "okay, up to here was a path to a file and what follows are options", e.g. domain.com/users/whoever/blog?page=5 (nowadays this is a bit obsolete, since the path in the URL usually has nothing to do with the folders in the server's file system). So if you don't want your question mark to be interpreted in a special way, you escape it by writing %3F instead, the same way %20 stands for a blank space. There's a whole mechanism, percent-encoding, for encoding any character in URLs.
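To make the escaping concrete, here is a minimal Python sketch of percent-encoding a single path segment. It has nothing to do with my actual setup; the helper name `encode_segment` and the sample title are made up for illustration.

```python
from urllib.parse import quote, unquote


def encode_segment(text: str) -> str:
    """Percent-encode one path segment (hypothetical helper).

    safe="" means no reserved character is left bare, so '?' and '/'
    are escaped as well.
    """
    return quote(text, safe="")


title = "what now?"            # made-up blog post title
encoded = encode_segment(title)

print(encoded)                 # what%20now%3F
print(unquote(encoded))        # what now?
```

The encoded form contains no literal question mark, so nothing between the browser and my program should have a reason to treat it as the start of the options part.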
Unfortunately, that's exactly what I was doing anyway, but it still didn't work. The URL with the question mark gave me a 404 page. I checked my program and found out that the question mark simply didn't arrive at all. I wasn't sure who was responsible for eating it up - the code that I used? Apache? I googled around for a while and found nothing.
That's when I got the idea that mod_rewrite might be the culprit. At first I didn't think that could be the case: a module designed to rewrite URLs that can't rewrite URLs properly?
Turns out that is actually the case. As weird as it sounds, mod_rewrite decodes the %3F into a question mark, then splits the URL at that question mark and (when using QSA) re-appends a regular, un-encoded question mark when putting everything back together. There is a horrendously long bug report from 2005 in Apache's bug tracker, which was closed in 2011 essentially because the bug report got too complicated. I'm not kidding.
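The order of operations is the whole problem, so here is a toy Python model of it. This is only a sketch of the behaviour as I understand it, not mod_rewrite's real code; both function names are invented, and the script prefix is the example path from above.

```python
from typing import Tuple
from urllib.parse import unquote

SCRIPT_PREFIX = "/fcgi-bin/myprogram.cgi"  # example path from this post


def decode_then_split(request_path: str) -> Tuple[str, str]:
    """Toy model of the buggy order: decode first, split at '?' second."""
    decoded = unquote(request_path)          # %3F becomes a literal '?'
    rewritten = SCRIPT_PREFIX + decoded      # prepend the script name
    path, _, query = rewritten.partition("?")
    return path, query


def split_then_decode(request_path: str) -> Tuple[str, str]:
    """The order I expected: split at a literal '?' first, decode second."""
    raw_path, _, query = request_path.partition("?")
    return SCRIPT_PREFIX + unquote(raw_path), query


print(decode_then_split("/blog/why%3Fnot"))
# ('/fcgi-bin/myprogram.cgi/blog/why', 'not')
print(split_then_decode("/blog/why%3Fnot"))
# ('/fcgi-bin/myprogram.cgi/blog/why?not', '')
```

In the first variant the escaped question mark is promoted to a separator, so the program only ever sees the path up to it. That matches what I observed: the question mark simply never arrived.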
It took me a while to find a solution, and I'm not very happy with it. What I've got now is this:
Also note that $1 changed to %1. This means it references the match in parentheses in the RewriteCond and not the match in the RewriteRule (which would only have everything up to the question mark).
Apache's mod_rewrite is weird. The name suggests it was invented to rewrite URLs, but it doesn't do that so easily. But then again, you can hack it to do whatever you want with the browser's HTTP request...