One of the problems that has interested me for a long time is “how can we deliver a big file after we have authenticated the client?”

Most of the workarounds for the problem involve CGI (e.g. PHP) and cookies. Therefore, I decided to find another workaround.

The workaround I found is “Rewrite Mapping”, which apache2 offers through the RewriteMap directive. Rewrite Mapping looks up values in files or external programs, making it possible to use function-like schemes in a RewriteRule.

Let’s go a little deeper. Imagine we have a file called productmap.txt and it looks like this:

television 993
stereo 198
fishingrod 043
basketball 418
telephone 328

Now, we want to map the first value in each row to the second. For example, if the client requests /product/stereo, we respond by serving /product.php?id=198.

First, we define the Rewrite Mapping rule:

RewriteMap product2id "txt:/etc/apache2/productmap.txt"

Then, we use product2id on RewriteRule:

RewriteRule "^/product/(.*)" "/product.php?id=${product2id:$1|NOTFOUND}" [PT]

We assume here that the product.php script knows what to do when it receives id=NOTFOUND, which is what happens when a product is not found in the lookup map.
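To make the lookup concrete, here is a small Python sketch that imitates what the rule above does with the map. It is only an illustration of the behaviour, not how Apache implements it; the function names load_map and rewrite are mine, and the map contents are copied from productmap.txt.

```python
import re

# Contents of productmap.txt from above: one "key value" pair per line.
PRODUCT_MAP_TEXT = """\
television 993
stereo 198
fishingrod 043
basketball 418
telephone 328
"""


def load_map(text):
    """Parse map lines into a dict, skipping blank lines and # comments."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        mapping[parts[0]] = parts[1]
    return mapping


def rewrite(path, mapping, default="NOTFOUND"):
    """Imitate "/product.php?id=${product2id:$1|NOTFOUND}" for one request path."""
    match = re.match(r"^/product/(.*)", path)
    if match is None:
        return path  # the rule does not apply, so the path is left alone
    return "/product.php?id=" + mapping.get(match.group(1), default)


product2id = load_map(PRODUCT_MAP_TEXT)
print(rewrite("/product/stereo", product2id))
print(rewrite("/product/lamp", product2id))
```

The first call finds stereo in the map; the second falls back to the default after the pipe character, just like the |NOTFOUND part of the rule.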

Now, let’s return to the main problem. Interacting with database servers such as MySQL from Apache has always been a pain. A better solution is to connect to an external program (like a Python script), so we can rewrite the routes dynamically.

Fortunately, apache2 supports a map type for external programs. It’s called prg. You can read more about it in the docs; here I’m going to explain how prg helps us solve the problem.

In this case, let’s assume that we have a digital store where we sell big downloads. We want to make sure that our customers receive the files intact and are able to pause and resume downloads. We do not want to use CGI or cookie-based authorization, so we use a token-based approach.

Each customer who buys a file receives a URL like this: https://content.our-store.com/files/bigFile.zip?token={TOKEN}.

We want to validate the {TOKEN} value and deliver the file if it’s correct, or deliver another file (say, an HTML file) if the token is expired or invalid.

First, we define the rules in a non-directory scope (I defined them in a VirtualHost scope). The mapping rule points at controller.py, located in the files/ directory, where the files are hosted.

RewriteEngine on
# Define the external map program and run it as www-data:www-data
RewriteMap controller "prg:/home/user/public_html/files/controller.py" www-data:www-data
# Make sure the rule only applies to the files directory
RewriteCond %{REQUEST_URI} ^/files
# Rewrite all zip files to whatever the controller prints
RewriteRule ^(.*)\.zip$ "${controller:%{REQUEST_URI}?%{QUERY_STRING}}"

The RewriteMap directive defines a map named controller that points to the /home/user/public_html/files/controller.py file. The RewriteRule directive defines the rewriting rule and the format of the input that is delivered to the program. %{REQUEST_URI} and %{QUERY_STRING} are per-request server variables, and here they are joined by a question mark. So, the input looks like: /files/bigFile.zip?token=TOKEN.
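Before looking at the real script, it may help to see how such an input line can be taken apart. This is a minimal sketch using only the standard library; the helper name split_lookup_key is my own, and it assumes the token travels in a query parameter called token, as in the URL above.

```python
from urllib.parse import urlparse, parse_qs


def split_lookup_key(line):
    """Split one input line into (path, token); token is None if it is missing."""
    parts = urlparse(line.strip())  # strip the trailing newline first
    tokens = parse_qs(parts.query).get("token", [])
    return parts.path, (tokens[0] if tokens else None)


print(split_lookup_key("/files/bigFile.zip?token=TOKEN\n"))
# → ('/files/bigFile.zip', 'TOKEN')
```

When the request has no query string, the input ends with a bare question mark, and the helper simply returns None for the token.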

Then, the script should read the input and output the destination. In this case, we check that the token is okay. If it is, we strip the query string from the input and print the path; if it isn’t, we point to /files/403.html. This is our controller.py:

#!/usr/bin/env python3
import sys
from urllib.parse import urlparse

while True:
    strLine = sys.stdin.readline()
    if not strLine:  # EOF: Apache closed the pipe, so stop
        break
    try:
        # It is very important to use strip! The input ends with a newline.
        uriComponents = urlparse(strLine.strip())
        if uriComponents.query == "token=correct":
            print(uriComponents.path)
        else:
            print("/files/403.html")
    except Exception:
        # The literal string NULL tells Apache that the lookup failed
        print("NULL")
    # Apache waits for exactly one line per lookup, so flush right away
    sys.stdout.flush()

Note that the shebang line and .strip() are both important. Also, the file must be executable and owned by www-data:www-data, the user and group we set earlier in our Apache configuration.

Now, all zip files in the files/ directory are delivered only if the query string is token=correct; otherwise, the response is a nice page saying that the token is invalid. The script can be connected to a database or an external API; it’s up to your creativity and the limitations of a Python script.
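As one idea for replacing the hard-coded token=correct check, here is a sketch of expiring tokens signed with HMAC, so no database is needed at all. Everything in it is an assumption of mine rather than part of the setup above: the SECRET_KEY value, the "expiry.signature" token layout, and the function names make_token, is_valid and resolve.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical secret, known only to the store


def make_token(path, expires_at, key=SECRET_KEY):
    """Build "<expiry>.<signature>" for one file path (hypothetical layout)."""
    message = "{}:{}".format(path, expires_at).encode("utf-8", "replace")
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return "{}.{}".format(expires_at, signature)


def is_valid(path, token, now=None, key=SECRET_KEY):
    """Return True only if the token is unexpired and was signed for this path."""
    now = time.time() if now is None else now
    expires_text = token.partition(".")[0]
    try:
        expires_at = int(expires_text)
    except ValueError:
        return False
    if expires_at < now:
        return False
    expected = make_token(path, expires_at, key)
    # Constant-time comparison, done on bytes so odd input cannot raise
    return hmac.compare_digest(expected.encode("utf-8"),
                               token.encode("utf-8", "replace"))


def resolve(path, token, now=None):
    """Return the line the controller would print for this request."""
    return path if is_valid(path, token, now) else "/files/403.html"
```

The store would call make_token when a customer buys a file, and the controller would print resolve(path, token) instead of comparing the query string to a fixed value.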

Conclusion

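Delivering big files to clients can be a challenge. This workaround doesn’t need cookies or CGI, and because Apache still serves the file itself, it works well with HTTP range requests (the Range and Accept-Ranges headers), so downloads can be paused and resumed.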