Sunday, June 17th, 2012

Unique Web Services in Django

The Internet is moving heavily toward web services, which offer multiple ways to access a website's data remotely, such as apps and desktop clients. Sometimes you would like to create a web service that only your application, or only a set of applications, can access. Say that you also need to transfer a specific type of data set, let's say between Python applications. In Python, you can easily serialize data into a variety of formats, such as Pickle or JSON. Both of these formats transfer over the Internet easily and decode on the other side with minimal effort and overhead. The reason I bring this up is that most standard web protocols, such as OAuth, require a lot of overhead just to perform a simple task on the server. If all you're attempting to do is create a simple interface between your Django web application and a client-side Python desktop application, OAuth and other web standards just add extra coding and error checking that isn't needed for a trivial task. This article will go through serialization and data transfer between Python applications using the HTTP protocol in a non-standard, but compatible, way.

This example will use Pickle, but feel free to use another serialization format if you would like. Pickle allows full Python objects to be serialized, then deserialized and used on the other end of the pipe. First, we will write a simple decorator for Django views to allow us to easily write compatible services:

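from functools import wraps
import pickle

from django.http import HttpResponse, Http404
from django.views.decorators.csrf import csrf_exempt

def service(f):
    @csrf_exempt
    @wraps(f)  # keep the wrapped view's name and docstring
    def new_f(req, *args, **kwargs):
        # Only accept POSTs that carry the agreed-upon User-Agent string.
        # .get() avoids a KeyError when the header is missing entirely.
        if req.method != 'POST' or req.META.get('HTTP_USER_AGENT') != 'Python-services/2.7':
            raise Http404
        # The request body is a pickled dict; merge it into the keyword arguments.
        kwargs.update(pickle.loads(req.read()))
        resp = f(req, *args, **kwargs)
        # A view may return a finished HttpResponse; pass it through untouched.
        if isinstance(resp, HttpResponse):
            return resp
        # Anything that is not already a string is pickled before being sent.
        if not isinstance(resp, str):
            resp = pickle.dumps(resp)
        return HttpResponse(resp)
    return new_f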

This decorator does quite a bit, and not all of it is required, so feel free to modify it as much as you please. Here is the workflow of this decorator:

  1. A request comes in; require POST and a specific HTTP_USER_AGENT.
  2. Take the entire HttpRequest.body, unpickle it, and turn its dictionary into keyword arguments for the view.
  3. Pass the request, any arguments, and the unpickled keywords to the view.
  4. Once the view returns, check the response, pickle it if it is not already a string, and send it back to the client.
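
To make steps 2 through 4 concrete, here is a rough sketch of the same data flow in plain Python, with no Django involved. The names hello and handle are hypothetical stand-ins for a view and for the decorator's inner function; they are not part of the decorator above.

```python
import pickle


def hello(data=None):
    """Hypothetical stand-in for a view; receives the unpickled keywords."""
    return {'echo': data, 'length': len(data)}


def handle(body, view):
    """Mimic steps 2-4: unpickle the body, call the view, pickle the result."""
    kwargs = pickle.loads(body)    # step 2: request body -> keyword arguments
    result = view(**kwargs)        # step 3: call the view with those keywords
    return pickle.dumps(result)    # step 4: pickle the view's return value


# Client side: pickle a dict, "send" it, and unpickle the reply.
request_body = pickle.dumps({'data': 'Hello World!'})
reply = pickle.loads(handle(request_body, hello))
```

The only thing that ever crosses the wire in either direction is a pickled object, which is why both ends have to trust each other.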

I would recommend using a secure channel with this decorator, and calling is_secure() to confirm the request arrived over HTTPS. This keeps packet sniffers from learning your secret HTTP_USER_AGENT. You can check other headers as well; it's really up to you. You can also create another service which, say, authenticates you and returns a special key that can be set in subsequent requests. The non-standard part of all this is that we are not sending a URL-encoded POST body, but rather POSTing serialized data directly. If the data sent over is not a pickle, the server will respond with a standard 500 error message, as one would expect. This is why I placed no type checking on the request body: I just let Django respond with an HTTP 500 if unpickling fails. Here is a simple Python script to send data to the service and parse the response:

import urllib2, pickle

url = 'http://127.0.0.1:8000/srv/s1'
data = pickle.dumps({'data':'Hello World!'})
headers = {'User-Agent':'Python-services/2.7'}
req = urllib2.Request(url, data, headers)
resp = pickle.loads(urllib2.urlopen(req).read())
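
The script above relies on urllib2, which exists only in Python 2. As a rough sketch, assuming Python 3 on the client and the same URL and User-Agent string, the request could be built with urllib.request instead. The helper names build_request and call_service are hypothetical and only serve the illustration.

```python
import pickle
import urllib.request

# Must match the string the server-side decorator checks.
USER_AGENT = 'Python-services/2.7'


def build_request(url, payload):
    """Build a POST request whose body is the pickled payload."""
    body = pickle.dumps(payload)
    # Supplying a data argument makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body,
                                  headers={'User-Agent': USER_AGENT})


def call_service(url, payload):
    """Send the payload and unpickle the reply; use only with a trusted server."""
    with urllib.request.urlopen(build_request(url, payload)) as resp:
        return pickle.loads(resp.read())
```

Calling call_service('http://127.0.0.1:8000/srv/s1', {'data': 'Hello World!'}) would then mirror the script above. If the server still runs Python 2, the client would need pickle.dumps(payload, protocol=2), since Python 2 cannot read the newer pickle protocols.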

Compared to an implementation of OAuth or another web standard, this amount of code is rather minimal and very Pythonic. The data being transferred is literally a serialized Python object. No third-party modules are required to implement this either, and with proper security precautions it should be fairly safe to use in a production environment for one Python application to talk with another over the widely used HTTP protocol. This communication implies that the client application is trusted, as there is an exchange of pickled objects, which can be considered insecure. If you plan on having a separate developer work on the client component, you should use JSON or check the pickle for any unsafe executable Python code. An advantage of pickle, though, is that you can use cPickle to further increase the speed of both applications involved. If you need to serialize a large amount of scientific data to be transferred from a server to your local machine for analysis, cPickle may improve the performance of the transaction.
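
One way to approach "checking the pickle" is the technique described in the standard library documentation: subclass pickle.Unpickler and override find_class so that no classes or functions can be resolved. The sketch below assumes Python 3 and payloads made only of built-in containers and scalars (dicts, lists, strings, numbers); the names RestrictedUnpickler and restricted_loads are illustrative.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        # Every attempt to look up a class or callable is rejected, so only
        # plain built-in containers and scalars can be reconstructed.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Stand-in for pickle.loads() with global lookups disabled."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Under those assumptions, the decorator could call restricted_loads(req.read()) in place of pickle.loads(req.read()). This narrows what a payload can do, but it is not a substitute for only talking to trusted clients.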

Before attempting to use Pickle in your environment, be sure to secure it and only provide access to trusted parties. You have been warned. Take the time to read the very helpful blog entry "Why Python Pickle is Insecure" on Nadia Alramli's Blog to understand why.

Comment #1: Posted 2 years, 6 months ago by Nick Coghlan

While this may be a useful stopgap measure to avoid additional dependencies early in development, it's important to remember that the magic HTTP_USER_AGENT setting is, in essence, a hardcoded password.

Really, if you're doing application level authentication and don't want full OAuth signature overhead (or something equivalent, like Kerberos ticket forwarding) on each request, you're much better off using full authentication to get a secure cookie, then using that cookie for subsequent requests.

Comment #2: Posted 2 years, 6 months ago by Kevin Veroneau

Thank you for your comment Nick.

I did note in the post above about using a login cookie as you describe. However, if you plan on only providing applications you build with access to this service, and the service remains unknown to the public, then there should be no issues with using static authentication. You may, as the site admin, wish to log every access to the service to confirm that it's only being accessed by your trusted application. The main point of using the User-Agent is that it prevents standard browsers from seeing the service, or even bots for that matter. It's also the least suspected header to be used for such a purpose. If you transfer this data only over HTTPS, it should be, for the most part, difficult to exploit on a trusted network. Of course, you can use any header that browsers do not normally set, and use the User-Agent on a per-application basis to detect which application is using the service at any given time. Web server log analysis software provides the means to sort logs by User-Agent.

Comment #3: Posted 2 years, 6 months ago by Honza

Thanks for offering your thoughts. I definitely agree that it doesn't always seem worth the effort to implement the full OAuth-based authentication.

However, the major benefit I see in the service-based architecture is the fact that the different components communicate over a dumb protocol (HTTP/JSON) which is language agnostic. Each component shouldn't care what the other parts are written in as long as they share a language of communication. Your logging server can be written in Clojure, your auth backend can be a Django app, and maybe your front-end is a Node.js app. As such, I think the Pickle serialization isn't the best solution. Thanks!

Comment #4: Posted 2 years, 6 months ago by Robert Forkel

I have to agree with Nick regarding the authentication issue. With your approach, you basically invent an additional authentication scheme for your app, with all the disadvantages of homegrown authentication schemes. You already mention some in your reply: "you may, as the site admin wish to log ..." - but you simply opt out of all the infrastructure that comes with standard auth schemes. Other disadvantages: Where do you store - and how - your secret header? Plaintext in the code? How do you change it? How often? Does this mean redeploying server and client?

My advice is: If your app needs authentication, don't start out with something "simple", but try to find a solution which meets your security requirements from the start (or at least can grow to meet them).

Python Powered | © 2012-2014 Kevin Veroneau