Monday, October 29th, 2012

Developed a new type of form

Recent spammers and bots have been posting comments, which I then have to remove and block entirely, so I decided to develop an idea I first built a few years ago. The idea came to me when I thought about how bots work and understand forms: they normally look for common field names and fill those in. Essentially, what I did was take away those easy-to-spot field names in the HTML form. A comments form in Django is now rendered in HTML as follows:

<form action="/comments/post/" method="post">
<div style='display:none'><input type='hidden' name='csrfmiddlewaretoken' value='7608951b675fe0b286951df9880276e8' /></div>
<input name="KRGRTNRWA" value="" type="hidden" />
<input name="RQGKZWFSK" value="blog.entry" type="hidden" />
<input name="DPCOMKEJS" value="68" type="hidden" />
<input name="TUSGZBKQE" value="1351495666" type="hidden" />
<input name="ICOHFKUPI" value="fb931472ccb6344f941e6416091d73706c92332e" type="hidden" />
<input name="KZQBWXPSX" value="" type="text" />
<input name="EAAAJQLNT" value="" type="text" />
<input name="OTVHKQTVY" value="" type="text" />
<textarea name="MPBQIMIKS" id="id_comment" cols="80" rows="10"></textarea>
<input name="IAUXROHTG" value="" type="hidden" />
<input type="hidden" name="auth" value="1d5d9efa72ee52386563aab8c8d573f1" />
<input class="btn btn-primary" type="submit" value="Post Comment">
</form>

It's hard to tell what needs to be filled out with what. If the server-side validation fails for a field, it's back to trying again and again until a comment post validates. Now you're most likely thinking: well, once they find out the pattern, they've basically cracked it and can start spamming with the new field names. Not so fast, you didn't let me finish. Those random ASCII letters are generated fresh on each request and are unique to that request. Well, what if they work out how to generate their own data in order to reuse the same field names over and over? I've thought of that too! The magic is that each form is bound to a single request, no matter what. There's no way to trick the system. On the server side, there are plenty of checks to make sure everything is in order before the code even looks at the data and tries to work out what fits where.
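The post does not show how the field names are generated or checked, so the following is only a minimal sketch of one way it could be done, not the code running on this site. It assumes a mapping from random aliases to real field names that the server keeps, an HMAC over that mapping and the issue time keyed with a stand-in for Django's SECRET_KEY, and an expiry window. The names random_name, issue_form, sign and token_is_valid are all hypothetical.

```python
import hashlib
import hmac
import secrets
import string
import time

SECRET_KEY = b"not-a-real-key"  # assumption: stands in for Django's SECRET_KEY


def random_name(length=9):
    """Return a random run of uppercase ASCII letters."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


def sign(mapping, issued_at):
    """HMAC over the alias/real-name pairs and the issue time."""
    pairs = "|".join("%s=%s" % (alias, real) for alias, real in sorted(mapping.items()))
    payload = "%s|%d" % (pairs, issued_at)
    return hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha1).hexdigest()


def issue_form(real_names, now=None):
    """Build a fresh alias -> real-name mapping plus a signed token."""
    issued_at = int(time.time() if now is None else now)
    mapping = {}
    for real in real_names:
        alias = random_name()
        while alias in mapping:  # retry on the (unlikely) collision
            alias = random_name()
        mapping[alias] = real
    return mapping, issued_at, sign(mapping, issued_at)


def token_is_valid(mapping, issued_at, token, max_age=3600, now=None):
    """Reject tokens that are forged, altered, or older than max_age seconds."""
    now = time.time() if now is None else now
    if now - issued_at > max_age:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign(mapping, issued_at), token)
```

In this sketch the view would call issue_form() when rendering the page, keep the mapping and issue time on the server (in the session, for example), and call token_is_valid() before touching any submitted value. The one-hour max_age is an arbitrary choice for illustration.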

I may open source this code, since it does utilize the SECRET_KEY in Django. These types of forms may help others as well, and open sourcing it might even make the code better. The code is sort of messy right now, and not properly attached to the Django forms class. It is currently built using its own class and special constructor. Once the data is verified and valid, the code can pass the posted data over to a standard Django form for handling, so it works with both Django forms and model forms. The special class simply acts as a proxy between the POST data and the Django forms API. Here's a simple code example of what the backend does:

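from django.http import HttpResponse
from django.shortcuts import render_to_response
from django.template import RequestContext

# DynamicForm (the proxy class) and ContactForm (a normal Django form)
# are defined elsewhere in the project.
def my_view(req):
    frm = DynamicForm(req)
    # Declare each field as (HTML type, real name).
    frm.addField('text', 'name')
    frm.addField('text', 'age')
    frm.addField('textarea', 'bio')
    if req.method == 'POST':
        if frm.is_valid():
            # Hand the verified data to the real Django form.
            form = ContactForm(frm.getFields())
            if form.is_valid():
                return HttpResponse('Form was valid: %s' % form.cleaned_data['name'])
    # GET, or a failed check: render the special form, not the Django form.
    return render_to_response("dynamic_form.html", {'form':frm}, context_instance=RequestContext(req))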

This is the actual code I used while testing the new forms system and, as I said, it's messy. Currently, it doesn't interface with Django's forms API, which is a complete shame. On the positive side, since it doesn't read Django's forms, it can technically be used with any Python web framework. As you can see above, the form is declared first and then validated. Once validation is complete, it passes a standard dict of the POST data over to the real Django form for processing. Also, rather than passing a Django form to the template, you pass this special form over for rendering. This class basically manages the special mappings used in the HTML form code.
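The proxy step, turning the randomized POST keys back into a plain dict that a normal form can consume, could look roughly like the sketch below. It is an illustration under assumptions, not the DynamicForm internals: translate_post and the "form_mapping" session key are hypothetical names, and the session is modelled as an ordinary dict.

```python
def translate_post(session, post):
    """Map randomized POST keys back to real field names, or return None."""
    # pop() makes the mapping single-use: a replayed POST finds nothing.
    mapping = session.pop("form_mapping", None)
    if mapping is None:
        return None
    # Every alias that was issued must come back; a POST that uses the
    # real field names (as a naive bot would) fails this check.
    if not all(alias in post for alias in mapping):
        return None
    # Unrelated keys such as the CSRF token are simply left out.
    return {real: post[alias] for alias, real in mapping.items()}
```

The returned dict is keyed by the real field names, which is the shape a standard Django form expects as its data argument. Because the mapping is removed from the session on first use, opening a second form before submitting the first would overwrite it, which matches the single-form limitation of a stateful design.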

You can try out the comments form below and take a peek at its source code as well. If you plan on submitting a comment, be sure you haven't visited other parts of this site which use the comments form. Because the system is stateful, only one comments form per domain is supported at the moment, although for most applications this shouldn't really matter.

Comment #1: Posted 2 years, 1 month ago by Senyai

Autofill will not work

Comment #2: Posted 2 years, 1 month ago by Kevin Veroneau

Yes, blocking auto-fill is part of how to block bots. Bots are basically a really advanced version of an auto-filler, well, not that advanced. They exploit the convenience of an auto-filler to spam forms. In order to effectively block bots, you hit them at the source. For relatively small forms, using the "click-mode" in your browser's auto-fill shouldn't be too much work. On these comment forms, only the Name and Email are required fields.

Comment #3: Posted 2 years, 1 month ago by Kevin Veroneau

Hmm... An interesting idea would be to update a browser's auto-fill to use an alternate attribute rather than 'NAME', say the 'ID' field or another attribute. This would block some older bots while still allowing auto-fill to work in newer browsers. However, bots will evolve and start using this too; it's a cat-and-mouse situation. I plan on making an update soon so that users can actually sign into PythonDiary. Anonymous comments will go through this system, while authenticated comments will use standard auto-fillable fields (not that they will be needed anymore...). I will write a new blog post on this soon to explain the benefits of signing up.

Comment #4: Posted 2 years, 1 month ago by mg

That form is really bad for screen readers and accessibility. Don't use it

Comment #5: Posted 2 years, 1 month ago by Jonathan Street

It is nice to see comments re-enabled.

I'm not sure how successful it would be if implemented on all sites but I would guess while it is relatively unusual it could be quite successful. Although I do share the concerns raised by mg.

"On these comment forms, only the Name and Email are required fields."
This seems quite strange. Why isn't the comment field required? Also, do you actually need the email?

It would be interesting to see a before vs after comparison on the rate of spam submissions.

Comment #6: Posted 2 years, 1 month ago by Kevin Veroneau

Sorry, the comment field is also required. I am still tweaking the forms system, and noticed that your comment, Jonathan, was initially rejected due to the token expiring. I am sorry for that. I do receive notices of these for now, so I can fine-tune it. I may need to increase the expiry time.

As for screen readers, I am not sure why a screen reader needs to read the "NAME" attributes of a form; it should be enough for screen readers and other accessibility tools to use the labeling, which is in plain English. Although, I haven't used any accessibility options in a browser before. If you can explain your particular issues, I can see about fixing them in the next update.

Comment #7: Posted 2 years, 1 month ago by Kevin Veroneau

Well, since implementing the new forms, bots have ceased, but the low lives who are paid to sit at a computer to spam comment forms are still doing their job. I am sure the ones left are those people, since I don't see a bot working too easily with randomly generated forms. The person better be being paid a lot to sit down and spam these forms, since they cannot use their auto-fill tools.

Comment #8: Posted 2 years, 1 month ago by Adam Skutt

Sure, it stops the bots of today, but probably not for all that long.

All they have to do to crack this is to read the associated label tags instead of the form field names, just like screen readers and the like do. This only works as a technique if you completely and utterly break accessibility. It's not even like they have to guess which label goes with which input, since you have to tie them together in the HTML.

Comment #9: Posted 2 years, 1 month ago by Kevin Veroneau

Adam, you are definitely correct; I did not take HTML parsing into consideration. Seeing how good Beautiful Soup is, bot programmers only have to use this Python library and the so-called security I thought I made is gone... Is there really no easy way to get rid of bots without using some sort of Captcha system? I'm not really fond of entering Captchas. Another idea is to generate the actual form using pure JavaScript, and hope that most bots can't use JavaScript. I was really hoping that Google+ would have enabled a site comments system by now...

Python Powered | © 2012-2014 Kevin Veroneau