In the “real world” there are generally accepted standards set for access to a business and its services. One of the most common standards is “No shirt, no shoes, no service.” Folks not meeting these criteria are typically not allowed past the doors of a business.
But on the web, access to services is implicit in the fact that the business is offering the service. If the HTTP service is accessible, it’s implicitly allowing connections and providing service without any standard criteria for access. This results in access by more than just customers and potential customers. Bots, spiders, and miscreants are afforded the same access to business services as more desirable visitors. This can unfortunately lead to compromise, theft, and corruption of data via myriad injection and attack methods – many of them automated. While gating the access that comes with offering a service is likely not the best solution (although it is a solution), there has to be a way to at least mitigate the automated abuse of open access by miscreants who leverage scripting to attack sites.
Right now the ability to connect and browse a business’ services via HTTP is unfettered. We don’t want to change that. History has taught us that gating – requiring a login – will deter miscreants, yes, but it will also drive away a goodly portion of legitimate visitors. Requiring authentication simply to access a site is desirable from neither a business nor a visitor perspective. Certainly authentication via identification is required for the use of forums or leaving comments, but not for accessing, say, “about.html.”
The problem is that visitors can be “turned off” by comment spam and by the potential fear of infection from user-generated content. The latest study “State of Internet Security” from WebSense indicates that 95% of all user-generated content is tainted. Even more frightening is the conclusion that “61 percent of the top 100 sites either hosted malicious content or contained a masked redirect” and “77 percent of Web sites with malicious code are legitimate sites that have been compromised.”
What we need is the digital equivalent of a “no shirt, no shoes, no service” policy for the web. If you aren’t human – or an authorized spider/bot – you aren’t allowed to access the site. Period. But we need to do it in such a way as to not require credentials. Not only do we not want to manage the additional data that would be generated from requiring authentication just for access (which increases our risk as well) but visitors do not expect nor will they likely put up with creating yet another account just to find out more about your business and services.
We need anonymous human authentication. AHA!
You’re probably thinking “but we have CAPTCHA for that.” We do, but CAPTCHAs are generally only used to authenticate humanity for the submission of data, not simply access to the site.
CAPTCHA is generally also integrated into the application and its architecture, which means visitors – legitimate and otherwise – must be able to access the application in the first place before being authenticated as human beings. What we want is to authenticate the humanity of the visitor the very first time they connect, when the first HTTP request is sent. We also don’t want to be intrusive; that is, we don’t want to force authentication on every request and every visit. We just want to require authentication once and never again.
So what we do is use network-side scripting and the ability of an intermediary – probably a load balancer or application delivery controller – to provide both the authentication and the means by which it is maintained transparently. The intermediary uses network-side scripting to check for the existence of a “human” cookie. If the cookie exists, it is assumed the visitor has already been verified as being human (wearing a shirt and shoes) and requests flow normally.
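To make the cookie check concrete, here is a minimal sketch in Python of what the intermediary's decision might look like. It is not tied to any particular load balancer's scripting language. The cookie name, the secret, and the function names are all hypothetical, and signing the cookie value with an HMAC is my own assumption (a bare "exists or not" check would be trivial for a script to fake).

```python
import hashlib
import hmac
from http.cookies import SimpleCookie

# Hypothetical values, for illustration only.
COOKIE_NAME = "human"
SECRET_KEY = b"replace-with-a-real-secret"


def sign(visitor_id: str) -> str:
    """Return 'visitor_id.signature' (assumed format) for use as the cookie value."""
    digest = hmac.new(SECRET_KEY, visitor_id.encode(), hashlib.sha256).hexdigest()
    return f"{visitor_id}.{digest}"


def is_verified_human(cookie_header: str) -> bool:
    """Check the Cookie header for a 'human' cookie with a valid signature."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(COOKIE_NAME)
    if morsel is None:
        return False  # first visit: no cookie yet
    visitor_id, _, signature = morsel.value.rpartition(".")
    if not visitor_id:
        return False  # value is not in the 'id.signature' form
    expected = sign(visitor_id).rpartition(".")[2]
    return hmac.compare_digest(signature.encode(), expected.encode())


def route(cookie_header: str) -> str:
    """Decide whether the intermediary forwards the request or issues a test."""
    return "forward" if is_verified_human(cookie_header) else "challenge"
```

A request carrying `human=<signed value>` gets `"forward"`; anything else, including a first visit with no cookies at all, gets `"challenge"`.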
If the cookie does not exist, as would be the case on the very first visit ever to the site, then the intermediary needs to generate some test – like a CAPTCHA – to verify the humanity of the client. CAPTCHA is not perfect, but it’s better than nothing. Another method might be to provide a page that contains a simple test. Perhaps there is a single button on the page with text explaining (1) what the test is for and why and (2) instructions to click outside the button. JavaScript can grab the coordinates of the mouse click and automatically submit the form. The intermediary verifies that the coordinates were outside the button (automated scripts and bots are almost always going to automatically click the button/submit the form without interaction) and then sets the appropriate cookie on the client. Subsequent requests flow normally with the anonymous human authentication occurring on the intermediary transparently.
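The "click outside the button" test could be verified on the intermediary along these lines. This is a sketch under stated assumptions: the button geometry, the form field names `x` and `y`, the cookie value, and the response shape are all hypothetical, and it assumes the page's JavaScript submits the click coordinates in page pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Bounding box of the button in page pixels (hypothetical layout)."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


# Assumed position of the single button on the test page.
BUTTON = Rect(left=100, top=200, width=120, height=40)


def passes_click_test(form: dict) -> bool:
    """True only if the submitted click landed outside the button."""
    try:
        x, y = int(form["x"]), int(form["y"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or non-numeric coordinates: fail the test
    if x < 0 or y < 0:
        return False
    return not BUTTON.contains(x, y)


def challenge_response(form: dict) -> dict:
    """Build the intermediary's reply to a submitted test form."""
    if passes_click_test(form):
        # Set the cookie and send the visitor on to the site.
        return {
            "status": 302,
            "headers": {
                "Location": "/",
                "Set-Cookie": "human=1; Path=/; HttpOnly",
            },
        }
    return {"status": 403, "headers": {}}
```

A script that blindly submits the form, or clicks the button itself, fails the coordinate check and never receives the cookie. In practice the cookie value should be something harder to guess than `1`.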
The key to keeping visitors from becoming annoyed is to explain why you’re requiring this simple test for access. It is a rare individual that turns around at the door of a business upon which is hanging a “no shirt, no shoes, no service” sign, after all, and this is in implementation little more than its digital equivalent. It’s not intrusive in that it doesn’t require credentials; we aren’t asking visitors for names and addresses because that would in all likelihood drive them away. We’re asking them, for their own safety, to help us identify and stop miscreants and their auto-injecting scripts from making the site and its community-shared areas a digital slum, filled with spam and malicious links and other digital detritus.
Anonymous Human Authentication (which is a term I made up when the thought occurred to me, by the way) is not a panacea any more than the ability of web application firewalls to detect web scraping. It is not foolproof. The use of network-side scripting for implementation, however, allows for variation in technique that can further mitigate the potential of miscreants to script around the solution. Use a variety of techniques – randomly – rather than the same verification method every time. Being dynamic affords the ability to sidestep one of the key requirements for automation: repeatability. Take that away from miscreants and they might decide your site isn’t worth their time.
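Rotating techniques can be as simple as picking one at random for each unverified visitor. The sketch below assumes three test types; the names are hypothetical labels (the third is purely a placeholder for whatever other test you devise), and a real intermediary would map each one to the page it serves.

```python
import random

# Hypothetical identifiers for the available verification techniques.
CHALLENGES = ("captcha", "click_outside_button", "another_simple_test")

# OS-backed randomness, so the sequence is not reproducible from a seed.
_DEFAULT_RNG = random.SystemRandom()


def pick_challenge(rng: random.Random = _DEFAULT_RNG) -> str:
    """Choose the verification technique for this visitor at random."""
    return rng.choice(CHALLENGES)
```

Because the choice is made per visitor, a script written against one test has no guarantee of meeting that same test on its next attempt.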
This technique has the added advantage, by the way, of potentially stopping mass SQLi attacks. That’s because one of the things miscreants count on is that the pages used to submit data, and through which malicious code is ultimately injected, are accessible without authentication. If an intermediary is providing transparent authentication, the miscreant’s automated script is not likely to be able to even attempt to submit the malicious code because it will have been identified at the perimeter as being suspect. Unless the miscreant is sitting in front of their desktop, ready to verify their humanity, their automated scripts will fail. Given that mass SQLi generally targets multiple sites at once in what is effectively a digital strafing run, the miscreant does not necessarily care that his/her script failed on your site. It’s expected that a high percentage of mass SQLi attacks fail; miscreants aren’t going to spend a lot of time trying to circumvent what security you have in place when there are myriad other insecure sites out there.
Another benefit is in the veracity of analytics. If you know that only humans – real visitors – are able to access your site and applications, you can trust the data and the analysis of that data to be much more accurate and thus useful to the business. Similarly, keeping miscreants from repeatedly accessing and consuming compute resources keeps the costs associated with delivering applications down. Offloading the authentication and rejecting non-human visitors means fewer connections and associated resources are utilized on the application infrastructure, which in a cloud computing environment can result in a reduction in operating expenses.
Unfettered access to HTTP services has long been the norm. We’ve seen where that got us. Maybe it’s time to implement a “no shoes, no shirt, no HTTP service” policy to crack down on miscreants and make the web a nicer, cleaner, safer place to be.