Good bots, bad bots, and the troublesome ones in between

Over the past 18 months, we’ve seen a significant increase in the number of intelligent web-bots that target specific e-commerce websites. At the very least these bots are an annoyance, and at the very worst they can effectively cause a DDoS attack and take a website down.

Web bots have been around for a long time and we all benefit from many of them. There are good bots (like Googlebot or Bingbot), and there are bad bots that automatically attempt to hack a web application or inject spam into websites. The good ones are generally beneficial, and the bad ones can often be dealt with by a solution such as a Web Application Firewall (WAF) that will recognize malicious requests and block them.

The problematic bots are often those that sit between good and bad. These can be hard to detect, as they will often impersonate a normal user and make requests that, on their own and in isolation, are perfectly safe, legitimate, and seemingly harmless.

Although their intention is normally something other than a DDoS attack, the effect can sometimes be the same when they are either too aggressive or too many instances of a bot hit a website at once.

These bots are used commercially for a number of reasons including:

  1. Automatic purchasing of products (aggressive purchase bots can cause severe performance issues during product launches)

  2. Aggregation of content (your content can be passed off as someone else’s)

  3. Competitor price analysis (competitors can use this data to undercut you)

  4. Aggressive content crawling (aggressive crawlers can put strain on your web platform)

Real-world example of a commercial bot causing a lot of issues

We have a client that often sells limited edition products which are very sought after. These products can often fetch 3 times the RRP when sold on eBay, and the retailer will only have a limited supply to sell. Most of these products have a coordinated world-wide launch and therefore the exact time of the launch is well known.

Over the last 18 months, we have increasingly seen extremely aggressive bots deployed in their thousands to purchase these products, to the point where the performance of the e-commerce platform can be seriously compromised.

In this instance, the bots have been specifically designed for this retailer’s website, and know the exact requests that need to be made to add the product to the basket and go through the checkout. They don’t even need to visit the product display page. They are normally distributed across multiple cloud servers, with multiple instances of the bot installed on each server. Because the launch time is public and coordinated, the bots all start trying to add the product to the basket and go through the checkout at exactly the same time, normally many thousands at once.

The record we have seen is 3 million attempts to purchase a single product in a 12-hour period, which averages out at roughly 70 attempts every second.

Because each request is legitimate and the bot is impersonating a real user, it is hard to block the bots quickly enough to prevent the damage without also blocking real users. There is no point in counting the requests from a particular IP for a minute and then blocking it once it passes a threshold. By that point the damage has already been done and you have tens of thousands of bots simultaneously in your checkout.
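
One way to react faster than a one-minute counting window is to make the decision on every request, for example with a token bucket per client. The following is a minimal Python sketch under stated assumptions, not the approach used for this retailer: the names TokenBucket and ClientLimiter and the capacity and refill values are hypothetical, and because these bots are distributed across many servers, a real deployment would probably need to key on more than an IP address.

```python
import time


class TokenBucket:
    """Allows a burst of up to `capacity` requests, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, now):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.tokens = float(capacity)  # start full so a normal user is never delayed
        self.updated = now

    def allow(self, now):
        # Top up the bucket for the time that has passed since the last request.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ClientLimiter:
    """Keeps one bucket per client key (hypothetical: an IP or session id)."""

    def __init__(self, capacity=5, refill_per_second=0.5):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)
```

The decision is made on the request that exceeds the burst allowance, rather than after a full measurement window has passed.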

The bots also disadvantage real users. Because they are timed to start purchasing the second the products go live, the bots will almost always be first in the queue. Although the retailer obviously still gets the sale, it can lose brand loyalty, since real and loyal customers consistently lose out.

So how do you manage good bots and bad bots?

Many organizations, such as CDNs, have been rapidly developing bot management solutions over the last year in response to the increasing problems with bots that retailers are facing. Some, such as Akamai’s bot manager solution, can be very sophisticated both in the way they attempt to identify a bot and in the options they give the retailer for dealing with it.

Simply blocking the bot is not always the answer. If its operators know it has been blocked, they can just move it to another IP or adapt it to fool the bot manager.

A better solution is to fool the bot by showing it the wrong content (for example, higher prices in the case of a bot used to analyze competitors’ prices) or just to slow it down. Slowing down is also a useful technique for bots that are only harmful because they are too aggressive in their crawling. You don’t want to block them altogether, but you do want to slow them down a little to reduce the impact on your infrastructure.
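
As a rough illustration of slowing a client down rather than blocking it, the delay added to each response can grow with how far the client is over a soft limit. This is only a sketch: the function name throttle_delay and the default soft limit, step and maximum delay are assumptions chosen for the example, not values from any particular bot manager.

```python
def throttle_delay(recent_requests, soft_limit=10, step_seconds=0.5, max_delay=10.0):
    """Return how many seconds to delay a response.

    `recent_requests` is the number of requests the client has made in some
    recent window (how that is counted is left to the caller). Clients at or
    under the soft limit are not delayed at all; above it, each extra request
    adds `step_seconds`, capped at `max_delay`.
    """
    excess = max(0, recent_requests - soft_limit)
    return min(max_delay, excess * step_seconds)
```

With these example values, a crawler making 14 requests in the window would have each response delayed by 2 seconds, while a normal visitor sees no delay.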

Although a bot manager solution is certainly a useful tool, it is unlikely to identify and stop all bots. In the real-world instance detailed above, by the time it had identified the user as a bot, the damage might already have been done. Bots will also constantly adapt and evolve to avoid being blocked by bot managers, so it is a moving target.

The solution to effectively managing these bots is multi-faceted. There is no single solution that will catch everything and give you all of the control you need. Different services and solutions will give protection in different areas against different types of bots. Only by deploying multiple defenses can you effectively manage these bots.

4 areas to consider when building a bot management strategy

CDN layer

A CDN can be a first line of defense against malicious or troublesome traffic. The ideal CDN configuration ensures that all requests to your web application, whether cacheable or not, are filtered through the CDN. You can then use the tools that the CDN provides, such as a WAF, a bot manager or even some basic rate limiting rules, to protect your website against the most obvious bots.

WAF layer

Many retailers have a WAF layer sitting between their CDN and their hosting infrastructure. A high-quality WAF, such as Imperva WAF, can be used to automatically detect and block malicious requests such as those made by many bad bots. Additionally, custom rules can be added to recognize and block or limit those bots that are not malicious but can be troublesome.

Application caching layer

Implementing a tool such as Varnish that sits between your firewall and your web application can not only improve speed and performance, but can also be used to limit the impact of aggressive bots. A number of Varnish modules (VMODs) are available that can be used to effectively limit the rate of requests being made to specific URLs.
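
The idea behind limiting requests to specific URLs can be sketched outside of Varnish. The Python below is not Varnish configuration and does not reproduce any VMOD's API; it is a hypothetical illustration in which UrlRateLimiter, its is_denied method and the example paths are all assumed names. It keeps a sliding window of recent hits per client and URL pair.

```python
from collections import defaultdict, deque


class UrlRateLimiter:
    """Denies a client once it exceeds `limit` hits on one URL within `period` seconds."""

    def __init__(self, limit, period):
        self.limit = limit
        self.period = period
        self.hits = defaultdict(deque)  # (client, url) -> timestamps of allowed hits

    def is_denied(self, client, url, now):
        window = self.hits[(client, url)]
        # Forget hits that have fallen out of the sliding window.
        while window and now - window[0] >= self.period:
            window.popleft()
        if len(window) >= self.limit:
            return True
        window.append(now)
        return False
```

Keying on the URL as well as the client means a sensitive path such as an add-to-basket endpoint can be limited tightly without affecting ordinary browsing.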

Application layer

Changes can be made to your application to protect it from aggressive or troublesome bots.

For example, the following can all help to prevent the bots from being successful: using simple tools like Google reCAPTCHA at relevant times; limiting the number of users who can add a specific product to their basket at any one time; or introducing initiatives such as a raffle for exclusive and limited edition products, so that these products cannot be purchased in the conventional way.
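
Two of these application changes can be sketched briefly in Python. Both are hypothetical illustrations rather than a description of any real platform: BasketCap, draw_raffle, the hold time and the user identifiers are all assumed for the example, and a production version would need shared storage and locking across servers.

```python
import random


class BasketCap:
    """Limits how many users may hold one product in their basket at once."""

    def __init__(self, max_holders, hold_seconds):
        self.max_holders = max_holders
        self.hold_seconds = hold_seconds
        self.holders = {}  # user_id -> time at which the hold expires

    def try_add(self, user_id, now):
        # Release holds that have expired so that other users get a chance.
        self.holders = {u: t for u, t in self.holders.items() if t > now}
        if user_id in self.holders:
            return True  # this user already holds the product
        if len(self.holders) >= self.max_holders:
            return False
        self.holders[user_id] = now + self.hold_seconds
        return True


def draw_raffle(entrants, stock, seed=None):
    """Pick winners at random; duplicate entries do not improve the odds."""
    unique = sorted(set(entrants))
    rng = random.Random(seed)
    return rng.sample(unique, min(stock, len(unique)))
```

With a raffle, being first at the moment of launch no longer matters, which removes the timing advantage that purchase bots rely on.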

It’s important to consider implementing some or all of the solutions above rather than relying on just one of them, as each will provide defense against these bots in slightly different ways.

For example, if you relied solely on an application change to prevent purchasing bots, they would still be hammering the rest of your infrastructure and could even cause issues such as filling Apache or Varnish log files to the point where your server runs out of disk space.

Good bot vs bad bot: Don’t ignore the signs

In summary, bots are becoming an increasing commercial threat to e-commerce retailers, and dealing with them effectively can be very complex. Estimates of how much web traffic is human and how much comes from bots vary, but the general consensus is that up to 50% of all web traffic is generated by bots.

If you consider this number, the amount of bandwidth and capacity that bots use, and the fact that around 50% of that bot traffic is from ‘bad’ or malicious bots, it is not something that any retailer should ignore.
