How to block Google Analytics referrer spam in nginx

Tamas Piros | 21 August 2015 | MEAN stack | nginx

Referrer spam in Google Analytics is a recent problem. Most bots don't execute JavaScript, and they crawl your site for benign reasons, such as indexing for search engines. Some, however, do execute JavaScript, and some of those do it for a specific reason: to get their URL and link into your analytics.

As with most spam, I'm not sure why they bother, but there must be enough people out there clicking through to make it worthwhile.

What does this spam look like? If you log into your Google Analytics account and head to Acquisition >> All Traffic >> Referrals, you'll see a list of websites that have sent traffic to your website, and all the stats surrounding them. The chances are that among these are some spam links, such as 4webmasters.org or buttons-for-website.com.

Why is this a problem? Aside from being annoying, it skews your analytics and conversion statistics, makes genuine referrers harder to identify, and takes up server resources and bandwidth.

So we're going to see how to block these referrers at the server level, so that they can't load a single resource from your site. This keeps their visits out of your analytics, although it only works against bots that actually request pages from your server. This post focuses on nginx, but the approach is similar for other servers.

Step 1: Identify the spam referrer URLs

The first step towards removing the spam is to identify the URLs that you believe to be spam. It's generally obvious from the URL itself. Some things that help identify spam referrers are:

Suspicious URLs
A high bounce rate
A low number of pages per session
A short visit duration time

Create a list of the spam referrers you find, as you'll need to add it to the nginx config.

Step 2: Find the nginx config file

To make changes to the nginx configuration you need to open your nginx.conf file, which is normally found under /etc/nginx/.

Step 3: Create an HTTP map to flag the spam referrers

Inside the http {} block of the nginx.conf file, add a map like the one below, with an entry for each of the spam domains. As per the nginx docs, the map module (ngx_http_map_module) "creates variables whose values depend on values of other variables".

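http {
  # other config values ...
  map $http_referer $bad_referer {
    default 0;
    # "~" marks a case-sensitive regular expression; "\." matches a literal dot
    "~4webmasters\.org" 1;
    "~buttons-for-website\.com" 1;
  }
}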

This map uses the $http_referer variable and defines a $bad_referer variable. The map will assign a value to $bad_referer based on the value of $http_referer.

First of all, set the default to be 0. If the $http_referer is not in your list you want $bad_referer to be zero.

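Next, add your list of spam referrers, starting each one with a tilde ~ and giving it a value of 1. The ~ character at the beginning of an entry marks it as a case-sensitive regular expression that is matched against the whole Referer value. Because it is a regular expression, an unescaped dot matches any character, so write \. where you mean a literal dot.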

Now every request will have a $bad_referer value associated with it: 1 for what you consider to be spam, 0 for legitimate traffic.
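The tilde syntax also has a case-insensitive variant, ~*, which can be handy because the Referer header is supplied by the client and may arrive in any mix of upper and lower case. Here is a minimal sketch of the same map using it. The third entry uses a made-up placeholder domain (spam-example.invalid) to show how a pattern could be anchored to the host part of the URL, so that it doesn't also match the domain name appearing in a path or query string.

```nginx
# Sketch only: this map belongs inside the existing http { } block of nginx.conf.
map $http_referer $bad_referer {
    default 0;

    # "~*" = case-insensitive regular expression
    "~*4webmasters\.org" 1;
    "~*buttons-for-website\.com" 1;

    # Hypothetical domain, for illustration only.
    # Matches the domain and any subdomain, but only in the host part of the referrer.
    "~*^https?://([^/]+\.)?spam-example\.invalid([/:]|$)" 1;
}
```

The unanchored entries are simpler to maintain; the anchored form is only worth the extra typing if you are worried about blocking legitimate referrers whose URLs happen to contain a spam domain.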

The next step is to use this flag to stop traffic from these referrers reaching your site.

Step 4: Deny spam referrers access to your site

To block the traffic you need to use the $bad_referer variable in the "enabled sites" configuration. To do this, open the configuration for one of the "enabled sites", normally a config file in the /etc/nginx/sites-enabled/ folder.

Inside that file you'll have a server {} configuration block with a location {} configuration block inside it. This location block is the place to use $bad_referer. In this case, check whether the $bad_referer variable is set to 1 (nginx treats an empty value or 0 as false and anything else as true), and if so return a status code of 444. This is a non-standard, nginx-specific code that closes the connection without sending any response. It should look something like this:

server {
  listen 80;
  # other config values ...
  location / {
    # 444 is nginx-specific: close the connection without sending a response
    if ($bad_referer) {
      return 444;
    }
    # other config values ...
  }
}
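Note that the check above only runs for requests handled by location /. If your server block has other location blocks (for static assets, say), one option is to move the check up to the server level, where it is evaluated before a location is chosen. A sketch, assuming nothing else in your config depends on the check living inside the location:

```nginx
server {
    listen 80;

    # Evaluated for every request to this server, before location matching.
    if ($bad_referer) {
        return 444;   # swap for "return 403;" if you prefer to send an explicit refusal
    }

    location / {
        # other config values ...
    }
}
```

Whether to use 444 or 403 is a matter of taste: 444 saves the bytes of a response, while 403 is a standard status code and easier to recognise when you test the rule yourself.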

Step 5: Save and restart

Save both configuration files and restart your nginx service. Running nginx -t beforehand checks the configuration for syntax errors. Once that's done, traffic from these spam referrers will be denied access to your site, keeping your analytics clean.
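One way to confirm that the block is working is to log flagged requests to their own file. The sketch below assumes nginx 1.7.0 or later, which is when access_log gained its if= parameter. The log file names are illustrative, and the built-in combined format is used because it already includes the referrer.

```nginx
server {
    listen 80;

    # Declaring access_log here replaces any access_log inherited from the
    # http level, so the regular log is listed again alongside the new one.
    access_log /var/log/nginx/access.log combined;

    # Only written when $bad_referer is neither empty nor "0".
    # The file name is an assumption, not an nginx default.
    access_log /var/log/nginx/referer-spam.log combined if=$bad_referer;

    location / {
        if ($bad_referer) {
            return 444;
        }
        # other config values ...
    }
}
```

If the second log stays empty while spam referrers still show up in Google Analytics, the patterns in the map are probably not matching what the bots actually send.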

Step 6: Update the list as necessary

Unfortunately this probably isn't a one-time deal, as more and more spam domains will be created and try to reach your server. So just keep an eye on the URLs in Google Analytics and update your configuration file with any new spam URLs you find.
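To make those updates easier, the map module's include parameter lets you keep the entries in a separate file, so that nginx.conf itself rarely needs touching. A sketch, where the path /etc/nginx/referer-spam.map is an arbitrary choice rather than any nginx convention:

```nginx
# In nginx.conf, inside the http { } block
map $http_referer $bad_referer {
    default 0;
    include /etc/nginx/referer-spam.map;   # hypothetical path
}
```

```nginx
# Contents of /etc/nginx/referer-spam.map: one "pattern value;" pair per line
"~4webmasters\.org" 1;
"~buttons-for-website\.com" 1;
```

nginx only reads the included file when its configuration is loaded, so a reload or restart is still needed after each change to the list.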