Referrer spam in Google Analytics is a relatively recent problem. Most bots don't execute JavaScript, so they never show up in Google Analytics (which relies on JavaScript tracking code), and they crawl your site for benign reasons, such as indexing it for search engines. Some, however, do execute JavaScript, and some do so for a specific reason: to get their URL and a link into your analytics.
As with most spam, I'm not sure why they bother, but there must be enough people out there clicking through to make it worthwhile.
What does this spam look like? If you log into your Google Analytics account and head to Acquisition >> All Traffic >> Referrals, you'll see a list of websites that have sent traffic to your website, along with all the stats for each one. Chances are that among them are some spam links, such as 4webmasters.org or buttons-for-website.com.
Why is this a problem? Aside from being annoying, it skews your analytics and conversion statistics, makes it harder to identify your genuine referrers, and takes up server resources and bandwidth.
So let's look at how to block these referrers at the server level, so that they can't load a single resource from your site. This keeps spam from bots that actually visit your site out of your analytics. This post focuses on nginx, but the approach is similar for other web servers.
Step 1: Identify the spam referrer URLs
The first step towards removing the spam is to identify the URLs that you believe to be spam. This is generally obvious from the URL itself. Some signs that help identify spam referrers are:
- Suspicious URLs
- A high bounce rate
- A low number of pages per session
- A short session duration
Make a list of the spam referrers you find, as you'll need to add them to the nginx config.
Step 2: Find the nginx config file
To make changes to the nginx configuration you need to open your `nginx.conf` file, which is normally found under `/etc/nginx/`.
Step 3: Create an HTTP map to flag the spam referrers
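Inside the `nginx.conf` file, add an HTTP map like the one below, with an entry for each of the spam domains. Put it inside the existing `http {}` block rather than creating a new one, since nginx rejects a duplicate `http` block. As per the nginx docs, the HTTP map module "creates variables whose values depend on values of other variables".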
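```nginx
http {
    # ... existing http-level settings ...

    # Set $bad_referer to 1 when the Referer header matches a spam domain
    map $http_referer $bad_referer {
        default                      0;
        "~4webmasters\.org"          1;
        "~buttons-for-website\.com"  1;
    }
}
```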
This map uses the `$http_referer` variable and defines a `$bad_referer` variable. The map will assign a value to `$bad_referer` based on the value of `$http_referer`.
First of all, set the `default` to `0`: if the `$http_referer` is not in your list, you want `$bad_referer` to be zero.
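Next, add your list of spam referrers, starting each one with a tilde (`~`) and giving it a value of `1`. The `~` at the beginning of an entry tells nginx to treat it as a case-sensitive regular expression (`~*` makes the match case-insensitive). Because `.` in a regular expression matches any character, it's safest to escape literal dots as `\.`.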
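Entries like the ones in the map match the domain anywhere in the Referer header, including inside the query string of an otherwise legitimate referrer. If that's a concern, a stricter entry can anchor the pattern to the referring host. The sketch below is one way to do it, assuming referrers arrive as full `http://` or `https://` URLs without a port number; it would replace the map in `nginx.conf`, not sit alongside it.

```nginx
map $http_referer $bad_referer {
    default 0;
    # ~* = case-insensitive regex. Matches only when the referring host is
    # 4webmasters.org or one of its subdomains, e.g. http://www.4webmasters.org/
    "~*^https?://([^/]+\.)?4webmasters\.org(/|$)"          1;
    "~*^https?://([^/]+\.)?buttons-for-website\.com(/|$)"  1;
}
```

The trade-off is readability: the loose patterns are easier to maintain, while the anchored ones avoid flagging a legitimate referrer that merely mentions a spam domain.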
Now every request will have a `$bad_referer` value associated with it: `1` for what you consider to be spam, `0` for legitimate traffic.
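Because `$bad_referer` is `0` for normal traffic, it also works with the `if=` parameter of nginx's `access_log` directive, which skips logging when the condition is `0` or empty. Before blocking anything, you could log flagged requests to a separate file to check that your list isn't catching genuine visitors. A sketch, assuming the log file paths below (they are examples; use whatever fits your setup):

```nginx
server {
    listen 80;

    # Keep the normal access log: an access_log defined here replaces
    # any access_log inherited from the http block.
    access_log /var/log/nginx/access.log;

    # Log only requests where $bad_referer is 1 (hypothetical file name)
    access_log /var/log/nginx/spam-referrers.log combined if=$bad_referer;

    # other config values ...
}
```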
The next step is to use this flag to stop traffic from these referrers from reaching your site.
Step 4: Deny spam referrers access to your site
To block the traffic, you need to use the `$bad_referer` variable in your site's configuration. Open the configuration file for one of your "enabled sites", normally found under the `/etc/nginx/sites-enabled/` folder on Debian- and Ubuntu-based installations (other installations often use `/etc/nginx/conf.d/` instead).
Inside that file you'll have a `server {}` configuration block with a `location {}` configuration block inside it. This `location` block is the place to use `$bad_referer`: check whether `$bad_referer` is set to `1` (true), and if so, return the status code `444`. This is a non-standard, nginx-specific code that closes the connection without sending any response. It should look something like this:
```nginx
server {
    listen 80;
    # other config values ...

    location / {
        # Drop requests flagged by the map in nginx.conf
        if ($bad_referer) {
            return 444;
        }
        # other config values ...
    }
}
```
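A check inside `location /` only applies to requests handled by that block. If your site has other `location` blocks (for example, one for PHP scripts or static assets), requests matched by those skip the check. One possible alternative, sketched below, is to put the same `if` at the `server` level, where it runs before nginx selects a location, so it covers every request for that site:

```nginx
server {
    listen 80;
    # other config values ...

    # Evaluated before location matching, so it applies to all requests
    if ($bad_referer) {
        return 444;
    }

    location / {
        # other config values ...
    }
}
```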
Step 5: Save and restart
Save both configuration files, check them for errors with `nginx -t`, and then restart your nginx service. Once that's done, requests from these spam referrers will be denied access to your site, keeping them out of your analytics.
Step 6: Update the list as necessary
Unfortunately, this probably isn't a one-time job, as new spam domains keep being created and will try to reach your server. So keep an eye on the referrers in Google Analytics and add any new spam URLs you find to your configuration file.
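To make these updates easier, the `map` block can load its entries from a separate file with `include`, so you only need to edit that file when a new spam domain appears. A sketch, assuming a hypothetical file at `/etc/nginx/spam-referrers.map` that holds one entry per line in the same format as the map entries:

```nginx
# nginx.conf, inside the existing http block
map $http_referer $bad_referer {
    default 0;
    # Hypothetical file; each line inside it is an entry such as:
    #   "~buttons-for-website\.com" 1;
    include /etc/nginx/spam-referrers.map;
}
```

After editing the file, test and restart nginx as in Step 5 for the changes to take effect.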