Optimizing your feed, Part 1
Tim Bray analyzed the traffic to his RSS feed and discovered that nearly 80% of requests for (downloads of) his feed were repeats. Most people who requested his feed had already done so earlier that day, and the feed hadn’t changed since their last request. I’d imagine that most RSS feeds are in a similar position, and that many feed publishers are wasting bandwidth.
You can fix that pretty easily by being smart about your feed. This is the first in a series of articles about how to do that.
Since most people read feeds through an aggregator of some sort, feed requests are typically made automatically by software, whether the feed has been updated or not. Some aggregators check your feed every 15 minutes. For an average 15 KB feed, this translates to 1.44 MB of bandwidth every day, and that’s just from one person subscribing to the feed.
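If you want to check that figure or plug in your own numbers, the arithmetic is short enough to sketch. The function name is made up for illustration, and it assumes 1 MB = 1,000 KB and ignores the small overhead of HTTP headers.

```python
def daily_feed_bandwidth_kb(feed_size_kb: float, poll_interval_minutes: float) -> float:
    """Bandwidth one subscriber uses per day if every poll downloads the full feed."""
    polls_per_day = 24 * 60 / poll_interval_minutes
    return polls_per_day * feed_size_kb


# A 15 KB feed polled every 15 minutes: 96 polls a day.
per_subscriber_kb = daily_feed_bandwidth_kb(15, 15)
print(f"{per_subscriber_kb / 1000:.2f} MB per subscriber per day")
```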
For those that aren’t familiar with the workings of Web servers and aggregators, a bit of explanation about how those things work is in order so you’ll understand the rest of this. Don’t worry, it’s simple to understand.
Each time a browser or other piece of software asks for a file from a Web server, it sends a couple of things along with its request. The first is obviously the name of the file it’s requesting and the name of the server it is asking. Another thing it can send is the date that it last asked for that file. When a Web server sees that date, it can tell the software, “that file hasn’t changed since then, so there’s no need to download the whole thing again.” The software then just acts as if the whole request never happened and displays the same copy of the file it showed last time. Most popular news aggregators work this way, but some do not. I’m going to show you how to reduce your bandwidth bills for both types of aggregators.
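To make that concrete, here is roughly what the aggregator's side of the conversation could look like in Python. This is only a sketch: the URL and the function name are placeholders, and nothing is actually sent over the network. In practice, well-behaved aggregators usually send back the date the server reported for the file on the previous download, rather than the time of their own last request.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.request


def build_conditional_request(url: str, last_seen: datetime) -> urllib.request.Request:
    """Build a request that tells the server which copy we already have."""
    # HTTP dates are always expressed in GMT, so last_seen must be a UTC datetime.
    headers = {"If-Modified-Since": format_datetime(last_seen, usegmt=True)}
    return urllib.request.Request(url, headers=headers)


# Hypothetical feed URL and date, for illustration only.
request = build_conditional_request(
    "http://example.com/index.xml",
    datetime(2003, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(request.get_header("If-modified-since"))
```

If the feed hasn't changed, the server can answer with a short "304 Not Modified" reply and no feed body at all.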
Your Web server is only able to check that date on its own (it arrives in what's called an If-Modified-Since header, for those who care) if it is reading static files off the disk. You see, that’s how it figures out the date. It looks to see when the file was last modified and compares that against the date the Web client software sent over. If you use an RSS tool that creates the RSS automatically every time the file is requested, then your tool needs to tell the Web server when the feed was last modified. How exactly to do that is beyond the aim of this article. Your blogging or CMS tool might already do this. Ask the software vendor about it. If they don’t do it, ask them to. Threaten to hold your breath until you turn blue if they don’t.
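For the curious, the logic a dynamic feed generator needs is small. The sketch below is not taken from any particular blogging tool. The function name, its parameters, and the `render_feed` callback are all invented for illustration, and a real tool would plug this into whatever Web framework it uses.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_to_feed_request(if_modified_since, feed_last_modified, render_feed):
    """Return (status, headers, body) for a dynamically generated feed.

    if_modified_since: raw header value from the client, or None.
    feed_last_modified: UTC datetime of the newest post in the feed.
    render_feed: hypothetical callback that builds the feed text.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = feed_last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    client_date = None
    if if_modified_since:
        try:
            client_date = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_date = None  # unparseable date: fall back to a full response
    if client_date is not None and client_date.tzinfo is None:
        client_date = client_date.replace(tzinfo=timezone.utc)

    if client_date is not None and last_modified <= client_date:
        return 304, headers, ""  # nothing new: send no feed body
    return 200, headers, render_feed()


newest_post = datetime(2003, 1, 1, 12, 0, tzinfo=timezone.utc)
status, headers, body = respond_to_feed_request(
    "Wed, 01 Jan 2003 12:00:00 GMT", newest_post, lambda: "<rss>...</rss>"
)
print(status, headers["Last-Modified"])
```

The key design choice is that the date comes from the newest post, not from the moment the feed was generated.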
The point of all this is that your feed shouldn’t change very often. The less often it changes, the more likely it won’t be gratuitously downloaded every fifteen minutes. I don’t mean that you shouldn’t post new content. But you need to make sure your feed is only updated when you actually do post new content. Many feeds contain comment counts, actual reader comments, or other stuff that frequently changes. Sometimes feeds automatically insert the current date and time each time the feed is built. Doing this will cause your feed to show up as changed even when you haven’t posted anything new.
For example, Movable Type (as of version 2.51) is smart enough not to change the file date of your feed if the contents of the feed haven’t changed. But if you add a current date and time to your feed, you’ll ensure that the file date changes every time the file is rebuilt. Since the RSS feed is stored as an index template in MT, it will be rebuilt every time someone posts a comment or sends a TrackBack ping. Optimizing your feed means only including information that actually changes when you add new content.
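The "don't touch the file if nothing changed" idea is easy to sketch for a tool that writes static feed files. This is not Movable Type's actual code, just an illustration of the technique with an invented function name.

```python
import tempfile
from pathlib import Path


def write_feed_if_changed(path: Path, new_content: str) -> bool:
    """Write the feed only when its bytes differ; return True if it was written."""
    encoded = new_content.encode("utf-8")
    if path.exists() and path.read_bytes() == encoded:
        return False  # leave the file, and its modification date, alone
    path.write_bytes(encoded)
    return True


with tempfile.TemporaryDirectory() as tmp:
    feed = Path(tmp) / "index.xml"
    print(write_feed_if_changed(feed, "<rss>one post</rss>"))   # first build
    print(write_feed_if_changed(feed, "<rss>one post</rss>"))   # rebuild, same content
    print(write_feed_if_changed(feed, "<rss>two posts</rss>"))  # a new post appears
```

Notice that a timestamp or comment count inside the feed text would make every rebuild look different, which is exactly why such fields defeat this check.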
This could mean real bandwidth savings. If our hypothetical 15 KB feed is read by 100 different people whose news aggregators grab it every 15 minutes, you’ll save roughly 143 MB of bandwidth every day (about 144 MB of polling, less the one full download each reader still needs when the feed changes). This assumes that every person is using an aggregator that handles the change date system, which probably isn’t realistic, but you get the idea.
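Here is one way to reproduce that estimate. The function name is invented, and the sketch assumes the feed changes once a day, that every aggregator honors the change date, and that the tiny "not modified" replies cost nothing, so treat the result as a best case.

```python
def daily_savings_kb(subscribers: int, feed_size_kb: int,
                     poll_interval_minutes: int, updates_per_day: int) -> int:
    """Bandwidth saved per day when unchanged polls transfer no feed body."""
    polls_per_day = 24 * 60 // poll_interval_minutes
    # Every poll downloads the whole feed.
    without_dates = subscribers * polls_per_day * feed_size_kb
    # Only polls that follow an update download the feed.
    full_downloads = min(updates_per_day, polls_per_day)
    with_dates = subscribers * full_downloads * feed_size_kb
    return without_dates - with_dates


saved_kb = daily_savings_kb(subscribers=100, feed_size_kb=15,
                            poll_interval_minutes=15, updates_per_day=1)
print(f"{saved_kb / 1000:.1f} MB saved per day")
```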
So how can you optimize your feed for aggregators that don’t follow this standard? You’ll find out in the next part of this series.
