Curious: were real-time updates something your customers were demanding?
I'm asking because we thought about doing real-time updates for our product but decided against it because it made topic modelling (clustering based on topics) much harder for us.
Also, what is your experience with publishers being consistent about pinging the hub with updates? Do you still need to poll feeds periodically to make sure you didn't miss something?
I wouldn't go so far as to say there was a high demand for real-time. But some users asked for it by name, and in general, increasing speed (lowering latency) correlates strongly with increased usage. If I can make page loads and feed fetches faster, my users will use NewsBlur more. It's also why I chart the average load time on every user's dashboard.
In this case, real-time means NewsBlur is acting as the subscriber, and the publisher pushes out messages to NewsBlur. This way, instead of fetching every 3 minutes for what may only be updated stories or even no changes at all (not every feed behaves nicely by offering a 304 Not Modified), I only have to fetch the feed when it confirms that there is something new.
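Before a hub starts pushing to a subscriber, PubSubHubbub has the hub confirm the subscription with a "verification of intent" GET request to the subscriber's callback URL, carrying `hub.mode`, `hub.topic` and `hub.challenge` query parameters. The subscriber echoes the challenge back with a 2xx response if it really asked for that subscription, or answers 404 if it didn't. Below is a minimal sketch of that check, not NewsBlur's code; the function name `verify_intent` and the `pending` set of requested (mode, topic) pairs are hypothetical.

```python
from urllib.parse import parse_qs


def verify_intent(query_string, pending):
    """Answer a hub's verification-of-intent GET request.

    `pending` is a hypothetical set of (mode, topic) pairs that this
    subscriber actually requested and is waiting to have confirmed.
    Returns an (HTTP status, response body) tuple.
    """
    # parse_qs returns lists of values; keep the first value of each key.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    mode = params.get("hub.mode")
    topic = params.get("hub.topic")
    challenge = params.get("hub.challenge")

    if challenge is None or (mode, topic) not in pending:
        # We never asked for this (un)subscription, so refuse it.
        return 404, ""

    # Echo the challenge back to confirm we want this subscription.
    return 200, challenge


pending = {("subscribe", "http://example.com/feed")}
print(verify_intent(
    "hub.mode=subscribe&hub.topic=http%3A%2F%2Fexample.com%2Ffeed&hub.challenge=abc123",
    pending,
))
```

Once verified, the hub POSTs new content to the same callback URL whenever the publisher pings it, which is the signal to go fetch the feed.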
About 20% of all feeds have a PuSH (PubSubHubbub) option, yet I now have more than 33% less work to do. That's another huge boost, since my feed fetchers put less load on the database.
I haven't been running real-time with granular enough metrics to tell whether the PuSH-enabled feeds are pinging correctly. Part of the reason for that lapse on my part is that I still fetch those feeds regularly, just 1/20th as often. So instead of every 3 minutes, it's every hour or so. That's over an order of magnitude less work, and I don't have to deal with feeds not doing what they should be doing.
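The hybrid scheme described here (fetch immediately on a push, otherwise keep polling as a safety net, but 20 times less often for PuSH-enabled feeds) fits in a few lines. This is a minimal sketch under stated assumptions, not NewsBlur's actual scheduler: the `Feed` fields, the `push_pending` flag, and the function names are hypothetical, and the intervals are the 3-minute base and 1/20th factor quoted above.

```python
from dataclasses import dataclass

BASE_INTERVAL = 3 * 60  # seconds: regular polling interval (3 minutes)
PUSH_FACTOR = 20        # PuSH-enabled feeds are polled 1/20th as often


@dataclass
class Feed:
    has_push: bool              # feed advertises a PubSubHubbub hub
    last_fetched: float         # Unix timestamp of the last fetch
    push_pending: bool = False  # hypothetical flag: hub notified us of new content


def poll_interval(feed: Feed) -> int:
    """Seconds between safety-net fetches for this feed."""
    return BASE_INTERVAL * PUSH_FACTOR if feed.has_push else BASE_INTERVAL


def should_fetch(feed: Feed, now: float) -> bool:
    """Fetch right away on a push, otherwise only when the poll interval elapses."""
    if feed.push_pending:
        return True
    return now - feed.last_fetched >= poll_interval(feed)


print(poll_interval(Feed(has_push=True, last_fetched=0.0)))
```

With these numbers a PuSH-enabled feed is polled every 3600 seconds (one hour) instead of every 180, so a hub that silently drops notifications delays a story by at most about an hour rather than losing it.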
Naturally, in the aggregate I can tell it's working well, since a number of new stories are coming in through PuSH.
My experience with topical subscription services is that most mainstream people don't care much about real-time subscriptions. They come back from time to time to read through the stuff they're interested in. Of course, if your target is journalists or financial analysts, it's a different story.
I know real-time technology is cool and all, but I don't think real-time is what you should be doing. People are increasingly accessing RSS readers through mobile devices such as phones and tablets, and that's where the opportunity is. That audience doesn't require real-time updates. If you just build what you think is the best experience for desktop consumption, you're fighting against Google Reader, and that's not an efficient battle. My two cents.