Past and present
Back in 2006, Dagospia.com didn't provide a feed for its headlines, so I decided to do some web scraping and produce a valid Atom 1.0 feed from it. If you understand Italian, you can still read the announcement.
During the last year Dagospia finally added an RSS 2.0 feed (albeit an invalid one) to syndicate its latest headlines, so I thought it was time to move on and try something new. Since mobile computing is growing, and will be even more relevant in the future, I created Dagopocket, a stripped-down version of the Dagospia home page.
Let’s see how this thing works in detail.
Step 1: converting the feed
Since I wanted to keep providing a valid Atom 1.0 feed for Dagospia, the first thing to do was to fix the site's RSS feed.
I used the amazing Universal Feed Parser to fetch the feed and extract the relevant information: news headlines, publishing dates, and categories. In the process I ran into only one serious problem: the feed titles were littered with invalid characters, as documented in the Feed Validator help pages.
And guess what? A quick web search led me to Effbot's kill_gremlins Python function, which did exactly what I needed: it replaces those bad characters with their Unicode counterparts.
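To make the idea concrete, here is a minimal sketch of that kind of cleanup. It is not Effbot's original code: it assumes the stray characters are Windows-1252 bytes that ended up as C1 control characters (U+0080 to U+009F), and it reuses the name kill_gremlins only for illustration.

```python
import re

# Assumption: the "gremlins" are cp1252 bytes mislabeled as Latin-1/Unicode,
# so they show up as code points in the C1 control range U+0080..U+009F.
GREMLINS = re.compile("[\u0080-\u009f]")


def kill_gremlins(text):
    """Replace stray cp1252 characters with their Unicode counterparts."""

    def fixup(match):
        char = match.group(0)
        try:
            # Reinterpret the single byte as Windows-1252.
            return char.encode("latin-1").decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes (e.g. 0x81) are undefined in cp1252: leave them alone.
            return char

    return GREMLINS.sub(fixup, text)


cleaned = kill_gremlins("\x93Scoop\x94 \x96 l\x92ultima ora")
```

With this input, the curly quotes, the dash, and the apostrophe come back as proper Unicode punctuation, while plain ASCII text passes through untouched.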
Step 2: from the feed to the mobile version
There are a couple of important concepts to keep in mind when designing the Dagopocket page, and more generally any application for the mobile market:
- Since screen size on smartphones is severely limited, every superfluous graphic element must be removed.
- Users open Dagopocket to look up the latest headlines, so the page just needs to give them what they want: the latest scoop. At that moment the user is focused on a single task, so it is crucial not to add details or features that could distract the reader.
With this in mind, building the page was a matter of reading back the Atom feed and formatting the results with some HTML5 and CSS code.
As documented here, some extra settings must be specified in the head element to show the page with the correct viewport and scaling on the iPhone:
<meta name="viewport" content="initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no, width=device-width" />
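The page-building step could look roughly like the sketch below, which uses only the Python standard library. The function name render_page, the sample feed, and the bare list markup are assumptions made for illustration; the actual Dagopocket template and styling are not shown here.

```python
import html
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# The viewport settings quoted above, placed in the generated head element.
VIEWPORT = (
    '<meta name="viewport" content="initial-scale=1.0, minimum-scale=1.0, '
    'maximum-scale=1.0, user-scalable=no, width=device-width" />'
)


def render_page(atom_xml, page_title="Dagopocket"):
    """Turn an Atom feed (as a string) into a bare-bones headline page."""
    feed = ET.fromstring(atom_xml)
    items = []
    for entry in feed.findall("atom:entry", ATOM):
        headline = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.find("atom:link", ATOM)
        href = link.get("href", "#") if link is not None else "#"
        # Escape both values so feed content cannot break the markup.
        items.append(
            '<li><a href="%s">%s</a></li>' % (html.escape(href), html.escape(headline))
        )
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        '<meta charset="utf-8" />',
        VIEWPORT,
        "<title>%s</title>" % html.escape(page_title),
        "</head>",
        "<body>",
        "<ul>",
        *items,
        "</ul>",
        "</body>",
        "</html>",
    ]
    return "\n".join(lines)


# Hypothetical two-entry feed, just to exercise the function.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First &amp; foremost</title>
    <link href="http://example.com/1" />
  </entry>
  <entry>
    <title>Second headline</title>
    <link href="http://example.com/2" />
  </entry>
</feed>"""

page = render_page(SAMPLE_FEED)
```

Keeping the output to a single list of links reflects the design goal stated earlier: nothing on the page except the headlines.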
Step 3: scheduling and caching
Finally, I scheduled the script with a cron job on the server and set it to rebuild feed.atom and the corresponding web page every hour.
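One detail worth considering when a cron job rewrites files that the web server is serving at the same time: a visitor should never receive a half-written page. The sketch below shows one common way to handle this, writing to a temporary file and renaming it into place. It is an assumption about how such a script might be written, not a description of the actual Dagopocket script, and write_atomically is a hypothetical helper name.

```python
import os
import tempfile

# A hypothetical crontab entry for an hourly rebuild could look like:
#   0 * * * * python3 /path/to/build.py
# (the script path is a placeholder, not the real one).


def write_atomically(path, text):
    """Write text to path so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory, so the final
    # rename stays on one filesystem and replaces the target in one step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # mkstemp creates owner-only files; make the result world-readable
        # so the web server can serve it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The script would then call write_atomically once for feed.atom and once for the HTML page at the end of each run.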
To serve the files from the web server with maximum efficiency, I enabled Apache's expiry-based caching (mod_expires) with these lines in the server configuration:
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresDefault "access plus 1 week"
    ExpiresByType text/html "access plus 1 hour"
    ExpiresByType application/atom+xml "access plus 1 hour"
</IfModule>
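To illustrate what these rules mean for a response, here is a small sketch that mimics them in Python. It is only a model of the configuration above, not Apache code: the function name expiry_headers and the lookup table are made up for the example. mod_expires sets both the Expires header and the max-age directive of Cache-Control, which is what the sketch reproduces.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Lifetimes taken from the configuration above.
LIFETIME_BY_TYPE = {
    "text/html": timedelta(hours=1),
    "application/atom+xml": timedelta(hours=1),
}
DEFAULT_LIFETIME = timedelta(weeks=1)


def expiry_headers(content_type, access_time):
    """Return the caching headers implied by the "access plus ..." rules."""
    lifetime = LIFETIME_BY_TYPE.get(content_type, DEFAULT_LIFETIME)
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(access_time + lifetime, usegmt=True),
        "Cache-Control": "max-age=%d" % int(lifetime.total_seconds()),
    }


accessed = datetime(2010, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
feed_headers = expiry_headers("application/atom+xml", accessed)
image_headers = expiry_headers("image/png", accessed)
```

Since the feed and the page are rebuilt every hour, a one-hour lifetime for those two types lets browsers reuse their cached copy until roughly the next rebuild, while everything else can be kept for a week.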
Admittedly, the current implementation is nothing more than a beautified feed. However, I think there is room for improvement. Here are some open questions:
- It would be cool to have a single-story page optimized for the small iPhone screen, but this would stop visitors from loading the Dagospia.com web page, and I suspect the site owners would not be happy about that.
- Is it useful to group headlines by category? I did that initially, but I recently scrapped the feature. My gut feeling is that a reader wants to look up the latest site headlines, regardless of their categories.
- Potentially linked to the previous point: how can the page take advantage of the iPhone's landscape and portrait modes? Should a change in device orientation trigger a new arrangement or sorting of the headlines?
Dagopocket should be compatible with the browsers found on any modern smartphone, although there are some optimizations specific to WebKit-based browsers.