Categories
Select a category to get more info
- Beverages
- Breweries
- Conferences
- Days of the Week
- Frameworks
- Local Bands
- Portland
- Programming Languages
- Social Networking
- Web Technologies
- Webvisions
Methodology
The original version of the analytic code was created to find connections between bands playing at SXSW in Austin, TX this year. That site can be viewed here.
For SXSW, the code I wrote fetched messages, searched them for commonalities, and cached the results in the database. It basically consisted of eight PHP scripts that were run one after another. More details can be found at the SXSW link above. For Webvisions I had to tune the code to work with a constant stream of incoming data.
A cron job runs once a minute. 80% of the time it gets new messages from the @WV09 friends timeline, which is why the @WV09 account has to follow you back. For SXSW I ran 13,000 search queries, one for each user, to get messages. That worked great because all the data I wanted was in the past. Now the best data is in the immediate present, and by auto-following I can get updates from everyone at once. This cron job also searches the new messages for keyword matches, so the displays around the conference can pick up messages immediately.
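To make the keyword-matching step concrete, here is a minimal sketch in Python. It is not the site's actual code (that is PHP); the keyword list, the message fields, and the `match_keywords` helper are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical keyword list; the real site groups keywords by category.
KEYWORDS = ["php", "python", "coffee", "portland"]


def match_keywords(messages, keywords=KEYWORDS):
    """Return {keyword: [message ids]} for each keyword found in the messages."""
    hits = defaultdict(list)
    for msg in messages:
        text = msg["text"].lower()
        for kw in keywords:
            # Plain substring test: every reference counts equally and
            # no context is checked.
            if kw in text:
                hits[kw].append(msg["id"])
    return dict(hits)


# Assumed shape of a freshly fetched batch of messages.
new_messages = [
    {"id": 1, "user": "alice", "text": "Great coffee in Portland this morning"},
    {"id": 2, "user": "bob", "text": "Rewriting my PHP scripts"},
]

print(match_keywords(new_messages))
```

Running the match on each new batch, rather than over the whole history, is what would let the conference displays pick up a message within about a minute of it being posted.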
There are two other cron jobs. One auto-follows new followers and imports each new follower's Twitter stream, while the other loads my intermediate (cache?) tables. The two intermediate tables speed up the analytics between users and between keywords by factors of 1,000 and 10, respectively.
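The idea behind an intermediate table can be sketched as follows. This is only an illustration using Python and an in-memory SQLite database. The table names, columns, and helper functions are hypothetical, and the real schema is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw keyword matches, one row per (message, keyword) hit.
    CREATE TABLE keyword_hits (username TEXT, keyword TEXT, message_id INTEGER);
    -- Hypothetical intermediate table: one row per (username, keyword) pair.
    CREATE TABLE user_keyword_counts (
        username TEXT, keyword TEXT, hits INTEGER,
        PRIMARY KEY (username, keyword)
    );
""")
conn.executemany(
    "INSERT INTO keyword_hits VALUES (?, ?, ?)",
    [("alice", "coffee", 1), ("alice", "coffee", 2),
     ("bob", "coffee", 3), ("bob", "php", 4)],
)


def rebuild_counts(conn):
    """Recompute the summary table from the raw hits, as a cron job might."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM user_keyword_counts")
        conn.execute("""
            INSERT INTO user_keyword_counts (username, keyword, hits)
            SELECT username, keyword, COUNT(*)
            FROM keyword_hits
            GROUP BY username, keyword
        """)


def shared_keywords(conn, user_a, user_b):
    """Keywords both users have mentioned, read from the summary table only."""
    rows = conn.execute("""
        SELECT a.keyword
        FROM user_keyword_counts a
        JOIN user_keyword_counts b ON a.keyword = b.keyword
        WHERE a.username = ? AND b.username = ?
        ORDER BY a.keyword
    """, (user_a, user_b)).fetchall()
    return [row[0] for row in rows]


rebuild_counts(conn)
print(shared_keywords(conn, "alice", "bob"))
```

The comparison query touches one row per user and keyword instead of one row per message, which is the kind of saving a precomputed table is meant to provide. The actual speedup depends on the data.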
Limitations
There are a few limitations with my process.
- All keyword references are considered equal.
- Misspelled or incomplete keywords are not represented. If someone didn't type out the keyword correctly, their reference is currently ignored.
- The code doesn't check context, so a URL containing .php counts toward the php keyword. Some steps to mitigate this will be taken in future revisions of the code, but unfortunately it will not be fixed during Webvisions.
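The context limitation in the last item can be illustrated with a short Python sketch. The mitigation shown (stripping URLs, then matching on word boundaries) is just one possible approach and not necessarily what a future revision will do. The function names and the URL pattern are assumptions.

```python
import re

# Rough pattern for http/https links; an assumption, not a full URL grammar.
URL_RE = re.compile(r"https?://\S+")


def naive_match(text, keyword):
    """Substring test with no context check, as in the limitation above."""
    return keyword in text.lower()


def context_aware_match(text, keyword):
    """Drop URLs first, then require the keyword to stand alone as a word."""
    without_urls = URL_RE.sub(" ", text.lower())
    # Note: \b only works for keywords that start and end with a word
    # character; a keyword such as "c++" would need different handling.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, without_urls) is not None


tweet = "Slides are up at http://example.com/talk.php"
print(naive_match(tweet, "php"))          # True: the URL triggers a match
print(context_aware_match(tweet, "php"))  # False: the URL is ignored
print(context_aware_match("Still writing PHP in 2009", "php"))  # True
```

This does nothing for misspelled or incomplete keywords. Handling those would need some form of fuzzy matching, which is a separate problem.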