This month
High Scalability - Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
distributed… owning our data… etc. *sigh* Growing at over 30% a month has not been without challenges. Some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers.
January 2012
The Emperor's New Client
It's funny how you don't hear so much about service mashups these days, despite their undeniable coolness. I'll assert that it's because developing for Web data in the browser is bloody hard work, especially when there are NxN arbitrary API mappings to know.
December 2011
AntiMap Log | AntiMap
AntiMap Log is a smartphone utility application for ‘recording’ your own data. Whether you're out snowboarding, skiing, mountain biking, driving, running, or whatever you're into, AntiMap Log is a DIY solution for gathering real-time stats with your phone. The indexed data can then be used in conjunction with any of the free AntiMap post analysis applications (or your own creations) to visualize your every move.
November 2011
October 2011
September 2011
IM2GPS: estimating geographic information from a single image
Estimating geographic information from an image is an excellent, difficult high-level computer vision problem whose time has come. The emergence of vast amounts of geographically-calibrated image data is a great reason for computer vision to start looking globally — on the scale of the entire planet! In this paper, we propose a simple algorithm for estimating a distribution over geographic locations from a single image using a purely data-driven scene matching approach. For this task, we will leverage a dataset of over 6 million GPS-tagged images from the Internet. We represent the estimated image location as a probability distribution over the Earth's surface. We quantitatively evaluate our approach in several geolocation tasks and demonstrate encouraging performance (up to 30 times better than chance). We show that geolocation estimates can provide the basis for numerous other image understanding tasks such as population density estimation, land cover estimation or urban/rural classification.
August 2011
pandas: a python data analysis library — pandas v0.4.0dev documentation
pandas is a python package providing convenient data structures for time series, cross-sectional, or any other form of “labeled” data, with tools for building statistical and econometric models.
July 2011
The Microsoft Update: Google explains its data correlation privacy settings
A company spokesperson told me that the search giant is not looking at public phone directories to match phone numbers with user names. But it is looking through social media sites to correlate those accounts to your Google user name and profile.
The Decline of the Online Message Board - NYTimes.com
By contrast, the Web 2.0 juggernauts like Facebook and YouTube are driven by metrics and supported by ads and data mining. They’re networks, and super-fast — but not communities, which are inefficient, emotive and comfortable. Facebook — with its clean lines and social expressways — is Robert Moses par excellence.
June 2011
Official Google Blog: Introducing schema.org: Search engines come together for a richer web
ironic how Google is finally getting the way of the old Yahoo! Introduces schemas for more than a hundred new categories, including movies, music, organizations, TV shows, products, places and more.
May 2011
Python 3: Building a Wiki Application | Packt Publishing Technical & IT Book and eBook Store
Implement a data layer for a wiki application
April 2011
Atlas of the Habitual
If you had a visualization of every place you've been for 200 days, what could you do with it? What could it tell you about yourself and how could others use the data?
Technology allows us to see information in a way we never could before. Atlas of the Habitual is about creating data out of the everyday, the hyper-digitizing of your life.
Jdrop | Welcome to Jdrop
Jdrop provides a place to store JSON data in the cloud.
The initial application is for storing performance data gathered from mobile devices.
It's hard to analyze large amounts of information (HTTP waterfall charts, HTTP headers, document source, etc.) on a mobile device.
Jdrop lets you gather this data on the mobile device but analyze it remotely on a larger screen.