
Liminal Existence


Sunday, May 11, 2008

Scalability

LOL.

For all those who don't get it, languages don't scale, architectures do.

Now, some languages are faster than others. That means a given operation costs less to complete, everything else being equal. Costing less is a good thing. But developers also cost money, so if you have to spend money on developers' time porting from one language to another, then you might not be saving any money at all, and really you're just treading water.

Once upon a time, Shell Scripts were used to write CGI applications. With the correct architecture, and enough money, you could build Google with tcsh. No, really. It wouldn't be fun, and you'd be dumb, because there are much cheaper ways to do it. But then again, if you stuck with it, perhaps you'd optimize tcsh to be really fast at spawning and serving up web requests. Faster than Java, faster than <insert your favourite language here>. Faster means cheaper, it doesn't mean more scalable.

I point to exhibit A. Perl used to be slow. Now it beats JoCaml with the bestest concurrency (re: “Scalability”) around. What was Perl built for? Parsing text. Lots of it. All the time. It's fast. Does it mean that you can't build Wide Finder with another language? Absolutely not. Does it mean that you couldn't build Wide Finder to scale out to a trillion documents with gawk? If you answered “yes”, go back to the start of this post and read again! :-) If you're still answering “yes,” try reading some more. Leonard, Ted, Joe, Cal, and Theo are good places to start.

If you answered “no,” congratulations! Pat yourself on the back for knowing what scalability means.
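To make "architectures scale" concrete, here is a minimal sketch of a Wide Finder-style count (most-requested pages in a log) written so the work can be split across any number of independent shards and merged afterwards. The log format, the function names, and the round-robin sharding are all assumptions for illustration, not anything from the Wide Finder project itself. Each call to `count_hits` shares no state with the others, so it could just as well run on another core or another machine.

```python
from collections import Counter


def shard(lines, n_shards):
    """Split the input into n_shards independent partitions (round-robin)."""
    shards = [[] for _ in range(n_shards)]
    for i, line in enumerate(lines):
        shards[i % n_shards].append(line)
    return shards


def count_hits(lines):
    """Per-shard work: count GET requests per path, with no shared state."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "GET":
            counts[parts[1]] += 1
    return counts


def merge(partials):
    """Combine per-shard counts; addition is associative, so order is irrelevant."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def top_pages(lines, n_shards, k=3):
    """Shard, count each shard independently, then merge and rank."""
    partials = [count_hits(s) for s in shard(lines, n_shards)]
    return merge(partials).most_common(k)


# Hypothetical, simplified log lines: "<method> <path>".
LOG = [
    "GET /a",
    "GET /b",
    "GET /a",
    "POST /a",
    "GET /c",
    "GET /a",
    "GET /b",
]

print(top_pages(LOG, n_shards=1))
print(top_pages(LOG, n_shards=4))
```

The answer is the same whether there is one shard or four. Swapping Python for gawk or tcsh changes what each shard costs to run, not whether the design can grow.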


Saturday, March 08, 2008

FoWA Miami Rocked.

I've been busy getting [Twitter] ready for SXSW, and have completely failed at email and, well, everything except work, since then. Before I actually land (and get drunk) in Austin, I wanted to make a quick post about how great FoWA Miami was.

After giving my workshop, I attended Joe's workshop on scalability, which was an amazingly thorough discussion, and I highly recommend attending anything that Joe does in the future (including the panel that he, Cal, and others are on at SXSW). I didn't catch many of the session talks on Friday, as I spent much of my time talking to attendees, but the "Building a Web App in 45 minutes" panel was a fun experiment, and both Cal and Gary were energetic, brilliant, amazing, all that good stuff. If you get a chance to see either of them speak, don't pass it up. Especially Gary, as you're likely to get free wine, even if he makes you eat dirt beforehand.

I had some hiccups during the demo / discussion portion of my workshop, but the first part went well, I think, and I'm looking forward to some great applications that incorporate Jabber soon. My talk, "Bringing your web app to the masses," went well, except for the part where Twitter went down in the middle of it, and I got a call from work (which I waited until after the talk to answer).

Some heckling ensued, but I was happy to be able to address the audience's questions about Twitter in an open and honest way. The atmosphere that Carsonified has managed to foster at FoWA helped a ton.

Mel, Ryan, Lisa, Keir, and Elliot did a fantastic job organizing everything, and Tantek and Brian put together an amazing lineup of speakers and workshops. Seriously inspiring stuff.

If they'll have me, I'll definitely be going to any conferences they hold in the future. They've come a long way since the FoWA San Francisco in Fall 2006, and it looks like they'll just keep getting better.

Thursday, February 21, 2008

Google Entanglement

I've done a fair bit of security work, and generally try to care about the finer details of privacy and security. However, one of the things that I've learned is that more often than not, no amount of digital security past a certain point is going to help, since usually the threat model isn't an advanced technological attack, it's a social one.

Thus far, Google has done a pretty good job of keeping private things private and public things public. I've spoken to people on the Google Reader team, and the main reason they haven't added support for private feeds is their acute concern for privacy.

Today Google announced a limited trial of storing health records online. This seems reasonable and doable in a secure way, but I'm sure they'll get lots of unwarranted flak for the long-awaited project.

However, there will and should be some warranted flak. It turns out that they're using your regular Google account to store this information, and will provide access to it using your regular password, no doubt through yet another Google login page. I've heard concerns (from Google people) that OAuth supports phishing, but project infighting and power struggles at Google that result in tens of login pages, all slightly (or dramatically) different and all using the same credentials, support phishing much more so.

I strongly support patients' rights to access their medical information, and Google is probably one of just a handful of organizations that can do the necessary coordination work and stand up to invasive organizations at scale. However, they need to stop thinking of this data as theirs, because it's not — it's your data. Using the same password as your email to access your health records is something that should be actively discouraged. If Google wants to present a unified interface, they should expose an API and use OAuth or AuthSub, just like any other third party that would consume the data.

Now, I may be over-reacting, but I had an interaction yesterday that suggests to me that I'm not. Someone using GTalk sent a chat request to blaine@twitter.com; this email address has an MX record that resolves to mail.twitter.com, and the corresponding JID resolves to jabber01.twitter.com. However, I have claimed my blaine@twitter.com address on GMail, and associated it with my primary GTalk ID (romeda@gmail.com). When I accepted the chat request, the response came from my GTalk account, romeda@gmail.com.

In effect, Google had done something clever, and in so doing broke the Jabber spec, ignored my own self-hosted Jabber server, and exposed my personal email address without asking my permission.

In this case, it wasn't a big deal, I don't care, etc. Others might, though, and I only knew that it was happening because the person on the other end of the chat was tech-savvy enough to realize what had happened. Also, email addresses and connections between them are hardly closely-guarded secrets. The thing I take away from this is that Google is being sloppy. There's a lot going on, and it's hard to keep track of it all. That your health records are being tied to your Google account just reeks of some power struggle where the Google account people want to bolster their product's internal importance (or have already managed to do so, such that they get veto power where they shouldn't have it), and it's simply not a pragmatic choice. There's a reason your health records aren't stored at the DMV, and it's not out of convenience. Just sayin'.


Sunday, January 20, 2008

FoWA Miami

Just a quick note that I'll be giving a workshop on building real-time web applications using Jabber at the Future of Web Apps in Miami, on February 28th. The conference runs from the 28th to the 1st of March, and should be a lot of fun.

We've been gradually improving the Jabber stack on Twitter, and we're now sending millions of messages every day, doing things that just don't fit into the polling-based world of Atom feeds. There are a ton of extremely awesome things that can be built, and so far we've just scratched the surface.
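As a rough illustration of why push fits real-time delivery better than polling, here is a toy model that counts work on each side. Everything in it (the `Feed` and `PushChannel` classes, the tick-based clock, the client and tick counts) is hypothetical and has nothing to do with how Twitter's Jabber stack is actually built. The only real-world fact it leans on is that a Jabber connection stays open, so the server can send a message when it happens instead of waiting to be asked.

```python
class Feed:
    """A feed that clients poll: every check costs one fetch, new items or not."""

    def __init__(self):
        self.items = []
        self.fetches = 0

    def publish(self, item):
        self.items.append(item)

    def fetch_since(self, position):
        self.fetches += 1
        return self.items[position:]


class PushChannel:
    """A channel that delivers each item to every subscriber as it is published."""

    def __init__(self):
        self.subscribers = []
        self.deliveries = 0

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, item):
        for callback in self.subscribers:
            callback(item)
            self.deliveries += 1


def simulate(n_clients, n_ticks, publish_ticks):
    """Run both models side by side; return (fetches, deliveries, same_result)."""
    feed, channel = Feed(), PushChannel()
    positions = [0] * n_clients
    polled = [[] for _ in range(n_clients)]
    pushed = [[] for _ in range(n_clients)]
    for inbox in pushed:
        channel.subscribe(inbox.append)
    for tick in range(n_ticks):
        if tick in publish_ticks:
            feed.publish(tick)
            channel.publish(tick)
        # Every polling client asks on every tick, whether or not anything changed.
        for c in range(n_clients):
            new = feed.fetch_since(positions[c])
            polled[c].extend(new)
            positions[c] += len(new)
    return feed.fetches, channel.deliveries, polled == pushed


fetches, deliveries, same = simulate(n_clients=3, n_ticks=10, publish_ticks={2, 7})
print(fetches, deliveries, same)
```

Both sides end up with the same messages, but polling cost grows with clients times ticks, while push cost grows with clients times messages actually sent.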

More to come; if I don't start blogging these things in small pieces, they'll never come.


Saturday, September 01, 2007

Stability

Most of the Twitter team's work in the weeks leading up to launching Blocks was to ensure that it wouldn't fall over as soon as we released it. It's an extremely punishing application, loading 10 timelines on every occasion that someone looks at it. So far, the servers haven't even noticed.

There have been a number of Twitter hiccups in the past few weeks, but they've all been weird, random bugs. Which is not to make excuses, but rather to say that in spite of (very time-consuming) challenges along the way, we've been myopically focused on making the site faster and more reliable. As evidence, here's a graph of page load times, as seen from an external observer:

Twitter Load Times, as monitored by an external observer, over the past month.

We're going to keep building a faster and more reliable Twitter. We're also going to add some awesome new features, and soon. Possibly better than contact search and GMail, even! Finally, we'll have more visualizations from the Stamen folks. Britt is off to Berlin for RailsConf mid-September. We'll then have more details about what we're doing to push Rails and Twitter.


These are the people ...

Folly: "In architecture, a folly is an extravagant, frivolous or fanciful building, designed more for artistic expression than for practicality." – via Tom Coates, by way of Tom Carden.

We just released Twitter Blocks, a nice little visualisation done by the good folks at Stamen Design. It's fun! Go play!

Stamen's recent work highlights the playfulness inherent to Twitter. I can't wait to release more of these interfaces, and hope that it inspires similar work. Sam Ruby, Tim Bray and others have recently weighed in with their long bets. I'm willing to put down that playfulness — of the sort that Stamen, Schulze & Webb and Jane McGonigal explore and invent daily — is so important to who we are as people that the tech world won't be able to ignore it for much longer.

Not exactly a risky bet, but too often the tech industry just ignores these things, so there it is, just for kicks.


Thursday, June 21, 2007

SELECT * FROM everything, or why databases are awesome.

I've just committed a patch to ActiveRecord that prevents a large number of very, very bad queries from hitting your database. Go update your code, ASAP.

We've made some pretty significant progress towards scaling Twitter, and we're now at the point where the majority of requests that hit our site complete in less than 70 ms (mostly API requests), and the really complicated front-end pages that we display complete in less than 160 ms. There are still a lot of hiccups, so the average is higher than that, but we're constantly working on getting it down.

One of the consistent problems we've been facing is errant queries. We've been seeing (off and on) queries like:

SELECT * FROM statuses WHERE user_id = 234223 ORDER BY created_at

If you know anything about relational databases, this is a very bad thing, especially when you have users that have more than 20,000 statuses.
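The query above has no LIMIT, so the database has to read, sort, and return every status the user has ever posted. A minimal sketch of the difference follows, using SQLite in memory as a stand-in for MySQL. The schema, the index, and the `LIMIT 20` are assumptions for illustration only; this is not what the ActiveRecord patch does, just a picture of how much data the unbounded form pulls back.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema here is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statuses ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER)"
)
conn.execute(
    "CREATE INDEX idx_statuses_user_created ON statuses (user_id, created_at)"
)
conn.executemany(
    "INSERT INTO statuses (user_id, created_at) VALUES (?, ?)",
    [(234223, t) for t in range(20000)],
)

# The errant form: every row for the user comes back.
unbounded = conn.execute(
    "SELECT * FROM statuses WHERE user_id = ? ORDER BY created_at",
    (234223,),
).fetchall()

# A bounded form: only the newest 20 rows are returned.
bounded = conn.execute(
    "SELECT * FROM statuses WHERE user_id = ? ORDER BY created_at DESC LIMIT 20",
    (234223,),
).fetchall()

print(len(unbounded), len(bounded))
```

With 20,000 statuses for one user, the first query materializes all 20,000 rows in the application, while the second returns 20.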

One major downside of having an object-relational mapper is that you don't always control what goes on behind the scenes. In tracking down this problem, first we investigated all our code, and weren't able to find the source of these problems. Switching tactics, we isolated some test cases that replicated the problem and brought out the big guns: print. This pretty quickly brought us to an obscure corner of the ActiveRecord source (three cheers for source code!), where it became apparent that Rails was doing these gigantic loads from the database every time we saved even a single field in a related object. There are a bunch of mitigating circumstances that mean that this bug doesn't get triggered all the time, but it's still really really bad.

Thankfully, the patch has been committed (32 minutes patch-to-commit!), and no-one will have to deal with, as Coda put it: "Arg stabby stab stab stabbity fuck stab" anymore. The fact that no-one noticed really speaks to how freaking awesome relational databases (in our case, MySQL) are these days.

Perhaps underlying all of this is the simple fact that most of the time, ActiveRecord and Rails in general are pretty solid, and Ruby underneath is a fully sound language with which to build high-volume services. Kevin over at PowerSet has more on the topic - they've recently announced that they'll be doing their front-end development in Ruby (up until now, it's just been a glue language internally).