Hacking Twitter's Javascript

– 27 March 2011
– 1158 words

New Twitter uses a massive amount of javascript (for a webapp). Nearly 1 Megabyte of minified, obfuscated javascript powers the site's frontend.

For some unknown reason (maybe I couldn't sleep), I decided to read all of it.

https://si0.twimg.com/a/1301071706/javascripts/phoenix.bundle.js
https://si2.twimg.com/a/1301071706/javascripts/api.bundle.js
https://si0.twimg.com/sticky/base.10.bundle.js

...and some additional js on the twitter.com pages themselves.

There are parts of New Twitter I like, and there are some things I don't like. My two biggest gripes are that you can't refresh DMs manually (I found out through reading the js that this technically isn't true, but there is a 'bug' that prevents it from working in practice), and that DMs auto-update themselves horrendously slowly.

Enter userscripts. Userscripts are scripts that you can install in your browser to be run on certain sites, injecting custom javascript to interact with the page. So, I set out to fix my pet peeves.
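For readers who have not written one, here is a minimal sketch of what a userscript looks like. The comment header is the standard Greasemonkey-style metadata block; everything else (the script name, the namespace, the isMessagesPage and main helpers, and the assumption that the Messages view lives under a "#!/messages" URL fragment) is made up for illustration and is not taken from the actual script.

```javascript
// ==UserScript==
// @name        New Twitter DM tweaks (hypothetical name)
// @namespace   example.invalid
// @include     http://twitter.com/*
// @include     https://twitter.com/*
// ==/UserScript==

// Assumption for illustration: the Messages view is identified by a
// "#!/messages" URL fragment. Adjust to whatever the page really uses.
function isMessagesPage(hash) {
  return /^#!\/messages\b/.test(hash);
}

// Entry point. Takes a window-like object so it can be exercised
// outside a browser as well.
function main(win) {
  if (!isMessagesPage(win.location.hash)) {
    return false; // not the page we care about, so do nothing
  }
  // Page-specific tweaks (extra button, custom polling) would go here.
  return true;
}

// Only run automatically when a browser window is actually present.
if (typeof window !== "undefined") {
  main(window);
}
```

The browser extension reads the metadata block to decide which pages get the script; the rest is ordinary javascript that runs once the page loads.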

The code is over here on GitHub.

First, a shiny new "Refresh" button is added to the DM page so that you can manually refresh DMs at will.

Second, it fixes the polling logic on the DM page so that new DMs show up sooner and more intelligently than they would by default. Normally, the DM page checks for new DMs every 90 seconds. In internet terms, 90 seconds is equivalent to 3 times the lifespan of the known universe. With tools like Notifo to let you know instantly when you receive a DM, going to the Messages page only to wait forever for it to update so you can reply is a frustrating experience. Heck, I get the DM in email form from Twitter faster than I do on the site!

"But, Chad!" you say. "You can manually refresh the pages on New Twitter by pressing the period key! It's an awesome secret shortcut for Twitter power users!"

No, you can't. If you try, you get this nice error:

Apparently they didn't set the proper fetch function for the DM page.

Anyway, the userscript will start off by refreshing every 16 seconds. Why 16? I'm glad you asked; I'll explain below. Then it slowly backs off if there are no new messages, eventually checking every 90 seconds like normal until you get a message. Then it will start back at 16 seconds. That way, when you are having a conversation over DM, it will feel much more natural.
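The back-off behaviour described above can be sketched roughly like this. This is not the code from the userscript; only the 16 second and 90 second figures come from the post. The function names, the 1.5x growth factor, and the callback-style fetchNewMessages placeholder are all assumptions made for the example.

```javascript
// Hypothetical names throughout; only the 16 s and 90 s figures are from the post.
const MIN_INTERVAL_MS = 16 * 1000; // fastest polling interval
const MAX_INTERVAL_MS = 90 * 1000; // the default DM polling interval
const BACKOFF_FACTOR = 1.5;        // assumed; the post only says "slowly backs off"

// Decide how long to wait before the next poll.
function nextInterval(currentMs, gotNewMessages) {
  if (gotNewMessages) {
    return MIN_INTERVAL_MS; // conversation is active: go back to fast polling
  }
  // Nothing new: wait longer, but never longer than the default.
  return Math.min(Math.round(currentMs * BACKOFF_FACTOR), MAX_INTERVAL_MS);
}

// Drive the polling loop. `fetchNewMessages` is a placeholder that takes a
// callback and calls it with the number of new DMs found. `schedule` defaults
// to setTimeout and is injectable so the loop can be exercised without waiting.
function startPolling(fetchNewMessages, schedule = setTimeout) {
  let interval = MIN_INTERVAL_MS;
  function tick() {
    fetchNewMessages(function (newCount) {
      interval = nextInterval(interval, newCount > 0);
      schedule(tick, interval);
    });
  }
  schedule(tick, interval);
}
```

With these assumed numbers, a quiet inbox is polled after 16, 24, 36, 54, 81 and then every 90 seconds, and a single new message resets the wait to 16 seconds.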

So, why 16 seconds? Originally I had set it to 5 seconds, but I found that for some reason, even if I had new DMs waiting to be shown, no new ones would show up until the poller backed off to at least 16 seconds. It drove me crazy! If I manually checked with curl, the new ones were sitting there, but on the site, the ajax requests returned nothing new.

Eventually I concluded that this had to do with Twitter's Keep-alive setting. To 3rd parties making requests, Twitter sends a "Connection: close" header, which terminates the connection after the data is transmitted. But if you are making a request from within twitter.com itself, it sends a "Connection: keep-alive" header with a timeout of 15 seconds. I'm not surprised at all that they don't allow keep-alives for 3rd parties, but I was pretty shocked to see that they enable it within their site.

Anyway, I can only guess that requests are cached or something during this keep-alive phase, so making subsequent requests on the same connection will return the same data. So, when I tried to poll for new DMs in this time period, I got zilch. When I finally waited 16 seconds, the new data flowed forth. By the way, the same thing happens for any other requests on the site, not just DMs. This is also why when I try refreshing Mentions (with the period key) after getting a notification, nothing happens. For a long time I figured the code was just broken, but it is actually subject to this weird keep-alive 'bug' as well.

(Update: John Adams (@netik), one of the head ops guys at Twitter, explains in the comments that Twitter only enables keep-alives on SSL connections, which makes a bit more sense. That way you don't incur the SSL connection overhead for each asset request, which would indeed increase transmission performance. I'm still curious about the strange data-caching situation, and if the keep-alive is to blame or not.)

The next thing I want to add to this userscript is a way to override the default tweet timestamp update mechanism.

Try this: go to your timeline on twitter.com - look at the timestamps of the tweets (usually they say XX seconds ago). It helps if there are several that are less than a minute old. Now watch how the timestamps update themselves. It seems like they update individually and pretty randomly. When New Twitter was new, it led me to ask if every tweet object had its own javascript timer, but @dsa assured me that it was not the case.

Well, while I was writing this userscript, and a userscript to inspect javascript timers, I found the culprit.

function F() {
    var I = +new Date;
    var H = twttr.app.currentPage()
        .$find("._timestamp[data-time$=" + E + "000]")
        .each(function () {
            var K = $(this),
                J = parseInt(K.attr("data-time"), 10);
            if (I - J > 86400000) {
                K.removeClass("_timestamp").addClass("_old-timestamp")
            } else {
                var L = twttr.helpers.timeAgo(
                    J,
                    JSON.parse(K.attr("data-long-form") || "false"),
                    JSON.parse(K.attr("data-include-sec") || "false"));
                if (L !== K.text()) {
                    K.text(L)
                }
            }
        });
    var G = +new Date - I;
    E++;
    E %= 10;
    setTimeout(F, 2000);
}

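Basically, there is a timer that fires every 2 seconds to update timestamps, but each time it fires, it only updates the tweets whose data-time value ends in a certain digit followed by "000" (in other words, whose creation time in whole seconds ends in that digit), starting at 0 and going up to 9, then repeating. Tweets more than a day old (86400000 ms) are dropped from the rotation entirely. So every 20 seconds all the timestamps update themselves.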
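To make the rotation concrete, here is a small stand-alone sketch that mimics the selector logic without jQuery or the DOM. The function names bucketOf and simulateCycle are invented for this example, and it assumes data-time holds a millisecond timestamp that always ends in "000", which is what the selector in F implies.

```javascript
// The timer in F fires every 2 s and steps a digit through 0..9,
// so a full rotation takes 10 passes.
const PASS_INTERVAL_MS = 2000;
const PASSES_PER_CYCLE = 10;
const FULL_CYCLE_MS = PASS_INTERVAL_MS * PASSES_PER_CYCLE;

// Which pass (0-9) refreshes a given tweet: the last digit of its
// creation time in whole seconds.
function bucketOf(dataTimeMs) {
  return Math.floor(dataTimeMs / 1000) % 10;
}

// Simulate one full rotation. For each pass, collect the data-time values
// that a suffix match on digit + "000" (like the [data-time$=...] selector)
// would pick up.
function simulateCycle(dataTimes) {
  const passes = [];
  for (let digit = 0; digit < PASSES_PER_CYCLE; digit++) {
    const suffix = digit + "000";
    passes.push(dataTimes.filter((t) => String(t).endsWith(suffix)));
  }
  return passes;
}
```

Calling simulateCycle with a handful of timestamps shows that each tweet lands in exactly one of the ten passes, which is why neighbouring tweets appear to update at unrelated moments.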

This is the weirdest thing I have ever seen.

I would love to know why it was done this way, and why it is so important that they dedicated a timer that fires every 2 seconds to do it. Maybe it's to make the site look more real-time in that different parts of the page are changing periodically? It is a strange illusion if so. I would rather it update all the timestamps every 30 or 60 seconds. It would be fairly easy to intercept this timer and rewrite it, but I will leave that for the next update of the script.

(Update: I found out from Russell d'Sa (@dsa) and Ben Cherry (@bcherry), two of Twitter's javascript engineers, that the timestamp update code works this way as a performance optimization, so that as tweets accumulate on the page, updating each batch of timestamps does not take up too much time. Thanks for the info, guys!)

Additions are welcome! Just fork and send a pull request. I hope this makes New Twitter better for more people than just me.

More commentary over on Hacker News.

— Fin.


Chad Etzel exposits here
