console.blog() | Hopping the Great Firewall of China

Due to family reasons, I’ve visited China quite a few times over the past few years:

winter of 2015 (~1 week)
summer of 2016 (a few weeks)
summer of 2017 (~2 months)
summer of 2018 (a few weeks)
summer of 2019 (~1 month)
winter of 2019 (a few weeks)

While in China, I spend most of my time in Beijing. As a result, I’ve been able to watch some of the evolution of the Great Firewall, China’s massive online censorship network. After spending some time in China wrestling with the Great Firewall (a man needs his Hacker News), I thought I’d turn my experiences into a blog post, just for the nerd cred.

Summer of 2016

When I visited China during the summer of 2016, pretty much any old VPN service would work. Honestly, I wasn’t very techy back then, so I just “sampled everything in the medicine cabinet” (read: downloaded a bunch of random VPN services off the iOS app store), which worked surprisingly well.

Summer of 2017

Later on, as I gained more familiarity with computers and the Internet in general, I decided to run my own proxy server on an AWS EC2 instance. I elected to use the Shadowsocks protocol, which is widely used to bypass Internet censorship. There are plenty of implementations available; shadowsocks-libev is a pretty lightweight and convenient one. (Actually, I’ve started work on a Haskell one, just for fun, but I haven’t gotten very far and am not sure if it is worth the effort to finish.)

This worked quite well: the connection wasn’t the fastest, but I was able to stream video at a low resolution. Every once in a while I’d notice that the connection was getting degraded, but I never faced a port or IP ban; I just had to wait a few minutes and then reconnect.

Summer of 2019

At this point, my family officially moved to China, so my parents subscribed to a paid VPN service. This has worked out pretty well (there are periods when it doesn’t work, but it is generally solid), but it’s more fun to spin up one’s own servers.

Winter of 2019

In the winter of 2019 (and the beginning of 2020), I noticed that Shadowsocks had also started to fail. Interestingly, this failure was selective: it seemed to mostly work for sites that aren’t blocked by the Great Firewall, but it failed for sites that are. (Although Gmail was a strange exception in that it worked.) I found this behavior quite strange, since it would seem to imply that the Chinese government can decrypt my traffic (I’m not very familiar with the Shadowsocks protocol, but I’d assume that host names are encrypted). I eventually figured out why this was so, and the answer is rather silly.

The first clue that I had was that usually, trying to connect to www.google.com would just result in a timeout, but every once in a while I’d actually get a bad cert. In fact, Firefox would report that www.google.com was serving the certificate for facebook.com, messenger.com, or another blocked service. This fits in with some other reports:

While the US agents all receive the same set of four IP addresses beginning with 151.101, the China agents each get a mapping to a single IP address, each of which belongs to a known blocked service like Facebook

This suggests the possibility that the Great Firewall is targeting the DNS queries, although directly typing the IP addresses (as reported by a dig www.google.com on a US-based server) didn’t seem to help. I also tried naiveproxy and v2ray, but the results were pretty similar to plain Shadowsocks. (For some strange reason, Duck Duck Go worked with CMU’s campus VPN.)
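One way to test the DNS suspicion is to compare what the local resolver returns against the answers from a resolver you trust (for example, the output of dig on a US-based server, pasted in by hand). The following is a minimal sketch under that assumption; the function names are hypothetical, and the addresses in the usage note are documentation-range placeholders, not real Google addresses.

```python
import socket


def resolve_locally(hostname: str) -> set:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def looks_poisoned(local_answers: set, trusted_answers: set) -> bool:
    """Flag the case where the two answer sets share no address at all.

    This is only a hint: CDNs legitimately hand out different addresses
    in different regions, so disjoint sets are not proof of tampering.
    """
    if not local_answers or not trusted_answers:
        return False  # nothing to compare
    return local_answers.isdisjoint(trusted_answers)
```

In use, something like looks_poisoned(resolve_locally("www.google.com"), trusted) would be run from inside China, with trusted filled in from the remote dig output. A mismatch combined with a certificate for the wrong site is a much stronger signal than the mismatch alone.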

Eventually, I found a fairly robust and minimalist workaround: just ssh into the CMU Linux timeshare and then browse freely with Lynx! I managed to get my EC2 instance’s IP address blocked twice after trying to do the same with Browsh (I still need to investigate this further); as noted by others, the Great Firewall is able to detect bulk traffic being moved through ssh.

After figuring that out, I decided to let the matter rest for a bit, since I honestly took a liking to the minimalist workflow of text-based browsing. Most “important” sites (like Gmail, Hacker News, Wikipedia, and Duck Duck Go) work perfectly fine in a text-based browser like Lynx, so I saw no great reason to figure out what had gone wrong with Shadowsocks. In fact, this method is extremely robust: while my personal EC2 instance might get blocked occasionally (this is easy to fix, since the AWS console is not blocked in China), there’s very little chance that the Great Firewall will decide to block ssh traffic to CMU’s servers.

Eventually, curiosity got the better of me, and I finally confirmed why Shadowsocks wasn’t originally working: the Great Firewall was indeed interfering with DNS queries. It turns out that Firefox by default resolves host names locally before proxying the request through the SOCKS server, so the DNS lookup never goes through the proxy at all. This behavior can be changed in about:config by setting network.proxy.socks_remote_dns to true. This explains the certificate errors: the Great Firewall didn’t even have to decrypt my traffic to figure out what host I was trying to connect to, because the lookup went out in the clear and the poisoned answer pointed my browser at some other blocked service’s server. I’m not sure why Gmail managed to slip through this check.
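The difference between local and remote DNS shows up directly in the SOCKS5 CONNECT request (RFC 1928): the client sends either an already-resolved IPv4 address (address type 1) or the raw host name (address type 3), and only in the second case does the proxy server do the lookup. Here is a minimal sketch of building that request; the function name is my own, and the IP address in the usage note is a documentation-range placeholder.

```python
import ipaddress
import struct

SOCKS_VERSION = 5
CMD_CONNECT = 1
ATYP_IPV4 = 1
ATYP_DOMAIN = 3


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build the bytes of a SOCKS5 CONNECT request for host:port."""
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0])  # VER, CMD, RSV
    try:
        # Host is already an IPv4 literal: DNS happened on the client.
        dest = bytes([ATYP_IPV4]) + ipaddress.IPv4Address(host).packed
    except ipaddress.AddressValueError:
        # Host is a name: send it as-is and let the proxy resolve it.
        name = host.encode("idna")
        if len(name) > 255:
            raise ValueError("host name too long for SOCKS5")
        dest = bytes([ATYP_DOMAIN, len(name)]) + name
    return header + dest + struct.pack("!H", port)  # port in network order
```

With remote DNS, the request for socks5_connect_request("www.google.com", 443) carries the name itself inside the encrypted tunnel. With local DNS, the browser has already asked a (possibly poisoned) resolver and sends something like socks5_connect_request("203.0.113.7", 443) instead, so the proxy faithfully connects to whatever address the Great Firewall handed back.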

It’s kind of sad that the solution was so simple, but I guess that’s a good thing. What’s sort of strange is that I’d never run into this problem before. Seeing that Firefox has apparently had this behavior for years, my guess is that the Great Firewall only added this capability recently. (That, or I was using a different name server back then.)

So I currently have a third working Great Firewall circumvention tactic: a naiveproxy server that’s running on my personal EC2 instance. Since the Great Firewall has become more sophisticated since my last encounter, I decided to up my own game slightly and shift away from raw Shadowsocks, which is apparently starting to be detected. Naiveproxy further obfuscates traffic by mimicking Chrome’s network stack, and it defeats active probing by hiding behind a legitimate dummy web service. (Basically, the Great Firewall will sometimes probe suspicious connections by making requests to a suspected proxy server; if the server fails to respond like a normal web service, the Great Firewall takes notice.) I’ve also started actually running my proxy on port 443 to blend in more with regular web traffic.
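The idea behind probe resistance can be illustrated with a toy routing decision: a request that carries the right secret gets proxied, and everything else (including a probe from the censor) gets an ordinary web page. This is only a sketch of the general idea, not naiveproxy’s actual implementation; the function name, the header choice, and the decoy page are all assumptions made for illustration.

```python
import hmac

# Hypothetical placeholder for whatever the dummy web service would serve.
DECOY_PAGE = "<html><body><h1>It works!</h1></body></html>"


def route_request(headers: dict, expected_token: str) -> str:
    """Return "proxy" for a client that knows the secret, else "decoy"."""
    if not expected_token:
        return "decoy"  # never treat an unset secret as a match
    presented = headers.get("Proxy-Authorization", "")
    # Constant-time comparison so timing does not leak how close a guess was.
    if hmac.compare_digest(presented.encode(), expected_token.encode()):
        return "proxy"
    return "decoy"
```

The important property is that a wrong or missing credential produces the same response a normal visitor would see, rather than an error that gives the proxy away.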

I do wonder what the Great Firewall will look like next time I get a chance to play with it (I’m heading back to Pittsburgh this week, and I’m not sure when I’ll be in Beijing again). I’ll probably leave my EC2 proxy running as a back-up for my family in Beijing, but how long will it take for the Chinese authorities to figure out how to block it? It’s kind of a fun arms race between censors and foreigners who want to browse Wikipedia. I’m not terribly concerned, since assuming modern cryptography is sound, there is probably no way to distinguish Lynx ssh traffic from regular ssh traffic. Thus, even if all else fails, the only way for the Chinese government to block my minimalist evasion strategy is to cut off ssh traffic entirely to CMU’s servers, which would likely cause too much collateral damage to actually be done.