Foreword
There will never be a perfect camouflage. All protocols share the risk of being identified.
Common middlebox attack methods
- Passive analysis (traffic characteristics, PoC vulnerabilities)
Often used to analyse plaintext traffic or TLS handshakes
- Active probing
Generally deployed against Shadowsocks, V2Ray, and TLS 1.3 (to fetch the server’s SSL certificate)
- Traffic replay attack
Characteristics of proxy traffic
- Long connections
Most HTTP traffic just loads and disconnects, hence long connections to an address can be an easy way to identify proxy traffic
- Bi-directional traffic
A common misconception behind modern proxies is that disguising them as web services hides them. However, 99% of HTTP traffic is uni-directional: a series of Request -> Response exchanges. Very few websites use WebSockets, hence carrying bi-directional traffic over a web service only deceives yourself.
- Large traffic
There is simply no explanation for using 50GB of traffic while participating in an online chatroom that uses WebSockets. In this case, censors have a valid reason to cut your traffic anyway, because it’s likely that you’re sending some kind of media that you aren’t supposed to be sending.
- High TCP connection counts
Proxies also open a large number of WebSocket connections to a server, whereas a normal web app using WebSockets opens a single connection. This is an obvious tell of proxy traffic.
- Point to point
You’re accessing a single encrypted site, 24/7, pushing hundreds of gigabytes of traffic through it. If that’s not suspicious enough, then it’s time to get glasses.
All proxy protocols share these characteristics, and there is no practical method of hiding these traits.
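As a rough illustration of how a passive observer could combine the traits listed above, here is a minimal Python sketch. The `Flow` record, its field names, and every threshold are hypothetical and chosen only for illustration; they are not taken from any real middlebox.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    """Hypothetical per-destination summary a passive observer might keep."""
    duration_s: float               # lifetime of the longest connection
    bytes_up: int                   # client -> server bytes
    bytes_down: int                 # server -> client bytes
    concurrent_conns: int           # simultaneous TCP connections to the destination
    share_of_client_traffic: float  # fraction of the client's total traffic sent here


# Illustrative assumptions only, not measured values.
LONG_CONN_S = 3600
MIN_MINOR_DIRECTION_SHARE = 0.2
LARGE_BYTES = 10 * 1024**3
MANY_CONNS = 8
POINT_TO_POINT_SHARE = 0.9


def suspicious_traits(flow: Flow) -> List[str]:
    """Return the names of the proxy-like traits that this flow exhibits."""
    traits = []
    total = flow.bytes_up + flow.bytes_down
    if flow.duration_s >= LONG_CONN_S:
        traits.append("long connection")
    # Request -> Response traffic is dominated by one direction;
    # a sizeable share in the smaller direction looks bi-directional.
    if total > 0 and min(flow.bytes_up, flow.bytes_down) / total >= MIN_MINOR_DIRECTION_SHARE:
        traits.append("bi-directional traffic")
    if total >= LARGE_BYTES:
        traits.append("large traffic")
    if flow.concurrent_conns >= MANY_CONNS:
        traits.append("high TCP connection count")
    if flow.share_of_client_traffic >= POINT_TO_POINT_SHARE:
        traits.append("point to point")
    return traits
```

A flow that trips several of these checks at once is a far stronger signal than any single trait, which is why tuning one trait away rarely helps.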
Common myths
- With TLS, everything is safer and cannot be identified.
See TLS connection risk assessment
- With IP whitelisting, all probing and replay attacks are useless
All your traffic passes through the firewall, so the firewall is able to forge traffic that appears to come from your IP address. Meanwhile, the very existence of a whitelist makes the traffic much more suspicious.
- I’ve never been blocked
That’s normal. You probably haven’t pushed enough data through the firewall to make it worth analysing.
- (For China) Domains with government registrations will not be blocked.
Both government-registered domains and normal domains are treated the same for international traffic. The only difference is that government-registered domains can use servers in China to serve websites.
TLS connection risk assessment
- ClientHello
TLS fingerprinting uses fields in the ClientHello message to identify a client. Different TLS stacks in different programs have different fingerprints. For example, if Golang with Go TLS is used to access a ‘site’ and some characteristics match those of a proxy, then there is a pretty high chance that it really is a proxy connection (as a wide variety of proxy programs are written in Golang, for example V2Ray).
In testing in Iran, both curl and wget were blocked by SNI when accessing a non-whitelisted domain, whereas Go TLS and common browsers worked fine.
Hence, it is recommended to use browser-like fingerprints for proxy connections.
Golang has a library called uTLS. However, the library hasn’t been maintained for a long time, and its Chrome fingerprint is stuck at Chrome 83. This may be a potential vulnerability. Coia’s fork of uTLS has the fingerprint for Chrome 104, the latest at the time of writing.
- SNI
Free and cheap domains are often at greater risk of being blocked.
- TLS version
TLS 1.3 is the riskiest, as everything after ServerHello is encrypted. Hence, the middlebox needs to actively probe the server to obtain its certificate.
TLS 1.2 is less risky, as the TLS handshake is still in plaintext. The middlebox can check the certificate in captured traffic.
SSLv3/TLS 1.0/TLS 1.1 are the least risky, as there isn’t much traffic over older versions of TLS anyway. However, using older versions of TLS isn’t recommended, as some of their encryption methods are vulnerable to attack.
- TLS Server certificate
Obviously, self-signed certificates are the most suspicious to a middlebox.
Next are Cloudflare universal certificates and Let’s Encrypt certificates, as these are all free.
The general rule is that if something is easy or cheap to get, it’s probably also more suspicious to the middlebox. After all, no one will realistically spend a few hundred on certificates for proxies.
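To make the ClientHello fingerprinting idea above concrete, here is a minimal Python sketch of a JA3-style fingerprint. JA3 is not named in the original text; it is used here only as one widely known recipe (an MD5 over the version, cipher suites, extensions, curves, and point formats, in the order the client sent them). The function name and the sample values in the usage note are assumptions for illustration.

```python
import hashlib
from typing import Sequence

# GREASE values (RFC 8701) are placeholders that some clients insert at random;
# JA3 drops them so that the same client always hashes to the same value.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_style_fingerprint(
    version: int,
    ciphers: Sequence[int],
    extensions: Sequence[int],
    curves: Sequence[int],
    point_formats: Sequence[int],
) -> str:
    """Hash the ClientHello fields in the order the client sent them."""

    def join(values: Sequence[int]) -> str:
        return "-".join(str(v) for v in values if v not in GREASE)

    raw = ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )
    return hashlib.md5(raw.encode("ascii")).hexdigest()
```

For example, `ja3_style_fingerprint(771, [4865, 4866], [0, 10, 11], [29, 23], [0])` and the same call with the ciphers reordered give different digests. Because field order matters, two TLS stacks that support the same cipher suites can still be told apart, which is why copying a browser’s exact ClientHello layout matters more than copying its feature set.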
Identifying FakeTLS traffic
Analysis
MTProto FakeTLS and Shadow TLS (v1/v2) both get around SNI whitelists by simulating the TLS handshake of a trusted certificate. Both authenticate their clients during the handshake.
- MTProto FakeTLS authentication
The client computes an HMAC over the ClientHello packet, excluding the random field. The key is literally the ‘secret’ field. The server uses the same HMAC to authenticate clients, and if the check on random fails, it falls back to the real web server.
Identification: MTProto’s TLS handshake isn’t actually a regular one. The firewall can capture the 3rd packet (hostCert) and analyze its length (see mtg’s faketls code): the hostCert length is random, ranging from 1024 to 4096 bytes.
- Shadow TLS v1 doesn’t authenticate clients
Identification: use curl to send a request to the server. curl will fail.
- Shadow TLS v2 authentication
After sending its request and receiving the reply, the client uses the data returned by the server as the challenge and HMACs it with the password to produce the response. Likewise, the server uses the same method to authenticate the client.
As the data the server returns contains randomness, each challenge can only be used once, and because a valid challenge response cannot be produced without the password, it is safer than MTProto’s handshake.
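The two HMAC-based checks described in the list above can be sketched in Python as follows. This is a minimal sketch, not either project’s actual code: the hash functions (SHA-256 and SHA-1), zero-filling the random field to ‘exclude’ it, the fixed offset of that field, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac

# Assumed layout: 5-byte record header + 4-byte handshake header
# + 2-byte client version, followed by the 32-byte random field.
RANDOM_OFFSET = 11
RANDOM_LEN = 32


def faketls_digest(secret: bytes, client_hello: bytes) -> bytes:
    """HMAC the ClientHello with its random field zeroed (assumed SHA-256)."""
    blanked = bytearray(client_hello)
    blanked[RANDOM_OFFSET:RANDOM_OFFSET + RANDOM_LEN] = bytes(RANDOM_LEN)
    return hmac.new(secret, bytes(blanked), hashlib.sha256).digest()


def shadowtls_v2_response(password: bytes, server_data: bytes) -> bytes:
    """8-byte challenge response over the data the server returned (assumed SHA-1)."""
    return hmac.new(password, server_data, hashlib.sha1).digest()[:8]


def verify_response(password: bytes, server_data: bytes, response: bytes) -> bool:
    """Server side: recompute the response and compare in constant time."""
    expected = shadowtls_v2_response(password, server_data)
    return hmac.compare_digest(expected, response)
```

Because `server_data` differs on every handshake, a response captured from one connection does not verify on another, which is the replay resistance the text attributes to Shadow TLS v2.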
New method: probing clients
Many protocols only authenticate that a client has the authority to access something, while the client doesn’t verify that the server is actually legit.
Hence, a middlebox can spoof server packets to test how the client reacts (hence ‘client probing’) and see whether the client is a legit browser or some sneaky program attempting to hide its true identity.
MTProto uses the same HMAC algorithm in ServerHello, hence this won’t work against MTProto.
PoC: identifying Shadow TLS v2
Protocol: Github Docs
A firewall can hijack TCP connections at random. After receiving the TLS ClientHello, it can redirect the traffic to the real server named by the SNI in the packet. After the TLS handshake, the Shadow TLS client will inject the challenge response into the first 8 bytes of Application Data.
However, the server the client is communicating with isn’t a Shadow TLS server, but a normal TLS server. The handshake will complete successfully, but once the HMAC is sent, the data isn’t encrypted with what was negotiated in the handshake, so the server will throw an alert (Encrypted Application Data) and close the TCP connection using FIN or RST. Hence, the client can be easily identified as a Shadow TLS client.
A proxy program will often form multiple connections. Intercepting any one of these connections can result in the client being identified.
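The decision the firewall makes in this PoC can be sketched as a small classifier over what it observes on a hijacked connection. This is a minimal Python sketch under stated assumptions: the `Event` record and its labels are hypothetical, and it assumes the observer can already tell an alert from ordinary application data (for instance by record type or size), which the text does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical record of one observed TLS record or TCP close."""
    sender: str  # "client" or "server"
    kind: str    # "handshake", "app_data", "alert", "fin" or "rst"


def classify_hijacked_connection(events: List[Event]) -> str:
    """Label a connection that was redirected to the real server for its SNI."""
    seen_client_data = False
    for ev in events:
        if ev.sender == "client" and ev.kind == "app_data":
            # First record after the handshake: a Shadow TLS client
            # would put its 8-byte challenge response here.
            seen_client_data = True
        elif seen_client_data and ev.sender == "server":
            if ev.kind in ("alert", "fin", "rst"):
                # The real server could not decrypt the record and gave up.
                return "suspected Shadow TLS client"
            if ev.kind == "app_data":
                # The real server understood the request and answered it.
                return "ordinary TLS client"
    return "inconclusive"
```

A browser’s first record decrypts correctly on the real server, so it gets a normal reply; only a client that writes bytes outside the negotiated encryption triggers the alert and close.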
Recommended proxy protocols
- Hysteria, TUIC (QUIC)
QUIC currently isn’t blocked or restricted by the GFW.
Credits
Author: CoiaPrant
Translation: dayCat