Making An Engine 3: Networks Suck (Part 1)

Let's say you are making a multiplayer game, and because you live in the 21st century, this game needs to be played over the Internet. That means you need to write network code. This is a fairly lengthy topic, so part 1 is going to be a general overview, and part 2 is going to be how we solved these problems in Zero Sum Future.

tl;dr: This is hard. If you have a lot of money, it's considerably less hard, like most things in life.

Fundamentally, the problem is this: Your game needs to run on multiple computers, almost certainly with different hardware, and definitely out-of-sync with each other. Oh, and because the game instances are supposed to be the same instance, they need to agree on some fundamental game logic. Because people like playing computer games (shocker), the industry has converged on some set-in-stone solutions to some of these problems.

Your game MUST be multi-threaded to deal with the syncing issue. Technically, you can run a multiplayer game on a single thread; this is called a lockstep solution. The problem with lockstep implementations is that they tend to hang really badly on choppy connections. If the game logic requires the next packet from the host to proceed, and the packet doesn't arrive on schedule, it's very noticeable to the player. Which leads to:
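To make the contrast with lockstep concrete, here is a minimal Python sketch of a game loop that never waits on the network. All the names (network_thread, game_tick, inbox) are made up for illustration, and a plain list stands in for a real socket, so treat it as a toy rather than as how any real engine does it.

```python
import queue
import threading


def network_thread(incoming, inbox):
    """Stand-in for a socket reader: pushes packets into a thread-safe inbox."""
    for packet in incoming:
        inbox.put(packet)


def game_tick(state, inbox):
    """Advance one frame. Apply whatever has arrived; never wait for more."""
    while True:
        try:
            packet = inbox.get_nowait()  # non-blocking: raises queue.Empty if nothing is there
        except queue.Empty:
            break
        state["last_packet"] = packet
    state["frame"] += 1  # the frame advances whether or not a packet showed up
    return state


if __name__ == "__main__":
    inbox = queue.Queue()
    state = {"frame": 0, "last_packet": None}
    game_tick(state, inbox)  # nothing has arrived yet; the game still moves on
    reader = threading.Thread(target=network_thread, args=(["a", "b"], inbox))
    reader.start()
    reader.join()
    game_tick(state, inbox)
    print(state)
```

A lockstep loop would call a blocking receive in place of get_nowait, and that blocking call is exactly where the hang comes from.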

Your game MUST be OK with losing some packets. A packet is a piece of data that you need to run the game logic: Player information, state of the game, speed of the bullets racing towards your character, and so on. Game logic cannot depend on the ordered arrival of packets: Over an internet connection, a packet has to, at the very least, travel from point A to point B through a mystery number of routers, relays, switches, servers, and god knows what else (seriously, I don't think it's physically possible to track what kind of path a packet takes through the nightmare web of the internet). If a router on the way decides to be a paperweight instead of a router while handling one of your packets, it's bye bye packet. This is called a dropped packet, and it happens a lot in games. If you're playing an MMO, for instance, and your character jumps back in time to where he was 3 seconds ago, you are experiencing something called rubber-banding, which is caused by packet loss.
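One common way to be OK with lost and reordered packets is to stamp every state update with a sequence number and only ever keep the newest one. The sketch below assumes that approach; the Snapshot and LatestStateReceiver names are hypothetical and not taken from any particular engine.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    seq: int         # sequence number assigned by the sender
    position: tuple  # whatever state the packet carries


class LatestStateReceiver:
    """Keeps only the newest snapshot; gaps and late arrivals are tolerated."""

    def __init__(self):
        self.latest = None
        self.dropped_stale = 0

    def receive(self, snap):
        # A packet older than what we already have is useless, so discard it.
        if self.latest is not None and snap.seq <= self.latest.seq:
            self.dropped_stale += 1
            return False
        self.latest = snap
        return True


if __name__ == "__main__":
    receiver = LatestStateReceiver()
    # Packet 2 arrives late, packet 4 never arrives at all.
    for seq in (1, 3, 2, 5):
        receiver.receive(Snapshot(seq, (seq * 10, 0)))
    print(receiver.latest, receiver.dropped_stale)
```

The receiver never asks for packet 4 again. It simply carries on with packet 5, which is the whole point.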

Now, there are communication protocols that guarantee drop-less, ordered packet delivery. The problem is that all these protocols run a lossy, packet dropping protocol underneath to emulate the behavior expected: because of how the internet is put together (badly), you can't have a true lossless, ordered delivery system. So the network protocol will send confirmation packets, receive acknowledgment packets, and order the packets under the hood. There are issues with using protocols like this, but I'll get to that in a bit.
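As a rough illustration of what "under the hood" means, here is a toy Python simulation of acknowledgments, retransmission, and reordering on top of a channel that drops packets. Everything in it (OrderedReceiver, send_all, the lose set that decides which transmissions vanish) is invented for this sketch; real protocols are far more involved.

```python
class OrderedReceiver:
    """Buffers out-of-order packets and hands them over strictly in order."""

    def __init__(self):
        self.next_seq = 0    # the next sequence number we are allowed to deliver
        self.pending = {}    # packets that arrived early, keyed by sequence number
        self.delivered = []  # what the application actually sees

    def receive(self, seq, payload):
        if seq >= self.next_seq:
            self.pending[seq] = payload
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return seq  # the acknowledgment sent back to the sender


def send_all(messages, receiver, lose):
    """Resend every unacknowledged message until all are acknowledged.

    lose is a set of (seq, attempt) pairs that the fake network drops.
    Returns the total number of transmissions it took.
    """
    unacked = dict(enumerate(messages))
    attempt = 0
    transmissions = 0
    while unacked:
        for seq in sorted(unacked):
            transmissions += 1
            if (seq, attempt) in lose:
                continue  # dropped on the way: no ack, so it stays in unacked
            ack = receiver.receive(seq, unacked[seq])
            del unacked[ack]
        attempt += 1
    return transmissions


if __name__ == "__main__":
    receiver = OrderedReceiver()
    sent = send_all(["a", "b", "c"], receiver, lose={(1, 0)})
    print(receiver.delivered, sent)
```

In this run "c" arrives on the first try but sits in pending until "b" has been resent, so three messages cost four transmissions and an extra round of waiting.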

Your game MUST also be OK with running on any hardware, and hardware type cannot influence game logic. Most games run visual physics on graphics hardware, because GPUs are well built for that (which you can read more about HERE). Implementing physical simulations on varying GPU hardware tends to be... unpredictable, to say the least, so in similar setups on differing (or even the same) hardware, you'll end up with diverging simulations. For this reason, I can confidently say that there are no online games with a real emphasis on physics: Every example I can think of uses an approximation of some flavor that is easily computed on the CPU.

I can talk about GPU physics simulations forever, but there's a clever way I can demonstrate this point: In Blizzard's Overwatch, the game has a pretty decent physics system built in, but it has no impact on gameplay. This is because the physics are tracked client side only, because different computers run the simulations differently. Get together with a friend, and run Overwatch side-by-side, in the same game, and try to screw around with physics.
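One well-known reason identical inputs can produce different results (not necessarily the only one, and not one this post names) is that floating-point addition is not associative: add the same numbers in a different order and the last bits can differ. The Python sketch below shows the effect on a CPU. The mapping to "machine A" and "machine B" is an assumption made purely for illustration.

```python
def total_force(contributions):
    """Add up contributions strictly left to right, i.e. in one fixed order."""
    total = 0.0
    for value in contributions:
        total += value
    return total


forces = [0.1, 0.2, 0.3]
machine_a = total_force(forces)                  # one summation order
machine_b = total_force(list(reversed(forces)))  # same numbers, different order

if __name__ == "__main__":
    print(machine_a == machine_b)  # False
    print(abs(machine_a - machine_b))
```

The difference is tiny, but a physics simulation feeds each frame's result into the next, so small disagreements get a chance to grow.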

Finally, your game MUST have some coordinator server to set up games. Let's say that player A and B want to play a quick game of chess online. Chess is really simple to write logic for, and the network code is very simple. So you can whip up an implementation fairly easily. But how do these players KNOW who their opponent is? A coordinator server of some flavor is needed.
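Stripped to the bone, a coordinator is a place where players announce themselves and get told where their opponent lives. The Python sketch below assumes the simplest possible rule (pair players in arrival order). The Coordinator class and the addresses are hypothetical, and a real one would sit behind a network endpoint instead of a method call.

```python
class Coordinator:
    """Toy matchmaking service: pairs players in the order they show up."""

    def __init__(self):
        self.waiting = None  # (player_id, address) of whoever is waiting for an opponent

    def join(self, player_id, address):
        """Register a player.

        Returns None if the player has to wait, or a dict that tells each
        player the other's address once a pair exists.
        """
        if self.waiting is None:
            self.waiting = (player_id, address)
            return None
        other_id, other_address = self.waiting
        self.waiting = None
        return {player_id: other_address, other_id: address}


if __name__ == "__main__":
    coordinator = Coordinator()
    print(coordinator.join("A", ("203.0.113.5", 4000)))   # None: A waits
    print(coordinator.join("B", ("198.51.100.7", 4000)))  # both now know where to send moves
```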

So you are locked into all of these things. What can you choose in your network design? Well, the first major choice is the obvious: Peer-to-peer, or dedicated?

Dedicated Servers

Man, dedicated servers are dope. They are basically a computer that stands behind a commercial-grade internet hookup with the sole purpose of running game instances. There is no end to the pluses of using a dedicated server infrastructure for your game:

  1. Dedicated servers are the best choice for quality of service for clients. Because they are dedicated machines with dedicated commercial connections, they are the LEAST prone to packet loss. Less packet loss means a better experience for your clients.
     
  2. Dedicated servers are resistant to malicious usage by design: because they run the game, they have the final say in how the game logic resolves (in the industry, this is called authoritativeness). A player says that they have 7 billion more moneys on their client? It doesn't matter, because the server says they are broke, and server beats client. Building anti-cheat measures into servers is trivial compared to other solutions.
     
  3. Dedicated servers are very nice for analytics. Want to know what your players are doing to break your game? Run a logging module on your server package, crunch numbers. Easy. Running a competitive game on dedicated servers means you can have information about how games resolve and tweak balance accordingly. This is a WONDERFUL crutch as games get more and more complicated.
     
  4. Dedicated servers can pretend to be coordinators. If your product has a dedicated server infrastructure, all a client has to do to get game logic packets is to query a static IP address, and then they are in business. Also, dedicated servers are GREAT for version enforcement: if a client tries to connect with an out-of-date version of the game, you just direct them to update their stuff. Easy. You could even do it in real time (Blizzard does this) if your network engineers have their heads screwed on straight.
     
  5. Dedicated servers can host a LOT: because they tend to be behind commercial connections with large upload speeds, they can hold instances with hundreds of connections per instance. That's how MMO towns and cities get populated.
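Points 2 and 4 in the list above boil down to "the server keeps the only copy of the truth and checks everything at the door". Here is a minimal Python sketch of that idea. The AuthoritativeServer class, the REQUIRED_VERSION string, and the reply strings are all assumptions made up for the example.

```python
REQUIRED_VERSION = "1.4.2"  # hypothetical version string, not from any real game


class AuthoritativeServer:
    def __init__(self):
        self.money = {}  # player -> balance; the only copy that counts

    def connect(self, player, client_version):
        # Out-of-date clients get turned away before they touch game logic.
        if client_version != REQUIRED_VERSION:
            return "please update"
        self.money.setdefault(player, 0)
        return "welcome"

    def buy(self, player, cost, claimed_balance):
        """The client can claim any balance it likes; only the server's number is used."""
        if player not in self.money:
            return "not connected"
        if cost > self.money[player]:
            return "rejected"
        self.money[player] -= cost
        return "ok"


if __name__ == "__main__":
    server = AuthoritativeServer()
    print(server.connect("old_client", "1.0.0"))
    print(server.connect("cheater", "1.4.2"))
    # The client insists it has 7 billion moneys; the server says it is broke.
    print(server.buy("cheater", 100, claimed_balance=7_000_000_000))
```

The claimed_balance argument is deliberately ignored, which is authoritativeness in one line.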

Wow, great. So everyone should be using dedicated servers, right? Yes. Yes they should. So why don't they?

Money, son. Having a dedicated server network (in multiple countries, in all likelihood) and paying for the upkeep of these commercial connections with static IPs costs MONEY. It's a continuous expense: the longer you have a server infrastructure, the more you pay the people that operate it. Oh, the box died? Server equipment is professional-grade stuff, and it can get eye-wateringly expensive. Outsourced your server maintenance to a third party? Yeah, their engineering department doesn't really exist, so your server doesn't function on the third Tuesday of every month. Oh, and they decided that they want more money, so they can in effect strong-arm you into paying them more and more to keep your game up.

But if you're EA, or Valve, or Riot, or Blizzard, or any other major name in the business with your server infrastructure, you have a large network of servers, and you can build and deploy multiplayer games all day long, and they will run like butter if your network engineers are doing their jobs. You're basically able to throw money at a problem to solve it in a way that takes a big load off of your development team. It must be great to be able to money away a technical problem.

We're not EA. Or Valve, or even Blizzard. We're 5 schmucks. We can't afford a network, we do not have funds to rent one, even for testing. We (well, I) have to do this the hard way for Zero Sum Future.

Peer-To-Peer

A peer-to-peer game runs exactly like a dedicated server setup: One machine is the authoritative host, and this machine sends game logic packets to all other machines. In a peer-to-peer system, however, the host is one of the players. In essence, a client doubles up as a host.

This is awful. There is no end to the negatives of writing code that uses a peer-to-peer implementation.

  1. I'm gonna lead with the worst of the lot: NAT hole punching. See, most consumer computers are behind network routers of some flavor. When you hook up several computers to a network router, the router will assign an internal IP address to these computers, something in the flavor of 192.168.0.xxx. This makes it relatively easy to run a peer-to-peer connection in a local network: just ask "hey, who are the computers here, and what are their addresses" to the router, and the router acts like the coordinator server. So far, no problem.

    However, all of the clients hooked up to this router have the same IP address as far as an external computer is concerned. To a machine outside this network, all these machines have the address of the router. So if an external machine wants to send a packet to machine A on the network, it needs to be able to tell the router "hey, this packet is meant for machine A, pls deliver". Most routers at this point will assume (correctly, might I add) that the external machine is probably sending a malicious packet, and drop it.
     
  2. Consumer connections are awful. A host player will have to send game logic packets to the clients, right? Except that consumer connections have very sketchy upload speeds. That's one of the many, many ways ISPs cut corners and make oodles and oodles of money: upload bandwidth is reserved for premium, largely endowed corporations with dedicated servers. So when a consumer tries to supply 2-3 other clients with game logic packets, they run out of bandwidth rapidly.
    Compounding the above factor, if the host's internet connection tanks because someone on their network is hogging the up bandwidth (seeding torrents, for example, will destroy connections because it's a lot of up traffic), the game gets worse for everyone – a problem avoided in dedicated server implementations.
     
  3. Host migration implementation has negative effects on the health of network programmers. If a host decides that it wants to binge Orange is the New Black on Netflix instead of actually hosting the game, what happens? The game can't end just because the host decided to call it quits. Maybe the host got rushed early on, and ragequit in disgust. The rest of the clients need to:
              A) Decide among themselves who the new host is
              B) Reconstruct the game state among themselves
              C) Resume gameplay
    This is a lot of logic that needs to run on non-synced machines. Which makes this hazardous to my health.
     
  4. Peer-to-peer games are extra vulnerable to hacking. In a dedicated server implementation, the host is protected because no player has direct access to game logic. Peer-to-peer systems are vulnerable precisely for this reason: The host player has direct access to game logic. If they can modify system memory, they can hack the game. There is a lot of trickery that a programmer can do to protect the game logic, but at the end of the day, you're at the mercy of the smartest guy who wants to break your game.
     
  5. You still need a coordinator server, by the way. Ideally, this coordinator server would handle some of the NAT hole punching (something I'll talk about at length in part 2). Oh, and because we'd like to eventually balance our game, we'd also like our coordinator server to hold some sort of game logic dump we can analyze for balance reasons. Oh, and if we have some sort of competitive ladder (which Zero Sum Future does), this coordinator server needs to handle that too. So you STILL have to have a bit of the dedicated server functionality. Sadness.
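The three host migration steps from point 3 (pick a new host, rebuild the state, resume) can be sketched in Python as below. The rules used here are assumptions chosen for simplicity: the lowest peer ID becomes host, and the newest snapshot held by any survivor wins. Real implementations also have to cope with peers that disagree about who is still connected, which this toy ignores.

```python
def elect_new_host(peers, departed_host):
    """Step A: every survivor runs the same rule on the same peer list,
    so they all pick the same host without further negotiation."""
    survivors = sorted(p for p in peers if p != departed_host)
    if not survivors:
        raise ValueError("nobody left to host")
    return survivors[0]


def reconstruct_state(snapshots):
    """Step B: take the most recent snapshot any survivor holds.

    snapshots maps peer ID -> (tick, state).
    """
    tick, state = max(snapshots.values(), key=lambda snap: snap[0])
    return tick, state


def migrate(peers, departed_host, snapshots):
    """Steps A to C together: returns (new_host, tick_to_resume_from, state)."""
    new_host = elect_new_host(peers, departed_host)
    surviving = {p: s for p, s in snapshots.items() if p != departed_host}
    tick, state = reconstruct_state(surviving)
    return new_host, tick, state


if __name__ == "__main__":
    peers = ["carol", "alice", "bob"]
    snapshots = {
        "alice": (118, {"score": 3}),
        "bob": (120, {"score": 4}),
        "carol": (119, {"score": 3}),
    }
    # alice was hosting and just ragequit.
    print(migrate(peers, "alice", snapshots))
```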

So, peer-to-peer is awful. It's ugly, it's messy, and it's a nightmare to get working properly. But it is free. It doesn't matter if there are 600 players worldwide or 6 million: a peer-to-peer system will still function. There only has to be one coordinator server, and it doesn't have to be nearly as beefy as a dedicated server. By coding and designing intelligently, you as a game developer save oodles and oodles of money.

So it's hard, but we have to do it for Zero Sum Future. In part 2, I'll dive into how we're solving a lot of these problems. Until then, stay tuned.