Making An Engine 4: Networks Suck (Part 2)
If you haven't read the first part, I highly recommend it.
Today comes the long-awaited sequel to the networking series, where I talk about how we're solving most of the awful problems that accompany peer-to-peer games for Zero Sum Future. I'm also going to talk about some as-yet unsolved problems that only have hypothetical solutions so far – that way, I can offer some insight into the thinking process behind the curtain.
Without further ado, let's begin.
Problem 1: NAT hole punching and coordinator server
This is the one I spent the greatest amount of time on. It's also the one with the laziest possible solution.
Quick recap: NAT hole punching is a procedure by which you establish the first connection to a computer behind a router. Until this first connection is established by some means (i.e. the hole is punched), the router will drop all packets destined for the client. The standard process for NAT hole punching uses a coordinator server to ferry the initial messages between the two clients, so solving one of these problems progresses you on the other.
I'm not going to go into the entire traversal protocol, mostly because I don't understand it. See, while I found it very easy (trivial, even) to write basic networking code with Boost's ASIO library, I never could get the hole punching part working, even in highly controlled environments.
I also never could set up a server that would work for commercial purposes – I don't have the money, or the time, to set up a coordinator server in a rack offsite somewhere. For testing, a laptop running 5 feet away from my main box worked well enough, but it was patently obvious that such a setup would not be acceptable for a commercial product.
We ended up outsourcing both of these problems.
If you are intending to release your game on Steam (which Zero Sum Future will be, stay tuned!) then you can use Valve's network infrastructure to handle both of these issues, and it's a godsend. Your application simply needs Steam running and logged in with a valid account to make use of the ginormous Valve network. As far as I'm aware, you can't use Steam servers as dedicated servers without a special arrangement with Valve (let me know if it's otherwise), but you CAN use them as coordinator servers. These servers will also automatically handle the hole punching for you.
I cannot praise Steamworks enough – not only is the service great, but the documentation is the best I've ever seen for any API. While this is a great resource (https://partner.steamgames.com/doc/api), the real treasure is Spacewar, an open-source game that showcases the capabilities of the Steamworks Development Kit. If you're looking into networking for games, I highly recommend browsing the source code here.
That's two of the biggest problems taken care of. Now, let's talk about concurrency and updating client state.
Problem 2: Client Updates
In a peer-to-peer setup, the host needs to do two things: One is to update the game-state of the clients, and the other is to process their inputs to influence said game states. Let's tackle these one at a time.
First, updating the game state. This is the easier one – simply transcribe the entire affair into a string container, and then send it over to each client in a list. For the transcription part, Boost's serialization library is a pretty good solution, and it's what we ended up using for Zero Sum Future. Here's a tip – smaller packets tend to get dropped less (a datagram bigger than the path MTU gets split into fragments, and losing any one fragment loses the whole thing), so if you can chop up your game state into discrete bits, you'll get better connections. I ended up sending each unit as its own packet, which seems to work well enough.
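To make the one-packet-per-unit idea concrete, here is a minimal sketch. It is not the Zero Sum Future code: the Unit struct and the function names are made up for illustration, and a plain space-separated text format stands in for Boost's serialization library so the example compiles with nothing but the standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical unit record; the real game's unit class is not shown in this post.
struct Unit {
    std::uint32_t id;
    int x;
    int y;
    int health;
};

// Transcribe one unit into a string container.
// (Assumption: plain text stands in for a Boost.Serialization archive.)
std::string serializeUnit(const Unit& u) {
    std::ostringstream out;
    out << u.id << ' ' << u.x << ' ' << u.y << ' ' << u.health;
    return out.str();
}

// Un-transcribe a packet; returns false if the packet is malformed.
bool deserializeUnit(const std::string& packet, Unit& u) {
    std::istringstream in(packet);
    return static_cast<bool>(in >> u.id >> u.x >> u.y >> u.health);
}

// One packet per unit: a dropped packet costs one unit's update,
// not the whole game state.
std::vector<std::string> buildStatePackets(const std::vector<Unit>& units) {
    std::vector<std::string> packets;
    packets.reserve(units.size());
    for (const Unit& u : units) {
        packets.push_back(serializeUnit(u));
    }
    return packets;
}
```

The host would then loop over its client list and send every packet to each client; the actual send call depends on your transport, so it is left out here.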
For player inputs, you do the exact same thing in the opposite direction – whenever a client does something that impacts (or has the potential to impact) the actual game state, we transcribe that input into a string container, send it off to the host, un-transcribe it, and then process it asynchronously. Easy, right?
Well, there's an issue here: Let's say that a client wants to put a building on a cliff face. He is obviously barred from doing so by the game logic, so he should get some kind of negative feedback. Some kind of buzzer, and maybe a message reading "no, you can't put that there, you dunce."
But the game logic is running elsewhere. The input for the build command needs to go to the host, the host needs to determine the appropriate negative feedback, and then send it over as a game state update. But if that packet gets dropped, then the feedback is gone, and the client's experience is markedly worse than the host's.
The solution is to process player input twice – once on the client side, and once on the host side. The pre-process of commands on the client side is responsible for the feedback – so for the above example, the pre-process stage determines the invalidity of the build command by referencing the game state currently available to the client, drops that command (so there is less traffic), and provides the negative feedback.
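Here is a rough sketch of what such a client-side pre-process stage could look like for the cliff example. Everything in it (the Terrain enum, LocalGameState, preprocessBuild) is a hypothetical name invented for this illustration, and the rule "you can only build on flat tiles" is an assumption, not the actual game logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Terrain { Flat, Cliff, Water };

// The client's local (possibly slightly stale) copy of the map.
struct LocalGameState {
    int width;
    int height;
    std::vector<Terrain> tiles;  // row-major, width * height entries

    bool inBounds(int x, int y) const {
        return x >= 0 && y >= 0 && x < width && y < height;
    }
    Terrain at(int x, int y) const {
        return tiles[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                     + static_cast<std::size_t>(x)];
    }
};

struct BuildCommand {
    int x;
    int y;
};

struct PreprocessResult {
    bool sendToHost;       // false: drop the command locally, no traffic
    std::string feedback;  // message for the buzzer/UI, empty if none
};

// Client-side pre-process: reject obviously invalid commands right away.
PreprocessResult preprocessBuild(const LocalGameState& state, const BuildCommand& cmd) {
    if (!state.inBounds(cmd.x, cmd.y)) {
        return {false, "That is off the map."};
    }
    if (state.at(cmd.x, cmd.y) != Terrain::Flat) {
        return {false, "No, you can't put that there."};
    }
    return {true, ""};
}
```

The host still has to run its own check when the command arrives, since the client's copy of the state may be stale (or the client may be lying). The pre-process only exists for feedback and to cut traffic.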
This might seem really blindingly obvious, but this idea of running the game logic in a limited fashion on the client side extends to solve a large number of other problems. Suppose, for instance, Player A and B are playing a game with A as the host and B as the client. If A builds a factory within line of sight of player B, B needs to see the feedback of that building being built: the sound and the animation. Problem is, if you send the animation along as part of the game state, it will arrive choppily at player B: B cannot depend on animation state updates over the network, so the animation needs to run on B independently of the game state.
So you run the animation/graphical component logic on B: When the class responsible for the management of units receives the information that there is a new building, it should know to start with a build animation. This might seem bothersome at first: After all, B may still be watching the build animation play out while the building is already in a working state on the host! But the visual experience for B will be much better – less accurate, maybe, but much smoother. For a simulation game like ours, this is much more important than raw accuracy. If we were developing a hardcore shooter or a fighting game, it might be different.
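A minimal sketch of that idea, assuming a hypothetical ClientBuildingManager class and a fixed-length build animation (neither is the real Zero Sum Future code): the network only tells the client that a building exists, and the animation clock runs entirely off local frame time.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

enum class Visual { Constructing, Idle };

struct BuildingView {
    Visual visual;
    double animSeconds;  // local animation clock, never sent over the network
};

class ClientBuildingManager {
public:
    explicit ClientBuildingManager(double buildAnimLength)
        : buildAnimLength_(buildAnimLength) {}

    // Called whenever a state packet mentions a building. Only the first
    // sighting starts the build animation; repeat packets leave the clock alone.
    void onStateUpdate(std::uint32_t buildingId) {
        views_.emplace(buildingId, BuildingView{Visual::Constructing, 0.0});
    }

    // Called once per rendered frame with the local frame time in seconds.
    void tick(double dt) {
        for (auto& entry : views_) {
            BuildingView& view = entry.second;
            if (view.visual != Visual::Constructing) {
                continue;
            }
            view.animSeconds += dt;
            if (view.animSeconds >= buildAnimLength_) {
                view.visual = Visual::Idle;
            }
        }
    }

    // Returns nullptr if the client has never heard of this building.
    const BuildingView* find(std::uint32_t buildingId) const {
        auto it = views_.find(buildingId);
        return it == views_.end() ? nullptr : &it->second;
    }

private:
    double buildAnimLength_;
    std::unordered_map<std::uint32_t, BuildingView> views_;
};
```

Because emplace does nothing when the key already exists, a late or duplicated state packet cannot restart or jump the animation, which is exactly the smoothness-over-accuracy trade described above.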
Problem 3: Host Migration
This is where it gets tricky. Some parts of host migration get simpler once we start using Valve's network. Valve has a system of lobbies that you interface with to coordinate games – so it is trivial to assign host status to the first player in the lobby, the so-called owner. So, for our networking system, we query the owner of the lobby every cycle. If it is different from the host written to local memory, we know the host has changed, and who the new one is. The new host then sends a "hi guys I am new host now plox" packet to every other player. Since we are smart and we made our management classes have host and client modes, we simply throw a switch, and now our client-turned-host is running the game logic and dispensing packets.
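The per-cycle check can be sketched like this. HostTracker and its event names are hypothetical; the lobby owner is passed in as a plain 64-bit ID so the example builds without the Steamworks SDK. In a real build that value would presumably come from the lobby-owner query (ISteamMatchmaking::GetLobbyOwner).

```cpp
#include <cassert>
#include <cstdint>

using PlayerId = std::uint64_t;  // stand-in for a Steam ID

enum class Role { Client, Host };
enum class MigrationEvent { None, HostChanged, BecameHost };

class HostTracker {
public:
    HostTracker(PlayerId self, PlayerId initialHost)
        : self_(self), knownHost_(initialHost) {}

    // Call once per network cycle with the current lobby owner.
    MigrationEvent poll(PlayerId lobbyOwner) {
        if (lobbyOwner == knownHost_) {
            return MigrationEvent::None;
        }
        knownHost_ = lobbyOwner;  // update the host written to local memory
        return lobbyOwner == self_ ? MigrationEvent::BecameHost
                                   : MigrationEvent::HostChanged;
    }

    // The "switch" the management classes would consult.
    Role role() const { return knownHost_ == self_ ? Role::Host : Role::Client; }
    PlayerId host() const { return knownHost_; }

private:
    PlayerId self_;
    PlayerId knownHost_;
};
```

On BecameHost, the local player would flip its managers into host mode and broadcast the "I am new host" packet; on HostChanged, it just starts listening to the new host.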
So far, so good. But the real issue comes from reconstructing game data.
See, there is no reason why you'd ever send the player information of one player to a different one. Player A should never have access to the inventory of Player B (assuming neither is the host). You'd waste bandwidth, make your code more complicated, and introduce a serious security risk. Run a packet sniffer on the client, and you can cheat to your heart's content by accessing information that you should not have access to.
So clients must only be sent game state info that they have access to. This takes quite a bit of gameplay logic, and showcases what I truly despise about network programming: Everything that supports the multiplayer component has to be tailor-made to the game to work properly, and there's not a lot of room for recycling code. But let's assume that all of that is a given.
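As a toy illustration of that filtering step: the host keeps the full unit list and builds a separate, trimmed list per recipient. The visibility rule here (a flat sight range around each of your own units) and all the names are assumptions made up for the example; the real rules are exactly the game-specific part that can't be recycled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical unit record with an owner, for illustration only.
struct OwnedUnit {
    std::uint32_t id;
    std::uint32_t owner;
    int x;
    int y;
};

// Assumed rule: an observer sees anything within sightRange tiles
// on both axes (Chebyshev distance).
bool canSee(const OwnedUnit& observer, const OwnedUnit& target, int sightRange) {
    return std::abs(observer.x - target.x) <= sightRange &&
           std::abs(observer.y - target.y) <= sightRange;
}

// Host side: the subset of the game state that `player` is allowed to receive.
std::vector<OwnedUnit> visibleTo(std::uint32_t player,
                                 const std::vector<OwnedUnit>& all,
                                 int sightRange) {
    std::vector<OwnedUnit> out;
    for (const OwnedUnit& target : all) {
        bool visible = (target.owner == player);  // you always see your own units
        for (std::size_t i = 0; !visible && i < all.size(); ++i) {
            visible = all[i].owner == player && canSee(all[i], target, sightRange);
        }
        if (visible) {
            out.push_back(target);
        }
    }
    return out;
}
```

This is quadratic in the number of units, which is fine for a sketch; a real game would likely want a spatial grid or similar.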
There is still the issue of assigning host status to a client, because as host, the program DOES need info that isn't available to it by default. To get it, we use the initial "hi guys I am new host now plox" packet. When a client receives this packet, it sends the new host the unique game state data available only to itself (using a reliable protocol; thankfully Steamworks allows us to do this), and the game resumes as soon as all other clients have "reported in", as it were.
If you're observant, you'll note that I'm glossing over reconnecting protocols. If the host drops, it's curtains for their game state, because no one else has access to the host's unique data. The end result will probably be a periodic dispensation of the host's unique data to the clients in some sort of encrypted way – but I have no idea how I'm going to implement that yet.
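The "wait until everyone has reported in" part can be sketched as a small bookkeeping class on the new host. MigrationHandshake is a hypothetical name, the private state is just an opaque string here, and the onPlayerLeft hook is my own addition on the assumption that a player quitting mid-handshake shouldn't freeze the game.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

using PlayerId = std::uint64_t;  // stand-in for a Steam ID

class MigrationHandshake {
public:
    // `others` is every lobby member except the new host itself.
    explicit MigrationHandshake(std::set<PlayerId> others)
        : pending_(std::move(others)) {}

    // Called when a client's reliable report arrives.
    // Returns false for unknown senders and duplicate reports.
    bool onReport(PlayerId from, std::string privateState) {
        if (pending_.erase(from) == 0) {
            return false;
        }
        reports_[from] = std::move(privateState);
        return true;
    }

    // Assumption: a player who leaves mid-handshake is simply dropped
    // from the waiting list.
    void onPlayerLeft(PlayerId who) { pending_.erase(who); }

    // The game resumes once nobody is left to report in.
    bool readyToResume() const { return pending_.empty(); }

    const std::unordered_map<PlayerId, std::string>& reports() const {
        return reports_;
    }

private:
    std::set<PlayerId> pending_;
    std::unordered_map<PlayerId, std::string> reports_;
};
```

The new host would create one of these right after sending its announcement packet, feed it every incoming report, and un-pause the simulation when readyToResume() turns true.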
Speaking of more parts of an already run-on blog...
Problem 4: Cheaters
This is one area that I haven't attacked yet. Steamworks has built-in anti-cheat systems, but my area of expertise is computation and graphics, and I don't really have the patience for Crypto work.
But I will share my ideas with you regardless (mostly so that my wonderful audience can keep telling me how WRONG I am about things). In Zero Sum Future, the worst kinds of cheating are going to be a) accessing info that you shouldn't have access to, and b) modifying host memory to alter game state.
I presume (and I'm sure I'll be corrected) that the only real defense against the latter sort of host exploitation attack is to use Steamworks' built-in system. There is only so much obfuscation someone can do to stop local memory from being fiddled with. For the former, we simply do not send players game state info they do not have access to, and if we do send it, we use some flavor of rapid encryption to make real-time packet sniffing worthless.
So that's how we handle most of the network-related nonsense in Zero Sum Future. This networking series is going to have a 3rd installment sometime in the future, when I get around to addressing some of the hanging questions. Until that time, expect more graphics-related content up here.