This is another technical post. Today we're going to talk about a modern particle system, with code from real programmers!

If you're not familiar with particle systems, there are two issues that every particle system eventually runs into. One is data transfer between the application and the GPU, and the other is physics. Zero Sum Future has very minimal particle physics, so this post is more about the former problem.

For a particle system to work, it needs to accomplish the following:

Particles need to be created. For each particle source in a scene, you need to create some number of particles per cycle.

Particles need to be simulated. If we're not ignoring physics, this is where it would happen.

Particles need to die off. They can time out based on age, be destroyed because their source is no longer present, or be culled because you hit your max particle limit.

Particles need to be drawn. This is fairly self-explanatory.

Clearly, the drawing part is the GPU-relegated part of the system. The rest is easily done on the CPU, perhaps with some form of multi-threading. While that is fairly easy to implement, it's far from optimal: every time you pass simulation data from main memory to the GPU, you run into massive bandwidth costs. Not to mention that your graphics thread is probably choking on simulating the half a million particles I'm sure you'd like to draw.

Therefore, we'd ideally like to assign all of these tasks to the GPU somehow.

The easiest one is particle simulation – OpenGL lets us do some fairly funky stuff with shaders that allow us to manipulate large sets of data. This can either come in the form of a compute shader, or it can come in the form of some cleverly arranged vertex shaders.

The emission/deletion is a little trickier.

In this post, we're going to go over an implementation that is very heavily inspired by an Nvidia example of transform feedback particles (which can be found HERE). This is perhaps not the most intelligent way of solving the problem: a series of compute shaders with atomic counters can handle it fairly effectively too. But transform feedback is so much cooler, I had to show this version.

Now, without further ado.

**The Implementation**

We're going to start with the shader programs, then the back-end, to see how they fit together. We have three shader passes: Emission, Simulation, and Draw.

Let's start with the vertex shader for the emission stage:

```
#version 430

in vec4 a_PositionType;
in vec4 a_VelocityAge;
in vec4 a_AuxVecOne;

out vec4 vs_PositionType;
out vec4 vs_VelocityAge;
out vec4 vs_AuxVecOne;

void main()
{
    vs_PositionType = a_PositionType;
    vs_VelocityAge = a_VelocityAge;
    vs_AuxVecOne = a_AuxVecOne;
}
```

For every particle source, we push in a position, a type enumeration, velocity, age, and a buffer vector that can define how that particle behaves. So if you have jet exhaust from a player-owned shuttle, we can color that exhaust to the player color. Very cool. But the real magic happens in the geometry stage:

```
#version 430

layout(points) in;
layout(points) out;
layout(max_vertices = 50) out;

in vec4 vs_PositionType[];
in vec4 vs_VelocityAge[];
in vec4 vs_AuxVecOne[];

out vec4 gs_PositionType;
out vec4 gs_VelocityAge;
out vec4 gs_AuxVecOne;

//particleEnums happens here
uniform float emitCount1 = 5000; //SUN_ALPHA
uniform float emitCount2 = 2000; //SUN_BETA
uniform float emitCount3 = 500;  //SUN_GAMMA
uniform float deltaTime;
uniform highp float randomSeed1;
uniform highp float randomSeed2;
uniform highp float randomSeed3;
uniform float particleEmissionMultiplier;

#define HASHSCALE1 .1031
highp float hash12(vec2 p) {
    vec3 p3 = fract(vec3(p.xyx) * HASHSCALE1);
    p3 += dot(p3, p3.yzx + 19.19);
    return 2.0 * (fract((p3.x + p3.y) * p3.z) - 0.5);
}

void emitParticles(int type) {
    float emitCount;
    if (type == 1) {
        emitCount = emitCount1;
    } else if (type == 2) {
        emitCount = emitCount2;
    } else if (type == 3) {
        emitCount = emitCount3;
    } else {
        emitCount = 0.0;
    }
    int num = int(particleEmissionMultiplier * deltaTime * emitCount);
    for (int i = 0; i < num; i++) {
        float velocityMagnitude = 0.0; // initialized so unknown types are well-defined
        switch (type) {
            case 1: velocityMagnitude = 15.5f; break;
            case 2: velocityMagnitude = 100.5f; break;
            case 3: velocityMagnitude = 200.5f; break;
            default: break;
        }
        float xfloat = hash12(vec2(randomSeed1, randomSeed3 * i));
        float yfloat = hash12(vec2(randomSeed1 * (i + 1), randomSeed2));
        float zfloat = hash12(vec2(randomSeed2 * (2 * i + 1), randomSeed3 * (2 * i + 1)));
        vec3 positionVector = normalize(vec3(xfloat, yfloat, zfloat));
        gs_VelocityAge.xyz = positionVector.xyz * velocityMagnitude;
        gs_PositionType.xyz = vs_PositionType[0].xyz + (positionVector * 299.0f); //the sun position is hard coded in at the moment
        gs_PositionType.w = float(type); // keep the source's type enum so the draw stage can color by it
        gs_VelocityAge.w = 0.0;
        gs_AuxVecOne = vec4(1.0, 1.0, 0.0, 0.0);
        EmitVertex();
        EndPrimitive();
    }
}

void main() {
    emitParticles(int(vs_PositionType[0].w));
}
```

This is the shader responsible for creating particles. We emit particles based on the time between shader executions (deltaTime), the particle setting (particleEmissionMultiplier), and a hard-coded rate per particle source enumeration. For the purposes of this demo, I have 3 particle sources for the sun, all originating from the surface of a 300 unit radius sphere at the origin.

There is one caveat: we can create at most 50 particles per source per shader execution, because `max_vertices` on a geometry shader is subject to a hardware limit. In the main program, I'll explain how we sidestep this silly issue. In the meantime, we simply capture the output of this shader program with transform feedback (so we don't need a fragment shader) and move on to simulation.
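To put numbers on that cap: the geometry shader computes `num = int(particleEmissionMultiplier * deltaTime * emitCount)`, so with the hard-coded `emitCount1 = 5000`, a multiplier of 1, and a 4 millisecond time slice (more on slicing later), each execution asks for 20 particles, comfortably under 50. Here's that arithmetic mirrored in C++ as a quick check (hypothetical helper, not engine code):

```cpp
#include <cassert>

// Mirrors the emission geometry shader's per-execution count:
// num = int(particleEmissionMultiplier * deltaTime * emitCount)
int emissionCount(float multiplier, float deltaTime, float emitCount) {
    return static_cast<int>(multiplier * deltaTime * emitCount);
}
```

With the three hard-coded sun rates and a 4 ms slice, that yields 20, 8, and 2 particles per execution respectively, so the vertex budget holds.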

Here's the simulation vertex shader:

```
#version 430

in vec4 a_PositionType;
in vec4 a_VelocityAge;
in vec4 a_AuxVecOne;

out vec4 vs_PositionType;
out vec4 vs_VelocityAge;
out vec4 vs_AuxVecOne;

uniform float u_DeltaTime;
uniform highp float randomSeed1;
uniform highp float randomSeed2;
uniform highp float randomSeed3;

#define HASHSCALE1 .1031
highp float hash12(vec2 p) {
    vec3 p3 = fract(vec3(p.xyx) * HASHSCALE1);
    p3 += dot(p3, p3.yzx + 19.19);
    return 2.0 * (fract((p3.x + p3.y) * p3.z) - 0.5);
}

void main() {
    vec3 myPosition = a_PositionType.xyz;
    myPosition += u_DeltaTime * a_VelocityAge.xyz;
    vs_PositionType.xyz = myPosition;
    vs_PositionType.w = a_PositionType.w;

    vec3 myVelocity = a_VelocityAge.xyz; // no forces yet, so velocity passes through unchanged
    vs_VelocityAge.xyz = myVelocity;
    vs_VelocityAge.w = a_VelocityAge.w + u_DeltaTime;

    vs_AuxVecOne = a_AuxVecOne;
}
```

As you can see, all of the physics happens in the simulation shader. In this example, I simply compute the natural movement of the particles absent a force, but that can easily be changed, even based on particle enumeration. We also advance the age of the particles – you'll see what that does in the geometry stage of the simulation:

```
#version 430

layout(points) in;
layout(points) out;
layout(max_vertices = 1) out;

in vec4 vs_PositionType[];
in vec4 vs_VelocityAge[];
in vec4 vs_AuxVecOne[];

out vec4 gs_PositionType;
out vec4 gs_VelocityAge;
out vec4 gs_AuxVecOne;

uniform float u_ParticleLifetime = 5.0f;

void main() {
    // Only re-emit particles still within their lifetime; expired
    // particles are simply never written to the feedback buffer.
    if (vs_VelocityAge[0].w < u_ParticleLifetime) {
        gs_PositionType = vs_PositionType[0];
        gs_VelocityAge = vs_VelocityAge[0];
        gs_AuxVecOne = vs_AuxVecOne[0];
        EmitVertex();
        EndPrimitive();
    }
}
```

This is by far my favorite bit of this method. It's so simple, but so smart; I wish I had thought of it originally. What we do is exploit a quirk of geometry shaders: they can not only add geometry to a scene, but also remove it. We simply don't re-emit the particles that have run out of time. That's it. The hardest part of the particle problem, solved.
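One consequence of capturing each pass's output with transform feedback is that the particles ping-pong between two buffers: each frame reads from one while the survivors are written into the other, and the roles swap. Stripped of the GL calls, the bookkeeping is just an index swap; a minimal sketch (the names are mine, not the engine's):

```cpp
#include <cassert>

// Ping-pong bookkeeping for the two particle VBO / transform feedback
// pairs: "read" is the buffer we source vertices from this frame,
// "write" is the buffer transform feedback captures into.
struct PingPong {
    int read = 0;
    int write = 1;
    void swap() {
        int tmp = read;
        read = write;
        write = tmp;
    }
};
```

Each frame you'd bind buffer `read` as the vertex source and buffer `write` as the feedback target, run the pass, then call `swap()`.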

After we simulate the particle, we capture that output with yet another transform feedback buffer. This means that we need at least 2 buffers in our method, but I'll get to that in a bit. Now that we've done that, we move on to the drawing part:

```
#version 430
layout (location = 0) in vec4 a_PositionType;
layout (location = 1) in vec4 a_VelocityAge;
layout (location = 2) in vec4 a_AuxVecOne;
layout (location = 0) out float g_Type;
layout (location = 1) out float g_Age;
layout (location = 2) out vec3 g_WorldPos;
layout (location = 3) out vec4 g_AuxVecOne;
layout (binding = 1, offset = 0) uniform atomic_uint atomicTestCounter;
uniform mat4 worldTransform;
uniform vec3 planetPosition;
void main() {
vec4 position = worldTransform * vec4(a_PositionType.xyz, 1.0);
gl_Position.xyz = position.xyz;
gl_Position.w = 1.0;
g_WorldPos = a_PositionType.xyz;
atomicCounterIncrement(atomicTestCounter);
g_Type = a_PositionType.w;
g_Age = a_VelocityAge.w;
g_AuxVecOne = a_AuxVecOne;
}
```

The vertex shader is not interesting. It's mostly a pass-through shader. The atomic counter is there if we ever want to know how many particles are being drawn at once, which is kinda neat if you want to brag on your blog about how many particles your system can handle, but otherwise not needed. The real magic happens in the geometry shader:

```
#version 430 core

layout(points) in;
layout(triangle_strip) out;
layout(max_vertices = 4) out;

//inputs
layout (location = 0) in float g_Type[];
layout (location = 1) in float g_Age[];
layout (location = 2) in vec3 g_WorldPos[];
layout (location = 3) in vec4 g_AuxVecOne[];

//outputs
layout (location = 0) out flat int f_Type;
layout (location = 1) out float f_Age;
layout (location = 2) out vec2 f_gCoord;
layout (location = 3) out vec3 f_gWorldPos;
layout (location = 4) out vec4 f_AuxVecOne;

uniform mat4 projection;
uniform mat4 view;
uniform vec3 vQuad1;
uniform vec3 vQuad2;

void main() {
    const float particleSize = 8.0f;
    int particleType = int(g_Type[0]);
    vec3 vPosOld = gl_in[0].gl_Position.xyz;
    mat4 mVP = projection * view;
    // The four corners of the billboard, in triangle-strip order.
    const vec2 corners[4] = vec2[4](vec2(-1, -1), vec2(-1, 1), vec2(1, -1), vec2(1, 1));
    for (int i = 0; i < 4; i++) {
        vec3 vPos = vPosOld + (corners[i].x * vQuad1 + corners[i].y * vQuad2) * particleSize;
        gl_Position = mVP * vec4(vPos, 1.0);
        f_Type = particleType;
        f_Age = g_Age[0];
        f_gCoord = corners[i];
        f_gWorldPos = g_WorldPos[0];
        f_AuxVecOne = g_AuxVecOne[0];
        EmitVertex();
    }
    EndPrimitive();
}
```

The geometry shader for the draw stage takes in one vertex per particle (carrying position, type, velocity, and age) and outputs a four-vertex triangle strip: two triangles. If you're a graphics programming veteran, you'll know why: particle systems draw billboards, not actual three-dimensional objects. To do this, we need two vectors that define the billboard plane, vQuad1 and vQuad2. I compute these on the CPU side and push them in as uniforms. The rest is basically a pass-through shader, with the exception of the f_gCoord variable: those are the texture coordinates for the billboard, so if we want to put a texture on our particle, we do it using those coordinates.

If you're a more advanced user, you could add in motion distortion to the billboard based on the velocity vector of the particle, but that's not needed for our application.

Finally, we get to the draw fragment shader.

```
#version 430

layout (location = 0) in flat int f_Type;
layout (location = 1) in float f_Age;
layout (location = 2) in vec2 f_gCoord;
layout (location = 3) in vec3 f_gWorldPos;
layout (location = 4) in vec4 f_AuxVecOne;

layout (location = 0) out vec4 gAlbedoSpec;

layout (binding = 0) uniform sampler2D depthTexture;

uniform float sunParticleLifeTime = 5.0;
uniform int screenWidth;
uniform int screenHeight;

float linearize_depth(float d, float zNear, float zFar) {
    float z_n = 2.0 * d - 1.0;
    return 2.0 * zNear * zFar / (zFar + zNear - z_n * (zFar - zNear));
}

void main() {
    float particleAlpha = 1.0;
    float softnessDueToAge = 1.0;

    vec2 coords;
    coords.x = gl_FragCoord.x / screenWidth; // this was the demon tweak
    coords.y = gl_FragCoord.y / screenHeight;

    float sceneDepth = texture(depthTexture, coords.xy).w;
    float particleDepth = linearize_depth(gl_FragCoord.z, 1.0, 4000.0f);
    if (sceneDepth < particleDepth) {
        discard;
    }

    float d = abs(sceneDepth - particleDepth);
    float depthSoftness = clamp(d / 50.0f, 0.0, 1.0);

    vec3 particleColor;
    switch (f_Type) {
        case 1: particleColor = vec3(1.0, 0.0, 0.0); break;
        case 2: particleColor = vec3(0.0, 1.0, 0.0); break;
        case 3: particleColor = vec3(0.0, 0.0, 1.0); break;
        default: particleColor = vec3(0.0, 1.0, 1.0); break;
    }

    if ((sunParticleLifeTime - f_Age) < 2.5) {
        softnessDueToAge = (sunParticleLifeTime - f_Age) / 2.5;
    }

    particleAlpha = clamp(1.2 - distance(f_gCoord, vec2(0.0)), 0.0, 1.0);
    if (particleAlpha < 0.1) {
        discard;
    }

    gAlbedoSpec = vec4(particleColor, particleAlpha * softnessDueToAge * depthSoftness);
}
```

So, we finally draw it. The really complicated part of the drawing process is adjusting the alpha values of the fragments. We have 3 contributions:

Softness due to age. As a particle nears the end of its lifetime, this multiplier ramps down, so old particles fade out instead of popping out of existence.

Softness due to particle shape. I apply a radial falloff over the billboard to get a rudimentary soft-edged circle. You could sample a particle texture instead.

Softness due to depth. Particles that appear on the surface of the sun would "pop" quite annoyingly and look very artificial. To overcome this, we introduce a softness on the billboard based on the depth of the underlying fragment, sampled from the primary rendering pass. We linearize the depth so the comparison happens in consistent world units.
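All three contributions are plain per-fragment math, so they're easy to check outside the shader. Here's the same math mirrored in C++ as a sketch, using the shader's hard-coded values: near/far of 1 and 4000, a 50-unit depth fade band, and a 2.5 second age fade:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Mirror of the fragment shader's linearize_depth: maps a [0,1] depth
// buffer value back to an eye-space distance between zNear and zFar.
float linearize_depth(float d, float zNear, float zFar) {
    float z_n = 2.0f * d - 1.0f;
    return 2.0f * zNear * zFar / (zFar + zNear - z_n * (zFar - zNear));
}

// Depth softness: fade in over the first 50 units of separation
// between the particle and the scene geometry behind it.
float depthSoftness(float sceneDepth, float particleDepth) {
    return std::clamp(std::fabs(sceneDepth - particleDepth) / 50.0f, 0.0f, 1.0f);
}

// Shape softness: the radial falloff that turns the square billboard
// into a soft disc (the shader's 1.2 - distance(st, vec2(0.0)) term).
float shapeAlpha(float u, float v) {
    return std::clamp(1.2f - std::sqrt(u * u + v * v), 0.0f, 1.0f);
}

// Age softness: ramp alpha down over the last 2.5 seconds of life.
float ageSoftness(float age, float lifetime) {
    float remaining = lifetime - age;
    return remaining < 2.5f ? remaining / 2.5f : 1.0f;
}
```

The final alpha is just the product of the three factors, exactly as in the shader's last line.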

Finally, to look cool for the demo screenshot, I paint the particles absurdly garish colors.

That's the shaders. The important parts of the CPU-side program are up next, but I'll be cutting out the bits you can find in Nvidia's sample, mostly because this post is dragging on and I really, really want to get to the cool screenshot at the end.

First, the emission stage:

```
glBeginTransformFeedback(GL_POINTS);
int cycleCount = 0;
float papaRandom = randomFloat01();
float randomSeed1, randomSeed2, randomSeed3;
while (totalTime > 0) {
    float delta;
    if (totalTime < 0.004) {
        delta = totalTime;
        totalTime = 0;
    } else {
        totalTime -= 0.004;
        delta = 0.004;
    }
    if (cycleCount == 0) {
        randomSeed1 = papaRandom;
        randomSeed2 = randomFloat01FromSeed(randomSeed1);
        randomSeed3 = randomFloat01FromSeed(randomSeed2);
    } else {
        randomSeed1 = randomFloat01FromSeed(randomSeed3 * cycleCount);
        randomSeed2 = randomFloat01FromSeed(randomSeed1 * cycleCount);
        randomSeed3 = randomFloat01FromSeed(randomSeed2 * cycleCount);
    }
    particleEmitterShader.setUniform("deltaTime", FLOAT1, 1, &delta);
    particleEmitterShader.setUniform("randomSeed1", FLOAT1, 1, &randomSeed1);
    particleEmitterShader.setUniform("randomSeed2", FLOAT1, 1, &randomSeed2);
    particleEmitterShader.setUniform("randomSeed3", FLOAT1, 1, &randomSeed3);
    glDrawArrays(GL_POINTS, 0, m_emittersCount);
    cycleCount++;
}
```

There are two important pieces here. First, we slice the update's chunk of time into 4 millisecond pieces; we do this because if any particle source emits more than 50 particles in a given update cycle, we can't capture the excess. Second, I do some crazy dancing with custom random functions to feed noise into the shader. If you're building your own rendering engine, never ever use the built-in noise functions of GLSL. One, they are deprecated. Two, I never could get them to work reliably across systems.
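The slicing logic is worth isolating, since it's what guarantees no emitter exceeds the geometry shader's vertex cap. Here's a distilled, testable version of the loop above (a hypothetical helper; the real code sets uniforms and issues a draw call inside the loop):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Chop a frame's elapsed time into slices of at most 4 ms, so that
// each geometry shader execution stays under its 50-vertex budget.
std::vector<float> sliceTime(float totalTime, float maxSlice = 0.004f) {
    std::vector<float> slices;
    while (totalTime > 0.0f) {
        if (totalTime < maxSlice) {
            slices.push_back(totalTime); // final partial slice
            totalTime = 0.0f;
        } else {
            slices.push_back(maxSlice);
            totalTime -= maxSlice;
        }
    }
    return slices;
}
```

A 10 ms update, for instance, becomes two full 4 ms slices plus a 2 ms remainder.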

The only other thing I want to show before we move on to the good stuff is the computation of the billboard quads. I'm only including it here because it took me way too long to find in the first place, and if anyone else wants to do what I do, they really should save themselves the grief:

```
glm::mat4 projection = camera->projectionMatrix;
glm::mat4 view = camera->GetViewMatrix();
glm::mat4 worldTransform = inverseworldTransform;
glm::vec3 quad1;
glm::vec3 quad2;
glm::vec3 front = camera->Front;
glm::vec3 up = camera->Up;
quad1 = glm::cross(glm::normalize(front), up);
quad1 = glm::normalize(quad1);
quad2 = glm::cross(glm::normalize(front), quad1);
quad2 = glm::normalize(quad2);
```
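As a sanity check on the orientation those cross products produce: for a camera looking down -Z with +Y up, vQuad1 comes out as the camera's right axis and vQuad2 as its negated up axis. A throwaway check with a minimal vec3 standing in for glm (hypothetical types, not engine code):

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-ins for glm::vec3, glm::cross and glm::normalize.
struct Vec3 { float x, y, z; };

Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Same construction as the snippet above: two vectors spanning the
// camera-facing billboard plane.
void billboardAxes(Vec3 front, Vec3 up, Vec3& quad1, Vec3& quad2) {
    quad1 = normalize(cross(normalize(front), up));
    quad2 = normalize(cross(normalize(front), quad1));
}
```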

And now with that outta the way, let's look at a very garish sun in Zero Sum Future's new map editor:

Cool, right? You might be wondering how many particles that is: roughly 20000, computed with that atomic counter I mentioned earlier. The really cool part is the performance impact: on my system, at least, I can barely detect a few lost frames while running at around 200 fps. Because all the particles live on the GPU, we have essentially no CPU overhead for particles. This means that on proper graphics hardware, we scale beautifully with more particle sources and higher emission rates.

Now, there are drawbacks. If you want particles with collision physics for whatever reason, you're going to have to be awfully creative with your simulation stage. It's also not very intuitive – when pushing in particle data for the emission stage and tracking each particle through the transform feedbacks, you MUST use particle descriptions that are measured in discrete vec4s. Some systems will be okay with odd numbers of variables with transform feedbacks, but I've had bad experiences when I tried to abandon the multiples of 4. Do yourself a favor and stick with that.
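Concretely, "measured in discrete vec4s" means your per-particle record should be a whole number of tightly packed vec4s, with nothing left over. A hypothetical CPU-side mirror of the three attributes used throughout this post, with the packing checked at compile time:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the three vec4 attributes the shaders use.
struct Particle {
    float positionType[4]; // xyz = position, w = type enumeration
    float velocityAge[4];  // xyz = velocity, w = age in seconds
    float auxVecOne[4];    // free-form per-particle data
};

// Transform feedback interleaves these as raw floats, so the struct
// must be exactly three vec4s: no padding, no odd trailing floats.
static_assert(sizeof(Particle) == 12 * sizeof(float),
              "particle record must be whole vec4s");
```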

You can also achieve a very similar pipeline using compute shaders and atomic counters. That might be a lot more intuitive, but I just think the transform feedback method is a lot cooler.

I hope you enjoyed that lengthy post, and I hope it was of some use. Until next time,

-WhatGerzery