Developer's Blog

The Dev Blog for the upcoming game Zero Sum Future, where our lead programmer walks through engine design, game design, and more!

Atomic Bloom

We're only weeks away from early access release, and I chose now to experiment with bloom. I think the results are very pleasing.

Bloom is a visual artifact that mimics an imperfect viewing apparatus: bright light sources “bleed” into their surroundings, which makes them look very bright and cinematic. A great explanation (and perhaps the best vanilla method you'll find on the internet) can be found HERE. I'll be using that example as a starting point, so if you're not familiar with bloom, I'd highly recommend reading it.

The basic bloom algorithm is very simple. You:

  1. Note down which texels are “emissive” and what color they emit.

  2. Pass over that with a bloom shader, which for each texel samples all texels within a certain distance and adds up the total bloom amount.

  3. Apply a blur filter over the whole affair to smooth out artifacts.

This is a very common algorithm that is present in most modern engines. Bloom looks great, is relatively cheap when done intelligently, and adds a lot to most graphics applications.

So, where can you improve?

In the traditional bloom algorithm, the bloom computation starts at the receiver texel and samples within a predetermined radius. The larger the radius, the wider each blooming texel bleeds. So if you have bright lights that you want to bleed all over the place, you set that radius to a relatively high value.
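To make the fixed-radius idea concrete, here is a minimal CPU-side sketch in C++ rather than a shader. It is a simplified 1D stand-in, not the engine's code: the function name `fixedRadiusBloom` and the equal weighting of samples are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1D stand-in for a traditional bloom pass: every receiver
// texel sums the emissive values found within `radius` texels of itself.
// The same radius is used for every texel, which is the limitation
// discussed in the text.
std::vector<float> fixedRadiusBloom(const std::vector<float>& emissive, int radius) {
    assert(radius >= 0);
    const int n = static_cast<int>(emissive.size());
    std::vector<float> bloom(emissive.size(), 0.0f);
    for (int x = 0; x < n; ++x) {
        for (int dx = -radius; dx <= radius; ++dx) {
            const int sx = x + dx;
            if (sx >= 0 && sx < n) {  // skip samples that fall off the edge
                bloom[x] += emissive[sx];
            }
        }
    }
    return bloom;
}
```

With a single emissive texel in the middle of the row, a radius of 1 lights up that texel and its two direct neighbours. A larger radius spreads every source equally far, whether it is a dim lamp or a sun.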

But what if you have an admixture of lights?

This is a deviously hard problem: you can't pick and choose your radius based on any information in the shader, at least not accurately. You can pick a radius that works for most sources in your scene (which, to my knowledge, is what most people do), but beyond that you can't let one particular light bloom out of control. To address this, we bring a per-texel bloom radius into the fragment shader by way of an atomic texture.

The way the algorithm works is pretty straightforward. We:

  1. Note which texels are emissive and what color they emit.

  2. Perform an atomic bloom pass: For each texel with a bloom source, atomically write the value of that texel's bleeding radius to nearby texels, favoring the highest radius.

  3. Perform the bloom pass, using the bleeding radius information written to the atomic texture.

  4. Apply a Gaussian blur over the whole affair to smooth out artifacts.

As you can see, it's very similar to the original implementation. The only really different bit is the atomic pass, which is just one shader.


The atomic shader is the heart and soul of this whole affair, so let's start here:

in vec2 TexCoords;
layout (binding = 0) uniform sampler2D bloomInput;
uniform int SCR_HEIGHT;
uniform int SCR_WIDTH;

layout(r32i, binding = 1) uniform iimage2D atomicOutputTex;

The fragment shader runs over a full-screen quad, so we start with the relative coordinates for each texel. We also push in our screen's width and height so we can translate those normalized coordinates into the integer texel coordinates that image load/store uses. Finally, we bind the texture we intend to use for atomic writes. OpenGL mandates that you use an r32i or r32ui texture for atomic operations, so we pick the signed integer one. Note that iimage2D is NOT a typo: if you want to use an integer texture for image load/store operations, you have to declare it as iimage2D or uimage2D. The official wiki doesn't mention this anywhere, so I spent a good 2 hours chasing that shader error.
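The coordinate translation mentioned above is just a scale followed by a truncation. Here is a small C++ sketch of the same arithmetic; the `Texel` struct and the `toTexel` name are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

struct Texel { int x; int y; };

// Mirrors ivec2(TexCoords * vec2(SCR_WIDTH, SCR_HEIGHT)):
// scale the [0, 1] coordinate by the screen size, then truncate toward zero.
Texel toTexel(float u, float v, int width, int height) {
    assert(width > 0 && height > 0);
    return Texel{ static_cast<int>(u * static_cast<float>(width)),
                  static_cast<int>(v * static_cast<float>(height)) };
}
```

For a 1920x1080 target, the coordinate (0.5, 0.25) lands on texel (960, 270).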

// Emissive strength lives in the alpha channel; scale it up to a radius in texels.
// (sampleCount is assumed to be declared elsewhere in the shader; it is not part of this excerpt.)
float bloomMagnitude = (texture(bloomInput, TexCoords).a * sampleCount);

// Integer texel coordinates for image load/store.
ivec2 coords = ivec2(TexCoords * vec2(SCR_WIDTH, SCR_HEIGHT));

int bloomAmount = int(bloomMagnitude);

if(bloomMagnitude > 0.0){
	ivec2 offsetY = ivec2(0, 1) * bloomAmount;
	ivec2 offsetX = ivec2(1, 0) * bloomAmount;

	// First attempt: claim target texels that still hold the dummy value 999.
	int plusXVal = imageAtomicCompSwap(atomicOutputTex, (coords + offsetX), 999, bloomAmount);
	int plusYVal = imageAtomicCompSwap(atomicOutputTex, (coords + offsetY), 999, bloomAmount);
	int minusXVal = imageAtomicCompSwap(atomicOutputTex, (coords - offsetX), 999, bloomAmount);
	int minusYVal = imageAtomicCompSwap(atomicOutputTex, (coords - offsetY), 999, bloomAmount);

	// Second attempt: overwrite a smaller radius that another source already wrote.
	if(plusXVal < bloomAmount && plusXVal != 999){
		imageAtomicCompSwap(atomicOutputTex, (coords + offsetX), plusXVal, bloomAmount);
	}
	if(plusYVal < bloomAmount && plusYVal != 999){
		imageAtomicCompSwap(atomicOutputTex, (coords + offsetY), plusYVal, bloomAmount);
	}
	if(minusXVal < bloomAmount && minusXVal != 999){
		imageAtomicCompSwap(atomicOutputTex, (coords - offsetX), minusXVal, bloomAmount);
	}
	if(minusYVal < bloomAmount && minusYVal != 999){
		imageAtomicCompSwap(atomicOutputTex, (coords - offsetY), minusYVal, bloomAmount);
	}
}

Now for the actual atomic operation. First, we read the texel's emissivity and construct our image coordinates. Next, we check whether the texel the shader is executing for is emissive. If not, we do nothing. If it is, we construct offset vectors based on the amount of emissivity: if the fragment has an emissivity value of 1, we offset by 1 texel; if 2, we offset by 2; and so on.
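The offset construction can be written out on the CPU as well. This C++ sketch lists the four texels an emissive texel writes to; `Texel` and `scatterTargets` are hypothetical names for illustration, not part of the shader.

```cpp
#include <array>
#include <cassert>

struct Texel { int x; int y; };

// For an emissive texel at `c` with integer radius `r`, return the four
// texels the atomic pass targets, in the same order as the shader listing:
// +X, +Y, -X, -Y, each exactly r texels away from the source.
std::array<Texel, 4> scatterTargets(Texel c, int r) {
    assert(r >= 0);
    return {{ Texel{c.x + r, c.y},
              Texel{c.x, c.y + r},
              Texel{c.x - r, c.y},
              Texel{c.x, c.y - r} }};
}
```

A source at (10, 20) with a radius of 3 therefore writes to (13, 20), (10, 23), (7, 20) and (10, 17). Texels in between are not written by this source.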

OpenGL doesn't give us many options for atomic operations. We can atomically add, we can use bitwise logic operations, and we can apply min/max operations. Ideally, we'd like an atomic operation that compares our bloom magnitude against the value already in the target texel and overwrites it only if the stored value is smaller. But we can't do that.

So we do the next best thing. We populate our atomic texture with a dummy value; I chose 999 for my application, because no light source is ever going to get close to that amount. Next, we use imageAtomicCompSwap. This operation writes its last argument to the texel only if the texel's current value equals the third argument, and it always returns the texel's original value. Using this operation, we construct our makeshift atomic if-less-than statement.
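The same two-step trick can be tried out on the CPU. The C++ sketch below uses `std::atomic<int>` as a stand-in for one texel of the atomic texture; `compSwap` and `writeIfGreater` are hypothetical helper names, and the CPU atomics are only an analogy for the GLSL call, not a claim about how the GPU implements it.

```cpp
#include <atomic>
#include <cassert>

constexpr int kEmpty = 999;  // the dummy value used in the post

// CPU stand-in for imageAtomicCompSwap: store `data` only if the cell still
// holds `compare`, and always return the value that was there beforehand.
int compSwap(std::atomic<int>& cell, int compare, int data) {
    int expected = compare;
    // On failure, compare_exchange_strong loads the current value into `expected`.
    cell.compare_exchange_strong(expected, data);
    return expected;
}

// Two-step "write if greater", following the shader listing.
void writeIfGreater(std::atomic<int>& cell, int radius) {
    assert(radius >= 0 && radius < kEmpty);
    // Step 1: claim the cell if nobody has written to it yet.
    const int previous = compSwap(cell, kEmpty, radius);
    // Step 2: if a smaller radius was already there, try to replace it.
    if (previous < radius && previous != kEmpty) {
        compSwap(cell, previous, radius);
    }
}
```

Starting from 999, writing 4, then 2, then 7 leaves the cell at 7. Note that step 2 is a single attempt rather than a retry loop, so if another writer changes the cell between the two calls, this write is simply skipped.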

With that, the atomic bloom texture is complete. Next, let's take a quick look at how we apply that in the bloom shader proper:

#version 430 core

layout(location = 0) out vec4 f_Bloom;

in vec2 TexCoords;

uniform sampler2D BloomInput;
uniform int horizontal;
uniform int SCR_WIDTH;
uniform int SCR_HEIGHT;

layout(r32i, binding = 1) uniform iimage2D atomicBloomTex;

// sampleCount is assumed to be declared elsewhere; it is not part of this listing.

void main(){
	ivec2 atomicCoords = ivec2(TexCoords * vec2(SCR_WIDTH, SCR_HEIGHT));

	// Bleeding radius written by the atomic pass (999 means nothing wrote here).
	int bloomPower = imageLoad(atomicBloomTex, atomicCoords).x;

	float BloomTexOffset;

	if(bloomPower == 999){
		BloomTexOffset = 0;
	} else {
		BloomTexOffset = (float(bloomPower) / 5.0f);
	}

	vec3 bloomResult = texture(BloomInput, TexCoords).rgb * 0.5;
	// Convert the per-texel step from texels to texture coordinates.
	vec2 texOffsetBloom = BloomTexOffset / textureSize(BloomInput, 0);
	if(bloomPower > 0){
		float bloomWeight = 0.25;
		if(horizontal == 1) {
			for(int i = 1; i < sampleCount; ++i) {
				bloomResult += (texture(BloomInput, TexCoords +
                			vec2(texOffsetBloom.x * i, 0.0)).rgb * bloomWeight);
				bloomResult += (texture(BloomInput, TexCoords -
                			vec2(texOffsetBloom.x * i, 0.0)).rgb * bloomWeight);
				bloomWeight *= 0.5;
			}
		} else {
			for(int i = 1; i < sampleCount; ++i){
				bloomResult += (texture(BloomInput, TexCoords +
                			vec2(0.0, texOffsetBloom.y * i)).rgb * bloomWeight);
				bloomResult += (texture(BloomInput, TexCoords -
                			vec2(0.0, texOffsetBloom.y * i)).rgb * bloomWeight);
				bloomWeight *= 0.5;
			}
		}
	}
	// The original output line is truncated; an alpha of 1.0 is assumed here.
	f_Bloom = vec4(bloomResult, 1.0);
}

If you bothered to look at the tutorial linked early on in this article, you'll note that this is eerily similar to it. The only real difference is the texOffsetBloom variable, and how it changes dynamically based on the atomic texture value.
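To see what the dynamic offset does to the sample pattern, here is a C++ sketch that lists the taps one blur direction takes for a given stored radius. It is a simplified illustration under stated assumptions: `Tap` and `bloomTaps` are hypothetical names, offsets are reported in texels rather than texture coordinates, and texture filtering is ignored.

```cpp
#include <cassert>
#include <vector>

constexpr int kEmpty = 999;  // dummy value meaning "no radius written"

struct Tap { float offsetTexels; float weight; };

// Lists the samples taken along one axis, following the shader listing:
// a centre tap with weight 0.5, then pairs of taps spaced (radius / 5)
// texels apart whose weight starts at 0.25 and halves every step.
std::vector<Tap> bloomTaps(int storedRadius, int sampleCount) {
    assert(sampleCount >= 1);
    const float step = (storedRadius == kEmpty)
        ? 0.0f
        : static_cast<float>(storedRadius) / 5.0f;
    std::vector<Tap> taps;
    taps.push_back(Tap{0.0f, 0.5f});
    if (storedRadius > 0) {  // mirrors if(bloomPower > 0)
        float weight = 0.25f;
        for (int i = 1; i < sampleCount; ++i) {
            taps.push_back(Tap{ step * static_cast<float>(i), weight});
            taps.push_back(Tap{-step * static_cast<float>(i), weight});
            weight *= 0.5f;
        }
    }
    return taps;
}
```

With a stored radius of 10 and a sample count of 3, the taps sit at 0, ±2 and ±4 texels. A stored radius of 20 doubles that spacing with the same number of samples, which is how a stronger source reaches further.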

After these two passes, I run the bloom through another Gaussian blur (to taste), and then use it in my lighting shader.

So, does it work? Well, allow me to plug our current project, Zero Sum Future:

Yes. Yes it does. Note how the bloom pulses through the edge of the planet over time. A traditional bloom implementation can't do that easily, as far as I can work out. With the final blurring, it's really hard to spot any artifacts.

One last thing I ought to address: The performance of this method isn't terrible, primarily because in most scenes you won't get multiple bloom sources with differing emissivity right next to one another. If you do, this method might run into issues. But besides that, I could not detect a significant drop in performance compared to a standard implementation.

If you are planning on implementing bloom anytime soon, or if you're not happy with how bright and shiny your lights are, give this method a go! If you do, drop us a like or follow on social media so I know to keep writing posts like this one.