
5.4 Mega Texture

Mega texture is also called the virtualized texture.

• • •
cropped_image = resized_image.crop((x*256-padding, y*256-padding, (x+1)*256+padding, (y+1)*256+padding))
# Save the tile into a png file.
cropped_image.save('../crab_nebula/crab_'+str(lv)+'_'+str(y)+'_'+str(x)+'.png')

5_04_mega_texture/preprocess.py:1-24 Tile Preprocessor

Notice that, for easy loading in the code, we name the tiles using the pattern crab_<level>_<index y>_<index x>, where level is the detail level, y is the vertical index of the tile within its level, and x is the horizontal index.

Also notice that I have added padding to all tiles, i.e. the actual size is 4 pixels larger than 256 along each dimension. I will explain why this is necessary later.

Image Show Some Tile Samples of Different Levels

First, let's configure some constants. Here is the relevant code:

const imageWidth = 10752;
 const imageHeight = 9216;
 const tileSizeWithoutPadding = 256;
 const textureSizeWithoutPadding = 2048;
• • •
for (let i = 0; i < levelCount; ++i) {
    overallTileCount += levelTileCount[i * 4] * levelTileCount[i * 4 + 1];
}

5_04_mega_texture/index.html:155-186 Setup Constants

Here, imageWidth and imageHeight hardcode the raw image size. textureSizeWithoutPadding is the size of the texture map we will use to hold the visible tiles. maxVisibleTileCountOnTexture is the maximum possible number of visible tiles; this number is limited by the size of the texture map.

levelCount is the total number of levels, i.e. the number of times we need to halve a tile of size 256 to get down to a single pixel. tileH and tileV are the horizontal and vertical tile counts of the raw image.

levelTileCount is an array recording the horizontal and vertical tile counts for all levels, plus each level's tile size on the original image. overallTileCount is the sum of the tile counts of all levels.

function keyToLevel(key) {
     let keyRemain = key;
     let level = 0;
 
• • •
    return y * tileH + x + base;
}

5_04_mega_texture/index.html:187-210 Tile Coordinates and Key Conversion

In our program, we will use a key-value store to act as our tile cache. We need to define two functions: one to serialize a tile's info, including the tile's horizontal and vertical coordinates and its level, into a key, and conversely, one to convert a key back into the corresponding tile info.

The key used in our program is a single index that uniquely identifies a tile. If we view all the tiles of all levels as a pyramid, the way we assign an index to each tile is to stretch the pyramid into a single string, starting from level 0 up to level 8. On a given level, a tile at location x, y (we will call these the level coordinates in the following text) has its id calculated as tileH * y + x + the count of all tiles from the levels below it.

Similarly, given a key, we can recover the corresponding tile's x, y and level.
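To make the key scheme concrete, here is a rough sketch of the forward mapping from level coordinates to a key. The listing's own helper is only partially visible (its last line, return y * tileH + x + base, appears above), so the body below is an assumption based on the description; it reuses the levelTileCount array introduced earlier.

// A sketch of the level-coordinates-to-key mapping (assumed code, not the book's exact helper).
function tileToKey(x, y, level) {
    let base = 0;
    for (let i = 0; i < level; ++i) {
        // all tiles belonging to the levels below this one
        base += levelTileCount[i * 4] * levelTileCount[i * 4 + 1];
    }
    // tileH here stands for the horizontal tile count of this level
    const tileH = levelTileCount[level * 4];
    return y * tileH + x + base;
}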

The first task we need to perform is the visibility test. In the first pass of rendering, we want to determine which tiles are actually visible. Let's look at the visibility test shader:

@group(0) @binding(0)
 var<uniform> transform: mat4x4<f32>;
 @group(0) @binding(1)
 var<uniform> projection: mat4x4<f32>;
• • •
discard;
return vec4(1.0, 0.0, 0.0, 1.0);
}

5_04_mega_texture/index.html:32-85 Visibility Shader

The vertex shader is relatively simple to explain. Here we have two inputs: inPos, one of the vertices of a 256x256 tile positioned at the origin, together with the tile's texture coordinates; and loc, the tile's level coordinates, i.e. tileH and tileV. What the vertex shader does is apply an offset to the tile's vertices based on tileH and tileV, so that the tile is positioned correctly.

The vertex shader passes three types of information to the fragment shader: the clip position, the texture coordinates, and the current tile's level coordinates. Notice that we have an @interpolate(flat) decoration: because the tile coordinates are integers, we don't want the graphics pipeline to perform any interpolation on them.

What deserves more explanation is the fragment shader. What this shader does is essentially the same as what the built-in function textureSample does. The reason we have to implement it manually is that we want to obtain the texture level explicitly.

An advanced concept in this shader is the derivative functions dpdx and dpdy. They measure how fast a value p changes along the x and y axes. For example, dpdx(in.tex * tileSizeWithoutPadding) measures how fast the value in.tex * tileSizeWithoutPadding changes between two horizontally adjacent fragments. As we know, fragment shaders work just like compute shaders: many invocations of the fragment shader execute in parallel, processing different fragments. What is counter-intuitive about these derivative functions is that, looking at a single fragment alone, we can't measure how fast a value changes. The current fragment invocation needs to cooperate with neighboring fragment invocations to produce the value, i.e. there is thread synchronization involved. Previously we learned that when thread synchronization is involved, the relevant code must be in uniform control flow. The same requirement applies to the derivative functions.

In the chapter about mipmaps, we didn't answer the question of why the textureSample function has to be in uniform control flow. Now the answer should be clear: internally, textureSample also relies on the derivative functions to obtain the texture level.

But we haven't explained why measuring how fast in.tex * tileSizeWithoutPadding changes helps us get the texture level. As we can see, in.tex is the texture coordinates, whose values span from 0.0 to 1.0, and tileSizeWithoutPadding is 256; hence multiplying the texture coordinates by tileSizeWithoutPadding gives a value in the range [0.0, 256.0].

Now consider a single 256x256 tile parallel to the screen. If there is no zooming, or we zoom in, the change of tileSizeWithoutPadding * in.tex across two horizontally adjacent fragments should be less than or equal to 1.0. In this scenario, we want to choose texture level 0 = max(log2(d), 0) for sampling, because only level 0 gives us the best rendering quality. Now imagine we zoom out: the 256x256 tile will be rendered on screen at a smaller size. In this case, the value changes between two adjacent fragments faster than 1.0, but the same principle applies: we can simply get the level as max(log2(d), 0). In the most extreme case, if we zoom out so far that the entire 256x256 tile covers no more than a single pixel, two adjacent fragments will have a derivative of 256 or larger, and we will need to sample from level log2(256) = 8.

dx and dy may change at different speeds. In the shader, we don't consider dx and dy separately; we pick the maximum change of the two.
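Restated outside the shader, the level selection boils down to a few lines. The sketch below is plain JavaScript for illustration only; in the real shader the derivatives come from dpdx and dpdy, here they are simply passed in as du and dv.

// Level selection from the texel-space derivatives (illustrative sketch, not the shader code).
function levelFromDerivatives(du, dv) {
    const d = Math.max(du, dv);                 // use the faster-changing direction
    const level = Math.max(Math.log2(d), 0.0);  // one texel per pixel or finer -> level 0
    return Math.min(level, 8);                  // 8 = log2(256), the coarsest level of a 256x256 tile
}
// Examples: levelFromDerivatives(1.0, 0.5) === 0, levelFromDerivatives(256, 256) === 8.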

Once we have figured out the level, the next step is to obtain the key in the same manner as before. The first step is getting the level coordinates x and y. This is done by first calculating the position of the current fragment on the original image, then dividing that position by the level's tile size. The level coordinates give us the index of the tile on that level. The tile count of each level is passed in as level_tile_count; with it, we can count the number of tiles on all the levels below, and together we can compute the unique tile id for the current fragment.

Finally, we need to update the visibility table, visible_tiles, a storage buffer we can write to. For each tile, there is an entry indexed by the tile id. For every fragment we actually render, we set the corresponding tile's entry to 1, indicating "visible". In the next step, we gather all visible tiles' information to assemble a single texture map that holds them for the actual rendering.

Now let's look at the corresponding JavaScript code that works with this shader.

const positionAttribDesc = {
     shaderLocation: 0, // @location(0)
     offset: 0,
     format: 'float32x4'
• • •
passEncoder.end();
commandEncoder.copyBufferToBuffer(tile.tileVisibilityBuffer, 0, tile.tileVisibilityBufferRead, 0, overallTileCount * 4);

5_04_mega_texture/index.html:338-655 Buffer Setup, Command Encoding for Visibility Test

As we have seen in the shader, there are two vertex attributes. The position buffer holds the four vertices of a 256x256 tile. The tileLocBufferLayoutDesc describes the buffer that holds the sets of level coordinates x and y; this is a per-instance attribute. We will use the instancing technique to duplicate the 256x256 tile for each set of level coordinates.

The tileVisibilityBuffer contains the output array. Before each round of rendering, we need to clear this buffer; hence we have tileVisibilityBufferZeros, a buffer of zeros of the same size, which we use to clear tileVisibilityBuffer. tileVisibilityBufferRead is used for reading the result back.

For command encoding, we draw the tile with tileH * tileV instances. This covers the entire image.
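Putting these pieces together, the visibility pass might be encoded roughly as below. This is a sketch: the pipeline, bind group, render pass descriptor and the buffer names on the tile object are assumptions, and the quad is assumed to be drawn as a 4-vertex strip.

// Clear the visibility table, then draw one instance of the 256x256 quad per tile (sketch).
commandEncoder.copyBufferToBuffer(tile.tileVisibilityBufferZeros, 0,
    tile.tileVisibilityBuffer, 0, overallTileCount * 4);                // reset every entry to 0

const passEncoder = commandEncoder.beginRenderPass(visibilityPassDesc); // assumed descriptor
passEncoder.setPipeline(visibilityPipeline);                            // assumed pipeline
passEncoder.setBindGroup(0, visibilityBindGroup);                       // assumed bind group
passEncoder.setVertexBuffer(0, tile.positionBuffer);                    // the 4 quad vertices
passEncoder.setVertexBuffer(1, tile.tileLocBuffer);                     // per-instance level coordinates
passEncoder.draw(4, tileH * tileV);                                     // one instance per tile
passEncoder.end();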

await tile.tileVisibilityBufferRead.mapAsync(GPUMapMode.READ, 0, overallTileCount * 4);
 
 let vb = tile.tileVisibilityBufferRead.getMappedRange(0, overallTileCount * 4);
 
• • •
await visibleTiles.assembleTexture(device, imageWidth, imageHeight, vt);

5_04_mega_texture/index.html:663-677 Read the Resulting Buffer

After submission, we read back the resulting buffer, extract the keys of the visible tiles, and hand the result to a hash table, visibleTiles, to assemble the actual texture map.
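The extraction itself might look roughly like this (a sketch, not the book's exact code; flags views the mapped range vb from above, and the collected keys play the role of the vt argument passed to assembleTexture):

// Collect the keys of all tiles the visibility shader marked as 1 (sketch).
const flags = new Uint32Array(vb);
const visibleKeys = [];
for (let key = 0; key < overallTileCount; ++key) {
    if (flags[key] === 1) {
        visibleKeys.push(key);
    }
}
tile.tileVisibilityBufferRead.unmap();   // release the mapping before the next frame

With the visible keys extracted, let's now look at how the texture map is assembled: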

class KeyIdManager {
 
     constructor() {
         this.used = new Map();
• • •
        return result;
    }
}

5_04_mega_texture/index.html:212-268 Key Manager

First, let's look at a helper class, the KeyIdManager. Recall that our actual texture map has a size of 2048x2048 and our tile size is 256x256; hence the texture map can hold at most 64 tiles. We need a class to manage which of the 64 (maxVisibleTileCountOnTexture) spots are occupied by visible tiles. After each new round of visibility testing, the manager recycles invisible tiles and loads visible ones into the texture map. In a more advanced version, we could implement an LRU cache, so that we retire the least recently used tiles first; here, for simplicity, we always retire a tile as soon as it is not visible.

In the class, available is a list of available spot ids. At the beginning, all 64 spots are available, so we push all ids into the list. The used variable is a map from a tile's key to its id on the texture map.

The key method is generate. Its inputs are the keys of all visible tiles and a helper function that can paste a tile into the texture map.

The logic of this function is not difficult to understand. First, we load all keys into a hash set, because we are about to perform many existence queries. Next, we visit all tiles that were visible in the previous round; if any has become invisible in this round, we recycle its id back into the available list. Then we visit all visible tiles of this round: if a tile was also visible in the previous round, we skip the loading step; otherwise, we use the texture loading utility function to paste the tile onto an available spot of the texture.
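In condensed form, generate might look like the sketch below. This is a simplification: loadTile stands in for the texture-loading helper (the real loadTileIntoTexture, shown next, takes more parameters), and handling for running out of spots is omitted.

// A simplified sketch of the recycle-then-load logic described above (not the listing's exact code).
generate(visibleKeys, loadTile) {
    const current = new Set(visibleKeys);          // for fast existence queries
    for (const [key, id] of this.used) {           // recycle spots whose tiles are no longer visible
        if (!current.has(key)) {
            this.available.push(id);
            this.used.delete(key);
        }
    }
    for (const key of visibleKeys) {               // load tiles that just became visible
        if (!this.used.has(key) && this.available.length > 0) {
            const id = this.available.pop();
            this.used.set(key, id);
            loadTile(key, id);                     // paste the tile into spot `id` of the texture
        }
    }
}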

Now, let's look at the implementation of the loading utility function:

async loadTileIntoTexture(device, bufferUpdate, imageWidth, imageHeight, x, y, level, tileKey, id) {
     const writeArray = new Float32Array(bufferUpdate.getMappedRange(tileKey * 2 * 4, 8));
     writeArray.set([(tileSizeWithoutPadding / textureSizeWithoutPadding) * (id % (textureSizeWithoutPadding / tileSizeWithoutPadding)),
     (tileSizeWithoutPadding / textureSizeWithoutPadding) * Math.floor(id / (textureSizeWithoutPadding / tileSizeWithoutPadding))]);
• • •
    origin: {
        x: (padding * 2 + tileSizeWithoutPadding) * (id % (textureSizeWithoutPadding / tileSizeWithoutPadding)),
        y: (padding * 2 + tileSizeWithoutPadding) * Math.floor(id / (textureSizeWithoutPadding / tileSizeWithoutPadding))
    }
}, {
    width: tileSizeWithoutPadding + padding * 2,
    height: tileSizeWithoutPadding + padding * 2
});
}

5_04_mega_texture/index.html:288-304 Helper Function to Load Tile Onto Texture

The function accomplishes two things. First, it updates the texture lookup table. For each tile in the lookup table, there is a corresponding entry of two float numbers: the texture coordinates of the tile's upper-left corner on the texture.

Next, we use the fetch API to load the corresponding tile image into an ImageBitmap. Our filename scheme comes in handy now: we rely on the filename to fetch the right tile image.

Finally, we use the copyExternalImageToTexture function to paste the ImageBitmap onto the texture map. The tiles have padding and so does the texture map, so when calculating a tile's position on the texture map, we need to take the padding into account.
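The fetch-and-copy part might look like the following sketch (paddedOrigin is a placeholder for the origin computed from id, as in the listing above; the exact code in the book may differ):

// Fetch the tile image by its file name, decode it, and paste it into the padded spot (sketch).
const response = await fetch('../crab_nebula/crab_' + level + '_' + y + '_' + x + '.png');
const imageBitmap = await createImageBitmap(await response.blob());
device.queue.copyExternalImageToTexture(
    { source: imageBitmap },
    { texture: this.texture, origin: paddedOrigin },   // paddedOrigin: assumed name, computed from `id` as above
    { width: tileSizeWithoutPadding + padding * 2, height: tileSizeWithoutPadding + padding * 2 }
);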

Next, let's look at the VisibleTileHashTable class. It wraps everything related to updating the texture.

class VisibleTileHashTable {
     constructor() {
         this.texture = null;
         this.tileTexCoordBuffer = null;
• • •
        this.tileTexCoordBufferUpdate.unmap();
    }
}

5_04_mega_texture/index.html:269-316 Visible Tile Hash Table

The class contains four members: the texture, the lookup table, an additional buffer for updating the lookup table, and finally the keyIdManager.

After we perform a new round of visibility testing, we call the assembleTexture function to build the new texture. This function maps the update buffer for the lookup table and passes it to the other helper functions explained above.

Finally, with the visibility test done, we run the second pass to actually render the tiles. This is accomplished by the following shader:

@group(0)
 @binding(2)
 var<uniform> level_tile_count: array<vec4<u32>, 8>; //must align to 16bytes
 @group(0)
• • •
    ((y - floor(y)) * tileSizeWithoutPadding + padding) * tileSizeWithoutPadding / ((padding * 2 + tileSizeWithoutPadding) * textureSizeWithoutPadding)), 0);
}

5_04_mega_texture/index.html:118-151 The Shader Does the Actual Rendering

This shader is very similar to the visibility test shader. In fact, the vertex shader is exactly the same, so I omit it here. The fragment shader is also not too different. The first thing to notice is that the hash table has become read-only. We still perform the same calculation to obtain the key of each visible tile.

Use Image to Illustrate?

The calculation looks intimidating, so let's break it down. First, assume there is no padding at all. x - floor(x) is the texture coordinate of the current fragment within the current visible tile. But since the current visible tile occupies only part of the texture, we need to convert this local texture coordinate into the global texture coordinate on the texture.

Recall that our lookup table maps tile keys to the texture coordinates of each tile's upper-left corner on the texture map. Hence hash[base] gives us the global texture coordinates of the upper-left corner, and the global texture coordinates can be obtained as hash[base] + local_coordinates * tile_size / texture_size.

This works when there is no padding. With padding, we need to adjust the local coordinates slightly: first, we multiply the local coordinates by tileSizeWithoutPadding to get local coordinates in pixels; then we add the padding and divide the adjusted coordinates by the tile size with padding to get new local coordinates. The rest of the calculation is the same as above.

With this process, we get the precise texture coordinates for each fragment, and we use them to look up the color value from the texture map.

The JavaScript code that sets up the second rendering pass is shared with the first pass, so we will omit the details, but it is worth examining the navigation code.

let translateMatrix = glMatrix.mat4.lookAt(glMatrix.mat4.create(),
     glMatrix.vec3.fromValues(0, 0, 10), glMatrix.vec3.fromValues(0, 0, 0), glMatrix.vec3.fromValues(0.0, 1.0, 0.0));
 
 let orthProjMatrix = glMatrix.mat4.ortho(glMatrix.mat4.create(), canvas.width * -0.5 * scale, canvas.width * 0.5 * scale, canvas.height * 0.5 * scale, canvas.height * -0.5 * scale, -1000.0, 1000.0);
• • •
        updatedProjectionMatrix = glMatrix.mat4.ortho(glMatrix.mat4.create(), pivotX - canvas.width * 0.5 * scale, pivotX + canvas.width * 0.5 * scale, pivotY + canvas.height * 0.5 * scale, pivotY - canvas.height * 0.5 * scale, -1000.0, 1000.0);
    }
}

5_04_mega_texture/index.html:602-765 Navigation Related Code

Panning is achieved by updating the translation matrix. We use the lookAt function to derive the updated matrix. At the beginning, we look at the origin; mouse moves update both the from and to parameters of lookAt.

Zooming is achieved by updating the projection matrix. For image viewing, we use only an orthographic projection; when zooming in and out, we adjust its viewing range.
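The event handling behind this might look roughly like the sketch below; the handler structure and variable names (dragging, eyeX, eyeY, pivotX, pivotY, scale) are assumptions, but the matrix calls mirror the listing above.

// Pan by moving lookAt's eye and target together; zoom by scaling the orthographic range (sketch).
canvas.onmousemove = (e) => {
    if (!dragging) return;
    eyeX -= e.movementX * scale;
    eyeY += e.movementY * scale;
    translateMatrix = glMatrix.mat4.lookAt(glMatrix.mat4.create(),
        glMatrix.vec3.fromValues(eyeX, eyeY, 10), glMatrix.vec3.fromValues(eyeX, eyeY, 0),
        glMatrix.vec3.fromValues(0.0, 1.0, 0.0));
};
canvas.onwheel = (e) => {
    scale *= e.deltaY > 0 ? 1.1 : 1.0 / 1.1;
    updatedProjectionMatrix = glMatrix.mat4.ortho(glMatrix.mat4.create(),
        pivotX - canvas.width * 0.5 * scale, pivotX + canvas.width * 0.5 * scale,
        pivotY + canvas.height * 0.5 * scale, pivotY - canvas.height * 0.5 * scale,
        -1000.0, 1000.0);
};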

Image to Show the Artifact if No Padding

Finally, let's discuss what happens if we don't include padding in the texture. To see the effect, we can set the padding to zero. As the image shows, we can easily see seams between the tiles. The seams are caused by numerical errors when sampling the texture map: sampling at coordinates close to the border of a tile might read from a neighboring tile. To avoid this, we add an extra buffer zone between tiles, so that sampling close to a tile's border reads from the buffer area instead of from neighboring tiles.

Image to Illustrate Buffering

5.6 Skeleton Animation

So far we have learned how to create animation …

• • •
    return 0;
}

5_06_skeleton_animation/preprocess/main.cpp:1-245 DAE Preprocessor

The library we use to parse the DAE file is called Assimp. It is a generic tool that can parse many common 3D formats. Assimp organizes 3D objects in a scene structure. Since we know there is only one mesh present in our demo file, we simply retrieve mesh zero. After that, we first need to dump the geometry data. This is no different from the obj file: we need to dump the vertices along with the vertex normals and the triangle mesh.

What we need to focus on is how to dump the bones and animations. First of all, the scene struct contains a list of bones, each with a unique name. At the same time, the scene struct also contains an animation list. Since we have only one animation, there is only one animation object. The animation object has multiple animation channels, each associated with a bone. A channel contains a node name which is the same as the name of the associated bone; using this information, we can establish the correspondence between bones and animations.

The scene struct organizes the objects in the scene in a hierarchical manner, which is a common approach for scene management in computer graphics. For example, imagine a 3D car model with four wheels: the four wheels are children of the car body. In a 3D scene, we might have multiple car models; the scene is the root node, the cars are its children, and all the parts and accessories, such as wheels and doors, are children of the car nodes. One benefit of this arrangement is that a transformation applied to a parent node is automatically applied to all its children. For example, moving a car also moves its doors and wheels along with the body. Hence, when calculating the transformation of a leaf node, we need to apply all the transformations accumulated from the scene root.

In our demo scene, we have only one object, so there doesn't seem to be a need for a complex tree. But in addition to the mesh, we also have the bones. The bones are also organized hierarchically, so moving the shoulder will affect the forearm and fingers.

In the second part of the data dumping, we need to dump this hierarchical structure. We start from the scene root. The scene root is not actually a 3D object but the scene itself; since it has no transformation, we simply assign it an identity matrix. For the rest of the nodes, we need to check whether the node is a bone. If it is not a bone, i.e. it is the 3D mesh, we simply save its transformation. If it is a bone, we also have to save its offset matrix, weights and animations.

The offset matrix is an additional offset present only in bones. The weights are a set of tuples over the vertices: the first element of each tuple is a vertex id and the second is an influence value.

As for the animation, it is a sequence of transformations. The length of the sequence is the number of key frames. Each key frame has a timestamp and three transformations: translation, rotation and scaling. Translation and scaling are represented as vectors; the rotation is represented as a quaternion.
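To make the temporal interpolation concrete, a single key-frame sample could be computed roughly as below using glMatrix. This is an illustrative helper, not the book's code; the book's updateAnimation function, shown later, folds this into a larger recursive routine.

// Interpolate translation/scale linearly and rotation spherically, then build the local matrix (sketch).
// Assumes at least two key frames and a `time` inside the key-frame range.
function sampleKeyFrames(time, keyTimes, translations, rotations, scales) {
    let i = 0;
    while (i < keyTimes.length - 2 && keyTimes[i + 1] < time) { i++; }
    const t = (time - keyTimes[i]) / (keyTimes[i + 1] - keyTimes[i]);

    const tr = glMatrix.vec3.lerp(glMatrix.vec3.create(), translations[i], translations[i + 1], t);
    const sc = glMatrix.vec3.lerp(glMatrix.vec3.create(), scales[i], scales[i + 1], t);
    const rot = glMatrix.quat.slerp(glMatrix.quat.create(), rotations[i], rotations[i + 1], t);

    return glMatrix.mat4.fromRotationTranslationScale(glMatrix.mat4.create(), rot, tr, sc);
}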

let boneWeights = new Float32Array(this.objBody.vert.length * 16 / 3);
 
 function assignBoneWeightsToVerticesHelper(bone) {
 
• • •
for (let i = 0; i < this.objBody.skeleton.length; ++i) {
    assignBoneWeightsToVerticesHelper(this.objBody.skeleton[i]);
}

5_06_skeleton_animation/index.html:853-875 Gather Bone Weights

The first step of reading the file is creating a flattened vertex bone weight array. We allocate 16 floats per vertex, including the scene itself as the root: we actually have 13 bones in total, but we need to round the count up to a multiple of four, for a reason we will see later. Since the bones are organized in a tree structure, we rely on a helper function to recursively visit all bone nodes.

updateAnimation(time) {
     let boneTransforms = new Float32Array(16 * 16);
 
     function interpolatedV(time, V, interpolate) {
• • •
    return boneTransforms;
}

5_06_skeleton_animation/index.html:664-843 Given a Timestamp, Return the Current Bone Transformations

Next, we need to look at a helper function: given a timestamp, it returns the current bone transformations. It accomplishes this by first obtaining all the necessary transformations for each bone, including the local translation, rotation and scaling, the transformations accumulated from its parents, and the offset matrix. Then the function interpolates in the temporal direction, since the requested timestamp is not necessarily one of the key-frame timestamps. Finally, it recursively processes all of the node's children. What the function returns is a flattened transformation array for all bones, which we will pass as a uniform to our shader.
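Structurally, the recursion could be pictured like the sketch below. The node fields (isBone, transform, offsetMatrix, boneId, children) are assumptions about the dumped data, and sampleKeyFrames is the illustrative helper from the previous section; the real updateAnimation differs in details.

// Walk the node hierarchy, accumulating transforms from the root down (sketch).
function collectBoneTransforms(node, parentTransform, time, boneTransforms) {
    // bones are animated; other nodes (e.g. the scene root) just carry a static transform
    const local = node.isBone
        ? sampleKeyFrames(time, node.keyTimes, node.translations, node.rotations, node.scales)
        : node.transform;
    const global = glMatrix.mat4.multiply(glMatrix.mat4.create(), parentTransform, local);
    if (node.isBone) {
        // the skinning matrix also folds in the bone's offset matrix
        const skin = glMatrix.mat4.multiply(glMatrix.mat4.create(), global, node.offsetMatrix);
        boneTransforms.set(skin, node.boneId * 16);   // 16 floats per 4x4 matrix
    }
    for (const child of node.children) {
        collectBoneTransforms(child, global, time, boneTransforms);
    }
}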

@group(0) @binding(0)
 var<uniform> modelView: mat4x4<f32>;
 @group(0) @binding(1)
 var<uniform> projection: mat4x4<f32>;
• • •
    return out;
}

5_06_skeleton_animation/index.html:162-252 Animation Shader

The shader is modified from the shadow map shader. The fragment shader is the same, so we only look at the vertex shader. We now pass in two additional pieces of information: the bone transforms, passed as a uniform containing the bone transformations of the current animation frame, and the bone weights, passed in as vertex attributes. Because a vertex attribute can't carry large data like a matrix, we break the weights down into four vectors of four floats each, which is why we previously rounded the number of bones up to 16.

In the vertex shader, we compute a weighted sum of the bone transforms based on the bone weights. We then apply the resulting transform to the vertex, followed by the modelView matrix and the projection matrix as before.

Notice that we assign a dedicated group id to the bone transformation uniform: because this uniform will be updated frequently, we want to separate it from the resources that stay stable.

this.boneWeightBuffer = createGPUBuffer(device, boneWeights, GPUBufferUsage.VERTEX);
 
• • •
const boneWeight0AttribDesc = {
     shaderLocation: 2,
     offset: 0,
• • •
        format: 'depth32float'
    }
};

5_06_skeleton_animation/index.html:881-1185 Setup the Bone Weight Buffer

Next, we look at how to set up the bone weight buffer. We have already created the flat bone weight array; now we set it as the source of the four vertex attributes that contain the bone weights.
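One way to describe that layout is sketched below; the shader locations follow the boneWeight0AttribDesc shown in the listing (location 2 onward), while the grouping into a single buffer layout is an assumption.

// Four float32x4 attributes reading 16 weights per vertex from one buffer (sketch).
const boneWeightAttribs = [0, 1, 2, 3].map((i) => ({
    shaderLocation: 2 + i,   // locations 2..5, after position and normal
    offset: i * 16,          // each vec4<f32> is 16 bytes
    format: 'float32x4'
}));
const boneWeightBufferLayoutDesc = {
    attributes: boneWeightAttribs,
    arrayStride: 64,         // 16 floats * 4 bytes per vertex
    stepMode: 'vertex'
};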

if (!startTime) {
     startTime = timestamp;
 }
 const elapsed = timestamp - startTime;
• • •
let boneTransformBufferUpdate = createGPUBuffer(device, boneTransforms, GPUBufferUsage.COPY_SRC);

• • •
commandEncoder.copyBufferToBuffer(boneTransformBufferUpdate, 0,
     runCube.boneTransformUniformBuffer, 0, boneTransforms.byteLength);
5_06_skeleton_animation/index.html:1464-1567 Animation Update in the Render Loop

Lastly, let's see how the animation transforms are updated: for each frame, we measure the elapsed time and use it to obtain the bone transforms, which we then load into the uniform buffer.
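Put together, the per-frame update might look like this sketch. The looping of the elapsed time (animationDurationMs) and the runCube.updateAnimation call are assumptions; the buffer copy mirrors the listing above.

// requestAnimationFrame-driven update of the bone transform uniform (sketch).
function frame(timestamp) {
    if (!startTime) {
        startTime = timestamp;
    }
    const elapsed = (timestamp - startTime) % animationDurationMs;   // assumed loop length
    const boneTransforms = runCube.updateAnimation(elapsed);

    const boneTransformBufferUpdate = createGPUBuffer(device, boneTransforms, GPUBufferUsage.COPY_SRC);
    const commandEncoder = device.createCommandEncoder();
    commandEncoder.copyBufferToBuffer(boneTransformBufferUpdate, 0,
        runCube.boneTransformUniformBuffer, 0, boneTransforms.byteLength);
    // ... encode the render pass and submit, as in the surrounding listing ...
    device.queue.submit([commandEncoder.finish()]);
    requestAnimationFrame(frame);
}
requestAnimationFrame(frame);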

5.2 Toon Shading

Achieving realism is not the only goal of rendering …

• • •
fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> {
    return vec4<f32>(0.0, 0.0, 0.0, 1.0);
}

5_02_toon_shading/index.html:37-68 Outline Shader

The above shader does the inflation in clip space. The two parameters passed into the vertex shader are the vertex position in 3D and the vertex's normal in 3D. We convert the position to the clip space position as always. For the normal vector, we apply the normal matrix and then the projection; notice that the w component is set to 0.0 because it is a vector.

After that, we need to inflate the clip space position. Since the silhouette is in 2D, we don't want to touch the zw components; we leave them as they are. For the xy components, we offset them by normalize(clip_normal.xy) * 6.4 / screenDim * out.clip_position.w. Here, dividing the clip space normal by the screen dimensions compensates for the screen size and aspect ratio: we want the silhouette width to be uniform regardless of the screen size and aspect ratio. Multiplying by out.clip_position.w is because the graphics pipeline will later divide the clip space position by the w component when converting it to normalized device coordinates; to avoid this conversion changing the silhouette thickness, we premultiply by w.

The fragment shader is simple: we output pure black pixels. Now let's examine the JavaScript code:

let { positionBuffer, normalBuffer, indexBuffer, indexSize } = await loadObj(device, '../data/teapot.obj');
 this.positionBuffer = positionBuffer;
 // The normal buffer contains vertex normals calculated as averages of adjacent surface normals.
 this.normalBuffer = normalBuffer;
• • •
        format: 'depth32float'
    }
}

5_02_toon_shading/index.html:253-656 Outline Pipeline Setup

When we load the obj file, we have already built the normal buffer containing vertex normals calculated as averages of adjacent surface normals. When setting up the pipeline for rendering the outline, we set the cullMode to front to peel away the front side of the inflated object. Also notice that we enable depth testing here.
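The parts of the pipeline descriptor this paragraph refers to could look like the sketch below (shader module, layout and color format names are assumptions; the cull mode and depth format follow the listing):

// Outline pass: cull front faces of the inflated mesh, with depth testing enabled (sketch).
const outlinePipelineDesc = {
    layout: outlinePipelineLayout,                                    // assumed
    vertex: { module: outlineShaderModule, entryPoint: 'vs_main',
              buffers: [positionBufferLayoutDesc, normalBufferLayoutDesc] },
    fragment: { module: outlineShaderModule, entryPoint: 'fs_main',
                targets: [{ format: 'bgra8unorm' }] },                // assumed swap-chain format
    primitive: { topology: 'triangle-list', cullMode: 'front' },      // peel away the front side
    depthStencil: { depthWriteEnabled: true, depthCompare: 'less', format: 'depth32float' }
};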

In the second part of this tutorial, we will look at how to achieve a painterly shading effect. Painterly styles don't usually have the smooth color transitions we see in a physically accurate picture; the color transitions are often discretized. Also, the colors used are often false colors that can't be described by lighting equations that attempt physical accuracy.

But it is not difficult to achieve this discretized false-color effect. All we need to do is introduce another layer of indirection by using a lookup table. We still use the same Phong shading algorithm to calculate lighting as before. The difference is that, previously, we directly calculated the final color; this time, we only calculate a light intensity as a single value, then use a prebuilt lookup table to map the intensity to a color. Our lookup table is 1D and converts a value in the range [0,1] into an arbitrary RGB color. In our setup, the 1D texture contains a few bands of false colors, and our final image will be rendered using these false colors.
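To make this concrete, a banded lookup table could be generated roughly as below (illustrative colors and an assumed rgba8unorm 1D texture named shadeTexture; the book's actual setup is shown a bit later):

// Fill a 128-texel 1D texture with four flat color bands (sketch).
const bands = [[40, 40, 110, 255], [80, 80, 180, 255], [150, 150, 230, 255], [255, 255, 255, 255]];
const lut = new Uint8Array(128 * 4);
for (let i = 0; i < 128; ++i) {
    const band = bands[Math.min(Math.floor((i / 128) * bands.length), bands.length - 1)];
    lut.set(band, i * 4);
}
device.queue.writeTexture({ texture: shadeTexture }, lut,
    { bytesPerRow: 128 * 4 }, { width: 128 });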

Let's look at the shader first. Again, this shader is modified from the shadow map shader.

// Instead of setting colors as RGB, we use scalars here as we only want to calculate intensity.
 const diffuseConstant:f32 = 1.0;
 const specularConstant:f32 = 0.0;
 const ambientConstant:f32 = 0.0;
• • •
var intensity:f32 = max(dot(-lightDir, n), 0.0) * diffuseConstant + specular(-lightDir, viewDir, n, shininess) * specularConstant;
// With the light intensity, we look up the final color.
var diffuse:vec3<f32> = textureSample(t_shade, s_shade, intensity * visibility).xyz;

5_02_toon_shading/index.html:86-171 Painterly Shader

And finally, let's see, on the JavaScript side, how this 1D lookup texture is set up.

// 1D texture, width is hardcoded to 128.
 const shadeTextureDesc = {
     size: [128],
     dimension: '1d',
• • •
}, { width: 128 });
// Wait for completion.
await device.queue.onSubmittedWorkDone();

5_02_toon_shading/index.html:260-303 Setup 1D Lookup Texture

Together with the outline shader, we now have the toon shading effect:

Image to Show the Result

5.5 Transparency with Depth Peeling

Transparency is another example …

• • •
//return vec4<f32>(uv.xy, 0.0, 1.0);
return vec4<f32>(radiance * diffuseColor.w, diffuseColor.w);
}

5_05_transparency/index.html:96-197 Depth Peeling Shader

This shader should look familiar: it is almost the same lighting shader, with a slight difference. It loads a depth map and compares the current depth value with the value stored in the depth map; it renders the fragment only when the new depth is larger than the one in the depth map. This is how we peel away everything in front of the stored depth.

out.clip_position = projection * wldLoc;
 out.inPos = projection * wldLoc;
 
• • •
var uv:vec2<f32> = 0.5*(in.inPos.xy/in.inPos.w + vec2(1.0,1.0));
 var visibility:f32 = textureSampleCompare(
• • •
debug = in.clip_position;
//debug = in.inPos;
//debug = vec4(uv, in.inPos.z/in.inPos.w, in.clip_position.z);

5_05_transparency/index.html:153-176 Calculate Depth and Uv Coordinates

What is worth noticing is the logic above for calculating the uv coordinates used to fetch from the depth map, as well as the current depth. What may appear strange is that I have two variables, clip_position and inPos, both of which seem to hold the clip space position vector; they appear to be duplicates.

I do this intentionally, to show you that what is written to clip_position in the vertex shader is changed before it reaches the fragment shader. Once we read it back in the fragment shader, it is no longer the same clip position, even though the variable's name is the same.

Recall from our introduction of the GPU pipeline that during the vertex shader stage we only sparsely define properties at each vertex. Then we go through a step called rasterization, during which triangle geometry is converted into fragments, similar to laying bricks, and the values defined on each fragment are obtained through bilinear interpolation. Interpolation explains some of the value changes, but not all. Another obvious change is that the pipeline also alters the coordinate system: for x and y, it converts from clip space to normalized device coordinates, and the normalized device coordinates are then converted to framebuffer coordinates (the top-left corner is at (0.0, 0.0); x increases to the right; y increases down). For z, the value is mapped into the viewport depth range: vp.minDepth + n.z × (vp.maxDepth - vp.minDepth).

There is a technical term for the values passed into the fragment shader: the RasterizationPoint. Bear in mind that the data changes even though the variable name stays the same.

With that in mind, it shouldn't be difficult to understand the uv calculation: we map the NDC range [-1,1] to [0,1], and we need to flip the y axis, since for texture coordinates y increases downward. For the z value, we just read it from clip_position.z.

now, let's see how this pipeline is set up.

the first thing is that we need to have an empty canvas to serve as the background. the order of the rendering is that we render from the front to the back order and finally we overlay what's been rendered on the background.

at the beginning of each round, we need to clear our dst texture to (0,0,0,1). This is easy to achieve using the built-in functionality of clearing color attachments when loading, without needing to draw anything. the dst texture is the layer zero for our front-to-back alpha blending. The first layer will blend with this texture, and the second layer will blend with the resulting texture and so on. and finally the dst texture should have all layers composed in the front to back order. we then apply it to a background.

you may wonder why the dst texture map is cleaned up with the value (0,0,0,1)? rather than (0,0,0,0). shouldn't the layer zero be completely transparent. yes, the first layer should be fully transparent and have the color (0,0,0,0). but recall that when doing the front-to-back blending, we utilize the alpha channel to save (1.0 - A_{dst}) rather than the A_{dst} itself. hence we assign the alpha to 1.0 here.

const renderPassCleanupDesc = {
+

Worth noticing is the logic above for calculating the uv coordinates used to fetch from the depth map, as well as the current depth. What may appear strange is that there are two variables, clip_position and inPos, both of which seem to store the clip-space position vector; they look like duplicates.

I do this intentionally to show that whatever is written to clip_position in the vertex shader is changed before it reaches the fragment shader. Once we read it back in the fragment shader, it is no longer the same clip position, even though the variable name is the same.

Recall from our introduction to the GPU pipeline that during the vertex shader stage we only sparsely define properties on each vertex, and then a step called rasterization converts the triangle geometry into fragments, much like laying bricks. The values defined on each fragment are obtained through interpolation. Interpolation explains some of the value changes, but not all: the pipeline also changes the coordinate system. For x and y, clip space is converted to normalized device coordinates, and the normalized device coordinates are then converted to framebuffer coordinates (the top-left corner is at (0.0, 0.0), x increases to the right, and y increases down). For z, the value is mapped into the viewport depth range: depth = vp.minDepth + n.z * (vp.maxDepth - vp.minDepth), where n is the position in normalized device coordinates.

There is a technical term for the values passed into the fragment shader: the RasterizationPoint. Bear in mind that the data changes even though the variable names stay the same.

With that in mind, the uv calculation shouldn't be difficult to understand: we map the NDC range [-1,1] to [0,1], and we need to flip the y axis because texture coordinates have y increasing downward. For the z value, we simply read clip_position.z.
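
For reference, the same mapping written out as plain arithmetic (a CPU-side illustration only; the shader performs this per fragment):

// Map a clip-space position (x, y, z, w) to texture uv and a [0,1] depth value.
// Mirrors the shader math: perspective divide, remap [-1,1] -> [0,1], flip y.
function clipToUvDepth([x, y, z, w]) {
    const ndcX = x / w, ndcY = y / w, ndcZ = z / w; // normalized device coordinates
    return {
        u: 0.5 * (ndcX + 1.0),
        v: 1.0 - 0.5 * (ndcY + 1.0), // flip: texture y increases downward
        depth: ndcZ                  // WebGPU NDC z is already in [0, 1]
    };
}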

Now, let's see how this pipeline is set up.

The first thing we need is an empty canvas to serve as the background. The rendering order is front to back: we render the layers from front to back and finally overlay the composed result on the background.

At the beginning of each round, we need to clear our dst texture to (0,0,0,1). This is easy to achieve using the built-in functionality of clearing color attachments on load, without drawing anything. The dst texture is layer zero for our front-to-back alpha blending: the first layer blends with this texture, the second layer blends with the resulting texture, and so on. In the end, the dst texture holds all layers composed in front-to-back order, and we then apply it to the background.

You may wonder why the dst texture is cleared to (0,0,0,1) rather than (0,0,0,0). Shouldn't layer zero be completely transparent? Yes, the first layer should be fully transparent and have the color (0,0,0,0), but recall that when doing front-to-back blending we use the alpha channel to store (1.0 - A_{dst}) rather than A_{dst} itself. Hence we set the alpha to 1.0 here.
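
To make the bookkeeping concrete, here is a small CPU-side sketch of this front-to-back compositing, assuming premultiplied layer colors. It only illustrates the math; it is not the GPU blend configuration used later.

// Front-to-back compositing with the accumulator's alpha channel storing the
// remaining transmittance (1 - A_dst). dst starts as [0, 0, 0, 1].
function compositeUnder(dst, layer) {
    const [r, g, b, t] = dst;          // t = 1 - accumulated alpha
    const [lr, lg, lb, la] = layer;    // premultiplied color, layer alpha
    return [
        r + t * lr,
        g + t * lg,
        b + t * lb,
        t * (1.0 - la)                 // update remaining transmittance
    ];
}

// Example: composing two layers front to back over an initially empty dst.
let dstExample = [0, 0, 0, 1];
dstExample = compositeUnder(dstExample, [0.5, 0.0, 0.0, 0.5]); // front layer
dstExample = compositeUnder(dstExample, [0.0, 0.4, 0.0, 0.4]); // layer behind it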

const renderPassCleanupDesc = {
     colorAttachments: [{
         view: dstTexture.createView(),
         clearValue: { r: 0, g: 0, b: 0, a: 1 },
@@ -288,7 +288,7 @@ 

5.5 Transparency with Depth Peeling

transparency is another example

• • •
let passEncoderCleanup = commandEncoder.beginRenderPass(renderPassCleanupDesc);
 passEncoderCleanup.setViewport(0, 0, canvas.width, canvas.height, 0, 1);
 passEncoderCleanup.end();
-
5_05_transparency/index.html:1218-1269 Clean Up the Background

An Image to Show Individual Layers
An Image to Show Individual Layers

For the actual rendering, we first define two depth maps for alternating use. For each peeling step we want to read depth values from one of them and write to the other; since we can't read from and write to the same depth map, we need two depth maps used in a ping-pong pattern.

depthAttachment0 = {
     view: depthTexture1.createView(),
     depthClearValue: 1,
     depthLoadOp: 'clear',
@@ -342,7 +342,7 @@ 

5.5 Transparency with Depth Peeling

transparency is another example blender.encode(passEncoder1); passEncoder1.end(); } -

5_05_transparency/index.html:1148-1288 The Actual Rendering Logic

At the beginning, we clear the depth maps to all ones, because one is the maximum possible depth value. This ensures that during the first peeling step we can render the frontmost layer.

For the color attachment, we always clear it to (0,0,0,0) for the front-to-back rendering.

Notice that we run the peeling step 6 times. This is a hardcoded value, meaning we can peel at most 6 layers; if the scene contains more layers, our program cannot handle them. This is a drawback of the current implementation: we have to predetermine the maximum number of supported layers.
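
Structurally, the peeling loop just alternates the two depth maps and blends each newly peeled layer into the accumulator. The skeleton below uses hypothetical callbacks (peelLayer, blendLayer) and is not the chapter's full command-encoding code.

// Skeleton of the depth-peeling loop: ping-pong the two depth textures and
// blend each newly peeled layer onto the accumulated dst texture.
const MAX_PEEL_LAYERS = 6; // hardcoded upper bound on peelable layers

function depthPeel(depthTexA, depthTexB, peelLayer, blendLayer) {
    for (let i = 0; i < MAX_PEEL_LAYERS; ++i) {
        const readDepth = (i % 2 === 0) ? depthTexA : depthTexB;  // peeled-so-far depth
        const writeDepth = (i % 2 === 0) ? depthTexB : depthTexA; // this layer's depth
        peelLayer(readDepth, writeDepth); // render scene, discarding already-peeled fragments
        blendLayer();                     // compose the new layer onto dst, front to back
    }
}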

An Image to Show the Rendering Order, Front to Back and Then on the Background
An Image to Show the Rendering Order, Front to Back and Then on the Background

After rendering the objects, we call into the blender to compose the newly rendered layer onto the existing layers. We will explain the blender in detail later.

struct VertexOutput {
     @builtin(position) clip_position: vec4<f32>,
     @location(0) tex_coords: vec2<f32>
 };
@@ -368,7 +368,7 @@ 

5.5 Transparency with Depth Peeling

transparency is another example var color:vec4<f32> = textureSample(t_src, s, in.tex_coords); return color; } -

5_05_transparency/index.html:38-63 Blending Shader

The shader is very simple: it just loads a texture (the rendered layer) and applies it to the framebuffer. The crucial part is the alpha-blending setup of the pipeline:

const colorState = {
     format: 'bgra8unorm',
     blend: {
         color: {
@@ -383,7 +383,7 @@ 

5.5 Transparency with Depth Peeling

transparency is another example } } }; -

5_05_transparency/index.html:387-401 Blending Function Setup

Notice that this is the same blending equation we have explained above, under the condition that the src color is already premultiplied (see the shader for object rendering).

const renderPassBlend = {
     colorAttachments: [{
         view: dstTexture.createView(),
         clearValue: { r: 0, g: 0, b: 0, a: 0 },
@@ -395,7 +395,7 @@ 

5.5 Transparency with Depth Peeling

transparency is another example passEncoder1.setViewport(0, 0, canvas.width, canvas.height, 0, 1); blender.encode(passEncoder1); passEncoder1.end(); -

5_05_transparency/index.html:1235-1287 Render Pass for Blending

The above is the render pass descriptor. It starts with the cleared dst texture. Notice that the dst texture was cleared to (0,0,0,1), and from then on we never clear it again; we load it for each blending operation, because it contains the layers we have already composed.

struct VertexOutput {
     @builtin(position) clip_position: vec4<f32>,
     @location(0) tex_coords: vec2<f32>
 };
@@ -438,7 +438,7 @@ 

5.5 Transparency with Depth Peeling

transparency is another example

• • •
let finalEncoder = commandEncoder.beginRenderPass(renderPassFinal);
 final.encode(finalEncoder);
 finalEncoder.end();
-
5_05_transparency/index.html:68-1291 Final Step

Finally, in the very last step, we render the composed layers onto a black background using back-to-front blending. Since our color is premultiplied, we use the formula C_{src} + (1 - A_{src}) * C_{dst}.
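
In WebGPU blend-state terms, that formula corresponds to a source factor of one and a destination factor of one-minus-src-alpha. The color state below is a minimal sketch that mirrors the formula; it is not copied from the chapter's code.

// Back-to-front 'over' blending for premultiplied colors:
// out = C_src * 1 + C_dst * (1 - A_src)
const finalColorState = {
    format: 'bgra8unorm',
    blend: {
        color: { srcFactor: 'one', dstFactor: 'one-minus-src-alpha', operation: 'add' },
        alpha: { srcFactor: 'one', dstFactor: 'one-minus-src-alpha', operation: 'add' }
    }
};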

Image to Show Rendering Result
Image to Show Rendering Result

1.3 Applying Hardcoded Vertex Colors

Welcome to our fourth tutorial! out.color = vec3<f32>(0.0, 0.0, 1.0); return out; } -

1_03_vertex_color/index.html:9-22 Vertex Stage

Let's examine the changes. In the vertex output struct, we've introduced a new field called color. Since there are no built-ins for vertex color, we use @location(0) to store it. At the end of the vertex stage, we assign a hard-coded color to out.color.

@fragment
 fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> {
     return vec4<f32>(in.color, 1.0);
 }
-

1_03_vertex_color/index.html:24-27 Fragment Stage

In the fragment stage, we can now access this color from the input and directly pass it as the output. Despite this change, you'll observe the same triangle being rendered.

Now, let's consider an important aspect of GPU rendering. In the vertex stage, we process individual vertices, while in the fragment stage, we deal with individual fragments. A fragment is conceptually similar to a pixel but can contain rich metadata such as depth and other values.

Between the vertex and fragment stages, there's an automatic process called rasterization, handled by the GPU. This process converts geometry data to fragments.

Here's an interesting question to ponder: If we assign a different color to each vertex, how will the GPU assign colors to the fragments, especially for those fragments that lie in the middle of the triangle and not directly on any of the vertices?

I encourage you to modify the sample code and experiment with this concept yourself. Try assigning different colors to each vertex and observe how the GPU interpolates these colors across the triangle's surface. This exercise will deepen your understanding of how data flows through the GPU pipeline and how the interpolation process works.
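
If you want a starting point for that experiment, one possible approach (a sketch, not the book's solution) is to index a small color array with the vertex index inside the vertex stage and let the rasterizer interpolate the result:

// Sketch for the experiment above: give each of the three vertices its own color.
// This would replace the hardcoded out.color assignment in the vertex stage.
const perVertexColorWGSL = `
    var colors = array<vec3<f32>, 3>(
        vec3<f32>(1.0, 0.0, 0.0),   // vertex 0: red
        vec3<f32>(0.0, 1.0, 0.0),   // vertex 1: green
        vec3<f32>(0.0, 0.0, 1.0)    // vertex 2: blue
    );
    out.color = colors[in_vertex_index];
`;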

1.0 Creating an Empty Canvas

Creating an empty canvas might initiall console.error("Failed to request Adapter."); return; } -

1_00_empty_canvas/index.html:8-18 Obtain an Adapter

Following this, we proceed to acquire an adapter through navigator.gpu and subsequently obtain a device via the adapter. Admittedly, this process might appear somewhat verbose in comparison to WebGL, where a single handle (referred to as glContext) suffices for interaction. Here, navigator.gpu serves as the entry point to the WebGPU realm. An adapter, in essence, is an abstraction of a software component that implements the WebGPU API. It draws a parallel to the concept of a driver introduced earlier. However, considering that WebGPU is essentially an API implemented by web browsers rather than directly provided by GPU drivers, the adapter can be envisioned as the WebGPU software layer within the browser. In Chrome's case, the adapter is provided by the "Dawn" subsystem. It's worth noting that multiple adapters can be available, offering diverse implementations from different vendors or even including debug-oriented dummy adapters that generate verbose debug logs without actual rendering capabilities. Subsequently, the adapter yields a device, which is an instantiation of that adapter. An analogy can be drawn here to JavaScript, where an adapter can be likened to a class, and a device, an object instantiated from that class.

The specification emphasizes the need to request a device shortly after an adapter request, as adapters have a limited validity duration. While the inner workings of adapter invalidation remain somewhat obscure without knowing the inter workings, it's not a critical concern for software developers. An instance of adapter invalidation is cited in the specification: unplugging the power supply of a laptop can render an adapter invalid. When a laptop transitions to battery mode, the operating system might activate power-saving measures that invalidate certain GPU functions. Some laptops even boast dual GPUs for distinct power states, which can trigger similar invalidations during switches between them. Other reasons for this behavior, per the specification, include driver updates, etc.

Typically, when requesting a device, we need to specify a set of desired features. The adapter then responds with a matching device. This process can be likened to providing parameters to a class constructor. For this example, however, I'm opting to request the default device. In the forthcoming chapters, I'll discuss querying devices using feature flags, providing more comprehensive examples.
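
As a preview of what feature-based requests look like (we stick with the default device in this chapter), a request for an optional feature might be sketched as follows; the specific feature name is only an example.

// Illustration only: request an optional feature if the adapter supports it.
const requiredFeatures = [];
if (adapter.features.has('texture-compression-bc')) {
    requiredFeatures.push('texture-compression-bc');
}
const featureDevice = await adapter.requestDevice({ requiredFeatures });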

let device = await adapter.requestDevice();
 if (!device) {
     console.error("Failed to request Device.");
     return;
@@ -206,7 +206,7 @@ 

1.0 Creating an Empty Canvas

Creating an empty canvas might initiall }; context.configure(canvasConfig); -

With the device acquired, the next step is to configure the context to ensure the canvas is appropriately set up. This involves specifying the color format, transparency preferences, and a few other options. Context configuration is achieved by providing a canvas configuration structure. In this instance, we'll focus on the essentials.

The format parameter dictates the pixel format used for rendering outcomes on the canvas. We'll use the default format for now. The usage parameter pertains to the "buffer usage" of the texture provided by the canvas. Here, we designate RENDER_ATTACHMENT to signify that this canvas serves as the rendering destination. We will address the intricacies of buffer usage in upcoming chapters. Lastly, the alphaMode parameter offers a toggle for adjusting the canvas's transparency.
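
A minimal configuration along those lines might look like the following sketch; the exact values are illustrative rather than copied from the chapter's code.

// Illustrative canvas configuration: pixel format, usage, and alpha mode.
const canvasConfigSketch = {
    device: device,
    format: navigator.gpu.getPreferredCanvasFormat(), // e.g. 'bgra8unorm'
    usage: GPUTextureUsage.RENDER_ATTACHMENT,         // the canvas is a render target
    alphaMode: 'opaque'                               // no transparency against the page
};
context.configure(canvasConfigSketch);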

let colorTexture = context.getCurrentTexture();
 let colorTextureView = colorTexture.createView();
 
 let colorAttachment = {
@@ -219,7 +219,7 @@ 

1.0 Creating an Empty Canvas

Creating an empty canvas might initiall const renderPassDesc = { colorAttachments: [colorAttachment] }; -

1_00_empty_canvas/index.html:36-48 Create a Render Target

Moving forward, our focus shifts to configuring a render pass. A render pass acts as a container for the designated rendering targets, encompassing elements like color images and depth images. To use an analogy, a rendering target is like a piece of paper we want to draw on. But how is it different from the canvas we just configured?

If you have used Photoshop before, think of the canvas as an image document containing multiple layers. Each layer can be likened to a rendering target. Similarly, in 3D rendering, we sometimes can't accomplish rendering using a single layer, so we render multiple times. Each rendering session, called a rendering pass, outputs the result to a dedicated rendering target. In the end, we combine these results and display them on the canvas.

Our first step involves obtaining a texture from the canvas. In rendering systems, this process is often implemented through a swap chain—a list of buffers facilitating rendering across multiple frames. The graphics subsystem recycles these buffers to eliminate the need for constant buffer creation. Consequently, before initiating rendering, we must procure an available buffer (texture) from the canvas.

Following this, we generate a view linked to the texture. You might wonder about the distinction between a texture and a texture view. Contrary to popular belief, a texture isn't necessarily a single image; it can encompass multiple images. For example, in the context of mipmaps, each mipmap level qualifies as an individual image. If mipmaps are new to you, a mipmap is a pyramid of the same image at different levels of detail; mipmaps are very useful for improving texture-sampling quality. We'll discuss them in later chapters. The key point is that a texture isn't synonymous with an image, and in this context, we need a single image (a view) as our rendering target.
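
For a taste of what that looks like (mipmaps are covered properly later), a texture with a full mip chain could be created roughly as below; the sizes and usage flags are made up for illustration.

// Illustrative only: a 2D texture with a full mipmap pyramid.
const mipWidth = 1024, mipHeight = 1024;
const mipLevelCount = Math.floor(Math.log2(Math.max(mipWidth, mipHeight))) + 1; // 11 levels
const mippedTexture = device.createTexture({
    size: [mipWidth, mipHeight],
    mipLevelCount,
    format: 'rgba8unorm',
    usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST
});
// A view can then select a single mip level, for example the base level:
const baseLevelView = mippedTexture.createView({ baseMipLevel: 0, mipLevelCount: 1 });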

We then create a colorAttachment, which acts as the color target within the render pass. A color attachment can be thought of as a buffer that holds color information or pixels. While we previously compared a rendering target to a piece of paper, it often consists of multiple buffers, not just one. These additional buffers act as scratch spaces for various purposes and are typically invisible, storing data that may not necessarily represent pixels. A common example is a depth buffer, used to determine which pixels are closest to the viewer, enabling effects like occlusion. Although we could include a depth buffer in this setup, our simple example only aims to clear the canvas with a solid color, making a depth buffer unnecessary.

Let's break down the parameters of colorAttachment:

commandEncoder = device.createCommandEncoder();
 
 passEncoder = commandEncoder.beginRenderPass(renderPassDesc);
@@ -227,7 +227,7 @@ 

1.0 Creating an Empty Canvas

Creating an empty canvas might initiall passEncoder.end(); device.queue.submit([commandEncoder.finish()]); -

1_00_empty_canvas/index.html:49-55 Encode and Submit a Command

In the final stages, we create a command and submit it to the GPU for execution. This particular command is straightforward: it sets the viewport dimensions to match those of the canvas. Since we're not drawing anything, the rendering target will simply be cleared with the default clearValue, as specified by our loadOp.

During development, it's advisable to use distinctive colors for debugging purposes. In this case, we choose red instead of the more conventional black or white. This decision is strategic: black and white are common default colors used in many contexts. For instance, the default webpage background is typically white. Using white as the clear color could be misleading, potentially obscuring whether rendering is actually occurring or if the canvas is missing altogether. By opting for a vibrant red, we ensure a clear visual indicator that rendering operations are indeed taking place.

This approach provides an unambiguous signal of successful execution, making it easier to identify and troubleshoot any issues that may arise during the development process.

Debugging GPU code is significantly more challenging than CPU code. Generating logs from GPU execution is complex due to the parallel nature of GPU operations. This complexity also makes traditional debugging methods, such as setting breakpoints and pausing execution, impractical. In this context, color becomes an invaluable debugging tool. By associating distinct colors with different meanings, we can enhance our ability to interpret results accurately. As we progress through subsequent chapters, we'll explore various examples demonstrating how colors serve as an essential debugging aid in GPU programming.

In addition, experienced graphics programmers employ other strategies to enhance code readability, maintainability and debuggability.

  1. Descriptive variable naming: Graphics APIs can be verbose, with seemingly repetitive code blocks throughout the source. Using detailed, descriptive names for variables helps identify and navigate the code efficiently.

  2. Incremental development: It's advisable to start simple and gradually build complexity. Often, this means rendering solid color objects first before adding more sophisticated effects.

  3. Consistent coding patterns: Establishing and following consistent patterns in your code can significantly improve readability and reduce errors.

  4. Modular design: Breaking down complex rendering tasks into smaller, manageable functions or modules can make the code easier to understand and maintain.

By adopting these practices, developers can create more robust, readable, and easily debuggable GPU code, even in the face of the unique challenges presented by graphics programming.

Launch Playground - 1_00_empty_canvas

The code in this chapter produces an empty canvas rendered in red. Please use the playground to interact with the code. Try changing the background to a different color.

1.5 Drawing a Colored Triangle with a Single Buffer

In our previous offset: 4 * 3, format: 'float32x3' }; -

1_05_colored_triangle_with_a_single_buffer/index.html:52-62 Modified Color Attribute Descriptor With an Offset

First, we modify the color attribute descriptor. We introduce an offset because we're interleaving position and color data. The offset signifies the beginning of the first color data within the buffer. Since we've placed color after vertex positions, the offset is set to 12 bytes (4 bytes * 3), which is the size of a vertex position vector.
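
Concretely, the color attribute descriptor carries that 12-byte offset roughly as follows; the shaderLocation value of 1 is an assumption for illustration rather than a quote of the chapter's code.

// Color attribute read from the same interleaved buffer, starting after the
// 3-float position (3 * 4 bytes = 12-byte offset).
const colorAttribDescSketch = {
    shaderLocation: 1,   // assumed to match @location(1) in the vertex shader
    offset: 4 * 3,       // skip the preceding position floats
    format: 'float32x3'
};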

const positionColorBufferLayoutDesc = {
     attributes: [positionAttribDesc, colorAttribDesc],
    arrayStride: 4 * 6, // sizeof(float) * 6: three position floats + three color floats
     stepMode: 'vertex'
 };
-

Secondly, instead of creating separate buffers for positions and colors, we now use a single buffer called positionColorBuffer. When creating the descriptor for this buffer, we include both attributes in the attribute list. The arrayStride is set to 24 bytes (4 * 6) instead of 12, because each vertex now has 6 float numbers associated with it (3 for position, 3 for color).

const positionColors = new Float32Array([
     1.0, -1.0, 0.0, // position
     1.0, 0.0, 0.0, // 🔴
     -1.0, -1.0, 0.0, 
@@ -182,7 +182,7 @@ 

1.5 Drawing a Colored Triangle with a Single Buffer

In our previous ]); let positionColorBuffer = createGPUBuffer(device, positionColors, GPUBufferUsage.VERTEX); -

When creating data for this buffer, we supply 18 floating-point numbers (3 vertices * 6 floats per vertex), with positions and colors interleaved:

const pipelineDesc = {
     layout,
     vertex: {
         module: shaderModule,
@@ -206,7 +206,7 @@ 

1.5 Drawing a Colored Triangle with a Single Buffer

In our previous passEncoder.setVertexBuffer(0, positionColorBuffer); passEncoder.draw(3, 1); passEncoder.end(); -

1_05_colored_triangle_with_a_single_buffer/index.html:84-123 Pipeline Descriptor and Draw Command Formation

In the pipeline descriptor, we now only have one buffer descriptor in the buffers field. When encoding the render command, we set only one buffer at slot zero:

This program is functionally equivalent to the previous one, but it uses a single buffer instead of two. Using a single buffer in this case is more efficient because it avoids creating extra resources and eliminates the need to copy data twice from the CPU to the GPU. Transferring data from CPU to GPU incurs latency, so minimizing these transfers is beneficial for performance.

You might wonder when to use multiple buffers versus a single buffer. It depends on how frequently the attributes change. In this example, both the positions and colors remain constant throughout execution, making a single buffer suitable. However, if we need to update vertex colors frequently, perhaps in an animation, separating the attributes into different buffers might be preferable. This way, we can update color data without transferring the unchanged position data to the GPU.
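
For the animated-color case described above, a separate color buffer could be refreshed each frame with writeBuffer, along these lines (a sketch with made-up names such as colorBuffer, not code from this chapter):

// Keep colors in their own buffer so they can be updated without touching positions.
const animatedColors = new Float32Array(9); // 3 vertices * RGB

function updateColors(time) {
    for (let v = 0; v < 3; ++v) {
        animatedColors[v * 3 + 0] = 0.5 + 0.5 * Math.sin(time + v);
        animatedColors[v * 3 + 1] = 0.5 + 0.5 * Math.cos(time + v);
        animatedColors[v * 3 + 2] = 0.5;
    }
    // Re-upload only the 36 bytes of color data; the position buffer stays untouched.
    device.queue.writeBuffer(colorBuffer, 0, animatedColors);
}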

1.1 Drawing a Triangle

Our first tutorial is a bit boring as we were fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> { return vec4<f32>(0.3, 0.2, 0.1, 1.0); } -

1_01_triangle/index.html:8-26 Shader to Render a Triangle

Our first shader renders a triangle in a solid color. Despite sounding simple, the code may seem complex at first glance. Let's dissect it to understand its components better.

Shader programs define the behavior of a GPU pipeline. A GPU pipeline works like a small factory, containing a series of stages or workshops. A typical GPU pipeline consists of two main stages:

  1. Vertex Stage: Processes geometry data and generates canvas-aligned geometries.

  2. Fragment Stage: After the GPU converts the output from the vertex stage into fragments, the fragment shader assigns each of them a color.

In our shader code, there are two entry functions:

While the input to the vs_main function @builtin(vertex_index) in_vertex_index: u32 looks similar to function parameters in languages like C, it's different. Here, in_vertex_index is the variable name, and u32 is the type (a 32-bit unsigned integer). The @builtin(vertex_index) is a special decorator that requires explanation.

In WGSL, shader inputs aren't truly function parameters. Instead, imagine a predefined form with several fields, each with a label. @builtin(vertex_index) is one such label. For a pipeline stage's inputs, we can't freely feed any data; we must select fields from this predefined set. In this case, @builtin(vertex_index) is the actual parameter name, and in_vertex_index is just an alias we've given it.

The @builtin decorator indicates a group of predefined fields. We'll encounter other decorators like @location, which we'll discuss later to understand their differences.

Shader stage outputs follow a similar principle. We can't output arbitrary data; instead, we populate a few predefined fields. In our example, we're outputting a struct VertexOutput, which appears custom-defined. However, it contains a single predefined field @builtin(position), where we'll write our result.

The Screen Space Coordinate System.
The Screen Space Coordinate System.

The content of the vertex shader may seem puzzling at first. Before we delve into it, let me explain the primary goal of a vertex shader. A vertex shader receives geometries as individual vertices. At this stage, we lack geometry connectivity information, meaning we don't know which vertices connect to form triangles. This information is not available to us. We process individual vertices with the aim of converting their positions to align with the canvas.

Without this conversion, the vertices wouldn't be visible correctly. Vertex positions, as received by a vertex shader, are typically defined in their own coordinate system. To display them on the canvas, we must unify the coordinate systems used by the input vertices into the canvas' coordinate system. Additionally, vertices can exist in 3D space, while the canvas is always 2D. In computer graphics, the process of transforming 3D coordinates into 2D is called projection.

Now, let's examine the coordinate system of the canvas. This system is usually referred to as the screen space or clip space. Although in WebGPU we typically render to a canvas rather than directly to a screen, the term "screen space coordinate system" is inherited from other native 3D APIs.

The screen space coordinate system has its origin at the center, with both x and y coordinates confined within the [-1, 1] range. This coordinate system remains constant regardless of your screen or canvas size.

Recall from the previous tutorial that we can define a viewport, but this doesn't affect the coordinate system. This may seem counter-intuitive. The screen space coordinate system remains unchanged regardless of your viewport definition. A vertex is visible as long as its coordinates fall within the [-1, 1] range. The rendering pipeline automatically stretches the screen space coordinate system to match your defined viewport. For instance, if you have a viewport of 640x480, even though the aspect ratio is 4:3, the canvas coordinate system still spans [-1, 1] for both x and y. If you draw a vertex at location (1, 1), it will appear at the upper right corner. However, when presented on the canvas, the location (1, 1) will be stretched to (640, 0).

The Visible Area Remains Constant Regardless of Screen Size or Aspect Ratio.
The Visible Area Remains Constant Regardless of Screen Size or Aspect Ratio.

In the code above, our inputs are vertex indices rather than vertex positions. Since a triangle has three vertices, the indices are 0, 1, and 2. Without vertex positions as input, we generate their positions based on these indices, instead of performing vertex position transformation. Our goal is to generate a unique position for each index while ensuring that the position falls within the [-1, 1] range, making the entire triangle visible. If we substitute 0, 1, 2 for vertex_index, we'll get the positions (0.5, -0.5), (0, 0.5), and (-0.5, -0.5) respectively.

let x = f32(1 - i32(in_vertex_index)) * 0.5;
 let y = f32(i32(in_vertex_index & 1u) * 2 - 1) * 0.5;

A clip location (a position in the clip space) is represented by a 4-float vector, not just 2. For our 2D triangle in screen space, the third component is always zero. The last value is set to 1.0. We'll delve into the details of the last two values when we explore camera and matrix transformations later.

As mentioned earlier, the outputs of the vertex stage undergo rasterization. This process generates fragments with interpolated vertex values. In our simple example, the only interpolated value is the vertex position.

The fragment shader's output is defined by another predefined field called @location(0). Each location can store up to 16 bytes of data, equivalent to four 32-bit floats. The total number of available locations is determined by the specific WebGPU implementation.

To understand the distinction between locations and builtins, we can consider locations as unstructured custom data. They have no labels other than an index. This concept parallels the HTTP protocol, where we have a structured message header (akin to builtins) followed by a body or payload (similar to locations) that can contain arbitrary data. If you're familiar with decoding binary files, it's comparable to having a structured header with metadata, followed by a chunk of data as the payload. In our context, builtins and locations share this conceptual structure.

Our fragment shader in this example is straightforward: it simply outputs a solid color to @location(0).

let code = document.getElementById('shader').innerText;
 const shaderDesc = { code: code };
 let shaderModule = device.createShaderModule(shaderDesc);
-

Writing the shader code is just one part of rendering a simple triangle. Let's now examine how to modify the pipeline to incorporate this shader code. The process involves several steps:

  1. We retrieve the shader code string from our first script tag. This is where the tag's id='shader' attribute becomes crucial.

  2. We construct a shader description object that includes the source code.

  3. We create a shader module by providing the shader description to the WebGPU API.

It's worth noting that we haven't implemented error handling in this example. If a compilation error occurs, we'll end up with an invalid shader module. In such cases, the browser's console messages can be extremely helpful for debugging.

Typically, shader code is defined by developers during the development stage, and it's likely that all shader issues will be resolved before the code is deployed. For this reason, we've omitted error handling in this basic example. However, in a production environment, implementing robust error checking would be advisable.
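
If you do want that robustness, WGSL compilation messages can be inspected after the module is created; the check below is a small sketch and not part of this chapter's code.

// Optional diagnostics: surface WGSL compilation errors instead of failing silently.
const compilationInfo = await shaderModule.getCompilationInfo();
for (const msg of compilationInfo.messages) {
    if (msg.type === 'error') {
        console.error(`Shader error at line ${msg.lineNum}: ${msg.message}`);
    }
}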

const pipelineLayoutDesc = { bindGroupLayouts: [] };
 const layout = device.createPipelineLayout(pipelineLayoutDesc);
-

Next, we define the pipeline layout. But what exactly is a pipeline layout? It refers to the structure of constants we intend to provide to the pipeline. Each layout represents a group of constants we want to feed into the pipeline.

A pipeline can have multiple groups of constants, which is why bindGroupLayouts is defined as a list. These constants maintain their values throughout the execution of the pipeline.

In our current example, we're not providing any constants at all. Consequently, our pipeline layout is empty.

const colorState = {
     format: 'bgra8unorm'
 };
-

The next step in our pipeline configuration is to specify the output pixel format. In this case, we're using bgra8unorm. This format defines how we'll populate our rendering target. To elaborate, bgra8unorm stands for:

const pipelineDesc = {
     layout,
     vertex: {
         module: shaderModule,
@@ -207,7 +207,7 @@ 

1.1 Drawing a Triangle

Our first tutorial is a bit boring as we were }; pipeline = device.createRenderPipeline(pipelineDesc); -

Having assembled all necessary components, we can now create the pipeline. A GPU pipeline, analogous to a real factory pipeline, consists of inputs, a series of processing stages, and final outputs. In this analogy, layout and primitive describe the input data formats. As previously mentioned, layout refers to the constants, while primitive specifies how the geometry primitives should be provided.

Typically, the actual input data is supplied through buffers. These buffers normally contain vertex data, including vertex positions and other attributes such as vertex colors and texture coordinates. However, in our current example, we don't use any buffers. Instead of feeding vertex positions directly, we derive them in the vertex shader stage from vertex indices. These indices are automatically provided by the GPU pipeline to the vertex shader.

Typically, we provide input geometry as a list of vertices without explicit connectivity information, rather than as complete 3D graphic elements like triangles. The pipeline reconstructs triangles from these vertices based on the topology field. For instance, if the topology is set to triangle-list, it indicates that the vertex list represents triangle vertices in either counter-clockwise or clockwise order. Each triangle has a front side and a back side, with the vertex order defining the direction of the triangle's front face (frontFace: 'ccw').

The cullMode parameter determines whether we want to eliminate the rendering of a particular side of the triangle. Setting it to back means we choose not to render the back side of triangles. In most cases, the back sides of triangles shouldn't be rendered, and omitting them can save computational resources.

Using a triangle list topology is the most straightforward way of representing triangles, but it's not always the most efficient method. As illustrated in the following diagram, when we want to render a strip formed by connected triangles, many of its vertices are shared by more than one triangle.

Two Triangles Form a Triangle Strip. In a Right-Handed System, Positive Vertex Order Is Counterclockwise Around the Triangle Normal

In such cases, we want to reuse vertex positions for multiple triangles, rather than redundantly sending the same position multiple times for different triangles. This is where a triangle-strip topology becomes a better choice. It allows us to define a series of connected triangles more efficiently, reducing data redundancy and potentially improving rendering performance. We will explore other topology types in future chapters.

commandEncoder = device.createCommandEncoder();
+

Having assembled all necessary components, we can now create the pipeline. A GPU pipeline, analogous to a real factory pipeline, consists of inputs, a series of processing stages, and final outputs. In this analogy, layout and primitive describe the input data formats. As previously mentioned, layout refers to the constants, while primitive specifies how the geometry primitives should be provided.

Typically, the actual input data is supplied through buffers. These buffers normally contain vertex data, including vertex positions and other attributes such as vertex colors and texture coordinates. However, in our current example, we don't use any buffers. Instead of feeding vertex positions directly, we derive them in the vertex shader stage from vertex indices. These indices are automatically provided by the GPU pipeline to the vertex shader.

Typically, we provide input geometry as a list of vertices without explicit connectivity information, rather than as complete 3D graphic elements like triangles. The pipeline reconstructs triangles from these vertices based on the topology field. For instance, if the topology is set to triangle-list, it indicates that the vertex list represents triangle vertices in either counter-clockwise or clockwise order. Each triangle has a front side and a back side, and the vertex winding order determines which side is the front face (frontFace: 'ccw' declares counter-clockwise winding as front-facing).

The cullMode parameter determines whether we want to eliminate the rendering of a particular side of the triangle. Setting it to back means we choose not to render the back side of triangles. In most cases, the back sides of triangles shouldn't be rendered, and omitting them can save computational resources.

Using a triangle list topology is the most straightforward way of representing triangles, but it's not always the most efficient method. As illustrated in the following diagram, when we want to render a strip formed by connected triangles, many of its vertices are shared by more than one triangle.

Two Triangles Form a Triangle Strip. In a Right-Handed System, Positive Vertex Order Is Counterclockwise Around the Triangle Normal

In such cases, we want to reuse vertex positions for multiple triangles, rather than redundantly sending the same position multiple times for different triangles. This is where a triangle-strip topology becomes a better choice. It allows us to define a series of connected triangles more efficiently, reducing data redundancy and potentially improving rendering performance. We will explore other topology types in future chapters.
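
To summarize the primitive settings discussed above, a sketch of the corresponding descriptor might look like this (the field names are the standard WebGPU ones, so this should match the tutorial's pipeline descriptor in spirit):

// Primitive state: three vertices per triangle, counter-clockwise front faces,
// and back-face culling to skip triangles facing away from the viewer.
const primitive = {
    topology: 'triangle-list',
    frontFace: 'ccw',
    cullMode: 'back'
};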

commandEncoder = device.createCommandEncoder();
 
 passEncoder = commandEncoder.beginRenderPass(renderPassDesc);
 passEncoder.setViewport(0, 0, canvas.width, canvas.height, 0, 1);
@@ -216,7 +216,7 @@ 

1.1 Drawing a Triangle

Our first tutorial is a bit boring as we were passEncoder.end(); device.queue.submit([commandEncoder.finish()]); -

With the pipeline defined, we need to create the colorAttachment, which is similar to what we covered in the first tutorial, so I'll omit the details here. After that, the final step is command creation and submission. This process is nearly identical to what we've done before, with the key differences being the use of our newly created pipeline and the invocation of the draw() function.

The draw() function triggers the rendering process. The first parameter specifies the number of vertices we want to render, and the second parameter indicates the instance count. Since we are rendering a single triangle, the total number of vertices is 3. The vertex indices are automatically generated for the vertex shader.

The instance count determines how many times we want to duplicate the triangle. This technique can speed up rendering when we need to render a large number of identical geometries, such as grass or leaves in a video game. In this example, we specify a single instance because we only need to draw one triangle.

+
1_01_triangle/index.html:86-94 Submit Command Buffer to Render a Triangle

With the pipeline defined, we need to create the colorAttachment, which is similar to what we covered in the first tutorial, so I'll omit the details here. After that, the final step is command creation and submission. This process is nearly identical to what we've done before, with the key differences being the use of our newly created pipeline and the invocation of the draw() function.

The draw() function triggers the rendering process. The first parameter specifies the number of vertices we want to render, and the second parameter indicates the instance count. Since we are rendering a single triangle, the total number of vertices is 3. The vertex indices are automatically generated for the vertex shader.

The instance count determines how many times we want to duplicate the triangle. This technique can speed up rendering when we need to render a large number of identical geometries, such as grass or leaves in a video game. In this example, we specify a single instance because we only need to draw one triangle.
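
As a small illustration of the two parameters (assuming the passEncoder from the surrounding code):

// Three vertices, one instance: a single triangle.
passEncoder.draw(3, 1);
// Hypothetically, 500 instances would stamp out 500 copies of the same geometry:
// passEncoder.draw(3, 500);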

1.2 Drawing a Triangle with Defined Vertices

In our previous tutoria out.clip_position = vec4<f32>(inPos, 1.0); return out; } -

1_02_triangle_with_vertices/index.html:13-20 Vertex Shader

First, let's examine the shader changes. We've omitted what remains the same as before. The input to vs_main has changed from @builtin(vertex_index) in_vertex_index: u32 to @location(0) inPos: vec3<f32>. Recall that @builtin(vertex_index) is a predefined input field containing the current vertex's index, whereas @location(0) is akin to a pointer to a storage location with arbitrary data we feed into the pipeline. In this particular tutorial, we will put the vertex positions in this location. The data format for this storage location is a vector of 3 floats.

In the function body, we no longer need to derive the vertex positions as we expect them to be sent to the shader. Here, we simply create a vector of 4 floats and assign its xyz components to the input position and the w component to 1.0.

The rest of the shader remains the same. The new shader is actually simpler.

const positionAttribDesc = {
+

First, let's examine the shader changes. We've omitted what remains the same as before. The input to vs_main has changed from @builtin(vertex_index) in_vertex_index: u32 to @location(0) inPos: vec3<f32>. Recall that @builtin(vertex_index) is a predefined input field containing the current vertex's index, whereas @location(0) is akin to a pointer to a storage location with arbitrary data we feed into the pipeline. In this particular tutorial, we will put the vertex positions in this location. The data format for this storage location is a vector of 3 floats.

In the function body, we no longer need to derive the vertex positions as we expect them to be sent to the shader. Here, we simply create a vector of 4 floats and assign its xyz components to the input position and the w component to 1.0.

The rest of the shader remains the same. The new shader is actually simpler.

const positionAttribDesc = {
     shaderLocation: 0, // @location(0)
     offset: 0,
     format: 'float32x3'
 };
-

Now, let's look at the pipeline changes to adopt the new shader code. First, we need to create a position attribute description. An attribute refers to the input to the shader function @location(0) inPos: vec3<f32>. Unlike the @builtins, an attribute doesn't have predefined meanings. Its meaning is determined by the developer; it could represent vertex positions, vertex colors, or texture coordinates.

First, we specify the attribute's location shaderLocation, which corresponds to @location(0). Second, the offset tells the pipeline where the first element of this attribute sits relative to the beginning of the buffer that contains the vertex data; this matters because multiple attributes can be interleaved in a single buffer. Finally, the format field defines the data format and corresponds to vec3<f32> in the shader.

const positionBufferLayoutDesc = {
+

Now, let's look at the pipeline changes to adopt the new shader code. First, we need to create a position attribute description. An attribute refers to the input to the shader function @location(0) inPos: vec3<f32>. Unlike the @builtins, an attribute doesn't have predefined meanings. Its meaning is determined by the developer; it could represent vertex positions, vertex colors, or texture coordinates.

First, we specify the attribute's location shaderLocation, which corresponds to @location(0). Second, the offset tells the pipeline where the first element of this attribute sits relative to the beginning of the buffer that contains the vertex data; this matters because multiple attributes can be interleaved in a single buffer. Finally, the format field defines the data format and corresponds to vec3<f32> in the shader.
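
To make the role of offset more tangible, here is a hypothetical attribute descriptor for a color attribute stored interleaved with positions in the same buffer (this tutorial keeps positions in their own buffer, so the descriptor below is purely illustrative):

// Hypothetical: colors follow the 3 position floats of each vertex in one shared buffer,
// so the color attribute starts 12 bytes into every vertex's record.
const interleavedColorAttrib = {
    shaderLocation: 1,   // would correspond to @location(1) in the shader
    offset: 4 * 3,       // skip the position (3 floats x 4 bytes)
    format: 'float32x3'
};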

const positionBufferLayoutDesc = {
     attributes: [positionAttribDesc],
     arrayStride: 4 * 3, // sizeof(float) * 3
     stepMode: 'vertex'
 };
-

Our next task involves creating a buffer layout descriptor. This step is crucial in aiding our GPU pipeline to comprehend the buffer's format when we submit it. For those new to graphics programming, these steps may seem verbose, and it's often challenging to grasp the difference between an attribute descriptor and a layout descriptor, as well as why they are necessary to describe a GPU buffer.

When submitting vertex data to the GPU, we typically send a large buffer containing data for numerous vertices. As introduced in the first chapter, transferring small amounts of data from CPU memory to GPU memory is inefficient, hence the best practice is to submit data in large batches. As previously mentioned, vertex data can contain multiple attributes intermingled, such as vertex positions, colors, and texture coordinates. Alternatively, you may choose to use dedicated buffers for each attribute separately. However, when we reach the vertex shader's entry point, we process each vertex individually. At this stage, we no longer have visibility of the entire attribute buffers. Each shader invocation works on one vertex independently, which allows shader programs to benefit from the GPU's parallel architecture.

To transition from submitting a single chunk of buffer on the CPU side to per-vertex processing on the GPU side, we need to dissect the input buffer to extract information for each individual vertex. The GPU pipeline can do this automatically with the help of the layout description. To differentiate between the attribute descriptor and the layout descriptor: the attribute descriptor describes the attribute itself, such as its location and format, whereas the layout descriptor focuses on how to break apart a list of multiple attributes for many vertices into data for each individual vertex.

Within this layout descriptor structure, we find an attribute list. In our current example, which only deals with positions, the list solely contains the position attribute descriptor. In more complex scenarios, we would include more attributes in this list. Following that, we define the arrayStride. This parameter denotes the size of the step by which we advance the buffer pointer for each vertex. For instance, for the first vertex (vertex 0), its data resides at offset zero within the buffer. For the subsequent vertex (vertex 1), we locate its data at offset zero plus arrayStride, which starts at the 12th byte (4 bytes for one float multiplied by 3).

Lastly, we specify the step mode. Two options exist: vertex and instance. By choosing either, we instruct the GPU pipeline to advance the pointer of this buffer for each vertex or for each instance. We'll explore the concept of instancing in future chapters. However, for most scenarios, the vertex option suffices.

const positions = new Float32Array([
+

Our next task involves creating a buffer layout descriptor. This step is crucial in aiding our GPU pipeline to comprehend the buffer's format when we submit it. For those new to graphics programming, these steps may seem verbose, and it's often challenging to grasp the difference between an attribute descriptor and a layout descriptor, as well as why they are necessary to describe a GPU buffer.

When submitting vertex data to the GPU, we typically send a large buffer containing data for numerous vertices. As introduced in the first chapter, transferring small amounts of data from CPU memory to GPU memory is inefficient, hence the best practice is to submit data in large batches. As previously mentioned, vertex data can contain multiple attributes intermingled, such as vertex positions, colors, and texture coordinates. Alternatively, you may choose to use dedicated buffers for each attribute separately. However, when we reach the vertex shader's entry point, we process each vertex individually. At this stage, we no longer have visibility of the entire attribute buffers. Each shader invocation works on one vertex independently, which allows shader programs to benefit from the GPU's parallel architecture.

To transition from submitting a single chunk of buffer on the CPU side to per-vertex processing on the GPU side, we need to dissect the input buffer to extract information for each individual vertex. The GPU pipeline can do this automatically with the help of the layout description. To differentiate between the attribute descriptor and the layout descriptor: the attribute descriptor describes the attribute itself, such as its location and format, whereas the layout descriptor focuses on how to break apart a list of multiple attributes for many vertices into data for each individual vertex.

Within this layout descriptor structure, we find an attribute list. In our current example, which only deals with positions, the list solely contains the position attribute descriptor. In more complex scenarios, we would include more attributes in this list. Following that, we define the arrayStride. This parameter denotes the size of the step by which we advance the buffer pointer for each vertex. For instance, for the first vertex (vertex 0), its data resides at offset zero within the buffer. For the subsequent vertex (vertex 1), we locate its data at offset zero plus arrayStride, which starts at the 12th byte (4 bytes for one float multiplied by 3).

Lastly, we specify the step mode. Two options exist: vertex and instance. By choosing either, we instruct the GPU pipeline to advance the pointer of this buffer for each vertex or for each instance. We'll explore the concept of instancing in future chapters. However, for most scenarios, the vertex option suffices.
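
The bookkeeping the pipeline performs with arrayStride can be sketched in plain JavaScript (a toy illustration, not part of the tutorial's code):

// With arrayStride = 12, vertex i's attribute data begins at byte i * 12.
const arrayStride = 4 * 3; // sizeof(float) * 3
function attributeByteOffset(vertexIndex, attributeOffset = 0) {
    return vertexIndex * arrayStride + attributeOffset;
}
console.log(attributeByteOffset(0)); // 0  -> first vertex starts at the beginning
console.log(attributeByteOffset(1)); // 12 -> second vertex starts one stride later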

const positions = new Float32Array([
     1.0, -1.0, 0.0, -1.0, -1.0, 0.0, 0.0, 1.0, 0.0
 ]);
-

Now, let's proceed to prepare the actual buffer, which is a relatively straightforward step. Here, we create a 32-bit floating-point array and populate it with the coordinates of the three vertices. This array contains nine values in total.

To better understand these coordinate values, recall the clip space or screen space coordinates we introduced previously. Each set of three values represents a vertex position in 3D space. The first vertex (1.0, -1.0, 0.0) is positioned at the bottom-right corner of the clip space. The second vertex (-1.0, -1.0, 0.0) is at the bottom-left corner, and the third vertex (0.0, 1.0, 0.0) is at the top-center of the clip space. They are organized in a clockwise order.

These coordinates are chosen deliberately to form a triangle that spans across the visible area of our rendering surface. The z-coordinate is set to 0.0 for all vertices, placing them on the same plane perpendicular to the viewing direction. This arrangement will result in a triangle that covers half of the screen, with its base along the bottom edge and its apex at the top-center.

At this stage, the data we've created resides in CPU memory. To utilize it within the GPU pipeline, we must transfer this data to GPU memory, which involves creating a GPU buffer.

const positionBufferDesc = {
+

Now, let's proceed to prepare the actual buffer, which is a relatively straightforward step. Here, we create a 32-bit floating-point array and populate it with the coordinates of the three vertices. This array contains nine values in total.

To better understand these coordinate values, recall the clip space or screen space coordinates we introduced previously. Each set of three values represents a vertex position in 3D space. The first vertex (1.0, -1.0, 0.0) is positioned at the bottom-right corner of the clip space. The second vertex (-1.0, -1.0, 0.0) is at the bottom-left corner, and the third vertex (0.0, 1.0, 0.0) is at the top-center of the clip space. They are organized in a clockwise order.

These coordinates are chosen deliberately to form a triangle that spans across the visible area of our rendering surface. The z-coordinate is set to 0.0 for all vertices, placing them on the same plane perpendicular to the viewing direction. This arrangement will result in a triangle that covers half of the screen, with its base along the bottom edge and its apex at the top-center.
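
If it helps to connect these numbers to the canvas, here is a small sketch of how clip-space x and y map to pixel coordinates (assuming a hypothetical 640x480 canvas; y is flipped because clip space points up while pixel rows count down):

function clipToPixel(x, y, width, height) {
    return [(x + 1) * 0.5 * width, (1 - y) * 0.5 * height];
}
console.log(clipToPixel(1.0, -1.0, 640, 480)); // [640, 480] -> bottom-right corner
console.log(clipToPixel(0.0, 1.0, 640, 480));  // [320, 0]   -> top-center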

At this stage, the data we've created resides in CPU memory. To utilize it within the GPU pipeline, we must transfer this data to GPU memory, which involves creating a GPU buffer.

const positionBufferDesc = {
     size: positions.byteLength,
     usage: GPUBufferUsage.VERTEX,
     mappedAtCreation: true
@@ -188,7 +188,7 @@ 

1.2 Drawing a Triangle with Defined Vertices

In our previous tutoria new Float32Array(positionBuffer.getMappedRange()); writeArray.set(positions); positionBuffer.unmap(); -

We begin this process by crafting a buffer descriptor. The descriptor's first field specifies the buffer's size, followed by the usage flag. In our case, as we intend to use this buffer for supplying vertex data, we set the VERTEX flag. Lastly, we determine whether we want to map this buffer at creation.

Mapping is a crucial operation that must precede any data transfer between the CPU and GPU. It essentially creates a mirrored buffer on the CPU side for the GPU buffer. This mirrored buffer serves as our staging area where we write our CPU data. Once we've finished writing the data, we call unmap to flush the data to the GPU.

The mappedAtCreation flag offers a convenient shortcut. By setting this flag, the buffer is automatically mapped upon creation, making it immediately available for data copying.

After defining the descriptor structure, we create the buffer based on this descriptor in the subsequent line. Since the buffer is already mapped at this point, we can proceed to write the data.

Our approach involves creating a temporary 32-bit floating-point array writeArray, directly linked to the mapped GPU buffer. We then simply copy the CPU buffer to this temporary array. After unmapping the buffer, we can be confident that the data has been successfully transferred to the GPU and is ready for use by the shader.

const pipelineDesc = {
+

We begin this process by crafting a buffer descriptor. The descriptor's first field specifies the buffer's size, followed by the usage flag. In our case, as we intend to use this buffer for supplying vertex data, we set the VERTEX flag. Lastly, we determine whether we want to map this buffer at creation.

Mapping is a crucial operation that must precede any data transfer between the CPU and GPU. It essentially creates a mirrored buffer on the CPU side for the GPU buffer. This mirrored buffer serves as our staging area where we write our CPU data. Once we've finished writing the data, we call unmap to flush the data to the GPU.

The mappedAtCreation flag offers a convenient shortcut. By setting this flag, the buffer is automatically mapped upon creation, making it immediately available for data copying.

After defining the descriptor structure, we create the buffer based on this descriptor in the subsequent line. Since the buffer is already mapped at this point, we can proceed to write the data.

Our approach involves creating a temporary 32-bit floating-point array writeArray, directly linked to the mapped GPU buffer. We then simply copy the CPU buffer to this temporary array. After unmapping the buffer, we can be confident that the data has been successfully transferred to the GPU and is ready for use by the shader.
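
Pulling the scattered snippets together, the whole create-map-write-unmap flow looks roughly like this (a sketch based on the code fragments above, reusing the device and positions from earlier):

const positionBuffer = device.createBuffer({
    size: positions.byteLength,   // room for all nine floats
    usage: GPUBufferUsage.VERTEX, // this buffer feeds vertex data
    mappedAtCreation: true        // mapped immediately, ready for writing
});
// Write into the mapped (CPU-visible) range, then unmap to hand the data to the GPU.
new Float32Array(positionBuffer.getMappedRange()).set(positions);
positionBuffer.unmap();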

const pipelineDesc = {
     layout,
     vertex: {
         module: shaderModule,
@@ -216,7 +216,7 @@ 

1.2 Drawing a Triangle with Defined Vertices

In our previous tutoria passEncoder.end(); device.queue.submit([commandEncoder.finish()]); -

The remaining portion of the code bears a strong resemblance to the previous tutorial, with only a few key differences. One notable change appears in the pipeline descriptor definition. Within the vertex stage, we now provide a buffer layout descriptor in the buffers field. It's important to note that this field can accommodate multiple buffer descriptors if needed.

Another significant change is in the primitive section of the pipeline descriptor. We specify frontFace: 'cw' for clockwise, which corresponds to the order of vertices in our vertex buffer. This setting informs the GPU about the winding order of our triangles, which is crucial for correct face culling.

After creating the new pipeline using this updated descriptor, we need to set a vertex buffer when crafting a command with this pipeline. We accomplish this using the setVertexBuffer function. The first parameter represents an index, corresponding to the buffer layout indices of the buffers field when defining the pipeline. In this case, we specify that the positionBuffer, which resides on the GPU, should be used as the source of vertex data.

The draw command remains similar to our previous example, instructing the GPU to render three vertices as a single triangle. However, the key difference now is that these vertices are sourced from our explicitly defined buffer, rather than being generated in the shader.

Upon submitting this command, you should see a solid triangle rendered on the screen. This approach of explicitly defining vertex data offers greater flexibility and control over the geometry we render, paving the way for more complex shapes and models in future tutorials.

+
1_02_triangle_with_vertices/index.html:76-119 Pipeline and Command Buffer Definition

The remaining portion of the code bears a strong resemblance to the previous tutorial, with only a few key differences. One notable change appears in the pipeline descriptor definition. Within the vertex stage, we now provide a buffer layout descriptor in the buffers field. It's important to note that this field can accommodate multiple buffer descriptors if needed.

Another significant change is in the primitive section of the pipeline descriptor. We specify frontFace: 'cw' for clockwise, which corresponds to the order of vertices in our vertex buffer. This setting informs the GPU about the winding order of our triangles, which is crucial for correct face culling.

After creating the new pipeline using this updated descriptor, we need to set a vertex buffer when crafting a command with this pipeline. We accomplish this using the setVertexBuffer function. The first parameter represents an index, corresponding to the buffer layout indices of the buffers field when defining the pipeline. In this case, we specify that the positionBuffer, which resides on the GPU, should be used as the source of vertex data.

The draw command remains similar to our previous example, instructing the GPU to render three vertices as a single triangle. However, the key difference now is that these vertices are sourced from our explicitly defined buffer, rather than being generated in the shader.

Upon submitting this command, you should see a solid triangle rendered on the screen. This approach of explicitly defining vertex data offers greater flexibility and control over the geometry we render, paving the way for more complex shapes and models in future tutorials.
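
For reference, the command-encoding portion described above can be sketched end to end like this (reusing device, renderPassDesc, canvas, pipeline and positionBuffer from the tutorial's code):

const commandEncoder = device.createCommandEncoder();
const passEncoder = commandEncoder.beginRenderPass(renderPassDesc);
passEncoder.setViewport(0, 0, canvas.width, canvas.height, 0, 1);
passEncoder.setPipeline(pipeline);
passEncoder.setVertexBuffer(0, positionBuffer); // slot 0 matches buffers[0] in the pipeline
passEncoder.draw(3, 1);                         // three vertices from the buffer, one instance
passEncoder.end();
device.queue.submit([commandEncoder.finish()]);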

1.6 Understanding Uniforms

In this tutorial, we'll explore the concept of uniforms in WebGPU shaders. Uniforms provide a mechanism to supply data to shader programs, acting as constants throughout the entire execution of a shader.

Launch Playground - 1_06_uniforms

You might wonder how uniforms differ from attributes, which we've previously used to pass data to shader programs. The distinction lies in their intended use and behavior: an attribute can carry a different value for every vertex, whereas a uniform holds a single value that stays the same for every vertex within a draw call.

This distinction is crucial because it enables shaders to handle both per-vertex data (attributes) and shared, unchanging data (uniforms), offering flexibility and efficiency in the rendering process.

In this example, we'll create a uniform called "offset" to shift the positions of our vertices by a consistent amount. Using a uniform for this purpose is logical because we want the same offset applied to all vertices. If we used attributes for the offset, we'd have to duplicate the same value for every vertex, which would be inefficient and wasteful. Uniforms are the ideal choice when you need to pass the same information to the shader for all vertices.

By using a uniform for the offset, we can efficiently apply a global transformation to our geometry, easily update the offset value for dynamic effects, and reduce memory usage and data transfer compared to per-vertex attributes. This example will demonstrate how to declare, set, and use uniforms in WebGPU, illustrating their power in creating flexible and efficient shader programs.

Let's examine the syntax used to create a uniform in this program:

@group(0) @binding(0)
 var<uniform> offset: vec3<f32>;
-

The var<uniform> declaration indicates that offset is a uniform variable, signaling to the shader that this variable should be provided from a uniform buffer. The @binding(0) annotation serves a similar purpose to @location(0) for vertex attributes. It's an index that identifies the uniform within a uniform buffer. In a typical uniform buffer, you'll pack multiple uniform values, and this index helps the shader locate the correct value efficiently.

The @group(0) annotation relates to how uniforms are organized. In this simple case, we've placed all uniforms in a single group (group 0). However, for more complex shaders, using multiple groups can be advantageous. For instance, when rendering an animated scene, you might have camera parameters that change frequently and object colors that remain constant. By separating these into different groups, you can update only the data that changes frequently, thereby optimizing performance.

@vertex
+

The var<uniform> declaration indicates that offset is a uniform variable, signaling to the shader that this variable should be provided from a uniform buffer. The @binding(0) annotation serves a similar purpose to @location(0) for vertex attributes. It's an index that identifies the uniform within a uniform buffer. In a typical uniform buffer, you'll pack multiple uniform values, and this index helps the shader locate the correct value efficiently.

The @group(0) annotation relates to how uniforms are organized. In this simple case, we've placed all uniforms in a single group (group 0). However, for more complex shaders, using multiple groups can be advantageous. For instance, when rendering an animated scene, you might have camera parameters that change frequently and object colors that remain constant. By separating these into different groups, you can update only the data that changes frequently, thereby optimizing performance.
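
As a hypothetical illustration of that grouping idea (not part of this tutorial's shader), a WGSL snippet embedded in JavaScript might separate per-frame and per-material uniforms like so:

const groupingSketch = `
@group(0) @binding(0) var<uniform> viewProj: mat4x4<f32>;  // updated every frame
@group(1) @binding(0) var<uniform> baseColor: vec3<f32>;   // set once per material
`;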

@vertex
 fn vs_main(
     @location(0) inPos: vec3<f32>,
     @location(1) inColor: vec3<f32>
@@ -168,12 +168,12 @@ 

1.6 Understanding Uniforms

In this tutorial, we'll explore the conce out.color = inColor; return out; } -

After defining the offset uniform, its usage in the shader becomes straightforward. We simply add the offset value to the position of each vertex, effectively shifting their positions. This uniform allows us to apply a consistent transformation to all vertices without the need for duplication or redundancy in the shader code.

After modifying the shader, we need to create the uniform buffer and supply the data on the JavaScript side. Uniforms, like attribute data, are provided in a GPU buffer. Let's break down this process:

const uniformData = new Float32Array([
+

After defining the offset uniform, its usage in the shader becomes straightforward. We simply add the offset value to the position of each vertex, effectively shifting their positions. This uniform allows us to apply a consistent transformation to all vertices without the need for duplication or redundancy in the shader code.

After modifying the shader, we need to create the uniform buffer and supply the data on the JavaScript side. Uniforms, like attribute data, are provided in a GPU buffer. Let's break down this process:

const uniformData = new Float32Array([
     0.1, 0.1, 0.1
 ]);
 
 let uniformBuffer = createGPUBuffer(device, uniformData, GPUBufferUsage.UNIFORM);
-

First, we create a uniformData buffer to hold our uniform values. In this example, it contains a 3-element vector representing our offset. We create the uniformBuffer using the GPUBufferUsage.UNIFORM flag to indicate its purpose. Then, we use our helper function to populate the GPU uniform buffer with the data.

let uniformBindGroupLayout = device.createBindGroupLayout({
+

First, we create a uniformData buffer to hold our uniform values. In this example, it contains a 3-element vector representing our offset. We create the uniformBuffer using the GPUBufferUsage.UNIFORM flag to indicate its purpose. Then, we use our helper function to populate the GPU uniform buffer with the data.

let uniformBindGroupLayout = device.createBindGroupLayout({
     entries: [
         {
             binding: 0,
@@ -182,7 +182,7 @@ 

1.6 Understanding Uniforms

In this tutorial, we'll explore the conce } ] }); -

Next, we create a uniform binding group layout to describe the format of the uniform group, corresponding to our uniform group definition in the shader code. In this example, the layout has one entry, corresponding to our single uniform value. The binding index matches the one in the shader, while visibility is set to VERTEX as we use this uniform in the vertex shader. Finally, the empty buffer setting object means that we want to use defaults.

let uniformBindGroup = device.createBindGroup({
+

Next, we create a uniform binding group layout to describe the format of the uniform group, corresponding to our uniform group definition in the shader code. In this example, the layout has one entry, corresponding to our single uniform value. The binding index matches the one in the shader, while visibility is set to VERTEX as we use this uniform in the vertex shader. Finally, the empty buffer setting object means that we want to use defaults.
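
Assembled in one place, the layout described above looks roughly like this (a sketch using the standard WebGPU fields; the tutorial's own snippet is split across the diff):

const uniformBindGroupLayout = device.createBindGroupLayout({
    entries: [
        {
            binding: 0,                        // matches @binding(0) in the shader
            visibility: GPUShaderStage.VERTEX, // the uniform is read in the vertex stage
            buffer: {}                         // empty settings object: default uniform buffer
        }
    ]
});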

let uniformBindGroup = device.createBindGroup({
     layout: uniformBindGroupLayout,
     entries: [
         {
@@ -193,9 +193,9 @@ 

1.6 Understanding Uniforms

In this tutorial, we'll explore the conce } ] }); -

We then create the uniform binding group, connecting the layout with the actual data storage. Here, we supply the uniform buffer as the resource for binding 0.

const pipelineLayoutDesc = { bindGroupLayouts: [uniformBindGroupLayout] };
+

We then create the uniform binding group, connecting the layout with the actual data storage. Here, we supply the uniform buffer as the resource for binding 0.
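
In full, that bind group creation is roughly the following (a sketch reusing uniformBindGroupLayout and uniformBuffer from above):

const uniformBindGroup = device.createBindGroup({
    layout: uniformBindGroupLayout,
    entries: [
        {
            binding: 0,
            resource: { buffer: uniformBuffer } // the GPU buffer holding our offset values
        }
    ]
});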

const pipelineLayoutDesc = { bindGroupLayouts: [uniformBindGroupLayout] };
 
• • •
passEncoder.setBindGroup(0, uniformBindGroup);
-

In the pipeline layout descriptor, we include the uniform binding group layout. Finally, when encoding the render command, we use setBindGroup to specify the group ID and the corresponding binding group.

With these steps, we've successfully created a uniform buffer, defined its layout, and supplied the uniform data to the shader in the GPU pipeline. The result is the same triangle, but slightly offset based on our uniform values. Experiment with adjusting the offset values in the code sample to see how it affects the triangle's position.

+
1_06_uniforms/index.html:120-166 Configure the Pipeline and Submit the Uniform Bind Group

In the pipeline layout descriptor, we include the uniform binding group layout. Finally, when encoding the render command, we use setBindGroup to specify the group ID and the corresponding binding group.

With these steps, we've successfully created a uniform buffer, defined its layout, and supplied the uniform data to the shader in the GPU pipeline. The result is the same triangle, but slightly offset based on our uniform values. Experiment with adjusting the offset values in the code sample to see how it affects the triangle's position.
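
Put together, the two steps mentioned here amount to the following (a sketch; the tutorial passes its descriptor to createPipelineLayout and records setBindGroup while encoding the pass):

const pipelineLayoutDesc = { bindGroupLayouts: [uniformBindGroupLayout] };
const layout = device.createPipelineLayout(pipelineLayoutDesc);
// ...later, while recording the render pass:
passEncoder.setBindGroup(0, uniformBindGroup); // group index 0 matches @group(0)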

1.4 Using Different Vertex Colors

In this tutorial, we're making ano fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> { return vec4<f32>(in.color, 1.0); } -

1_04_different_vertex_colors/index.html:9-28 Updated Shader Code With an Additional Color Attribute

By assigning different colors to different vertices, we're now in a position to address the interesting question we raised earlier: how are fragment colors generated for those fragments located in the middle of a triangle? This setup will allow us to observe the GPU's color interpolation in action, providing a visual demonstration of how data is processed between the vertex and fragment stages of the rendering pipeline.

Now, let's explore how to set up the pipeline for our new shader. The new pipeline is quite similar to the previous one, with the key difference being that we now need to create a color buffer containing colors for all vertices and feed it into our pipeline.

The steps involved closely mirror how we handled the position buffer. First, we create a color attribute descriptor. Note that we set shaderLocation to 1, corresponding to the inColor attribute at @location(1) in our shader. The format for the color attribute remains a vector of three floats.

const colorAttribDesc = {
+

By assigning different colors to different vertices, we're now in a position to address the interesting question we raised earlier: how are fragment colors generated for those fragments located in the middle of a triangle? This setup will allow us to observe the GPU's color interpolation in action, providing a visual demonstration of how data is processed between the vertex and fragment stages of the rendering pipeline.

Now, let's explore how to set up the pipeline for our new shader. The new pipeline is quite similar to the previous one, with the key difference being that we now need to create a color buffer containing colors for all vertices and feed it into our pipeline.

The steps involved closely mirror how we handled the position buffer. First, we create a color attribute descriptor. Note that we set shaderLocation to 1, corresponding to the inColor attribute at @location(1) in our shader. The format for the color attribute remains a vector of three floats.

const colorAttribDesc = {
     shaderLocation: 1, // @location(1)
     offset: 0,
     format: 'float32x3'
@@ -200,7 +200,7 @@ 

1.4 Using Different Vertex Colors

In this tutorial, we're making ano ]); let colorBuffer = createGPUBuffer(device, colors, GPUBufferUsage.VERTEX); -

Next, we create the buffer layout descriptor, which informs the pipeline how to interpret the color buffer for each vertex. We assign the color attribute descriptor to the attributes field. The arrayStride is set to 4 * 3 because a float occupies 4 bytes, and we have 3 floats for each color. The stepMode is set to vertex because each vertex will have one color.

After defining the RGB data in CPU memory using a Float32Array (with the first vertex being red, the second green, and the third blue), we proceed to create a GPU buffer and copy the data to the GPU.

Let's recap the process of creating and populating the GPU buffer:

  1. We define a buffer descriptor, specifying the buffer size and setting the usage flag as VERTEX since we'll use the color attribute in the vertex stage.

  2. We set mappedAtCreation to true, allowing immediate data copying upon buffer creation.

  3. We create the GPU buffer and, using the mapped buffer range, create a mirrored buffer in CPU memory.

  4. We copy the color data into this mapped buffer.

  5. Finally, we unmap the buffer, signaling that the data transfer is complete.

In our sample code, you might notice that we don't explicitly see these steps. This is because this process is a common procedure that we need to perform many times throughout our WebGPU programs. To streamline our code and reduce repetition, I've created a utility function to encapsulate these steps.

As previously mentioned, WebGPU can be quite verbose in its syntax. It's often a good practice to wrap common code blocks into reusable utility functions. This approach not only reduces our workload but also makes our code more readable and maintainable.

The createGPUBuffer function I've created encapsulates all these steps into a single, reusable function. Here's how it's defined:

function createGPUBuffer(device, buffer, usage) {
+

Next, we create the buffer layout descriptor, which informs the pipeline how to interpret the color buffer for each vertex. We assign the color attribute descriptor to the attributes field. The arrayStride is set to 4 * 3 because a float occupies 4 bytes, and we have 3 floats for each color. The stepMode is set to vertex because each vertex will have one color.

After defining the RGB data in CPU memory using a Float32Array (with the first vertex being red, the second green, and the third blue), we proceed to create a GPU buffer and copy the data to the GPU.
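
The color data itself is elided in this hunk, but given the description it presumably looks something like the following (values are illustrative):

const colors = new Float32Array([
    1.0, 0.0, 0.0, // vertex 0: red
    0.0, 1.0, 0.0, // vertex 1: green
    0.0, 0.0, 1.0  // vertex 2: blue
]);
const colorBuffer = createGPUBuffer(device, colors, GPUBufferUsage.VERTEX);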

Let's recap the process of creating and populating the GPU buffer:

  1. We define a buffer descriptor, specifying the buffer size and setting the usage flag as VERTEX since we'll use the color attribute in the vertex stage.

  2. We set mappedAtCreation to true, allowing immediate data copying upon buffer creation.

  3. We create the GPU buffer and, using the mapped buffer range, create a mirrored buffer in CPU memory.

  4. We copy the color data into this mapped buffer.

  5. Finally, we unmap the buffer, signaling that the data transfer is complete.

In our sample code, you might notice that we don't explicitly see these steps. This is because this process is a common procedure that we need to perform many times throughout our WebGPU programs. To streamline our code and reduce repetition, I've created a utility function to encapsulate these steps.

As previously mentioned, WebGPU can be quite verbose in its syntax. It's often a good practice to wrap common code blocks into reusable utility functions. This approach not only reduces our workload but also makes our code more readable and maintainable.

The createGPUBuffer function I've created encapsulates all these steps into a single, reusable function. Here's how it's defined:

function createGPUBuffer(device, buffer, usage) {
     const bufferDesc = {
         size: buffer.byteLength,
         usage: usage,
@@ -232,7 +232,7 @@ 

1.4 Using Different Vertex Colors

In this tutorial, we're making ano gpuBuffer.unmap(); return gpuBuffer; } -

At this point, we've successfully duplicated the color values on the GPU, ready for use in our shader.

const pipelineDesc = {
+

At this point, we've successfully duplicated the color values on the GPU, ready for use in our shader.

const pipelineDesc = {
     layout,
     vertex: {
         module: shaderModule,
@@ -263,7 +263,7 @@ 

1.4 Using Different Vertex Colors

In this tutorial, we're making ano passEncoder.end(); device.queue.submit([commandEncoder.finish()]); -

After creating our buffers, we define the pipeline descriptor. The key difference from our previous example is the addition of colorBufferLayoutDescriptor to the buffers list in the vertex stage. This informs the pipeline that we're now using two vertex buffers: one for positions and another for colors.

When encoding our render commands, we now need to set two vertex buffers. We use setVertexBuffer(0, positionBuffer) for the position data and setVertexBuffer(1, colorBuffer) for the color data. The indices 0 and 1 correspond to the buffer layouts when defining the pipeline descriptor. The rest of the rendering process remains largely unchanged.

Interpolated Colors on the Triangle

Upon running this code, we're presented with a visually interesting result: a colorful triangle. Each vertex is rendered with its specified color - red, green, and blue. However, the most interesting aspect is what happens between these vertices. We observe a smooth transition of colors across the triangle's surface.

This automatic color blending is a feature performed by the GPU, a process we refer to as interpolation. It's important to note that this interpolation isn't limited to colors; any value we output at the vertex stage will be interpolated by the GPU to assign appropriate values for every fragment, particularly for those not located directly at the vertices.

The interpolation for a fragment's values is calculated from the fragment's position relative to the triangle's three vertices, using barycentric weights. This mechanism is incredibly useful because, considering there are typically far more fragments than vertices in a scene, it would be impractical to specify values for all fragments individually. Instead, we rely on the GPU to generate these values efficiently based on the values defined only at each vertex.

This interpolation technique is a fundamental concept in computer graphics, enabling smooth transitions and gradients across surfaces with minimal input data.

+
1_04_different_vertex_colors/index.html:95-138 Setup the Pipeline With Two Vertex Buffers

After creating our buffers, we define the pipeline descriptor. The key difference from our previous example is the addition of colorBufferLayoutDescriptor to the buffers list in the vertex stage. This informs the pipeline that we're now using two vertex buffers: one for positions and another for colors.

When encoding our render commands, we now need to set two vertex buffers. We use setVertexBuffer(0, positionBuffer) for the position data and setVertexBuffer(1, colorBuffer) for the color data. The indices 0 and 1 correspond to the buffer layouts when defining the pipeline descriptor. The rest of the rendering process remains largely unchanged.
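
In code, the relevant additions while encoding the pass are simply (a sketch reusing the buffers created above):

passEncoder.setVertexBuffer(0, positionBuffer); // slot 0: positions
passEncoder.setVertexBuffer(1, colorBuffer);    // slot 1: per-vertex colors
passEncoder.draw(3, 1);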

Interpolated Colors on the Triangle

Upon running this code, we're presented with a visually interesting result: a colorful triangle. Each vertex is rendered with its specified color - red, green, and blue. However, the most interesting aspect is what happens between these vertices. We observe a smooth transition of colors across the triangle's surface.

This automatic color blending is a feature performed by the GPU, a process we refer to as interpolation. It's important to note that this interpolation isn't limited to colors; any value we output at the vertex stage will be interpolated by the GPU to assign appropriate values for every fragment, particularly for those not located directly at the vertices.

The interpolation for a fragment's values is calculated from the fragment's position relative to the triangle's three vertices, using barycentric weights. This mechanism is incredibly useful because, considering there are typically far more fragments than vertices in a scene, it would be impractical to specify values for all fragments individually. Instead, we rely on the GPU to generate these values efficiently based on the values defined only at each vertex.

This interpolation technique is a fundamental concept in computer graphics, enabling smooth transitions and gradients across surfaces with minimal input data.
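
To make the idea concrete, here is a toy sketch of the weighted blend the GPU performs for each covered pixel (weights w0 + w1 + w2 = 1; this is an illustration of the math, not something you would compute on the CPU in practice):

// Blend three vertex colors with barycentric weights.
function interpolateColor([r0, g0, b0], [r1, g1, b1], [r2, g2, b2], w0, w1, w2) {
    return [
        w0 * r0 + w1 * r1 + w2 * r2,
        w0 * g0 + w1 * g1 + w2 * g2,
        w0 * b0 + w1 * b1 + w2 * b2
    ];
}
// A fragment at the triangle's center blends red, green and blue equally:
console.log(interpolateColor([1, 0, 0], [0, 1, 0], [0, 0, 1], 1 / 3, 1 / 3, 1 / 3));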