VGA Mode-X Sprite Rendering

As part of my SilkWorm project, I've got a transparent sprite-rendering system going.

It's over-complicated, but it works quite quickly.  I'm writing this more as a reminder to my future self than as a HOWTO, so I'll not go into the gory details.


The system is inspired by a really nice idea from StaticSaga and 36rKATPURPY on the Discord server linked to a DOS game jam.  Their system is simpler and more data-efficient.  My code is clunky, but for the moment it seems to work, so I'm leaving it as is until it breaks or needs speeding up.

The problem


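Mode-X stores pixel data in four planes: each group of 4 horizontally adjacent pixels shares one byte address, with one pixel in each plane.  It can use the VGA hardware (via its latches) to copy 4 pixels at a time, but only on 4px-aligned boundaries.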
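To make the plane layout concrete, here's a minimal Python sketch of the Mode-X address arithmetic.  It assumes the usual 320px-wide screen (80 bytes per row in each plane), and the function names are hypothetical, not taken from SilkWorm.

```python
# Mode-X keeps each horizontal group of 4 pixels at one byte address,
# one pixel per plane.  Assumes a 320px-wide screen.
SCREEN_WIDTH_PX = 320
BYTES_PER_ROW = SCREEN_WIDTH_PX // 4  # 80 bytes per row in each plane


def modex_address(x, y):
    """Return (plane, byte offset within that plane) for pixel (x, y)."""
    plane = x & 3                          # x mod 4 picks the plane
    offset = y * BYTES_PER_ROW + (x >> 2)  # x div 4 picks the byte in the row
    return plane, offset


def is_4px_aligned(x):
    """A 4-pixel hardware copy can only start where x is a multiple of 4."""
    return x & 3 == 0
```

For example, `modex_address(5, 2)` returns `(1, 161)`: pixel 5 is the second pixel of the byte at offset 161, so a sprite starting there can't use the 4-at-a-time path directly.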

Michael Abrash's Graphics Programming Black Book covers one method of transparent sprite rendering that isn't tied to 4px alignment, but it involves uploading 4 copies of each sprite (one for each possible alignment) to VGA RAM, and setting the plane mask for every 4-pixel copy.  To me, this seemed like a waste of VGA RAM, and any speed boost would be undermined by having to execute an 'out' instruction for every 4-pixel copy.

I started having other ideas about how to do it without storing sprites in VRAM at all, and StaticSaga and 36rKATPURPY confirmed for me that there are better ways.

And so, after much debugging, cursing, more debugging and more cursing, I now have some code which does a thing.

The system


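I split each sprite's pixel data into four planes, so plane p holds sprite columns p, p+4, p+8 and so on.  This is so, when rendering, I can choose which VGA plane (and so which pixel column) each sprite plane is drawn into, thus stepping around the 4px-alignment issue.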
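Here's a minimal Python sketch of that split.  It assumes a sprite is held as a list of equal-length rows of colour indices, and keeps each plane as a list of top-to-bottom columns, since that's the order the encoder walks them; `split_into_planes` is a hypothetical name, not the actual compiler code.

```python
def split_into_planes(pixels):
    """Split a sprite (list of equal-length rows of colour indices) into
    four planes.  Plane p holds sprite columns p, p+4, p+8, ..., each
    stored as a top-to-bottom list of pixel values."""
    width = len(pixels[0])
    planes = [[] for _ in range(4)]
    for x in range(width):
        column = [row[x] for row in pixels]  # read one column top to bottom
        planes[x & 3].append(column)         # column x belongs to plane x mod 4
    return planes
```

A 6px-wide sprite gives two columns in planes 0 and 1 and one column in planes 2 and 3.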

I compile each plane of my sprite data into an encoded byte-stream with 4 possible commands:
  1. Skip column(s)
  2. Draw column
  3. Skip row(s)
  4. Draw row(s) (with raw pixel data following)
Each command byte has the command code in the top two bits, with the bottom 6 bits being a number (n) interpreted differently by each command.
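The byte layout can be sketched in Python like this.  The code values are an assumption, inferred from the bit tests in the decoding steps further down (0x80 set means a row command, 0x40 set means a draw command), which put the four commands in the order listed above.  The helper names are hypothetical.

```python
# Assumed command codes (top two bits), inferred from the 0x80/0x40 bit tests.
SKIP_COLUMNS = 0b00  # 0x00 | n
DRAW_COLUMN = 0b01   # 0x40 | n
SKIP_ROWS = 0b10     # 0x80 | n
DRAW_ROWS = 0b11     # 0xC0 | n


def pack(code, n):
    """Build a command byte: code in bits 7-6, n in bits 5-0."""
    if not 0 <= code <= 3:
        raise ValueError("code must be 0-3")
    if not 0 <= n <= 0x3F:
        raise ValueError("n must fit in 6 bits (0-63)")
    return (code << 6) | n


def unpack(byte):
    """Split a command byte back into (code, n)."""
    return byte >> 6, byte & 0x3F
```

So Draw row(s) with n = 5 is `0xC5`, and the 6-bit field caps every n at 63.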

Skip column(s): move to the top of the column n+1 columns across (the +1 is so that n=0 is still a meaningful command).

Draw column: move to the top of the next column and be ready for row commands.  For this command, n*2 is the number of command bytes to skip if this column is off-screen (so we don't waste time looping through the commands of a column we're not going to draw).  The factor of 2 is in case we have a 64px-high sprite that draws every pixel in a column; I didn't want to limit myself to 32px if I didn't have to.  A Draw Column(0) command marks the end of the rendering.

Skip row(s): move down n rows.  I allowed n = 0 so I could use it as padding in case a column's byte commands came to an odd number of bytes (see the *2 in the Draw column command).

Draw row(s): n is the number of pixels to draw.  This command is followed by n bytes of pixel data.  I probably should've used n+1 here, because Draw Rows(0) makes no sense, and if I'm allowing 64px-high sprites, the whole column could then be done in a single command.  Never mind; I'm not touching it now that it works.
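Putting the four commands together, a plane encoder might look like the Python sketch below.  It is not the SilkWorm compiler, and it rests on several assumptions: colour 0 is transparent; the cursor starts one column to the left of the plane's first column (so the first Draw column lands on column 0); the Draw column skip count covers only the bytes after the Draw column byte; and the stream ends with Draw Column(0), byte 0x40.  With this bit layout a zero byte is Skip Column(0), so the sketch follows the command descriptions rather than the "if it's zero, end" step in the decoding list.  All names are hypothetical.

```python
TRANSPARENT = 0   # assumed: colour index 0 is see-through
MAX_N = 63        # largest value that fits in the 6-bit n field
SKIP_COLUMNS, DRAW_COLUMN, SKIP_ROWS, DRAW_ROWS = 0, 1, 2, 3


def cmd(code, n):
    """Command byte: code in the top two bits, n in the bottom six."""
    return (code << 6) | n


def encode_column(column):
    """Encode one column (top-to-bottom colour indices) as row commands,
    padded with Skip Rows(0) to an even length."""
    out = bytearray()
    y = 0
    while y < len(column):
        start = y
        opaque = column[y] != TRANSPARENT
        # Extend the run while pixels stay opaque (or transparent), max 63.
        while (y < len(column)
               and (column[y] != TRANSPARENT) == opaque
               and y - start < MAX_N):
            y += 1
        run = y - start
        if opaque:
            out.append(cmd(DRAW_ROWS, run))
            out.extend(column[start:y])       # raw pixel data follows
        elif y < len(column):                 # trailing gaps need no command
            out.append(cmd(SKIP_ROWS, run))
    if len(out) % 2:
        out.append(cmd(SKIP_ROWS, 0))         # padding for the n*2 skip count
    if len(out) > 2 * MAX_N:
        raise ValueError("column too long for the Draw column skip count")
    return bytes(out)


def encode_plane(columns):
    """Encode one plane (a list of columns) into a command stream."""
    out = bytearray()
    cursor = -1                               # assumed: start left of column 0
    for c, column in enumerate(columns):
        body = encode_column(column)
        if not body:                          # fully transparent column
            continue
        gap = c - cursor - 1                  # columns to pass before drawing
        while gap > 0:
            step = min(gap, MAX_N + 1)        # Skip Columns(n) moves n+1
            out.append(cmd(SKIP_COLUMNS, step - 1))
            gap -= step
        out.append(cmd(DRAW_COLUMN, len(body) // 2))  # moves 1 more column
        out.extend(body)
        cursor = c
    out.append(cmd(DRAW_COLUMN, 0))           # Draw Column(0) ends the stream
    return bytes(out)
```

For example, the plane `[[0, 5, 5], [0, 0, 0], [7, 0, 0]]` encodes to `42 81 C2 05 05 00 41 C1 07 40` (hex): the empty middle column becomes a Skip Column(0), and trailing transparent pixels cost nothing.  A fully drawn 64px column comes to 66 bytes, which is why the skip count needs the factor of 2.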

The implementation


I do the sprite compiling in Python, producing a custom file format that the DOS executable can load quickly.  I like Python for throwing data around.

In the DOS executable, I loop through each plane in C, calling a DrawCompiledPlane() asm routine.

The asm routine puts the command buffer pointer in ds:si and the destination address in VGA memory in es:di.
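Before each DrawCompiledPlane() call, something has to work out which VGA plane to enable and where es:di should start.  Here's a Python sketch of that arithmetic, assuming an 80-byte-wide screen and that sprite plane p holds sprite columns p, p+4, and so on; `plane_setup` is a hypothetical name.  On real hardware the mask goes to the Sequencer's Map Mask register (index 2, via ports 0x3C4/0x3C5); the post doesn't say whether that `out` happens in the C loop or in the asm.

```python
BYTES_PER_ROW = 80  # assumed 320px-wide Mode-X screen


def plane_setup(x, y, sprite_plane):
    """For a sprite whose top-left pixel is at screen (x, y), return
    (map_mask, start_offset) for sprite plane 0-3, i.e. the plane holding
    sprite columns sprite_plane, sprite_plane + 4, ..."""
    screen_x = x + sprite_plane            # screen column of its first column
    map_mask = 1 << (screen_x & 3)         # one bit per VGA plane
    start_offset = y * BYTES_PER_ROW + (screen_x >> 2)
    return map_mask, start_offset
```

With the sprite at x=5, `plane_setup(5, 10, 0)` gives `(2, 801)` and `plane_setup(5, 10, 3)` gives `(1, 802)`: sprite plane 3 wraps round to VGA plane 0, one byte further right.  That is how the per-plane split sidesteps the 4px alignment with one plane-mask write per plane rather than one per 4-pixel copy.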

For each byte command:
  1. if it's zero, end
  2. if 0x80 bit is set, it's a row command:
    1. if 0x40 bit is set, it's a draw command:
      1. loop n times: vertical clip check, read px value, write px value, move down a row
    2. else it's a skip command:
      1. add n * screenWidth on to di
  3. else it's a column command:
    1. Set di to top of next column
    2. if 0x40 bit is set, it's a draw command:
      1. if current column is off-screen, add n*2 to si (to skip this column's byte commands)
    3. else it's a skip command:
      1. add n on to di (to skip columns - note we've already skipped one column before we checked if it was skip or draw)
  4. Loop for next command
I can't properly remember why I decided to loop down each column rather than across each row.  I think it was so that the clipping in the tight inner loop could be done against min/max pointers in VGA RAM, rather than with some calculation involving a row count.  The outer loop does use a column count, but because it's the outer loop, that calculation is performed far less often: once per column instead of once per pixel.  At least, I think that's why I did it; part of the reason for writing this post is that I can barely remember what I did 5 minutes ago, let alone days or weeks ago!
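The decoding steps and the pointer-based clipping can be simulated in Python, as in the sketch below.  It is not the asm routine.  `vram` is a bytearray standing in for one 80x240 VGA plane, `si`/`di` mimic the registers, the cursor is assumed to start one column left of `start_col`, and byte 0x40 (Draw Column(0)) is assumed to be the terminator.  Unlike the asm, it recomputes `di` from a column counter rather than stepping it.  The per-column `lo`/`hi` pointers are worked out once per Draw column, so the inner loop's vertical clip is just two comparisons.

```python
BYTES_PER_ROW = 80     # assumed 320px-wide Mode-X screen
SCREEN_HEIGHT = 240    # assumed 240 rows
END = 0x40             # assumed terminator: Draw Column(0)


def draw_compiled_plane(stream, vram, start_col, top_row):
    """Simulate drawing one compiled sprite plane into one VGA plane.

    stream:    command bytes for this plane
    vram:      bytearray(BYTES_PER_ROW * SCREEN_HEIGHT) standing in for a plane
    start_col: screen byte column of the plane's first column (may be off-screen)
    top_row:   screen row of the sprite's top edge (may be off-screen)
    """
    si = 0                  # read position in the command stream (like ds:si)
    di = 0                  # write position in the plane (like es:di)
    col = start_col - 1     # assumed: cursor starts one column to the left
    lo = hi = -1            # clip pointers for the current column
    while True:
        byte = stream[si]
        si += 1
        if byte == END:
            return
        n = byte & 0x3F
        if byte & 0x80:                        # row command
            if byte & 0x40:                    # Draw Rows(n)
                for _ in range(n):
                    if lo <= di <= hi:         # vertical clip: two compares
                        vram[di] = stream[si]
                    si += 1
                    di += BYTES_PER_ROW        # move down a row
            else:                              # Skip Rows(n)
                di += n * BYTES_PER_ROW
        else:                                  # column command
            col += 1                           # both kinds step one column first
            if byte & 0x40:                    # Draw Column(n)
                di = top_row * BYTES_PER_ROW + col
                lo = col                                        # row 0
                hi = (SCREEN_HEIGHT - 1) * BYTES_PER_ROW + col  # last row
                if not 0 <= col < BYTES_PER_ROW:
                    si += n * 2                # skip this column's commands
            else:                              # Skip Columns(n): n more columns
                col += n
```

Drawing the stream `42 81 C2 05 05 00 41 C1 07 40` at `start_col=10`, `top_row=-1` writes the first column's two pixels at offsets 10 and 90, and clips the second column's single pixel, which would land on row -1.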

