Refilling the buffer - This was the key to playing long sound files with smooth transitions. Each time the function is called, it refills one half of the DMA buffer with the next half-buffer's worth of the sound clip. [_DMA_Refill_Flag] indicates which half to refill. The flag is toggled every time a half is filled, so the other half is refilled on the next call.
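A minimal C sketch of the half-buffer alternation, assuming a tiny hypothetical `DMA_SIZE` and a static array standing in for the real DMA buffer. `dma_refill_flag` plays the role of [_DMA_Refill_Flag], and `refill_half` is a hypothetical name, not the original routine.

```c
#include <stdint.h>
#include <string.h>

#define DMA_SIZE 8              /* hypothetical total DMA buffer size in bytes */
#define HALF (DMA_SIZE / 2)     /* SIZE/2 in the text */

/* Stand-in for the real DMA buffer the sound card reads from. */
static uint8_t dma_buffer[DMA_SIZE];

/* Models [_DMA_Refill_Flag]: 0 = refill the first half, 1 = the second half. */
static int dma_refill_flag = 0;

/* Copies HALF bytes from src into the half selected by the flag,
   flips the flag so the next call targets the other half,
   and returns the index (0 or 1) of the half just filled. */
int refill_half(const uint8_t *src)
{
    int half = dma_refill_flag;
    memcpy(dma_buffer + half * HALF, src, HALF);
    dma_refill_flag ^= 1;
    return half;
}
```

Because the flag flips after every fill, the half being written is the one the card is not currently playing, assuming the routine runs once per half-buffer played.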
The refill itself was simply a copy of the next half-buffer of the sound clip to the DMA buffer address. The location of the next portion to be played was stored in [BGMPos], which is advanced by SIZE/2 on every refill. [NextPos] is a separate variable that starts at 0 and is also advanced by SIZE/2 each time; it is used only for detecting the end of the clip. When [NextPos] exceeds [BGMSize], the size of the background music, the end has been reached and the code jumps to the end case.
The end case was handled rather crudely. The code was originally written to play the remaining part of the file and then switch to single-cycle DMA. However, with a small enough DMA buffer and enough silence at the end of the sound file, simply not playing the small remainder was good enough. So when the end is reached, the remainder is ignored and the music starts again from the beginning.
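The position bookkeeping and the looping end case might look like the following C sketch. Assumptions: `bgm_pos` models [BGMPos] as a pointer into the clip, `next_pos` models [NextPos] as a byte count, `HALF_SIZE` stands for SIZE/2, and the end test is made before copying so a tail shorter than a half buffer is never read. The text does not state the exact order of the comparison, and the names `Bgm` and `refill_from_bgm` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_SIZE 4   /* SIZE/2 in bytes; hypothetical, kept tiny */

typedef struct {
    const uint8_t *bgm_start;  /* first byte of the background music */
    const uint8_t *bgm_pos;    /* models [BGMPos]: next byte to copy */
    size_t next_pos;           /* models [NextPos]: bytes consumed so far */
    size_t bgm_size;           /* models [BGMSize]; must be >= HALF_SIZE */
} Bgm;

/* Copies the next HALF_SIZE bytes of music into dst. If fewer than
   HALF_SIZE bytes remain, the short tail is skipped and playback
   restarts from the beginning of the clip (the crude end case). */
void refill_from_bgm(Bgm *m, uint8_t *dst)
{
    if (m->next_pos + HALF_SIZE > m->bgm_size) {
        /* End case: drop the remainder and loop back to the start. */
        m->bgm_pos = m->bgm_start;
        m->next_pos = 0;
    }
    memcpy(dst, m->bgm_pos, HALF_SIZE);
    m->bgm_pos += HALF_SIZE;
    m->next_pos += HALF_SIZE;
}
```

With a 10-byte clip and `HALF_SIZE` of 4, the third call skips bytes 8 and 9 and copies bytes 0 to 3 again, which is inaudible only if the clip ends in silence.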
Mixing the sounds - Next, the function checks the [_DMA_Mix] flag to see whether sound effects need to be mixed into the background music. In this program the background music was always on, so whenever we wanted to play a sound effect, it had to be mixed with the background music. To mix the sounds, the code loads data from the half buffer it just refilled into an MMX register, loads the corresponding sound effect data into another MMX register, and performs a bytewise parallel add. The result is stored back into the buffer, and this is repeated until the entire half buffer has been processed.
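As a portable illustration, here is a scalar C model of the packed byte add over one half buffer; an MMX packed-byte add does the same work for eight bytes per instruction. The text does not say whether a wrapping (PADDB) or unsigned saturating (PADDUSB) add was used, so both variants are shown, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Wrapping add, like PADDB: each byte sum is taken modulo 256. */
void mix_half_wrap(uint8_t *buf, const uint8_t *sfx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(buf[i] + sfx[i]);
}

/* Unsigned saturating add, like PADDUSB: sums above 255 clamp to 255. */
void mix_half_saturate(uint8_t *buf, const uint8_t *sfx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)buf[i] + sfx[i];
        buf[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}
```

Wraparound turns loud overlaps into harsh distortion (200 + 100 becomes 44), so the saturating form is generally the safer choice for mixing.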
[MixCycle] stores how many cycles (DMA refills) the sound effect must be mixed for. It is initialized in _PlaySFX and decremented on every refill; when it reaches 0, the code jumps to the end case. This end case is handled much like the one for the DMA refill: again, with a small enough buffer and sound files that end in silence, I simply set the [_DMA_Mix] flag back to 0.
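A sketch of the [MixCycle] countdown, with the mixing step folded in. `play_sfx` is a hypothetical stand-in for _PlaySFX, and the way it computes the cycle count (effect length divided by the half-buffer size, rounded down so a short tail is dropped) is an assumption consistent with the end-case approach described above, not something the text specifies. A wrapping byte add is used here only for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_SIZE 4   /* SIZE/2 in bytes; hypothetical */

typedef struct {
    int dma_mix;             /* models [_DMA_Mix]: nonzero while an effect plays */
    unsigned mix_cycle;      /* models [MixCycle]: refills left to mix */
    const uint8_t *sfx_pos;  /* hypothetical: next byte of the effect */
} SfxState;

/* Hypothetical analogue of _PlaySFX. Assumption: the cycle count is
   rounded down, so a tail shorter than one half buffer is never mixed. */
void play_sfx(SfxState *s, const uint8_t *sfx, size_t sfx_size)
{
    s->sfx_pos = sfx;
    s->mix_cycle = (unsigned)(sfx_size / HALF_SIZE);
    s->dma_mix = s->mix_cycle > 0;
}

/* Called once per refill, after the music has been copied into half. */
void mix_step(SfxState *s, uint8_t *half)
{
    if (!s->dma_mix)
        return;
    for (size_t i = 0; i < HALF_SIZE; i++)
        half[i] = (uint8_t)(half[i] + s->sfx_pos[i]);
    s->sfx_pos += HALF_SIZE;
    if (--s->mix_cycle == 0)
        s->dma_mix = 0;   /* end case: just stop mixing */
}
```

A 9-byte effect gives two cycles here; the ninth byte is never mixed, mirroring how the music loop ignores its remainder.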