Changelog
UI: implement LCD1x shader (thanks @DestinyXZ9)
ARM: implement multiplication carry algorithm (thanks @zaydlang, @calc84maniac)
ARM: fix LDM^ bus conflict logic breaking in user mode
ARM: fix out-of-bounds accesses when switching into or out of an invalid CPU mode
ARM: optimize GPR and SPSR reads
ARM: optimize LSL, LSR, ASR and ROR arithmetic
DMA: allow CPU internal ticks during CPU<->DMA transition cycles
PPU: correctly reset OBJ mosaic Y counter (thanks @j-mattsson)