These notes were sent to me by Eric Engler, in discussion of BinLoad. For more information, please see his website.
Although written for the 32K flash 9S12C32, these notes are
helpful in understanding the basic 9S12 memory paging mechanism
and many of these notes apply to other 9S12 devices, also.
$0000: control register block. Although you can re-map these
registers to live at a different address, most people
leave them here so zero page addressing can be used to
address the control registers that live in the lowest
256 addresses.
$3800: 2K of RAM: 3800 - 3fff
SP should be set to $4000, so it uses the top part of RAM
NOTE: AN2548 says the serial monitor (SerMon) needs 35 bytes of
stack for its own use. This is often misunderstood - they are
NOT asking us to reserve the top 35 bytes for use by the
SerMon. Rather, the SerMon uses the same stack as the
end-user program, and the most it will ever put on that
stack is 35 bytes. So it's safe to allocate the stack
starting from the highest available RAM location (SP must
start one byte higher than the top of RAM - typically $4000).
NOTE: Unlike D-bug12, there is no pseudo vector table in
RAM. The pseudo vector table goes into flash at
$f780 - f7ff.
$4000: 16K fixed flash block (PPAGE block $3E). This is where user
programs will often go. This is not shared with any other
function (the $c000 block is shared with vectors and the
SerMon itself)
$8000: 16K page window. This is a mirror of the flash at 4000 if
PPAGE holds $3e (which is the case in run mode). This can
also mirror the $c000 block if PPAGE holds $3f, but there's
not much value in doing it this way.
The bane of all newbies to the C32 family is this hole
between the two 16K blocks: the $4000 and $c000 blocks are
separated by this $8000 page window. People who compile C
programs want one big contiguous space of flash memory to
hold their text (program code) segment, since it's difficult
to break up their output code into two separate flash ranges.
This isn't a big concern for Assembly language developers
because they can easily design their memory usage to straddle
this window.
Karl Lunt solved this "hole in the middle" problem by making
use of the mirroring of the $4000 block at $8000 at runtime.
The user program gets "org'd" at $8000 during assembly/compile
time, and his binload program burns it into two separate
regions ($4000 and $c000). He remaps the lower 16K of .s19
records for flash burning purposes so those records are
burned into the $4000 block (instead of the $8000 block as
requested by the .s19 records). All internal address
references in the user program are based on the runtime
contiguous block of nearly 30K ($8000 - f77f). This is how
he eliminates the hole between the two 16K blocks of flash.
Karl's trick will probably work during execution in the debug
mode (with the serial monitor active), but Pluto would have
to ensure that PPAGE holds $3e when it runs a program. And
the user program would be org'd at $8000 and the pseudo
vectors would have to point into the range of $8000 - f77f.
$c000: 16K fixed flash block (PPAGE block $3F)
There's almost 14K of flash for user programs: $c000 - f77f.
The upper 128 bytes of this 14K is for pseudo vector table:
$f780 - f7ff. The highest 2K of flash is reserved for the
serial monitor: $f800 - $ffff
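The map above, and the address remapping that Karl's loader performs, can be sketched in C. This is only an illustration under stated assumptions: the macro names and the burn_address() helper are made up for this example and are not taken from BinLoad or the serial monitor, and the helper only covers user code records in the $8000 - f77f range.

```c
#include <assert.h>
#include <stdint.h>

/* Logical (64K) memory map of the 9S12C32 as described in these notes.
   All names here are hypothetical, chosen for this sketch only. */
#define REG_BASE     0x0000u /* control register block */
#define RAM_BASE     0x3800u /* 2K of RAM: $3800 - $3fff */
#define RAM_END      0x3FFFu
#define STACK_INIT   0x4000u /* initial SP: one byte above the top of RAM */
#define FLASH_3E     0x4000u /* fixed 16K flash block, page $3E */
#define PAGE_WINDOW  0x8000u /* 16K page window */
#define FLASH_3F     0xC000u /* fixed 16K flash block, page $3F */
#define USER_END     0xF77Fu /* last byte available to user code */
#define PSEUDO_VEC   0xF780u /* pseudo vector table: $f780 - $f7ff */
#define MONITOR_BASE 0xF800u /* serial monitor: $f800 - $ffff */

/* Size of the contiguous runtime block when the program is org'd at $8000. */
#define USER_BYTES   (USER_END - PAGE_WINDOW + 1u)

/* Where a loader following Karl's scheme would burn a user code byte whose
   S-record address is srec_addr. Records in the page window are moved down
   into the fixed $4000 block; records in the $c000 block stay where they are. */
uint16_t burn_address(uint16_t srec_addr)
{
    /* This sketch only handles user code, not vectors or monitor space. */
    assert(srec_addr >= PAGE_WINDOW && srec_addr <= USER_END);
    if (srec_addr < FLASH_3F)
        return (uint16_t)(srec_addr - (PAGE_WINDOW - FLASH_3E));
    return srec_addr;
}
```

For example, a record addressed to $8000 would be burned at $4000, while a record addressed to $c000 is burned unchanged. At runtime, with PPAGE holding $3e, the program still sees its code at $8000.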
The 64K addressing range is called the logical address range (this is
the address range used by the PC/SP/IX/IY, etc). The type of address
that uses the PPAGE addressing scheme to reference flash blocks using
their native internal address is called a physical address.
A physical address is always required when programming flash, since
there can be a lot more flash than can be addressed with 64K (at least,
in some higher-end MCUs other than the 9S12C32). Freescale's internal
flash programming code always requires physical addresses. A physical
address is 20 bits long, so you'll see 5 hex digits in a physical
address. Since 20 bits is not a multiple of 8 bits, you'll often
see a 24 bit address instead, just to make it line up on byte boundaries;
the upper 4 bits aren't used (prior to the S12X, anyway).
I don't want to confuse anyone here, but I should mention that there are
two types of physical address schemes. The "paged" scheme simply uses
the PPAGE value as the high order byte. This means there's a gap
between $ffff (highest 64K address) and $3e8000 (lowest paged address).
A user program gets burned into flash from $3e8000 to $3ebfff, and then
from $3f8000 to $3fbfff. The CodeWarrior tool chain uses this scheme.
The "linear" scheme uses an equivalent address that builds up from the
64K range, and has no hole. The D-bug12 bootloader and the P&E tools
use this scheme. With this model you can just keep incrementing the
address one byte at a time from $00000 to $fffff (less than this,
depending on how much flash your chip has), with no holes in the middle.
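The relationship between the two schemes can be sketched in C. This assumes the usual convention that a paged address carries PPAGE in bits 23..16 and the page window address ($8000 - $bfff) in the low 16 bits, and that the linear scheme lays the 16K pages end to end. The function names are made up for this example; this is not the code used by SRecCvt or logtophy.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE 0x8000u /* start of the 16K page window */
#define PAGE_SIZE   0x4000u /* 16K per flash page */

/* Build a paged address: PPAGE in the high byte, window address below it. */
uint32_t make_paged(uint8_t ppage, uint16_t window_addr)
{
    assert(window_addr >= WINDOW_BASE && window_addr < WINDOW_BASE + PAGE_SIZE);
    return ((uint32_t)ppage << 16) | window_addr;
}

/* Convert a paged address to a linear one: pages are stacked back to back,
   16K each, so there is no hole between consecutive pages. */
uint32_t paged_to_linear(uint32_t paged)
{
    uint32_t ppage  = (paged >> 16) & 0xFFu;
    uint32_t offset = (paged & 0xFFFFu) - WINDOW_BASE;
    assert(offset < PAGE_SIZE); /* low 16 bits must lie inside the window */
    return ppage * PAGE_SIZE + offset;
}
```

Under these assumptions the first byte of page $3E lands at linear $f8000 and the last byte of page $3F at $fffff, which fits in the 20 bits mentioned earlier.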
Many modern debuggers and flash burning programs that run on the PC can
translate a paged address file on the fly, or the SRecCvt program can be
used to translate a paged .s19 file to a linear .s19 file. Why would
you want to translate? Because the CodeWarrior chain (and possibly
others) will generate paged addresses and you may need linear addresses
for use with the bootloader, or P&E tools, etc. P&E has their own
translation program called logtophy (logical to physical). Sometimes
people will use a different extension for a linear file: I've seen .s2
in some cases, but there is no widely used convention for this, so you
can't be sure which scheme is in use if you see a file with an .s19
extension.
The memory window at $8000 seems unimportant on the C32 since you can
directly address the entire 32K of flash using logical 64K addresses at
runtime (aside from the case of programming flash, as mentioned above).
However, you can still use the memory window and there can be a benefit
in using it. If you store $3E in the PPAGE register at address $0030,
then the $4000 block of flash will also be seen at $8000. Similarly, if
you store $3F in the PPAGE register, then the $C000 block will also be
visible at $8000. Of course, since these 2 flash blocks are also fixed,
you'll continue to see them at $4000 or $C000. The window at $8000, in
effect, results in a 16K hole between the two fixed blocks of flash, but
you can use Karl's trick as mentioned above to treat this as one big
block of 30K.
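The mirroring described here can be modelled with a small C sketch that maps a logical address and a PPAGE value to the flash page and offset that actually get read. It is a model of the C32 behaviour as described in these notes, not register-level code; the function names are made up, and returning 0 for "not flash" is just a convention of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Which flash page a logical address reads from, for a given PPAGE value.
   Returns 0 when the address is not in flash (registers, RAM, unused). */
uint8_t flash_page_for(uint16_t logical, uint8_t ppage)
{
    if (logical >= 0xC000u) return 0x3F;  /* fixed block, always page $3F */
    if (logical >= 0x8000u) return ppage; /* page window follows PPAGE */
    if (logical >= 0x4000u) return 0x3E;  /* fixed block, always page $3E */
    return 0;
}

/* Offset of a flash address inside its 16K page. */
uint16_t flash_offset_for(uint16_t logical)
{
    assert(logical >= 0x4000u); /* only meaningful for flash addresses */
    return (uint16_t)(logical & 0x3FFFu);
}
```

With PPAGE holding $3E, logical $8123 and logical $4123 resolve to the same page and offset, which is exactly the mirror that Karl's trick relies on.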
One important point when you have a program in flash that will run at startup (switch in the Run position): you need some initialization code to configure a few things before the main program runs. You have to map the control registers and RAM, and then set SP to the top of RAM. You should also set the PLL frequency. If you're using gcc, the startup code has to be patched. I have a complete demo project in my EmbeddedGNU Windows IDE, including the modified gcc startup code. There's also code for the reset vector so your program will execute on power-up. You can use my code even if you don't want to use my IDE. Be sure to look at the help file for some more info on the C32.
We have some vectors at the top of our usable flash range, in the area just below $F800. This is our own vector table that we define. Interrupts are handled first by the serial monitor (which has firmware hooks just below $ffff), and then flow is directed to our own vectors (just below $f800) so our code can handle the interrupts. However, the serial interrupt is dedicated for use by the serial monitor when running in monitor mode. The monitor uses this interrupt to steal some MCU clock cycles so it can execute our debug commands and send back the results.
This is an invasive style of debugging because the timer in the MCU still ticks away while the serial monitor is running, which can cause trouble if our program is using the timer for its own purposes. It's a reasonable trade-off to make in order to get low-cost debugging over the serial port. In cases where we need less invasive debugging, we need to use a BDM device. While active BDM mode is executing debug commands, the timer is temporarily halted. This makes it easier to debug programs that are aware of the passage of time. But even this isn't perfect if you're interacting with high speed external devices that can't be halted - like Ethernet on the NE64, for example.
If we run in boot mode then the serial monitor, while still present in flash, is not actually active. However, the vectors just below $f800 are still used in boot mode because we're not allowed to modify the protected flash range where the serial monitor lives ($f800 - $ffff). How does the C32 MCU know that it's supposed to use our vectors and not the serial monitor vectors if we're running in boot mode? Because the serial monitor has firmware vectors in place that trap all of the interrupts and direct them to the new vector table if the serial monitor is not active. Even the serial port interrupt, normally used only by the serial monitor in monitor mode, is redirected to our own vector table in boot mode.
To make our lives easier, if we code interrupt vectors at an ORG in the $ffxx memory block, the serial monitor routines automatically remap our own addresses to the $f7xx range when we program the flash. So as far as our source code is concerned, it looks like we're putting our vectors right over the top of the serial monitor vectors. This is "by design" - they wanted it to look like our program will have complete control, and this is really the case in boot mode.
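The remapping amounts to a fixed offset, sketched here in C. This assumes the hardware vector area is the 128 bytes at $ff80 - $ffff, matching the 128-byte pseudo vector table at $f780 - $f7ff mentioned earlier. The names are made up for this example and this is not the serial monitor's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_BASE        0xFF80u /* assumed start of the hardware vector area */
#define PSEUDO_VECTOR_BASE 0xF780u /* start of the pseudo vector table */

/* Where a vector written at addr in the $ffxx block would actually be
   programmed once it has been moved down into the $f7xx block. */
uint16_t remap_vector(uint16_t addr)
{
    assert(addr >= VECTOR_BASE); /* only vector addresses are remapped */
    return (uint16_t)(addr - (VECTOR_BASE - PSEUDO_VECTOR_BASE));
}
```

So a reset vector that the source code places at $fffe would end up in flash at $f7fe.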