Memory Subsystem
The SoC integrates various kinds of memory, as described below.
- SRAM
The main Static Random Access Memory (SRAM) is an on-chip static RAM with a capacity of up to 96 KB. It mainly stores application data and can also hold code for execution. The DMA controller can transfer data between the SRAM, Flash, and peripherals.
To manage power usage more efficiently, the main SRAM is implemented as multiple SRAM instances (for example, 96 KB of SRAM can be implemented as six instances), so that unused instances can be powered down when less SRAM is required. Each SRAM instance is controlled by an independent power switch, which selects one of the SRAM power states: Full Power, Retention Power, or Power Off.
- Flash
The non-volatile memory management system integrates an accessible Flash memory and an on-chip XQSPI controller that supports the XIP mechanism, allowing the SoC to store application data and execute code in Flash. The XIP mechanism allows the CPU to execute program code directly from the internal Flash over QSPI. When a cache miss occurs, the XIP controller automatically prefetches a certain number of program instructions from Flash and loads them into the cache memory.
- ROM
The 224 KB read-only memory (ROM) mainly stores the boot code and code related to the Bluetooth LE protocol stack.
- eFuse
The eFuse (32 bytes) stores factory information, such as the UID, Flash security register address, and memory information.
- Cache
The 8 KB cache is a high-speed buffer memory between the CPU and memory. The CPU is much faster than memory, so when the CPU accesses data directly from memory, it has to wait for a certain period. The cache keeps data that has just been used or is repeatedly used by the CPU. If the CPU needs that data again, it can fetch it directly from the cache, which avoids repeated memory accesses, reduces delay, and thus improves system efficiency.
SRAM
Introduction
Although always contiguous on all devices, the SRAM instances are divided between five AHB matrix ports. This allows user programs to potentially obtain better performance by dividing RAM usage among the ports. For example, simultaneous access to SRAM0 by the CPU and SRAM1 by the system DMA controller does not result in any bus stalls for either master.
Generally, the CPU reads or writes all peripheral data at some point, even when that data is transferred from or to a peripheral by DMA. Therefore, minimizing stalls is likely to involve placing the data for different peripherals in RAM on different ports.
Alternatively, load successive blocks of data from the same peripheral into RAM on different ports, so that the CPU and the DMA controller do not access the same block at the same time. Once the DMA controller fills or drains a buffer, it signals the CPU that the data is ready. The CPU can then use the data in that buffer while the DMA controller uses RAM on a different port to fill the next buffer. In this way, the CPU and the DMA controller share memory resources effectively, which improves the efficiency and reliability of data transfers.
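The buffer hand-off described above is commonly called ping-pong (double) buffering. The following is a minimal host-side sketch in Python, not SoC driver code: the class name `PingPong` and its methods are hypothetical, and the two buffers are only imagined to sit in SRAM instances on different AHB matrix ports.

```python
from collections import deque


class PingPong:
    """Two buffers, imagined to live on different SRAM ports (hypothetical model)."""

    def __init__(self, size):
        self.buffers = [bytearray(size), bytearray(size)]
        self.dma_index = 0      # buffer currently owned by the DMA controller
        self.ready = deque()    # indices of filled buffers waiting for the CPU

    def dma_fill(self, data):
        """Model a completed DMA transfer: fill, signal the CPU, switch buffers."""
        buf = self.buffers[self.dma_index]
        if len(data) > len(buf):
            raise ValueError("transfer larger than buffer")
        buf[:len(data)] = data
        self.ready.append(self.dma_index)  # "data ready" signal to the CPU
        self.dma_index ^= 1                # DMA continues on the other buffer

    def cpu_consume(self):
        """CPU takes the oldest filled buffer; returns (buffer index, contents)."""
        index = self.ready.popleft()
        return index, bytes(self.buffers[index])
```

While the CPU works on the buffer returned by `cpu_consume()`, the next `dma_fill()` lands in the other buffer, so the two masters never touch the same block.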
eFuse
Introduction
The eFuse (32 bytes) mainly stores the UID, Flash security register address, and memory information.
- UID (16 bytes)
- Flash information (7 bytes)
  - SRAM Size (1 byte) is the size of SRAM.
  - SE_ADDR1 (3 bytes) is the start address for Flash security register #1 (24 bits).
  - SE_ADDR2 (3 bytes) is the start address for Flash security register #2 (24 bits).
- BOD Trim (1 byte)
  - bits[3:0]: BOD trimming
  - bits[7:4]: pattern (0xA)
- Configuration (1 byte)
  - bit0: isp_uart_bypass (0: UART ISP supported, 1: UART ISP not supported)
  - bit1: isp_jlink_bypass (0: JLINK ISP supported, 1: JLINK ISP not supported)
  - bit2: SWD control (0: enabled, 1: disabled)
  - bit3: IO LDO adjustment control (0: disabled, 1: enabled)
  - bit4: IO_LDO_SEL (I/O voltage 0: 1.8 V, 1: 3.0 V)
  - bit5: IO_LDO_Bypass (0: no bypass, 1: bypass)
- IO_LDO_1P8 (1 byte)
  - IO LDO output 1.8 V. bits[6:5]: coarse; bits[4:0]: fine
- IO_LDO_3P0 (1 byte)
  - IO LDO output 3.0 V. bits[6:5]: coarse; bits[4:0]: fine
- User Region (5 bytes)
The user region is reserved for users to write and read by calling the eFuse driver. Every single byte of eFuse can only be written once.
The figure below shows the structure of eFuse.
Access to eFuse must be word aligned, which means the offset address can only be 0x0, 0x4, … 0x1C.
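The bit fields listed above can be unpacked with a few masks and shifts. The sketch below is an illustration only: the function names are hypothetical, and each function takes the raw byte as an argument because the byte offsets inside eFuse are given by the figure, not by this text.

```python
def decode_bod_trim(value):
    """BOD Trim byte: bits[3:0] trimming, bits[7:4] pattern (expected 0xA)."""
    return {"trim": value & 0x0F, "pattern_ok": ((value >> 4) & 0x0F) == 0xA}


def decode_io_ldo(value):
    """IO_LDO_1P8 / IO_LDO_3P0 byte: bits[6:5] coarse, bits[4:0] fine."""
    return {"coarse": (value >> 5) & 0x3, "fine": value & 0x1F}


def decode_config(value):
    """Configuration byte, following the bit meanings listed above."""
    return {
        "uart_isp_supported": not (value & 0x01),    # bit0: 0 = supported
        "jlink_isp_supported": not (value & 0x02),   # bit1: 0 = supported
        "swd_enabled": not (value & 0x04),           # bit2: 0 = enabled
        "io_ldo_adjust_enabled": bool(value & 0x08), # bit3: 1 = enabled
        "io_voltage": "3.0 V" if value & 0x10 else "1.8 V",  # bit4
        "io_ldo_bypass": bool(value & 0x20),         # bit5: 1 = bypass
    }
```

For example, `decode_config(0x15)` reports UART ISP not supported, JLINK ISP supported, SWD disabled, and a 3.0 V I/O voltage.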
Main Features
- The initial eFuse bit is 0, and it can only be changed once from 0 to 1.
- Little-endian mode, the minimum unit of data reading or writing is 4 bytes.
- The User Region is reserved for users to write and read data by calling the eFuse driver.
- The base address of User Region is 0x40016000.
- Every single byte of eFuse can only be written once.
- Users can use GProgrammer to generate and download eFuse files.
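The one-way nature of eFuse bits can be illustrated with a small software model. This is a sketch under stated assumptions, not the eFuse driver: `EfuseModel` and its methods are hypothetical, and the model covers only the 0-to-1 bit rule, the 4-byte little-endian access unit, and word alignment. It does not model the per-byte write-once rule or the programming voltages.

```python
class EfuseModel:
    """Toy model of a 32-byte eFuse array (hypothetical, for illustration)."""

    SIZE = 32

    def __init__(self):
        self.cells = bytearray(self.SIZE)  # every fuse bit starts at 0

    def _check(self, offset):
        if offset % 4 != 0 or not 0 <= offset <= self.SIZE - 4:
            raise ValueError("offset must be word aligned within 0x0..0x1C")

    def read_word(self, offset):
        self._check(offset)
        return int.from_bytes(self.cells[offset:offset + 4], "little")

    def write_word(self, offset, value):
        self._check(offset)
        # A blown fuse stays blown: new data can only add 1 bits.
        merged = self.read_word(offset) | (value & 0xFFFFFFFF)
        self.cells[offset:offset + 4] = merged.to_bytes(4, "little")
```

Writing 0xF0 and then 0x0F to the same word leaves 0xFF in the model, and a later write of 0 changes nothing.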
Programming
Write flow:
- Write 1 to bit[10] of the RF_REG_2 register to connect the eFuse supply voltage.
- Set MSIO_7 to analog mode and connect it to an external 2.5 V–2.7 V power supply.
- Write 1 to bits[12:10] of the RF_REG_2 register to connect the eFuse program voltage.
- Configure the TPGM register of eFuse.
- Write 0 to the SIG field in the PGENB register of eFuse.
- Software sends commands to write data to eFuse.
- Software reads the efuse_write_done status from the STAT register of eFuse.
- When there is no more data to write, write 1 to the SIG field in the PGENB register of eFuse.
- Write 0 to bits[12:10] of the RF_REG_2 register.
Read flow:
- Write 1 to bit[10] of the RF_REG_2 register to connect the eFuse supply voltage.
- Write 0 to bits[12:11] of the RF_REG_2 register to disconnect the eFuse program voltage.
- Write 1 to the SIG field in the PGENB register.
- Software sends commands to read data from eFuse.
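The RF_REG_2 steps of the two flows reduce to setting and clearing a few bits. The sketch below only computes the register values. It assumes a 32-bit register, and it assumes that "write 1 to bits[12:10]" means setting all three bits. The function and constant names are hypothetical, and the TPGM, PGENB, and STAT steps are not modeled.

```python
SUPPLY_MASK = 1 << 10       # bit[10]: eFuse supply voltage
PROGRAM_MASK = 0b11 << 11   # bits[12:11]: eFuse program voltage
REG_MASK = 0xFFFFFFFF       # assumed 32-bit register width


def write_flow_rf_reg_2(initial):
    """RF_REG_2 values after the first, third, and last write-flow steps."""
    supply_on = initial | SUPPLY_MASK
    program_on = supply_on | SUPPLY_MASK | PROGRAM_MASK  # bits[12:10] all set
    released = program_on & ~(SUPPLY_MASK | PROGRAM_MASK) & REG_MASK
    return supply_on, program_on, released


def read_flow_rf_reg_2(initial):
    """RF_REG_2 values after the first two read-flow steps."""
    supply_on = initial | SUPPLY_MASK
    program_off = supply_on & ~PROGRAM_MASK & REG_MASK
    return supply_on, program_off
```

Starting from 0, the write flow passes through 0x400 and 0x1C00 and ends at 0 again. Bits outside [12:10] are left untouched in both flows.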
Flash (Non-Volatile Memory)
Introduction
The non-volatile memory management system integrates an accessible Flash memory and an on-chip QSPI controller that supports the XIP mechanism, allowing the device to store application data and execute code in Flash.
The XIP mechanism allows the CPU to execute program code directly from the internal Flash memory over the QSPI interface. When a cache miss occurs, the XIP controller automatically prefetches a certain number of program instructions from Flash memory and loads them into the cache memory.
Main Features
- Single 1.65 V to 3.60 V supply
- Industrial temperature range: -40°C to 105°C
- Serial peripheral interface (SPI) compatible: Mode 0 and Mode 3
- Single, dual, and quad I/O modes
- Flexible architecture for code and data storage
  - Uniform 256-byte page program
  - Uniform 256-byte page erase
  - Uniform 4-KB sector erase
  - Uniform 32-KB/64-KB block erase
  - Full chip erase
- Hardware controlled locking of protected sectors by the write protect (WP) pin
- One time programmable (OTP) security registers
  - 3 x 512-byte security registers with OTP lock
- 128-bit UID for each device
- Fast program and erase speed
  - 2 ms page program time
  - 16 ms page erase time
  - 16 ms 4-KB sector erase time
  - 16 ms 32-KB block erase time
  - 16 ms 64-KB block erase time
- Ultra low power consumption
  - 0.1 µA deep power-down current
  - 10 µA standby current
  - 2.5 mA active read current at 33 MHz
  - 3.0 mA active program or erase current
- High reliability
  - 100,000 program/erase cycles
  - 20-year data retention
Functional Description
Layout
The Flash provides three 512-byte security registers, which can be erased and programmed individually. These registers store chip-specific information and configuration separately from the main memory array. The addresses of the three security registers are as follows:
Address | A23-16 | A15-12 | A11-9 | A8-0 |
---|---|---|---|---|
Security Register #1 | 00H | 0001 | 000 | Don’t care |
Security Register #2 | 00H | 0010 | 000 | Don’t care |
Security Register #3 | 00H | 0011 | 000 | Don’t care |
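The address fields in the table can be combined into a 24-bit address as follows. This is a small sketch with hypothetical function names. It treats the "don't care" bits A8-0 as the byte offset inside the 512-byte register, which is an assumption consistent with the register size.

```python
def security_register_base(n):
    """24-bit base address of Security Register #n (n = 1, 2, or 3)."""
    if n not in (1, 2, 3):
        raise ValueError("security register number must be 1, 2, or 3")
    a23_16 = 0x00   # A23-16 = 00H
    a15_12 = n      # A15-12 = 0001, 0010, or 0011
    a11_9 = 0b000   # A11-9 = 000
    return (a23_16 << 16) | (a15_12 << 12) | (a11_9 << 9)


def security_register_address(n, offset):
    """Address of a byte inside Security Register #n (offset 0..511)."""
    if not 0 <= offset < 512:
        raise ValueError("offset must be within the 512-byte register")
    return security_register_base(n) | offset
```

The resulting base addresses are 0x001000, 0x002000, and 0x003000.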
The information stored in Security Register #1 is shown in the following table.
Base Address | Peripheral | Instance | Size (Byte) | Description |
---|---|---|---|---|
0x001000 | Flash | OTP | 512 | OTP security register #1 |
Module | Symbol | Offset | Size (Byte) | Description |
---|---|---|---|---|
Verify | Pattern | 0x00 | 2 | Goodix pattern |
Verify | Item end | 0x02 | 2 | Used item end address |
Verify | Check sum | 0x04 | 4 | 32-bit checksum: the data of all modules (Production, Flash, RF, PWR, GPADC, and unused segments) is added up in 32-bit units. |
Production | ATE version | 0x08 | 2 | - |
Production | HW version | 0x0A | 2 | Build code (hardware version), encoded as ASCII. B2: 0x4232; B3: 0x4233; unspecified: 0xFFFF |
Production | Chip ID | 0x0C | 2 | Part code. GR5332: 0x5332; GR5331: 0x5331 |
Production | Package | 0x0E | 2 | Package options |
Production | Flash size | 0x10 | 2 | Flash variant |
Production | RAM size | 0x12 | 2 | RAM variant |
Production | Unused | 0x14 | 12 | Reserved |
RF | TX power | 0x20 | 1 | Power value for TX |
RF | RSSI CALI | 0x21 | 1 | Calibration value for RSSI |
RF | HP gain | 0x22 | 1 | HP gain offset for 2M |
RF | Unused | 0x23 | 1 | Reserved |
PMU | DCDC_1P05 | 0x24 | 1 | DC-DC outputs 1.05 V. 0x4000A808 bits[23:19] |
PMU | DCDC_1P15 | 0x25 | 1 | DC-DC outputs 1.15 V. 0x4000A808 bits[23:19] |
PMU | DIG_LDO_0P9 | 0x26 | 1 | CORE_LDO outputs 0.9 V (coarse). 0x4000A814 bits[21:20] |
PMU | DIG_LDO_0P9_fine | 0x27 | 1 | CORE_LDO outputs 0.9 V (fine). 0x4000A8C0 bit8 = 1; 0x4000A8C4 bits[13:8] trimming |
PMU | DIG_LDO_1P05 | 0x28 | 1 | CORE_LDO outputs 1.05 V (coarse). 0x4000A814 bits[21:20] |
PMU | DIG_LDO_1P05_fine | 0x29 | 1 | CORE_LDO outputs 1.05 V (fine). 0x4000A8C0 bit8 = 1; 0x4000A8C4 bits[13:8] trimming |
PMU | SYS_LDO_1P05 | 0x2A | 1 | SYS_LDO outputs 1.05 V. 0x4000A854 bits[3:0] |
PMU | SYS_LDO_1P15 | 0x2B | 1 | SYS_LDO outputs 1.15 V. 0x4000A854 bits[3:0] |
PMU | IO_LDO_1P8 | 0x2C | 1 | IO_LDO outputs 1.8 V. 0x4000A804 bits[6:5] coarse; bits[4:0] fine |
PMU | IO_LDO_3P0 | 0x2D | 1 | IO_LDO outputs 3.0 V. 0x4000A804 bits[6:5] coarse; bits[4:0] fine |
PMU | STB_IO_LDO_1P8 | 0x2E | 1 | STB_IO_LDO outputs 1.8 V. 0x4000A810 bits[22:21] coarse; bits[15:12] fine |
PMU | STB_IO_LDO_3P0 | 0x2F | 1 | STB_IO_LDO outputs 3.0 V. 0x4000A810 bits[22:21] coarse; bits[15:12] fine |
PMU | RINGO_DIG_0P9 | 0x30 | 2 | Ringo value with digital core 0.9 V. 0x4000E80C bits[15:4] |
PMU | RINGO_DIG_1P05 | 0x32 | 2 | Ringo value with digital core 1.05 V. 0x4000E80C bits[15:4] |
ADC | Offset Int 0P8 | 0x34 | 2 | Offset based on internal reference (0.85 V) |
ADC | Slope Int 0P8 | 0x36 | 2 | Slope based on internal reference (0.85 V) |
ADC | Offset Int 1P2 | 0x38 | 2 | Offset based on internal reference (1.28 V) |
ADC | Slope Int 1P2 | 0x3A | 2 | Slope based on internal reference (1.28 V) |
ADC | Offset Int 1P6 | 0x3C | 2 | Offset based on internal reference (1.6 V) |
ADC | Slope Int 1P6 | 0x3E | 2 | Slope based on internal reference (1.6 V) |
ADC | Offset Ext 1P0 | 0x40 | 2 | Offset based on external reference (1.0 V) |
ADC | Slope Ext 1P0 | 0x42 | 2 | Slope based on external reference (1.0 V) |
ADC | Temp | 0x44 | 2 | - |
ADC | Temp_ref | 0x46 | 2 | Reference temperature of the chip temperature sensor. Example: decimal 2618 means 26.18°C |
ADC | Unused | 0x48 | 4 | Reserved |
Clock | HF_OSC_192M | 0x4C | 2 | HF_OSC outputs 192 MHz clock (coarse). 0x4000A854 bits[3:0] |
Clock | HF_OSC_192M_fine | 0x4E | 2 | HF_OSC outputs 192 MHz clock (fine). 0x4000A854 bits[7:6]; 0x4000A850 bits[7:0]. If 0x4000A854 bits[7:6] = 0x2 and 0x4000A850 bits[7:0] = 0x23, the value stored in this field is 0x223. |
Clock | Unused | 0x50 | 4 | Reserved |
Flash | Manufacturer | 0x54 | 1 | Flash manufacturer |
Flash | Features | 0x55 | 1 | Flash feature type. bit0: 1 = 512-byte write supported; 0 = 512-byte write not supported |
Flash | tVSL | 0x56 | 1 | VCC (min.) to device operation. Unit: 10 µs. Unspecified: 0xFF |
Flash | tESL | 0x57 | 1 | Erase suspend latency. Unit: 5 µs. Unspecified: 0xFF |
Flash | tPSL | 0x58 | 1 | Program suspend latency. Unit: 5 µs. Unspecified: 0xFF |
Flash | tPRS | 0x59 | 1 | Latency between program resume and next suspend. Unit: 5 µs. Unspecified: 0xFF |
Flash | tERS | 0x5A | 1 | Latency between erase resume and next suspend. Unit: 5 µs. Unspecified: 0xFF |
Flash | tDP | 0x5B | 1 | CS# high to deep power-down mode. Unit: 5 µs. Unspecified: 0xFF |
Flash | tRES2 | 0x5C | 1 | CS# high to standby mode with electronic signature read. Unit: 5 µs. Unspecified: 0xFF |
Flash | tRDINT | 0x5D | 1 | Read status register interval after write operation. Unit: 5 µs. Unspecified: 0xFF |
Flash | Unused | 0x5E | 2 | Reserved |
Unused | Unused | 0x60 | 160 | - |
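Several fields in the table use a packed or scaled encoding. The sketch below decodes a few of them from already-extracted integer values. All function names are hypothetical. The checksum helper additionally assumes little-endian 32-bit words and leaves the choice of byte range to the caller, since the table does not state either.

```python
def decode_hw_version(value):
    """HW version: two ASCII characters, for example 0x4232 for B2."""
    if value == 0xFFFF:
        return None  # unspecified
    return value.to_bytes(2, "big").decode("ascii")


def decode_temp_ref(value):
    """Temp_ref: hundredths of a degree Celsius (2618 means 26.18)."""
    return value / 100.0


def decode_timing(raw, unit_us):
    """Flash timing byte scaled by its unit in microseconds; 0xFF is unspecified."""
    return None if raw == 0xFF else raw * unit_us


def pack_hf_osc_fine(bits_7_6, bits_7_0):
    """HF_OSC_192M_fine: 0x4000A854 bits[7:6] above 0x4000A850 bits[7:0]."""
    return ((bits_7_6 & 0x3) << 8) | (bits_7_0 & 0xFF)


def checksum32(data):
    """Sum of 32-bit words modulo 2**32 (assumed little-endian byte order)."""
    if len(data) % 4 != 0:
        raise ValueError("data length must be a multiple of 4 bytes")
    total = 0
    for i in range(0, len(data), 4):
        total = (total + int.from_bytes(data[i:i + 4], "little")) & 0xFFFFFFFF
    return total
```

For instance, a tESL byte of 6 with its 5 µs unit decodes to 30 µs, and `pack_hf_osc_fine(0x2, 0x23)` reproduces the 0x223 example from the table.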
The information stored in Security Register #2 is shown in the following table.
Base Address | Peripheral | Instance | Size (Byte) | Description |
---|---|---|---|---|
0x002000 | Flash | OTP | 512 | OTP security register #2 |
Module | Symbol | Offset | Size (Byte) | Description |
---|---|---|---|---|
Bluetooth LE/RF | BT_ADDR | 0x00 | 6 | - |
Bluetooth LE/RF | XO_32M | 0x06 | 2 | - |
Bluetooth LE/RF | Unused | 0x08 | 24 | Reserved for Goodix firmware design |
User | User_Start to User_End | 0x20 to 0x1FF | 480 | Reserved for users |
Write and Erase
Flash memory can be read by the CPU an unlimited number of times through the XIP controller, but it can be written or erased only a limited number of times (up to 100,000 cycles). Write access is also restricted, as described below.
Writing to Flash memory is managed by the on-chip XQSPI controller.
- Before performing a write or erase, disable the XIP controller and enable the on-chip XQSPI controller.
- After writing or erasing, software needs to restore the XIP controller. Users must make sure that writing and erasing are not performed at the same time. If the CPU executes code from Flash while the on-chip XQSPI controller is writing to Flash, CPU execution is halted. Code that must run during a write should therefore execute from SRAM instead.
The Flash memory can only write 0 to erased bits; it cannot write a bit back to 1. Therefore, a page erase must be performed before writing again to the same address in Flash.
After a Flash page is erased, all bits in the page are set to 1. CPU execution is also halted if the CPU executes code from Flash while the XQSPI controller performs an erase operation.
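The erase-before-write rule follows from how programming behaves at the bit level: programming can only clear bits, so the stored result is the bitwise AND of the old and new data. The model below is a host-side illustration only. `FlashPageModel` is a hypothetical name and the code is unrelated to the SDK Flash APIs.

```python
PAGE_SIZE = 256  # bytes per page, as listed in the Flash features


class FlashPageModel:
    """Toy model of one Flash page (hypothetical, for illustration)."""

    def __init__(self):
        self.data = bytearray(b"\xff" * PAGE_SIZE)  # erased state: all bits 1

    def erase(self):
        """Page erase sets every bit in the page back to 1."""
        self.data[:] = b"\xff" * PAGE_SIZE

    def program(self, offset, payload):
        """Programming can only turn 1 bits into 0 bits."""
        if offset < 0 or offset + len(payload) > PAGE_SIZE:
            raise ValueError("write exceeds page")
        for i, byte in enumerate(payload):
            self.data[offset + i] &= byte
```

Programming 0x0F and then 0xF0 to the same byte without an erase in between leaves 0x00 rather than 0xF0, which is why the page must be erased first.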
XQSPI Controller
The XQSPI-AHB controller can be configured as a master or a slave by software. Reading from/writing to the core is done on the AMBA AHB interface. The core operates in various data modes from 4 bits to 32 bits (8 modes are supported in multiples of 4 data bits). The data is then serialized and transmitted, either LSB or MSB first, using the standard 4-wire SPI bus interface or the extended Quad mode bus.
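As an illustration of the data modes and bit ordering described above, the following sketch lists the eight supported word widths and serializes one word either MSB first or LSB first. It models only the bit order on a single data line. The names are hypothetical and Quad mode lane assignment is not modeled.

```python
VALID_WIDTHS = tuple(range(4, 33, 4))  # 4, 8, ..., 32 bits: eight data modes


def serialize(value, width, msb_first=True):
    """Return the bits of `value` in transmission order for one data line."""
    if width not in VALID_WIDTHS:
        raise ValueError("width must be a multiple of 4 between 4 and 32")
    bits = [(value >> i) & 1 for i in range(width)]  # LSB-first order
    return bits[::-1] if msb_first else bits
```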
In the diagram, there are several clock domains represented: AHB clock (blue), External Master SCLK (green), and Slave SCLK (purple). The FIFOs serve as the clock boundary between the AHB clock and the SPI bus clock.
How the SPI bus clock is produced depends on whether the core operates in Master or Slave mode:
- In Master mode, the QSPI core will produce the SPI bus clock directly from the External Master SCLK; in Slave mode the QSPI core will receive the SPI bus clock from another SPI Master.
- In Master mode, the Master Serializer and the right side of the FIFOs (purple) use the same clock; in Slave mode, the Slave Serializer (red) is clocked directly by the gapped SPI bus clock sclkIn/sclkInN, and the right side of the FIFOs (purple) is clocked by HCLK. Either ssIn or an internal signal that compares the receive data count against a terminal value is synchronized to the HCLK domain. FIFO pointers are advanced on the rising edge of this active-high, synchronized signal.
The master bit is responsible for directing much of the internal multiplexing. One other signal is also very important: enable. The enable bit is synchronized to the different clock domains (when appropriate). These synchronized enable signals, along with proper programming order of the AHB registers, allow logic on other domains to safely sample other AHB register values and control bits, since they are guaranteed to be stable once enable goes high. In short, the enable bit serves as a master synchronizer for the entire module.
SPI Data Link
Data is transmitted on MOSI (Master Out, Slave In) synchronously with the SCLK generated by the master device. The master simultaneously receives data on MISO (Master In, Slave Out), in full-duplex fashion.
DMA Operation
The XQSPI-AHB module is compatible with various industry-standard DMA controllers. DMA operation in the IPC-QSPIAHB can be enabled to assist a DMA controller in the loading (writing) of the transmit FIFO, and the unloading (reading) of the receive FIFO. Note that dmaSreq indicates whether the receive FIFO is not empty, while dmaBreq indicates whether the receive FIFO level is above the programmed watermark level. It is the responsibility of the DMA Controller to implement the appropriate read transfer that does not underflow the RX FIFO.
Execute-In-Place (XIP)
The controller supports execute-in-place (XIP) functionality for Flash devices of several industry standards. The XIP mode allows an AHB master to read the contents of such a device directly, simply by reading from the address space of the QSPI controller. This is useful in many situations, for example when a processor executes boot instructions from a QSPI Flash device, or in a DMA copy function where a DMA controller reads the contents of the QSPI Flash device and then writes the data to RAM elsewhere in the system.
Flash Operation Interfaces
To ensure the safety of Flash operations, all the Flash operation interfaces are provided in the form of APIs by the SDK. Users do not need to care about the implementation details of Flash operations.
Electrical Specifications
Description | Min. | Typ. | Max. | Unit |
---|---|---|---|---|
Erase suspend latency | | | 30 | µs |
Program suspend latency | | | 30 | µs |
Latency between program resume and next suspend | 0.3 | | | µs |
Latency between erase resume and next suspend | 0.3 | | | µs |
Page program time (up to 256 bytes) | | 2 | 3 | ms |
Page erase time | | 8 | 20 | ms |
Sector erase time | | 8 | 20 | ms |
Chip erase time | | 8 | 20 | ms |
Program / Erase cycles | 100000 | | | Cycles |
Cache
Introduction
A cache is a block of high-speed memory locations containing address information (known as a tag) and the associated data. The purpose of the cache controller is to enhance system performance by speeding up CPU execution from Flash, and to reduce power consumption by minimizing Flash accesses. The Data RAM inside the cache controller stores instructions and data in blocks of cache-line size.
Main Features
- Direct mapping mode
- 4-way set associative mapping mode with LFU replacement policy
- 8 KB cache memory (Data RAM) and 16 bytes per cache line
- Profiling counters
Block Diagram
Functional Description
Cacheable Range
The cache controller caches address range 0x20_0000 – 0x11F_FFFF (16 MB). There are two ways for CPU to read data from Flash:
- Cache mode: Cache controller sends request to XQSPI (green line way).
- Bypass mode: CPU directly sends request through AHB bus to XQSPI (blue line way).
There are also two ways to switch between two modes.
- Configure the DIS bit in the CTRL0 register of cache to control mode selection. When DIS = 1, the bypass mode is used; When DIS = 0, the cache mode is used.
- Automatic hardware switching according to address. When CPU reads Flash at Flash address, hardware automatically switches to cache mode. When CPU reads Flash at Flash alias address, hardware automatically switches to bypass mode. The following figure shows the allocation of Flash address and Flash alias address.
Address Segmentation
The main Flash address range is 0x20_0000–0x11F_FFFF (16 MB), so the address width is 24 bits. The cache size is 8 KB, each cache way is 2 KB, and the cache line size is 128 bits (16 bytes). The address segmentation is shown below.
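The field widths for the 4-way mode can be derived from the sizes stated above: a 16-byte line gives 4 offset bits, a 2 KB way holds 128 lines and gives 7 index bits, and the remaining 13 bits of the 24-bit address form the tag. The sketch below performs that split. It assumes the fields are taken from the address relative to the Flash base, and the function name is hypothetical.

```python
FLASH_BASE = 0x20_0000
FLASH_SIZE = 16 * 1024 * 1024   # 16 MB, 24-bit address
LINE_BYTES = 16                 # 128-bit cache line
WAY_BYTES = 2 * 1024            # one way in 4-way mode

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 4
SETS = WAY_BYTES // LINE_BYTES              # 128 lines per way
INDEX_BITS = SETS.bit_length() - 1          # 7


def split_address(address):
    """Split a Flash address into (tag, set index, byte offset in line)."""
    rel = address - FLASH_BASE
    if not 0 <= rel < FLASH_SIZE:
        raise ValueError("address outside the cacheable Flash range")
    offset = rel & (LINE_BYTES - 1)
    index = (rel >> OFFSET_BITS) & (SETS - 1)
    tag = rel >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Two addresses that are 2 KB apart map to the same set index with different tags, so they compete for the same set.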
Mapping Mode
To enable the associativity reconfiguration, the cache controller supports 4-way set associative cache and direct-mapped cache. Each cache way size is 2 KB in 4-way cache.
Configure the DIRECT_MAP_EN bit in the CTRL0 register of cache to select the cache mapping mode. When DIRECT_MAP_EN = 1, the direct-mapped cache is used; When DIRECT_MAP_EN = 0, the 4-way set associative cache mode is used.
Replacement Strategy
The cache controller uses the Least Frequently Used (LFU) algorithm in 4-way set associative cache mode. Each cache way in the controller has a hit_cnt counter that counts the hits on that way. When a replacement is required, the way with the lowest hit_cnt is replaced. If two or more ways have equal hit_cnt values, the replacement follows the priority way3 > way2 > way1 > way0.
In direct-mapped cache mode, the controller directly replaces the cache line that the address maps to when a cache miss occurs.
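The victim selection in 4-way mode can be sketched as follows. The function name is hypothetical, and the sketch assumes that the priority way3 > way2 > way1 > way0 means the highest-numbered way among the tied ways is replaced first.

```python
def select_victim(hit_cnt):
    """Pick the way to replace from the hit counters of way0..way3."""
    if len(hit_cnt) != 4:
        raise ValueError("expected one counter per way (4 ways)")
    lowest = min(hit_cnt)
    # On a tie, prefer the highest way index (assumed reading of the priority).
    return max(way for way, count in enumerate(hit_cnt) if count == lowest)
```

With counters `[5, 2, 7, 3]` the victim is way1. With `[1, 1, 4, 4]` the tie between way0 and way1 resolves to way1.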
Miss Rate Monitor
The cache includes profiling counters. To measure the performance (hit rate) of a code section, enable the built-in profiling counters: start them by clearing the HITMISS bit in the CTRL0 register of the cache. At the end of the section, read the number of cache hits and cache misses from the HIT_COUNT and MISS_COUNT registers, and stop the counters by setting the HITMISS bit.
The cache hit rate is calculated as HIT_COUNT / (HIT_COUNT + MISS_COUNT). The HIT_COUNT and MISS_COUNT registers do not wrap around after reaching the maximum value. If the maximum value is reached, profile for a shorter duration to get correct numbers.
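The hit-rate formula, together with the saturation caveat, can be sketched as below. The function takes counter values that have already been read from HIT_COUNT and MISS_COUNT. The `counter_max` parameter is hypothetical and must be supplied by the caller, because the counter width is not stated here.

```python
def hit_rate(hit_count, miss_count, counter_max=None):
    """Return HIT_COUNT / (HIT_COUNT + MISS_COUNT), or None if not meaningful."""
    if counter_max is not None and counter_max in (hit_count, miss_count):
        return None  # a counter saturated: profile a shorter section
    total = hit_count + miss_count
    if total == 0:
        return None  # no accesses were recorded
    return hit_count / total
```

For example, 900 hits and 100 misses give a hit rate of 0.9.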