System

Q1: How can a protection mechanism be implemented for the case where the application (APP) or system (SYS) becomes abnormal?

  • Set up a watchdog in the main process so that the device reboots when the APP or SYS becomes abnormal.

  • Set the kernel option CONFIG_PANIC_TIMEOUT to a non-zero value. When a panic occurs, the kernel restarts after the configured number of seconds.
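
The panic timeout can also be tried at runtime before it is fixed in the kernel config. A minimal sketch, assuming the standard /proc/sys/kernel/panic sysctl is available on the target; the helper name set_panic_timeout and its optional second argument are hypothetical and exist only so the snippet can be tried against a scratch file:

```shell
#!/bin/sh
# Sketch only: set_panic_timeout is a hypothetical helper, not part of the SDK.
# /proc/sys/kernel/panic is the runtime counterpart of CONFIG_PANIC_TIMEOUT:
# a positive value N makes the kernel reboot N seconds after a panic,
# and 0 means the kernel waits forever.
set_panic_timeout() {
    seconds=$1
    sysctl_file=${2:-/proc/sys/kernel/panic}

    # Accept only a non-negative integer number of seconds.
    case "$seconds" in
        ''|*[!0-9]*)
            echo "usage: set_panic_timeout <seconds> [file]" >&2
            return 1
            ;;
    esac

    echo "$seconds" > "$sysctl_file"
}
```

For example, calling set_panic_timeout 5 as root from an init script makes the board reboot 5 seconds after a panic.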

Q2: How can a crash be located when no serial console output is available?

  • Enable coredump to capture segfaults at the application layer, then parse the coredump to locate the problem.

    1. Add the following contents to profile:

      ulimit -c unlimited
      echo "if [ -e /etc/core.sh ]; then" >> ${OUTPUTDIR}/rootfs/etc/profile
      echo ' echo "|/etc/core.sh %p" > /proc/sys/kernel/core_pattern' >> ${OUTPUTDIR}/rootfs/etc/profile
      echo "chmod 777 /etc/core.sh" >> ${OUTPUTDIR}/rootfs/etc/profile
      echo "fi;" >> ${OUTPUTDIR}/rootfs/etc/profile
      
    2. Add the following contents in etc/core.sh:

      #!/bin/sh
      # Write the coredump to a path that is writable, otherwise the dump fails.
      # Also confirm that the gzip command is available on the target.
      /bin/gzip -1 > /config/coredump.process_$1.gz
      sync
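
The two requirements noted for core.sh (a writable output path and an available gzip) can be checked before anything is written. A minimal sketch, assuming the same /config output directory as above; the function name save_core and its optional second argument are hypothetical:

```shell
#!/bin/sh
# Sketch only: a more defensive variant of the core.sh body.
# The kernel pipes the core image to stdin and passes the PID (%p) as $1.
save_core() {
    pid=$1
    out_dir=${2:-/config}

    # gzip must exist on the target, or no dump can be produced.
    command -v gzip >/dev/null 2>&1 || return 1

    # The output directory must exist and be writable.
    [ -d "$out_dir" ] && [ -w "$out_dir" ] || return 1

    # Compress stdin into the dump file, then flush it to flash.
    gzip -1 > "$out_dir/coredump.process_$pid.gz" && sync
}
```

Inside /etc/core.sh this would be invoked as save_core "$1".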
      

      To parse the coredump:

      1. gunzip coredump.process_xxx.gz

      2. arm-linux-gnueabihf-gdb APP (APP is the application binary that generated the coredump)

      3. set solib-search-path ../../sdk/verify/application/zk_full/lib/ (the directory holding the libraries the app links against)

      4. core-file coredump.process_xxx

      5. bt
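
The five steps above can be collapsed into one non-interactive command using gdb's standard -batch and -ex options. A minimal sketch; the function name core_backtrace and the GDB override variable are hypothetical, and the paths passed in are whatever your build produced:

```shell
#!/bin/sh
# Sketch only: run the manual gdb steps in one shot.
# Usage: core_backtrace <app-binary> <coredump.gz> <lib-dir>
core_backtrace() {
    app=$1
    core_gz=$2
    lib_dir=$3
    gdb_bin=${GDB:-arm-linux-gnueabihf-gdb}

    # Refuse input without a .gz suffix so the archive is never overwritten.
    core=${core_gz%.gz}
    [ "$core" != "$core_gz" ] || return 1

    # Step 1: decompress next to the archive, keeping the .gz intact.
    gunzip -c "$core_gz" > "$core" || return 1

    # Steps 2-5: load the app, point gdb at its libraries,
    # load the core, print the backtrace, then exit.
    "$gdb_bin" -batch \
        -ex "set solib-search-path $lib_dir" \
        -ex "core-file $core" \
        -ex "bt" \
        "$app"
}
```

The output can be redirected to a file so that backtraces from several dumps can be compared.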

      Coredump should be used only during development and must be disabled for mass production; otherwise the flash partition will fill up. To disable it, delete the content added in step 2.

  • Save the kernel panic log as follows:

    This uses the native Linux CONFIG_MTD_OOPS mechanism. The flash-specific mtdoops_do_dump function is registered through kmsg_dump_register. When an OOPS or panic occurs, the kernel calls mtdoops_do_dump, which writes the kmsg contents to the designated mtd partition so that the exception log is preserved before the restart.

    The changes are as follows:

    1. In the flash partition table, create a 128KB kmsglog partition to hold the panic logs. Note which /dev/mtdblocknum node corresponds to the kmsglog partition; the saved kmsg content is later read from that node with cat.

      Note: The new partition must be at least erasesize*2 of the underlying flash; otherwise insmod mtdoops reports an error.

    2. The kernel changes are as follows:

      1. Enable CONFIG_MTD_OOPS and specify the mtd partition in which kmsg is saved.

      2. Set the record_size of the kmsg to be saved (64K here) according to actual needs. Saving starts from the position where the panic log is printed; a backtrace is generally no larger than 64K.

      3. kmsg records are saved cyclically in the configured mtd partition. For example, with a 64K record size and a 128K partition, two kernel panic logs can be stored; a third panic overwrites the oldest record.

      4. Implement the flash write in the mtdoops_do_dump callback registered with kmsg_dump, generally by calling the write function of the flash driver.

      Usage:

      Run cat on the /dev/mtdblocknum node of the kmsglog partition, or redirect its output to a file to save it.
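
The partition lookup and the sizing rules above can be scripted. A minimal sketch, assuming the standard /proc/mtd table format and a partition named kmsglog as in the text; the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print "<index> <size_bytes> <erasesize_bytes>" for a named MTD partition.
# /proc/mtd lines look like:  mtd5: 00020000 00010000 "kmsglog"
mtd_info() {
    name=$1
    table=${2:-/proc/mtd}
    line=$(grep "\"$name\"\$" "$table" | head -n 1)
    [ -n "$line" ] || return 1
    # Split the line into fields: $1=mtdN:  $2=size(hex)  $3=erasesize(hex)
    set -- $line
    idx=${1#mtd}
    idx=${idx%:}
    echo "$idx $((0x$2)) $((0x$3))"
}

# mtdoops needs the partition to span at least two erase blocks.
mtd_big_enough() {
    [ "$1" -ge $(( $2 * 2 )) ]
}

# How many panic logs fit before the oldest one is overwritten.
oops_slots() {
    echo $(( $1 / $2 ))
}
```

With the example values from the text, oops_slots 131072 65536 prints 2, matching the two logs that a 128K partition holds at a 64K record size. The index printed by mtd_info kmsglog is the number to use in the /dev/mtdblocknum node name.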

...