'/home/root' user partition became 'read-only' suddenly

During a stress test, one of our failing units showed strange behavior: the ‘/home/root/’ partition became read-only instead of read/write.
Because of this, the application keeps crashing: the logger object can’t be instantiated, and the monitoring thread keeps restarting the application.
We’ve never seen this behavior before, and the only possible recovery seems to be reprogramming or reflashing the FW, although we have not tried this yet.
My questions about this issue are:

  1. What can make the ‘/home/root/’ partition read-only? What are the possible root-cause scenarios?
  2. How can we recover when this happens? Is reflashing the FW the only way, or is there another option?

Thanks!

I have never seen this before, but you can try downloading FW R11 and using swiflash to recover the module (assuming you are using a WP7 module):

I need an answer to the question below rather than a suggested recovery method:

  1. What can make the ‘/home/root/’ partition read-only? What are the possible root-cause scenarios?

What I need is:

  1. What can make the ‘/home/root/’ partition read-only? What are the possible root-cause scenarios?
  2. On recovery, I want to confirm whether there is any way other than reflashing the FW, i.e. from the running kernel.

Have you compared the boot messages with those of a working module to see which lines differ?

I believe the partition mounting is done by /etc/init.d/mount_early, but I am not sure whether you can compare this file.

Unfortunately I don’t have the failing device on hand. It is with a 3rd-party test house in India.
Could you kindly provide detailed steps that they can follow easily? The testers have only basic technical knowledge…

The boot messages are on the UART! You can capture them during boot-up.

I asked the 3rd-party testers to share the kernel log (‘/home/kernel.log’) from the failing unit. Once I get it, I will share it for your analysis.
By the way, the 3rd party reported that the recovery command failed as follows. The FW version is R9. Do you have any idea what the issue is?

It cannot enter download mode. Does it keep resetting?

That much I can already see from the message myself. Anything else? It keeps failing!

Then can you read the messages directly to find out whether it is resetting?
Have you compared the boot messages with those of a working module to see which lines differ?
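
In case it helps the testers, the comparison could be done on a PC once both boot logs have been captured to text. The following is only a rough sketch: it assumes the lines may carry kernel-style timestamps such as "[    1.234567]", and the function names are made up for illustration.

```python
import difflib
import re

# Assumed timestamp format at the start of a kernel log line: "[    1.234567] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(log_text):
    """Drop timestamps and blank lines so only the message text is compared."""
    lines = []
    for line in log_text.splitlines():
        line = _TIMESTAMP.sub("", line).rstrip()
        if line:
            lines.append(line)
    return lines


def boot_log_diff(good_text, bad_text):
    """Return the lines that differ between a working and a failing boot log.

    Lines prefixed with "- " appear only in the working log,
    lines prefixed with "+ " appear only in the failing log.
    """
    diff = difflib.ndiff(normalize(good_text), normalize(bad_text))
    return [d for d in diff if d.startswith(("- ", "+ "))]
```

Usage would be along the lines of `boot_log_diff(open("working.txt").read(), open("failing.txt").read())`, where both file names are placeholders for the captured UART output.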

Here is a snippet from the Linux MTD forum on why UBIFS can suddenly become read-only and how to check for it.


UBIFS suddenly became read-only - what is this?

Read-write UBIFS file-system may suddenly become read-only because of an error. This is how UBIFS reacts on unexpected errors which it cannot properly handle - it immediately switches to read-only mode in order to protect the data from any possible further corruption.

If this happened, you should look at UBIFS-related dmesg messages. UBIFS usually prints error messages before switching to read-only mode. The messages may shed some light on what happened. Feel free to ask for help from the MTD mailing list. If you think this is an UBIFS bug, please, send a bug report.

How do I detect if UBIFS became read-only?

If you use up-to-date UBIFS which includes commit 2fde99cb55fb9d9b88180512a5e8a5d939d27fec ( UBIFS: mark VFS SB RO too ), then you should be able to find this out from /proc/mounts . You should also be able to use something like inotify to catch events when UBIFS becomes R/O (e.g., due to some errors).
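
As a rough illustration of the /proc/mounts check described above, the options column can be parsed for the "ro" flag. This is a sketch only: the mount point passed in (for example '/home/root') is an assumption about where the partition is mounted on the unit, and the helper names are hypothetical.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not mounted."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # a later entry shadows an earlier one
    return found


def is_read_only(mounts_text, mount_point):
    """True if mount_point is listed with the 'ro' option."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "ro" in opts


def read_proc_mounts(path="/proc/mounts"):
    """Read the mount table of the running system."""
    with open(path) as f:
        return f.read()
```

On the target this would be called as `is_read_only(read_proc_mounts(), "/home/root")`. The same parser also works on a copy of /proc/mounts sent back from the test site.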

Still, this answer from the forum doesn’t point to the exact code in the UBIFS driver that makes UBIFS read-only when it hits a non-recoverable error, and it doesn’t even say what the non-recoverable errors are.
Looking into the kernel sources, I located the UBIFS kernel code paths for the non-recoverable errors that switch UBIFS to read-only mode, as follows.

  1. If the read-only state appeared after a base station reboot or a manual power-cycle reboot, it is most likely related to journaling or I/O errors.
  2. Otherwise, i.e. if it appeared as the file system progressively filled up, it may be related to commit errors, or possibly to I/O or journaling errors as well.

[Commit related]

  1. error in do_commit() - logical block to physical block commit

ubifs_err(c, "commit failed, error %d", err)

  2. error in ubifs_bg_thread()

err = ubifs_bg_wbufs_sync(c);

  • ubifs_bg_wbufs_sync - synchronize write-buffers

[IO related]

  1. error in ubifs_leb_write()

ubifs_err(c, "writing %d bytes to LEB %d:%d failed, error %d", len, lnum, offs, err);

  2. error in ubifs_leb_change()

ubifs_err(c, "changing %d bytes in LEB %d failed, error %d", len, lnum, err);

  3. error in ubifs_leb_unmap()

ubifs_err(c, "unmap LEB %d failed, error %d", lnum, err);

  4. error in ubifs_leb_map()

ubifs_err(c, "mapping LEB %d failed, error %d", lnum, err);

  5. error in next_sqnum()
  • next_sqnum - get next sequence number

ubifs_err(c, "sequence number overflow %llu, end of life", sqnum);

  6. error in ubifs_bg_wbufs_sync()
  • ubifs_bg_wbufs_sync - synchronize write-buffers

ubifs_err(c, "cannot sync write-buffer, error %d", err);

  7. error in ubifs_sync_wbufs_by_inode()
  • ubifs_sync_wbufs_by_inode - synchronize write-buffers for an inode

err = ubifs_wbuf_sync_nolock()

  • ubifs_wbuf_sync_nolock - synchronize write-buffer
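
Once the kernel log from the failing unit is available, the messages quoted above could be searched for automatically. The sketch below is only an illustration: it matches the message text of the ubifs_err() calls listed in this post, the category labels "commit" and "io" simply mirror the headings used here, and the exact prefix the kernel prints before each message is not assumed.

```python
import re

# Message texts taken from the ubifs_err() calls listed above.
# The category names are labels chosen for this sketch, not kernel terms.
PATTERNS = [
    ("commit", re.compile(r"commit failed, error (-?\d+)")),
    ("io", re.compile(r"writing \d+ bytes to LEB \d+:\d+ failed, error (-?\d+)")),
    ("io", re.compile(r"changing \d+ bytes in LEB \d+ failed, error (-?\d+)")),
    ("io", re.compile(r"unmap LEB \d+ failed, error (-?\d+)")),
    ("io", re.compile(r"mapping LEB \d+ failed, error (-?\d+)")),
    ("io", re.compile(r"cannot sync write-buffer, error (-?\d+)")),
    ("io", re.compile(r"sequence number overflow \d+, end of life")),
]


def classify(log_text):
    """Return (category, error_code_or_None, line) for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        for category, pattern in PATTERNS:
            m = pattern.search(line)
            if m:
                # Not every message carries an error code (e.g. sqnum overflow).
                code = int(m.group(1)) if m.groups() else None
                hits.append((category, code, line.strip()))
                break
    return hits
```

Running `classify()` over the contents of the shared kernel log would list the matching lines in order, which should show which of the paths above fired first.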

[Journaling related]

  1. error in ubifs_jnl_update()
  • ubifs_jnl_update - update inode
  2. error in ubifs_jnl_write_data()
  • ubifs_jnl_write_data - write a data node to the journal
  3. error in ubifs_jnl_write_inode()
  • ubifs_jnl_write_inode - flush inode to the journal
  4. error in ubifs_jnl_delete_inode()
  • ubifs_jnl_delete_inode - delete an inode
  5. error in ubifs_jnl_rename()
  • ubifs_jnl_rename - rename a directory entry
  6. error in ubifs_jnl_truncate()
  • ubifs_jnl_truncate - update the journal for a truncation
  7. error in ubifs_jnl_delete_xattr()
  • ubifs_jnl_delete_xattr - delete an extended attribute
  8. error in ubifs_jnl_change_xattr()
  • ubifs_jnl_change_xattr - change an extended attribute