Tiny6410: impossible to boot again after some time

eric3
Hi,

I'm using a Tiny6410 board with embedded Linux and the regular U-Boot,
booting from NAND flash.

Config:
------------------------------------
U-Boot 1.1.6 (Apr  6 2011 - 14:17:30) for FriendlyARM MINI6410
CPU:     S3C6410@532MHz
         Fclk = 532MHz, Hclk = 133MHz, Pclk = 66MHz, Serial = CLKUART (SYNC Mode)
Board:   MINI6410
DRAM:    256 MB
Flash:   0 kB
NAND:    2048 MB 
------------------------------------

My application runs correctly for a couple of days or weeks.
Then, suddenly, my board is not able to boot anymore.
The console (TTY) shows that U-Boot is stuck in its menu, and the key
that selects booting the kernel has no effect. The board restarts and stays
in the U-Boot menu.

Sometimes it reports a bad CRC warning when scanning the NAND.
It seems that the NAND flash content was altered.

The only way to make my application work again is to reflash the NAND with
the kernel (zImage), UBIFS and EXT3.

Has anyone already faced this problem?

My application is an embedded system, so I cannot restore the system this
way in the field.

Regards,
Eric3

Juergen Beisert
And let me guess: you erase your NAND completely before writing its new
content?
If so, I would also guess your NAND has reached the end of its lifetime.

eric3
Hi Juergen,

Thanks for your interest.

No, I never erase the NAND. The only thing I've done is program the NAND
several times with new code.

I have already deployed 20 systems based on this Tiny6410 core module,
I would say without any problems; we just update the firmware by writing
the new code to flash using the Tiny board and booting from the SD card.

I'm quite sure this is not an "end of life" problem of the NAND.

Eric3

Juergen Beisert
Hi eric,

> No, I never erase the NAND.

You did. To program a NAND you *must* erase it first. And if your bad block
management is broken or non-existent, the NAND *seems* to work again after
such an erase, only to start failing again very quickly.

How fast a NAND wears out depends on how often you change the data on it.
*Each* change needs an erase. So, if you have high filesystem activity on
this NAND memory (for example logfiles), it may take only a few weeks or
even a few days to reach the end of life of this kind of memory.

eric3
Hi Juergen,

OK, I agree: I erase the NAND. What I meant by "I never erase the NAND" is
that I do not do it myself; I'm simply using the provided
"superboot-20110405.bin" (no idea what its source contains).

Maybe this is one of my problems. But I have also seen "Bad NAND CRC" with a
completely new Tiny core module.

My idea was that the problem comes rather from the absence of a proper Linux
shutdown before power-down. I read some literature on it, and it could
explain my troubles.
I've tried to reproduce the bug, without success, so I'm not sure the
patch I've made solves my problem.

Any idea?

Eric3

Juergen Beisert
CRC errors can have many sources. One source can be wrong formatting of
the image written to the NAND memory. For example, a JFFS2 image must be
generated according to the erase block size of the NAND. If not, you
will get many odd filesystem errors later on, including CRC errors.

And yes, another cause can be an improper shutdown. Older flash-aware
filesystems still struggle with this issue. But you seem to use UBIFS,
which is currently the best solution for it.
Another cause can be the use of recent flash devices. Their die structures
keep shrinking, which also increases the error rate. Recent NANDs
need a much stronger checksum. Which checksum generator does your NAND
driver use? The 1-bit or the 4/8-bit type?
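Why the ECC strength matters can be sketched with a simple binomial model.
This is an illustration under stated assumptions, not a model of this
particular chip: it assumes bit errors are independent, that each ECC
codeword covers 512 bytes (4096 bits) of data, and that the raw bit error
rate passed in is a made-up example value. A block becomes unreadable when
more than t bits flip in one codeword, where t = 1 for a 1-bit code and
t = 4 or 8 for the stronger ones.

```python
def p_uncorrectable(n_bits: int, t: int, ber: float) -> float:
    """P(more than t of n_bits are flipped), with independent flips of
    probability ber (0 <= ber < 1). Sums the binomial tail directly so tiny
    probabilities are not lost to rounding in 1 - P(ok)."""
    term = (1.0 - ber) ** n_bits  # P(exactly 0 errors)
    tail = 0.0
    for k in range(1, n_bits + 1):
        # Binomial recurrence: P(k) = P(k-1) * (n-k+1)/k * ber/(1-ber)
        term *= (n_bits - k + 1) / k * ber / (1.0 - ber)
        if k > t:
            tail += term
    return tail


if __name__ == "__main__":
    assumed_ber = 1e-4  # hypothetical raw bit error rate of a worn MLC page
    for t in (1, 4, 8):
        print(t, p_uncorrectable(4096, t, assumed_ber))
```

At the same assumed error rate, moving from t = 1 to t = 8 shrinks the
probability of an uncorrectable codeword by many orders of magnitude, which
is why MLC parts call for the 4/8-bit ECC.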

Your Mini6410 system comes with a 2 GiB NAND. Does it use the K9WAG08U1B or
a different type?

eric3
Attachment: PB_samsung.PNG (211.87 KB)
Hello Juergen,

Sorry for my late reply; I had to switch to another project that took all
my energy!

Just to give you more details: this morning I wanted to restart my
application. My 6410 core module had been lying on a table since it was
last used.


When I put it on the Tiny6410 dev kit, I was not able to start the system.
You can have a look at PB_samsung.png (LOST).

After several tries, I decided to reflash the module. Flashing worked
without any problem.
See PB_samsung.png (FLASH).

And after flashing, my system started without any problem.
See PB_samsung.png (START).


In addition, to answer your question regarding the NAND driver, you can
have a look at the menuconfig.

Below is some information from my running system:

S3C NAND Driver, (c) 2008 Samsung Electronics
MLC nand initialized, 2011 ported by FriendlyARM
S3C NAND Driver is using hardware ECC.
NAND device: Manufacturer ID: 0xec, Chip ID: 0xd5 (Samsung NAND 2GiB 3,3V 8-bit)
Creating 3 MTD partitions on "NAND 2GiB 3,3V 8-bit":
0x000000000000-0x000000400000 : "Bootloader"
0x000000400000-0x000000c00000 : "Kernel"
0x000000c00000-0x000080000000 : "File System"
UBI: attaching mtd2 to ubi0
UBI: physical eraseblock size:   1048576 bytes (1024 KiB)
UBI: logical eraseblock size:    1032192 bytes
UBI: smallest flash I/O unit:    8192
UBI: VID header offset:          8192 (aligned 8192)
UBI: data offset:                16384
UBI: max. sequence number:       0
UBI: volume 0 ("FriendlyARM-root") re-sized from 165 to 2008 LEBs
UBI: attached mtd2 to ubi0
UBI: MTD device name:            "File System"
UBI: MTD device size:            2036 MiB
UBI: number of good PEBs:        2032
UBI: number of bad PEBs:         4
UBI: number of corrupted PEBs:   0
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     1
UBI: available PEBs:             0
UBI: total number of reserved PEBs: 2032
UBI: number of PEBs reserved for bad PEB handling: 20
UBI: max/mean erase counter: 1/0
UBI: image sequence number:  92685634
UBI: background thread "ubi_bgt0d" started, PID 641
PPP generic driver version 2.4.2
PPP Deflate Compression module registered
PPP BSD Compression module registered
PPP MPPE Compression module registered
NET: Registered protocol family 24
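The UBI numbers in this log are internally consistent, which suggests the
image itself was attached correctly. The sketch below reproduces them. It
assumes no sub-page writes (so the erase-counter header takes the first
min. I/O unit and the VID header the second) and the reserve values I
believe kernels of that generation used by default: 1% of good PEBs for
bad-block handling, 2 PEBs for the layout volume, 1 for wear leveling and 1
for EBA. Treat those constants as assumptions, not as something read from
this kernel's config.

```python
def ubi_layout(peb_size: int, min_io: int, good_pebs: int, beb_reserve_percent: int = 1):
    """Recompute UBI attach figures from flash geometry (assumed defaults)."""
    vid_hdr_offset = min_io         # EC header in 1st I/O unit, VID header in 2nd
    data_offset = 2 * min_io        # user data starts after both headers
    leb_size = peb_size - data_offset
    beb_reserved = good_pebs * beb_reserve_percent // 100  # bad-PEB handling reserve
    layout_volume_pebs = 2          # assumed: internal layout volume
    wl_reserved = 1                 # assumed: wear-leveling reserve
    eba_reserved = 1                # assumed: EBA (atomic LEB change) reserve
    user_lebs = (good_pebs - beb_reserved - layout_volume_pebs
                 - wl_reserved - eba_reserved)
    return vid_hdr_offset, data_offset, leb_size, beb_reserved, user_lebs


if __name__ == "__main__":
    # Geometry taken from the log: 1 MiB PEBs, 8 KiB pages, 2032 good PEBs.
    print(ubi_layout(1048576, 8192, 2032))
```

The result matches the log lines for VID header offset, data offset,
logical eraseblock size, the 20 PEBs reserved for bad PEB handling and the
2008 LEBs the root volume was resized to.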


Thank you for your help,
Regards,
Eric

eric3
Attachment: my_current_linux_config (56.04 KB)
And this is my Linux Config.

Juergen Beisert
I would still guess your NAND memory is broken. And: you shouldn't erase
the NAND the hard way. You must use the corresponding UBI tool to do so,
because you need to keep the blocks' erase counters.
And I don't know how reliable the MLC driver is ("MLC nand initialized,
2011 ported by FriendlyARM").
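To illustrate what "keeping the erase counters" means: UBI stores each
physical eraseblock's erase count in a small header at the very start of
the block. A plain flash erase sets the whole block to 0xFF, so that count
is gone and UBI can no longer level wear properly; ubiformat from mtd-utils
is meant to preserve it. The sketch below parses that header from a raw
block dump. The 64-byte big-endian layout (magic "UBI#", version, erase
counter, VID header offset, data offset, image sequence, CRC) is my reading
of the kernel's ubi_ec_hdr structure; the CRC is not verified here, and the
function name is hypothetical.

```python
import struct

# Assumed layout of the 64-byte UBI erase-counter (EC) header, big-endian:
# magic[4], version u8, pad[3], ec u64, vid_hdr_offset u32, data_offset u32,
# image_seq u32, pad[32], hdr_crc u32
EC_HDR = struct.Struct(">4sB3xQIII32xI")


def read_erase_counter(peb: bytes):
    """Return the erase counter at the start of a raw PEB dump, or None if
    no UBI EC header is present (e.g. the block was wiped by a raw erase)."""
    if len(peb) < EC_HDR.size:
        return None
    magic, version, ec, vid_off, data_off, image_seq, hdr_crc = EC_HDR.unpack_from(peb)
    if magic != b"UBI#":
        return None  # erased (all 0xFF) or not a UBI block: history is lost
    return ec
```

Running this over a dump taken before and after a raw erase would show the
counters turning into None, which is exactly the information ubiformat
carries over.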

Andrew Sampaio
Hi Eric3,

I'm facing the exact same problem (see
http://www.friendlyarm.net/forum/topic/6150#lastpost).

Did you manage to solve it? What did you do?

Thanks in advance.