Hi All,

I am using an MCP2515 CAN controller attached to a MINI6410 running emdebian. I have tried two Linux kernels, 2.6.36 and 2.6.28, and both exhibit the same problem. The MCP2515 uses an external interrupt pin, in my case IRQ_EINT(16). It works about 200 times over 15 minutes, and then the system stops working. I have done a number of things that lead me to conclude that the problem is in the Linux kernel.

Tests I have done to date:
- Inserted debug messages into the IRQ work handling routine to check whether all interrupts are cleared when the routine exits. (No sign of a problem here.)
- Monitored the interrupt line and the SPI CS line on an oscilloscope. All communication between the MINI6410 and the MCP2515 looks correct until the interrupt line goes low and no SPI transfer follows. The interrupt line then stays low until the socketcan interface is restarted. Even with the interrupt line low, it is still possible to send CAN messages onto the bus. Only reception freezes.
- Registered the interrupt as level triggered rather than edge triggered (the default), which froze the whole system.

Has anyone out there dealt with a similar problem and solved it? Or does anybody have an idea what might be wrong?

Thanks in advance for any help.
mini6410 misses external interrupt
Are you sure that the MCP2515 is still generating interrupts on receive? Is it possible that the device is going off bus? Can you check the registers for this condition? This sounds more like a driver issue with the MCP2515 than a problem with the interrupts on the MINI6410.

Dave...
Hi Dave,

At first I also thought this might be a driver issue, and I have tried about three different versions of the driver. However, I monitor the interrupt line and the chip select line on the MCP2515 with a scope. For a working interrupt, the interrupt line goes low for less than 400 µs. In that time I see a short low on CS (reading the interrupt flags) and a second, longer low on CS (reading the RX buffer). CS and the interrupt line go high together after the RX buffer read. On the last interrupt, when the system hangs, the interrupt line goes low and there is no activity on the CS line at all. The scope captures about 5 ms after the interrupt line went low.

I have done the same test in the driver's interrupt handling routine by inserting a counter and debug messages. The counter value and the count in cat /proc/interrupts are the same. One more thing: the driver reads the interrupt flags a second time, which pulls CS low again after the interrupt line has gone high. The debug messages always say that the RX0 interrupt flag was set and that all interrupts were cleared on exiting the interrupt routine. The only possible driver issue I see here would be a higher layer of the socketcan modules preventing the interrupt handler from running.

I have also double-checked the driver versions. The first drivers used a callback that dealt with the actual interrupt, and the ISR just scheduled that callback. The latest version got rid of the callback and deals with the interrupt in the ISR. The behaviour I observe is the same for all drivers.

Regards
Cyberhippy
Hi adam,

I have no progress to report, just one more test that convinces me the Linux kernel is missing the interrupt. I tried to regenerate the missed interrupt, and that unstuck the driver. Since the interrupt line is a data line, I did the unspeakable. When the interrupt got stuck, I momentarily shorted the low interrupt line to Vdd and so regenerated the falling edge. The driver picked up this new interrupt and everything works fine, until it gets stuck again ;-(.

After examining the interrupt code of the Linux kernel, I suspect this situation is related to the Ethernet chip. The MCP2515 and the Ethernet chip use the same external interrupt bank, and the Ethernet chip has the higher priority. The interrupt handler iterates through all interrupts in a bank and seems to stop once it finds one that reports the interrupt as handled. It therefore stops searching before the MCP2515 interrupt is handled (a downfall of edge-triggered interrupts).

One way to prove this is to move the interrupt to another bank, which requires physical rework. Alternatively, one could fix the kernel's external interrupt handler (my preferred fix). I will implement a fix once I have some spare time. If someone who knows more about s3c64xx external interrupt handling can point me in the right direction, that would help. For now I use a cron job to restart the interface every 5 minutes.

Thanks for asking
cyberhippy
Maybe this issue is worth reporting on the ALKML [1] mailing list. If the logic misses edge-triggered interrupts, you should use low-level-triggered interrupts as a workaround.

[1] linux-arm-kernel@lists.infradead.org
Hi All,

The point is that the INT line of the MCP251x is a level-triggered interrupt. When the MCP251x needs the MCU's attention, it pulls the INT line to 0. The MCU (the interrupt handler) must then clear the relevant flags in the MCP251x (CANINTF), and only then does the MCP251x release INT back to 1.

The MCP251x device driver (mcp251x.c) registers its IRQ with the "IRQF_TRIGGER_FALLING" flag, so the interrupt handler is invoked only on a falling edge of the interrupt line. Mixing these two kinds of triggering causes problems that eventually stop communication between the MCP251x and the MCU. For example, if the MCU misses a single edge for any reason, it never clears the MCP251x's interrupt flags. The INT line then remains low and no further falling edge can occur, even though the MCU is supposed to service this line for as long as it is low.

As a solution to this problem I use "IRQF_TRIGGER_LOW | IRQF_ONESHOT" for the IRQ registration, and it works fine. I also attached a patch file.