summaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms/pseries/eeh_driver.c
AgeCommit message (Collapse)Author
2007-12-11[POWERPC] EEH: Avoid a possible NULL pointer dereferenceStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-03[POWERPC] EEH: Report errors as soon as possibleLinas Vepstas
Do not wait for the pci slot status before reporting an error to the device driver. Some systems may take many seconds to report the slot status, and this can confuse unsuspecting device drivers. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08[POWERPC] EEH: Drivers that need reset trump othersLinas Vepstas
Bugfix: if a driver controlling one part of a multi-function PCI card has asked for a reset, honor that request above all others. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08[POWERPC] EEH: Clean up commentsLinas Vepstas
Clean up commentary, remove dead code. Signed-off-by Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14[POWERPC] Tweak EEH copyright infoLinas Vepstas
Twiddle the copyright notices. Per current guidelines, the use of the (C) or (c) in source code is deprecated. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 6 +++++- arch/powerpc/platforms/pseries/eeh_cache.c | 3 ++- arch/powerpc/platforms/pseries/eeh_driver.c | 6 +++--- 3 files changed, 10 insertions(+), 5 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-10[POWERPC] Assorted janitorial EEH cleanupsLinas Vepstas
Assorted minor cleanups to EEH code; -- use literals, use kerneldoc format. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 13 ++++++++++--- arch/powerpc/platforms/pseries/eeh_driver.c | 7 ++++--- include/asm-powerpc/ppc-pci.h | 18 +++++++++++++++--- 3 files changed, 29 insertions(+), 9 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09[POWERPC] EEH: Split up long error msgLinas Vepstas
Make some minor adjustments to the EEH error messages. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09[POWERPC] EEH: log error only after driver notification.Linas Vepstas
It turns out many/most versions of firmware enable MMIO when the slto-error-detail rtas call is made (in violation of the architecture). Thus, it would be best to call slot-error-detail only after notifying device drivers of a freeze, as otherwise, a variety of strange and unexpected things may happen. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] Rename get_property to of_get_property: arch/powerpcStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22[POWERPC] EEH: restructure multi-function supportLinas Vepstas
Rework how multi-function PCI devices are identified and traversed. This fixes a bug with multi-function recovery on Power4 that was introduced by a recent Power4 EEH patch. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22[POWERPC] EEH: verify state changeLinas Vepstas
After requesting a state change, verify that the state change actually ocurred, and the system ends up in the expected state. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22[POWERPC] EEH: rm un-needed dataLinas Vepstas
The EEH event notification system passes around data that is not needed or at least, not used properly. Stop passing this data; get it in a more reliable fashion. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22[POWERPC] EEH: multifunction recovery bugfixLinas Vepstas
If the second or higher function of a multi-function device fails to recover, this failure is not reported upwards. Fix this. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22[POWERPC] EEH: hotplug recovery bugfixLinas Vepstas
If a device driver does not have native PCI error recovery, a hotplug error recovery will be attemped. In this case, the device driver will not report back whether its healthy or not; simply assume that it is. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22[POWERPC] EEH: Add clarifying messages.Linas Vepstas
There are multiple code patchs tht resuls in a "permanent failure"; when examining rare events, it can be hard to see which was taken. This patch adds printk's to assist. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-01-24[POWERPC] Clarify EEH error messageLinas Vepstas
Clarify error message re EEH permanent failure. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-08[POWERPC] EEH recovery tweaksLinas Vepstas
If one attempts to create a device driver recovery sequence that does not depend on a hard reset of the device, but simply just attempts to resume processing, then one discovers that the recovery sequence implemented on powerpc is not quite right. This patch fixes this up. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-09-21[POWERPC] EEH: support MMIO enable recovery stepLinas Vepstas
Update to the PowerPC PCI error recovery code. Add code to enable MMIO if a device driver reports that it is capable of recovering on its own. One anticipated use of this having a device driver enable MMIO so that it can take a register dump, which might then be followed by the device driver requesting a full reset. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-09-21[POWERPC] EEH: code comment cleanupLinas Vepstas
Clean up subroutine documentation; mostly formatting changes, with some new content. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-31[POWERPC] pseries: Constify & voidify get_property()Jeremy Kerr
Now that get_property() returns a void *, there's no need to cast its return value. Also, treat the return value as const, so we can constify get_property later. pseries platform changes. Built for pseries_defconfig Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-30typo fixes: occuring -> occurringAdrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-21[POWERPC] pseries: Print PCI slot location code on failureLinas Vepstas
The PCI error recovery code will printk diagnostic info when a PCI error event occurs. Change the messages to include the slot location code, which is how most sysadmins will know the device. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-19[PATCH] powerpc/pseries: Increment fail counter in PCI recoveryLinas Vepstas
When a PCI device driver does not support PCI error recovery, the powerpc/pseries code takes a walk through a branch of code that resets the failure counter. Because of this, if a broken PCI card is present, the kernel will attempt to reset it an infinite number of times. (This is annoying but mostly harmless: each reset takes about 10-20 seconds, and uses almost no CPU time). This patch preserves the failure count across resets. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-22[PATCH] powerpc/pseries: clear PCI failure counter if no new failuresLinas Vepstas
The current PCI error recovery system keeps track of the number of PCI card resets, and refuses to bring a card back up if this number is too large. The goal of doing this was to avoid an infinite loop of resets if a card is obviously dead. However, if the failures are rare, but the machine has a high uptime, this mechanism might still be triggered; this is too harsh. This patch will avoids this problem by decrementing the fail count after an hour. Thus, as long as a pci card BSOD's less than 6 times an hour, it will continue to be reset indefinitely. If it's failure rate is greater than that, it will be taken off-line permanently. This patch is larger than it might otherwise be because it changes indentation by removing a pointless while-loop. The while loop is not needed, as the handler is invoked once fo each event (by schedule_work()); the loop is leftover cruft from an earlier implementation. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-01[PATCH] powerpc/pseries: fix device name printing, again.Linas Vepstas
The recent patch to print device names in EEH reset messages was lacking ... this patch works better. Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-01[PATCH] powerpc/pseries: print message if EEH recovery failsLinas Vepstas
The current code prints an ambiguous message if the recovery of a failed PCI device fails. Give this special case its own unique message. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27[PATCH] powerpc/pseries: Cleanup device name printing.Linas Vepstas
This avoids printk'ing a NULL string. Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-28[PATCH] powerpc: fix NULL pointer in handle_eeh_eventsOlaf Hering
This patch fixes a crash in handle_eeh_events, but ethtool -t still doesnt work right. ... pepino:~ # cpu 0x3: Vector: 300 (Data Access) at [c00000005192bbe0] pc: c00000000004a380: .handle_eeh_events+0xe0/0x23c lr: c00000000004a374: .handle_eeh_events+0xd4/0x23c sp: c00000005192be60 msr: 9000000000009032 dar: 268 dsisr: 40000000 current = 0xc0000001fe7bf1a0 paca = 0xc00000000048b280 pid = 16322, comm = eehd enter ? for help [c00000005192bf00] c00000000004a808 .eeh_event_handler+0xcc/0x130 [c00000005192bf90] c000000000025e00 .kernel_thread+0x4c/0x68 ... (none):/# /usr/sbin/ethtool -i eth0 driver: e100 version: 3.5.10-k2-NAPI firmware-version: N/A bus-info: 0000:21:01.0 (none):/# /usr/sbin/ethtool -t eth0 Call Trace: [C00000000F8DEFF0] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000F8DF0A0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000F8DF150] [C000000000049E58] .eeh_check_failure+0x10c/0x138 [C00000000F8DF1E0] [C0000000002DFDB0] .e100_hw_reset+0x70/0xf4 [C00000000F8DF270] [C0000000002E1BBC] .e100_hw_init+0x2c/0x260 [C00000000F8DF310] [C0000000002E2464] .e100_loopback_test+0x8c/0x220 [C00000000F8DF3C0] [C0000000002E28DC] .e100_diag_test+0xdc/0x16c [C00000000F8DF490] [C000000000420BE0] .dev_ethtool+0xf24/0x14f8 [C00000000F8DF8F0] [C00000000041F4A8] .dev_ioctl+0x5cc/0x740 [C00000000F8DFA20] [C00000000040FEFC] .sock_ioctl+0x3d0/0x404 [C00000000F8DFAC0] [C0000000000D513C] .do_ioctl+0x68/0x108 [C00000000F8DFB50] [C0000000000D56B0] .vfs_ioctl+0x4d4/0x510 [C00000000F8DFC10] [C0000000000D5740] .sys_ioctl+0x54/0x94 [C00000000F8DFCC0] [C0000000000FB6EC] .ethtool_ioctl+0x11c/0x150 [C00000000F8DFD60] [C0000000000F7E40] .compat_sys_ioctl+0x338/0x3bc [C00000000F8DFE30] [C00000000000871C] syscall_exit+0x0/0x40 EEH: Detected PCI bus error on device 0000:21:01.0 EEH: This PCI device has failed 1 times since last reboot: <NULL> - modprobe: FATAL: Could not load /lib/modules/2.6.16-rc4-git7/modules.dep: No such file or directory Cannot get strings: No such device (none):/# (none):/# EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 (none):/# Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> EEH: This PCI device has failed 1 times since last reboot: <NULL> - EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> EEH: This PCI device has failed 1 times since last reboot: <NULL> - EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> and so on Signed-off-by: Olaf Hering <olh@suse.de> Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07[PATCH] eeh_driver NULL noise removalAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-01-10powerpc: Fix up some compile errors in the PCI error recovery codePaul Mackerras
<asm/systemcfg.h> is gone now, and the PCI error recovery constants in include/linux/pci.h changed their names in the process of getting accepted. Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 5a2516156c591fc3d2059fbd93f97e15eb6010d6 commit)
2006-01-10[PATCH] powerpc: handle multifunction PCI devices properlyLinas Vepstas
239-eeh-multifunction-consolidate.patch New-style firmware will often place multiple different functions under a non-EEH-aware parent. However, these devices might share a common PE "partition endpoint" and config address, ad thus any EEH events will affect all of the devices in common. This patch makes the effort to find all of these common devices and handle them together. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 216810296bb97d39da8e176822e9de78d2f00187 commit)
2006-01-10[PATCH] powerpc: Don't continue with PCI Error recovery if slot reset failed.Linas Vepstas
238-eeh-stop-if-reset_failed.patch If the firmware is unable to reset the PCI slot for some reason, then don't attempt any further recovery steps after that point. Instead, mark the device as permanently failed. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)
2006-01-10[PATCH] powerpc: Remove duplicate codeLinas Vepstas
234-eeh-find-pe.patch The find_device_pe() routine is duplicated in two files. Remove one of the two copies, declare the other extern. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)
2006-01-10[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routinesLinas Vepstas
Various PCI bus errors can be signaled by newer PCI controllers. The core error recovery routines are architecture dependent. This patch adds a recovery infrastructure for the PPC64 pSeries systems. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)