summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2019-12-31EDAC/ghes: Fix grain calculationRobert Richter
[ Upstream commit 7088e29e0423d3195e09079b4f849ec4837e5a75 ] The current code to convert a physical address mask to a grain (defined as granularity in bytes) is: e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK); This is broken in several ways: 1) It calculates to wrong grain values. E.g., a physical address mask of ~0xfff should give a grain of 0x1000. Without considering PAGE_MASK, there is an off-by-one. Things are worse when also filtering it with ~PAGE_MASK. This will calculate to a grain with the upper bits set. In the example it even calculates to ~0. 2) The grain does not depend on and is unrelated to the kernel's page-size. The page-size only matters when unmapping memory in memory_failure(). Smaller grains are wrongly rounded up to the page-size, on architectures with a configurable page-size (e.g. arm64) this could round up to the even bigger page-size of the hypervisor. Fix this with: e->grain = ~mem_err->physical_addr_mask + 1; The grain_bits are defined as: grain = 1 << grain_bits; Change also the grain_bits calculation accordingly, it is the same formula as in edac_mc.c now and the code can be unified. The value in ->physical_addr_mask coming from firmware is assumed to be contiguous, but this is not sanity-checked. However, in case the mask is non-contiguous, a conversion to grain_bits effectively converts the grain bit mask to a power of 2 by rounding it up. Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01EDAC, thunderx: Fix memory leak in thunderx_l2c_threaded_isr()Dan Carpenter
[ Upstream commit d8c27ba86a2fd806d3957e5a9b30e66dfca2a61d ] Fix memory leak in L2c threaded interrupt handler. [ bp: Rewrite commit message. ] Fixes: 41003396f932 ("EDAC, thunderx: Add Cavium ThunderX EDAC driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: David Daney <david.daney@cavium.com> CC: Jan Glauber <jglauber@cavium.com> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Sergey Temerkhanov <s.temerkhanov@gmail.com> CC: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20181013102843.GG16086@mwanda Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20EDAC, sb_edac: Return early on ADDRV bit and address type testQiuxu Zhuo
[ Upstream commit dcc960b225ceb2bd66c45e0845d03e577f7010f9 ] Users of the mce_register_decode_chain() are called for every logged error. EDAC drivers should check: 1) Is this a memory error? [bit 7 in status register] 2) Is there a valid address? [bit 58 in status register] 3) Is the address a system address? [bitfield 8:6 in misc register] The sb_edac driver performed test "1" twice. Waited far too long to perform check "2". Didn't do check "3" at all. Fix it by moving the test for valid address from sbridge_mce_output_error() into sbridge_mce_check_error() and add a test for the type immediately after. Delete the redundant check for the type of the error from sbridge_mce_output_error(). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com [ Re-word commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05EDAC/amd64: Decode syndrome before translating addressYazen Ghannam
[ Upstream commit 8a2eaab7daf03b23ac902481218034ae2fae5e16 ] AMD Family 17h systems currently require address translation in order to report the system address of a DRAM ECC error. This is currently done before decoding the syndrome information. The syndrome information does not depend on the address translation, so the proper EDAC csrow/channel reporting can function without the address. However, the syndrome information will not be decoded if the address translation fails. Decode the syndrome information before doing the address translation. The syndrome information is architecturally defined in MCA_SYND and can be considered robust. The address translation is system-specific and may fail on newer systems without proper updates to the translation algorithm. Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-6-Yazen.Ghannam@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05EDAC/amd64: Recognize DRAM device type ECC capabilityYazen Ghannam
[ Upstream commit f8be8e5680225ac9caf07d4545f8529b7395327f ] AMD Family 17h systems support x4 and x16 DRAM devices. However, the device type is not checked when setting mci.edac_ctl_cap. Set the appropriate capability flag based on the device type. Default to x8 DRAM device when neither the x4 or x16 bits are set. [ bp: reverse cpk_en check to save an indentation level. ] Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-3-Yazen.Ghannam@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()Stephen Douthit
[ Upstream commit 29a3388bfcce7a6d087051376ea02bf8326a957b ] Depending on how BIOS has marked the reserved region containing the 32KB MCHBAR you can get warnings like: resource sanity check: requesting [mem 0xfed10000-0xfed1ffff], which spans more than reserved [mem 0xfed10000-0xfed17fff] caller dnv_rd_reg+0xc8/0x240 [pnd2_edac] mapping multiple BARs Not all of the mmio regions used in dnv_rd_reg() are the same size. The MCHBAR window is 32KB and the sideband ports are 64KB. Pass the correct size to ioremap() depending on which resource we're reading from. Signed-off-by: Stephen Douthit <stephend@silicom-usa.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05EDAC/altera: Use the proper type for the IRQ status bitsDan Carpenter
[ Upstream commit 8faa1cf6ed82f33009f63986c3776cc48af1b7b2 ] Smatch complains about the cast of a u32 pointer to unsigned long: drivers/edac/altera_edac.c:1878 altr_edac_a10_irq_handler() warn: passing casted pointer '&irq_status' to 'find_first_bit()' This code wouldn't work on a 64 bit big endian system because it would read past the end of &irq_status. [ bp: massage. ] Fixes: 13ab8448d2c9 ("EDAC, altera: Add ECC Manager IRQ controller support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com> Cc: James Morse <james.morse@arm.com> Cc: kernel-janitors@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190624134717.GA1754@mwanda Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05EDAC/mc: Fix grain_bits calculationRobert Richter
[ Upstream commit 3724ace582d9f675134985727fd5e9811f23c059 ] The grain in EDAC is defined as "minimum granularity for an error report, in bytes". The following calculation of the grain_bits in edac_mc is wrong: grain_bits = fls_long(e->grain) + 1; Where grain_bits is defined as: grain = 1 << grain_bits Example: grain = 8 # 64 bit (8 bytes) grain_bits = fls_long(8) + 1 grain_bits = 4 + 1 = 5 grain = 1 << grain_bits grain = 1 << 5 = 32 Replace it with the correct calculation: grain_bits = fls_long(e->grain - 1); The example gives now: grain_bits = fls_long(8 - 1) grain_bits = fls_long(7) grain_bits = 3 grain = 1 << 3 = 8 Also, check if the hardware reports a reasonable grain != 0 and fallback with a warning to 1 byte granularity otherwise. [ bp: massage a bit. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msecEiichi Tsukata
[ Upstream commit d8655e7630dafa88bc37f101640e39c736399771 ] Commit 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2") assumes edac_mc_poll_msec to be unsigned long, but the type of the variable still remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds write. Reproducer: # echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec KASAN report: BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150 Write of size 8 at addr ffffffffb91b2d00 by task bash/1996 CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 Call Trace: dump_stack+0xca/0x13e print_address_description.cold+0x5/0x246 __kasan_report.cold+0x75/0x9a ? edac_set_poll_msec+0x140/0x150 kasan_report+0xe/0x20 edac_set_poll_msec+0x140/0x150 ? dimmdev_location_show+0x30/0x30 ? vfs_lock_file+0xe0/0xe0 ? _raw_spin_lock+0x87/0xe0 param_attr_store+0x1b5/0x310 ? param_array_set+0x4f0/0x4f0 module_attr_store+0x58/0x80 ? module_attr_show+0x80/0x80 sysfs_kf_write+0x13d/0x1a0 kernfs_fop_write+0x2bc/0x460 ? sysfs_kf_bin_read+0x270/0x270 ? kernfs_notify+0x1f0/0x1f0 __vfs_write+0x81/0x100 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 ? __ia32_sys_read+0xb0/0xb0 ? do_syscall_64+0x1f/0x390 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fa7caa5e970 Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04 RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970 RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001 RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005 R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005 The buggy address belongs to the variable: edac_mc_poll_msec+0x0/0x40 Memory state around the buggy address: ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa >ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa ^ ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fix it by changing the type of edac_mc_poll_msec to unsigned int. The reason why this patch adopts unsigned int rather than unsigned long is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid integer conversion bugs and unsigned int will be large enough for edac_mc_poll_msec. Reviewed-by: James Morse <james.morse@arm.com> Fixes: 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31EDAC/sysfs: Fix memory leak when creating a csrow objectPan Bian
[ Upstream commit 585fb3d93d32dbe89e718b85009f9c322cc554cd ] In edac_create_csrow_object(), the reference to the object is not released when adding the device to the device hierarchy fails (device_add()). This may result in a memory leak. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/1555554438-103953-1-git-send-email-bianpan2016@163.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15EDAC/mpc85xx: Prevent building as a moduleMichael Ellerman
[ Upstream commit 2b8358a951b1e2a534a54924cd8245e58a1c5fb8 ] The mpc85xx EDAC driver can be configured as a module but then fails to build because it uses two unexported symbols: ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined! ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined! We don't want to export those symbols just for this driver, so make the driver only configurable as a built-in. This seems to have been broken since at least c92132f59806 ("edac/85xx: Add PCIe error interrupt edac support") (Nov 2013). [ bp: make it depend on EDAC=y so that the EDAC core doesn't get built as a module. ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Johannes Thumshirn <jth@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@ozlabs.org Cc: morbidrsa@gmail.com Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-14x86/cpu: Sanitize FAM6_ATOM namingPeter Zijlstra
commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream Going primarily by: https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors with additional information gleaned from other related pages; notably: - Bonnell shrink was called Saltwell - Moorefield is the Merriefield refresh which makes it Airmont The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE for i in `git grep -l FAM6_ATOM` ; do sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \ -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \ -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \ -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \ -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \ -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \ -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \ -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \ -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \ -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \ -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dave.hansen@linux.intel.com Cc: len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13EDAC, skx_edac: Fix logical channel intermediate decodingQiuxu Zhuo
commit 8f18973877204dc8ca4ce1004a5d28683b9a7086 upstream. The code "lchan = (lchan << 1) | ~lchan" for logical channel intermediate decoding is wrong. The wrong intermediate decoding result is {0xffffffff, 0xfffffffe}. Fix it by replacing '~' with '!'. The correct intermediate decoding result is {0x1, 0x2}. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Aristeu Rozanski <aris@redhat.com> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20181009172025.18594-1-tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13EDAC, {i7core,sb,skx}_edac: Fix uncorrected error countingTony Luck
commit 432de7fd7630c84ad24f1c2acd1e3bb4ce3741ca upstream. The count of errors is picked up from bits 52:38 of the machine check bank status register. But this is the count of *corrected* errors. If an uncorrected error is being logged, the h/w sets this field to 0. Which means that when edac_mc_handle_error() is called, the EDAC core will carefully add zero to the appropriate uncorrected error counts. Signed-off-by: Tony Luck <tony.luck@intel.com> [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13EDAC, amd64: Add Family 17h, models 10h-2fh supportMichael Jin
commit 8960de4a5ca7980ed1e19e7ca5a774d3b7a55c38 upstream. Add new device IDs for family 17h, models 10h-2fh. This is required by amd64_edac_mod in order to properly detect PCI device functions 0 and 6. Signed-off-by: Michael Jin <mikhail.jin@gmail.com> Reviewed-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03EDAC: Fix memleak in module init error pathJohan Hovold
[ Upstream commit 4708aa85d50cc6e962dfa8acf5ad4e0d290a21db ] Make sure to use put_device() to free the initialised struct device so that resources managed by driver core also gets released in the event of a registration failure. Signed-off-by: Johan Hovold <johan@kernel.org> Cc: Denis Kirjanov <kirjanov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 2d56b109e3a5 ("EDAC: Handle error path in edac_mc_sysfs_init() properly") Link: http://lkml.kernel.org/r/20180612124335.6420-1-johan@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03EDAC, i7core: Fix memleaks and use-after-free on probe and removeJohan Hovold
[ Upstream commit 6c974d4dfafe5e9ee754f2a6fba0eb1864f1649e ] Make sure to free and deregister the addrmatch and chancounts devices allocated during probe in all error paths. Also fix use-after-free in a probe error path and in the remove success path where the devices were being put before before deregistration. Signed-off-by: Johan Hovold <johan@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 356f0a30860d ("i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy") Link: http://lkml.kernel.org/r/20180612124335.6420-2-johan@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24EDAC: Add missing MEM_LRDDR4 entry in edac_mem_types[]Takashi Iwai
commit b748f2de4b2f578599f46c6000683a8da755bf68 upstream. The edac_mem_types[] array misses a MEM_LRDDR4 entry, which leads to NULL pointer dereference when accessed via sysfs or such. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20180810141426.8918-1-tiwai@suse.de Fixes: 1e8096bb2031 ("EDAC: Add LRDDR4 DRAM type") Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03EDAC, altera: Fix ARM64 build warningThor Thayer
[ Upstream commit 9ef20753e044f7468c4113e5aecd785419b0b3cc ] The kbuild test robot reported the following warning: drivers/edac/altera_edac.c: In function 'ocram_free_mem': drivers/edac/altera_edac.c:1410:42: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] gen_pool_free((struct gen_pool *)other, (u32)p, size); ^ After adding support for ARM64 architectures, the unsigned long parameter is 64 bits and causes a build warning on 64-bit configs. Fix by casting to the correct size (unsigned long) instead of u32. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: c3eea1942a16 ("EDAC, altera: Add Altera L2 cache and OCRAM support") Link: http://lkml.kernel.org/r/1526317441-4996-1-git-send-email-thor.thayer@linux.intel.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-19x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank typeYazen Ghannam
commit 68627a697c195937672ce07683094c72b1174786 upstream. Currently, bank 4 is reserved on Fam17h, so we chose not to initialize bank 4 in the smca_banks array. This means that when we check if a bank is initialized, like during boot or resume, we will see that bank 4 is not initialized and try to initialize it. This will cause a call trace, when resuming from suspend, due to rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue an IPI but we're running with interrupts disabled. This triggers: WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0 ... Reserved banks will be read-as-zero, so their MCA_IPID register will be zero. So, like the smca_banks array, the threshold_banks array will not have an entry for a reserved bank since all its MCA_MISC* registers will be zero. Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0. Use the "Reserved" type when checking if a bank is reserved. It's possible that other bank numbers may be reserved on future systems. Don't try to find the block address on reserved banks. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.14.x Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-12EDAC, mv64x60: Fix an error handling pathChristophe JAILLET
[ Upstream commit 68fa24f9121c04ef146b5158f538c8b32f285be5 ] We should not call edac_mc_del_mc() if a corresponding call to edac_mc_add_mc() has not been performed yet. So here, we should go to err instead of err2 to branch at the right place of the error handling path. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180107205400.14068-1-christophe.jaillet@wanadoo.fr Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08EDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNLAnna Karbownik
commit bf8486709ac7fad99e4040dea73fe466c57a4ae1 upstream. Commit 3286d3eb906c ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4") decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights Landing which supports up to 6 channels. This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm variables which don't pay critical role on KNL code path, so the memory corruption wasn't causing any visible driver failures. The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that. An alternative solution would be to restructure the KNL part of the driver to 2MC/3channel representation. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Karbownik <anna.karbownik@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: jim.m.snow@intel.com Cc: krzysztof.paliswiat@intel.com Cc: lukasz.odzioba@intel.com Cc: qiuxu.zhuo@intel.com Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Fixes: 3286d3eb906c ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4") Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_steppingJia Zhang
commit b399151cb48db30ad1e0e93dd40d68c6d007b637 upstream. x86_mask is a confusing name which is hard to associate with the processor's stepping. Additionally, correct an indent issue in lib/cpu.c. Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com> [ Updated it to more recent kernels. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16EDAC, octeon: Fix an uninitialized variable warningJames Hogan
commit 544e92581a2ac44607d7cc602c6b54d18656f56d upstream. Fix an uninitialized variable warning in the Octeon EDAC driver, as seen in MIPS cavium_octeon_defconfig builds since v4.14 with Codescape GNU Tools 2016.05-03: drivers/edac/octeon_edac-lmc.c In function ‘octeon_lmc_edac_poll_o2’: drivers/edac/octeon_edac-lmc.c:87:24: warning: ‘((long unsigned int*)&int_reg)[1]’ may \ be used uninitialized in this function [-Wmaybe-uninitialized] if (int_reg.s.sec_err || int_reg.s.ded_err) { ^ Iinitialise the whole int_reg variable to zero before the conditional assignments in the error injection case. Signed-off-by: James Hogan <jhogan@kernel.org> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-mips@linux-mips.org Fixes: 1bc021e81565 ("EDAC: Octeon: Add error injection support") Link: http://lkml.kernel.org/r/20171113161206.20990-1-james.hogan@mips.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-10EDAC, sb_edac: Fix missing break in switchGustavo A. R. Silva
[ Upstream commit a8e9b186f153a44690ad0363a56716e7077ad28c ] Add missing break statement in order to prevent the code from falling through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20171016174029.GA19757@embeddedor.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-21EDAC, sb_edac: Don't create a second memory controller if HA1 is not presentQiuxu Zhuo
commit 15cc3ae001873845b5d842e212478a6570c7d938 upstream. Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3) server (DELL PowerEdge 730xd): EDAC sbridge: Some needed devices are missing EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0 EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0 EDAC sbridge: Couldn't find mci handler EDAC sbridge: Couldn't find mci handler EDAC sbridge: Failed to register device with error -19. The refactored sb_edac driver creates the IMC1 (the 2nd memory controller) if any IMC1 device is present. In this case only HA1_TA of IMC1 was present, but the driver expected to find HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure. The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi Zhang inserted one DIMM per channel for each CPU, and did random error address injection test with this patch: 4024 addresses fell in TOLM hole area 12715 addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 12774 addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 12798 addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 12913 addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 12674 addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 12686 addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0 12882 addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0 12934 addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 106400 addresses were injected totally. The test result shows that all the 4 channels belong to IMC0 per CPU, so the server really only has one IMC per CPU. In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3' implements either one or two IMCs. For CPUs with one IMC, IMC1 is not used and should be ignored. Thus, do not create a second memory controller if the key HA1 is absent. [1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz [2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf Reported-and-tested-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-21EDAC, mce_amd: Get rid of local var in amd_filter_mce()Borislav Petkov
... and use the macro for that. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-21EDAC, mce_amd: Get rid of most struct cpuinfo_x86 usesBorislav Petkov
struct mce.cpuid contains CPUID(1).EAX which contains family, model and stepping and thus has enough information for our purposes. Thus get rid of some external dependencies which are not really needed. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-21EDAC, mce_amd: Rename decode_smca_errors() to decode_smca_error()Borislav Petkov
Singular fits better because it decodes a single error. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-20EDAC: Make device_type constBhumika Goyal
Make these const as they are only stored in the type field of a device structure, which is const. Done using Coccinelle. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1503130946-2854-2-git-send-email-bhumirks@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19EDAC, pnd2: Properly toggle hidden state for P2SB PCI deviceQiuxu Zhuo
Properly handle hidden state of P2SB PCI device (DEV:D, FUN:0) for Apollo Lake. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170814154905.21707-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19EDAC, pnd2: Conditionally unhide/hide the P2SB PCI device to read BARQiuxu Zhuo
On Deverton server, the P2SB PCI device (DEV:1F, FUN:1) is used by multiple device drivers. If it's hidden by some device driver (e.g. with the i801 I2C driver, the commit 9424693035a5 ("i2c: i801: Create iTCO device on newer Intel PCHs") unconditionally hid the P2SB PCI device wrongly) it will make the pnd2_edac driver read out an invalid BAR value of 0xffffffff and then fail on ioremap(). Therefore, store the presence state of P2SB PCI device before unhiding it for reading BAR and restore the presence state after reading BAR. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-i2c@vger.kernel.org Link: http://lkml.kernel.org/r/20170814154845.21663-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19EDAC, pnd2: Mask off the lower four bits of a BARQiuxu Zhuo
Bit[0] of BAR is always zero. Bit[2:1] and bit[3] of BAR contain the information of 'type' and the 'prefetchable' accordingly. Therefore, mask the lower four bits to retrieve the actual base address of a BAR. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170814154813.21619-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-18EDAC, thunderx: Fix error handling path in thunderx_lmc_probe()Christophe JAILLET
Return the proper error value if ioremap() fails (and not 0). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-mips@linux-mips.org Link: http://lkml.kernel.org/r/20170816045821.14165-1-christophe.jaillet@wanadoo.fr [ Massage commit message, remove newline. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-18EDAC, altera: Fix error handling path in altr_edac_device_probe()Christophe JAILLET
Return the proper error value if devm_ioremap() fails (and not 0). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Thor Thayer <thor.thayer@linux.intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170816050506.14541-1-christophe.jaillet@wanadoo.fr [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-04EDAC, pnd2: Build in a minimal sideband driver for Apollo LakeTony Luck
I've been waing a long time for the generic sideband driver to appear. Patience has run out, so include the minimum here to just read registers. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Patrick Geary <patrickg@supermicro.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170803210536.5662-1-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-02EDAC, sb_edac: Classify memory mirroring modesQiuxu Zhuo
Basically, there are full memory mirroring and address range partial memory mirroring (supported by Haswell EX and Broadwell EX) modes. a) In full memory mirroring, the memory behind each memory controller is mirrored, i.e. the memory is split into two identical mirrors (primary and secondary), half of the memory is reserved for redundancy. b) In address range partial memory mirroring, the memory size (range) of primary and secondary behind each memory controller can be user defined by the TAD0 register. The rest of memory ranges defined by TAD1/TAD2/... in that memory controller are non-mirrored. For more detail on memory mirroring, see the following link written by Tony Luck: https://01.org/lkp/blogs/tonyluck/2016/address-range-partial-memory-mirroring-linux Currently the sb_edac driver only supports address decoding in full memory mirroring and non-mirroring modes. In address range partial memory mirroring mode, it may fail to decode an address that falls in a non-mirroring area (the following was one of this kind of failed logs). mce: Uncorrected hardware memory error in user-access at 566d53a400 Memory failure: 0x566d53a: Killing einj_mem_uc:4647 due to hardware memory corruption Memory failure: 0x566d53a: recovery action for dirty LRU page: Recovered mce: [Hardware Error]: Machine check events logged EDAC sbridge MC1: HANDLING MCE MEMORY ERROR EDAC sbridge MC1: CPU 48: Machine Check Event: 0 Bank 7: ec00000000010090 EDAC sbridge MC1: TSC 4b914aa5a99dab EDAC sbridge MC1: ADDR 566d53a400 EDAC sbridge MC1: MISC 1443a0c86 EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1499712764 SOCKET 2 APIC 80 EDAC MC1: 0 UE Can't discover the memory rank for ch addr 0x7fb54e900 on any memory ( page:0x0 offset:0x0 grain:32) mce: [Hardware Error]: Machine check events logged Therefore, classify memory mirroring modes and make the address decoding in address range partial memory mode correct. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170730180651.30060-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-19EDAC, cpc925, ppc4xx: Convert to using %pOF instead of full_nameRob Herring
Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each node. Signed-off-by: Rob Herring <robh@kernel.org> Cc: devicetree@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170718214339.7774-19-robh@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-17EDAC: Get rid of mci->mod_verBorislav Petkov
It is a write-only variable so get rid of it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Thor Thayer <thor.thayer@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Loc Ho <lho@apm.com> Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org
2017-07-17EDAC: Constify attribute_group structuresArvind Yadav
attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> CC: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/776cb8265509054abd01b0b551624cc0da3b88e7.1499078335.git.arvind.yadav.cs@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-17EDAC, mce_amd: Use cpu_to_node() to find the node IDYazen Ghannam
Using the homegrown amd_get_nb_id() to find a node ID on AMD was fine while the L3 to node mapping was 1:1. And Zen topology broke this. So let's start slowly moving away from it and use the topology interfaces instead. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1490041614-90057-2-git-send-email-Yazen.Ghannam@amd.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29EDAC, pnd2: Fix Apollo Lake DIMM detectionTony Luck
Non-existent or empty DIMM slots result in error return from RD_REGP(). But we shouldn't give up on failure. So long as we find at least one DIMM we can continue. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170628234407.21521-1-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29EDAC, i5000, i5400: Fix definition of NRECMEMB registerJérémy Lefaure
In the i5000 and i5400 drivers, the NRECMEMB register is defined as a 16-bit value, which results in wrong shifts in the code, as reported by sparse. In the datasheets ([1], section 3.9.22.20 and [2], section 3.9.22.21), this register is a 32-bit register. A u32 value for the register fixes the wrong shifts warnings and matches the datasheet. Also fix the mask to access to the CAS bits [27:16] in the i5000 driver. [1]: https://www.intel.com/content/dam/doc/datasheet/5000p-5000v-5000z-chipset-memory-controller-hub-datasheet.pdf [2]: https://www.intel.se/content/dam/doc/datasheet/5400-chipset-memory-controller-hub-datasheet.pdf Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170629005729.8478-1-jeremy.lefaure@lse.epita.fr Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-26EDAC, pnd2: Make function sbi_send() staticColin Ian King
The function sbi_send() is local to just pnd2_edac.c and does not need to be in global scope, so make it static. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170623084855.9197-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-23EDAC, pnd2: Return proper error value from apl_rd_reg()Gustavo A. R. Silva
Add code comment to make it clear that the fall-through is intentional and, OR ret with its previous value to avoid overwriting it so that callers can check the correct return value. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170622220535.GA4896@embeddedgus [ Massage a bit. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-14EDAC, altera: Simplify calculation of total memoryChris Packham
Use of_address_to_resource() and resource_size() instead of manually parsing the "reg" property from the "memory" node(s). Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Tested-by: Thor Thayer <thor.thayer@linux.intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170606235500.22772-3-chris.packham@alliedtelesis.co.nz Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-14EDAC, sb_edac: Avoid creating SOCK memory controllerQiuxu Zhuo
Xiaolong Ye reported the following failure on Broadwell D server: EDAC sbridge: Some needed devices are missing EDAC MC: Removed device 0 for sbridge_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0 EDAC sbridge: Couldn't find mci handler EDAC sbridge: Failed to register device with error -19. Broadwell D (only IMC0 per socket) and Broadwell X (IMC0 and IMC1 per socket) use the same PCI device IDs for IMC0 per socket, then they share pci_dev_descr_broadwell_table (n_imcs_per_sock=2). In this case, Broadwell D wrongly creates the nonexistent SOCK EDAC memory controller and reports above error messages, since it has no IMC1 per socket. Avoid creating the nonexistent SOCK memory controller. Reported-and-tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170608113351.25323-1-qiuxu.zhuo@intel.com [ Massage. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-12EDAC, mce_amd: Fix typo in SMCA error descriptionYazen Ghannam
Fix typo in "poison consumption" error description. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1497286703-62853-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-09EDAC, mv64x60: Sanity check edac_op_state before registeringChris Packham
edac_op_state is a module parameter which affects the behaviour of the driver probe which can potentially be invoked as soon as the platform driver registration happens. Because of this we need to ensure that we sanity check the module parameter before calling platform_register_drivers(). Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170607215530.8604-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Borislav Petkov <bp@suse.de>