summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2015-05-17uas: Add US_FL_MAX_SECTORS_240 flagHans de Goede
[ Upstream commit ee136af4a064c2f61e2025873584d2c7ec93f4ae ] The usb-storage driver sets max_sectors = 240 in its scsi-host template, for uas we do not want to do that for all devices, but testing has shown that some devices need it. This commit adds a US_FL_MAX_SECTORS_240 flag for such devices, and implements support for it in uas.c, while at it it also adds support for US_FL_MAX_SECTORS_64 to uas.c. Cc: stable@vger.kernel.org # 3.16 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ASoC: dapm: Enable autodisable on SOC_DAPM_SINGLE_TLV_AUTODISABLECharles Keepax
[ Upstream commit a2d97723cb3a7741af81868427b36bba274b681b ] Correct small copy and paste error where autodisable was not being enabled for the SOC_DAPM_SINGLE_TLV_AUTODISABLE control. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ALSA: emu10k1: Emu10k2 32 bit DMA modePeter Zubaj
[ Upstream commit 7241ea558c6715501e777396b5fc312c372e11d9 ] Looks like audigy emu10k2 (probably emu10k1 - sb live too) support two modes for DMA. Second mode is useful for 64 bit os with more then 2 GB of ram (fixes problems with big soundfont loading) 1) 32MB from 2 GB address space using 8192 pages (used now as default) 2) 16MB from 4 GB address space using 4096 pages Mode is set using HCFG_EXPANDED_MEM flag in HCFG register. Also format of emu10k2 page table is then different. Signed-off-by: Peter Zubaj <pzubaj@marticonet.sk> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17mm/hugetlb: take page table lock in follow_huge_pmd()Naoya Horiguchi
[ Upstream commit e66f17ff71772b209eed39de35aaa99ba819c93d ] We have a race condition between move_pages() and freeing hugepages, where move_pages() calls follow_page(FOLL_GET) for hugepages internally and tries to get its refcount without preventing concurrent freeing. This race crashes the kernel, so this patch fixes it by moving FOLL_GET code for hugepages into follow_huge_pmd() with taking the page table lock. This patch intentionally removes page==NULL check after pte_page. This is justified because pte_page() never returns NULL for any architectures or configurations. This patch changes the behavior of follow_huge_pmd() for tail pages and then tail pages can be pinned/returned. So the caller must be changed to properly handle the returned tail pages. We could have a choice to add the similar locking to follow_huge_(addr|pud) for consistency, but it's not necessary because currently these functions don't support FOLL_GET flag, so let's leave it for future development. Here is the reproducer: $ cat movepages.c #include <stdio.h> #include <stdlib.h> #include <numaif.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 #define PS 0x1000 int main(int argc, char *argv[]) { int i; int nr_hp = strtol(argv[1], NULL, 0); int nr_p = nr_hp * HPS / PS; int ret; void **addrs; int *status; int *nodes; pid_t pid; pid = strtol(argv[2], NULL, 0); addrs = malloc(sizeof(char *) * nr_p + 1); status = malloc(sizeof(char *) * nr_p + 1); nodes = malloc(sizeof(char *) * nr_p + 1); while (1) { for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 1; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 0; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); } return 0; } $ cat hugepage.c #include <stdio.h> #include <sys/mman.h> #include <string.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 int main(int argc, char *argv[]) { int nr_hp = strtol(argv[1], NULL, 0); char *p; while (1) { p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (p != (void *)ADDR_INPUT) { perror("mmap"); break; } memset(p, 0, nr_hp * HPS); munmap(p, nr_hp * HPS); } } $ sysctl vm.nr_hugepages=40 $ ./hugepage 10 & $ ./movepages 10 $(pgrep -f hugepage) Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ebpf: verifier: check that call reg with ARG_ANYTHING is initializedDaniel Borkmann
[ Upstream commit 80f1d68ccba70b1060c9c7360ca83da430f66bed ] I noticed that a helper function with argument type ARG_ANYTHING does not need to have an initialized value (register). This can worst case lead to unintented stack memory leakage in future helper functions if they are not carefully designed, or unintended application behaviour in case the application developer was not careful enough to match a correct helper function signature in the API. The underlying issue is that ARG_ANYTHING should actually be split into two different semantics: 1) ARG_DONTCARE for function arguments that the helper function does not care about (in other words: the default for unused function arguments), and 2) ARG_ANYTHING that is an argument actually being used by a helper function and *guaranteed* to be an initialized register. The current risk is low: ARG_ANYTHING is only used for the 'flags' argument (r4) in bpf_map_update_elem() that internally does strict checking. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ACPICA: Utilities: split IO address types from data type models.Lv Zheng
[ Upstream commit 2b8760100e1de69b6ff004c986328a82947db4ad ] ACPICA commit aacf863cfffd46338e268b7415f7435cae93b451 It is reported that on a physically 64-bit addressed machine, 32-bit kernel can trigger crashes in accessing the memory regions that are beyond the 32-bit boundary. The region field's start address should still be 32-bit compliant, but after a calculation (adding some offsets), it may exceed the 32-bit boundary. This case is rare and buggy, but there are real BIOSes leaked with such issues (see References below). This patch fixes this gap by always defining IO addresses as 64-bit, and allows OSPMs to optimize it for a real 32-bit machine to reduce the size of the internal objects. Internal acpi_physical_address usages in the structures that can be fixed by this change include: 1. struct acpi_object_region: acpi_physical_address address; 2. struct acpi_address_range: acpi_physical_address start_address; acpi_physical_address end_address; 3. struct acpi_mem_space_context; acpi_physical_address address; 4. struct acpi_table_desc acpi_physical_address address; See known issues 1 for other usages. Note that acpi_io_address which is used for ACPI_PROCESSOR may also suffer from same problem, so this patch changes it accordingly. For iasl, it will enforce acpi_physical_address as 32-bit to generate 32-bit OSPM compatible tables on 32-bit platforms, we need to define ACPI_32BIT_PHYSICAL_ADDRESS for it in acenv.h. Known issues: 1. Cleanup of mapped virtual address In struct acpi_mem_space_context, acpi_physical_address is used as a virtual address: acpi_physical_address mapped_physical_address; It is better to introduce acpi_virtual_address or use acpi_size instead. This patch doesn't make such a change. Because this should be done along with a change to acpi_os_map_memory()/acpi_os_unmap_memory(). There should be no functional problem to leave this unchanged except that only this structure is enlarged unexpectedly. Link: https://github.com/acpica/acpica/commit/aacf863c Reference: https://bugzilla.kernel.org/show_bug.cgi?id=87971 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=79501 Reported-and-tested-by: Paul Menzel <paulepanter@users.sourceforge.net> Reported-and-tested-by: Sial Nije <sialnije@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17target: Fix COMPARE_AND_WRITE with SG_TO_MEM_NOALLOC handlingNicholas Bellinger
[ Upstream commit c8e639852ad720499912acedfd6b072325fd2807 ] This patch fixes a bug for COMPARE_AND_WRITE handling with fabrics using SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC. It adds the missing allocation for cmd->t_bidi_data_sg within transport_generic_new_cmd() that is used by COMPARE_AND_WRITE for the initial READ payload, even if the fabric is already providing a pre-allocated buffer for cmd->t_data_sg. Also, fix zero-length COMPARE_AND_WRITE handling within the compare_and_write_callback() and target_complete_ok_work() to queue the response, skipping the initial READ. This fixes COMPARE_AND_WRITE emulation with loopback, vhost, and xen-backend fabric drivers using SG_TO_MEM_NOALLOC. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v3.12+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17usb: define a generic USB_RESUME_TIMEOUT macroFelipe Balbi
[ Upstream commit 62f0342de1f012f3e90607d39e20fce811391169 ] Every USB Host controller should use this new macro to define for how long resume signalling should be driven on the bus. Currently, almost every single USB controller is using a 20ms timeout for resume signalling. That's problematic for two reasons: a) sometimes that 20ms timer expires a little before 20ms, which makes us fail certification b) some (many) devices actually need more than 20ms resume signalling. Sure, in case of (b) we can state that the device is against the USB spec, but the fact is that we have no control over which device the certification lab will use. We also have no control over which host they will use. Most likely they'll be using a Windows PC which, again, we have no control over how that USB stack is written and how long resume signalling they are using. At the end of the day, we must make sure Linux passes electrical compliance when working as Host or as Device and currently we don't pass compliance as host because we're driving resume signallig for exactly 20ms and that confuses certification test setup resulting in Certification failure. Cc: <stable@vger.kernel.org> # v3.10+ Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11net: fix crash in build_skb()Eric Dumazet
[ Upstream commit 2ea2f62c8bda242433809c7f4e9eae1c52c40bbe ] When I added pfmemalloc support in build_skb(), I forgot netlink was using build_skb() with a vmalloc() area. In this patch I introduce __build_skb() for netlink use, and build_skb() is a wrapper handling both skb->head_frag and skb->pfmemalloc This means netlink no longer has to hack skb->head_frag [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26! [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 1567.700067] Dumping ftrace buffer: [ 1567.700067] (ftrace buffer empty) [ 1567.700067] Modules linked in: [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167 [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000 [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3)) [ 1567.700067] RSP: 0018:ffff8802467779d8 EFLAGS: 00010202 [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049 [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000 [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000 [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000 [ 1567.700067] FS: 00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000 [ 1567.700067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0 [ 1567.700067] Stack: [ 1567.700067] ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000 [ 1567.700067] ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08 [ 1567.700067] ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821 [ 1567.700067] Call Trace: [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316) [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329) [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623) [ 1567.774369] sock_write_iter (net/socket.c:823) [ 1567.774369] ? sock_sendmsg (net/socket.c:806) [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491) [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249) [ 1567.774369] ? default_llseek (fs/read_write.c:487) [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701) [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4)) [ 1567.774369] vfs_write (fs/read_write.c:539) [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577) [ 1567.774369] ? SyS_read (fs/read_write.c:577) [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636) [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42) [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261) Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11net: add skb_checksum_complete_unsetTom Herbert
[ Upstream commit 4e18b9adf2f910ec4d30b811a74a5b626e6c6125 ] This function changes ip_summed to CHECKSUM_NONE if CHECKSUM_COMPLETE is set. This is called to discard checksum-complete when packet is being modified and checksum is not pulled for headers in a layer. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: Keep elrsr/aisr in sync with software modelChristoffer Dall
commit ae705930fca6322600690df9dc1c7d0516145a93 upstream. There is an interesting bug in the vgic code, which manifests itself when the KVM run loop has a signal pending or needs a vmid generation rollover after having disabled interrupts but before actually switching to the guest. In this case, we flush the vgic as usual, but we sync back the vgic state and exit to userspace before entering the guest. The consequence is that we will be syncing the list registers back to the software model using the GICH_ELRSR and GICH_EISR from the last execution of the guest, potentially overwriting a list register containing an interrupt. This showed up during migration testing where we would capture a state where the VM has masked the arch timer but there were no interrupts, resulting in a hung test. Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Alex Bennee <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: Require in-kernel vgic for the arch timersChristoffer Dall
commit 05971120fca43e0357789a14b3386bb56eef2201 upstream. It is curently possible to run a VM with architected timers support without creating an in-kernel VGIC, which will result in interrupts from the virtual timer going nowhere. To address this issue, move the architected timers initialization to the time when we run a VCPU for the first time, and then only initialize (and enable) the architected timers if we have a properly created and initialized in-kernel VGIC. When injecting interrupts from the virtual timer to the vgic, the current setup should ensure that this never calls an on-demand init of the VGIC, which is the only call path that could return an error from kvm_vgic_inject_irq(), so capture the return value and raise a warning if there's an error there. We also change the kvm_timer_init() function from returning an int to be a void function, since the function always succeeds. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps()Peter Maydell
commit 6d3cfbe21bef5b66530b50ad16c88fdc71a04c35 upstream. VGIC initialization currently happens in three phases: (1) kvm_vgic_create() (triggered by userspace GIC creation) (2) vgic_init_maps() (triggered by userspace GIC register read/write requests, or from kvm_vgic_init() if not already run) (3) kvm_vgic_init() (triggered by first VM run) We were doing initialization of some state to correspond with the state of a freshly-reset GIC in kvm_vgic_init(); this is too late, since it will overwrite changes made by userspace using the register access APIs before the VM is run. Move this initialization earlier, into the vgic_init_maps() phase. This fixes a bug where QEMU could successfully restore a saved VM state snapshot into a VM that had already been run, but could not restore it "from cold" using the -loadvm command line option (the symptoms being that the restored VM would run but interrupts were ignored). Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to kvm_vgic_map_resources. [ This patch is originally written by Peter Maydell, but I have modified it somewhat heavily, renaming various bits and moving code around. If something is broken, I am to be blamed. - Christoffer ] Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11kvm: add a memslot flag for incoherent memory regionsArd Biesheuvel
commit 1050dcda3052912984b26fb6d2695a3f41792000 upstream. Memory regions may be incoherent with the caches, typically when the guest has mapped a host system RAM backed memory region as uncached. Add a flag KVM_MEMSLOT_INCOHERENT so that we can tag these memslots and handle them appropriately when mapping them. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27usbnet: Fix tx_bytes statistic running backward in cdc_ncmBen Hutchings
[ Upstream commit 7a1e890e2168e33fb62d84528e996b8b4b478fea ] cdc_ncm disagrees with usbnet about how much framing overhead should be counted in the tx_bytes statistics, and tries 'fix' this by decrementing tx_bytes on the transmit path. But statistics must never be decremented except due to roll-over; this will thoroughly confuse user-space. Also, tx_bytes is only incremented by usbnet in the completion path. Fix this by requiring drivers that set FLAG_MULTI_FRAME to set a tx_bytes delta along with the tx_packets count. Fixes: beeecd42c3b4 ("net: cdc_ncm/cdc_mbim: adding NCM protocol statistics") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27usbnet: Fix tx_packets stat for FLAG_MULTI_FRAME driversBen Hutchings
[ Upstream commit 1e9e39f4a29857a396ac7b669d109f697f66695e ] Currently the usbnet core does not update the tx_packets statistic for drivers with FLAG_MULTI_PACKET and there is no hook in the TX completion path where they could do this. cdc_ncm and dependent drivers are bumping tx_packets stat on the transmit path while asix and sr9800 aren't updating it at all. Add a packet count in struct skb_data so these drivers can fill it in, initialise it to 1 for other drivers, and add the packet count to the tx_packets statistic on completion. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27vlan: introduce *vlan_hwaccel_push_inside helpersJiri Pirko
[ Upstream commit 5968250c868ceee680aa77395b24e6ddcae17d36 ] Use them to push skb->vlan_tci into the payload and avoid code duplication. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27vlan: rename __vlan_put_tag to vlan_insert_tag_set_protoJiri Pirko
[ Upstream commit 62749e2cb3c4a7da3eaa5c01a7e787aebeff8536 ] Name fits better. Plus there's going to be introduced __vlan_insert_tag later on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27vlan: kill vlan_put_tag helperJiri Pirko
[ Upstream commit b4bef1b57544b18899eb15569e3bafd8d2eeeff6 ] Since both tx and rx paths work with skb->vlan_tci, there's no need for this function anymore. Switch users directly to __vlan_hwaccel_put_tag. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27ipv6: protect skb->sk accesses from recursive dereference inside the stackhannes@stressinduktion.org
[ Upstream commit f60e5990d9c1424af9dbca60a23ba2a1c7c1ce90 ] We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. tunnel encapsulation process. ipv6 does not conform with this in three places: 1) ip6_fragment: we do consult ipv6_npinfo for frag_size 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU Furthermore: In sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket ontop of an IPv6-backed vxlan device. Reuse xmit_recursion as we are currently only interested in protecting tunnel devices. Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)Christian Borntraeger
[ Upstream commit 43239cbe79fc369f5d2160bd7f69e28b5c50a58c ] Feedback has shown that WRITE_ONCE(x, val) is easier to use than ASSIGN_ONCE(val,x). There are no in-tree users yet, so lets change it for 3.19. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-26kernel: Provide READ_ONCE and ASSIGN_ONCEChristian Borntraeger
[ Upstream commit 230fa253df6352af12ad0a16128760b5cb3f92df ] ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via scalar types as suggested by Linus Torvalds. Accesses larger than the machines word size cannot be guaranteed to be atomic. These macros will use memcpy and emit a build warning. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-24cpuidle: remove state_count field from struct cpuidle_deviceBartlomiej Zolnierkiewicz
[ Upstream commit d75e4af14e228bbe3f86e29bcecb8e6be98d4e04 ] Thomas Schlichter reports the following issue on his Samsung NC20: "The C-states C1 and C2 to the OS when connected to AC, and additionally provides the C3 C-state when disconnected from AC. However, the number of C-states shown in sysfs is fixed to the number of C-states present at boot. If I boot with AC connected, I always only see the C-states up to C2 even if I disconnect AC. The reason is commit 130a5f692425 (ACPI / cpuidle: remove dev->state_count setting). It removes the update of dev->state_count, but sysfs uses exactly this variable to show the C-states. The fix is to use drv->state_count in sysfs. As this is currently the last user of dev->state_count, this variable can be completely removed." Remove dev->state_count as per the above. Reported-by: Thomas Schlichter <thomas.schlichter@web.de> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-24Defer processing of REQ_PREEMPT requests for blocked devicesBart Van Assche
[ Upstream commit bba0bdd7ad4713d82338bcd9b72d57e9335a664b ] SCSI transport drivers and SCSI LLDs block a SCSI device if the transport layer is not operational. This means that in this state no requests should be processed, even if the REQ_PREEMPT flag has been set. This patch avoids that a rescan shortly after a cable pull sporadically triggers the following kernel oops: BUG: unable to handle kernel paging request at ffffc9001a6bc084 IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib] Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100) Call Trace: [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp] [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp] [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod] [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod] [<ffffffff81223b37>] __blk_run_queue+0x27/0x30 [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110 [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0 [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod] [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod] [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod] [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod] [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod] [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod] [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod] [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod] [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160 [<ffffffff811589de>] vfs_write+0xce/0x140 [<ffffffff81158b53>] sys_write+0x53/0xa0 [<ffffffff81464592>] system_call_fastpath+0x16/0x1b [<00007f611c9d9300>] 0x7f611c9d92ff Reported-by: Max Gurtuvoy <maxg@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-24mm: prevent endless growth of anon_vma hierarchyKonstantin Khlebnikov
[ Upstream commit 7a3ef208e662f4b63d43a23f61a64a129c525bbc ] Constantly forking task causes unlimited grow of anon_vma chain. Each next child allocates new level of anon_vmas and links vma to all previous levels because pages might be inherited from any level. This patch adds heuristic which decides to reuse existing anon_vma instead of forking new one. It adds counter anon_vma->degree which counts linked vmas and directly descending anon_vmas and reuses anon_vma if counter is lower than two. As a result each anon_vma has either vma or at least two descending anon_vmas. In such trees half of nodes are leafs with alive vmas, thus count of anon_vmas is no more than two times bigger than count of vmas. This heuristic reuses anon_vmas as few as possible because each reuse adds false aliasing among vmas and rmap walker ought to scan more ptes when it searches where page is might be mapped. Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu Fixes: 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue") [akpm@linux-foundation.org: fix typo, per Rik] Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu> Tested-by: Michal Hocko <mhocko@suse.cz> Tested-by: Jerome Marchand <jmarchan@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [2.6.34+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16regulator: palmas: Correct TPS659038 register definition for REGEN2Keerthy
[ Upstream commit e03826d5045e81a66a4fad7be9a8ecdaeb7911cf ] The register offset for REGEN2_CTRL in different for TPS659038 chip as when compared with other Palmas family PMICs. In the case of TPS659038 the wrong offset pointed to PLLEN_CTRL and was causing a hang. Correcting the same. Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16dm snapshot: suspend merging snapshot when doing exception handoverMikulas Patocka
[ Upstream commit 09ee96b21456883e108c3b00597bb37ec512151b ] The "dm snapshot: suspend origin when doing exception handover" commit fixed a exception store handover bug associated with pending exceptions to the "snapshot-origin" target. However, a similar problem exists in snapshot merging. When snapshot merging is in progress, we use the target "snapshot-merge" instead of "snapshot-origin". Consequently, during exception store handover, we must find the snapshot-merge target and suspend its associated mapped_device. To avoid lockdep warnings, the target must be suspended and resumed without holding _origins_lock. Introduce a dm_hold() function that grabs a reference on a mapped_device, but unlike dm_get(), it doesn't crash if the device has the DMF_FREEING flag set, it returns an error in this case. In snapshot_resume() we grab the reference to the origin device using dm_hold() while holding _origins_lock (_origins_lock guarantees that the device won't disappear). Then we release _origins_lock, suspend the device and grab _origins_lock again. NOTE to stable@ people: When backporting to kernels 3.18 and older, use dm_internal_suspend and dm_internal_resume instead of dm_internal_suspend_fast and dm_internal_resume_fast. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16regmap: introduce regmap_name to fix syscon regmap trace eventsPhilipp Zabel
[ Upstream commit c6b570d97c0e77f570bb6b2ed30d372b2b1e9aae ] This patch fixes a NULL pointer dereference when enabling regmap event tracing in the presence of a syscon regmap, introduced by commit bdb0066df96e ("mfd: syscon: Decouple syscon interface from platform devices"). That patch introduced syscon regmaps that have their dev field set to NULL. The regmap trace events expect it to point to a valid struct device and feed it to dev_name(): $ echo 1 > /sys/kernel/debug/tracing/events/regmap/enable Unable to handle kernel NULL pointer dereference at virtual address 0000002c pgd = 80004000 [0000002c] *pgd=00000000 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: coda videobuf2_vmalloc CPU: 0 PID: 304 Comm: kworker/0:2 Not tainted 4.0.0-rc2+ #9197 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Workqueue: events_freezable thermal_zone_device_check task: 9f25a200 ti: 9f1ee000 task.ti: 9f1ee000 PC is at ftrace_raw_event_regmap_block+0x3c/0xe4 LR is at _regmap_raw_read+0x1bc/0x1cc pc : [<803636e8>] lr : [<80365f2c>] psr: 600f0093 sp : 9f1efd78 ip : 9f1efdb8 fp : 9f1efdb4 r10: 00000004 r9 : 00000001 r8 : 00000001 r7 : 00000180 r6 : 00000000 r5 : 9f00e3c0 r4 : 00000003 r3 : 00000001 r2 : 00000180 r1 : 00000000 r0 : 9f00e3c0 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c5387d Table: 2d91004a DAC: 00000015 Process kworker/0:2 (pid: 304, stack limit = 0x9f1ee210) Stack: (0x9f1efd78 to 0x9f1f0000) fd60: 9f1efda4 9f1efd88 fd80: 800708c0 805f9510 80927140 800f0013 9f1fc800 9eb2f490 00000000 00000180 fda0: 808e3840 00000001 9f1efdfc 9f1efdb8 80365f2c 803636b8 805f8958 800708e0 fdc0: a00f0013 803636ac 9f16de00 00000180 80927140 9f1fc800 9f1fc800 9f1efe6c fde0: 9f1efe6c 9f732400 00000000 00000000 9f1efe1c 9f1efe00 80365f70 80365d7c fe00: 80365f3c 9f1fc800 9f1fc800 00000180 9f1efe44 9f1efe20 803656a4 80365f48 fe20: 9f1fc800 00000180 9f1efe6c 9f1efe6c 9f732400 00000000 9f1efe64 9f1efe48 fe40: 803657bc 80365634 00000001 9e95f910 9f1fc800 9f1efeb4 9f1efe8c 9f1efe68 fe60: 80452ac0 80365778 9f1efe8c 9f1efe78 9e93d400 9e93d5e8 9f1efeb4 9f72ef40 fe80: 9f1efeac 9f1efe90 8044e11c 80452998 8045298c 9e93d608 9e93d400 808e1978 fea0: 9f1efecc 9f1efeb0 8044fd14 8044e0d0 ffffffff 9f25a200 9e93d608 9e481380 fec0: 9f1efedc 9f1efed0 8044fde8 8044fcec 9f1eff1c 9f1efee0 80038d50 8044fdd8 fee0: 9f1ee020 9f72ef40 9e481398 00000000 00000008 9f72ef54 9f1ee020 9f72ef40 ff00: 9e481398 9e481380 00000008 9f72ef40 9f1eff5c 9f1eff20 80039754 80038bfc ff20: 00000000 9e481380 80894100 808e1662 00000000 9e4f2ec0 00000000 9e481380 ff40: 800396f8 00000000 00000000 00000000 9f1effac 9f1eff60 8003e020 80039704 ff60: ffffffff 00000000 ffffffff 9e481380 00000000 00000000 9f1eff78 9f1eff78 ff80: 00000000 00000000 9f1eff88 9f1eff88 9e4f2ec0 8003df30 00000000 00000000 ffa0: 00000000 9f1effb0 8000eb60 8003df3c 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff Backtrace: [<803636ac>] (ftrace_raw_event_regmap_block) from [<80365f2c>] (_regmap_raw_read+0x1bc/0x1cc) r9:00000001 r8:808e3840 r7:00000180 r6:00000000 r5:9eb2f490 r4:9f1fc800 [<80365d70>] (_regmap_raw_read) from [<80365f70>] (_regmap_bus_read+0x34/0x6c) r10:00000000 r9:00000000 r8:9f732400 r7:9f1efe6c r6:9f1efe6c r5:9f1fc800 r4:9f1fc800 [<80365f3c>] (_regmap_bus_read) from [<803656a4>] (_regmap_read+0x7c/0x144) r6:00000180 r5:9f1fc800 r4:9f1fc800 r3:80365f3c [<80365628>] (_regmap_read) from [<803657bc>] (regmap_read+0x50/0x70) r9:00000000 r8:9f732400 r7:9f1efe6c r6:9f1efe6c r5:00000180 r4:9f1fc800 [<8036576c>] (regmap_read) from [<80452ac0>] (imx_get_temp+0x134/0x1a4) r6:9f1efeb4 r5:9f1fc800 r4:9e95f910 r3:00000001 [<8045298c>] (imx_get_temp) from [<8044e11c>] (thermal_zone_get_temp+0x58/0x74) r7:9f72ef40 r6:9f1efeb4 r5:9e93d5e8 r4:9e93d400 [<8044e0c4>] (thermal_zone_get_temp) from [<8044fd14>] (thermal_zone_device_update+0x34/0xec) r6:808e1978 r5:9e93d400 r4:9e93d608 r3:8045298c [<8044fce0>] (thermal_zone_device_update) from [<8044fde8>] (thermal_zone_device_check+0x1c/0x20) r5:9e481380 r4:9e93d608 [<8044fdcc>] (thermal_zone_device_check) from [<80038d50>] (process_one_work+0x160/0x3d4) [<80038bf0>] (process_one_work) from [<80039754>] (worker_thread+0x5c/0x4f4) r10:9f72ef40 r9:00000008 r8:9e481380 r7:9e481398 r6:9f72ef40 r5:9f1ee020 r4:9f72ef54 [<800396f8>] (worker_thread) from [<8003e020>] (kthread+0xf0/0x108) r10:00000000 r9:00000000 r8:00000000 r7:800396f8 r6:9e481380 r5:00000000 r4:9e4f2ec0 [<8003df30>] (kthread) from [<8000eb60>] (ret_from_fork+0x14/0x34) r7:00000000 r6:00000000 r5:8003df30 r4:9e4f2ec0 Code: e3140040 1a00001a e3140020 1a000016 (e596002c) ---[ end trace 193c15c2494ec960 ]--- Fixes: bdb0066df96e (mfd: syscon: Decouple syscon interface from platform devices) Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16sched/wait: Provide infrastructure to deal with nested blockingPeter Zijlstra
[ Upstream commit 61ada528dea028331e99e8ceaed87c683ad25de2 ] There are a few places that call blocking primitives from wait loops, provide infrastructure to support this without the typical task_struct::state collision. We record the wakeup in wait_queue_t::flags which leaves task_struct::state free to be used by others. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: tglx@linutronix.de Cc: ilya.dryomov@inktank.com Cc: umgwanakikbuti@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140924082242.051202318@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-28workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for ↵Tejun Heo
PREEMPT_NONE [ Upstream commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 ] cancel[_delayed]_work_sync() are implemented using __cancel_work_timer() which grabs the PENDING bit using try_to_grab_pending() and then flushes the work item with PENDING set to prevent the on-going execution of the work item from requeueing itself. try_to_grab_pending() can always grab PENDING bit without blocking except when someone else is doing the above flushing during cancelation. In that case, try_to_grab_pending() returns -ENOENT. In this case, __cancel_work_timer() currently invokes flush_work(). The assumption is that the completion of the work item is what the other canceling task would be waiting for too and thus waiting for the same condition and retrying should allow forward progress without excessive busy looping Unfortunately, this doesn't work if preemption is disabled or the latter task has real time priority. Let's say task A just got woken up from flush_work() by the completion of the target work item. If, before task A starts executing, task B gets scheduled and invokes __cancel_work_timer() on the same work item, its try_to_grab_pending() will return -ENOENT as the work item is still being canceled by task A and flush_work() will also immediately return false as the work item is no longer executing. This puts task B in a busy loop possibly preventing task A from executing and clearing the canceling state on the work item leading to a hang. task A task B worker executing work __cancel_work_timer() try_to_grab_pending() set work CANCELING flush_work() block for work completion completion, wakes up A __cancel_work_timer() while (forever) { try_to_grab_pending() -ENOENT as work is being canceled flush_work() false as work is no longer executing } This patch removes the possible hang by updating __cancel_work_timer() to explicitly wait for clearing of CANCELING rather than invoking flush_work() after try_to_grab_pending() fails with -ENOENT. Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com v3: bit_waitqueue() can't be used for work items defined in vmalloc area. Switched to custom wake function which matches the target work item and exclusive wait and wakeup. v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if the target bit waitqueue has wait_bit_queue's on it. Use DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu Vizoso. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rabin Vincent <rabin.vincent@axis.com> Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com> Cc: stable@vger.kernel.org Tested-by: Jesper Nilsson <jesper.nilsson@axis.com> Tested-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-28mmu_gather: move minimal range calculations into generic codeWill Deacon
[ Upstream commit fb7332a9fedfd62b1ba6530c86f39f0fa38afd49 ] On architectures with hardware broadcasting of TLB invalidation messages , it makes sense to reduce the range of the mmu_gather structure when unmapping page ranges based on the dirty address information passed to tlb_remove_tlb_entry. arm64 already does this by directly manipulating the start/end fields of the gather structure, but this confuses the generic code which does not expect these fields to change and can end up calculating invalid, negative ranges when forcing a flush in zap_pte_range. This patch moves the minimal range calculation out of the arm64 code and into the generic implementation, simplifying zap_pte_range in the process (which no longer needs to care about start/end, since they will point to the appropriate ranges already). With the range being tracked by core code, the need_flush flag is dropped in favour of checking that the end of the range has actually been set. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-23drm/i915/bdw: PCI IDs ending in 0xb are ULT.Rodrigo Vivi
commit 0dc6f20b9803f09726bbb682649d35cda8ef5b5d upstream. When reviewing patch that fixes VGA on BDW Halo Jani noticed that we also had other ULT IDs that weren't listed there. So this follow-up patch add these pci-ids as halo and fix comments on i915_pciids.h Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-14Revert "USB: serial: make bulk_out_size a lower limit"Johan Hovold
commit bc4b1f486fe69b86769e07c8edce472327a8462b upstream. This reverts commit 5083fd7bdfe6760577235a724cf6dccae13652c2. A bulk-out size smaller than the end-point size is indeed valid. The offending commit broke the usb-debug driver for EHCI debug devices, which use 8-byte buffers. Fixes: 5083fd7bdfe6 ("USB: serial: make bulk_out_size a lower limit") Reported-by: "Li, Elvin" <elvin.li@intel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-14target: Fix PR_APTPL_BUF_LEN buffer size limitationNicholas Bellinger
commit f161d4b44d7cc1dc66b53365215227db356378b1 upstream. This patch addresses the original PR_APTPL_BUF_LEN = 8k limitiation for write-out of PR APTPL metadata that Martin has recently been running into. It changes core_scsi3_update_and_write_aptpl() to use vzalloc'ed memory instead of kzalloc, and increases the default hardcoded length to 256k. It also adds logic in core_scsi3_update_and_write_aptpl() to double the original length upon core_scsi3_update_aptpl_buf() failure, and retries until the vzalloc'ed buffer is large enough to accommodate the outgoing APTPL metadata. Reported-by: Martin Svec <martin.svec@zoner.cz> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-14mm: when stealing freepages, also take pages created by splitting buddy pageVlastimil Babka
commit 99592d598eca62bdbbf62b59941c189176dfc614 upstream. When studying page stealing, I noticed some weird looking decisions in try_to_steal_freepages(). The first I assume is a bug (Patch 1), the following two patches were driven by evaluation. Testing was done with stress-highalloc of mmtests, using the mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how often page stealing occurs for individual migratetypes, and what migratetypes are used for fallbacks. Arguably, the worst case of page stealing is when UNMOVABLE allocation steals from MOVABLE pageblock. RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal, so the goal is to minimize these two cases. The evaluation of v2 wasn't always clear win and Joonsoo questioned the results. Here I used different baseline which includes RFC compaction improvements from [1]. I found that the compaction improvements reduce variability of stress-highalloc, so there's less noise in the data. First, let's look at stress-highalloc configured to do sync compaction, and how these patches reduce page stealing events during the test. First column is after fresh reboot, other two are reiterations of test without reboot. That was all accumulater over 5 re-iterations (so the benchmark was run 5x3 times with 5 fresh restarts). Baseline: 3.19-rc4 3.19-rc4 3.19-rc4 5-nothp-1 5-nothp-2 5-nothp-3 Page alloc extfrag event 10264225 8702233 10244125 Extfrag fragmenting 10263271 8701552 10243473 Extfrag fragmenting for unmovable 13595 17616 15960 Extfrag fragmenting unmovable placed with movable 7989 12193 8447 Extfrag fragmenting for reclaimable 658 1840 1817 Extfrag fragmenting reclaimable placed with movable 558 1677 1679 Extfrag fragmenting for movable 10249018 8682096 10225696 With Patch 1: 3.19-rc4 3.19-rc4 3.19-rc4 6-nothp-1 6-nothp-2 6-nothp-3 Page alloc extfrag event 11834954 9877523 9774860 Extfrag fragmenting 11833993 9876880 9774245 Extfrag fragmenting for unmovable 7342 16129 11712 Extfrag fragmenting unmovable placed with movable 4191 10547 6270 Extfrag fragmenting for reclaimable 373 1130 923 Extfrag fragmenting reclaimable placed with movable 302 906 738 Extfrag fragmenting for movable 11826278 9859621 9761610 With Patch 2: 3.19-rc4 3.19-rc4 3.19-rc4 7-nothp-1 7-nothp-2 7-nothp-3 Page alloc extfrag event 4725990 3668793 3807436 Extfrag fragmenting 4725104 3668252 3806898 Extfrag fragmenting for unmovable 6678 7974 7281 Extfrag fragmenting unmovable placed with movable 2051 3829 4017 Extfrag fragmenting for reclaimable 429 1208 1278 Extfrag fragmenting reclaimable placed with movable 369 976 1034 Extfrag fragmenting for movable 4717997 3659070 3798339 With Patch 3: 3.19-rc4 3.19-rc4 3.19-rc4 8-nothp-1 8-nothp-2 8-nothp-3 Page alloc extfrag event 5016183 4700142 3850633 Extfrag fragmenting 5015325 4699613 3850072 Extfrag fragmenting for unmovable 1312 3154 3088 Extfrag fragmenting unmovable placed with movable 1115 2777 2714 Extfrag fragmenting for reclaimable 437 1193 1097 Extfrag fragmenting reclaimable placed with movable 330 969 879 Extfrag fragmenting for movable 5013576 4695266 3845887 In v2 we've seen apparent regression with Patch 1 for unmovable events, this is now gone, suggesting it was indeed noise. Here, each patch improves the situation for unmovable events. Reclaimable is improved by patch 1 and then either the same modulo noise, or perhaps sligtly worse - a small price for unmovable improvements, IMHO. The number of movable allocations falling back to other migratetypes is most noisy, but it's reduced to half at Patch 2 nevertheless. These are least critical as compaction can move them around. If we look at success rates, the patches don't affect them, that didn't change. Baseline: 3.19-rc4 3.19-rc4 3.19-rc4 5-nothp-1 5-nothp-2 5-nothp-3 Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%) Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%) Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%) Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%) Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%) Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%) Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%) Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%) Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%) Patch 1: 3.19-rc4 3.19-rc4 3.19-rc4 6-nothp-1 6-nothp-2 6-nothp-3 Success 1 Min 49.00 ( 0.00%) 44.00 ( 10.20%) 44.00 ( 10.20%) Success 1 Mean 51.80 ( 0.00%) 46.00 ( 11.20%) 45.80 ( 11.58%) Success 1 Max 54.00 ( 0.00%) 49.00 ( 9.26%) 49.00 ( 9.26%) Success 2 Min 58.00 ( 0.00%) 49.00 ( 15.52%) 48.00 ( 17.24%) Success 2 Mean 60.40 ( 0.00%) 51.80 ( 14.24%) 50.80 ( 15.89%) Success 2 Max 63.00 ( 0.00%) 54.00 ( 14.29%) 55.00 ( 12.70%) Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%) Success 3 Mean 85.00 ( 0.00%) 81.60 ( 4.00%) 79.80 ( 6.12%) Success 3 Max 86.00 ( 0.00%) 82.00 ( 4.65%) 82.00 ( 4.65%) Patch 2: 3.19-rc4 3.19-rc4 3.19-rc4 7-nothp-1 7-nothp-2 7-nothp-3 Success 1 Min 50.00 ( 0.00%) 44.00 ( 12.00%) 39.00 ( 22.00%) Success 1 Mean 52.80 ( 0.00%) 45.60 ( 13.64%) 42.40 ( 19.70%) Success 1 Max 55.00 ( 0.00%) 46.00 ( 16.36%) 47.00 ( 14.55%) Success 2 Min 52.00 ( 0.00%) 48.00 ( 7.69%) 45.00 ( 13.46%) Success 2 Mean 53.40 ( 0.00%) 49.80 ( 6.74%) 48.80 ( 8.61%) Success 2 Max 57.00 ( 0.00%) 52.00 ( 8.77%) 52.00 ( 8.77%) Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%) Success 3 Mean 85.00 ( 0.00%) 82.40 ( 3.06%) 79.60 ( 6.35%) Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%) Patch 3: 3.19-rc4 3.19-rc4 3.19-rc4 8-nothp-1 8-nothp-2 8-nothp-3 Success 1 Min 46.00 ( 0.00%) 44.00 ( 4.35%) 42.00 ( 8.70%) Success 1 Mean 50.20 ( 0.00%) 45.60 ( 9.16%) 44.00 ( 12.35%) Success 1 Max 52.00 ( 0.00%) 47.00 ( 9.62%) 47.00 ( 9.62%) Success 2 Min 53.00 ( 0.00%) 49.00 ( 7.55%) 48.00 ( 9.43%) Success 2 Mean 55.80 ( 0.00%) 50.60 ( 9.32%) 49.00 ( 12.19%) Success 2 Max 59.00 ( 0.00%) 52.00 ( 11.86%) 51.00 ( 13.56%) Success 3 Min 84.00 ( 0.00%) 80.00 ( 4.76%) 79.00 ( 5.95%) Success 3 Mean 85.40 ( 0.00%) 81.60 ( 4.45%) 80.40 ( 5.85%) Success 3 Max 87.00 ( 0.00%) 83.00 ( 4.60%) 82.00 ( 5.75%) While there's no improvement here, I consider reduced fragmentation events to be worth on its own. Patch 2 also seems to reduce scanning for free pages, and migrations in compaction, suggesting it has somewhat less work to do: Patch 1: Compaction stalls 4153 3959 3978 Compaction success 1523 1441 1446 Compaction failures 2630 2517 2531 Page migrate success 4600827 4943120 5104348 Page migrate failure 19763 16656 17806 Compaction pages isolated 9597640 10305617 10653541 Compaction migrate scanned 77828948 86533283 87137064 Compaction free scanned 517758295 521312840 521462251 Compaction cost 5503 5932 6110 Patch 2: Compaction stalls 3800 3450 3518 Compaction success 1421 1316 1317 Compaction failures 2379 2134 2201 Page migrate success 4160421 4502708 4752148 Page migrate failure 19705 14340 14911 Compaction pages isolated 8731983 9382374 9910043 Compaction migrate scanned 98362797 96349194 98609686 Compaction free scanned 496512560 469502017 480442545 Compaction cost 5173 5526 5811 As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers of unmovable and reclaimable pageblocks. Configuring the benchmark to allocate like THP page fault (i.e. no sync compaction) gives much noisier results for iterations 2 and 3 after reboot. This is not so surprising given how [1] offers lower improvements in this scenario due to less restarts after deferred compaction which would change compaction pivot. Baseline: 3.19-rc4 3.19-rc4 3.19-rc4 5-thp-1 5-thp-2 5-thp-3 Page alloc extfrag event 8148965 6227815 6646741 Extfrag fragmenting 8147872 6227130 6646117 Extfrag fragmenting for unmovable 10324 12942 15975 Extfrag fragmenting unmovable placed with movable 5972 8495 10907 Extfrag fragmenting for reclaimable 601 1707 2210 Extfrag fragmenting reclaimable placed with movable 520 1570 2000 Extfrag fragmenting for movable 8136947 6212481 6627932 Patch 1: 3.19-rc4 3.19-rc4 3.19-rc4 6-thp-1 6-thp-2 6-thp-3 Page alloc extfrag event 8345457 7574471 7020419 Extfrag fragmenting 8343546 7573777 7019718 Extfrag fragmenting for unmovable 10256 18535 30716 Extfrag fragmenting unmovable placed with movable 6893 11726 22181 Extfrag fragmenting for reclaimable 465 1208 1023 Extfrag fragmenting reclaimable placed with movable 353 996 843 Extfrag fragmenting for movable 8332825 7554034 6987979 Patch 2: 3.19-rc4 3.19-rc4 3.19-rc4 7-thp-1 7-thp-2 7-thp-3 Page alloc extfrag event 3512847 3020756 2891625 Extfrag fragmenting 3511940 3020185 2891059 Extfrag fragmenting for unmovable 9017 6892 6191 Extfrag fragmenting unmovable placed with movable 1524 3053 2435 Extfrag fragmenting for reclaimable 445 1081 1160 Extfrag fragmenting reclaimable placed with movable 375 918 986 Extfrag fragmenting for movable 3502478 3012212 2883708 Patch 3: 3.19-rc4 3.19-rc4 3.19-rc4 8-thp-1 8-thp-2 8-thp-3 Page alloc extfrag event 3181699 3082881 2674164 Extfrag fragmenting 3180812 3082303 2673611 Extfrag fragmenting for unmovable 1201 4031 4040 Extfrag fragmenting unmovable placed with movable 974 3611 3645 Extfrag fragmenting for reclaimable 478 1165 1294 Extfrag fragmenting reclaimable placed with movable 387 985 1030 Extfrag fragmenting for movable 3179133 3077107 2668277 The improvements for first iteration are clear, the rest is much noisier and can appear like regression for Patch 1. Anyway, patch 2 rectifies it. Allocation success rates are again unaffected so there's no point in making this e-mail any longer. [1] http://marc.info/?l=linux-mm&m=142166196321125&w=2 This patch (of 3): When __rmqueue_fallback() is called to allocate a page of order X, it will find a page of order Y >= X of a fallback migratetype, which is different from the desired migratetype. With the help of try_to_steal_freepages(), it may change the migratetype (to the desired one) also of: 1) all currently free pages in the pageblock containing the fallback page 2) the fallback pageblock itself 3) buddy pages created by splitting the fallback page (when Y > X) These decisions take the order Y into account, as well as the desired migratetype, with the goal of preventing multiple fallback allocations that could e.g. distribute UNMOVABLE allocations among multiple pageblocks. Originally, decision for 1) has implied the decision for 3). Commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that (probably unintentionally) so that the buddy pages in case 3) are always changed to the desired migratetype, except for CMA pageblocks. Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and fix a bug") did some refactoring and added a comment that the case of 3) is intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type") removed the comment and tried to restore the original behavior where 1) implies 3), but due to the previous refactoring, the result is instead that only 2) implies 3) - and the conditions for 2) are less frequently met than conditions for 1). This may increase fragmentation in situations where the code decides to steal all free pages from the pageblock (case 1)), but then gives back the buddy pages produced by splitting. This patch restores the original intended logic where 1) implies 3). During testing with stress-highalloc from mmtests, this has shown to decrease the number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE pageblocks, which can lead to permanent fragmentation. In some cases it has increased the number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are fixable by sync compaction and thus less harmful. Note that evaluation has shown that the behavior introduced by 47118af076f6 for buddy pages in case 3) is actually even better than the original logic, so the following patch will introduce it properly once again. For stable backports of this patch it makes thus sense to only fix versions containing 0cbef29a7821. [iamjoonsoo.kim@lge.com: tracepoint fix] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-06quota: Store maximum space limit in bytesJan Kara
commit b10a08194c2b615955dfab2300331a90ae9344c7 upstream. Currently maximum space limit quota format supports is in blocks however since we store space limits in bytes, this is somewhat confusing. So store the maximum limit in bytes as well. Also rename the field to match the new unit and related inode field to match the new naming scheme. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06USB: add flag for HCDs that can't receive wakeup requests (isp1760-hcd)Alan Stern
commit 074f9dd55f9cab1b82690ed7e44bcf38b9616ce0 upstream. Currently the USB stack assumes that all host controller drivers are capable of receiving wakeup requests from downstream devices. However, this isn't true for the isp1760-hcd driver, which means that it isn't safe to do a runtime suspend of any device attached to a root-hub port if the device requires wakeup. This patch adds a "cant_recv_wakeups" flag to the usb_hcd structure and sets the flag in isp1760-hcd. The core is modified to prevent a direct child of the root hub from being put into runtime suspend with wakeup enabled if the flag is set. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Greg Kroah-Hartman <greg@kroah.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06USB: don't cancel queued resets when unbinding driversAlan Stern
commit 524134d422316a59d5464ccbc12036bbe90c5563 upstream. The USB stack provides a mechanism for drivers to request an asynchronous device reset (usb_queue_reset_device()). The mechanism uses a work item (reset_ws) embedded in the usb_interface structure used by the driver, and the reset is carried out by a work queue routine. The asynchronous reset can race with driver unbinding. When this happens, we try to cancel the queued reset before unbinding the driver, on the theory that the driver won't care about any resets once it is unbound. However, thanks to the fact that lockdep now tracks work queue accesses, this can provoke a lockdep warning in situations where the device reset causes another interface's driver to be unbound; see http://marc.info/?l=linux-usb&m=141893165203776&w=2 for an example. The reason is that the work routine for reset_ws in one interface calls cancel_queued_work() for the reset_ws in another interface. Lockdep thinks this might lead to a work routine trying to cancel itself. The simplest solution is not to cancel queued resets when unbinding drivers. This means we now need to acquire a reference to the usb_interface when queuing a reset_ws work item and to drop the reference when the work routine finishes. We also need to make sure that the usb_interface structure doesn't outlive its parent usb_device; this means acquiring and dropping a reference when the interface is created and destroyed. In addition, cancelling a queued reset can fail (if the device is in the middle of an earlier reset), and this can cause usb_reset_device() to try to rebind an interface that has been deallocated (see http://marc.info/?l=linux-usb&m=142175717016628&w=2 for details). Acquiring the extra references prevents this failure. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk> Reported-by: Olivier Sobrie <olivier@sobrie.be> Tested-by: Olivier Sobrie <olivier@sobrie.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGNSebastian Andrzej Siewior
commit 5efd2ea8c9f4f12916ffc8ba636792ce052f6911 upstream. the following error pops up during "testusb -a -t 10" | musb-hdrc musb-hdrc.1.auto: dma_pool_free buffer-128, f134e000/be842000 (bad dma) hcd_buffer_create() creates a few buffers, the smallest has 32 bytes of size. ARCH_KMALLOC_MINALIGN is set to 64 bytes. This combo results in hcd_buffer_alloc() returning memory which is 32 bytes aligned and it might by identified by buffer_offset() as another buffer. This means the buffer which is on a 32 byte boundary will not get freed, instead it tries to free another buffer with the error message. This patch fixes the issue by creating the smallest DMA buffer with the size of ARCH_KMALLOC_MINALIGN (or 32 in case ARCH_KMALLOC_MINALIGN is smaller). This might be 32, 64 or even 128 bytes. The next three pools will have the size 128, 512 and 2048. In case the smallest pool is 128 bytes then we have only three pools instead of four (and zero the first entry in the array). The last pool size is always 2048 bytes which is the assumed PAGE_SIZE / 2 of 4096. I doubt it makes sense to continue using PAGE_SIZE / 2 where we would end up with 8KiB buffer in case we have 16KiB pages. Instead I think it makes sense to have a common size(s) and extend them if there is need to. There is a BUILD_BUG_ON() now in case someone has a minalign of more than 128 bytes. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06cipso: don't use IPCB() to locate the CIPSO IP optionPaul Moore
commit 04f81f0154e4bf002be6f4d85668ce1257efa4d9 upstream. Using the IPCB() macro to get the IPv4 options is convenient, but unfortunately NetLabel often needs to examine the CIPSO option outside of the scope of the IP layer in the stack. While historically IPCB() worked above the IP layer, due to the inclusion of the inet_skb_param struct at the head of the {tcp,udp}_skb_cb structs, recent commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses") reordered the tcp_skb_cb struct and invalidated this IPCB() trick. This patch fixes the problem by creating a new function, cipso_v4_optptr(), which locates the CIPSO option inside the IP header without calling IPCB(). Unfortunately, this isn't as fast as a simple lookup so some additional tweaks were made to limit the use of this new function. Reported-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06SUNRPC: NULL utsname dereference on NFS umount during namespace cleanupTrond Myklebust
commit 03a9a42a1a7e5b3e7919ddfacc1d1cc81882a955 upstream. Fix an Oopsable condition when nsm_mon_unmon is called as part of the namespace cleanup, which now apparently happens after the utsname has been freed. Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06NFS: struct nfs_commit_info.lock must always point to inode->i_lockTrond Myklebust
commit f4086a3d789dbe18949862276d83b8f49fce6d2f upstream. Commit 411a99adffb4f (nfs: clear_request_commit while holding i_lock) assumes that the nfs_commit_info always points to the inode->i_lock. For historical reasons, that is not the case for O_DIRECT writes. Cc: Weston Andros Adamson <dros@primarydata.com> Fixes: 411a99adffb4f ("nfs: clear_request_commit while holding i_lock") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06fsnotify: fix handling of renames in auditJan Kara
commit 6ee8e25fc3e916193bce4ebb43d5439e1e2144ab upstream. Commit e9fd702a58c4 ("audit: convert audit watches to use fsnotify instead of inotify") broke handling of renames in audit. Audit code wants to update inode number of an inode corresponding to watched name in a directory. When something gets renamed into a directory to a watched name, inotify previously passed moved inode to audit code however new fsnotify code passes directory inode where the change happened. That confuses audit and it starts watching parent directory instead of a file in a directory. This can be observed for example by doing: cd /tmp touch foo bar auditctl -w /tmp/foo touch foo mv bar foo touch foo In audit log we see events like: type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1 ... type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE ... and that's it - we see event for the first touch after creating the audit rule, we see events for rename but we don't see any event for the last touch. However we start seeing events for unrelated stuff happening in /tmp. Fix the problem by passing moved inode as data in the FS_MOVED_FROM and FS_MOVED_TO events instead of the directory where the change happens. This doesn't introduce any new problems because noone besides audit_watch.c cares about the passed value: fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events. fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all. fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH. kernel/audit_tree.c doesn't care about passed 'data' at all. kernel/audit_watch.c expects moved inode as 'data'. Fixes: e9fd702a58c49db ("audit: convert audit watches to use fsnotify instead of inotify") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-26net: sched: fix panic in rate estimatorsEric Dumazet
[ Upstream commit 0d32ef8cef9aa8f375e128f78b77caceaa7e8da0 ] Doing the following commands on a non idle network device panics the box instantly, because cpu_bstats gets overwritten by stats. tc qdisc add dev eth0 root <your_favorite_qdisc> ... some traffic (one packet is enough) ... tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc> [ 325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c [ 325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90 [ 325.369158] PGD 1fa7067 PUD 0 [ 325.372254] Oops: 0000 [#1] SMP [ 325.375514] Modules linked in: ... [ 325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163 [ 325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000 [ 325.419518] RIP: 0010:[<ffffffff81541c9e>] [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90 [ 325.428506] RSP: 0018:ffff881ff2fa7928 EFLAGS: 00010286 [ 325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c [ 325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060 [ 325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000 [ 325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744 [ 325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c [ 325.469536] FS: 00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000 [ 325.477630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0 [ 325.490510] Stack: [ 325.492524] ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0 [ 325.499990] ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744 [ 325.507460] 00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0 [ 325.514956] Call Trace: [ 325.517412] [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230 [ 325.523250] [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60 [ 325.529349] [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590 [ 325.535117] [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240 [ 325.540963] [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20 [ 325.546532] [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0 [ 325.552145] [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40 [ 325.557558] [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220 [ 325.563317] [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0 Lets play safe and not use an union : percpu 'pointers' are mostly read anyway, and we have typically few qdiscs per host. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Fixes: 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-26ipv4: tcp: get rid of ugly unicast_sockEric Dumazet
[ Upstream commit bdbbb8527b6f6a358dbcb70dac247034d665b8e4 ] In commit be9f4a44e7d41 ("ipv4: tcp: remove per net tcp_sock") I tried to address contention on a socket lock, but the solution I chose was horrible : commit 3a7c384ffd57e ("ipv4: tcp: unicast_sock should not land outside of TCP stack") addressed a selinux regression. commit 0980e56e506b ("ipv4: tcp: set unicast_sock uc_ttl to -1") took care of another regression. commit b5ec8eeac46 ("ipv4: fix ip_send_skb()") fixed another regression. commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate") was another shot in the dark. Really, just use a proper socket per cpu, and remove the skb_orphan() call, to re-enable flow control. This solves a serious problem with FQ packet scheduler when used in hostile environments, as we do not want to allocate a flow structure for every RST packet sent in response to a spoofed packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-26ipv4: try to cache dst_entries which would cause a redirectHannes Frederic Sowa
[ Upstream commit df4d92549f23e1c037e83323aff58a21b3de7fe0 ] Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect aggravated since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Marcelo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11x86/tlb/trace: Do not trace on CPU that is offlineSteven Rostedt (Red Hat)
commit 6c8465a82a605bc692304bab42703017dcfff013 upstream. When taking a CPU down for suspend and resume, a tracepoint may be called when the CPU has been designated offline. As tracepoints require RCU for protection, they must not be called if the current CPU is offline. Unfortunately, trace_tlb_flush() is called in this scenario as was noted by LOCKDEP: ... Disabling non-boot CPUs ... intel_pstate CPU 1 exiting =============================== smpboot: CPU 1 didn't die... [ INFO: suspicious RCU usage. ] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted ------------------------------- include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 0 no locks held by swapper/1/0. stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011 ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600 0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78 Call Trace: [<ffffffff817e370d>] dump_stack+0x4c/0x65 [<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0 [<ffffffff81054c4e>] play_dead_common+0xe/0x50 [<ffffffff81054ca5>] native_play_dead+0x15/0x140 [<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20 [<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580 [<ffffffff81053e20>] start_secondary+0x140/0x150 intel_pstate CPU 2 exiting ... By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected code when the CPU is offline. Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com Fixes: d17d8f9dedb9 "x86/mm: Add tracepoints for TLB flushes" Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Hansen <dave@sr71.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11tracing: Add condition check to RCU lockdep checksSteven Rostedt (Red Hat)
commit a05d59a5673339ef6936d6940cdf68172ce75b9f upstream. The trace_tlb_flush() tracepoint can be called when a CPU is going offline. When a CPU is offline, RCU is no longer watching that CPU and since the tracepoint is protected by RCU, it must not be called. To prevent the tlb_flush tracepoint from being called when the CPU is offline, it was converted to a TRACE_EVENT_CONDITION where the condition checks if the CPU is online before calling the tracepoint. Unfortunately, this was not enough to stop lockdep from complaining about it. Even though the RCU protected code of the tracepoint will never be called, the condition is hidden within the tracepoint, and even though the condition prevents RCU code from being called, the lockdep checks are outside the tracepoint (this is to test tracepoints even when they are not enabled). Even though tracepoints should be checked to be RCU safe when they are not enabled, the condition should still be considered when checking RCU. Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com Fixes: 3a630178fd5f "tracing: generate RCU warnings even when tracepoints are disabled" Acked-by: Dave Hansen <dave@sr71.net> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11ALSA: ak411x: Fix stall in work callbackTakashi Iwai
commit 4161b4505f1690358ac0a9ee59845a7887336b21 upstream. When ak4114 work calls its callback and the callback invokes ak4114_reinit(), it stalls due to flush_delayed_work(). For avoiding this, control the reentrance by introducing a refcount. Also flush_delayed_work() is replaced with cancel_delayed_work_sync(). The exactly same bug is present in ak4113.c and fixed as well. Reported-by: Pavel Hofman <pavel.hofman@ivitera.com> Acked-by: Jaroslav Kysela <perex@perex.cz> Tested-by: Pavel Hofman <pavel.hofman@ivitera.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space unitsJan Kara
commit 14bf61ffe6ac54afcd1e888a4407fe16054483db upstream. Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which tracks space limits and usage in 512-byte blocks. However VFS quotas track usage in bytes (as some filesystems require that) and we need to somehow pass this information. Upto now it wasn't a problem because we didn't do any unit conversion (thus VFS quota routines happily stuck number of bytes into d_bcount field of struct fd_disk_quota). Only if you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone tried this but reportedly some Samba users hit the problem in practice. So when we want interfaces compatible we need to fix this. We bite the bullet and define another quota structure used for passing information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have to have more conversion routines in fs/quota/quota.c and another copying of quota structure slows down getting of quota information by about 2% but it seems cleaner than overloading e.g. units of d_bcount to bytes. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>