summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2012-03-19Block: use a freezable workqueue for disk-event pollingAlan Stern
commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 upstream. This patch (as1519) fixes a bug in the block layer's disk-events polling. The polling is done by a work routine queued on the system_nrt_wq workqueue. Since that workqueue isn't freezable, the polling continues even in the middle of a system sleep transition. Obviously, polling a suspended drive for media changes and such isn't a good thing to do; in the case of USB mass-storage devices it can lead to real problems requiring device resets and even re-enumeration. The patch fixes things by creating a new system-wide, non-reentrant, freezable workqueue and using it for disk-events polling. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19block: Fix NULL pointer dereference in sd_revalidate_diskJun'ichi Nomura
commit fe316bf2d5847bc5dd975668671a7b1067603bc7 upstream. Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(), __blkdev_get() calls rescan_partitions() to remove in-kernel partition structures and raise KOBJ_CHANGE uevent. However it ends up calling driver's revalidate_disk without open and could cause oops. In the case of SCSI: process A process B ---------------------------------------------- sys_open __blkdev_get sd_open returns -ENOMEDIUM scsi_remove_device <scsi_device torn down> rescan_partitions sd_revalidate_disk <oops> Oopses are reported here: http://marc.info/?l=linux-scsi&m=132388619710052 This patch separates the partition invalidation from rescan_partitions() and use it for -ENOMEDIUM case. Reported-by: Huajun Li <huajun.li.lee@gmail.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19ipsec: be careful of non existing mac headersEric Dumazet
[ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ] Niccolo Belli reported ipsec crashes in case we handle a frame without mac header (atm in his case) Before copying mac header, better make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by: Niccolò Belli <darkbasic@linuxsystems.it> Tested-by: Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12OMAPDSS: HDMI: PHY burnout fixTomi Valkeinen
commit c49d005b6cc8491fad5b24f82805be2d6bcbd3dd upstream. A hardware bug in the OMAP4 HDMI PHY causes physical damage to the board if the HDMI PHY is kept powered on when the cable is not connected. This patch solves the problem by adding hot-plug-detection into the HDMI IP driver. This is not a real HPD support in the sense that nobody else than the IP driver gets to know about the HPD events, but is only meant to fix the HW bug. The strategy is simple: If the display device is turned off by the user, the PHY power is set to OFF. When the display device is turned on by the user, the PHY power is set either to LDOON or TXON, depending on whether the HDMI cable is connected. The reason to avoid PHY OFF when the display device is on, but the cable is disconnected, is that when the PHY is turned OFF, the HDMI IP is not "ticking" and thus the DISPC does not receive pixel clock from the HDMI IP. This would, for example, prevent any VSYNCs from happening, and would thus affect the users of omapdss. By using LDOON when the cable is disconnected we'll avoid the HW bug, but keep the HDMI working as usual from the user's point of view. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12regset: Return -EFAULT, not -EIO, on host-side memory faultH. Peter Anvin
commit 5189fa19a4b2b4c3bec37c3a019d446148827717 upstream. There is only one error code to return for a bad user-space buffer pointer passed to a system call in the same address space as the system call is executed, and that is EFAULT. Furthermore, the low-level access routines, which catch most of the faults, return EFAULT already. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12regset: Prevent null pointer reference on readonly regsetsH. Peter Anvin
commit c8e252586f8d5de906385d8cf6385fee289a825e upstream. The regset common infrastructure assumed that regsets would always have .get and .set methods, but not necessarily .active methods. Unfortunately people have since written regsets without .set methods. Rather than putting in stub functions everywhere, handle regsets with null .get or .set methods explicitly. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12Fix autofs compile without CONFIG_COMPATLinus Torvalds
commit 3c761ea05a8900a907f32b628611873f6bef24b2 upstream. The autofs compat handling fix caused a compile failure when CONFIG_COMPAT isn't defined. Instead of adding random #ifdef'fery in autofs, let's just make the compat helpers earlier to use: without CONFIG_COMPAT, is_compat_task() just hardcodes to zero. We could probably do something similar for a number of other cases where we have #ifdef's in code, but this is the low-hanging fruit. Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29epoll: limit pathsJason Baron
commit 28d82dc1c4edbc352129f97f4ca22624d1fe61de upstream. The current epoll code can be tickled to run basically indefinitely in both loop detection path check (on ep_insert()), and in the wakeup paths. The programs that tickle this behavior set up deeply linked networks of epoll file descriptors that cause the epoll algorithms to traverse them indefinitely. A couple of these sample programs have been previously posted in this thread: https://lkml.org/lkml/2011/2/25/297. To fix the loop detection path check algorithms, I simply keep track of the epoll nodes that have been already visited. Thus, the loop detection becomes proportional to the number of epoll file descriptor and links. This dramatically decreases the run-time of the loop check algorithm. In one diabolical case I tried it reduced the run-time from 15 mintues (all in kernel time) to .3 seconds. Fixing the wakeup paths could be done at wakeup time in a similar manner by keeping track of nodes that have already been visited, but the complexity is harder, since there can be multiple wakeups on different cpus...Thus, I've opted to limit the number of possible wakeup paths when the paths are created. This is accomplished, by noting that the end file descriptor points that are found during the loop detection pass (from the newly added link), are actually the sources for wakeup events. I keep a list of these file descriptors and limit the number and length of these paths that emanate from these 'source file descriptors'. In the current implemetation I allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of length 4 and 10 of length 5. Note that it is sufficient to check the 'source file descriptors' reachable from the newly added link, since no other 'source file descriptors' will have newly added links. This allows us to check only the wakeup paths that may have gotten too long, and not re-check all possible wakeup paths on the system. In terms of the path limit selection, I think its first worth noting that the most common case for epoll, is probably the model where you have 1 epoll file descriptor that is monitoring n number of 'source file descriptors'. In this case, each 'source file descriptor' has a 1 path of length 1. Thus, I believe that the limits I'm proposing are quite reasonable and in fact may be too generous. Thus, I'm hoping that the proposed limits will not prevent any workloads that currently work to fail. In terms of locking, I have extended the use of the 'epmutex' to all epoll_ctl add and remove operations. Currently its only used in a subset of the add paths. I need to hold the epmutex, so that we can correctly traverse a coherent graph, to check the number of paths. I believe that this additional locking is probably ok, since its in the setup/teardown paths, and doesn't affect the running paths, but it certainly is going to add some extra overhead. Also, worth noting is that the epmuex was recently added to the ep_ctl add operations in the initial path loop detection code using the argument that it was not on a critical path. Another thing to note here, is the length of epoll chains that is allowed. Currently, eventpoll.c defines: /* Maximum number of nesting allowed inside epoll sets */ #define EP_MAX_NESTS 4 This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS + 1). However, this limit is currently only enforced during the loop check detection code, and only when the epoll file descriptors are added in a certain order. Thus, this limit is currently easily bypassed. The newly added check for wakeup paths, stricly limits the wakeup paths to a length of 5, regardless of the order in which ep's are linked together. Thus, a side-effect of the new code is a more consistent enforcement of the graph depth. Thus far, I've tested this, using the sample programs previously mentioned, which now either return quickly or return -EINVAL. I've also testing using the piptest.c epoll tester, which showed no difference in performance. I've also created a number of different epoll networks and tested that they behave as expectded. I believe this solves the original diabolical test cases, while still preserving the sane epoll nesting. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Nelson Elhage <nelhage@ksplice.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()Oleg Nesterov
commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream. This patch is intentionally incomplete to simplify the review. It ignores ep_unregister_pollwait() which plays with the same wqh. See the next change. epoll assumes that the EPOLL_CTL_ADD'ed file controls everything f_op->poll() needs. In particular it assumes that the wait queue can't go away until eventpoll_release(). This is not true in case of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand which is not connected to the file. This patch adds the special event, POLLFREE, currently only for epoll. It expects that init_poll_funcptr()'ed hook should do the necessary cleanup. Perhaps it should be defined as EPOLLFREE in eventpoll. __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if ->signalfd_wqh is not empty, we add the new signalfd_cleanup() helper. ep_poll_callback(POLLFREE) simply does list_del_init(task_list). This make this poll entry inconsistent, but we don't care. If you share epoll fd which contains our sigfd with another process you should blame yourself. signalfd is "really special". I simply do not know how we can define the "right" semantics if it used with epoll. The main problem is, epoll calls signalfd_poll() once to establish the connection with the wait queue, after that signalfd_poll(NULL) returns the different/inconsistent results depending on who does EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd has nothing to do with the file, it works with the current thread. In short: this patch is the hack which tries to fix the symptoms. It also assumes that nobody can take tasklist_lock under epoll locks, this seems to be true. Note: - we do not have wake_up_all_poll() but wake_up_poll() is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE. - signalfd_cleanup() uses POLLHUP along with POLLFREE, we need a couple of simple changes in eventpoll.c to make sure it can't be "lost". Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29target: Allow control CDBs with data > 1 pageAndy Grover
commit 4949314c7283ea4f9ade182ca599583b89f7edd6 upstream. We need to handle >1 page control cdbs, so extend the code to do a vmap if bigger than 1 page. It seems like kmap() is still preferable if just a page, fewer TLB shootdowns(?), so keep using that when possible. Rename function pair for their new scope. Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29USB: Remove duplicate USB 3.0 hub feature #defines.Sarah Sharp
commit d9f5343e35d9138432657202afa8e3ddb2ade360 upstream. Somehow we ended up with duplicate hub feature #defines in ch11.h. Tatyana Brokhman first created the USB 3.0 hub feature macros in 2.6.38 with commit 0eadcc09203349b11ca477ec367079b23d32ab91 "usb: USB3.0 ch11 definitions". In 2.6.39, I modified a patch from John Youn that added similar macros in a different place in the same file, and committed dbe79bbe9dcb22cb3651c46f18943477141ca452 "USB 3.0 Hub Changes". Some of the #defines used different names for the same values. Others used exactly the same names with the same values, like these gems: #define USB_PORT_FEAT_BH_PORT_RESET 28 ... #define USB_PORT_FEAT_BH_PORT_RESET 28 According to my very geeky husband (who looked it up in the C99 spec), it is allowed to have object-like macros with duplicate names as long as the replacement list is exactly the same. However, he recalled that some compilers will give warnings when they find duplicate macros. It's probably best to remove the duplicates in the stable tree, so that the code compiles for everyone. The macros are now fixed to move the feature requests that are specific to USB 3.0 hubs into a new section (out of the USB 2.0 hub feature section), and use the most common macro name. This patch should be backported to 2.6.39. Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: Tatyana Brokhman <tlinder@codeaurora.org> Cc: John Youn <johnyoun@synopsys.com> Cc: Jamey Sharp <jamey@minilop.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29net: Make qdisc_skb_cb upper size bound explicit.David S. Miller
[ Upstream commit 16bda13d90c8d5da243e2cfa1677e62ecce26860 ] Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside of other data structures. This is intended to be used by IPoIB so that it can remember addressing information stored at hard_header_ops->create() time that it can fetch when the packet gets to the transmit routine. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29ipv4: reset flowi parameters on route connectJulian Anastasov
[ Upstream commit e6b45241c57a83197e5de9166b3b0d32ac562609 ] Eric Dumazet found that commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) that comes in 3.0 added a regression. The problem appears to be that resulting flowi4_oif is used incorrectly as input parameter to some routing lookups. The result is that when connecting to local port without listener if the IP address that is used is not on a loopback interface we incorrectly assign RTN_UNICAST to the output route because no route is matched by oif=lo. The RST packet can not be sent immediately by tcp_v4_send_reset because it expects RTN_LOCAL. So, change ip_route_connect and ip_route_newports to update the flowi4 fields that are input parameters because we do not want unnecessary binding to oif. To make it clear what are the input parameters that can be modified during lookup and to show which fields of floiw4 are reused add a new function to update the flowi4 structure: flowi4_update_output. Thanks to Yurij M. Plotnikov for providing a bug report including a program to reproduce the problem. Thanks to Eric Dumazet for tracking the problem down to tcp_v4_send_reset and providing initial fix. Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29NFSv4: Fix an Oops in the NFSv4 getacl codeTrond Myklebust
commit 331818f1c468a24e581aedcbe52af799366a9dfe upstream. Commit bf118a342f10dafe44b14451a1392c3254629a1f (NFSv4: include bitmap in nfsv4 get acl data) introduces the 'acl_scratch' page for the case where we may need to decode multi-page data. However it fails to take into account the fact that the variable may be NULL (for the case where we're not doing multi-page decode), and it also attaches it to the encoding xdr_stream rather than the decoding one. The immediate result is an Oops in nfs4_xdr_enc_getacl due to the call to page_address() with a NULL page pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Andy Adamson <andros@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20crypto: sha512 - use standard ror64()Alexey Dobriyan
commit f2ea0f5f04c97b48c88edccba52b0682fbe45087 upstream. Use standard ror64() instead of hand-written. There is no standard ror64, so create it. The difference is shift value being "unsigned int" instead of uint64_t (for which there is no reason). gcc starts to emit native ROR instructions which it doesn't do for some reason currently. This should make the code faster. Patch survives in-tree crypto test and ping flood with hmac(sha512) on. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20mmc: dw_mmc: Fix PIO mode with support of highmemSeungwon Jeon
commit f9c2a0dc42a6938ff2a80e55ca2bbd1d5581c72e upstream. Current PIO mode makes a kernel crash with CONFIG_HIGHMEM. Highmem pages have a NULL from sg_virt(sg). This patch fixes the following problem. Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = c0004000 [00000000] *pgd=00000000 Internal error: Oops: 817 [#1] PREEMPT SMP Modules linked in: CPU: 0 Not tainted (3.0.15-01423-gdbf465f #589) PC is at dw_mci_pull_data32+0x4c/0x9c LR is at dw_mci_read_data_pio+0x54/0x1f0 pc : [<c0358824>] lr : [<c035988c>] psr: 20000193 sp : c0619d48 ip : c0619d70 fp : c0619d6c r10: 00000000 r9 : 00000002 r8 : 00001000 r7 : 00000200 r6 : 00000000 r5 : e1dd3100 r4 : 00000000 r3 : 65622023 r2 : 0000007f r1 : eeb96000 r0 : e1dd3100 Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment xkernel Control: 10c5387d Table: 61e2004a DAC: 00000015 Process swapper (pid: 0, stack limit = 0xc06182f0) Stack: (0xc0619d48 to 0xc061a000) 9d40: e1dd3100 e1a4f000 00000000 e1dd3100 e1a4f000 00000200 9d60: c0619da4 c0619d70 c035988c c03587e4 c0619d9c e18158f4 e1dd3100 e1dd3100 9d80: 00000020 00000000 00000000 00000020 c06e8a84 00000000 c0619e04 c0619da8 9da0: c0359b24 c0359844 e18158f4 e1dd3164 e1dd3168 e1dd3150 3d02fc79 e1dd3154 9dc0: e1dd3178 00000000 00000020 00000000 e1dd3150 00000000 c10dd7e8 e1a84900 9de0: c061e7cc 00000000 00000000 0000008d c06e8a84 c061e780 c0619e4c c0619e08 9e00: c00c4738 c0359a34 3d02fc79 00000000 c0619e4c c05a1698 c05a1670 c05a165c 9e20: c04de8b0 c061e780 c061e7cc e1a84900 ffffed68 0000008d c0618000 00000000 9e40: c0619e6c c0619e50 c00c48b4 c00c46c8 c061e780 c00423ac c061e7cc ffffed68 9e60: c0619e8c c0619e70 c00c7358 c00c487c 0000008d ffffee38 c0618000 ffffed68 9e80: c0619ea4 c0619e90 c00c4258 c00c72b0 c00423ac ffffee38 c0619ecc c0619ea8 9ea0: c004241c c00c4234 ffffffff f8810000 0000006d 00000002 00000001 7fffffff 9ec0: c0619f44 c0619ed0 c0048bc0 c00423c4 220ae7a9 00000000 386f0d30 0005d3a4 9ee0: c00423ac c10dd0b8 c06f2cd8 c0618000 c0594778 c003a674 7fffffff c0619f44 9f00: 386f0d30 c0619f18 c00a6f94 c005be3c 80000013 ffffffff 386f0d30 0005d3a4 9f20: 386f0d30 0005d2d1 c10dd0a8 c10dd0b8 c06f2cd8 c0618000 c0619f74 c0619f48 9f40: c0345858 c005be00 c00a2440 c0618000 c0618000 c00410d8 c06c1944 c00410fc 9f60: c0594778 c003a674 c0619f9c c0619f78 c004a7e8 c03457b4 c0618000 c06c18f8 9f80: 00000000 c0039c70 c06c18d4 c003a674 c0619fb4 c0619fa0 c04ceafc c004a714 9fa0: c06287b4 c06c18f8 c0619ff4 c0619fb8 c0008b68 c04cea68 c0008578 00000000 9fc0: 00000000 c003a674 00000000 10c5387d c0628658 c003aa78 c062f1c4 4000406a 9fe0: 413fc090 00000000 00000000 c0619ff8 40008044 c0008858 00000000 00000000 Backtrace: [<c03587d8>] (dw_mci_pull_data32+0x0/0x9c) from [<c035988c>] (dw_mci_read_data_pio+0x54/0x1f0) r6:00000200 r5:e1a4f000 r4:e1dd3100 [<c0359838>] (dw_mci_read_data_pio+0x0/0x1f0) from [<c0359b24>] (dw_mci_interrupt+0xfc/0x4a4) [<c0359a28>] (dw_mci_interrupt+0x0/0x4a4) from [<c00c4738>] (handle_irq_event_percpu+0x7c/0x1b4) [<c00c46bc>] (handle_irq_event_percpu+0x0/0x1b4) from [<c00c48b4>] (handle_irq_event+0x44/0x64) [<c00c4870>] (handle_irq_event+0x0/0x64) from [<c00c7358>] (handle_fasteoi_irq+0xb4/0x124) r7:ffffed68 r6:c061e7cc r5:c00423ac r4:c061e780 [<c00c72a4>] (handle_fasteoi_irq+0x0/0x124) from [<c00c4258>] (generic_handle_irq+0x30/0x38) r7:ffffed68 r6:c0618000 r5:ffffee38 r4:0000008d [<c00c4228>] (generic_handle_irq+0x0/0x38) from [<c004241c>] (asm_do_IRQ+0x64/0xe0) r5:ffffee38 r4:c00423ac [<c00423b8>] (asm_do_IRQ+0x0/0xe0) from [<c0048bc0>] (__irq_svc+0x80/0x14c) Exception stack(0xc0619ed0 to 0xc0619f18) Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20writeback: fix dereferencing NULL bdi->dev on trace_writeback_queueWu Fengguang
commit 977b7e3a52a7421ad33a393a38ece59f3d41c2fa upstream. When a SD card is hot removed without umount, del_gendisk() will call bdi_unregister() without destroying/freeing it. This leaves the bdi in the bdi->dev = NULL, bdi->wb.task = NULL, bdi->bdi_list removed state. When sync(2) gets the bdi before bdi_unregister() and calls bdi_queue_work() after the unregister, trace_writeback_queue will be dereferencing the NULL bdi->dev. Fix it with a simple test for NULL. LKML-reference: http://lkml.org/lkml/2012/1/18/346 Reported-by: Rabin Vincent <rabin@rab.in> Tested-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20writeback: fix NULL bdi->dev in trace writeback_single_inodeWu Fengguang
commit 15eb77a07c714ac80201abd0a9568888bcee6276 upstream. bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when the tearing down the original bdi. Fix trace_writeback_single_inode to use sb->s_bdi=default_backing_dev_info rather than bdi->dev=NULL for a teared down bdi. Reported-by: Rabin Vincent <rabin@rab.in> Tested-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernelWu Fengguang
commit 3310225dfc71a35a2cc9340c15c0e08b14b3c754 upstream. PROP_MAX_SHIFT should be set to <=32 on 64-bit box. This fixes two bugs in the below lines of bdi_dirty_limit(): bdi_dirty *= numerator; do_div(bdi_dirty, denominator); 1) divide error: do_div() only uses the lower 32 bit of the denominator, which may trimmed to be 0 when PROP_MAX_SHIFT > 32. 2) overflow: (bdi_dirty * numerator) could easily overflow if numerator used up to 48 bits, leaving only 16 bits to bdi_dirty Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Ilya Tumaykin <librarian_rus@yahoo.com> Tested-by: Ilya Tumaykin <librarian_rus@yahoo.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-13usb: ch9.h: usb_endpoint_maxp() uses __le16_to_cpu()Kuninori Morimoto
commit 9c0a835a9d9aed41bcf9c287f5069133a6e2a87b upstream. The usb/ch9.h will be installed to /usr/include/linux, and be used from user space. But le16_to_cpu() is only defined for kernel code. Without this patch, user space compile will be broken. Special thanks to Stefan Becker Reported-by: Stefan Becker <chemobejk@gmail.com> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-13PM / QoS: CPU C-state breakage with PM Qos changeVenkatesh Pallipadi
commit d020283dc694c9ec31b410f522252f7a8397e67d upstream. Looks like change "PM QoS: Move and rename the implementation files" merged during the 3.2 development cycle made PM QoS depend on CONFIG_PM which depends on (PM_SLEEP || PM_RUNTIME). That breaks CPU C-states with kernels not having these CONFIGs, causing CPUs to spend time in Polling loop idle instead of going into deep C-states, consuming way way more power. This is with either acpi idle or intel idle enabled. Either CONFIG_PM should be enabled with any pm_qos users or the !CONFIG_PM pm_qos_request() should return sane defaults not to break the existing users. Here's is the patch for the latter option. [rjw: Modified the changelog slightly.] Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-13PM / Hibernate: Fix s2disk regression related to freezing workqueuesRafael J. Wysocki
commit 181e9bdef37bfcaa41f3ab6c948a2a0d60a268b5 upstream. Commit 2aede851ddf08666f68ffc17be446420e9d2a056 PM / Hibernate: Freeze kernel threads after preallocating memory introduced a mechanism by which kernel threads were frozen after the preallocation of hibernate image memory to avoid problems with frozen kernel threads not responding to memory freeing requests. However, it overlooked the s2disk code path in which the SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE, which caused freeze_workqueues_begin() to BUG(), because it saw that worqueues had been already frozen. Although in principle this issue might be addressed by removing the relevant BUG_ON() from freeze_workqueues_begin(), that would reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4 attempted to avoid into that particular code path. For this reason, to fix the issue at hand, introduce thaw_kernel_threads() and make the SNAPSHOT_FREE ioctl execute it. Special thanks to Srivatsa S. Bhat for detailed analysis of the problem. Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-06PCI: Rework ASPM disable codeMatthew Garrett
commit 3c076351c4027a56d5005a39a0b518a4ba393ce2 upstream. Right now we forcibly clear ASPM state on all devices if the BIOS indicates that the feature isn't supported. Based on the Microsoft presentation "PCI Express In Depth for Windows Vista and Beyond", I'm starting to think that this may be an error. The implication is that unless the platform grants full control via _OSC, Windows will not touch any PCIe features - including ASPM. In that case clearing ASPM state would be an error unless the platform has granted us that control. This patch reworks the ASPM disabling code such that the actual clearing of state is triggered by a successful handoff of PCIe control to the OS. The general ASPM code undergoes some changes in order to ensure that the ability to clear the bits isn't overridden by ASPM having already been disabled. Further, this theoretically now allows for situations where only a subset of PCIe roots hand over control, leaving the others in the BIOS state. It's difficult to know for sure that this is the right thing to do - there's zero public documentation on the interaction between all of these components. But enough vendors enable ASPM on platforms and then set this bit that it seems likely that they're expecting the OS to leave them alone. Measured to save around 5W on an idle Thinkpad X220. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03netns: Fail conspicously if someone uses net_generic at an inappropriate time.Eric W. Biederman
[ Upstream commit 5ee4433efe99b9f39f6eff5052a177bbcfe72cea ] By definition net_generic should never be called when it can return NULL. Fail conspicously with a BUG_ON to make it clear when people mess up that a NULL return should never happen. Recently there was a bug in the CAIF subsystem where it was registered with register_pernet_device instead of register_pernet_subsys. It was erroneously concluded that net_generic could validly return NULL and that net_assign_generic was buggy (when it was just inefficient). Hopefully this BUG_ON will prevent people to coming to similar erroneous conclusions in the futrue. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03drm: Fix authentication kernel crashThomas Hellstrom
commit 598781d71119827b454fd75d46f84755bca6f0c6 upstream. If the master tries to authenticate a client using drm_authmagic and that client has already closed its drm file descriptor, either wilfully or because it was terminated, the call to drm_authmagic will dereference a stale pointer into kmalloc'ed memory and corrupt it. Typically this results in a hard system hang. This patch fixes that problem by removing any authentication tokens (struct drm_magic_entry) open for a file descriptor when that file descriptor is closed. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-01-25SHM_UNLOCK: fix Unevictable pages stranded after swapHugh Dickins
commit 245132643e1cfcd145bbc86a716c1818371fcb93 upstream. Commit cc39c6a9bbde ("mm: account skipped entries to avoid looping in find_get_pages") correctly fixed an infinite loop; but left a problem that find_get_pages() on shmem would return 0 (appearing to callers to mean end of tree) when it meets a run of nr_pages swap entries. The only uses of find_get_pages() on shmem are via pagevec_lookup(), called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's scan_mapping_unevictable_pages(). The first is already commented, and not worth worrying about; but the second can leave pages on the Unevictable list after an unusual sequence of swapping and locking. Fix that by using shmem_find_get_pages_and_swap() (then ignoring the swap) instead of pagevec_lookup(). But I don't want to contaminate vmscan.c with shmem internals, nor shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into shmem.c, renaming it shmem_unlock_mapping(); and rename check_move_unevictable_page() to check_move_unevictable_pages(), looping down an array of pages, oftentimes under the same lock. Leave out the "rotate unevictable list" block: that's a leftover from when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed handling involved looking at pages at tail of LRU. Was there significance to the sequence first ClearPageUnevictable, then test page_evictable, then SetPageUnevictable here? I think not, we're under LRU lock, and have no barriers between those. Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25tuner: Fix numberspace conflict between xc4000 and pti 5nf05 tunersMiroslav Slugen
commit cd4ca7afc61d3b18fcd635002459fb6b1d701099 upstream. Update xc4000 tuner definition, number 81 is already in use by TUNER_PARTSNIC_PTI_5NF05. Signed-off-by: Miroslav Slugen <thunder.mmm@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25target: Set additional sense length field in sense dataRoland Dreier
commit 895f3022523361e9b383cf48f51feb1f7d5e7e53 upstream. The target code was not setting the additional sense length field in the sense data it returned, which meant that at least the Linux stack ignored the ASC/ASCQ fields. For example, without this patch, on a tcm_loop device: # sg_raw -v /dev/sda 2 0 0 0 0 0 gives cdb to send: 02 00 00 00 00 00 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Raw sense data (in hex): 70 00 05 00 00 00 00 00 while after the patch we correctly get the following (which matches what a regular disk returns): cdb to send: 02 00 00 00 00 00 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Additional sense: Invalid command operation code Raw sense data (in hex): 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25ACPI: Store SRAT table revisionKurt Garloff
commit 8df0eb7c9d96f9e82f233ee8b74e0f0c8471f868 upstream. In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides 32bits for these. The new fields were reserved before. According to the ACPI spec, the OS must disregrard reserved fields. In order to know whether or not, we must know what version the SRAT table has. This patch stores the SRAT table revision for later consumption by arch specific __init functions. Signed-off-by: Kurt Garloff <kurt@garloff.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25block: fail SCSI passthrough ioctls on partition devicesPaolo Bonzini
commit 0bfc96cb77224736dfa35c3c555d37b3646ef35e upstream. [ Changes with respect to 3.3: return -ENOTTY from scsi_verify_blk_ioctl and -ENOIOCTLCMD from sd_compat_ioctl. ] Linux allows executing the SG_IO ioctl on a partition or LVM volume, and will pass the command to the underlying block device. This is well-known, but it is also a large security problem when (via Unix permissions, ACLs, SELinux or a combination thereof) a program or user needs to be granted access only to part of the disk. This patch lets partitions forward a small set of harmless ioctls; others are logged with printk so that we can see which ioctls are actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred. Of course it was being sent to a (partition on a) hard disk, so it would have failed with ENOTTY and the patch isn't changing anything in practice. Still, I'm treating it specially to avoid spamming the logs. In principle, this restriction should include programs running with CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and /dev/sdb, it still should not be able to read/write outside the boundaries of /dev/sda2 independent of the capabilities. However, for now programs with CAP_SYS_RAWIO will still be allowed to send the ioctls. Their actions will still be logged. This patch does not affect the non-libata IDE driver. That driver however already tests for bd != bd->bd_contains before issuing some ioctl; it could be restricted further to forbid these ioctls even for programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [ Make it also print the command name when warning - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25block: add and use scsi_blk_cmd_ioctlPaolo Bonzini
commit 577ebb374c78314ac4617242f509e2f5e7156649 upstream. Introduce a wrapper around scsi_cmd_ioctl that takes a block device. The function will then be enhanced to detect partition block devices and, in that case, subject the ioctls to whitelisting. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25fix shrink_dcache_parent() livelockMiklos Szeredi
commit eaf5f9073533cde21c7121c136f1c3f072d9cf59 upstream. Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may cause shrink_dcache_parent() to loop forever. Here's what appears to happen: 1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1 2 - CPU1: select_parent(P) locks P->d_lock 3 - CPU0: shrink_dentry_list() locks C->d_lock dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock 4 - CPU1: select_parent(P) locks C->d_lock, moves C from dispose list being processed on CPU0 to the new dispose list, returns 1 5 - CPU0: shrink_dentry_list() finds dispose list empty, returns 6 - Goto 2 with CPU0 and CPU1 switched Basically select_parent() steals the dentry from shrink_dentry_list() and thinks it found a new one, causing shrink_dentry_list() to think it's making progress and loop over and over. One way to trigger this is to make udev calls stat() on the sysfs file while it is going away. Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick: ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true" Then execute the following loop: while true; do echo -bond0 > /sys/class/net/bonding_masters echo +bond0 > /sys/class/net/bonding_masters echo -bond1 > /sys/class/net/bonding_masters echo +bond1 > /sys/class/net/bonding_masters done One fix would be to check all callers and prevent concurrent calls to shrink_dcache_parent(). But I think a better solution is to stop the stealing behavior. This patch adds a new dentry flag that is set when the dentry is added to the dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a new reference just before being pruned. If the dentry has this flag, select_parent() will skip it and let shrink_dentry_list() retry pruning it. With select_parent() skipping those dentries there will not be the appearance of progress (new dentries found) when there is none, hence shrink_dcache_parent() will not loop forever. Set the flag is also set in prune_dcache_sb() for consistency as suggested by Linus. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25svcrpc: destroy server sockets all at onceJ. Bruce Fields
commit 2fefb8a09e7ed251ae8996e0c69066e74c5aa560 upstream. There's no reason I can see that we need to call sv_shutdown between closing the two lists of sockets. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25V4L/DVB: v4l2-ioctl: integer overflow in video_usercopy()Dan Carpenter
commit 6c06108be53ca5e94d8b0e93883d534dd9079646 upstream. If ctrls->count is too high the multiplication could overflow and array_size would be lower than expected. Mauro and Hans Verkuil suggested that we cap it at 1024. That comes from the maximum number of controls with lots of room for expantion. $ grep V4L2_CID include/linux/videodev2.h | wc -l 211 Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25memcg: add mem_cgroup_replace_page_cache() to fix LRU issueKAMEZAWA Hiroyuki
commit ab936cbcd02072a34b60d268f94440fd5cf1970b upstream. Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a function replace_page_cache_page(). This function replaces a page in the radix-tree with a new page. WHen doing this, memory cgroup needs to fix up the accounting information. memcg need to check PCG_USED bit etc. In some(many?) cases, 'newpage' is on LRU before calling replace_page_cache(). So, memcg's LRU accounting information should be fixed, too. This patch adds mem_cgroup_replace_page_cache() and removes the old hooks. In that function, old pages will be unaccounted without touching res_counter and new page will be accounted to the memcg (of old page). WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid races with LRU handling. Background: replace_page_cache_page() is called by FUSE code in its splice() handling. Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated page and may be on LRU. LRU mis-accounting will be critical for memory cgroup because rmdir() checks the whole LRU is empty and there is no account leak. If a page is on the other LRU than it should be, rmdir() will fail. This bug was added in March 2011, but no bug report yet. I guess there are not many people who use memcg and FUSE at the same time with upstream kernels. The result of this bug is that admin cannot destroy a memcg because of account leak. So, no panic, no deadlock. And, even if an active cgroup exist, umount can succseed. So no problem at shutdown. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25include/linux/crash_dump.h needs elf.hFabio Estevam
commit 1f536b9e9f85456df93614b3c2f6a1a2b7d7cb9b upstream. Building an ARM target we get the following warnings: CC arch/arm/kernel/setup.o In file included from arch/arm/kernel/setup.c:39: arch/arm/include/asm/elf.h:102:1: warning: "vmcore_elf64_check_arch" redefined In file included from arch/arm/kernel/setup.c:24: include/linux/crash_dump.h:30:1: warning: this is the location of the previous definition Quoting Russell King: "linux/crash_dump.h makes no attempt to include asm/elf.h, but it depends on stuff in asm/elf.h to determine how stuff inside this file is defined at parse time. So, if asm/elf.h is included after linux/crash_dump.h or not at all, you get a different result from the situation where asm/elf.h is included before." So add elf.h header to crash_dump.h to avoid this problem. The original discussion about this can be found at: http://www.spinics.net/lists/arm-kernel/msg154113.html Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25xen/xenbus: Reject replies with payload > XENSTORE_PAYLOAD_MAX.Ian Campbell
commit 9e7860cee18241633eddb36a4c34c7b61d8cecbc upstream. Haogang Chen found out that: There is a potential integer overflow in process_msg() that could result in cross-domain attack. body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH); When a malicious guest passes 0xffffffff in msg->hdr.len, the subsequent call to xb_read() would write to a zero-length buffer. The other end of this connection is always the xenstore backend daemon so there is no guest (malicious or otherwise) which can do this. The xenstore daemon is a trusted component in the system. However this seem like a reasonable robustness improvement so we should have it. And Ian when read the API docs found that: The payload length (len field of the header) is limited to 4096 (XENSTORE_PAYLOAD_MAX) in both directions. If a client exceeds the limit, its xenstored connection will be immediately killed by xenstored, which is usually catastrophic from the client's point of view. Clients (particularly domains, which cannot just reconnect) should avoid this. so this patch checks against that instead. This also avoids a potential integer overflow pointed out by Haogang Chen. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25PCI: Fix PCI_EXP_TYPE_RC_EC valueAlex Williamson
commit 1830ea91c20b06608f7cdb2455ce05ba834b3214 upstream. Spec shows this as 1010b = 0xa Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25NFSv4: include bitmap in nfsv4 get acl dataAndy Adamson
commit bf118a342f10dafe44b14451a1392c3254629a1f upstream. The NFSv4 bitmap size is unbounded: a server can return an arbitrary sized bitmap in an FATTR4_WORD0_ACL request. Replace using the nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data xdr length to the (cached) acl page data. This is a general solution to commit e5012d1f "NFSv4.1: update nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-12usb: ch9: fix up MaxStreams helperFelipe Balbi
commit 18b7ede5f7ee2092aedcb578d3ac30bd5d4fc23c upstream. [ removed the dwc3 portion of the patch as it didn't apply to older kernels - gregkh] According to USB 3.0 Specification Table 9-22, if bmAttributes [4:0] are set to zero, it means "no streams supported", but the way this helper was defined on Linux, we will *always* have one stream which might cause several problems. For example on DWC3, we would tell the controller endpoint has streams enabled and yet start transfers with Stream ID set to 0, which would goof up the host side. While doing that, convert the macro to an inline function due to the different checks we now need. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-12usb: fix number of mapped SG DMA entriesClemens Ladisch
commit bc677d5b64644c399cd3db6a905453e611f402ab upstream. Add a new field num_mapped_sgs to struct urb so that we have a place to store the number of mapped entries and can also retain the original value of entries in num_sgs. Previously, usb_hcd_map_urb_for_dma() would overwrite this with the number of mapped entries, which would break dma_unmap_sg() because it requires the original number of entries. This fixes warnings like the following when using USB storage devices: ------------[ cut here ]------------ WARNING: at lib/dma-debug.c:902 check_unmap+0x4e4/0x695() ehci_hcd 0000:00:12.2: DMA-API: device driver frees DMA sg list with different entry count [map count=4] [unmap count=1] Modules linked in: ohci_hcd ehci_hcd Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #319 Call Trace: <IRQ> [<ffffffff81036d3b>] warn_slowpath_common+0x80/0x98 [<ffffffff81036de7>] warn_slowpath_fmt+0x41/0x43 [<ffffffff811fa5ae>] check_unmap+0x4e4/0x695 [<ffffffff8105e92c>] ? trace_hardirqs_off+0xd/0xf [<ffffffff8147208b>] ? _raw_spin_unlock_irqrestore+0x33/0x50 [<ffffffff811fa84a>] debug_dma_unmap_sg+0xeb/0x117 [<ffffffff8137b02f>] usb_hcd_unmap_urb_for_dma+0x71/0x188 [<ffffffff8137b166>] unmap_urb_for_dma+0x20/0x22 [<ffffffff8137b1c5>] usb_hcd_giveback_urb+0x5d/0xc0 [<ffffffffa0000d02>] ehci_urb_done+0xf7/0x10c [ehci_hcd] [<ffffffffa0001140>] qh_completions+0x429/0x4bd [ehci_hcd] [<ffffffffa000340a>] ehci_work+0x95/0x9c0 [ehci_hcd] ... ---[ end trace f29ac88a5a48c580 ]--- Mapped at: [<ffffffff811faac4>] debug_dma_map_sg+0x45/0x139 [<ffffffff8137bc0b>] usb_hcd_map_urb_for_dma+0x22e/0x478 [<ffffffff8137c494>] usb_hcd_submit_urb+0x63f/0x6fa [<ffffffff8137d01c>] usb_submit_urb+0x2c7/0x2de [<ffffffff8137dcd4>] usb_sg_wait+0x55/0x161 Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-03security: Fix security_old_inode_init_security() when CONFIG_SECURITY is not setJan Kara
Commit 1e39f384bb01 ("evm: fix build problems") makes the stub version of security_old_inode_init_security() return 0 when CONFIG_SECURITY is not set. But that makes callers such as reiserfs_security_init() assume that security_old_inode_init_security() has set name, value, and len arguments properly - but security_old_inode_init_security() left them uninitialized which then results in interesting failures. Revert security_old_inode_init_security() to the old behavior of returning EOPNOTSUPP since both callers (reiserfs and ocfs2) handle this just fine. [ Also fixed the S_PRIVATE(inode) case of the actual non-stub security_old_inode_init_security() function to return EOPNOTSUPP for the same reason, as pointed out by Mimi Zohar. It got incorrectly changed to match the new function in commit fb88c2b6cbb1: "evm: fix security/security_old_init_security return code". - Linus ] Reported-by: Jorge Bastos <mysql.jorge@decimal.pt> Acked-by: James Morris <jmorris@namei.org> Acked-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-31ipvs: try also real server with port 0 in backup serverJulian Anastasov
We should not forget to try for real server with port 0 in the backup server when processing the sync message. We should do it in all cases because the backup server can use different forwarding method. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-29procfs: do not confuse jiffies with cputime64_tAndreas Schwab
Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time for nohz") did not take into account that one some architectures jiffies and cputime use different units. This causes get_idle_time() to return numbers in the wrong units, making the idle time fields in /proc/stat wrong. Instead of converting the usec value returned by get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function usecs_to_cputime64 to convert it to the correct unit of cputime64_t. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Artem S. Tashkinov" <t.artem@mailcity.com> Cc: Dave Jones <davej@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-26KVM: Don't automatically expose the TSC deadline timer in cpuidJan Kiszka
Unlike all of the other cpuid bits, the TSC deadline timer bit is set unconditionally, regardless of what userspace wants. This is broken in several ways: - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC deadline timer feature, a guest that uses the feature will break - live migration to older host kernels that don't support the TSC deadline timer will cause the feature to be pulled from under the guest's feet; breaking it - guests that are broken wrt the feature will fail. Fix by not enabling the feature automatically; instead report it to userspace. Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not KVM_GET_SUPPORTED_CPUID. Fixes the Illumos guest kernel, which uses the TSC deadline timer feature. [avi: add the KVM_CAP + documentation] Reported-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Fix race between CPU hotplug and lglocks
2011-12-23Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linuxLinus Torvalds
for linus: writeback reason binary tracing format fix * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: show writeback reason with __print_symbolic
2011-12-23net: relax rcvbuf limitsEric Dumazet
skb->truesize might be big even for a small packet. Its even bigger after commit 87fb4b7b533 (net: more accurate skb truesize) and big MTU. We should allow queueing at least one packet per receiver, even with a low RCVBUF setting. Reported-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22net: introduce DST_NOPEER dst flagEric Dumazet
Chris Boot reported crashes occurring in ipv6_select_ident(). [ 461.457562] RIP: 0010:[<ffffffff812dde61>] [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7 [ 461.578229] Call Trace: [ 461.580742] <IRQ> [ 461.582870] [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2 [ 461.589054] [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155 [ 461.595140] [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b [ 461.601198] [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6] [ 461.608786] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.614227] [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543 [ 461.620659] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.626440] [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge] [ 461.633581] [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459 [ 461.639577] [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge] [ 461.646887] [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge] [ 461.653997] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.659473] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.665485] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.671234] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.677299] [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge] [ 461.684891] [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack] [ 461.691520] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.697572] [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge] [ 461.704616] [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge] [ 461.712329] [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge] [ 461.719490] [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge] [ 461.727223] [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge] [ 461.734292] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.739758] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.746203] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.751950] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.758378] [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge] This is caused by bridge netfilter special dst_entry (fake_rtable), a special shared entry, where attaching an inetpeer makes no sense. Problem is present since commit 87c48fa3b46 (ipv6: make fragment identifications less predictable) Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and __ip_select_ident() fallback to the 'no peer attached' handling. Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22VFS: Fix race between CPU hotplug and lglocksSrivatsa S. Bhat
Currently, the *_global_[un]lock_online() routines are not at all synchronized with CPU hotplug. Soft-lockups detected as a consequence of this race was reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng for finding out that the root-cause of this issue is the race condition between br_write_[un]lock() and CPU hotplug, which results in the lock states getting messed up). Fixing this race by just adding {get,put}_online_cpus() at appropriate places in *_global_[un]lock_online() is not a good option, because, then suddenly br_write_[un]lock() would become blocking, whereas they have been kept as non-blocking all this time, and we would want to keep them that way. So, overall, we want to ensure 3 things: 1. br_write_lock() and br_write_unlock() must remain as non-blocking. 2. The corresponding lock and unlock of the per-cpu spinlocks must not happen for different sets of CPUs. 3. Either prevent any new CPU online operation in between this lock-unlock, or ensure that the newly onlined CPU does not proceed with its corresponding per-cpu spinlock unlocked. To achieve all this: (a) We introduce a new spinlock that is taken by the *_global_lock_online() routine and released by the *_global_unlock_online() routine. (b) We register a callback for CPU hotplug notifications, and this callback takes the same spinlock as above. (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is initialized in the lock_init() code, all future updates to it are done in the callback, under the above spinlock. (d) The above bitmap is used (instead of cpu_online_mask) while locking and unlocking the per-cpu locks. The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the br_write_lock-unlock sequence is in progress, the callback keeps spinning, thus preventing the CPU online operation till the lock-unlock sequence is complete. This takes care of requirement (3). The bitmap that we maintain remains unmodified throughout the lock-unlock sequence, since all updates to it are managed by the callback, which takes the same spinlock as the one taken by the lock code and released only by the unlock routine. Combining this with (d) above, satisfies requirement (2). Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug operations from racing with br_write_lock-unlock, requirement (1) is also taken care of. By the way, it is to be noted that a CPU offline operation can actually run in parallel with our lock-unlock sequence, because our callback doesn't react to notifications earlier than CPU_DEAD (in order to maintain our bitmap properly). And this means, since we use our own bitmap (which is stale, on purpose) during the lock-unlock sequence, we could end up unlocking the per-cpu lock of an offline CPU (because we had locked it earlier, when the CPU was online), in order to satisfy requirement (2). But this is harmless, though it looks a bit awkward. Debugged-by: Cong Meng <mc@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org