summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2014-11-14scsi: Fix error handling in SCSI_IOCTL_SEND_COMMANDJan Kara
commit 84ce0f0e94ac97217398b3b69c21c7a62ebeed05 upstream. When sg_scsi_ioctl() fails to prepare request to submit in blk_rq_map_kern() we jump to a label where we just end up copying (luckily zeroed-out) kernel buffer to userspace instead of reporting error. Fix the problem by jumping to the right label. CC: Jens Axboe <axboe@kernel.dk> CC: linux-scsi@vger.kernel.org Coverity-id: 1226871 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixed up the, now unused, out label. Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-14block: fix alignment_offset math that assumes io_min is a power-of-2Mike Snitzer
commit b8839b8c55f3fdd60dc36abcda7e0266aff7985c upstream. The math in both blk_stack_limits() and queue_limit_alignment_offset() assume that a block device's io_min (aka minimum_io_size) is always a power-of-2. Fix the math such that it works for non-power-of-2 io_min. This issue (of alignment_offset != 0) became apparent when testing dm-thinp with a thinp blocksize that matches a RAID6 stripesize of 1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data block size") unlocked the potential for alignment_offset != 0 due to the dm-thin-pool's io_min possibly being a non-power-of-2. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05partitions: aix.c: off by one bugDan Carpenter
commit d97a86c170b4e432f76db072a827fe30b4d6f659 upstream. The lvip[] array has "state->limit" elements so the condition here should be >= instead of >. Fixes: 6ceea22bbbc8 ('partitions: add aix lvm partition support files') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05genhd: fix leftover might_sleep() in blk_free_devt()Jens Axboe
commit 46f341ffcfb5d8530f7d1e60f3be06cce6661b62 upstream. Commit 2da78092 changed the locking from a mutex to a spinlock, so we now longer sleep in this context. But there was a leftover might_sleep() in there, which now triggers since we do the final free from an RCU callback. Get rid of it. Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05block: Fix dev_t minor allocation lifetimeKeith Busch
commit 2da78092dda13f1efd26edbbf99a567776913750 upstream. Releases the dev_t minor when all references are closed to prevent another device from acquiring the same major/minor. Since the partition's release may be invoked from call_rcu's soft-irq context, the ext_dev_idr's mutex had to be replaced with a spinlock so as not so sleep. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05cfq-iosched: Fix wrong children_weight calculationToshiaki Makita
commit e15693ef18e13e3e6bffe891fe140f18b8ff6d07 upstream. cfq_group_service_tree_add() is applying new_weight at the beginning of the function via cfq_update_group_weight(). This actually allows weight to change between adding it to and subtracting it from children_weight, and triggers WARN_ON_ONCE() in cfq_group_service_tree_del(), or even causes oops by divide error during vfr calculation in cfq_group_service_tree_add(). The detailed scenario is as follows: 1. Create blkio cgroups X and Y as a child of X. Set X's weight to 500 and perform some I/O to apply new_weight. This X's I/O completes before starting Y's I/O. 2. Y starts I/O and cfq_group_service_tree_add() is called with Y. 3. cfq_group_service_tree_add() walks up the tree during children_weight calculation and adds parent X's weight (500) to children_weight of root. children_weight becomes 500. 4. Set X's weight to 1000. 5. X starts I/O and cfq_group_service_tree_add() is called with X. 6. cfq_group_service_tree_add() applies its new_weight (1000). 7. I/O of Y completes and cfq_group_service_tree_del() is called with Y. 8. I/O of X completes and cfq_group_service_tree_del() is called with X. 9. cfq_group_service_tree_del() subtracts X's weight (1000) from children_weight of root. children_weight becomes -500. This triggers WARN_ON_ONCE(). 10. Set X's weight to 500. 11. X starts I/O and cfq_group_service_tree_add() is called with X. 12. cfq_group_service_tree_add() applies its new_weight (500) and adds it to children_weight of root. children_weight becomes 0. Calcularion of vfr triggers oops by divide error. weight should be updated right before adding it to children_weight. Reported-by: Ruki Sekiya <sekiya.ruki@lab.ntt.co.jp> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-31blkcg: don't call into policy draining if root_blkg is already goneTejun Heo
commit 0b462c89e31f7eb6789713437eb551833ee16ff3 upstream. While a queue is being destroyed, all the blkgs are destroyed and its ->root_blkg pointer is set to NULL. If someone else starts to drain while the queue is in this state, the following oops happens. NULL pointer dereference at 0000000000000028 IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230 PGD e4a1067 PUD b773067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched] CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000 RIP: 0010:[<ffffffff8144e944>] [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230 RSP: 0018:ffff88000efd7bf0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001 R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450 R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28 FS: 00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0 Stack: ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80 ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58 ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450 Call Trace: [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60 [<ffffffff81427641>] __blk_drain_queue+0x71/0x180 [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0 [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120 [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50 [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40 [<ffffffff8142d476>] blk_release_queue+0x26/0xd0 [<ffffffff81454968>] kobject_cleanup+0x38/0x70 [<ffffffff81454848>] kobject_put+0x28/0x60 [<ffffffff81427505>] blk_put_queue+0x15/0x20 [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0 [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0 [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20 [<ffffffff817930e2>] device_release+0x32/0xa0 [<ffffffff81454968>] kobject_cleanup+0x38/0x70 [<ffffffff81454848>] kobject_put+0x28/0x60 [<ffffffff817934d7>] put_device+0x17/0x20 [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0 [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40 [<ffffffff817d1257>] sdev_store_delete+0x27/0x30 [<ffffffff81792ca8>] dev_attr_store+0x18/0x30 [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50 [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170 [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0 [<ffffffff811f69bd>] SyS_write+0x4d/0xc0 [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b 776687bce42b ("block, blk-mq: draining can't be skipped even if bypass_depth was non-zero") made it easier to trigger this bug by making blk_queue_bypass_start() drain even when it loses the first bypass test to blk_cleanup_queue(); however, the bug has always been there even before the commit as blk_queue_bypass_start() could race against queue destruction, win the initial bypass test but perform the actual draining after blk_cleanup_queue() already destroyed all blkgs. Fix it by skippping calling into policy draining if all the blkgs are already gone. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Shirish Pargaonkar <spargaonkar@suse.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: Jet Chen <jet.chen@intel.com> Tested-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-31block: don't assume last put of shared tags is for the hostChristoph Hellwig
commit d45b3279a5a2252cafcd665bbf2db8c9b31ef783 upstream. There is no inherent reason why the last put of a tag structure must be the one for the Scsi_Host, as device model objects can be held for arbitrary periods. Merge blk_free_tags and __blk_free_tags into a single funtion that just release a references and get rid of the BUG() when the host reference wasn't the last. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-31block: provide compat ioctl for BLKZEROOUTMikulas Patocka
commit 3b3a1814d1703027f9867d0f5cbbfaf6c7482474 upstream. This patch provides the compat BLKZEROOUT ioctl. The argument is a pointer to two uint64_t values, so there is no need to translate it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09blkcg: fix use-after-free in __blkg_release_rcu() by making blkcg_gq refcnt ↵Tejun Heo
an atomic_t commit a5049a8ae34950249a7ae94c385d7c5c98914412 upstream. Hello, So, this patch should do. Joe, Vivek, can one of you guys please verify that the oops goes away with this patch? Jens, the original thread can be read at http://thread.gmane.org/gmane.linux.kernel/1720729 The fix converts blkg->refcnt from int to atomic_t. It does some overhead but it should be minute compared to everything else which is going on and the involved cacheline bouncing, so I think it's highly unlikely to cause any noticeable difference. Also, the refcnt in question should be converted to a perpcu_ref for blk-mq anyway, so the atomic_t is likely to go away pretty soon anyway. Thanks. ------- 8< ------- __blkg_release_rcu() may be invoked after the associated request_queue is released with a RCU grace period inbetween. As such, the function and callbacks invoked from it must not dereference the associated request_queue. This is clearly indicated in the comment above the function. Unfortunately, while trying to fix a different issue, 2a4fd070ee85 ("blkcg: move bulk of blkcg_gq release operations to the RCU callback") ignored this and added [un]locking of @blkg->q->queue_lock to __blkg_release_rcu(). This of course can cause oops as the request_queue may be long gone by the time this code gets executed. general protection fault: 0000 [#1] SMP CPU: 21 PID: 30 Comm: rcuos/21 Not tainted 3.15.0 #1 Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.3:57 12/25/2013 task: ffff880854021de0 ti: ffff88085403c000 task.ti: ffff88085403c000 RIP: 0010:[<ffffffff8162e9e5>] [<ffffffff8162e9e5>] _raw_spin_lock_irq+0x15/0x60 RSP: 0018:ffff88085403fdf0 EFLAGS: 00010086 RAX: 0000000000020000 RBX: 0000000000000010 RCX: 0000000000000000 RDX: 000060ef80008248 RSI: 0000000000000286 RDI: 6b6b6b6b6b6b6b6b RBP: ffff88085403fdf0 R08: 0000000000000286 R09: 0000000000009f39 R10: 0000000000020001 R11: 0000000000020001 R12: ffff88103c17a130 R13: ffff88103c17a080 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88107fca0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006e5ab8 CR3: 000000000193d000 CR4: 00000000000407e0 Stack: ffff88085403fe18 ffffffff812cbfc2 ffff88103c17a130 0000000000000000 ffff88103c17a130 ffff88085403fec0 ffffffff810d1d28 ffff880854021de0 ffff880854021de0 ffff88107fcaec58 ffff88085403fe80 ffff88107fcaec30 Call Trace: [<ffffffff812cbfc2>] __blkg_release_rcu+0x72/0x150 [<ffffffff810d1d28>] rcu_nocb_kthread+0x1e8/0x300 [<ffffffff81091d81>] kthread+0xe1/0x100 [<ffffffff8163813c>] ret_from_fork+0x7c/0xb0 Code: ff 47 04 48 8b 7d 08 be 00 02 00 00 e8 55 48 a4 ff 5d c3 0f 1f 00 66 66 66 66 90 55 48 89 e5 +fa 66 66 90 66 66 90 b8 00 00 02 00 <f0> 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f +b7 RIP [<ffffffff8162e9e5>] _raw_spin_lock_irq+0x15/0x60 RSP <ffff88085403fdf0> The request_queue locking was added because blkcg_gq->refcnt is an int protected with the queue lock and __blkg_release_rcu() needs to put the parent. Let's fix it by making blkcg_gq->refcnt an atomic_t and dropping queue locking in the function. Given the general heavy weight of the current request_queue and blkcg operations, this is unlikely to cause any noticeable overhead. Moreover, blkcg_gq->refcnt is likely to be converted to percpu_ref in the near future, so whatever (most likely negligible) overhead it may add is temporary. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/g/alpine.DEB.2.02.1406081816540.17948@jlaw-desktop.mno.stratus.com Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-31blktrace: fix accounting of partially completed requestsRoman Pen
commit af5040da01ef980670b3741b3e10733ee3e33566 upstream. trace_block_rq_complete does not take into account that request can be partially completed, so we can get the following incorrect output of blkparser: C R 232 + 240 [0] C R 240 + 232 [0] C R 248 + 224 [0] C R 256 + 216 [0] but should be: C R 232 + 8 [0] C R 240 + 8 [0] C R 248 + 8 [0] C R 256 + 8 [0] Also, the whole output summary statistics of completed requests and final throughput will be incorrect. This patch takes into account real completion size of the request and fixes wrong completion accounting. Signed-off-by: Roman Pen <r.peniaev@gmail.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@redhat.com> CC: linux-kernel@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-20block: free q->flush_rq in blk_init_allocated_queue error pathsDave Jones
Commit 7982e90c3a57 ("block: fix q->flush_rq NULL pointer crash on dm-mpath flush") moved an allocation to blk_init_allocated_queue(), but neglected to free that allocation on the error paths that follow. Signed-off-by: Dave Jones <davej@fedoraproject.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-08block: change flush sequence list addition back to front addMike Snitzer
Commit 18741986 inadvertently changed the rq flush insertion from a head to a tail insertion. Fix that back up. Signed-off-by: Mike Snitzer <msnitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-03-08block: fix q->flush_rq NULL pointer crash on dm-mpath flushMike Snitzer
Commit 1874198 ("blk-mq: rework flush sequencing logic") switched ->flush_rq from being an embedded member of the request_queue structure to being dynamically allocated in blk_init_queue_node(). Request-based DM multipath doesn't use blk_init_queue_node(), instead it uses blk_alloc_queue_node() + blk_init_allocated_queue(). Because commit 1874198 placed the dynamic allocation of ->flush_rq in blk_init_queue_node() any flush issued to a dm-mpath device would crash with a NULL pointer, e.g.: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8125037e>] blk_rq_init+0x1e/0xb0 PGD bb3c7067 PUD bb01d067 PMD 0 Oops: 0002 [#1] SMP ... CPU: 5 PID: 5028 Comm: dt Tainted: G W O 3.14.0-rc3.snitm+ #10 ... task: ffff88032fb270e0 ti: ffff880079564000 task.ti: ffff880079564000 RIP: 0010:[<ffffffff8125037e>] [<ffffffff8125037e>] blk_rq_init+0x1e/0xb0 RSP: 0018:ffff880079565c98 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030 RDX: ffff880260c74048 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff880079565ca8 R08: ffff880260aa1e98 R09: 0000000000000001 R10: ffff88032fa78500 R11: 0000000000000246 R12: 0000000000000000 R13: ffff880260aa1de8 R14: 0000000000000650 R15: 0000000000000000 FS: 00007f8d36a2a700(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000079b36000 CR4: 00000000000007e0 Stack: 0000000000000000 ffff880260c74048 ffff880079565cd8 ffffffff81257a47 ffff880260aa1de8 ffff880260c74048 0000000000000001 0000000000000000 ffff880079565d08 ffffffff81257c2d 0000000000000000 ffff880260aa1de8 Call Trace: [<ffffffff81257a47>] blk_flush_complete_seq+0x2d7/0x2e0 [<ffffffff81257c2d>] blk_insert_flush+0x1dd/0x210 [<ffffffff8124ec59>] __elv_add_request+0x1f9/0x320 [<ffffffff81250681>] ? blk_account_io_start+0x111/0x190 [<ffffffff81253a4b>] blk_queue_bio+0x25b/0x330 [<ffffffffa0020bf5>] dm_request+0x35/0x40 [dm_mod] [<ffffffff812530c0>] generic_make_request+0xc0/0x100 [<ffffffff81253173>] submit_bio+0x73/0x140 [<ffffffff811becdd>] submit_bio_wait+0x5d/0x80 [<ffffffff81257528>] blkdev_issue_flush+0x78/0xa0 [<ffffffff811c1f6f>] blkdev_fsync+0x3f/0x60 [<ffffffff811b7fde>] vfs_fsync_range+0x1e/0x20 [<ffffffff811b7ffc>] vfs_fsync+0x1c/0x20 [<ffffffff811b81f1>] do_fsync+0x41/0x80 [<ffffffff8118874e>] ? SyS_lseek+0x7e/0x80 [<ffffffff811b8260>] SyS_fsync+0x10/0x20 [<ffffffff8154c2d2>] system_call_fastpath+0x16/0x1b Fix this by moving the ->flush_rq allocation from blk_init_queue_node() to blk_init_allocated_queue(). blk_init_queue_node() also calls blk_init_allocated_queue() so this change is functionality equivalent for all blk_init_queue_node() callers. Reported-by: Hannes Reinecke <hare@suse.de> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-03-07blk-mq: add REQ_SYNC earlyShaohua Li
Add REQ_SYNC early, so rq_dispatched[] in blk_mq_rq_ctx_init is set correctly. Signed-off-by: Shaohua Li<shli@fusionio.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-03-03rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlockMike Galbraith
[ 365.164040] BUG: sleeping function called from invalid context at kernel/rtmutex.c:674 [ 365.164041] in_atomic(): 1, irqs_disabled(): 1, pid: 26, name: migration/1 [ 365.164043] no locks held by migration/1/26. [ 365.164044] irq event stamp: 6648 [ 365.164056] hardirqs last enabled at (6647): [<ffffffff8153d377>] restore_args+0x0/0x30 [ 365.164062] hardirqs last disabled at (6648): [<ffffffff810ed98d>] multi_cpu_stop+0x9d/0x120 [ 365.164070] softirqs last enabled at (0): [<ffffffff810543bc>] copy_process.part.28+0x6fc/0x1920 [ 365.164072] softirqs last disabled at (0): [< (null)>] (null) [ 365.164076] CPU: 1 PID: 26 Comm: migration/1 Tainted: GF N 3.12.12-rt19-0.gcb6c4a2-rt #3 [ 365.164078] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011 [ 365.164091] 0000000000000001 ffff880a42ea7c30 ffffffff815367e6 ffffffff81a086c0 [ 365.164099] ffff880a42ea7c40 ffffffff8108919c ffff880a42ea7c60 ffffffff8153c24f [ 365.164107] ffff880a42ea91f0 00000000ffffffe1 ffff880a42ea7c88 ffffffff81297ec0 [ 365.164108] Call Trace: [ 365.164119] [<ffffffff810060b1>] try_stack_unwind+0x191/0x1a0 [ 365.164127] [<ffffffff81004872>] dump_trace+0x92/0x360 [ 365.164133] [<ffffffff81006108>] show_trace_log_lvl+0x48/0x60 [ 365.164138] [<ffffffff81004c18>] show_stack_log_lvl+0xd8/0x1d0 [ 365.164143] [<ffffffff81006160>] show_stack+0x20/0x50 [ 365.164153] [<ffffffff815367e6>] dump_stack+0x54/0x9a [ 365.164163] [<ffffffff8108919c>] __might_sleep+0xfc/0x140 [ 365.164173] [<ffffffff8153c24f>] rt_spin_lock+0x1f/0x70 [ 365.164182] [<ffffffff81297ec0>] blk_mq_main_cpu_notify+0x20/0x70 [ 365.164191] [<ffffffff81540a1c>] notifier_call_chain+0x4c/0x70 [ 365.164201] [<ffffffff81083499>] __raw_notifier_call_chain+0x9/0x10 [ 365.164207] [<ffffffff810567be>] cpu_notify+0x1e/0x40 [ 365.164217] [<ffffffff81525da2>] take_cpu_down+0x22/0x40 [ 365.164223] [<ffffffff810ed9c6>] multi_cpu_stop+0xd6/0x120 [ 365.164229] [<ffffffff810edd97>] cpu_stopper_thread+0xd7/0x1e0 [ 365.164235] [<ffffffff810863a3>] smpboot_thread_fn+0x203/0x380 [ 365.164241] [<ffffffff8107cbf8>] kthread+0xc8/0xd0 [ 365.164250] [<ffffffff8154440c>] ret_from_fork+0x7c/0xb0 [ 365.164429] smpboot: CPU 1 is now offline Signed-off-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-21blk-mq: support partial I/O completionsChristoph Hellwig
Add a new blk_mq_end_io_partial function to partially complete requests as needed by the SCSI layer. We do this by reusing blk_update_request to advance the bio instead of having a simplified version of it in the blk-mq code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-21blk-mq: merge blk_mq_insert_request and blk_mq_run_requestChristoph Hellwig
It's almost identical to blk_mq_insert_request, so fold the two into one slightly more generic function by making the flush special case a bit smarted. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-21blk-mq: remove blk_mq_alloc_rqChristoph Hellwig
There's only one caller, which is a straight wrapper and fits the naming scheme of the related functions a lot better. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-14Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO fixes from Jens Axboe: "Second round of updates and fixes for 3.14-rc2. Most of this stuff has been queued up for a while. The notable exception is the blk-mq changes, which are naturally a bit more in flux still. The pull request contains: - Two bug fixes for the new immutable vecs, causing crashes with raid or swap. From Kent. - Various blk-mq tweaks and fixes from Christoph. A fix for integrity bio's from Nic. - A few bcache fixes from Kent and Darrick Wong. - xen-blk{front,back} fixes from David Vrabel, Matt Rushton, Nicolas Swenson, and Roger Pau Monne. - Fix for a vec miscount with integrity vectors from Martin. - Minor annotations or fixes from Masanari Iida and Rashika Kheria. - Tweak to null_blk to do more normal FIFO processing of requests from Shlomo Pongratz. - Elevator switching bypass fix from Tejun. - Softlockup in blkdev_issue_discard() fix when !CONFIG_PREEMPT from me" * 'for-linus' of git://git.kernel.dk/linux-block: (31 commits) block: add cond_resched() to potentially long running ioctl discard loop xen-blkback: init persistent_purge_work work_struct blk-mq: pair blk_mq_start_request / blk_mq_requeue_request blk-mq: dont assume rq->errors is set when returning an error from ->queue_rq block: Fix cloning of discard/write same bios block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_show blk-mq: rework flush sequencing logic null_blk: use blk_complete_request and blk_mq_complete_request virtio_blk: use blk_mq_complete_request blk-mq: rework I/O completions fs: Add prototype declaration to appropriate header file include/linux/bio.h fs: Mark function as static in fs/bio-integrity.c block/null_blk: Fix completion processing from LIFO to FIFO block: Explicitly handle discard/write same segments block: Fix nr_vecs for inline integrity vectors blk-mq: Add bio_integrity setup to blk_mq_make_request blk-mq: initialize sg_reserved_size blk-mq: handle dma_drain_size blk-mq: divert __blk_put_request for MQ ops blk-mq: support at_head inserations for blk_execute_rq ...
2014-02-12block: add cond_resched() to potentially long running ioctl discard loopJens Axboe
When mkfs issues a full device discard and the device only supports discards of a smallish size, we can loop in blkdev_issue_discard() for a long time. If preempt isn't enabled, this can turn into a softlock situation and the kernel will start complaining. Add an explicit cond_resched() at the end of the loop to avoid that. Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-11blk-mq: pair blk_mq_start_request / blk_mq_requeue_requestChristoph Hellwig
Make sure we have a proper pairing between starting and requeueing requests. Move the dma drain and REQ_END setup into blk_mq_start_request, and make sure blk_mq_requeue_request properly undoes them, giving us a pair of function to prepare and unprepare a request without leaving side effects. Together this ensures we always clean up properly after BLK_MQ_RQ_QUEUE_BUSY returns from ->queue_rq. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-11blk-mq: dont assume rq->errors is set when returning an error from ->queue_rqChristoph Hellwig
rq->errors never has been part of the communication protocol between drivers and the block stack and most drivers will not have initialized it. Return -EIO to upper layers when the driver returns BLK_MQ_RQ_QUEUE_ERROR unconditionally. If a driver want to return a different error it can easily do so by returning success after calling blk_mq_end_io itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-10block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_showMasanari Iida
cppcheck detected following format string mismatch. [blk-mq-tag.c:201]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'int'. Change "cpu" from int to unsigned int, because the cpu never become minus value. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-10blk-mq: rework flush sequencing logicChristoph Hellwig
Witch to using a preallocated flush_rq for blk-mq similar to what's done with the old request path. This allows us to set up the request properly with a tag from the actually allowed range and ->rq_disk as needed by some drivers. To make life easier we also switch to dynamic allocation of ->flush_rq for the old path. This effectively reverts most of "blk-mq: fix for flush deadlock" and "blk-mq: Don't reserve a tag for flush request" Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-10blk-mq: rework I/O completionsChristoph Hellwig
Rework I/O completions to work more like the old code path. blk_mq_end_io now stays out of the business of deferring completions to others CPUs and calling blk_mark_rq_complete. The latter is very important to allow completing requests that have timed out and thus are already marked completed, the former allows using the IPI callout even for driver specific completions instead of having to reimplement them. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-07block: Explicitly handle discard/write same segmentsKent Overstreet
Immutable biovecs changed the way biovecs are interpreted - drivers no longer use bi_vcnt, they have to go by bi_iter.bi_size (to allow for using part of an existing segment without modifying it). This breaks with discards and write_same bios, since for those bi_size has nothing to do with segments in the biovec. So for now, we need a fairly gross hack - we fortunately know that there will never be more than one segment for the entire request, so we can special case discard/write_same. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-07blk-mq: Add bio_integrity setup to blk_mq_make_requestNicholas Bellinger
This patch adds the missing bio_integrity_enabled() + bio_integrity_prep() setup into blk_mq_make_request() in order to use DIF protection with scsi-mq. Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-07blk-mq: initialize sg_reserved_sizeChristoph Hellwig
To behave the same way as the old request path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-07blk-mq: handle dma_drain_sizeChristoph Hellwig
Make blk-mq handle the dma_drain_size field the same way as the old request path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-07blk-mq: divert __blk_put_request for MQ opsChristoph Hellwig
__blk_put_request needs to call into the blk-mq code just like blk_put_request. As we don't have the queue lock in this case both end up calling the same function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-07blk-mq: support at_head inserations for blk_execute_rqChristoph Hellwig
This is neede for proper SG_IO operation as well as various uses of blk_execute_rq from the SCSI midlayer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-01-31Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "The highlights this round include: - add support for SCSI Referrals (Hannes) - add support for T10 DIF into target core (nab + mkp) - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab) - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab) - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi) - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn) - allow percpu_ida_alloc() to receive task state bitmask (Kent) - fix >= v3.12 iscsi-target session reset hung task regression (nab) - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab) - fix a long-standing network portal creation race (Andy)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits) target: Fix percpu_ref_put race in transport_lun_remove_cmd target/iscsi: Fix network portal creation race target: Report bad sector in sense data for DIF errors iscsi-target: Convert gfp_t parameter to task state bitmask iscsi-target: Fix connection reset hang with percpu_ida_alloc percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask iscsi-target: Pre-allocate more tags to avoid ack starvation qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO. qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine IB/isert: Move fastreg descriptor creation to a function IB/isert: Avoid frwr notation, user fastreg IB/isert: seperate connection protection domains and dma MRs tcm_loop: Enable DIF/DIX modes in SCSI host LLD target/rd: Add DIF protection into rd_execute_rw target/rd: Add support for protection SGL setup + release target/rd: Refactor rd_build_device_space + rd_release_device_space target/file: Add DIF protection support to fd_execute_rw target/file: Add DIF protection init/format support ...
2014-01-30block: __elv_next_request() shouldn't call into the elevator if bypassingTejun Heo
request_queue bypassing is used to suppress higher-level function of a request_queue so that they can be switched, reconfigured and shut down. A request_queue does the followings while bypassing. * bypasses elevator and io_cq association and queues requests directly to the FIFO dispatch queue. * bypasses block cgroup request_list lookup and always uses the root request_list. Once confirmed to be bypassing, specific elevator and block cgroup policy implementations can assume that nothing is in flight for them and perform various operations which would be dangerous otherwise. Such confirmation is acheived by short-circuiting all new requests directly to the dispatch queue and waiting for all the requests which were issued before to finish. Unfortunately, while the request allocating and draining sides were properly handled, we forgot to actually plug the request dispatch path. Even after bypassing mode is confirmed, if the attached driver tries to fetch a request and the dispatch queue is empty, __elv_next_request() would invoke the current elevator's elevator_dispatch_fn() callback. As all in-flight requests were drained, the elevator wouldn't contain any request but once bypass is confirmed we don't even know whether the elevator is even there. It might be in the process of being switched and half torn down. Frank Mayhar reports that this actually happened while switching elevators, leading to an oops. Let's fix it by making __elv_next_request() avoid invoking the elevator_dispatch_fn() callback if the queue is bypassing. It already avoids invoking the callback if the queue is dying. As a dying queue is guaranteed to be bypassing, we can simply replace blk_queue_dying() check with blk_queue_bypass(). Reported-by: Frank Mayhar <fmayhar@google.com> References: http://lkml.kernel.org/g/1390319905.20232.38.camel@bobble.lax.corp.google.com Cc: stable@vger.kernel.org Tested-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-30blk-mq: Don't reserve a tag for flush requestShaohua Li
Reserving a tag (request) for flush to avoid dead lock is a overkill. A tag is valuable resource. We can track the number of flush requests and disallow having too many pending flush requests allocated. With this patch, blk_mq_alloc_request_pinned() could do a busy nop (but not a dead loop) if too many pending requests are allocated and new flush request is allocated. But this should not be a problem, too many pending flush requests are very rare case. I verified this can fix the deadlock caused by too many pending flush requests. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-30Merge branch 'for-3.14/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO driver changes from Jens Axboe: - bcache update from Kent Overstreet. - two bcache fixes from Nicholas Swenson. - cciss pci init error fix from Andrew. - underflow fix in the parallel IDE pg_write code from Dan Carpenter. I'm sure the 1 (or 0) users of that are now happy. - two PCI related fixes for sx8 from Jingoo Han. - floppy init fix for first block read from Jiri Kosina. - pktcdvd error return miss fix from Julia Lawall. - removal of IRQF_SHARED from the SEGA Dreamcast CD-ROM code from Michael Opdenacker. - comment typo fix for the loop driver from Olaf Hering. - potential oops fix for null_blk from Raghavendra K T. - two fixes from Sam Bradshaw (Micron) for the mtip32xx driver, fixing an OOM problem and a problem with handling security locked conditions * 'for-3.14/drivers' of git://git.kernel.dk/linux-block: (47 commits) mg_disk: Spelling s/finised/finished/ null_blk: Null pointer deference problem in alloc_page_buffers mtip32xx: Correctly handle security locked condition mtip32xx: Make SGL container per-command to eliminate high order dma allocation drivers/block/loop.c: fix comment typo in loop_config_discard drivers/block/cciss.c:cciss_init_one(): use proper errnos drivers/block/paride/pg.c: underflow bug in pg_write() drivers/block/sx8.c: remove unnecessary pci_set_drvdata() drivers/block/sx8.c: use module_pci_driver() floppy: bail out in open() if drive is not responding to block0 read bcache: Fix auxiliary search trees for key size > cacheline size bcache: Don't return -EINTR when insert finished bcache: Improve bucket_prio() calculation bcache: Add bch_bkey_equal_header() bcache: update bch_bkey_try_merge bcache: Move insert_fixup() to btree_keys_ops bcache: Convert sorting to btree_keys bcache: Convert debug code to btree_keys bcache: Convert btree_iter to struct btree_keys bcache: Refactor bset_tree sysfs stats ...
2014-01-30Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...
2014-01-28block/blk-mq-cpu.c: use hotcpu_notifier()Andrew Morton
Cleaner, reduces text size when cpu hotplug is disabled. Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-23percpu_ida: Make percpu_ida_alloc + callers accept task state bitmaskKent Overstreet
This patch changes percpu_ida_alloc() + callers to accept task state bitmask for prepare_to_wait() for code like target/iscsi that needs it for interruptible sleep, that is provided in a subsequent patch. It now expects TASK_UNINTERRUPTIBLE when the caller is able to sleep waiting for a new tag, or TASK_RUNNING when the caller cannot sleep, and is forced to return a negative value when no tags are available. v2 changes: - Include blk-mq + tcm_fc + vhost/scsi + target/iscsi changes - Drop signal_pending_state() call v3 changes: - Only call prepare_to_wait() + finish_wait() when != TASK_RUNNING (PeterZ) Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: <stable@vger.kernel.org> #3.12+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-21block: Fix memory leak in rw_copy_check_uvector() handlingChristian Engelmayer
Fix a memory leak in the error handling path of function sg_io() that is used during the processing of scsi ioctl. Memory already allocated by rw_copy_check_uvector() needs to be freed correctly. Detected by Coverity: CID 1128953. Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-21block: remove unrelated header files and export symbolCaiZhiyong
Fix up the following items: - remove unrelated header files. - export interface function. - modify function cmdline_parts_parse return value, this will make it more friendly for the caller. Signed-off-by: CaiZhiyong <caizhiyong@huawei.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> CC: Brian Norris <computersforpeace@gmail.com> Cc: "Wanglin (Albert)" <albert.wanglin@hisilicon.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Karel Zak <kzak@redhat.com> Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-21Merge branch 'for-3.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "The bulk of changes are cleanups and preparations for the upcoming kernfs conversion. - cgroup_event mechanism which is and will be used only by memcg is moved to memcg. - pidlist handling is updated so that it can be served by seq_file. Also, the list is not sorted if sane_behavior. cgroup documentation explicitly states that the file is not sorted but it has been for quite some time. - All cgroup file handling now happens on top of seq_file. This is to prepare for kernfs conversion. In addition, all operations are restructured so that they map 1-1 to kernfs operations. - Other cleanups and low-pri fixes" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits) cgroup: trivial style updates cgroup: remove stray references to css_id doc: cgroups: Fix typo in doc/cgroups cgroup: fix fail path in cgroup_load_subsys() cgroup: fix missing unlock on error in cgroup_load_subsys() cgroup: remove for_each_root_subsys() cgroup: implement for_each_css() cgroup: factor out cgroup_subsys_state creation into create_css() cgroup: combine css handling loops in cgroup_create() cgroup: reorder operations in cgroup_create() cgroup: make for_each_subsys() useable under cgroup_root_mutex cgroup: css iterations and css_from_dir() are safe under cgroup_mutex cgroup: unify pidlist and other file handling cgroup: replace cftype->read_seq_string() with cftype->seq_show() cgroup: attach cgroup_open_file to all cgroup files cgroup: generalize cgroup_pidlist_open_file cgroup: unify read path so that seq_file is always used cgroup: unify cgroup_write_X64() and cgroup_write_string() cgroup: remove cftype->read(), ->read_map() and ->write() hugetlb_cgroup: convert away from cftype->read() ...
2014-01-08blk-mq: uses page->list incorrectlyDave Hansen
'struct page' has two list_head fields: 'lru' and 'list'. Conveniently, they are unioned together. This means that code can use them interchangably, which gets horribly confusing. The blk-mq made the logical decision to try to use page->list. But, that field was actually introduced just for the slub code. ->lru is the right field to use outside of slab/slub. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-08blk-mq: use __smp_call_function_single directlyChristoph Hellwig
__smp_call_function_single already avoids multiple IPIs by internally queing up the items, and now also is available for non-SMP builds as a trivially correct stub, so there is no need to wrap it. If the additional lock roundtrip cause problems my patch to convert the generic IPI code to llists is waiting to get merged will fix it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-08bcache/md: Use raid stripe sizeKent Overstreet
Now that we've got code for raid5/6 stripe awareness, bcache just needs to know about the stripes and when writing partial stripes is expensive - we probably don't want to enable this optimization for raid1 or 10, even though they have stripes. So add a flag to queue_limits. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2014-01-03blk-mq: fix initializing request's start timeMing Lei
blk_rq_init() is called in req's complete handler to initialize the request, so the members of start_time and start_time_ns might become inaccurate when it is allocated in future. The patch initializes the two members in blk_mq_rq_ctx_init() to fix the problem. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-31block: blk-mq: don't export blk_mq_free_queue()Ming Lei
blk_mq_free_queue() is called from release handler of queue kobject, so it needn't be called from drivers. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-31block: blk-mq: make blk_sync_queue support mqMing Lei
This patch moves synchronization on mq->delay_work from blk_mq_free_queue() to blk_sync_queue(), so that blk_sync_queue can work on mq. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-31block: blk-mq: support draining mq queueMing Lei
blk_mq_drain_queue() is introduced so that we can drain mq queue inside blk_cleanup_queue(). Also don't accept new requests any more if queue is marked as dying. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-31Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe
Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c