summaryrefslogtreecommitdiff
path: root/net/ceph
AgeCommit message (Collapse)Author
2018-12-08libceph: check authorizer reply/challenge length before readingIlya Dryomov
commit 130f52f2b203aa0aec179341916ffb2e905f3afd upstream. Avoid scribbling over memory if the received reply/challenge is larger than the buffer supplied with the authorizer. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: weaken sizeof check in ceph_x_verify_authorizer_reply()Ilya Dryomov
commit f1d10e04637924f2b00a0fecdd2ca4565f5cfc3f upstream. Allow for extending ceph_x_authorize_reply in the future. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: implement CEPHX_V2 calculation modeIlya Dryomov
commit cc255c76c70f7a87d97939621eae04b600d9f4a1 upstream. Derive the signature from the entire buffer (both AES cipher blocks) instead of using just the first half of the first block, leaving out data_crc entirely. This addresses CVE-2018-1129. Link: http://tracker.ceph.com/issues/24837 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> [bwh: Backported to 4.9: - Define and test the feature bit in the old way - Don't change any other feature bits in ceph_features.h] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: add authorizer challengeIlya Dryomov
commit 6daca13d2e72bedaaacfc08f873114c9307d5aea upstream. When a client authenticates with a service, an authorizer is sent with a nonce to the service (ceph_x_authorize_[ab]) and the service responds with a mutation of that nonce (ceph_x_authorize_reply). This lets the client verify the service is who it says it is but it doesn't protect against a replay: someone can trivially capture the exchange and reuse the same authorizer to authenticate themselves. Allow the service to reject an initial authorizer with a random challenge (ceph_x_authorize_challenge). The client then has to respond with an updated authorizer proving they are able to decrypt the service's challenge and that the new authorizer was produced for this specific connection instance. The accepting side requires this challenge and response unconditionally if the client side advertises they have CEPHX_V2 feature bit. This addresses CVE-2018-1128. Link: http://tracker.ceph.com/issues/24836 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: factor out encrypt_authorizer()Ilya Dryomov
commit 149cac4a50b0b4081b38b2f38de6ef71c27eaa85 upstream. Will be used for encrypting both the initial and updated authorizers. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: factor out __ceph_x_decrypt()Ilya Dryomov
commit c571fe24d243bfe7017f0e67fe800b3cc2a1d1f7 upstream. Will be used for decrypting the server challenge which is only preceded by ceph_x_encrypt_header. Drop struct_v check to allow for extending ceph_x_encrypt_header in the future. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: factor out __prepare_write_connect()Ilya Dryomov
commit c0f56b483aa09c99bfe97409a43ad786f33b8a5a upstream. Will be used for sending ceph_msg_connect with an updated authorizer, after the server challenges the initial authorizer. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: store ceph_auth_handshake pointer in ceph_connectionIlya Dryomov
commit 262614c4294d33b1f19e0d18c0091d9c329b544a upstream. We already copy authorizer_reply_buf and authorizer_reply_buf_len into ceph_connection. Factoring out __prepare_write_connect() requires two more: authorizer_buf and authorizer_buf_len. Store the pointer to the handshake in con->auth rather than piling on. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: no need to drop con->mutex for ->get_authorizer()Ilya Dryomov
commit b3bbd3f2ab19c8ca319003b4b51ce4c4ca74da06 upstream. ->get_authorizer(), ->verify_authorizer_reply(), ->sign_message() and ->check_message_signature() shouldn't be doing anything with or on the connection (like closing it or sending messages). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08libceph: drop len argument of *verify_authorizer_reply()Ilya Dryomov
commit 0dde584882ade13dc9708d611fbf69b0ae8a9e48 upstream. The length of the reply is protocol-dependent - for cephx it's ceph_x_authorize_reply. Nothing sensible can be passed from the messenger layer anyway. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-27libceph: fall back to sendmsg for slab pagesIlya Dryomov
commit 7e241f647dc7087a0401418a187f3f5b527cc690 upstream. skb_can_coalesce() allows coalescing neighboring slab objects into a single frag: return page == skb_frag_page(frag) && off == frag->page_offset + skb_frag_size(frag); ceph_tcp_sendpage() can be handed slab pages. One example of this is XFS: it passes down sector sized slab objects for its metadata I/O. If the kernel client is co-located on the OSD node, the skb may go through loopback and pop on the receive side with the exact same set of frags. When tcp_recvmsg() attempts to copy out such a frag, hardened usercopy complains because the size exceeds the object's allocated size: usercopy: kernel memory exposure attempt detected from ffff9ba917f20a00 (kmalloc-512) (1024 bytes) Although skb_can_coalesce() could be taught to return false if the resulting frag would cross a slab object boundary, we already have a fallback for non-refcounted pages. Utilize it for slab pages too. Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-01libceph: validate con->state at the top of try_write()Ilya Dryomov
commit 9c55ad1c214d9f8c4594ac2c3fa392c1c32431a7 upstream. ceph_con_workfn() validates con->state before calling try_read() and then try_write(). However, try_read() temporarily releases con->mutex, notably in process_message() and ceph_con_in_msg_alloc(), opening the window for ceph_con_close() to sneak in, close the connection and release con->sock. When try_write() is called on the assumption that con->state is still valid (i.e. not STANDBY or CLOSED), a NULL sock gets passed to the networking stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: selinux_socket_sendmsg+0x5/0x20 Make sure con->state is valid at the top of try_write() and add an explicit BUG_ON for this, similar to try_read(). Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/23706 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-01libceph: reschedule a tick in finish_hunting()Ilya Dryomov
commit 7b4c443d139f1d2b5570da475f7a9cbcef86740c upstream. If we go without an established session for a while, backoff delay will climb to 30 seconds. The keepalive timeout is also 30 seconds, so it's pretty easily hit after a prolonged hunting for a monitor: we don't get a chance to send out a keepalive in time, which means we never get back a keepalive ack in time, cutting an established session and attempting to connect to a different monitor every 30 seconds: [Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established [Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon [Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established [Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon [Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established [Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon [Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established [Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon The regular keepalive interval is 10 seconds. After ->hunting is cleared in finish_hunting(), call __schedule_delayed() to ensure we send out a keepalive after 10 seconds. Cc: stable@vger.kernel.org # 4.7+ Link: http://tracker.ceph.com/issues/23537 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-01libceph: un-backoff on tick when we have a authenticated sessionIlya Dryomov
commit facb9f6eba3df4e8027301cc0e514dc582a1b366 upstream. This means that if we do some backoff, then authenticate, and are healthy for an extended period of time, a subsequent failure won't leave us starting our hunting sequence with a large backoff. Mirrors ceph.git commit d466bc6e66abba9b464b0b69687cf45c9dccf383. Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13libceph: NULL deref on crush_decode() error pathDan Carpenter
[ Upstream commit 293dffaad8d500e1a5336eeb90d544cf40d4fbd8 ] If there is not enough space then ceph_decode_32_safe() does a goto bad. We need to return an error code in that situation. The current code returns ERR_PTR(0) which is NULL. The callers are not expecting that and it results in a NULL dereference. Fixes: f24e9980eb86 ("ceph: OSD client") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30libceph: don't WARN() if user tries to add invalid keyEric Biggers
commit b11270853fa3654f08d4a6a03b23ddb220512d8d upstream. The WARN_ON(!key->len) in set_secret() in net/ceph/crypto.c is hit if a user tries to add a key of type "ceph" with an invalid payload as follows (assuming CONFIG_CEPH_LIB=y): echo -e -n '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' \ | keyctl padd ceph desc @s This can be hit by fuzzers. As this is merely bad input and not a kernel bug, replace the WARN_ON() with return -EINVAL. Fixes: 7af3ea189a9a ("libceph: stop allocating a new cipher on every crypto request") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08libceph: force GFP_NOIO for socket allocationsIlya Dryomov
commit 633ee407b9d15a75ac9740ba9d3338815e1fcb95 upstream. sock_alloc_inode() allocates socket+inode and socket_wq with GFP_KERNEL, which is not allowed on the writeback path: Workqueue: ceph-msgr con_work [libceph] ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000 0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00 ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148 Call Trace: [<ffffffff816dd629>] schedule+0x29/0x70 [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200 [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120 [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70 [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180 [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390 [<ffffffff81086335>] flush_work+0x165/0x250 [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0 [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs] [<ffffffff816d6b42>] ? __slab_free+0xee/0x234 [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs] [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30 [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs] [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs] [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs] [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs] [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40 [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs] [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs] [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs] [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs] [<ffffffff811c0c18>] super_cache_scan+0x178/0x180 [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340 [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450 [<ffffffff8115af70>] shrink_slab+0x100/0x140 [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490 [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0 [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40 [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110 [<ffffffff811a0ac5>] new_slab+0x2c5/0x390 [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459 [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0 [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0 [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0 [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0 [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0 [<ffffffff811d8566>] alloc_inode+0x26/0xa0 [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70 [<ffffffff815b933e>] sock_alloc+0x1e/0x80 [<ffffffff815ba855>] __sock_create+0x95/0x220 [<ffffffff815baa04>] sock_create_kern+0x24/0x30 [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph] [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd] [<ffffffff81084c19>] process_one_work+0x159/0x4f0 [<ffffffff8108561b>] worker_thread+0x11b/0x530 [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0 [<ffffffff8108b6f9>] kthread+0xc9/0xe0 [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90 [<ffffffff816e1b98>] ret_from_fork+0x58/0x90 [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90 Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here. Link: http://tracker.ceph.com/issues/19309 Reported-by: Sergey Jerusalimov <wintchester@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30libceph: don't set weight to IN when OSD is destroyedIlya Dryomov
commit b581a5854eee4b7851dedb0f8c2ceb54fb902c06 upstream. Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info, osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted. This changes the result of applying an incremental for clients, not just OSDs. Because CRUSH computations are obviously affected, pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on object placement, resulting in misdirected requests. Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f. Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals") Link: http://tracker.ceph.com/issues/19122 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12ceph: update readpages osd request according to size of pagesYan, Zheng
commit d641df819db8b80198fd85d9de91137e8a823b07 upstream. add_to_page_cache_lru() can fails, so the actual pages to read can be smaller than the initial size of osd request. We need to update osd request size in that case. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: stop allocating a new cipher on every crypto requestIlya Dryomov
commit 7af3ea189a9a13f090de51c97f676215dabc1205 upstream. This is useless and more importantly not allowed on the writeback path, because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which can recurse back into the filesystem: kworker/9:3 D ffff92303f318180 0 20732 2 0x00000080 Workqueue: ceph-msgr ceph_con_workfn [libceph] ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318 ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480 00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8 Call Trace: [<ffffffff951eb4e1>] ? schedule+0x31/0x80 [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10 [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130 [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30 [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs] [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270 [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0 [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120 [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs] [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190 [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0 [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320 [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350 [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0 [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60 [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560 [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0 [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130 [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580 [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph] [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130 [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60 [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc] [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130 [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180 [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0 [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150 [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120 [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph] [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph] [<ffffffff950d40a0>] ? release_sock+0x40/0x90 [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0 [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph] [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph] [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph] [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph] [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph] [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph] [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph] [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph] [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph] [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd] [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410 [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480 [<ffffffff94c95170>] ? process_one_work+0x410/0x410 [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0 [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40 [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190 Allocating the cipher along with the key fixes the issue - as long the key doesn't change, a single cipher context can be used concurrently in multiple requests. We still can't take that GFP_KERNEL allocation though. Both ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here. Reported-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: uninline ceph_crypto_key_destroy()Ilya Dryomov
commit 6db2304aabb070261ad34923bfd83c43dfb000e3 upstream. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: make sure ceph_aes_crypt() IV is alignedIlya Dryomov
commit 124f930b8cbc4ac11236e6eb1c5f008318864588 upstream. ... otherwise the crypto stack will align it for us with a GFP_ATOMIC allocation and a memcpy() -- see skcipher_walk_first(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: remove now unused ceph_*{en,de}crypt*() functionsIlya Dryomov
commit 2b1e1a7cd0a615d57455567a549f9965023321b5 upstream. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: switch ceph_x_decrypt() to ceph_crypt()Ilya Dryomov
commit e15fd0a11db00fc7f470a9fc804657ec3f6d04a5 upstream. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: switch ceph_x_encrypt() to ceph_crypt()Ilya Dryomov
commit d03857c63bb036edff0aa7a107276360173aca4e upstream. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: tweak calcu_signature() a littleIlya Dryomov
commit 4eb4517ce7c9c573b6c823de403aeccb40018cfc upstream. - replace an ad-hoc array with a struct - rename to calc_signature() for consistency Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: rename and align ceph_x_authorizer::reply_bufIlya Dryomov
commit 7882a26d2e2e520099e2961d5e2e870f8e4172dc upstream. It's going to be used as a temporary buffer for in-place en/decryption with ceph_crypt() instead of on-stack buffers, so rename to enc_buf. Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: introduce ceph_crypt() for in-place en/decryptionIlya Dryomov
commit a45f795c65b479b4ba107b6ccde29b896d51ee98 upstream. Starting with 4.9, kernel stacks may be vmalloced and therefore not guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK option is enabled by default on x86. This makes it invalid to use on-stack buffers with the crypto scatterlist API, as sg_set_buf() expects a logical address and won't work with vmalloced addresses. There isn't a different (e.g. kvec-based) crypto API we could switch net/ceph/crypto.c to and the current scatterlist.h API isn't getting updated to accommodate this use case. Allocating a new header and padding for each operation is a non-starter, so do the en/decryption in-place on a single pre-assembled (header + data + padding) heap buffer. This is explicitly supported by the crypto API: "... the caller may provide the same scatter/gather list for the plaintext and cipher text. After the completion of the cipher operation, the plaintext data is replaced with the ciphertext data in case of an encryption and vice versa for a decryption." Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: introduce ceph_x_encrypt_offset()Ilya Dryomov
commit 55d9cc834f933698fc864f0d36f3cca533d30a8d upstream. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: old_key in process_one_ticket() is redundantIlya Dryomov
commit 462e650451c577d15eeb4d883d70fa9e4e529fad upstream. Since commit 0a990e709356 ("ceph: clean up service ticket decoding"), th->session_key isn't assigned until everything is decoded. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: ceph_x_encrypt_buflen() takes in_lenIlya Dryomov
commit 36721ece1e84a25130c4befb930509b3f96de020 upstream. Pass what's going to be encrypted - that's msg_b, not ticket_blob. ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change the maxlen calculation, but makes it a bit clearer. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09libceph: verify authorize reply on connectIlya Dryomov
commit 5c056fdc5b474329037f2aa18401bd73033e0ce0 upstream. After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b), the client gets back a ceph_x_authorize_reply, which it is supposed to verify to ensure the authenticity and protect against replay attacks. The code for doing this is there (ceph_x_verify_authorizer_reply(), ceph_auth_verify_authorizer_reply() + plumbing), but it is never invoked by the the messenger. AFAICT this goes back to 2009, when ceph authentication protocols support was added to the kernel client in 4e7a5dcd1bba ("ceph: negotiate authentication protocol; implement AUTH_NONE protocol"). The second param of ceph_connection_operations::verify_authorizer_reply is unused all the way down. Pass 0 to facilitate backporting, and kill it in the next commit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-10libceph: initialize last_linger_id with a large integerIlya Dryomov
osdc->last_linger_id is a counter for lreq->linger_id, which is used for watch cookies. Starting with a large integer should ease the task of telling apart kernel and userspace clients. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-11-10libceph: fix legacy layout decode with pool 0Yan, Zheng
If your data pool was pool 0, ceph_file_layout_from_legacy() transform that to -1 unconditionally, which broke upgrades. We only want do that for a fully zeroed ceph_file_layout, so that it still maps to a file_layout_t. If any fields are set, though, we trust the fl_pgpool to be a valid pool. Fixes: 7627151ea30bc ("libceph: define new ceph_file_layout structure") Link: http://tracker.ceph.com/issues/17825 Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-18mm: replace get_user_pages_unlocked() write/force parameters with gup_flagsLorenzo Stoakes
This removes the 'write' and 'force' use from get_user_pages_unlocked() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-05crush: remove redundant local variableIlya Dryomov
Remove extra x1 variable, it's just temporary placeholder that clutters the code unnecessarily. Reflects ceph.git commit 0d19408d91dd747340d70287b4ef9efd89e95c6b. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-05crush: don't normalize input of crush_ln iterativelyIlya Dryomov
Use __builtin_clz() supported by GCC and Clang to figure out how many bits we should shift instead of shifting by a bit in a loop until the value gets normalized. Improves performance of this function by up to 3x in worst-case scenario and overall straw2 performance by ~10%. Reflects ceph.git commit 110de33ca497d94fc4737e5154d3fe781fa84a0a. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03libceph: ceph_build_auth() doesn't need ceph_auth_build_hello()Ilya Dryomov
A static bug finder (EBA) on Linux 4.7: Double lock in net/ceph/auth.c second lock at 108: mutex_lock(& ac->mutex); [ceph_auth_build_hello] after calling from 263: ret = ceph_auth_build_hello(ac, msg_buf, msg_len); if ! ac->protocol -> true at 262 first lock at 261: mutex_lock(& ac->mutex); [ceph_build_auth] ceph_auth_build_hello() is never called, because the protocol is always initialized, whether we are checking existing tickets (in delayed_work()) or getting new ones after invalidation (in invalidate_authorizer()). Reported-by: Iago Abal <iari@itu.dk> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-08-24rbd: add 'client_addr' sysfs rbd device attributeIlya Dryomov
Export client addr/nonce, so userspace can check if a image is being blacklisted. Signed-off-by: Mike Christie <mchristi@redhat.com> [idryomov@gmail.com: ceph_client_addr(), endianess fix] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-08-24rbd: support for exclusive-lock featureIlya Dryomov
Add basic support for RBD_FEATURE_EXCLUSIVE_LOCK feature. Maintenance operations (resize, snapshot create, etc) are offloaded to librbd via returning -EOPNOTSUPP - librbd should request the lock and execute the operation. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Tested-by: Mike Christie <mchristi@redhat.com>
2016-08-24rbd: retry watch re-registration periodicallyIlya Dryomov
Revamp watch code to support retrying watch re-registration: - add rbd_dev->watch_state for more robust errcb handling - store watch cookie separately to avoid dereferencing watch_handle which is set to NULL on unwatch - move re-register code into a delayed work and retry re-registration every second, unless the client is blacklisted Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Tested-by: Mike Christie <mchristi@redhat.com>
2016-08-24libceph: rename ceph_client_id() -> ceph_client_gid()Ilya Dryomov
It's gid / global_id in other places. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-24libceph: support for blacklisting clientsDouglas Fuller
Reuse ceph_mon_generic_request infrastructure for sending monitor commands. In particular, add support for 'blacklist add' to prevent other, non-responsive clients from making further updates. Signed-off-by: Douglas Fuller <dfuller@redhat.com> [idryomov@gmail.com: refactor, misc fixes throughout] Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-24libceph: support for lock.lock_infoDouglas Fuller
Add an interface for the Ceph OSD lock.lock_info method and associated data structures. Based heavily on code by Mike Christie <michaelc@cs.wisc.edu>. Signed-off-by: Douglas Fuller <dfuller@redhat.com> [idryomov@gmail.com: refactor, misc fixes throughout] Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-24libceph: support for advisory locking on RADOS objectsDouglas Fuller
This patch adds support for rados lock, unlock and break lock. Based heavily on code by Mike Christie <michaelc@cs.wisc.edu>. Signed-off-by: Douglas Fuller <dfuller@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-24libceph: add ceph_osdc_call() single-page helperDouglas Fuller
Add a convenience function to osd_client to send Ceph OSD 'class' ops. The interface assumes that the request and reply data each consist of single pages. Signed-off-by: Douglas Fuller <dfuller@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-24libceph: support for CEPH_OSD_OP_LIST_WATCHERSDouglas Fuller
Add support for this Ceph OSD op, needed to support the RBD exclusive lock feature. Signed-off-by: Douglas Fuller <dfuller@redhat.com> [idryomov@gmail.com: refactor, misc fixes throughout] Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-24libceph: rename ceph_entity_name_encode() -> ceph_auth_entity_name_encode()Ilya Dryomov
Clear up EntityName vs entity_name_t confusion. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-08libceph: using kfree_rcu() to simplify the codeWei Yongjun
The callback function of call_rcu() just calls a kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>