summaryrefslogtreecommitdiff
path: root/fs/ceph
AgeCommit message (Collapse)Author
2012-11-05ceph: avoid 32-bit page index overflowAlex Elder
commit 6285bc231277419255f3498d3eb5ddc9f8e7fe79 upstream. A pgoff_t is defined (by default) to have type (unsigned long). On architectures such as i686 that's a 32-bit type. The ceph address space code was attempting to produce 64 bit offsets by shifting a page's index by PAGE_CACHE_SHIFT, but the result was not what was desired because the shift occurred before the result got promoted to 64 bits. Fix this by converting all uses of page->index used in this way to use the page_offset() macro, which ensures the 64-bit result has the intended value. This fixes http://tracker.newdream.net/issues/3112 Reported-by: Mohamed Pakkeer <pakkeer.mohideen@realimage.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-05ceph: Fix oops when handling mdsmap that decreases max_mdsYan, Zheng
commit 3e8f43a089f06279c5f76a9ccd42578eebf7bfa5 upstream. When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) return NULL. Passing NULL to memcmp() triggers oops. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-05ceph: fix dentry reference leak in encode_fh()David Zafman
commit 52eb5a900a9863a8b77a895f770e5d825c8e02c6 upstream. Call to d_find_alias() needs a corresponding dput() This fixes http://tracker.newdream.net/issues/3271 Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-21tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checkingHugh Dickins
commit 35c2a7f4908d404c9124c2efc6ada4640ca4d5d5 upstream. Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(), u64 inum = fid->raw[2]; which is unhelpfully reported as at the end of shmem_alloc_inode(): BUG: unable to handle kernel paging request at ffff880061cd3000 IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Call Trace: [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0 [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0 [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10 [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6 Right, tmpfs is being stupid to access fid->raw[2] before validating that fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may fall at the end of a page, and the next page not be present. But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and could oops in the same way: add the missing fh_len checks to those. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Sage Weil <sage@inktank.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-21ceph: avoid divide by zero in __validate_layout()Sage Weil
If "l->stripe_unit" is zero the the mod on the next line will cause a divide by zero bug. This comes from the copy_from_user() in ceph_ioctl_set_layout_policy(). Passing 0 is valid, though (it means "do not change") so avoid the % check in that case. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-21ceph: tolerate (and warn on) extraneous dentry from mdsSage Weil
If the MDS gives us a dentry and we weren't prepared to handle it, WARN_ON_ONCE instead of crashing. Reported-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-20libceph: delay debugfs initialization until we learn global_idSage Weil
The debugfs directory includes the cluster fsid and our unique global_id. We need to delay the initialization of the debug entry until we have learned both the fsid and our global_id from the monitor or else the second client can't create its debugfs entry and will fail (and multiple client instances aren't properly reflected in debugfs). Reported by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-02ceph: simplify+fix atomic_openSage Weil
The initial ->atomic_open op was carried over from the old intent code, which was incomplete and didn't really work. Replace it with a fresh method. In particular: * always attempt to do an atomic open+lookup, both for the create case and for lookups of existing files. * fix symlink handling by returning 1 to the VFS so that we can follow the link to its destination. This fixes a longstanding ceph bug (#2392). Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is *not* dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...
2012-07-31Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "Lots of stuff this time around: - lots of cleanup and refactoring in the libceph messenger code, and many hard to hit races and bugs closed as a result. - lots of cleanup and refactoring in the rbd code from Alex Elder, mostly in preparation for the layering functionality that will be coming in 3.7. - some misc rbd cleanups from Josh Durgin that are finally going upstream - support for CRUSH tunables (used by newer clusters to improve the data placement) - some cleanup in our use of d_parent that Al brought up a while back - a random collection of fixes across the tree There is another patch coming that fixes up our ->atomic_open() behavior, but I'm going to hammer on it a bit more before sending it." Fix up conflicts due to commits that were already committed earlier in drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits) rbd: create rbd_refresh_helper() rbd: return obj version in __rbd_refresh_header() rbd: fixes in rbd_header_from_disk() rbd: always pass ops array to rbd_req_sync_op() rbd: pass null version pointer in add_snap() rbd: make rbd_create_rw_ops() return a pointer rbd: have __rbd_add_snap_dev() return a pointer libceph: recheck con state after allocating incoming message libceph: change ceph_con_in_msg_alloc convention to be less weird libceph: avoid dropping con mutex before fault libceph: verify state after retaking con lock after dispatch libceph: revoke mon_client messages on session restart libceph: fix handling of immediate socket connect failure ceph: update MAINTAINERS file libceph: be less chatty about stray replies libceph: clear all flags on con_close libceph: clean up con flags libceph: replace connection state bits with states libceph: drop unnecessary CLOSED check in socket state change callback libceph: close socket directly from ceph_con_close() ...
2012-07-30ceph: define snap counts as u32 everywhereAlex Elder
There are two structures in which a count of snapshots are maintained: struct ceph_snap_context { ... u32 num_snaps; ... } and struct ceph_snap_realm { ... u32 num_prior_parent_snaps; /* had prior to parent_since */ ... u32 num_snaps; ... } These fields never take on negative values (e.g., to hold special meaning), and so are really inherently unsigned. Furthermore they take their value from over-the-wire or on-disk formatted 32-bit values. So change their definition to have type u32, and change some spots elsewhere in the code to account for this change. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30ceph: fix potential double freeAlan Cox
We re-run the loop but we don't re-set the attrs pointer back to NULL. Signed-off-by: Alan Cox <alan@linux.intel.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30ceph: close old con before reopening on mds reconnectSage Weil
When we detect a mds session reset, close the old ceph_connection before reopening it. This ensures we clean up the old socket properly and keep the ceph_connection state correct. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30libceph: move feature bits to separate headerSage Weil
This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-31ceph: Push file_update_time() into ceph_page_mkwrite()Jan Kara
CC: Sage Weil <sage@newdream.net> CC: ceph-devel@vger.kernel.org Acked-by: Sage Weil <sage@newdream.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-30ceph: clean up useless d_parent checksSage Weil
d_parent is never NULL, and IS_ROOT() is the proper way to check for a (non-self-referential) parent. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-14VFS: Pass mount flags to sget()David Howells
Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14don't pass nameidata to ->create()Al Viro
boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14stop passing nameidata to ->lookup()Al Viro
Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14stop passing nameidata * to ->d_revalidate()Al Viro
Just the lookup flags. Die, bastard, die... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14make finish_no_open() return intAl Viro
namely, 1 ;-) That's what we want to return from ->atomic_open() instances after finish_no_open(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14kill struct opendataAl Viro
Just pass struct file *. Methods are happier that way... There's no need to return struct file * from finish_open() now, so let it return int. Next: saner prototypes for parts in namei.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14make ->atomic_open() return intAl Viro
Change of calling conventions: old new NULL 1 file 0 ERR_PTR(-ve) -ve Caller *knows* that struct file *; no need to return it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14->atomic_open() prototype change - pass int * instead of bool *Al Viro
... and let finish_open() report having opened the file via that sucker. Next step: don't modify od->filp at all. [AV: FILE_CREATE was already used by cifs; Miklos' fix folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14ceph: implement i_op->atomic_open()Miklos Szeredi
Add an ->atomic_open implementation which replaces the atomic lookup+open+create operation implemented via ->lookup and ->create operations. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14ceph: remove unused arg from ceph_lookup_open()Miklos Szeredi
What was the purpose of this? Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-05libceph: set peer name on con_open, not initSage Weil
The peer name may change on each open attempt, even when the connection is reused. Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-20ceph: check PG_Private flag before accessing page->privateYan, Zheng
I got lots of NULL pointer dereference Oops when compiling kernel on ceph. The bug is because the kernel page migration routine replaces some pages in the page cache with new pages, these new pages' private can be non-zero. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 28c0254ede13ab575d2df5c6585ed3d4817c3e6b)
2012-06-15Merge tag 'v3.5-rc1'Sage Weil
Linux 3.5-rc1 Conflicts: net/ceph/messenger.c
2012-06-06libceph: fully initialize connection in con_init()Alex Elder
Move the initialization of a ceph connection's private pointer, operations vector pointer, and peer name information into ceph_con_init(). Rearrange the arguments so the connection pointer is first. Hide the byte-swapping of the peer entity number inside ceph_con_init() Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-06-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs changes from Al Viro. "A lot of misc stuff. The obvious groups: * Miklos' atomic_open series; kills the damn abuse of ->d_revalidate() by NFS, which was the major stumbling block for all work in that area. * ripping security_file_mmap() and dealing with deadlocks in the area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in general. * ->encode_fh() switched to saner API; insane fake dentry in mm/cleancache.c gone. * assorted annotations in fs (endianness, __user) * parts of Artem's ->s_dirty work (jff2 and reiserfs parts) * ->update_time() work from Josef. * other bits and pieces all over the place. Normally it would've been in two or three pull requests, but signal.git stuff had eaten a lot of time during this cycle ;-/" Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the 'truncate_range' inode method was removed by the VM changes, the VFS update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due to sparse fix added twice, with other changes nearby). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits) nfs: don't open in ->d_revalidate vfs: retry last component if opening stale dentry vfs: nameidata_to_filp(): don't throw away file on error vfs: nameidata_to_filp(): inline __dentry_open() vfs: do_dentry_open(): don't put filp vfs: split __dentry_open() vfs: do_last() common post lookup vfs: do_last(): add audit_inode before open vfs: do_last(): only return EISDIR for O_CREAT vfs: do_last(): check LOOKUP_DIRECTORY vfs: do_last(): make ENOENT exit RCU safe vfs: make follow_link check RCU safe vfs: do_last(): use inode variable vfs: do_last(): inline walk_component() vfs: do_last(): make exit RCU safe vfs: split do_lookup() Btrfs: move over to use ->update_time fs: introduce inode operation ->update_time reiserfs: get rid of resierfs_sync_super reiserfs: mark the superblock as dirty a bit later ...
2012-06-01libceph: embed ceph messenger structure in ceph_clientAlex Elder
A ceph client has a pointer to a ceph messenger structure in it. There is always exactly one ceph messenger for a ceph client, so there is no need to allocate it separate from the ceph client structure. Switch the ceph_client structure to embed its ceph_messenger structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-06-01ceph: check PG_Private flag before accessing page->privateYan, Zheng
I got lots of NULL pointer dereference Oops when compiling kernel on ceph. The bug is because the kernel page migration routine replaces some pages in the page cache with new pages, these new pages' private can be non-zero. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-31Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge misc patches from Andrew Morton: - the "misc" tree - stuff from all over the map - checkpatch updates - fatfs - kmod changes - procfs - cpumask - UML - kexec - mqueue - rapidio - pidns - some checkpoint-restore feature work. Reluctantly. Most of it delayed a release. I'm still rather worried that we don't have a clear roadmap to completion for this work. * emailed from Andrew Morton <akpm@linux-foundation.org>: (78 patches) kconfig: update compression algorithm info c/r: prctl: add ability to set new mm_struct::exe_file c/r: prctl: extend PR_SET_MM to set up more mm_struct entries c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat syscalls, x86: add __NR_kcmp syscall fs, proc: introduce /proc/<pid>/task/<tid>/children entry sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() eventfd: change int to __u64 in eventfd_signal() fs/nls: add Apple NLS pidns: make killed children autoreap pidns: use task_active_pid_ns in do_notify_parent rapidio/tsi721: add DMA engine support rapidio: add DMA engine support for RIO data transfers ipc/mqueue: add rbtree node caching support tools/selftests: add mq_perf_tests ipc/mqueue: strengthen checks on mqueue creation ipc/mqueue: correct mq_attr_ok test ipc/mqueue: improve performance of send/recv selftests: add mq_open_tests ...
2012-05-31introduce SIZE_MAXXi Wang
ULONG_MAX is often used to check for integer overflow when calculating allocation size. While ULONG_MAX happens to work on most systems, there is no guarantee that `size_t' must be the same size as `long'. This patch introduces SIZE_MAX, the maximum value of `size_t', to improve portability and readability for allocation size validation. Signed-off-by: Xi Wang <xi.wang@gmail.com> Acked-by: Alex Elder <elder@dreamhost.com> Cc: David Airlie <airlied@linux.ie> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds
Pull ceph updates from Sage Weil: "There are some updates and cleanups to the CRUSH placement code, a bug fix with incremental maps, several cleanups and fixes from Josh Durgin in the RBD block device code, a series of cleanups and bug fixes from Alex Elder in the messenger code, and some miscellaneous bounds checking and gfp cleanups/fixes." Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the networking people preferring "unsigned int" over just "unsigned". * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits) libceph: fix pg_temp updates libceph: avoid unregistering osd request when not registered ceph: add auth buf in prepare_write_connect() ceph: rename prepare_connect_authorizer() ceph: return pointer from prepare_connect_authorizer() ceph: use info returned by get_authorizer ceph: have get_authorizer methods return pointers ceph: ensure auth ops are defined before use ceph: messenger: reduce args to create_authorizer ceph: define ceph_auth_handshake type ceph: messenger: check return from get_authorizer ceph: messenger: rework prepare_connect_authorizer() ceph: messenger: check prepare_write_connect() result ceph: don't set WRITE_PENDING too early ceph: drop msgr argument from prepare_write_connect() ceph: messenger: send banner in process_connect() ceph: messenger: reset connection kvec caller libceph: don't reset kvec in prepare_write_banner() ceph: ignore preferred_osd field ceph: fully initialize new layout ...
2012-05-29ceph: move encode_fh to new APISage Weil
Use parent_inode has a flag for whether nfsd wants a connectable fh, but generate one opportunistically so that we can take advantage of the additional info in there. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29->encode_fh() API changeAl Viro
pass inode + parent's inode or NULL instead of dentry + bool saying whether we want the parent or not. NOTE: that needs ceph fix folded in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-17ceph: use info returned by get_authorizerAlex Elder
Rather than passing a bunch of arguments to be filled in with the content of the ceph_auth_handshake buffer now returned by the get_authorizer method, just use the returned information in the caller, and drop the unnecessary arguments. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-05-17ceph: have get_authorizer methods return pointersAlex Elder
Have the get_authorizer auth_client method return a ceph_auth pointer rather than an integer, pointer-encoding any returned error value. This is to pave the way for making use of the returned value in an upcoming patch. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-05-17ceph: ensure auth ops are defined before useAlex Elder
In the create_authorizer method for both the mds and osd clients, the auth_client->ops pointer is blindly dereferenced. There is no obvious guarantee that this pointer has been assigned. And furthermore, even if the ops pointer is non-null there is definitely no guarantee that the create_authorizer or destroy_authorizer methods are defined. Add checks in both routines to make sure they are defined (non-null) before use. Add similar checks in a few other spots in these files while we're at it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-05-17ceph: messenger: reduce args to create_authorizerAlex Elder
Make use of the new ceph_auth_handshake structure in order to reduce the number of arguments passed to the create_authorizor method in ceph_auth_client_ops. Use a local variable of that type as a shorthand in the get_authorizer method definitions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-05-17ceph: define ceph_auth_handshake typeAlex Elder
The definitions for the ceph_mds_session and ceph_osd both contain five fields related only to "authorizers." Encapsulate those fields into their own struct type, allowing for better isolation in some upcoming patches. Fix the #includes in "linux/ceph/osd_client.h" to lay out their more complete canonical path. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-05-16ceph: ignore preferred_osd fieldSage Weil
Old users may not expect EINVAL, and there is no clear user-visibile behavior change now that we ignore it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-05-16ceph: fully initialize new layoutSage Weil
When we are setting a new layout, fully initialize the structure: - zero it out - always set preferred_osd to -1 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-05-07ceph: refactor SETLAYOUT and SETDIRLAYOUT ioctl checks into common helperSage Weil
Both of these methods perform similar checks; move that code to a helper so that we can ensure the checks are consistent. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-07ceph: drop support for preferred_osd pgsSage Weil
This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
2012-03-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates for 3.4-rc1 from Sage Weil: "Alex has been busy. There are a range of rbd and libceph cleanups, especially surrounding device setup and teardown, and a few critical fixes in that code. There are more cleanups in the messenger code, virtual xattrs, a fix for CRC calculation/checks, and lots of other miscellaneous stuff. There's a patch from Amon Ott to make inos behave a bit better on 32-bit boxes, some decode check fixes from Xi Wang, and network throttling fix from Jim Schutt, and a couple RBD fixes from Josh Durgin. No new functionality, just a lot of cleanup and bug fixing." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits) rbd: move snap_rwsem to the device, rename to header_rwsem ceph: fix three bugs, two in ceph_vxattrcb_file_layout() libceph: isolate kmap() call in write_partial_msg_pages() libceph: rename "page_shift" variable to something sensible libceph: get rid of zero_page_address libceph: only call kernel_sendpage() via helper libceph: use kernel_sendpage() for sending zeroes libceph: fix inverted crc option logic libceph: some simple changes libceph: small refactor in write_partial_kvec() libceph: do crc calculations outside loop libceph: separate CRC calculation from byte swapping libceph: use "do" in CRC-related Boolean variables ceph: ensure Boolean options support both senses libceph: a few small changes libceph: make ceph_tcp_connect() return int libceph: encapsulate some messenger cleanup code libceph: make ceph_msgr_wq private libceph: encapsulate connection kvec operations libceph: move prepare_write_banner() ...
2012-03-22ceph: fix three bugs, two in ceph_vxattrcb_file_layout()Alex Elder
In ceph_vxattrcb_file_layout(), there is a check to determine whether a preferred PG should be formatted into the output buffer. That check assumes that a preferred PG number of 0 indicates "no preference," but that is wrong. No preference is indicated by a negative (specifically, -1) PG number. In addition, if that condition yields true, the preferred value is formatted into a sized buffer, but the size consumed by the earlier snprintf() call is not accounted for, opening up the possibilty of a buffer overrun. Finally, in ceph_vxattrcb_dir_rctime() where the nanoseconds part of the time displayed did not include leading 0's, which led to erroneous (sub-second portion of) time values being shown. This fixes these three issues: http://tracker.newdream.net/issues/2155 http://tracker.newdream.net/issues/2156 http://tracker.newdream.net/issues/2157 Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22ceph: ensure Boolean options support both sensesAlex Elder
Many ceph-related Boolean options offer the ability to both enable and disable a feature. For all those that don't offer this, add a new option so that they do. Note that ceph_show_options()--which reports mount options currently in effect--only reports the option if it is different from the default value. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>