summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-11-04ext3: return 32/64-bit dir name hash according to usage typeEric Sandeen
commit d7dab39b6e16d5eea78ed3c705d2a2d0772b4f06 upstream. This is based on commit d1f5273e9adb40724a85272f248f210dc4ce919a ext4: return 32/64-bit dir name hash according to usage type by Fan Yong <yong.fan@whamcloud.com> Traditionally ext2/3/4 has returned a 32-bit hash value from llseek() to appease NFSv2, which can only handle a 32-bit cookie for seekdir() and telldir(). However, this causes problems if there are 32-bit hash collisions, since the NFSv2 server can get stuck resending the same entries from the directory repeatedly. Allow ext3 to return a full 64-bit hash (both major and minor) for telldir to decrease the chance of hash collisions. This patch does implement a new ext3_dir_llseek op, because with 64-bit hashes, nfs will attempt to seek to a hash "offset" which is much larger than ext3's s_maxbytes. So for dx dirs, we call generic_file_llseek_size() with the appropriate max hash value as the maximum seekable size. Otherwise we just pass through to generic_file_llseek(). Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Patch-updated-by: Eric Sandeen <sandeen@redhat.com> (blame us if something is not correct) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-22ext4: fix memory leak in xattrDave Jones
commit 6e4ea8e33b2057b85d75175dd89b93f5e26de3bc upstream. If we take the 2nd retry path in ext4_expand_extra_isize_ea, we potentionally return from the function without having freed these allocations. If we don't do the return, we over-write the previous allocation pointers, so we leak either way. Spotted with Coverity. [ Fixed by tytso to set is and bs to NULL after freeing these pointers, in case in the retry loop we later end up triggering an error causing a jump to cleanup, at which point we could have a double free bug. -- Ted ] Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-22vfs: allow O_PATH file descriptors for fstatfs()Linus Torvalds
commit 9d05746e7b16d8565dddbe3200faa1e669d23bbf upstream. Olga reported that file descriptors opened with O_PATH do not work with fstatfs(), found during further development of ksh93's thread support. There is no reason to not allow O_PATH file descriptors here (fstatfs is very much a path operation), so use "fdget_raw()". See commit 55815f70147d ("vfs: make O_PATH file descriptors usable for 'fstat()'") for a very similar issue reported for fstat() by the same team. Reported-and-tested-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13ext4: avoid hang when mounting non-journal filesystems with orphan listTheodore Ts'o
commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream. When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) -js: This is a fix for CVE-2013-2015. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13Btrfs: change how we queue blocks for backref checkingJosef Bacik
commit b6c60c8018c4e9beb2f83fc82c09f9d033766571 upstream. Previously we only added blocks to the list to have their backrefs checked if the level of the block is right above the one we are searching for. This is because we want to make sure we don't add the entire path up to the root to the lists to make sure we process things one at a time. This assumes that if any blocks in the path to the root are going to be not checked (shared in other words) then they will be in the level right above the current block on up. This isn't quite right though since we can have blocks higher up the list that are shared because they are attached to a reloc root. But we won't add this block to be checked and then later on we will BUG_ON(!upper->checked). So instead keep track of wether or not we've queued a block to be checked in this current search, and if we haven't go ahead and queue it to be checked. This patch fixed the panic I was seeing where we BUG_ON(!upper->checked). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-01fanotify: dont merge permission eventsLino Sanfilippo
commit 03a1cec1f17ac1a6041996b3e40f96b5a2f90e1b upstream. Boyd Yang reported a problem for the case that multiple threads of the same thread group are waiting for a reponse for a permission event. In this case it is possible that some of the threads are never woken up, even if the response for the event has been received (see http://marc.info/?l=linux-kernel&m=131822913806350&w=2). The reason is that we are currently merging permission events if they belong to the same thread group. But we are not prepared to wake up more than one waiter for each event. We do wait_event(group->fanotify_data.access_waitq, event->response || atomic_read(&group->fanotify_data.bypass_perm)); and after that event->response = 0; which is the reason that even if we woke up all waiters for the same event some of them may see event->response being already set 0 again, then go back to sleep and block forever. With this patch we avoid that more than one thread is waiting for a response by not merging permission events for the same thread group any more. Reported-by: Boyd Yang <boyd.yang@gmail.com> Signed-off-by: Lino Sanfilippo <LinoSanfilipp@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com> Cc: Mihai Donțu <mihai.dontu@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26fuse: invalidate inode attributes on xattr modificationAnand Avati
commit d331a415aef98717393dda0be69b7947da08eba3 upstream. Calls like setxattr and removexattr result in updation of ctime. Therefore invalidate inode attributes to force a refresh. Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26fuse: postpone end_page_writeback() in fuse_writepage_locked()Maxim Patlasov
commit 4a4ac4eba1010ef9a804569058ab29e3450c0315 upstream. The patch fixes a race between ftruncate(2), mmap-ed write and write(2): 1) An user makes a page dirty via mmap-ed write. 2) The user performs shrinking truncate(2) intended to purge the page. 3) Before fuse_do_setattr calls truncate_pagecache, the page goes to writeback. fuse_writepage_locked fills FUSE_WRITE request and releases the original page by end_page_writeback. 4) fuse_do_setattr() completes and successfully returns. Since now, i_mutex is free. 5) Ordinary write(2) extends i_size back to cover the page. Note that fuse_send_write_pages do wait for fuse writeback, but for another page->index. 6) fuse_writepage_locked proceeds by queueing FUSE_WRITE request. fuse_send_writepage is supposed to crop inarg->size of the request, but it doesn't because i_size has already been extended back. Moving end_page_writeback to the end of fuse_writepage_locked fixes the race because now the fact that truncate_pagecache is successfully returned infers that fuse_writepage_locked has already called end_page_writeback. And this, in turn, infers that fuse_flush_writepages has already called fuse_send_writepage, and the latter used valid (shrunk) i_size. write(2) could not extend it because of i_mutex held by ftruncate(2). Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26isofs: Refuse RW mount of the filesystem instead of making it ROJan Kara
commit 17b7f7cf58926844e1dd40f5eb5348d481deca6a upstream. Refuse RW mount of isofs filesystem. So far we just silently changed it to RO mount but when the media is writeable, block layer won't notice this change and thus will think device is used RW and will block eject button of the drive. That is unexpected by users because for non-writeable media eject button works just fine. Userspace mount(8) command handles this just fine and retries mounting with MS_RDONLY set so userspace shouldn't see any regression. Plus any tool mounting isofs is likely confronted with the case of read-only media where block layer already refuses to mount the filesystem without MS_RDONLY set so our behavior shouldn't be anything new for it. Reported-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26ocfs2: fix the end cluster offset of FIEMAPJie Liu
commit 28e8be31803b19d0d8f76216cb11b480b8a98bec upstream. Call fiemap ioctl(2) with given start offset as well as an desired mapping range should show extents if possible. However, we somehow figure out the end offset of mapping via 'mapping_end -= cpos' before iterating the extent records which would cause problems if the given fiemap length is too small to a cluster size, e.g, Cluster size 4096: debugfs.ocfs2 1.6.3 Block Size Bits: 12 Cluster Size Bits: 12 The extended fiemap test utility From David: https://gist.github.com/anonymous/6172331 # dd if=/dev/urandom of=/ocfs2/test_file bs=1M count=1000 # ./fiemap /ocfs2/test_file 4096 10 start: 4096, length: 10 File /ocfs2/test_file has 0 extents: # Logical Physical Length Flags ^^^^^ <-- No extent is shown In this case, at ocfs2_fiemap(): cpos == mapping_end == 1. Hence the loop of searching extent records was not executed at all. This patch remove the in question 'mapping_end -= cpos', and loops until the cpos is larger than the mapping_end as usual. # ./fiemap /ocfs2/test_file 4096 10 start: 4096, length: 10 File /ocfs2/test_file has 1 extents: # Logical Physical Length Flags 0: 0000000000000000 0000000056a01000 0000000006a00000 0000 Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reported-by: David Weber <wb@munzinger.de> Tested-by: David Weber <wb@munzinger.de> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fashen <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26cifs: ensure that srv_mutex is held when dealing with ssocket pointerJeff Layton
commit 73e216a8a42c0ef3d08071705c946c38fdbe12b0 upstream. Oleksii reported that he had seen an oops similar to this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 IP: [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 PGD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ipt_MASQUERADE xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables carl9170 ath usb_storage f2fs nfnetlink_log nfnetlink md4 cifs dns_resolver hid_generic usbhid hid af_packet uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev rfcomm btusb bnep bluetooth qmi_wwan qcserial cdc_wdm usb_wwan usbnet usbserial mii snd_hda_codec_hdmi snd_hda_codec_realtek iwldvm mac80211 coretemp intel_powerclamp kvm_intel kvm iwlwifi snd_hda_intel cfg80211 snd_hda_codec xhci_hcd e1000e ehci_pci snd_hwdep sdhci_pci snd_pcm ehci_hcd microcode psmouse sdhci thinkpad_acpi mmc_core i2c_i801 pcspkr usbcore hwmon snd_timer snd_page_alloc snd ptp rfkill pps_core soundcore evdev usb_common vboxnetflt(O) vboxdrv(O)Oops#2 Part8 loop tun binfmt_misc fuse msr acpi_call(O) ipv6 autofs4 CPU: 0 PID: 21612 Comm: kworker/0:1 Tainted: G W O 3.10.1SIGN #28 Hardware name: LENOVO 2306CTO/2306CTO, BIOS G2ET92WW (2.52 ) 02/22/2013 Workqueue: cifsiod cifs_echo_request [cifs] task: ffff8801e1f416f0 ti: ffff880148744000 task.ti: ffff880148744000 RIP: 0010:[<ffffffff814dcc13>] [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 RSP: 0000:ffff880148745b00 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880148745b78 RCX: 0000000000000048 RDX: ffff880148745c90 RSI: ffff880181864a00 RDI: ffff880148745b78 RBP: ffff880148745c48 R08: 0000000000000048 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880181864a00 R13: ffff880148745c90 R14: 0000000000000048 R15: 0000000000000048 FS: 0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000088 CR3: 000000020c42c000 CR4: 00000000001407b0 Oops#2 Part7 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880148745b30 ffffffff810c4af9 0000004848745b30 ffff880181864a00 ffffffff81ffbc40 0000000000000000 ffff880148745c90 ffffffff810a5aab ffff880148745bc0 ffffffff81ffbc40 ffff880148745b60 ffffffff815a9fb8 Call Trace: [<ffffffff810c4af9>] ? finish_task_switch+0x49/0xe0 [<ffffffff810a5aab>] ? lock_timer_base.isra.36+0x2b/0x50 [<ffffffff815a9fb8>] ? _raw_spin_unlock_irqrestore+0x18/0x40 [<ffffffff810a673f>] ? try_to_del_timer_sync+0x4f/0x70 [<ffffffff815aa38f>] ? _raw_spin_unlock_bh+0x1f/0x30 [<ffffffff814dcc87>] kernel_sendmsg+0x37/0x50 [<ffffffffa081a0e0>] smb_send_kvec+0xd0/0x1d0 [cifs] [<ffffffffa081a263>] smb_send_rqst+0x83/0x1f0 [cifs] [<ffffffffa081ab6c>] cifs_call_async+0xec/0x1b0 [cifs] [<ffffffffa08245e0>] ? free_rsp_buf+0x40/0x40 [cifs] Oops#2 Part6 [<ffffffffa082606e>] SMB2_echo+0x8e/0xb0 [cifs] [<ffffffffa0808789>] cifs_echo_request+0x79/0xa0 [cifs] [<ffffffff810b45b3>] process_one_work+0x173/0x4a0 [<ffffffff810b52a1>] worker_thread+0x121/0x3a0 [<ffffffff810b5180>] ? manage_workers.isra.27+0x2b0/0x2b0 [<ffffffff810bae00>] kthread+0xc0/0xd0 [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120 [<ffffffff815b199c>] ret_from_fork+0x7c/0xb0 [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120 Code: 84 24 b8 00 00 00 4c 89 f1 4c 89 ea 4c 89 e6 48 89 df 4c 89 60 18 48 c7 40 28 00 00 00 00 4c 89 68 30 44 89 70 14 49 8b 44 24 28 <ff> 90 88 00 00 00 3d ef fd ff ff 74 10 48 8d 65 e0 5b 41 5c 41 RIP [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 RSP <ffff880148745b00> CR2: 0000000000000088 The client was in the middle of trying to send a frame when the server->ssocket pointer got zeroed out. In most places, that we access that pointer, the srv_mutex is held. There's only one spot that I see that the server->ssocket pointer gets set and the srv_mutex isn't held. This patch corrects that. The upstream bug report was here: https://bugzilla.kernel.org/show_bug.cgi?id=60557 Reported-by: Oleksii Shevchuk <alxchk@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-07SCSI: sg: Fix user memory corruption when SG_IO is interrupted by a signalRoland Dreier
commit 35dc248383bbab0a7203fca4d722875bc81ef091 upstream. There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp->read_wait, (srp_done(sfp, srp) || sdp->detached)); but neither srp_done() nor sdp->detached is true, so we end up just setting srp->orphan and returning to userspace: srp->orphan = 1; write_unlock_irq(&sfp->rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(&sfp->rq_list_lock, iflags); if (unlikely(srp->orphan)) { if (sfp->keep_orphan) srp->sg_io_owned = 0; else done = 0; } srp->done = done; write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); kref_put(&sfp->f_ref, sg_remove_sfp); } else { INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); schedule_work(&srp->ew.work); } Since srp->orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. - In workqueue context we go through sg_rq_end_io_usercontext() -> sg_finish_rem_req() -> blk_rq_unmap_user() -> ... -> bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current->mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! As suggested by James Bottomley <James.Bottomley@hansenpartnership.com>, add a check for current->mm (which is NULL if we're on a kernel thread without a real userspace address space) in bio_uncopy_user(), and skip the copy if we're on a kernel thread. There's no reason that I can think of for any caller of bio_uncopy_user() to want to do copying on a kernel thread with a random active userspace address space. Huge thanks to Costa Sapuntzakis <costa@purestorage.com> for the original pointer to this bug in the sg code. Signed-off-by: Roland Dreier <roland@purestorage.com> Tested-by: David Milburn <dmilburn@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [lizf: backported to 3.4: - Use __bio_for_each_segment() instead of bio_for_each_segment_all()] Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-07jfs: fix readdir cookie incompatibility with NFSv4Dave Kleikamp
commit 44512449c0ab368889dd13ae0031fba74ee7e1d2 upstream. NFSv4 reserves readdir cookie values 0-2 for special entries (. and ..), but jfs allows a value of 2 for a non-special entry. This incompatibility can result in the nfs client reporting a readdir loop. This patch doesn't change the value stored internally, but adds one to the value exposed to the iterate method. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> [bwh: Backported to 3.2: - Adjust context - s/ctx->pos/filp->f_pos/] Tested-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-29nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP ↵Vyacheslav Dubeyko
error detection commit 4bf93b50fd04118ac7f33a3c2b8a0a1f9fa80bc9 upstream. Fix the issue with improper counting number of flying bio requests for BIO_EOPNOTSUPP error detection case. The sb_nbio must be incremented exactly the same number of times as complete() function was called (or will be called) because nilfs_segbuf_wait() will call wail_for_completion() for the number of times set to sb_nbio: do { wait_for_completion(&segbuf->sb_bio_event); } while (--segbuf->sb_nbio > 0); Two functions complete() and wait_for_completion() must be called the same number of times for the same sb_bio_event. Otherwise, wait_for_completion() will hang or leak. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-29nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP ↵Vyacheslav Dubeyko
error commit 2df37a19c686c2d7c4e9b4ce1505b5141e3e5552 upstream. Remove double call of bio_put() in nilfs_end_bio_write() for the case of BIO_EOPNOTSUPP error detection. The issue was found by Dan Carpenter and he suggests first version of the fix too. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-20jbd2: Fix use after free after error in jbd2_journal_dirty_metadata()Jan Kara
commit 91aa11fae1cf8c2fd67be0609692ea9741cdcc43 upstream. When jbd2_journal_dirty_metadata() returns error, __ext4_handle_dirty_metadata() stops the handle. However callers of this function do not count with that fact and still happily used now freed handle. This use after free can result in various issues but very likely we oops soon. The motivation of adding __ext4_journal_stop() into __ext4_handle_dirty_metadata() in commit 9ea7a0df seems to be only to improve error reporting. So replace __ext4_journal_stop() with ext4_journal_abort_handle() which was there before that commit and add WARN_ON_ONCE() to dump stack to provide useful information. Reported-by: Sage Weil <sage@inktank.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-20fs/proc/task_mmu.c: fix buffer overflow in add_page_map()yonghua zheng
commit 8c8296223f3abb142be8fc31711b18a704c0e7d8 upstream. Recently we met quite a lot of random kernel panic issues after enabling CONFIG_PROC_PAGE_MONITOR. After debuggind we found this has something to do with following bug in pagemap: In struct pagemapread: struct pagemapread { int pos, len; pagemap_entry_t *buffer; bool v2; }; pos is number of PM_ENTRY_BYTES in buffer, but len is the size of buffer, it is a mistake to compare pos and len in add_page_map() for checking buffer is full or not, and this can lead to buffer overflow and random kernel panic issue. Correct len to be total number of PM_ENTRY_BYTES in buffer. [akpm@linux-foundation.org: document pagemapread.pos and .len units, fix PM_ENTRY_BYTES definition] Signed-off-by: Yonghua Zheng <younghua.zheng@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-14vfs: d_obtain_alias() needs to use "/" as default name.NeilBrown
commit b911a6bdeef5848c468597d040e3407e0aee04ce upstream. NFS appears to use d_obtain_alias() to create the root dentry rather than d_make_root. This can cause 'prepend_path()' to complain that the root has a weird name if an NFS filesystem is lazily unmounted. e.g. if "/mnt" is an NFS mount then { cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; } will cause a WARN message like WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0() ... Root dentry has weird name <> to appear in kernel logs. So change d_obtain_alias() to use "/" rather than "" as the anonymous name. Signed-off-by: NeilBrown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [bwh: Backported to 3.2: use named initialisers instead of QSTR_INIT()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-14debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)Oleg Nesterov
commit 776164c1faac4966ab14418bb0922e1820da1d19 upstream. debugfs_remove_recursive() is wrong, 1. it wrongly assumes that !list_empty(d_subdirs) means that this dir should be removed. This is not that bad by itself, but: 2. if d_subdirs does not becomes empty after __debugfs_remove() it gives up and silently fails, it doesn't even try to remove other entries. However ->d_subdirs can be non-empty because it still has the already deleted !debugfs_positive() entries. 3. simple_release_fs() is called even if __debugfs_remove() fails. Suppose we have dir1/ dir2/ file2 file1 and someone opens dir1/dir2/file2. Now, debugfs_remove_recursive(dir1/dir2) succeeds, and dir1/dir2 goes away. But debugfs_remove_recursive(dir1) silently fails and doesn't remove this directory. Because it tries to delete (the already deleted) dir1/dir2/file2 again and then fails due to "Avoid infinite loop" logic. Test-case: #!/bin/sh cd /sys/kernel/debug/tracing echo 'p:probe/sigprocmask sigprocmask' >> kprobe_events sleep 1000 < events/probe/sigprocmask/id & echo -n >| kprobe_events [ -d events/probe ] && echo "ERR!! failed to rm probe" And after that it is not possible to create another probe entry. With this patch debugfs_remove_recursive() skips !debugfs_positive() files although this is not strictly needed. The most important change is that it does not try to make ->d_subdirs empty, it simply scans the whole list(s) recursively and removes as much as possible. Link: http://lkml.kernel.org/r/20130726151256.GC19472@redhat.com Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-14cifs: extend the buffer length enought for sprintf() usingChen Gang
commit 057d6332b24a4497c55a761c83c823eed9e3f23b upstream. For cifs_set_cifscreds() in "fs/cifs/connect.c", 'desc' buffer length is 'CIFSCREDS_DESC_SIZE' (56 is less than 256), and 'ses->domainName' length may be "255 + '\0'". The related sprintf() may cause memory overflow, so need extend related buffer enough to hold all things. It is also necessary to be sure of 'ses->domainName' must be less than 256, and define the related macro instead of hard code number '256'. Signed-off-by: Chen Gang <gang.chen@asianux.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Scott Lovenberg <scott.lovenberg@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-14ext4: fix mount/remount error messages for incompatible mount optionsPiotr Sarna
commit 6ae6514b33f941d3386da0dfbe2942766eab1577 upstream. Commit 5688978 ("ext4: improve handling of conflicting mount options") introduced incorrect messages shown while choosing wrong mount options. First of all, both cases of incorrect mount options, "data=journal,delalloc" and "data=journal,dioread_nolock" result in the same error message. Secondly, the problem above isn't solved for remount option: the mismatched parameter is simply ignored. Moreover, ext4_msg states that remount with options "data=journal,delalloc" succeeded, which is not true. To fix it up, I added a simple check after parse_options() call to ensure that data=journal and delalloc/dioread_nolock parameters are not present at the same time. Signed-off-by: Piotr Sarna <p.sarna@partner.samsung.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-14ext4: make sure group number is bumped after a inode allocation raceTheodore Ts'o
commit a34eb503742fd25155fd6cff6163daacead9fbc3 upstream. When we try to allocate an inode, and there is a race between two CPU's trying to grab the same inode, _and_ this inode is the last free inode in the block group, make sure the group number is bumped before we continue searching the rest of the block groups. Otherwise, we end up searching the current block group twice, and we end up skipping searching the last block group. So in the unlikely situation where almost all of the inodes are allocated, it's possible that we will return ENOSPC even though there might be free inodes in that last block group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-11fanotify: info leak in copy_event_to_user()Dan Carpenter
commit de1e0c40aceb9d5bff09c3a3b97b2f1b178af53f upstream. The ->reserved field isn't cleared so we leak one byte of stack information to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-04livelock avoidance in sget()Al Viro
commit acfec9a5a892f98461f52ed5770de99a3e571ae2 upstream. Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about to fail. The superblock is on ->fs_supers, ->s_umount is held exclusive, ->s_active is 1. Along comes two more processes, trying to mount the same thing; sget() in each is picking that superblock, bumping ->s_count and trying to grab ->s_umount. ->s_active is 3 now. Original mount(2) finally gets to deactivate_locked_super() on failure; ->s_active is 2, superblock is still ->fs_supers because shutdown will *not* happen until ->s_active hits 0. ->s_umount is dropped and now we have two processes chasing each other: s_active = 2, A acquired ->s_umount, B blocked A sees that the damn thing is stillborn, does deactivate_locked_super() s_active = 1, A drops ->s_umount, B gets it A restarts the search and finds the same superblock. And bumps it ->s_active. s_active = 2, B holds ->s_umount, A blocked on trying to get it ... and we are in the earlier situation with A and B switched places. The root cause, of course, is that ->s_active should not grow until we'd got MS_BORN. Then failing ->mount() will have deactivate_locked_super() shut the damn thing down. Fortunately, it's easy to do - the key point is that grab_super() is called only for superblocks currently on ->fs_supers, so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and bump ->s_active; we must never increment ->s_count for superblocks past ->kill_sb(), but grab_super() is never called for those. The bug is pretty old; we would've caught it by now, if not for accidental exclusion between sget() for block filesystems; the things like cgroup or e.g. mtd-based filesystems don't have anything of that sort, so they get bitten. The right way to deal with that is obviously to fix sget()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-04Btrfs: re-add root to dead root list if we stop dropping itJosef Bacik
commit d29a9f629e009c9b90e5859bce581070fd6247fc upstream. If we stop dropping a root for whatever reason we need to add it back to the dead root list so that we will re-start the dropping next transaction commit. The other case this happens is if we recover a drop because we will add a root without adding it to the fs radix tree, so we can leak it's root and commit root extent buffer, adding this to the dead root list makes this cleanup happen. Thanks, Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-04Btrfs: fix lock leak when resuming snapshot deletionJosef Bacik
commit fec386ac1428f9c0e672df952cbca5cebd4e4e2f upstream. We aren't setting path->locks[level] when we resume a snapshot deletion which means we won't unlock the buffer when we free the path. This causes deadlocks if we happen to re-allocate the block before we've evicted the extent buffer from cache. Thanks, Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28lockd: protect nlm_blocked access in nlmsvc_retry_blockedDavid Jeffery
commit 1c327d962fc420aea046c16215a552710bde8231 upstream. In nlmsvc_retry_blocked, the check that the list is non-empty and acquiring the pointer of the first entry is unprotected by any lock. This allows a rare race condition when there is only one entry on the list. A function such as nlmsvc_grant_callback() can be called, which will temporarily remove the entry from the list. Between the list_empty() and list_entry(),the list may become empty, causing an invalid pointer to be used as an nlm_block, leading to a possible crash. This patch adds the nlm_block_lock around these calls to prevent concurrent use of the nlm_blocked list. This was a regression introduced by f904be9cc77f361d37d71468b13ff3d1a1823dea "lockd: Mostly remove BKL from the server". Signed-off-by: David Jeffery <djeffery@redhat.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28writeback: Fix periodic writeback after fs mountJan Kara
commit a5faeaf9109578e65e1a32e2a3e76c8b47e7dcb6 upstream. Code in blkdev.c moves a device inode to default_backing_dev_info when the last reference to the device is put and moves the device inode back to its bdi when the first reference is acquired. This includes moving to wb.b_dirty list if the device inode is dirty. The code however doesn't setup timer to wake corresponding flusher thread and while wb.b_dirty list is non-empty __mark_inode_dirty() will not set it up either. Thus periodic writeback is effectively disabled until a sync(2) call which can lead to unexpected data loss in case of crash or power failure. Fix the problem by setting up a timer for periodic writeback in case we add the first dirty inode to wb.b_dirty list in bdev_inode_switch_bdi(). Reported-by: Bert De Jonghe <Bert.DeJonghe@amplidata.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ext3: fix data=journal fast mount/umount hangJan Kara
commit e643692138cfa33528f054b071ba2583509bb217 upstream. In data=journal mode, if we unmount the file system before a transaction has a chance to complete, when the journal inode is being evicted, we can end up calling into log_wait_commit() for the last transaction, after the journalling machinery has been shut down. That triggers the WARN_ONCE in __log_start_commit(). Arguably we should adjust ext3_should_journal_data() to return FALSE for the journal inode, but the only place it matters is ext3_evict_inode(), and so it's to save a bit of CPU time, and to make the patch much more obviously correct by inspection(tm), we'll fix it by explicitly not trying to waiting for a journal commit when we are evicting the journal inode, since it's guaranteed to never succeed in this case. This can be easily replicated via: mount -t ext3 -o data=journal /dev/vdb /vdb ; umount /vdb This is a port of ext4 fix from Ted Ts'o. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21Handle big endianness in NTLM (ntlmv2) authenticationSteve French
commit fdf96a907c1fbb93c633e2b7ede3b8df26d6a4c0 upstream. This is RH bug 970891 Uppercasing of username during calculation of ntlmv2 hash fails because UniStrupr function does not handle big endian wchars. Also fix a comment in the same code to reflect its correct usage. [To make it easier for stable (rather than require 2nd patch) fixed this patch of Shirish's to remove endian warning generated by sparse -- steve f.] Reported-by: steve <sanpatr1@in.ibm.com> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: don't allow ext4_free_blocks() to fail due to ENOMEMTheodore Ts'o
commit e7676a704ee0a1ef71a6b23760b5a8f6896cb1a1 upstream. The filesystem should not be marked inconsistent if ext4_free_blocks() is not able to allocate memory. Unfortunately some callers (most notably ext4_truncate) don't have a way to reflect an error back up to the VFS. And even if we did, most userspace applications won't deal with most system calls returning ENOMEM anyway. Reported-by: Nagachandra P <nagachandra@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix overflow when counting used blocks on 32-bit architecturesJan Kara
commit 8af8eecc1331dbf5e8c662022272cf667e213da5 upstream. The arithmetics adding delalloc blocks to the number of used blocks in ext4_getattr() can easily overflow on 32-bit archs as we first multiply number of blocks by blocksize and then divide back by 512. Make the arithmetics more clever and also use proper type (unsigned long long instead of unsigned long). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix data offset overflow in ext4_xattr_fiemap() on 32-bit archsJan Kara
commit a60697f411eb365fb09e639e6f183fe33d1eb796 upstream. On 32-bit architectures with 32-bit sector_t computation of data offset in ext4_xattr_fiemap() can overflow resulting in reporting bogus data location. Fix the problem by typing block number to proper type before shifting. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ocfs2: xattr: fix inlined xattr reflinkJunxiao Bi
commit ef962df057aaafd714f5c22ba3de1be459571fdf upstream. Inlined xattr shared free space of inode block with inlined data or data extent record, so the size of the later two should be adjusted when inlined xattr is enabled. See ocfs2_xattr_ibody_init(). But this isn't done well when reflink. For inode with inlined data, its max inlined data size is adjusted in ocfs2_duplicate_inline_data(), no problem. But for inode with data extent record, its record count isn't adjusted. Fix it, or data extent record and inlined xattr may overwrite each other, then cause data corruption or xattr failure. One panic caused by this bug in our test environment is the following: kernel BUG at fs/ocfs2/xattr.c:1435! invalid opcode: 0000 [#1] SMP Pid: 10871, comm: multi_reflink_t Not tainted 2.6.39-300.17.1.el5uek #1 RIP: ocfs2_xa_offset_pointer+0x17/0x20 [ocfs2] RSP: e02b:ffff88007a587948 EFLAGS: 00010283 RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00000000000051e4 RDX: ffff880057092060 RSI: 0000000000000f80 RDI: ffff88007a587a68 RBP: ffff88007a587948 R08: 00000000000062f4 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000010 R13: ffff88007a587a68 R14: 0000000000000001 R15: ffff88007a587c68 FS: 00007fccff7f06e0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000015cf000 CR3: 000000007aa76000 CR4: 0000000000000660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process multi_reflink_t Call Trace: ocfs2_xa_reuse_entry+0x60/0x280 [ocfs2] ocfs2_xa_prepare_entry+0x17e/0x2a0 [ocfs2] ocfs2_xa_set+0xcc/0x250 [ocfs2] ocfs2_xattr_ibody_set+0x98/0x230 [ocfs2] __ocfs2_xattr_set_handle+0x4f/0x700 [ocfs2] ocfs2_xattr_set+0x6c6/0x890 [ocfs2] ocfs2_xattr_user_set+0x46/0x50 [ocfs2] generic_setxattr+0x70/0x90 __vfs_setxattr_noperm+0x80/0x1a0 vfs_setxattr+0xa9/0xb0 setxattr+0xc3/0x120 sys_fsetxattr+0xa8/0xd0 system_call_fastpath+0x16/0x1b Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()Al Viro
commit 64cb927371cd2ec43758d8a094a003d27bc3d0dc upstream. Both ext3 and ext4 htree_dirblock_to_tree() is just filling the in-core rbtree for use by call_filldir(). All updates of ->f_pos are done by the latter; bumping it here (on error) is obviously wrong - we might very well have it nowhere near the block we'd found an error in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21jbd2: fix theoretical race in jbd2__journal_restartTheodore Ts'o
commit 39c04153fda8c32e85b51c96eb5511a326ad7609 upstream. Once we decrement transaction->t_updates, if this is the last handle holding the transaction from closing, and once we release the t_handle_lock spinlock, it's possible for the transaction to commit and be released. In practice with normal kernels, this probably won't happen, since the commit happens in a separate kernel thread and it's unlikely this could all happen within the space of a few CPU cycles. On the other hand, with a real-time kernel, this could potentially happen, so save the tid found in transaction->t_tid before we release t_handle_lock. It would require an insane configuration, such as one where the jbd2 thread was set to a very high real-time priority, perhaps because a high priority real-time thread is trying to read or write to a file system. But some people who use real-time kernels have been known to do insane things, including controlling laser-wielding industrial robots. :-) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21CIFS use sensible file nlink values if unprovidedSteve French
commit 6658b9f70ebca5fc0795b1d6d733996af1e2caa7 upstream. Certain servers may not set the NumberOfLinks field in query file/path info responses. In such a case, cifs_inode_needs_reval() assumes that all regular files are hardlinks and triggers revalidation, leading to excessive and unnecessary network traffic. This change hardcodes cf_nlink (and subsequently i_nlink) when not returned by the server, similar to what already occurs in cifs_mkdir(). Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-13nfsd4: fix decoding of compounds across page boundariesJ. Bruce Fields
commit 247500820ebd02ad87525db5d9b199e5b66f6636 upstream. A freebsd NFSv4.0 client was getting rare IO errors expanding a tarball. A network trace showed the server returning BAD_XDR on the final getattr of a getattr+write+getattr compound. The final getattr started on a page boundary. I believe the Linux client ignores errors on the post-write getattr, and that that's why we haven't seen this before. Reported-by: Rick Macklem <rmacklem@uoguelph.ca> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-13hpfs: better test for errorsMikulas Patocka
commit 3ebacb05044f82c5f0bb456a894eb9dc57d0ed90 upstream. The test if bitmap access is out of bound could errorneously pass if the device size is divisible by 16384 sectors and we are asking for one bitmap after the end. Check for invalid size in the superblock. Invalid size could cause integer overflows in the rest of the code. Signed-off-by: Mikulas Patocka <mpatocka@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-03UBIFS: fix a horrid bugArtem Bityutskiy
commit 605c912bb843c024b1ed173dc427cd5c08e5d54d upstream. Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are in the middle of 'ubifs_readdir()'. This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage, but this may corrupt memory and lead to all kinds of problems like crashes an security holes. This patch fixes the problem by using the 'file->f_version' field, which '->llseek()' always unconditionally sets to zero. We set it to 1 in 'ubifs_readdir()' and whenever we detect that it became 0, we know there was a seek and it is time to clear the state saved in 'file->private_data'. I tested this patch by writing a user-space program which runds readdir and seek in parallell. I could easily crash the kernel without these patches, but could not crash it with these patches. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-03UBIFS: prepare to fix a horrid bugArtem Bityutskiy
commit 33f1a63ae84dfd9ad298cf275b8f1887043ced36 upstream. Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are in the middle of 'ubifs_readdir()'. First of all, this means that 'file->private_data' can be freed while 'ubifs_readdir()' uses it. But this particular patch does not fix the problem. This patch is only a preparation, and the fix will follow next. In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly, because 'file->f_pos' can be changed by '->llseek()' at any point. This may lead 'ubifs_readdir()' to returning inconsistent data: directory entry names may correspond to incorrect file positions. So here we introduce a local variable 'pos', read 'file->f_pose' once at very the beginning, and then stick to 'pos'. The result of this is that when 'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of 'ubifs_readdir()', the latter "wins". Reported-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-03perf: Disable monitoring on setuid processes for regular usersStephane Eranian
commit 2976b10f05bd7f6dab9f9e7524451ddfed656a89 upstream. There was a a bug in setup_new_exec(), whereby the test to disabled perf monitoring was not correct because the new credentials for the process were not yet committed and therefore the get_dumpable() test was never firing. The patch fixes the problem by moving the perf_event test until after the credentials are committed. Signed-off-by: Stephane Eranian <eranian@google.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-27inotify: invalid mask should return a error number but not set itZhao Hongjiang
commit 04df32fa10ab9a6f0643db2949d42efc966bc844 upstream. When we run the crackerjack testsuite, the inotify_add_watch test is stalled. This is caused by the invalid mask 0 - the task is waiting for the event but it never comes. inotify_add_watch() should return -EINVAL as it did before commit 676a0675cf92 ("inotify: remove broken mask checks causing unmount to be EINVAL"). That commit removes the invalid mask check, but that check is needed. Check the mask's ALL_INOTIFY_BITS before the inotify_arg_to_mask() call. If none are set, just return -EINVAL. Because IN_UNMOUNT is in ALL_INOTIFY_BITS, this change will not trigger the problem that above commit fixed. [akpm@linux-foundation.org: fix build] Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Jim Somerville <Jim.Somerville@windriver.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-20ceph: fix statvfs fr_sizeSage Weil
commit 92a49fb0f79f3300e6e50ddf56238e70678e4202 upstream. Different versions of glibc are broken in different ways, but the short of it is that for the time being, frsize should == bsize, and be used as the multiple for the blocks, free, and available fields. This mirrors what is done for NFS. The previous reporting of the page size for frsize meant that newer glibc and df would report a very small value for the fs size. Fixes http://tracker.ceph.com/issues/3793. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-20libceph: wrap auth ops in wrapper functionsSage Weil
commit 27859f9773e4a0b2042435b13400ee2c891a61f4 upstream. Use wrapper functions that check whether the auth op exists so that callers do not need a bunch of conditional checks. Simplifies the external interface. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-20libceph: add update_authorizer auth methodSage Weil
commit 0bed9b5c523d577378b6f83eab5835fe30c27208 upstream. Currently the messenger calls out to a get_authorizer con op, which will create a new authorizer if it doesn't yet have one. In the meantime, when we rotate our service keys, the authorizer doesn't get updated. Eventually it will be rejected by the server on a new connection attempt and get invalidated, and we will then rebuild a new authorizer, but this is not ideal. Instead, if we do have an authorizer, call a new update_authorizer op that will verify that the current authorizer is using the latest secret. If it is not, we will build a new one that does. This avoids the transient failure. This fixes one of the sorry sequence of events for bug http://tracker.ceph.com/issues/4282 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-20ceph: ceph_pagelist_append might sleep while atomicJim Schutt
commit 39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40 upstream. Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc() while holding a lock, but it's spoiled because ceph_pagelist_addpage() always calls kmap(), which might sleep. Here's the result: [13439.295457] ceph: mds0 reconnect start [13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58 [13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1 . . . [13439.376225] Call Trace: [13439.378757] [<ffffffff81076f4c>] __might_sleep+0xfc/0x110 [13439.384353] [<ffffffffa03f4ce0>] ceph_pagelist_append+0x120/0x1b0 [libceph] [13439.391491] [<ffffffffa0448fe9>] ceph_encode_locks+0x89/0x190 [ceph] [13439.398035] [<ffffffff814ee849>] ? _raw_spin_lock+0x49/0x50 [13439.403775] [<ffffffff811cadf5>] ? lock_flocks+0x15/0x20 [13439.409277] [<ffffffffa045e2af>] encode_caps_cb+0x41f/0x4a0 [ceph] [13439.415622] [<ffffffff81196748>] ? igrab+0x28/0x70 [13439.420610] [<ffffffffa045e9f8>] ? iterate_session_caps+0xe8/0x250 [ceph] [13439.427584] [<ffffffffa045ea25>] iterate_session_caps+0x115/0x250 [ceph] [13439.434499] [<ffffffffa045de90>] ? set_request_path_attr+0x2d0/0x2d0 [ceph] [13439.441646] [<ffffffffa0462888>] send_mds_reconnect+0x238/0x450 [ceph] [13439.448363] [<ffffffffa0464542>] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph] [13439.455250] [<ffffffffa0462e42>] check_new_map+0x352/0x500 [ceph] [13439.461534] [<ffffffffa04631ad>] ceph_mdsc_handle_map+0x1bd/0x260 [ceph] [13439.468432] [<ffffffff814ebc7e>] ? mutex_unlock+0xe/0x10 [13439.473934] [<ffffffffa043c612>] extra_mon_dispatch+0x22/0x30 [ceph] [13439.480464] [<ffffffffa03f6c2c>] dispatch+0xbc/0x110 [libceph] [13439.486492] [<ffffffffa03eec3d>] process_message+0x1ad/0x1d0 [libceph] [13439.493190] [<ffffffffa03f1498>] ? read_partial_message+0x3e8/0x520 [libceph] . . . [13439.587132] ceph: mds0 reconnect success [13490.720032] ceph: mds0 caps stale [13501.235257] ceph: mds0 recovery completed [13501.300419] ceph: mds0 caps renewed Fix it up by encoding locks into a buffer first, and when the number of encoded locks is stable, copy that into a ceph_pagelist. [elder@inktank.com: abbreviated the stack info a bit.] Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-20ceph: add cpu_to_le32() calls when encoding a reconnect capabilityJim Schutt
commit c420276a532a10ef59849adc2681f45306166b89 upstream. In his review, Alex Elder mentioned that he hadn't checked that num_fcntl_locks and num_flock_locks were properly decoded on the server side, from a le32 over-the-wire type to a cpu type. I checked, and AFAICS it is done; those interested can consult Locker::_do_cap_update() in src/mds/Locker.cc and src/include/encoding.h in the Ceph server code (git://github.com/ceph/ceph). I also checked the server side for flock_len decoding, and I believe that also happens correctly, by virtue of having been declared __le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-07jfs: fix a couple racesDave Kleikamp
commit 73aaa22d5ffb2630456bac2f9a4ed9b81d0d7271 upstream. This patch fixes races uncovered by xfstests testcase 068. One race is the result of jfs_sync() trying to write a sync point to the journal after it has been frozen (or possibly in the process). Since freezing sync's the journal, there is no need to write a sync point so we simply want to return. The second involves jfs_write_inode() being called on a deleted inode. It calls jfs_flush_journal which is held up by the jfs_commit thread doing the final iput on the same deleted inode, which itself is waiting for the I_SYNC flag to be cleared. jfs_write_inode need not do anything when i_nlink is zero, which is the easy fix. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-07cifs: fix potential buffer overrun when composing a new options stringJeff Layton
commit 166faf21bd14bc5c5295a44874bf7f3930c30b20 upstream. Consider the case where we have a very short ip= string in the original mount options, and when we chase a referral we end up with a very long IPv6 address. Be sure to allow for that possibility when estimating the size of the string to allocate. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>