summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-12-08fuse: prevent fuse_put_request on invalid pointerAnand V. Avati
commit f60311d5f7670d9539b424e4ed8b5c0872fc9e83 upstream. fuse_direct_io() has a loop where requests are allocated in each iteration. if allocation fails, the loop is broken out and follows into an unconditional fuse_put_request() on that invalid pointer. Signed-off-by: Anand V. Avati <avati@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-08fuse: reject O_DIRECT flag also in fuse_createCsaba Henk
commit 1b7323965a8c6eee9dc4e345a7ae4bff1dc93149 upstream. The comment in fuse_open about O_DIRECT: "VFS checks this, but only _after_ ->open()" also holds for fuse_create, however, the same kind of check was missing there. As an impact of this bug, open(newfile, O_RDWR|O_CREAT|O_DIRECT) fails, but a stub newfile will remain if the fuse server handled the implied FUSE_CREATE request appropriately. Other impact: in the above situation ima_file_free() will complain to open/free imbalance if CONFIG_IMA is set. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Harshavardhana <harsha@gluster.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-08jffs2: Fix memory corruption in jffs2_read_inode_range()David Woodhouse
commit 199bc9ff5ca5e4b3bcaff8927b2983c65f34c263 upstream. In 2.6.23 kernel, commit a32ea1e1f925399e0d81ca3f7394a44a6dafa12c ("Fix read/truncate race") fixed a race in the generic code, and as a side effect, now do_generic_file_read() can ask us to readpage() past the i_size. This seems to be correctly handled by the block routines (e.g. block_read_full_page() fills the page with zeroes in case if somebody is trying to read past the last inode's block). JFFS2 doesn't handle this; it assumes that it won't be asked to read pages which don't exist -- and thus that there will be at least _one_ valid 'frag' on the page it's being asked to read. It will fill any holes with the following memset: memset(buf, 0, min(end, frag->ofs + frag->size) - offset); When the 'closest smaller match' returned by jffs2_lookup_node_frag() is actually on a previous page and ends before 'offset', that results in: memset(buf, 0, <huge unsigned negative>); Hopefully, in most cases the corruption is fatal, and quickly causing random oopses, like this: root@10.0.0.4:~/ltp-fs-20090531# ./testcases/kernel/fs/ftest/ftest01 Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc01cd980 Oops: Kernel access of bad area, sig: 11 [#1] [...] NIP [c01cd980] rb_insert_color+0x38/0x184 LR [c0043978] enqueue_hrtimer+0x88/0xc4 Call Trace: [c6c63b60] [c004f9a8] tick_sched_timer+0xa0/0xe4 (unreliable) [c6c63b80] [c0043978] enqueue_hrtimer+0x88/0xc4 [c6c63b90] [c0043a48] __run_hrtimer+0x94/0xbc [c6c63bb0] [c0044628] hrtimer_interrupt+0x140/0x2b8 [c6c63c10] [c000f8e8] timer_interrupt+0x13c/0x254 [c6c63c30] [c001352c] ret_from_except+0x0/0x14 --- Exception: 901 at memset+0x38/0x5c LR = jffs2_read_inode_range+0x144/0x17c [c6c63cf0] [00000000] (null) (unreliable) This patch fixes the issue, plus fixes all LTP tests on NAND/UBI with JFFS2 filesystem that were failing since 2.6.23 (seems like the bug above also broke the truncation). Reported-By: Anton Vorontsov <avorontsov@ru.mvista.com> Tested-By: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09NFSv4: The link() operation should return any delegation on the fileTrond Myklebust
commit 9a3936aac133037f65124fcb2d676a6c201a90a4 upstream. Otherwise, we have to wait for the server to recall it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09NFSv4: Fix a problem whereby a buggy server can oops the kernelTrond Myklebust
commit d953126a28f97ec965d23c69fd5795854c048f30 upstream. We just had a case in which a buggy server occasionally returns the wrong attributes during an OPEN call. While the client does catch this sort of condition in nfs4_open_done(), and causes the nfs4_atomic_open() to return -EISDIR, the logic in nfs_atomic_lookup() is broken, since it causes a fallback to an ordinary lookup instead of just returning the error. When the buggy server then returns a regular file for the fallback lookup, the VFS allows the open, and bad things start to happen, since the open file doesn't have any associated NFSv4 state. The fix is firstly to return the EISDIR/ENOTDIR errors immediately, and secondly to ensure that we are always careful when dereferencing the nfs_open_context state pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09NFSv4: Kill nfs4_renewd_prepare_shutdown()Trond Myklebust
commit 3050141bae57984dd660e6861632ccf9b8bca77e upstream. The NFSv4 renew daemon is shared between all active super blocks that refer to a particular NFS server, so it is wrong to be shutting it down in nfs4_kill_super every time a super block is destroyed. This patch therefore kills nfs4_renewd_prepare_shutdown altogether, and leaves it up to nfs4_shutdown_client() to also shut down the renew daemon by means of the existing call to nfs4_kill_renewd(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09nfs: Avoid overrun when copying client IP address stringBen Hutchings
commit f4373bf9e67e4a653c8854acd7b02dac9714c98a upstream. As seen in <http://bugs.debian.org/549002>, nfs4_init_client() can overrun the source string when copying the client IP address from nfs_parsed_mount_data::client_address to nfs_client::cl_ipaddr. Since these are both treated as null-terminated strings elsewhere, the copy should be done with strlcpy() not memcpy(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCETrond Myklebust
commit 52567b03ca38b6e556ced450d64dba8d66e23b0e upstream. RFC 3530 states that when we recieve the error NFS4ERR_RESOURCE, we are not supposed to bump the sequence number on OPEN, LOCK, LOCKU, CLOSE, etc operations. The problem is that we map that error into EREMOTEIO in the XDR layer, and so the NFSv4 middle-layer routines like seqid_mutating_err(), and nfs_increment_seqid() don't recognise it. The fix is to defer the mapping until after the middle layers have processed the error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09nfs: Panic when commit failsTerry Loftin
commit a8b40bc7e635831b61c43acc71a86d3a68b2dff0 upstream. Actually pass the NFS_FILE_SYNC option to the server to avoid a Panic in nfs_direct_write_complete() when a commit fails. At the end of an nfs write, if the nfs commit fails, all the writes will be rescheduled. They are supposed to be rescheduled as NFS_FILE_SYNC writes, but the rpc_task structure is not completely intialized and so the option is not passed. When the rescheduled writes complete, the return indicates that they are NFS_UNSTABLE and we try to do another commit. This leads to a Panic because the commit data structure pointer was set to null in the initial (failed) commit attempt. Signed-off-by: Terry Loftin <terry.loftin@hp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09hfsplus: refuse to mount volumes larger than 2TBBen Hutchings
commit 5c36fe3d87b3f0c85894a49193c66096a3d6b26f upstream. As found in <http://bugs.debian.org/550010>, hfsplus is using type u32 rather than sector_t for some sector number calculations. In particular, hfsplus_get_block() does: u32 ablock, dblock, mask; ... map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask)); I am not confident that I can find and fix all cases where a sector number may be truncated. For now, avoid data loss by refusing to mount HFS+ volumes with more than 2^32 sectors (2TB). [akpm@linux-foundation.org: fix 32 and 64-bit issues] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Eric Sesterhenn <snakebyte@gmx.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09fs: pipe.c null pointer dereferenceEarl Chew
commit ad3960243e55320d74195fb85c975e0a8cc4466c upstream. This patch fixes a null pointer exception in pipe_rdwr_open() which generates the stack trace: > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: > [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70 > [<ffffffff8028125c>] __dentry_open+0x13c/0x230 > [<ffffffff8028143d>] do_filp_open+0x2d/0x40 > [<ffffffff802814aa>] do_sys_open+0x5a/0x100 > [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67 The failure mode is triggered by an attempt to open an anonymous pipe via /proc/pid/fd/* as exemplified by this script: ============================================================= while : ; do { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } & PID=$! OUT=$(ps -efl | grep 'sleep 1' | grep -v grep | { read PID REST ; echo $PID; } ) OUT="${OUT%% *}" DELAY=$((RANDOM * 1000 / 32768)) usleep $((DELAY * 1000 + RANDOM % 1000 )) echo n > /proc/$OUT/fd/1 # Trigger defect done ============================================================= Note that the failure window is quite small and I could only reliably reproduce the defect by inserting a small delay in pipe_rdwr_open(). For example: static int pipe_rdwr_open(struct inode *inode, struct file *filp) { msleep(100); mutex_lock(&inode->i_mutex); Although the defect was observed in pipe_rdwr_open(), I think it makes sense to replicate the change through all the pipe_*_open() functions. The core of the change is to verify that inode->i_pipe has not been released before attempting to manipulate it. If inode->i_pipe is no longer present, return ENOENT to indicate so. The comment about potentially using atomic_t for i_pipe->readers and i_pipe->writers has also been removed because it is no longer relevant in this context. The inode->i_mutex lock must be used so that inode->i_pipe can be dealt with correctly. Signed-off-by: Earl Chew <earl_chew@agilent.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12eCryptfs: Prevent lower dentry from going negative during unlink (CVE-2009-2908)Tyler Hicks
commit 9c2d2056647790c5034d722bd24e9d913ebca73c upstream. When calling vfs_unlink() on the lower dentry, d_delete() turns the dentry into a negative dentry when the d_count is 1. This eventually caused a NULL pointer deref when a read() or write() was done and the negative dentry's d_inode was dereferenced in ecryptfs_read_update_atime() or ecryptfs_getxattr(). Placing mutt's tmpdir in an eCryptfs mount is what initially triggered the oops and I was able to reproduce it with the following sequence: open("/tmp/upper/foo", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW, 0600) = 3 link("/tmp/upper/foo", "/tmp/upper/bar") = 0 unlink("/tmp/upper/foo") = 0 open("/tmp/upper/bar", O_RDWR|O_CREAT|O_NOFOLLOW, 0600) = 4 unlink("/tmp/upper/bar") = 0 write(4, "eCryptfs test\n"..., 14 <unfinished ...> +++ killed by SIGKILL +++ https://bugs.launchpad.net/ecryptfs/+bug/387073 Reported-by: Loïc Minier <loic.minier@canonical.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05fs: make sure data stored into inode is properly seen before unlocking new inodeJan Kara
commit 580be0837a7a59b207c3d5c661d044d8dd0a6a30 upstream. In theory it could happen that on one CPU we initialize a new inode but clearing of I_NEW | I_LOCK gets reordered before some of the initialization. Thus on another CPU we return not fully uptodate inode from iget_locked(). This seems to fix a corruption issue on ext3 mounted over NFS. [akpm@linux-foundation.org: add some commentary] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24nfsd: fix hung up of nfs client while sync write data to nfs serverWei Yongjun
commit a0d24b295aed7a9daf4ca36bd4784e4d40f82303 upstream. nfsd: fix hung up of nfs client while sync write data to nfs server Commit 'Short write in nfsd becomes a full write to the client' (31dec2538e45e9fff2007ea1f4c6bae9f78db724) broken the sync write. With the following commands to reproduce: $ mount -t nfs -o sync 192.168.0.21:/nfsroot /mnt $ cd /mnt $ echo aaaa > temp.txt Then nfs client is hung up. In SYNC mode the server alaways return the write count 0 to the client. This is because the value of host_err in nfsd_vfs_write() will be overwrite in SYNC mode by 'host_err=nfsd_sync(file);', and then we return host_err(which is now 0) as write count. This patch fixed the problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24Short write in nfsd becomes a full write to the clientDavid Shaw
commit 31dec2538e45e9fff2007ea1f4c6bae9f78db724 upstream. Short write in nfsd becomes a full write to the client If a filesystem being written to via NFS returns a short write count (as opposed to an error) to nfsd, nfsd treats that as a success for the entire write, rather than the short count that actually succeeded. For example, given a 8192 byte write, if the underlying filesystem only writes 4096 bytes, nfsd will ack back to the nfs client that all 8192 bytes were written. The nfs client does have retry logic for short writes, but this is never called as the client is told the complete write succeeded. There are probably other ways it could happen, but in my case it happened with a fuse (filesystem in userspace) filesystem which can rather easily have a partial write. Here is a patch to properly return the short write count to the client. Signed-off-by: David Shaw <dshaw@jabberwocky.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24udf: Use device size when drive reported bogus number of written blocksJan Kara
commit 24a5d59f3477bcff4c069ff4d0ca9a3e037d0235 upstream. Some drives report 0 as the number of written blocks when there are some blocks recorded. Use device size in such case so that we can automagically mount such media. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24binfmt_elf: fix PT_INTERP bss handlingRoland McGrath
commit 9f0ab4a3f0fdb1ff404d150618ace2fa069bb2e1 upstream. In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.) ----- ptinterp.S _start: .globl _start nop int3 ----- $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c $ ./hello Segmentation fault # during execve() itself After applying the patch: $ ./hello Trace trap # user-mode execution after execve() finishes If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15JFFS2: add missing verify buffer allocation/deallocationMassimo Cirillo
commit bc8cec0dff072f1a45ce7f6b2c5234bb3411ac51 upstream. The function jffs2_nor_wbuf_flash_setup() doesn't allocate the verify buffer if CONFIG_JFFS2_FS_WBUF_VERIFY is defined, so causing a kernel panic when that macro is enabled and the verify function is called. Similarly the jffs2_nor_wbuf_flash_cleanup() must free the buffer if CONFIG_JFFS2_FS_WBUF_VERIFY is enabled. The following patch fixes the problem. The following patch applies to 2.6.30 kernel. Signed-off-by: Massimo Cirillo <maxcir@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-09OCFS2: fix build errorGreg Kroah-Hartman
Somehow a previous patch did not get committed correctly. This fixes the build. Thanks to Jayson King, Michael Tokarev, Joel Becker, and Chuck Ebbert for pointing out the problem, and the solution. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08ocfs2: ocfs2_write_begin_nolock() should handle len=0Sunil Mushran
commit 8379e7c46cc48f51197dd663fc6676f47f2a1e71 upstream. Bug introduced by mainline commit e7432675f8ca868a4af365759a8d4c3779a3d922 The bug causes ocfs2_write_begin_nolock() to oops when len=0. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08ocfs2: Initialize the cluster we're writing to in a non-sparse extendSunil Mushran
commit e7432675f8ca868a4af365759a8d4c3779a3d922 upstream. In a non-sparse extend, we correctly allocate (and zero) the clusters between the old_i_size and pos, but we don't zero the portions of the cluster we're writing to outside of pos<->len. It handles clustersize > pagesize and blocksize < pagesize. [Cleaned up by Joel Becker.] Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-17Revert "compat_ioctl: hook up compat handler for FIEMAP ioctl"Greg Kroah-Hartman
This reverts commit 9ac3664242f11fb38ea5029712bc77ee317fe38c. This ioctl is not present in the 2.6.27 tree. I incorrectly added this patch to this tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16NFS: Fix an O_DIRECT Oops...Trond Myklebust
commit 1ae88b2e446261c038f2c0c3150ffae142b227a2 upstream. We can't call nfs_readdata_release()/nfs_writedata_release() without first initialising and referencing args.context. Doing so inside nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment() causes an Oops. We should rather be calling nfs_readdata_free()/nfs_writedata_free() in those cases. Looking at the O_DIRECT code, the "struct nfs_direct_req" is already referencing the nfs_open_context for us. Since the readdata and writedata structures carry a reference to that, we can simplify things by getting rid of the extra nfs_open_context references, so that we can replace all instances of nfs_readdata_release()/nfs_writedata_release(). Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16mm_for_maps: shift down_read(mmap_sem) to the callerOleg Nesterov
commit 00f89d218523b9bf6b522349c039d5ac80aa536d upstream. mm_for_maps() takes ->mmap_sem after security checks, this looks strange and obfuscates the locking rules. Move this lock to its single caller, m_start(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16mm_for_maps: simplify, use ptrace_may_access()Oleg Nesterov
commit 13f0feafa6b8aead57a2a328e2fca6a5828bf286 upstream. It would be nice to kill __ptrace_may_access(). It requires task_lock(), but this lock is only needed to read mm->flags in the middle. Convert mm_for_maps() to use ptrace_may_access(), this also simplifies the code a little bit. Also, we do not need to take ->mmap_sem in advance. In fact I think mm_for_maps() should not play with ->mmap_sem at all, the caller should take this lock. With or without this patch, without ->cred_guard_mutex held we can race with exec() and get the new ->mm but check old creds. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16flat: fix uninitialized ptr with shared libsLinus Torvalds
commit 3440625d78711bee41a84cf29c3d8c579b522666 upstream. The new credentials code broke load_flat_shared_library() as it now uses an uninitialized cred pointer. Reported-by: Bernd Schmidt <bernds_cb1@t-online.de> Tested-by: Bernd Schmidt <bernds_cb1@t-online.de> Cc: Mike Frysinger <vapier@gentoo.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16compat_ioctl: hook up compat handler for FIEMAP ioctlEric Sandeen
commit 69130c7cf96ea853dc5be599dd6a4b98907d39cc upstream. The FIEMAP_IOC_FIEMAP mapping ioctl was missing a 32-bit compat handler, which means that 32-bit suerspace on 64-bit kernels cannot use this ioctl command. The structure is nicely aligned, padded, and sized, so it is just this simple. Tested w/ 32-bit ioctl tester (from Josef) on a 64-bit kernel on ext4. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Mark Lord <lkml@rtr.ca> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Josef Bacik <josef@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16sysfs: fix hardlink count on device_movePeter Oberparleiter
commit 0f58b44582001c8bcdb75f36cf85ebbe5170e959 upstream. Update directory hardlink count when moving kobjects to a new parent. Fixes the following problem which occurs when several devices are moved to the same parent and then unregistered: > ls -laF /sys/devices/css0/defunct/ > total 0 > drwxr-xr-x 4294967295 root root 0 2009-07-14 17:02 ./ > drwxr-xr-x 114 root root 0 2009-07-14 17:02 ../ > drwxr-xr-x 2 root root 0 2009-07-14 17:01 power/ > -rw-r--r-- 1 root root 4096 2009-07-14 17:01 uevent Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30eCryptfs: parse_tag_3_packet check tag 3 packet encrypted key size ↵Ramon de Carvalho Valle
(CVE-2009-2407) commit f151cd2c54ddc7714e2f740681350476cda03a28 upstream. The parse_tag_3_packet function does not check if the tag 3 packet contains a encrypted key size larger than ECRYPTFS_MAX_ENCRYPTED_KEY_BYTES. Signed-off-by: Ramon de Carvalho Valle <ramon@risesecurity.org> [tyhicks@linux.vnet.ibm.com: Added printk newline and changed goto to out_free] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30eCryptfs: Check Tag 11 literal data buffer size (CVE-2009-2406)Tyler Hicks
commit 6352a29305373ae6196491e6d4669f301e26492e upstream. Tag 11 packets are stored in the metadata section of an eCryptfs file to store the key signature(s) used to encrypt the file encryption key. After extracting the packet length field to determine the key signature length, a check is not performed to see if the length would exceed the key signature buffer size that was passed into parse_tag_11_packet(). Thanks to Ramon de Carvalho Valle for finding this bug using fsfuzzer. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30elf: fix one check-after-useAmerigo Wang
commit e2dbe12557d85d81f4527879499f55681c3cca4f upstream. Check before use it. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02jbd: fix race in buffer processing in commit codeJan Kara
commit a61d90d75d0f9e86432c45b496b4b0fbf0fd03dc upstream. In commit code, we scan buffers attached to a transaction. During this scan, we sometimes have to drop j_list_lock and then we recheck whether the journal buffer head didn't get freed by journal_try_to_free_buffers(). But checking for buffer_jbd(bh) isn't enough because a new journal head could get attached to our buffer head. So add a check whether the journal head remained the same and whether it's still at the same transaction and list. This is a nasty bug and can cause problems like memory corruption (use after free) or trigger various assertions in JBD code (observed). Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Fix race in ext4_inode_info.i_cached_extentTheodore Ts'o
(cherry picked from commit 2ec0ae3acec47f628179ee95fe2c4da01b5e9fc4) If two CPU's simultaneously call ext4_ext_get_blocks() at the same time, there is nothing protecting the i_cached_extent structure from being used and updated at the same time. This could potentially cause the wrong location on disk to be read or written to, including potentially causing the corruption of the block group descriptors and/or inode table. This bug has been in the ext4 code since almost the very beginning of ext4's development. Fortunately once the data is stored in the page cache cache, ext4_get_blocks() doesn't need to be called, so trying to replicate this problem to the point where we could identify its root cause was *extremely* difficult. Many thanks to Kevin Shanahan for working over several months to be able to reproduce this easily so we could finally nail down the cause of the corruption. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Clear the unwritten buffer_head flag after the extent is initializedAneesh Kumar K.V
(cherry picked from commit 2a8964d63d50dd2d65d71d342bc7fb6ef4117614) The BH_Unwritten flag indicates that the buffer is allocated on disk but has not been written; that is, the disk was part of a persistent preallocation area. That flag should only be set when a get_blocks() function is looking up a inode's logical to physical block mapping. When ext4_get_blocks_wrap() is called with create=1, the uninitialized extent is converted into an initialized one, so the BH_Unwritten flag is no longer appropriate. Hence, we need to make sure the BH_Unwritten is not left set, since the combination of BH_Mapped and BH_Unwritten is not allowed; among other things, it will result ext4's get_block() to be called over and over again during the write_begin phase of write(2). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Use a fake block number for delayed new buffer_headAneesh Kumar K.V
(cherry picked from commit 33b9817e2ae097c7b8d256e3510ac6c54fc6d9d0) Use a very large unsigned number (~0xffff) as as the fake block number for the delayed new buffer. The VFS should never try to write out this number, but if it does, this will make it obvious. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Fix sub-block zeroing for writes into preallocated extentsAneesh Kumar K.V
(cherry picked from commit 9c1ee184a30394e54165fa4c15923cabd952c106) We need to mark the buffer_head mapping preallocated space as new during write_begin. Otherwise we don't zero out the page cache content properly for a partial write. This will cause file corruption with preallocation. Now that we mark the buffer_head new we also need to have a valid buffer_head blocknr so that unmap_underlying_metadata() unmaps the correct block. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Ignore i_file_acl_high unless EXT4_FEATURE_INCOMPAT_64BIT is presentTheodore Ts'o
(cherry picked from commit a9e817425dc0baede8ebe5fbc9984a640257432b) Don't try to look at i_file_acl_high unless the INCOMPAT_64BIT feature bit is set. The field is normally zero, but older versions of e2fsck didn't automatically check to make sure of this, so in the spirit of "be liberal in what you accept", don't look at i_file_acl_high unless we are using a 64-bit filesystem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Fix softlockup caused by illegal i_file_acl value in on-disk inodeTheodore Ts'o
(cherry picked from commit 485c26ec70f823f2a9cf45982b724893e53a859e) If the block containing external extended attributes (which is stored in i_file_acl and i_file_acl_high) is larger than the on-disk filesystem, the process which tried to access the extended attributes will endlessly issue kernel printks complaining that "__find_get_block_slow() failed", locking up that CPU until the system is forcibly rebooted. So when we read in the inode, make sure the i_file_acl value is legal, and if not, flag the filesystem as being corrupted. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: really print the find_group_flex fallback warning only onceChuck Ebbert
(cherry picked from commit 6b82f3cb2d480b7714eb0ff61aee99c22160389e) Missing braces caused the warning to print more than once. Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: fix locking typo in mballoc which could cause soft lockup hangsTheodore Ts'o
upstream commit: e7c9e3e99adf6c49c5d593a51375916acc039d1e Smatch (http://repo.or.cz/w/smatch.git/) complains about the locking in ext4_mb_add_n_trim() from fs/ext4/mballoc.c 4438 list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[order], 4439 pa_inode_list) { 4440 spin_lock(&tmp_pa->pa_lock); 4441 if (tmp_pa->pa_deleted) { 4442 spin_unlock(&pa->pa_lock); 4443 continue; 4444 } Brown paper bag time... Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: fix typo which causes a memory leak on error pathDan Carpenter
upstream commit: a7b19448ddbdc34b2b8fedc048ba154ca798667b This was found by smatch (http://repo.or.cz/w/smatch.git/) Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11jbd2: Update locking comentsJan Kara
(cherry picked from commit 86db97c87f744364d5889ca8a4134ca2048b8f83) Update information about locking in JBD2 revoke code. Inconsistency in comments found by Lin Tan <tammy000@gmail.com> CC: Lin Tan <tammy000@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Check for an valid i_mode when reading the inode from diskTheodore Ts'o
(cherry picked from commit 563bdd61fe4dbd6b58cf7eb06f8d8f14479ae1dc) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Fix discard of inode prealloc space with delayed allocation.Aneesh Kumar K.V
(cherry picked from commit d6014301b5599fba395c42a1e96a7fe86f7d0b2d) With delayed allocation we should not/cannot discard inode prealloc space during file close. We would still have dirty pages for which we haven't allocated blocks yet. With this fix after each get_blocks request we check whether we have zero reserved blocks and if yes and we don't have any writers on the file we discard inode prealloc space. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Automatically allocate delay allocated blocks on renameTheodore Ts'o
(cherry picked from commit 8750c6d5fcbd3342b3d908d157f81d345c5325a7) When renaming a file such that a link to another inode is overwritten, force any delay allocated blocks that to be allocated so that if the filesystem is mounted with data=ordered, the data blocks will be pushed out to disk along with the journal commit. Many application programs expect this, so we do this to avoid zero length files if the system crashes unexpectedly. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: Automatically allocate delay allocated blocks on closeTheodore Ts'o
(cherry picked from commit 7d8f9f7d150dded7b68e61ca6403a1f166fb4edf) When closing a file that had been previously truncated, force any delay allocated blocks that to be allocated so that if the filesystem is mounted with data=ordered, the data blocks will be pushed out to disk along with the journal commit. Many application programs expect this, so we do this to avoid zero length files if the system crashes unexpectedly. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctlTheodore Ts'o
(cherry picked from commit ccd2506bd43113659aa904d5bea5d1300605e2a6) Add an ioctl which forces all of the delay allocated blocks to be allocated. This also provides a function ext4_alloc_da_blocks() which will be used by the following commits to force files to be fully allocated to preserve application-expected ext3 behaviour. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: return -EIO not -ESTALE on directory traversal through deleted inodeBryan Donlan
(cherry picked from commit e6f009b0b45220c004672d41a58865e94946104d) ext4_iget() returns -ESTALE if invoked on a deleted inode, in order to report errors to NFS properly. However, in ext4_lookup(), this -ESTALE can be propagated to userspace if the filesystem is corrupted such that a directory entry references a deleted inode. This leads to a misleading error message - "Stale NFS file handle" - and confusion on the part of the admin. The bug can be easily reproduced by creating a new filesystem, making a link to an unused inode using debugfs, then mounting and attempting to ls -l said link. This patch thus changes ext4_lookup to return -EIO if it receives -ESTALE from ext4_iget(), as ext4 does for other filesystem metadata corruption; and also invokes the appropriate ext*_error functions when this case is detected. Signed-off-by: Bryan Donlan <bdonlan@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: tighten restrictions on inode flagsDuane Griffin
(cherry picked from commit 2dc6b0d48ca0599837df21b14bb8393d0804af57) At the moment there are few restrictions on which flags may be set on which inodes. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. Introduces a flags masking function which masks flags based on mode and use it during inode creation and when flags are set via the ioctl to facilitate future consistency. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11ext4: don't inherit inappropriate inode flags from parentDuane Griffin
(cherry picked from commit 8fa43a81b97853fc69417bb6054182e78f95cbeb) At present INDEX and EXTENTS are the only flags that new ext4 inodes do NOT inherit from their parent. In addition prevent the flags DIRTY, ECOMPR, IMAGIC, TOPDIR, HUGE_FILE and EXT_MIGRATE from being inherited. List inheritable flags explicitly to prevent future flags from accidentally being inherited. This fixes the TOPDIR flag inheritance bug reported at http://bugzilla.kernel.org/show_bug.cgi?id=9866. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>