summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-07-14fix SMP ordering hole in fcntl_setlk() (CVE-2008-1669)Al Viro
fcntl_setlk()/close() race prevention has a subtle hole - we need to make sure that if we *do* have an fcntl/close race on SMP box, the access to descriptor table and inode->i_flock won't get reordered. As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs. STORE descriptor table entry, LOAD inode->i_flock with not a single lock in common on both sides. We do have BKL around the first STORE, but check in locks_remove_posix() is outside of BKL and for a good reason - we don't want BKL on common path of close(2). Solution is to hold ->file_lock around fcheck() in there; that orders us wrt removal from descriptor table that preceded locks_remove_posix() on close path and we either come first (in which case eviction will be handled by the close side) or we'll see the effect of close and do eviction ourselves. Note that even though it's read-only access, we do need ->file_lock here - rcu_read_lock() won't be enough to order the things. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-07-14asn1: additional sanity checking during BER decoding (CVE-2008-1673)Chris Wright
- Don't trust a length which is greater than the working buffer. An invalid length could cause overflow when calculating buffer size for decoding oid. - An oid length of zero is invalid and allows for an off-by-one error when decoding oid because the first subid actually encodes first 2 subids. - A primitive encoding may not have an indefinite length. Thanks to Wei Wang from McAfee for report. Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-01-21NFS: call nfs_wb_all() only on regular filesTrond Myklebust
It looks like nfs_setattr() and nfs_rename() also need to test whether the target is a regular file before calling nfs_wb_all()... It isn't technically needed since the version of nfs_wb_all() that exists on 2.6.16 should be safe to call on non-regular files (it will be a no-op). However it is a useful optimisation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-01-21NFS: writes should not clobber utimes() callsTrond Myklebust
Ensure that we flush out writes in the case when someone calls utimes() in order to set the file times. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-01-21vfs: coredumping fix (CVE-2007-6206)Ingo Molnar
fix: http://bugzilla.kernel.org/show_bug.cgi?id=3043 only allow coredumping to the same uid that the coredumping task runs under. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-01-16limit minixfs printks on corrupted dir i_size (CVE-2006-6058)Eric Sandeen
First reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html Essentially a corrupted minix dir inode reporting a very large i_size will loop for a very long time in minix_readdir, minix_find_entry, etc, because on EIO they just move on to try the next page. This is under the BKL, printk-storming as well. This can lock up the machine for a very long time. Simply ratelimiting the printks gets things back under control. Make the message a bit more informative while we're here. Adrian Bunk: Backported to 2.6.16. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-01-16fix messages in fs/minixDenis Vlasenko
Believe it or not, but in fs/minix/*, the oldest filesystem in the kernel, something still can be fixed: printk("new_inode: bit already set"); "\n" is missing! While at it, I also removed periods from the end of error messages and made capitalization uniform. Also s/i-node/inode/, s/printk (/printk(/ Signed-off-by: Denis Vlasenko <vda@ilport.com.ua> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-01-16Use access mode instead of open flags to determine needed permissions ↵Linus Torvalds
(CVE-2008-0001) patch 974a9f0b47da74e28f68b9c8645c3786aa5ace1a in mainline Way back when (in commit 834f2a4a1554dc5b2598038b3fe8703defcbe467, aka "VFS: Allow the filesystem to return a full file pointer on open intent" to be exact), Trond changed the open logic to keep track of the original flags to a file open, in order to pass down the the intent of a dentry lookup to the low-level filesystem. However, when doing that reorganization, it changed the meaning of namei_flags, and thus inadvertently changed the test of access mode for directories (and RO filesystem) to use the wrong flag. So fix those test back to use access mode ("acc_mode") rather than the open flag ("flag"). Issue noticed by Bill Roman at Datalight. Reported-and-tested-by: Bill Roman <bill.roman@datalight.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-11-03knfsd: allow nfsd READDIR to return 64bit cookiesNeil Brown
->readdir passes lofft_t offsets (used as nfs cookies) to nfs3svc_encode_entry{,_plus}, but when they pass it on to encode_entry it becomes an 'off_t', which isn't good. So filesystems that returned 64bit offsets would lose. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-11-03buffer: memorder fixNick Piggin
unlock_buffer(), like unlock_page(), must not clear the lock without ensuring that the critical section is closed. Mingming later sent the same patch, saying: We are running SDET benchmark and saw double free issue for ext3 extended attributes block, which complains the same xattr block already being freed (in ext3_xattr_release_block()). The problem could also been triggered by multiple threads loop untar/rm a kernel tree. The race is caused by missing a memory barrier at unlock_buffer() before the lock bit being cleared, resulting in possible concurrent h_refcounter update. That causes a reference counter leak, then later leads to the double free that we have seen. Inside unlock_buffer(), there is a memory barrier is placed *after* the lock bit is being cleared, however, there is no memory barrier *before* the bit is cleared. On some arch the h_refcount update instruction and the clear bit instruction could be reordered, thus leave the critical section re-entered. The race is like this: For example, if the h_refcount is initialized as 1, cpu 0: cpu1 -------------------------------------- ----------------------------------- lock_buffer() /* test_and_set_bit */ clear_buffer_locked(bh); lock_buffer() /* test_and_set_bit */ h_refcount = h_refcount+1; /* = 2*/ h_refcount = h_refcount + 1; /*= 2 */ clear_buffer_locked(bh); .... ...... We lost a h_refcount here. We need a memory barrier before the buffer head lock bit being cleared to force the order of the two writes. Please apply. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-11-02CIFS should honour umask (CVE-2007-3740)Steve French
This patch makes CIFS honour a process' umask like other filesystems. Of course the server is still free to munge the permissions if it wants to; but the client will send the "right" permissions to begin with. A few caveats: 1) It only applies to filesystems that have CAP_UNIX (aka support unix extensions) 2) It applies the correct mode to the follow up CIFSSMBUnixSetPerms() after remote creation When mode to CIFS/NTFS ACL mapping is complete we can do the same thing for that case for servers which do not support the Unix Extensions. Signed-off-by: Matt Keenen <matt@opcode-solutions.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-28hugetlb: fix size=4G parsingHugh Dickins
On 32-bit machines, mount -t hugetlbfs -o size=4G gave a 0GB filesystem, size=5G gave a 1GB filesystem etc: there's no point in masking size with HPAGE_MASK just before shifting its lower bits away, and since HPAGE_MASK is a UL, that removed all the higher bits of the unsigned long long size. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19hugetlb: fix prio_tree unit (CVE-2007-4133)Hugh Dickins
hugetlb_vmtruncate_list was misconverted to prio_tree: its prio_tree is in units of PAGE_SIZE (PAGE_CACHE_SIZE) like any other, not HPAGE_SIZE (whereas its radix_tree is kept in units of HPAGE_SIZE, otherwise slots would be absurdly sparse). At first I thought the error benign, just calling __unmap_hugepage_range on more vmas than necessary; but on 32-bit machines, when the prio_tree is searched correctly, it happens to ensure the v_offset calculation won't overflow. As it stood, when truncating at or beyond 4GB, it was liable to discard pages COWed from lower offsets; or even to clear pmd entries of preceding vmas, triggering exit_mmap's BUG_ON(nr_ptes). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19hugetlbfs: add Kconfig help textArthur Othieno
In kernel bugzilla #6248 (http://bugzilla.kernel.org/show_bug.cgi?id=6248), Adrian Bunk <bunk@stusta.de> notes that CONFIG_HUGETLBFS is missing Kconfig help text. Signed-off-by: Arthur Othieno <apgo@patchbomb.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-07sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses (CVE-2007-3104)Eric Sandeen
Backport of ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/broken-out/gregkh-driver-sysfs-allocate-inode-number-using-ida.patch For regular files in sysfs, sysfs_readdir wants to traverse sysfs_dirent->s_dentry->d_inode->i_ino to get to the inode number. But, the dentry can be reclaimed under memory pressure, and there is no synchronization with readdir. This patch follows Tejun's scheme of allocating and storing an inode number in the new s_ino member of a sysfs_dirent, when dirents are created, and retrieving it from there for readdir, so that the pointer chain doesn't have to be traversed. Tejun's upstream patch uses a new-ish "ida" allocator which brings along some extra complexity; this -stable patch has a brain-dead incrementing counter which does not guarantee uniqueness, but because sysfs doesn't hash inodes as iunique expects, uniqueness wasn't guaranteed today anyway. Adrian Bunk: Backported to 2.6.16. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-07Reset current->pdeath_signal on SUID binary execution (CVE-2007-3848)Marcel Holtmann
This fixes a vulnerability in the "parent process death signal" implementation discoverd by Wojciech Purczynski of COSEINC PTE Ltd. and iSEC Security Research. http://marc.info/?l=bugtraq&m=118711306802632&w=2 Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-07-22coda: wrong order of arguments of ->readdir()Al Viro
Shows how many people are testing coda - the bug had been there for 5 years and results of stepping on it are not subtle. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-07-22ntfs_init_locked_inode(): fix array indexingAndrew Morton
Local variable `i' is a byte-counter. Don't use it as an index into an array of le32's. Reported-by: "young dave" <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-04-26aio: remove bare user-triggerable error printkZach Brown
The user can generate console output if they cause do_mmap() to fail during sys_io_setup(). This was seen in a regression test that does exactly that by spinning calling mmap() until it gets -ENOMEM before calling io_setup(). We don't need this printk at all, just remove it. Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-03-02fs/bad_inode.c 64bit fixAdrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-26fix ext3 block bitmap leakageKirill Korotaev
This patch fixes ext3 block bitmap leakage, which leads to the following fsck messages on _healthy_ filesystem: Block bitmap differences: -64159 -73707 All kernels up to 2.6.17 have this bug. Found by Vasily Averin <vvs@sw.ru> and Andrey Savochkin <saw@sawoct.com> Test case triggered the issue was created by Dmitry Monakhov <dmonakhov@sw.ru> Signed-Off-By: Kirill Korotaev <dev@openvz.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-21fix bad_inode_ops memory corruption (CVE-2006-5753)Eric Sandeen
CVE-2006-5753 is for a case where an inode can be marked bad, switching the ops to bad_inode_ops, which are all connected as: static int return_EIO(void) { return -EIO; } #define EIO_ERROR ((void *) (return_EIO)) static struct inode_operations bad_inode_ops = { .create = bad_inode_create ...etc... The problem here is that the void cast causes return types to not be promoted, and for ops such as listxattr which expect more than 32 bits of return value, the 32-bit -EIO is interpreted as a large positive 64-bit number, i.e. 0x00000000fffffffa instead of 0xfffffffa. This goes particularly badly when the return value is taken as a number of bytes to copy into, say, a user's buffer for example... I originally had coded up the fix by creating a return_EIO_<TYPE> macro for each return type, like this: static int return_EIO_int(void) { return -EIO; } #define EIO_ERROR_INT ((void *) (return_EIO_int)) static struct inode_operations bad_inode_ops = { .create = EIO_ERROR_INT, ...etc... but Al felt that it was probably better to create an EIO-returner for each actual op signature. Since so few ops share a signature, I just went ahead & created an EIO function for each individual file & inode op that returns a value. Adrian Bunk: backported to 2.6.16 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-21Fix a free-wrong-pointer bug in nfs/acl server (CVE-2007-0772)Greg Banks
Due to type confusion, when an nfsacl verison 2 'ACCESS' request finishes and tries to clean up, it calls fh_put on entiredly the wrong thing and this can cause an oops. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-13Fix up CIFS for "test_clear_page_dirty()" removalLinus Torvalds
This also adds he required page "writeback" flag handling, that cifs hasn't been doing and that the page dirty flag changes made obvious. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Acked-by: Steve French <smfltc@us.ibm.com>
2007-02-13fix umask when noACL kernel meets extN tuned for ACLsHugh Dickins
Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or ext3 filesystem is tuned to mount with "acl", but mounted by a kernel built without ACL support, then umask was ignored when creating inodes - though root or user has umask 022, touch creates files as 0666, and mkdir creates directories as 0777. This appears to have worked right until 2.6.11, when a fix to the default mode on symlinks (always 0777) assumed VFS applies umask: which it does, unless the mount is marked for ACLs; but ext[23] set MS_POSIXACL in s_flags according to s_mount_opt set according to def_mount_opts. We could revert to the 2.6.10 ext[23]_init_acl (adding an S_ISLNK test); but other filesystems only set MS_POSIXACL when ACLs are configured. We could fix this at another level; but it seems most robust to avoid setting the s_mount_opt flag in the first place (at the expense of more ifdefs). Likewise don't set the XATTR_USER flag when built without XATTR support. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-03reiserfs: avoid tail packing if an inode was ever mmappedVladimir Saveliev
This patch fixes a confusion reiserfs has for a long time. On release file operation reiserfs used to try to pack file data stored in last incomplete page of some files into metadata blocks. After packing the page got cleared with clear_page_dirty. It did not take into account that the page may be mmaped into other process's address space. Recent replacement for clear_page_dirty cancel_dirty_page found the confusion with sanity check that page has to be not mapped. The patch fixes the confusion by making reiserfs avoid tail packing if an inode was ever mmapped. reiserfs_mmap and reiserfs_file_release are serialized with mutex in reiserfs specific inode. reiserfs_mmap locks the mutex and sets a bit in reiserfs specific inode flags. reiserfs_file_release checks the bit having the mutex locked. If bit is set - tail packing is avoided. This eliminates a possibility that mmapped page gets cancel_page_dirty-ed. Signed-off-by: Vladimir Saveliev <vs@namesys.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-22adfs: fix filename handlingJames Bursa
Fix filenames on adfs discs being terminated at the first character greater than 128 (adfs filenames are Latin 1). I saw this problem when using a loopback adfs image on a 2.6.17-rc5 x86_64 machine, and the patch fixed it there. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-20mm: fix bug in set_page_dirty_buffersNick Piggin
This was triggered, but not the fault of, the dirty page accounting patches. Suitable for -stable as well, after it goes upstream. Unable to handle kernel NULL pointer dereference at virtual address 0000004c EIP is at _spin_lock+0x12/0x66 Call Trace: [<401766e7>] __set_page_dirty_buffers+0x15/0xc0 [<401401e7>] set_page_dirty+0x2c/0x51 [<40140db2>] set_page_dirty_balance+0xb/0x3b [<40145d29>] __do_fault+0x1d8/0x279 [<40147059>] __handle_mm_fault+0x125/0x951 [<401133f1>] do_page_fault+0x440/0x59f [<4034d0c1>] error_code+0x39/0x40 [<08048a33>] 0x8048a33 ======================= Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09skip data conversion in compat_sys_mount when data_page is NULLAndrey Mirkin
OpenVZ Linux kernel team has found a problem with mounting in compat mode. Simple command "mount -t smbfs ..." on Fedora Core 5 distro in 32-bit mode leads to oops: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: [<ffffffff802bc7c6>] compat_sys_mount+0xd6/0x290 PGD 34d48067 PUD 34d03067 PMD 0 Oops: 0000 [1] SMP CPU: 0 Modules linked in: iptable_nat simfs smbfs ip_nat ip_conntrack vzdquota parport_pc lp parport 8021q bridge llc vznetdev vzmon nfs lockd sunrpc vzdev iptable_filter af_packet xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle xt_limit ipt_tos ipt_REJECT ip_tables x_tables thermal processor fan button battery asus_acpi ac uhci_hcd ehci_hcd usbcore i2c_i801 i2c_core e100 mii floppy ide_cd cdrom Pid: 14656, comm: mount RIP: 0060:[<ffffffff802bc7c6>] [<ffffffff802bc7c6>] compat_sys_mount+0xd6/0x290 RSP: 0000:ffff810034d31f38 EFLAGS: 00010292 RAX: 000000000000002c RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff810034c86bc0 RSI: 0000000000000096 RDI: ffffffff8061fc90 RBP: ffff810034d31f78 R08: 0000000000000000 R09: 000000000000000d R10: ffff810034d31e58 R11: 0000000000000001 R12: ffff810039dc3000 R13: 000000000805ea48 R14: 0000000000000000 R15: 00000000c0ed0000 FS: 0000000000000000(0000) GS:ffffffff80749000(0033) knlGS:00000000b7d556b0 CS: 0060 DS: 007b ES: 007b CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000034d43000 CR4: 00000000000006e0 Process mount (pid: 14656, veid=300, threadinfo ffff810034d30000, task ffff810034c86bc0) Stack: 0000000000000000 ffff810034dd0000 ffff810034e4a000 000000000805ea48 0000000000000000 0000000000000000 0000000000000000 0000000000000000 000000000805ea48 ffffffff8021e64e 0000000000000000 0000000000000000 Call Trace: [<ffffffff8021e64e>] ia32_sysret+0x0/0xa Code: 83 3b 06 0f 85 41 01 00 00 0f b7 43 0c 89 43 14 0f b7 43 0a RIP [<ffffffff802bc7c6>] compat_sys_mount+0xd6/0x290 RSP <ffff810034d31f38> CR2: 0000000000000000 The problem is that data_page pointer can be NULL, so we should skip data conversion in this case. Signed-off-by: Andrey Mirkin <amirkin@openvz.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09corrupted cramfs filesystems cause kernel oops (CVE-2006-5823)Phillip Lougher
Steve Grubb's fzfuzzer tool (http://people.redhat.com/sgrubb/files/ fsfuzzer-0.6.tar.gz) generates corrupt Cramfs filesystems which cause Cramfs to kernel oops in cramfs_uncompress_block(). The cause of the oops is an unchecked corrupted block length field read by cramfs_readpage(). This patch adds a sanity check to cramfs_readpage() which checks that the block length field is sensible. The (PAGE_CACHE_SIZE << 1) size check is intentional, even though the uncompressed data is not going to be larger than PAGE_CACHE_SIZE, gzip sometimes generates compressed data larger than the original source data. Mkcramfs checks that the compressed size is always less than or equal to PAGE_CACHE_SIZE << 1. Of course Cramfs could use the original uncompressed data in this case, but it doesn't. Signed-off-by: Phillip Lougher <phillip@lougher.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09handle ext3 directory corruption better (CVE-2006-6053)Eric Sandeen
I've been using Steve Grubb's purely evil "fsfuzzer" tool, at http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gz Basically it makes a filesystem, splats some random bits over it, then tries to mount it and do some simple filesystem actions. At best, the filesystem catches the corruption gracefully. At worst, things spin out of control. As you might guess, we found a couple places in ext3 where things spin out of control :) First, we had a corrupted directory that was never checked for consistency... it was corrupt, and pointed to another bad "entry" of length 0. The for() loop looped forever, since the length of ext3_next_entry(de) was 0, and we kept looking at the same pointer over and over and over and over... I modeled this check and subsequent action on what is done for other directory types in ext3_readdir... (adding this check adds some computational expense; I am testing a followup patch to reduce the number of times we check and re-check these directory entries, in all cases. Thanks for the idea, Andreas). Next we had a root directory inode which had a corrupted size, claimed to be > 200M on a 4M filesystem. There was only really 1 block in the directory, but because the size was so large, readdir kept coming back for more, spewing thousands of printk's along the way. Per Andreas' suggestion, if we're in this read error condition and we're trying to read an offset which is greater than i_blocks worth of bytes, stop trying, and break out of the loop. With these two changes fsfuzz test survives quite well on ext3. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09ext2: skip pages past number of blocks in ext2_find_entry (CVE-2006-6054)Eric Sandeen
This one was pointed out on the MOKB site: http://kernelfun.blogspot.com/2006/11/mokb-09-11-2006-linux-26x-ext2checkpage.html If a directory's i_size is corrupted, ext2_find_entry() will keep processing pages until the i_size is reached, even if there are no more blocks associated with the directory inode. This patch puts in some minimal sanity-checking so that we don't keep checking pages (and issuing errors) if we know there can be no more data to read, based on the block count of the directory inode. This is somewhat similar in approach to the ext3 patch I sent earlier this year. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09hfs_fill_super returns success even if no root inode (CVE-2006-6056)Eric Sandeen
http://kernelfun.blogspot.com/2006/11/mokb-14-11-2006-linux-26x-selinux.html mount that image... fs: filesystem was not cleanly unmounted, running fsck.hfs is recommended. mounting read-only. hfs: get root inode failed. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018 printing eip ... EIP is at superblock_doinit+0x21/0x767 ... [] selinux_sb_kern_mount+0xc/0x4b [] vfs_kern_mount+0x99/0xf6 [] do_kern_mount+0x2d/0x3e [] do_mount+0x5fa/0x66d [] sys_mount+0x77/0xae [] syscall_call+0x7/0xb DWARF2 unwinder stuck at syscall_call+0x7/0xb hfs_fill_super() returns success even if root_inode = hfs_iget(sb, &fd.search_key->cat, &rec); or sb->s_root = d_alloc_root(root_inode); fails. This superblock finds its way to superblock_doinit() which does: struct dentry *root = sb->s_root; struct inode *inode = root->d_inode; and boom. Need to make sure the error cases return an error, I think. [akpm@osdl.org: return -ENOMEM on oom] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09grow_buffers() infinite loop fix (CVE-2006-5757/CVE-2006-6060)Andrew Morton
If grow_buffers() is for some reason passed a block number which wants to li outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes) then will accidentally truncate `index' and will then instnatiate a page at the wrong pagecache offset. This causes __getblk_slow() to go into an infinite loop. This can happen with corrupted disks, or with software errors elsewhere. Detect that, and handle it. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-04fuse: fix hang on SMPMiklos Szeredi
Fuse didn't always call i_size_write() with i_mutex held which caused rare hangs on SMP/32bit. This bug has been present since fuse-2.2, well before being merged into mainline. The simplest solution is to protect i_size_write() with the per-connection spinlock. Using i_mutex for this purpose would require some restructuring of the code and I'm not even sure it's always safe to acquire i_mutex in all places i_size needs to be set. Since most of vmtruncate is already duplicated for other reasons, duplicate the remaining part as well, making all i_size_write() calls internal to fuse. Using i_size_write() was unnecessary in fuse_init_inode(), since this function is only called on a newly created locked inode. Reported by a few people over the years, but special thanks to Dana Henriksen who was persistent enough in helping me debug it. Adrian Bunk: Backported to 2.6.16. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-18NFS: nfs_lookup - don't hash dentry when optimising away the lookupTrond Myklebust
If the open intents tell us that a given lookup is going to result in a, exclusive create, we currently optimize away the lookup call itself. The reason is that the lookup would not be atomic with the create RPC call, so why do it in the first place? A problem occurs, however, if the VFS aborts the exclusive create operation after the lookup, but before the call to create the file/directory: in this case we will end up with a hashed negative dentry in the dcache that has never been looked up. Fix this by only actually hashing the dentry once the create operation has been successfully completed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-09binfmt_elf: fix checks for bad addressChuck Ebbert
Fix check for bad address; use macro instead of open-coding two checks. Taken from RHEL4 kernel update. From: Ernie Petrides <petrides@redhat.com> For background, the BAD_ADDR() macro should return TRUE if the address is TASK_SIZE, because that's the lowest address that is *not* valid for user-space mappings. The macro was correct in binfmt_aout.c but was wrong for the "equal to" case in binfmt_elf.c. There were two in-line validations of user-space addresses in binfmt_elf.c, which have been appropriately converted to use the corrected BAD_ADDR() macro in the patch you posted yesterday. Note that the size checks against TASK_SIZE are okay as coded. The additional changes that I propose are below. These are in the error paths for bad ELF entry addresses once load_elf_binary() has already committed to exec'ing the new image (following the tearing down of the task's original address space). The 1st hunk deals with the interp-side of the outer "if". There were two problems here. The printk() should be removed because this path can be triggered at will by a bogus interpreter image created and used by a malicious user. Further, the error code should not be ENOEXEC, because that causes the loop in search_binary_handler() to continue trying other exec handlers (twice, in fact). But it's too late for this to work correctly, because the user address space has already been torn down, and an exec() failure cannot be returned to the user code because the code no longer exists. The only recovery is to force a SIGSEGV, but it's best to terminate the search loop immediately. I somewhat arbitrarily chose EINVAL as a fallback error code, but any error returned by load_elf_interp() will override that (but this value will never be seen by user-space). The 2nd hunk deals with the non-interp-side of the outer "if". There were two problems here as well. The SIGSEGV needs to be forced, because a prior sigaction() syscall might have set the associated disposition to SIG_IGN. And the ENOEXEC should be changed to EINVAL as described above. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04fcntl(F_SETSIG) fixTrond Myklebust
fcntl(F_SETSIG) no longer works on leases because lease_release_private_callback() gets called as the lease is copied in order to initialise it. The problem is that lease_alloc() performs an unnecessary initialisation, which sets the lease_manager_ops. Avoid the problem by allocating the target lease structure using locks_alloc_lock(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04JFS: pageno needs to be longDave Kleikamp
diRead and diWrite are representing the page number as an unsigned int. This causes file system corruption on volumes larger than 16TB. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-29freevxfs: Add missing lock_kernel() to vxfs_readdirJosh Triplett
Commit 7b2fd697427e73c81d5fa659efd91bd07d303b0e in the historical GIT tree stopped calling the readdir member of a file_operations struct with the big kernel lock held, and fixed up all the readdir functions to do their own locking. However, that change added calls to unlock_kernel() in vxfs_readdir, but no call to lock_kernel(). Fix this by adding a call to lock_kernel(). Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-29add forgotten ->b_data in memcpy() call in ext3/resize.c (oopsable)Al Viro
sbi->s_group_desc is an array of pointers to buffer_head. memcpy() of buffer size from address of buffer_head is a bad idea - it will generate junk in any case, may oops if buffer_head is close to the end of slab page and next page is not mapped and isn't what was intended there. IOW, ->b_data is missing in that call. Fortunately, result doesn't go into the primary on-disk data structures, so only backup ones get crap written to them; that had allowed this bug to remain unnoticed until now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-25CIFS: report rename failure when target file is locked by WindowsSteve French
Fixes Samba bugzilla bug # 4182 Rename by handle failures (retry after rename by path) were not being returned back. Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-24sysfs: remove duplicated dput in sysfs_update_fileHidetoshi Seto
Following function can drops d_count twice against one reference by lookup_one_len. <SOURCE> /** * sysfs_update_file - update the modified timestamp on an object attribute. * @kobj: object we're acting for. * @attr: attribute descriptor. */ int sysfs_update_file(struct kobject * kobj, const struct attribute * attr) { struct dentry * dir = kobj->dentry; struct dentry * victim; int res = -ENOENT; mutex_lock(&dir->d_inode->i_mutex); victim = lookup_one_len(attr->name, dir, strlen(attr->name)); if (!IS_ERR(victim)) { /* make sure dentry is really there */ if (victim->d_inode && (victim->d_parent->d_inode == dir->d_inode)) { victim->d_inode->i_mtime = CURRENT_TIME; fsnotify_modify(victim); /** * Drop reference from initial sysfs_get_dentry(). */ dput(victim); res = 0; } else d_drop(victim); /** * Drop the reference acquired from sysfs_get_dentry() above. */ dput(victim); } mutex_unlock(&dir->d_inode->i_mutex); return res; } </SOURCE> PCI-hotplug (drivers/pci/hotplug/pci_hotplug_core.c) is only user of this function. I confirmed that dentry of /sys/bus/pci/slots/XXX/* have negative d_count value. This patch removes unnecessary dput(). Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-20Fix BeFS slab corruptionDiego Calleja
In bugzilla #6941, Jens Kilian reported: "The function befs_utf2nls (in fs/befs/linuxvfs.c) writes a 0 byte past the end of a block of memory allocated via kmalloc(), leading to memory corruption. This happens only for filenames which are pure ASCII and a multiple of 4 bytes in length. [...] Without DEBUG_SLAB, this leads to further corruption and hard lockups; I believe this is the bug which has made kernels later than 2.6.8 unusable for me. (This must be due to changes in memory management, the bug has been in the BeFS driver since the time it was introduced (AFAICT).) Steps to reproduce: Create a directory (in BeOS, naturally :-) with files named, e.g., "1", "22", "333", "4444", ... Mount it in Linux and do an "ls" or "find"" This patch implements the suggested fix. Credits to Jens Kilian for debugging the problem and finding the right fix. Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-17ext3 -nobh option causes oopsBadari Pulavarty
For files other than IFREG, nobh option doesn't make sense. Modifications to them are journalled and needs buffer heads to do that. Without this patch, we get kernel oops in page_buffers(). Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-09[DISKLABEL] SUN: Fix signed int usage for sector countJeff Mahoney
The current sun disklabel code uses a signed int for the sector count. When partitions larger than 1 TB are used, the cast to a sector_t causes the partition sizes to be invalid: # cat /proc/paritions | grep sdan 66 112 2146435072 sdan 66 115 9223372036853660736 sdan3 66 120 9223372036853660736 sdan8 This patch switches the sector count to an unsigned int to fix this. Eric Sandeen also submitted the same patch. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-08Don't allow chmod() on the /proc/<pid>/ filesMarcel Holtmann
This just turns off chmod() on the /proc/<pid>/ files, since there is no good reason to allow it, and had we disallowed it originally, the nasty /proc race exploit wouldn't have been possible. The other patches already fixed the problem chmod() could cause, so this is really just some final mop-up.. This particular version is based off a patch by Eugene and Marcel which had much better naming than my original equivalent one. Signed-off-by: Eugene Teo <eteo@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-07[PATCH] md: Make sure bi_max_vecs is set properly in bio_splitNeil Brown
Else a subsequent bio_clone might make a mess. Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-14[CIFS] Allow cifsd to suspend if connection is lostRafael J. Wysocki
Make cifsd allow us to suspend if it has lost the connection with a server Ref: http://bugzilla.kernel.org/show_bug.cgi?id=6811 Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-14[CIFS] Fix typo in earlier cifs_unlink change and protect one extra path.Steve French
Since cifs_unlink can also be called from rename path and there was one report of oops am making the extra check for null inode. Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>