summaryrefslogtreecommitdiff
path: root/security
AgeCommit message (Collapse)Author
2015-04-27nick kvfree() from apparmorAl Viro
commit 39f1f78d53b9bcbca91967380c5f0f2305a5c55f upstream. too many places open-code it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-04-22selinux: fix sel_write_enforce broken return valueJoe Perches
commit 6436a123a147db51a0b06024a8350f4c230e73ff upstream. Return a negative error value like the rest of the entries in this function. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> [PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-04-09Don't leak a key reference if request_key() tries to use a revoked keyringDavid Jeffery
commit d0709f1e66e8066c4ac6a54620ec116aa41937c0 upstream. If a request_key() call to allocate and fill out a key attempts to insert the key structure into a revoked keyring, the key will leak, using memory and part of the user's key quota until the system reboots. This is from a failure of construct_alloc_key() to decrement the key's reference count after the attempt to insert into the requested keyring is rejected. key_put() needs to be called in the link_prealloc_failed callpath to ensure the unused key is released. Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-02-16SELinux: fix selinuxfs policy file on big endian systemsEric Paris
commit b138004ea0382bdc6d02599c39392651b4f63889 upstream. The /sys/fs/selinux/policy file is not valid on big endian systems like ppc64 or s390. Let's see why: static int hashtab_cnt(void *key, void *data, void *ptr) { int *cnt = ptr; *cnt = *cnt + 1; return 0; } static int range_write(struct policydb *p, void *fp) { size_t nel; [...] /* count the number of entries in the hashtab */ nel = 0; rc = hashtab_map(p->range_tr, hashtab_cnt, &nel); if (rc) return rc; buf[0] = cpu_to_le32(nel); rc = put_entry(buf, sizeof(u32), 1, fp); So size_t is 64 bits. But then we pass a pointer to it as we do to hashtab_cnt. hashtab_cnt thinks it is a 32 bit int and only deals with the first 4 bytes. On x86_64 which is little endian, those first 4 bytes and the least significant, so this works out fine. On ppc64/s390 those first 4 bytes of memory are the high order bits. So at the end of the call to hashtab_map nel has a HUGE number. But the least significant 32 bits are all 0's. We then pass that 64 bit number to cpu_to_le32() which happily truncates it to a 32 bit number and does endian swapping. But the low 32 bits are all 0's. So no matter how many entries are in the hashtab, big endian systems always say there are 0 entries because I screwed up the counting. The fix is easy. Use a 32 bit int, as the hashtab_cnt expects, for nel. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-01-29move d_rcu from overlapping d_child to overlapping d_aliasAl Viro
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-01-29KEYS: close race between key lookup and freeingSasha Levin
commit a3a8784454692dd72e5d5d34dcdab17b4420e74c upstream. When a key is being garbage collected, it's key->user would get put before the ->destroy() callback is called, where the key is removed from it's respective tracking structures. This leaves a key hanging in a semi-invalid state which leaves a window open for a different task to try an access key->user. An example is find_keyring_by_name() which would dereference key->user for a key that is in the process of being garbage collected (where key->user was freed but ->destroy() wasn't called yet - so it's still present in the linked list). This would cause either a panic, or corrupt memory. Fixes CVE-2014-9529. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-01-07KEYS: Fix stale key registration at error pathTakashi Iwai
commit b26bdde5bb27f3f900e25a95e33a0c476c8c2c48 upstream. When loading encrypted-keys module, if the last check of aes_get_sizes() in init_encrypted() fails, the driver just returns an error without unregistering its key type. This results in the stale entry in the list. In addition to memory leaks, this leads to a kernel crash when registering a new key type later. This patch fixes the problem by swapping the calls of aes_get_sizes() and register_key_type(), and releasing resources properly at the error paths. Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=908163 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-11-13selinux: fix inode security list corruptionStephen Smalley
commit 923190d32de4428afbea5e5773be86bea60a9925 upstream. sb_finish_set_opts() can race with inode_free_security() when initializing inode security structures for inodes created prior to initial policy load or by the filesystem during ->mount(). This appears to have always been a possible race, but commit 3dc91d4 ("SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()") made it more evident by immediately reusing the unioned list/rcu element of the inode security structure for call_rcu() upon an inode_free_security(). But the underlying issue was already present before that commit as a possible use-after-free of isec. Shivnandan Kumar reported the list corruption and proposed a patch to split the list and rcu elements out of the union as separate fields of the inode_security_struct so that setting the rcu element would not affect the list element. However, this would merely hide the issue and not truly fix the code. This patch instead moves up the deletion of the list entry prior to dropping the sbsec->isec_lock initially. Then, if the inode is dropped subsequently, there will be no further references to the isec. Reported-by: Shivnandan Kumar <shivnandan.k@samsung.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-11-13evm: check xattr value length and type in evm_inode_setxattr()Dmitry Kasatkin
commit 3b1deef6b1289a99505858a3b212c5b50adf0c2f upstream. evm_inode_setxattr() can be called with no value. The function does not check the length so that following command can be used to produce the kernel oops: setfattr -n security.evm FOO. This patch fixes it. Changes in v3: * there is no reason to return different error codes for EVM_XATTR_HMAC and non EVM_XATTR_HMAC. Remove unnecessary test then. Changes in v2: * testing for validity of xattr type [ 1106.396921] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1106.398192] IP: [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48 [ 1106.399244] PGD 29048067 PUD 290d7067 PMD 0 [ 1106.399953] Oops: 0000 [#1] SMP [ 1106.400020] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse [ 1106.400020] CPU: 0 PID: 3635 Comm: setxattr Not tainted 3.16.0-kds+ #2936 [ 1106.400020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 1106.400020] task: ffff8800291a0000 ti: ffff88002917c000 task.ti: ffff88002917c000 [ 1106.400020] RIP: 0010:[<ffffffff812af7b8>] [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48 [ 1106.400020] RSP: 0018:ffff88002917fd50 EFLAGS: 00010246 [ 1106.400020] RAX: 0000000000000000 RBX: ffff88002917fdf8 RCX: 0000000000000000 [ 1106.400020] RDX: 0000000000000000 RSI: ffffffff818136d3 RDI: ffff88002917fdf8 [ 1106.400020] RBP: ffff88002917fd68 R08: 0000000000000000 R09: 00000000003ec1df [ 1106.400020] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800438a0a00 [ 1106.400020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1106.400020] FS: 00007f7dfa7d7740(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000 [ 1106.400020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1106.400020] CR2: 0000000000000000 CR3: 000000003763e000 CR4: 00000000000006f0 [ 1106.400020] Stack: [ 1106.400020] ffff8800438a0a00 ffff88002917fdf8 0000000000000000 ffff88002917fd98 [ 1106.400020] ffffffff812a1030 ffff8800438a0a00 ffff88002917fdf8 0000000000000000 [ 1106.400020] 0000000000000000 ffff88002917fde0 ffffffff8116d08a ffff88002917fdc8 [ 1106.400020] Call Trace: [ 1106.400020] [<ffffffff812a1030>] security_inode_setxattr+0x5d/0x6a [ 1106.400020] [<ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f [ 1106.400020] [<ffffffff8116d1e0>] setxattr+0x122/0x16c [ 1106.400020] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45 [ 1106.400020] [<ffffffff8114d011>] ? __sb_start_write+0x10f/0x143 [ 1106.400020] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45 [ 1106.400020] [<ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f [ 1106.400020] [<ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0 [ 1106.400020] [<ffffffff81529da9>] system_call_fastpath+0x16/0x1b [ 1106.400020] Code: c3 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 41 54 49 89 fc 53 48 89 f3 48 c7 c6 d3 36 81 81 48 89 df e8 18 22 04 00 85 c0 75 07 <41> 80 7d 00 02 74 0d 48 89 de 4c 89 e7 e8 5a fe ff ff eb 03 83 [ 1106.400020] RIP [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48 [ 1106.400020] RSP <ffff88002917fd50> [ 1106.400020] CR2: 0000000000000000 [ 1106.428061] ---[ end trace ae08331628ba3050 ]--- Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-09-17CAPABILITIES: remove undefined caps from all processesEric Paris
commit 7d8b6c63751cfbbe5eef81a48c22978b3407a3ad upstream. This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744 plus fixing it a different way... We found, when trying to run an application from an application which had dropped privs that the kernel does security checks on undefined capability bits. This was ESPECIALLY difficult to debug as those undefined bits are hidden from /proc/$PID/status. Consider a root application which drops all capabilities from ALL 4 capability sets. We assume, since the application is going to set eff/perm/inh from an array that it will clear not only the defined caps less than CAP_LAST_CAP, but also the higher 28ish bits which are undefined future capabilities. The BSET gets cleared differently. Instead it is cleared one bit at a time. The problem here is that in security/commoncap.c::cap_task_prctl() we actually check the validity of a capability being read. So any task which attempts to 'read all things set in bset' followed by 'unset all things set in bset' will not even attempt to unset the undefined bits higher than CAP_LAST_CAP. So the 'parent' will look something like: CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: ffffffc000000000 All of this 'should' be fine. Given that these are undefined bits that aren't supposed to have anything to do with permissions. But they do... So lets now consider a task which cleared the eff/perm/inh completely and cleared all of the valid caps in the bset (but not the invalid caps it couldn't read out of the kernel). We know that this is exactly what the libcap-ng library does and what the go capabilities library does. They both leave you in that above situation if you try to clear all of you capapabilities from all 4 sets. If that root task calls execve() the child task will pick up all caps not blocked by the bset. The bset however does not block bits higher than CAP_LAST_CAP. So now the child task has bits in eff which are not in the parent. These are 'meaningless' undefined bits, but still bits which the parent doesn't have. The problem is now in cred_cap_issubset() (or any operation which does a subset test) as the child, while a subset for valid cap bits, is not a subset for invalid cap bits! So now we set durring commit creds that the child is not dumpable. Given it is 'more priv' than its parent. It also means the parent cannot ptrace the child and other stupidity. The solution here: 1) stop hiding capability bits in status This makes debugging easier! 2) stop giving any task undefined capability bits. it's simple, it you don't put those invalid bits in CAP_FULL_SET you won't get them in init and you won't get them in any other task either. This fixes the cap_issubset() tests and resulting fallout (which made the init task in a docker container untraceable among other things) 3) mask out undefined bits when sys_capset() is called as it might use ~0, ~0 to denote 'all capabilities' for backward/forward compatibility. This lets 'capsh --caps="all=eip" -- -c /bin/bash' run. 4) mask out undefined bit when we read a file capability off of disk as again likely all bits are set in the xattr for forward/backward compatibility. This lets 'setcap all+pe /bin/bash; /bin/bash' run Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Dan Walsh <dwalsh@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-23evm: prohibit userspace writing 'security.evm' HMAC valueMimi Zohar
commit 2fb1c9a4f2dbc2f0bd2431c7fa64d0b5483864e4 upstream. Calculating the 'security.evm' HMAC value requires access to the EVM encrypted key. Only the kernel should have access to it. This patch prevents userspace tools(eg. setfattr, cp --preserve=xattr) from setting/modifying the 'security.evm' HMAC value directly. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-23ima: introduce ima_kernel_read()Dmitry Kasatkin
commit 0430e49b6e7c6b5e076be8fefdee089958c9adad upstream. Commit 8aac62706 "move exit_task_namespaces() outside of exit_notify" introduced the kernel opps since the kernel v3.10, which happens when Apparmor and IMA-appraisal are enabled at the same time. ---------------------------------------------------------------------- [ 106.750167] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.750241] PGD 0 [ 106.750254] Oops: 0000 [#1] SMP [ 106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci pps_core [ 106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15 [ 106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08 09/19/2012 [ 106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti: ffff880400fca000 [ 106.750704] RIP: 0010:[<ffffffff811ec7da>] [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.750725] RSP: 0018:ffff880400fcba60 EFLAGS: 00010286 [ 106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX: ffff8800d51523e7 [ 106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI: ffff880402d20020 [ 106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09: 0000000000000001 [ 106.750817] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800d5152300 [ 106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15: ffff8800d51523e7 [ 106.750871] FS: 0000000000000000(0000) GS:ffff88040d200000(0000) knlGS:0000000000000000 [ 106.750910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4: 00000000001407e0 [ 106.750962] Stack: [ 106.750981] ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18 0000000000000000 [ 106.751037] ffff8800de804920 ffffffff8101b9b9 0001800000000000 0000000000000100 [ 106.751093] 0000010000000000 0000000000000002 000000000000000e ffff8803eb8df500 [ 106.751149] Call Trace: [ 106.751172] [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430 [ 106.751199] [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10 [ 106.751225] [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170 [ 106.751250] [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80 [ 106.751276] [<ffffffff8134aa73>] aa_file_perm+0x33/0x40 [ 106.751301] [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0 [ 106.751327] [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20 [ 106.751355] [<ffffffff8130c853>] security_file_permission+0x23/0xa0 [ 106.751382] [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0 [ 106.751407] [<ffffffff811c789d>] vfs_read+0x6d/0x170 [ 106.751432] [<ffffffff811cda31>] kernel_read+0x41/0x60 [ 106.751457] [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280 [ 106.751483] [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280 [ 106.751509] [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160 [ 106.751536] [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10 [ 106.751562] [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0 [ 106.751587] [<ffffffff81352824>] ima_update_xattr+0x34/0x60 [ 106.751612] [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0 [ 106.751637] [<ffffffff811c9635>] __fput+0xd5/0x300 [ 106.751662] [<ffffffff811c98ae>] ____fput+0xe/0x10 [ 106.751687] [<ffffffff81086774>] task_work_run+0xc4/0xe0 [ 106.751712] [<ffffffff81066fad>] do_exit+0x2bd/0xa90 [ 106.751738] [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b [ 106.751763] [<ffffffff8106780c>] do_group_exit+0x4c/0xc0 [ 106.751788] [<ffffffff81067894>] SyS_exit_group+0x14/0x20 [ 106.751814] [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f [ 106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3 0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89 e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00 [ 106.752185] RIP [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.752214] RSP <ffff880400fcba60> [ 106.752236] CR2: 0000000000000018 [ 106.752258] ---[ end trace 3c520748b4732721 ]--- ---------------------------------------------------------------------- The reason for the oops is that IMA-appraisal uses "kernel_read()" when file is closed. kernel_read() honors LSM security hook which calls Apparmor handler, which uses current->nsproxy->mnt_ns. The 'guilty' commit changed the order of cleanup code so that nsproxy->mnt_ns was not already available for Apparmor. Discussion about the issue with Al Viro and Eric W. Biederman suggested that kernel_read() is too high-level for IMA. Another issue, except security checking, that was identified is mandatory locking. kernel_read honors it as well and it might prevent IMA from calculating necessary hash. It was suggested to use simplified version of the function without security and locking checks. This patch introduces special version ima_kernel_read(), which skips security and mandatory locking checking. It prevents the kernel oops to happen. Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-20ima: audit log files opened with O_DIRECT flagMimi Zohar
commit f9b2a735bdddf836214b5dca74f6ca7712e5a08c upstream. Files are measured or appraised based on the IMA policy. When a file, in policy, is opened with the O_DIRECT flag, a deadlock occurs. The first attempt at resolving this lockdep temporarily removed the O_DIRECT flag and restored it, after calculating the hash. The second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this flag, do_blockdev_direct_IO() would skip taking the i_mutex a second time. The third attempt, by Dmitry Kasatkin, resolves the i_mutex locking issue, by re-introducing the IMA mutex, but uncovered another problem. Reading a file with O_DIRECT flag set, writes directly to userspace pages. A second patch allocates a user-space like memory. This works for all IMA hooks, except ima_file_free(), which is called on __fput() to recalculate the file hash. Until this last issue is addressed, do not 'collect' the measurement for measuring, appraising, or auditing files opened with the O_DIRECT flag set. Based on policy, permit or deny file access. This patch defines a new IMA policy rule option named 'permit_directio'. Policy rules could be defined, based on LSM or other criteria, to permit specific applications to open files with the O_DIRECT flag set. Changelog v1: - permit or deny file access based IMA policy rules Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-09device_cgroup: check if exception removal is allowedAristeu Rozanski
commit d2c2b11cfa134f4fbdcc34088824da26a084d8de upstream. [PATCH v3 1/2] device_cgroup: check if exception removal is allowed When the device cgroup hierarchy was introduced in bd2953ebbb53 - devcg: propagate local changes down the hierarchy a specific case was overlooked. Consider the hierarchy bellow: A default policy: ALLOW, exceptions will deny access \ B default policy: ALLOW, exceptions will deny access There's no need to verify when an new exception is added to B because in this case exceptions will deny access to further devices, which is always fine. Hierarchy in device cgroup only makes sure B won't have more access than A. But when an exception is removed (by writing devices.allow), it isn't checked if the user is in fact removing an inherited exception from A, thus giving more access to B. Example: # echo 'a' >A/devices.allow # echo 'c 1:3 rw' >A/devices.deny # echo $$ >A/B/tasks # echo >/dev/null -bash: /dev/null: Operation not permitted # echo 'c 1:3 w' >A/B/devices.allow # echo >/dev/null # This shouldn't be allowed and this patch fixes it by making sure to never allow exceptions in this case to be removed if the exception is partially or fully present on the parent. v3: missing '*' in function description v2: improved log message and formatting fixes Cc: cgroups@vger.kernel.org Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-09device_cgroup: rework device access check and exception checkingAristeu Rozanski
commit 79d719749d23234e9b725098aa49133f3ef7299d upstream. Whenever a device file is opened and checked against current device cgroup rules, it uses the same function (may_access()) as when a new exception rule is added by writing devices.{allow,deny}. And in both cases, the algorithm is the same, doesn't matter the behavior. First problem is having device access to be considered the same as rule checking. Consider the following structure: A (default behavior: allow, exceptions disallow access) \ B (default behavior: allow, exceptions disallow access) A new exception is added to B by writing devices.deny: c 12:34 rw When checking if that exception is allowed in may_access(): if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) { if (behavior == DEVCG_DEFAULT_ALLOW) { /* the exception will deny access to certain devices */ return true; Which is ok, since B is not getting more privileges than A, it doesn't matter and the rule is accepted Now, consider it's a device file open check and the process belongs to cgroup B. The access will be generated as: behavior: allow exception: c 12:34 rw The very same chunk of code will allow it, even if there's an explicit exception telling to do otherwise. A simple test case: # mkdir new_group # cd new_group # echo $$ >tasks # echo "c 1:3 w" >devices.deny # echo >/dev/null # echo $? 0 This is a serious bug and was introduced on c39a2a3018f8 devcg: prepare may_access() for hierarchy support To solve this problem, the device file open function was split from the new exception check. Second problem is how exceptions are processed by may_access(). The first part of the said function tries to match fully with an existing exception: list_for_each_entry_rcu(ex, &dev_cgroup->exceptions, list) { if ((refex->type & DEV_BLOCK) && !(ex->type & DEV_BLOCK)) continue; if ((refex->type & DEV_CHAR) && !(ex->type & DEV_CHAR)) continue; if (ex->major != ~0 && ex->major != refex->major) continue; if (ex->minor != ~0 && ex->minor != refex->minor) continue; if (refex->access & (~ex->access)) continue; match = true; break; } That means the new exception should be contained into an existing one to be considered a match: New exception Existing match? notes b 12:34 rwm b 12:34 rwm yes b 12:34 r b *:34 rw yes b 12:34 rw b 12:34 w no extra "r" b *:34 rw b 12:34 rw no too broad "*" b *:34 rw b *:34 rwm yes Which is fine in some cases. Consider: A (default behavior: deny, exceptions allow access) \ B (default behavior: deny, exceptions allow access) In this case the full match makes sense, the new exception cannot add more access than the parent allows But this doesn't always work, consider: A (default behavior: allow, exceptions disallow access) \ B (default behavior: deny, exceptions allow access) In this case, a new exception in B shouldn't match any of the exceptions in A, after all you can't allow something that was forbidden by A. But consider this scenario: New exception Existing in A match? outcome b 12:34 rw b 12:34 r no exception is accepted Because the new exception has "w" as extra, it doesn't match, so it'll be added to B's exception list. The same problem can happen during a file access check. Consider a cgroup with allow as default behavior: Access Exception match? b 12:34 rw b 12:34 r no In this case, the access didn't match any of the exceptions in the cgroup, which is required since exceptions will disallow access. To solve this problem, two new functions were created to match an exception either fully or partially. In the example above, a partial check will be performed and it'll produce a match since at least "b 12:34 r" from "b 12:34 rw" access matches. Cc: cgroups@vger.kernel.org Cc: Tejun Heo <tj@kernel.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13selinux: correctly label /proc inodes in use before the policy is loadedPaul Moore
commit f64410ec665479d7b4b77b7519e814253ed0f686 upstream. This patch is based on an earlier patch by Eric Paris, he describes the problem below: "If an inode is accessed before policy load it will get placed on a list of inodes to be initialized after policy load. After policy load we call inode_doinit() which calls inode_doinit_with_dentry() on all inodes accessed before policy load. In the case of inodes in procfs that means we'll end up at the bottom where it does: /* Default to the fs superblock SID. */ isec->sid = sbsec->sid; if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) { if (opt_dentry) { isec->sclass = inode_mode_to_security_class(...) rc = selinux_proc_get_sid(opt_dentry, isec->sclass, &sid); if (rc) goto out_unlock; isec->sid = sid; } } Since opt_dentry is null, we'll never call selinux_proc_get_sid() and will leave the inode labeled with the label on the superblock. I believe a fix would be to mimic the behavior of xattrs. Look for an alias of the inode. If it can't be found, just leave the inode uninitialized (and pick it up later) if it can be found, we should be able to call selinux_proc_get_sid() ..." On a system exhibiting this problem, you will notice a lot of files in /proc with the generic "proc_t" type (at least the ones that were accessed early in the boot), for example: # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }' system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax However, with this patch in place we see the expected result: # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }' system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-12SELinux: Increase ebitmap_node size for 64-bit configurationWaiman Long
commit a767f680e34bf14a36fefbbb6d85783eef99fd57 upstream. Currently, the ebitmap_node structure has a fixed size of 32 bytes. On a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being used as bitmaps. The overhead ratio is 1/4. On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes are left for bitmap purpose and the overhead ratio is 1/2. With a 3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit() function to be called about 9 million times. The average number of ebitmap_node traversal is about 3.7. This patch increases the size of the ebitmap_node structure to 64 bytes for 64-bit system to keep the overhead ratio at 1/4. This may also improve performance a little bit by making node to node traversal less frequent (< 2) as more bits are available in each node. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-12SELinux: Reduce overhead of mls_level_isvalid() function callWaiman Long
commit fee7114298cf54bbd221cdb2ab49738be8b94f4c upstream. While running the high_systime workload of the AIM7 benchmark on a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel (with HT on), it was found that a pretty sizable amount of time was spent in the SELinux code. Below was the perf trace of the "perf record -a -s" of a test run at 1500 users: 5.04% ls [kernel.kallsyms] [k] ebitmap_get_bit 1.96% ls [kernel.kallsyms] [k] mls_level_isvalid 1.95% ls [kernel.kallsyms] [k] find_next_bit The ebitmap_get_bit() was the hottest function in the perf-report output. Both the ebitmap_get_bit() and find_next_bit() functions were, in fact, called by mls_level_isvalid(). As a result, the mls_level_isvalid() call consumed 8.95% of the total CPU time of all the 24 virtual CPUs which is quite a lot. The majority of the mls_level_isvalid() function invocations come from the socket creation system call. Looking at the mls_level_isvalid() function, it is checking to see if all the bits set in one of the ebitmap structure are also set in another one as well as the highest set bit is no bigger than the one specified by the given policydb data structure. It is doing it in a bit-by-bit manner. So if the ebitmap structure has many bits set, the iteration loop will be done many times. The current code can be rewritten to use a similar algorithm as the ebitmap_contains() function with an additional check for the highest set bit. The ebitmap_contains() function was extended to cover an optional additional check for the highest set bit, and the mls_level_isvalid() function was modified to call ebitmap_contains(). With that change, the perf trace showed that the used CPU time drop down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the total which is about 100X less than before. 0.07% ls [kernel.kallsyms] [k] ebitmap_contains 0.05% ls [kernel.kallsyms] [k] ebitmap_get_bit 0.01% ls [kernel.kallsyms] [k] mls_level_isvalid 0.01% ls [kernel.kallsyms] [k] find_next_bit The remaining ebitmap_get_bit() and find_next_bit() functions calls are made by other kernel routines as the new mls_level_isvalid() function will not call them anymore. This patch also improves the high_systime AIM7 benchmark result, though the improvement is not as impressive as is suggested by the reduction in CPU time spent in the ebitmap functions. The table below shows the performance change on the 2-socket x86-64 system (with HT on) mentioned above. +--------------+---------------+----------------+-----------------+ | Workload | mean % change | mean % change | mean % change | | | 10-100 users | 200-1000 users | 1100-2000 users | +--------------+---------------+----------------+-----------------+ | high_systime | +0.1% | +0.9% | +2.6% | +--------------+---------------+----------------+-----------------+ Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05SELinux: bigendian problems with filename trans rulesEric Paris
commit 9085a6422900092886da8c404e1c5340c4ff1cbf upstream. When writing policy via /sys/fs/selinux/policy I wrote the type and class of filename trans rules in CPU endian instead of little endian. On x86_64 this works just fine, but it means that on big endian arch's like ppc64 and s390 userspace reads the policy and converts it from le32_to_cpu. So the values are all screwed up. Write the values in le format like it should have been to start. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-02-20SELinux: Fix kernel BUG on empty security contexts.Stephen Smalley
commit 2172fa709ab32ca60e86179dc67d0857be8e2c98 upstream. Setting an empty security context (length=0) on a file will lead to incorrectly dereferencing the type and other fields of the security context structure, yielding a kernel BUG. As a zero-length security context is never valid, just reject all such security contexts whether coming from userspace via setxattr or coming from the filesystem upon a getxattr request by SELinux. Setting a security context value (empty or otherwise) unknown to SELinux in the first place is only possible for a root process (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only if the corresponding SELinux mac_admin permission is also granted to the domain by policy. In Fedora policies, this is only allowed for specific domains such as livecd for setting down security contexts that are not defined in the build host policy. Reproducer: su setenforce 0 touch foo setfattr -n security.selinux foo Caveat: Relabeling or removing foo after doing the above may not be possible without booting with SELinux disabled. Any subsequent access to foo after doing the above will also trigger the BUG. BUG output from Matthew Thode: [ 473.893141] ------------[ cut here ]------------ [ 473.962110] kernel BUG at security/selinux/ss/services.c:654! [ 473.995314] invalid opcode: 0000 [#6] SMP [ 474.027196] Modules linked in: [ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I 3.13.0-grsec #1 [ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0 07/29/10 [ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti: ffff8805f50cd488 [ 474.183707] RIP: 0010:[<ffffffff814681c7>] [<ffffffff814681c7>] context_struct_compute_av+0xce/0x308 [ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246 [ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX: 0000000000000100 [ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI: ffff8805e8aaa000 [ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09: 0000000000000006 [ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12: 0000000000000006 [ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15: 0000000000000000 [ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000) knlGS:0000000000000000 [ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4: 00000000000207f0 [ 474.556058] Stack: [ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98 ffff8805f1190a40 [ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990 ffff8805e8aac860 [ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060 ffff8805c0ac3d94 [ 474.690461] Call Trace: [ 474.723779] [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a [ 474.778049] [<ffffffff81468824>] security_compute_av+0xf4/0x20b [ 474.811398] [<ffffffff8196f419>] avc_compute_av+0x2a/0x179 [ 474.843813] [<ffffffff8145727b>] avc_has_perm+0x45/0xf4 [ 474.875694] [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31 [ 474.907370] [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e [ 474.938726] [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22 [ 474.970036] [<ffffffff811b057d>] vfs_getattr+0x19/0x2d [ 475.000618] [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91 [ 475.030402] [<ffffffff811b063b>] vfs_lstat+0x19/0x1b [ 475.061097] [<ffffffff811b077e>] SyS_newlstat+0x15/0x30 [ 475.094595] [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3 [ 475.148405] [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b [ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48 8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7 75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8 [ 475.255884] RIP [<ffffffff814681c7>] context_struct_compute_av+0xce/0x308 [ 475.296120] RSP <ffff8805c0ac3c38> [ 475.328734] ---[ end trace f076482e9d754adc ]--- Reported-by: Matthew Thode <mthode@mthode.org> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13SELinux: Fix memory leak upon loading policyTetsuo Handa
commit 8ed814602876bec9bad2649ca17f34b499357a1c upstream. Hello. I got below leak with linux-3.10.0-54.0.1.el7.x86_64 . [ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak) Below is a patch, but I don't know whether we need special handing for undoing ebitmap_set_bit() call. ---------- >>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Mon, 6 Jan 2014 16:30:21 +0900 Subject: SELinux: Fix memory leak upon loading policy Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not check return value from hashtab_insert() in filename_trans_read(). It leaks memory if hashtab_insert() returns error. unreferenced object 0xffff88005c9160d0 (size 8): comm "systemd", pid 1, jiffies 4294688674 (age 235.265s) hex dump (first 8 bytes): 57 0b 00 00 6b 6b 6b a5 W...kkk. backtrace: [<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360 [<ffffffff812aec5d>] policydb_read+0xd1d/0xf70 [<ffffffff812b345c>] security_load_policy+0x6c/0x500 [<ffffffff812a623c>] sel_write_load+0xac/0x750 [<ffffffff811eb680>] vfs_write+0xc0/0x1f0 [<ffffffff811ec08c>] SyS_write+0x4c/0xa0 [<ffffffff81690419>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff However, we should not return EEXIST error to the caller, or the systemd will show below message and the boot sequence freezes. systemd[1]: Failed to load SELinux policy. Freezing. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-25SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()Steven Rostedt
commit 3dc91d4338d698ce77832985f9cb183d8eeaf6be upstream. While running stress tests on adding and deleting ftrace instances I hit this bug: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: selinux_inode_permission+0x85/0x160 PGD 63681067 PUD 7ddbe067 PMD 0 Oops: 0000 [#1] PREEMPT CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000 RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160 RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840 RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000 RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54 R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000 R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000 FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0 Call Trace: security_inode_permission+0x1c/0x30 __inode_permission+0x41/0xa0 inode_permission+0x18/0x50 link_path_walk+0x66/0x920 path_openat+0xa6/0x6c0 do_filp_open+0x43/0xa0 do_sys_open+0x146/0x240 SyS_open+0x1e/0x20 system_call_fastpath+0x16/0x1b Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff RIP selinux_inode_permission+0x85/0x160 CR2: 0000000000000020 Investigating, I found that the inode->i_security was NULL, and the dereference of it caused the oops. in selinux_inode_permission(): isec = inode->i_security; rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd); Note, the crash came from stressing the deletion and reading of debugfs files. I was not able to recreate this via normal files. But I'm not sure they are safe. It may just be that the race window is much harder to hit. What seems to have happened (and what I have traced), is the file is being opened at the same time the file or directory is being deleted. As the dentry and inode locks are not held during the path walk, nor is the inodes ref counts being incremented, there is nothing saving these structures from being discarded except for an rcu_read_lock(). The rcu_read_lock() protects against freeing of the inode, but it does not protect freeing of the inode_security_struct. Now if the freeing of the i_security happens with a call_rcu(), and the i_security field of the inode is not changed (it gets freed as the inode gets freed) then there will be no issue here. (Linus Torvalds suggested not setting the field to NULL such that we do not need to check if it is NULL in the permission check). Note, this is a hack, but it fixes the problem at hand. A real fix is to restructure the destroy_inode() to call all the destructor handlers from the RCU callback. But that is a major job to do, and requires a lot of work. For now, we just band-aid this bug with this fix (it works), and work on a more maintainable solution in the future. Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-09selinux: process labeled IPsec TCP SYN-ACK packets properly in ↵Paul Moore
selinux_ip_postroute() commit c0828e50485932b7e019df377a6b0a8d1ebd3080 upstream. Due to difficulty in arriving at the proper security label for TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets while/before they are undergoing XFRM transforms instead of waiting until afterwards so that we can determine the correct security label. Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-09selinux: look for IPsec labels on both inbound and outbound packetsPaul Moore
commit 817eff718dca4e54d5721211ddde0914428fbb7c upstream. Previously selinux_skb_peerlbl_sid() would only check for labeled IPsec security labels on inbound packets, this patch enables it to check both inbound and outbound traffic for labeled IPsec security labels. Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-09selinux: selinux_setprocattr()->ptrace_parent() needs rcu_read_lock()Oleg Nesterov
commit c0c1439541f5305b57a83d599af32b74182933fe upstream. selinux_setprocattr() does ptrace_parent(p) under task_lock(p), but task_struct->alloc_lock doesn't pin ->parent or ->ptrace, this looks confusing and triggers the "suspicious RCU usage" warning because ptrace_parent() does rcu_dereference_check(). And in theory this is wrong, spin_lock()->preempt_disable() doesn't necessarily imply rcu_read_lock() we need to access the ->parent. Reported-by: Evan McNabb <emcnabb@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-09selinux: fix broken peer recv checkChad Hanson
commit 46d01d63221c3508421dd72ff9c879f61053cffc upstream. Fix a broken networking check. Return an error if peer recv fails. If secmark is active and the packet recv succeeds the peer recv error is ignored. Signed-off-by: Chad Hanson <chanson@trustedcs.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()Paul Moore
commit 446b802437f285de68ffb8d6fac3c44c3cab5b04 upstream. In selinux_ip_postroute() we perform access checks based on the packet's security label. For locally generated traffic we get the packet's security label from the associated socket; this works in all cases except for TCP SYN-ACK packets. In the case of SYN-ACK packet's the correct security label is stored in the connection's request_sock, not the server's socket. Unfortunately, at the point in time when selinux_ip_postroute() is called we can't query the request_sock directly, we need to recreate the label using the same logic that originally labeled the associated request_sock. See the inline comments for more explanation. Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()Paul Moore
commit 47180068276a04ed31d24fe04c673138208b07a9 upstream. In selinux_ip_output() we always label packets based on the parent socket. While this approach works in almost all cases, it doesn't work in the case of TCP SYN-ACK packets when the correct label is not the label of the parent socket, but rather the label of the larval socket represented by the request_sock struct. Unfortunately, since the request_sock isn't queued on the parent socket until *after* the SYN-ACK packet is sent, we can't lookup the request_sock to determine the correct label for the packet; at this point in time the best we can do is simply pass/NF_ACCEPT the packet. It must be said that simply passing the packet without any explicit labeling action, while far from ideal, is not terrible as the SYN-ACK packet will inherit any IP option based labeling from the initial connection request so the label *should* be correct and all our access controls remain in place so we shouldn't have to worry about information leaks. Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04selinux: correct locking in selinux_netlbl_socket_connect)Paul Moore
commit 42d64e1add3a1ce8a787116036163b8724362145 upstream. The SELinux/NetLabel glue code has a locking bug that affects systems with NetLabel enabled, see the kernel error message below. This patch corrects this problem by converting the bottom half socket lock to a more conventional, and correct for this call-path, lock_sock() call. =============================== [ INFO: suspicious RCU usage. ] 3.11.0-rc3+ #19 Not tainted ------------------------------- net/ipv4/cipso_ipv4.c:1928 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by ping/731: #0: (slock-AF_INET/1){+.-...}, at: [...] selinux_netlbl_socket_connect #1: (rcu_read_lock){.+.+..}, at: [<...>] netlbl_conn_setattr stack backtrace: CPU: 1 PID: 731 Comm: ping Not tainted 3.11.0-rc3+ #19 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000001 ffff88006f659d28 ffffffff81726b6a ffff88003732c500 ffff88006f659d58 ffffffff810e4457 ffff88006b845a00 0000000000000000 000000000000000c ffff880075aa2f50 ffff88006f659d90 ffffffff8169bec7 Call Trace: [<ffffffff81726b6a>] dump_stack+0x54/0x74 [<ffffffff810e4457>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff8169bec7>] cipso_v4_sock_setattr+0x187/0x1a0 [<ffffffff8170f317>] netlbl_conn_setattr+0x187/0x190 [<ffffffff8170f195>] ? netlbl_conn_setattr+0x5/0x190 [<ffffffff8131ac9e>] selinux_netlbl_socket_connect+0xae/0xc0 [<ffffffff81303025>] selinux_socket_connect+0x135/0x170 [<ffffffff8119d127>] ? might_fault+0x57/0xb0 [<ffffffff812fb146>] security_socket_connect+0x16/0x20 [<ffffffff815d3ad3>] SYSC_connect+0x73/0x130 [<ffffffff81739a85>] ? sysret_check+0x22/0x5d [<ffffffff810e5e2d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [<ffffffff81373d4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff815d52be>] SyS_connect+0xe/0x10 [<ffffffff81739a59>] system_call_fastpath+0x16/0x1b Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29Revert "ima: policy for RAMFS"Mimi Zohar
commit 08de59eb144d7c41351a467442f898d720f0f15f upstream. This reverts commit 4c2c392763a682354fac65b6a569adec4e4b5387. Everything in the initramfs should be measured and appraised, but until the initramfs has extended attribute support, at least measured. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-16apparmor: fix bad lock balance when introspecting policyJohn Johansen
BugLink: http://bugs.launchpad.net/bugs/1235977 The profile introspection seq file has a locking bug when policy is viewed from a virtual root (task in a policy namespace), introspection from the real root is not affected. The test for root while (parent) { is correct for the real root, but incorrect for tasks in a policy namespace. This allows the task to walk backup the policy tree past its virtual root causing it to be unlocked before the virtual root should be in the p_stop fn. This results in the following lockdep back trace: [ 78.479744] [ BUG: bad unlock balance detected! ] [ 78.479792] 3.11.0-11-generic #17 Not tainted [ 78.479838] ------------------------------------- [ 78.479885] grep/2223 is trying to release lock (&ns->lock) at: [ 78.479952] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10 [ 78.480002] but there are no more locks to release! [ 78.480037] [ 78.480037] other info that might help us debug this: [ 78.480037] 1 lock held by grep/2223: [ 78.480037] #0: (&p->lock){+.+.+.}, at: [<ffffffff812111bd>] seq_read+0x3d/0x3d0 [ 78.480037] [ 78.480037] stack backtrace: [ 78.480037] CPU: 0 PID: 2223 Comm: grep Not tainted 3.11.0-11-generic #17 [ 78.480037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 78.480037] ffffffff817bf3be ffff880007763d60 ffffffff817b97ef ffff8800189d2190 [ 78.480037] ffff880007763d88 ffffffff810e1c6e ffff88001f044730 ffff8800189d2190 [ 78.480037] ffffffff817bf3be ffff880007763e00 ffffffff810e5bd6 0000000724fe56b7 [ 78.480037] Call Trace: [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff817b97ef>] dump_stack+0x54/0x74 [ 78.480037] [<ffffffff810e1c6e>] print_unlock_imbalance_bug+0xee/0x100 [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff810e5bd6>] lock_release_non_nested+0x226/0x300 [ 78.480037] [<ffffffff817bf2fe>] ? __mutex_unlock_slowpath+0xce/0x180 [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff810e5d5c>] lock_release+0xac/0x310 [ 78.480037] [<ffffffff817bf2b3>] __mutex_unlock_slowpath+0x83/0x180 [ 78.480037] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff81376c91>] p_stop+0x51/0x90 [ 78.480037] [<ffffffff81211408>] seq_read+0x288/0x3d0 [ 78.480037] [<ffffffff811e9d9e>] vfs_read+0x9e/0x170 [ 78.480037] [<ffffffff811ea8cc>] SyS_read+0x4c/0xa0 [ 78.480037] [<ffffffff817ccc9d>] system_call_fastpath+0x1a/0x1f Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-10-16apparmor: fix memleak of the profile hashJohn Johansen
BugLink: http://bugs.launchpad.net/bugs/1235523 This fixes the following kmemleak trace: unreferenced object 0xffff8801e8c35680 (size 32): comm "apparmor_parser", pid 691, jiffies 4294895667 (age 13230.876s) hex dump (first 32 bytes): e0 d3 4e b5 ac 6d f4 ed 3f cb ee 48 1c fd 40 cf ..N..m..?..H..@. 5b cc e9 93 00 00 00 00 00 00 00 00 00 00 00 00 [............... backtrace: [<ffffffff817a97ee>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811ca9f3>] __kmalloc+0x103/0x290 [<ffffffff8138acbc>] aa_calc_profile_hash+0x6c/0x150 [<ffffffff8138074d>] aa_unpack+0x39d/0xd50 [<ffffffff8137eced>] aa_replace_profiles+0x3d/0xd80 [<ffffffff81376937>] profile_replace+0x37/0x50 [<ffffffff811e9f2d>] vfs_write+0xbd/0x1e0 [<ffffffff811ea96c>] SyS_write+0x4c/0xa0 [<ffffffff817ccb1d>] system_call_fastpath+0x1a/0x1f [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-10-04selinux: remove 'flags' parameter from avc_audit()Linus Torvalds
Now avc_audit() has no more users with that parameter. Remove it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-04selinux: avc_has_perm_flags has no more usersLinus Torvalds
.. so get rid of it. The only indirect users were all the avc_has_perm() callers which just expanded to have a zero flags argument. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-04selinux: remove 'flags' parameter from inode_has_permLinus Torvalds
Every single user passes in '0'. I think we had non-zero users back in some stone age when selinux_inode_permission() was implemented in terms of inode_has_perm(), but that complicated case got split up into a totally separate code-path so that we could optimize the much simpler special cases. See commit 2e33405785d3 ("SELinux: delay initialization of audit data in selinux_inode_permission") for example. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30apparmor: fix suspicious RCU usage warning in policy.c/policy.hJohn Johansen
The recent 3.12 pull request for apparmor was missing a couple rcu _protected access modifiers. Resulting in the follow suspicious RCU usage [ 29.804534] [ INFO: suspicious RCU usage. ] [ 29.804539] 3.11.0+ #5 Not tainted [ 29.804541] ------------------------------- [ 29.804545] security/apparmor/include/policy.h:363 suspicious rcu_dereference_check() usage! [ 29.804548] [ 29.804548] other info that might help us debug this: [ 29.804548] [ 29.804553] [ 29.804553] rcu_scheduler_active = 1, debug_locks = 1 [ 29.804558] 2 locks held by apparmor_parser/1268: [ 29.804560] #0: (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29 [ 29.804576] #1: (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c [ 29.804589] [ 29.804589] stack backtrace: [ 29.804595] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5 [ 29.804599] Hardware name: ASUSTeK Computer Inc. UL50VT /UL50VT , BIOS 217 03/01/2010 [ 29.804602] 0000000000000000 ffff8800b95a1d90 ffffffff8144eb9b ffff8800b94db540 [ 29.804611] ffff8800b95a1dc0 ffffffff81087439 ffff880138cc3a18 ffff880138cc3a18 [ 29.804619] ffff8800b9464a90 ffff880138cc3a38 ffff8800b95a1df0 ffffffff811f5084 [ 29.804628] Call Trace: [ 29.804636] [<ffffffff8144eb9b>] dump_stack+0x4e/0x82 [ 29.804642] [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105 [ 29.804649] [<ffffffff811f5084>] __aa_update_replacedby+0x53/0x7f [ 29.804655] [<ffffffff811f5408>] __replace_profile+0x11f/0x1ed [ 29.804661] [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c [ 29.804668] [<ffffffff811f16d4>] profile_replace+0x35/0x4c [ 29.804674] [<ffffffff81120fa3>] vfs_write+0xad/0x113 [ 29.804680] [<ffffffff81121609>] SyS_write+0x44/0x7a [ 29.804687] [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b [ 29.804691] [ 29.804694] =============================== [ 29.804697] [ INFO: suspicious RCU usage. ] [ 29.804700] 3.11.0+ #5 Not tainted [ 29.804703] ------------------------------- [ 29.804706] security/apparmor/policy.c:566 suspicious rcu_dereference_check() usage! [ 29.804709] [ 29.804709] other info that might help us debug this: [ 29.804709] [ 29.804714] [ 29.804714] rcu_scheduler_active = 1, debug_locks = 1 [ 29.804718] 2 locks held by apparmor_parser/1268: [ 29.804721] #0: (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29 [ 29.804733] #1: (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c [ 29.804744] [ 29.804744] stack backtrace: [ 29.804750] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5 [ 29.804753] Hardware name: ASUSTeK Computer Inc. UL50VT /UL50VT , BIOS 217 03/01/2010 [ 29.804756] 0000000000000000 ffff8800b95a1d80 ffffffff8144eb9b ffff8800b94db540 [ 29.804764] ffff8800b95a1db0 ffffffff81087439 ffff8800b95b02b0 0000000000000000 [ 29.804772] ffff8800b9efba08 ffff880138cc3a38 ffff8800b95a1dd0 ffffffff811f4f94 [ 29.804779] Call Trace: [ 29.804786] [<ffffffff8144eb9b>] dump_stack+0x4e/0x82 [ 29.804791] [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105 [ 29.804798] [<ffffffff811f4f94>] aa_free_replacedby_kref+0x4d/0x62 [ 29.804804] [<ffffffff811f4f47>] ? aa_put_namespace+0x17/0x17 [ 29.804810] [<ffffffff811f4f0b>] kref_put+0x36/0x40 [ 29.804816] [<ffffffff811f5423>] __replace_profile+0x13a/0x1ed [ 29.804822] [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c [ 29.804829] [<ffffffff811f16d4>] profile_replace+0x35/0x4c [ 29.804835] [<ffffffff81120fa3>] vfs_write+0xad/0x113 [ 29.804840] [<ffffffff81121609>] SyS_write+0x44/0x7a [ 29.804847] [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b Reported-by: miles.lane@gmail.com CC: paulmck@linux.vnet.ibm.com Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-09-30apparmor: Use shash crypto API interface for profile hashesTyler Hicks
Use the shash interface, rather than the hash interface, when hashing AppArmor profiles. The shash interface does not use scatterlists and it is a better fit for what AppArmor needs. This fixes a kernel paging BUG when aa_calc_profile_hash() is passed a buffer from vmalloc(). The hash interface requires callers to handle vmalloc() buffers differently than what AppArmor was doing. Due to vmalloc() memory not being physically contiguous, each individual page behind the buffer must be assigned to a scatterlist with sg_set_page() and then the scatterlist passed to crypto_hash_update(). The shash interface does not have that limitation and allows vmalloc() and kmalloc() buffers to be handled in the same manner. BugLink: https://launchpad.net/bugs/1216294/ BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=62261 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-09-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace changes from Eric Biederman: "This is an assorted mishmash of small cleanups, enhancements and bug fixes. The major theme is user namespace mount restrictions. nsown_capable is killed as it encourages not thinking about details that need to be considered. A very hard to hit pid namespace exiting bug was finally tracked and fixed. A couple of cleanups to the basic namespace infrastructure. Finally there is an enhancement that makes per user namespace capabilities usable as capabilities, and an enhancement that allows the per userns root to nice other processes in the user namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Kill nsown_capable it makes the wrong thing easy capabilities: allow nice if we are privileged pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD userns: Allow PR_CAPBSET_DROP in a user namespace. namespaces: Simplify copy_namespaces so it is clear what is going on. pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup sysfs: Restrict mounting sysfs userns: Better restrictions on when proc and sysfs can be mounted vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces kernel/nsproxy.c: Improving a snippet of code. proc: Restrict mounting the proc filesystem vfs: Lock in place mounts from more privileged users
2013-09-07Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Nothing major for this kernel, just maintenance updates" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits) apparmor: add the ability to report a sha1 hash of loaded policy apparmor: export set of capabilities supported by the apparmor module apparmor: add the profile introspection file to interface apparmor: add an optional profile attachment string for profiles apparmor: add interface files for profiles and namespaces apparmor: allow setting any profile into the unconfined state apparmor: make free_profile available outside of policy.c apparmor: rework namespace free path apparmor: update how unconfined is handled apparmor: change how profile replacement update is done apparmor: convert profile lists to RCU based locking apparmor: provide base for multiple profiles to be replaced at once apparmor: add a features/policy dir to interface apparmor: enable users to query whether apparmor is enabled apparmor: remove minimum size check for vmalloc() Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes Smack: network label match fix security: smack: add a hash table to quicken smk_find_entry() security: smack: fix memleak in smk_write_rules_list() xattr: Constify ->name member of "struct xattr". ...
2013-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking changes from David Miller: "Noteworthy changes this time around: 1) Multicast rejoin support for team driver, from Jiri Pirko. 2) Centralize and simplify TCP RTT measurement handling in order to reduce the impact of bad RTO seeding from SYN/ACKs. Also, when both timestamps and local RTT measurements are available prefer the later because there are broken middleware devices which scramble the timestamp. From Yuchung Cheng. 3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel memory consumed to queue up unsend user data. From Eric Dumazet. 4) Add a "physical port ID" abstraction for network devices, from Jiri Pirko. 5) Add a "suppress" operation to influence fib_rules lookups, from Stefan Tomanek. 6) Add a networking development FAQ, from Paul Gortmaker. 7) Extend the information provided by tcp_probe and add ipv6 support, from Daniel Borkmann. 8) Use RCU locking more extensively in openvswitch data paths, from Pravin B Shelar. 9) Add SCTP support to openvswitch, from Joe Stringer. 10) Add EF10 chip support to SFC driver, from Ben Hutchings. 11) Add new SYNPROXY netfilter target, from Patrick McHardy. 12) Compute a rate approximation for sending in TCP sockets, and use this to more intelligently coalesce TSO frames. Furthermore, add a new packet scheduler which takes advantage of this estimate when available. From Eric Dumazet. 13) Allow AF_PACKET fanouts with random selection, from Daniel Borkmann. 14) Add ipv6 support to vxlan driver, from Cong Wang" Resolved conflicts as per discussion. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits) openvswitch: Fix alignment of struct sw_flow_key. netfilter: Fix build errors with xt_socket.c tcp: Add missing braces to do_tcp_setsockopt caif: Add missing braces to multiline if in cfctrl_linkup_request bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize vxlan: Fix kernel panic on device delete. net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls net: mvneta: properly disable HW PHY polling and ensure adjust_link() works icplus: Use netif_running to determine device state ethernet/arc/arc_emac: Fix huge delays in large file copies tuntap: orphan frags before trying to set tx timestamp tuntap: purge socket error queue on detach qlcnic: use standard NAPI weights ipv6:introduce function to find route for redirect bnx2x: VF RSS support - VF side bnx2x: VF RSS support - PF side vxlan: Notify drivers for listening UDP port changes net: usbnet: update addr_assign_type if appropriate driver/net: enic: update enic maintainers and driver driver/net: enic: Exposing symbols for Cisco's low latency driver ...
2013-09-04Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Minor fixes mainly, including a potential use-after-free on remove found by CONFIG_DEBUG_KOBJECT_RELEASE which may be theoretical" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: Fix mod->mkobj.kobj potentially freed too early kernel/params.c: use scnprintf() instead of sprintf() kernel/module.c: use scnprintf() instead of sprintf() module/lsm: Have apparmor module parameters work with no args module: Add NOARG flag for ops with param_set_bool_enable_only() set function module: Add flag to allow mod params to have no arguments modules: add support for soft module dependencies scripts/mod/modpost.c: permit '.cranges' secton for sh64 architecture. module: fix sprintf format specifier in param_get_byte()
2013-09-03Merge branch 'for-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on the cgroup front. Most changes aren't visible to userland at all at this point and are laying foundation for the planned unified hierarchy. - The biggest change is decoupling the lifetime management of css (cgroup_subsys_state) from that of cgroup's. Because controllers (cpu, memory, block and so on) will need to be dynamically enabled and disabled, css which is the association point between a cgroup and a controller may come and go dynamically across the lifetime of a cgroup. Till now, css's were created when the associated cgroup was created and stayed till the cgroup got destroyed. Assumptions around this tight coupling permeated through cgroup core and controllers. These assumptions are gradually removed, which consists bulk of patches, and css destruction path is completely decoupled from cgroup destruction path. Note that decoupling of creation path is relatively easy on top of these changes and the patchset is pending for the next window. - cgroup has its own event mechanism cgroup.event_control, which is only used by memcg. It is overly complex trying to achieve high flexibility whose benefits seem dubious at best. Going forward, new events will simply generate file modified event and the existing mechanism is being made specific to memcg. This pull request contains prepatory patches for such change. - Various fixes and cleanups" Fixed up conflict in kernel/cgroup.c as per Tejun. * 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits) cgroup: fix cgroup_css() invocation in css_from_id() cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp() cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup cgroup: implement CFTYPE_NO_PREFIX cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax cgroup: fix cgroup_write_event_control() cgroup: fix subsystem file accesses on the root cgroup cgroup: change cgroup_from_id() to css_from_id() cgroup: use css_get() in cgroup_create() to check CSS_ROOT cpuset: remove an unncessary forward declaration cgroup: RCU protect each cgroup_subsys_state release cgroup: move subsys file removal to kill_css() cgroup: factor out kill_css() cgroup: decouple cgroup_subsys_state destruction from cgroup destruction cgroup: replace cgroup->css_kill_cnt with ->nr_css cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item cgroup: move cgroup->subsys[] assignment to online_css() cgroup: reorganize css init / exit paths cgroup: add __rcu modifier to cgroup->subsys[] ...
2013-08-30capabilities: allow nice if we are privilegedSerge Hallyn
We allow task A to change B's nice level if it has a supserset of B's privileges, or of it has CAP_SYS_NICE. Also allow it if A has CAP_SYS_NICE with respect to B - meaning it is root in the same namespace, or it created B's namespace. Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-08-30userns: Allow PR_CAPBSET_DROP in a user namespace.Eric W. Biederman
As the capabilites and capability bounding set are per user namespace properties it is safe to allow changing them with just CAP_SETPCAP permission in the user namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-08-23Merge branch 'smack-for-3.12' of git://git.gitorious.org/smack-next/kernel ↵James Morris
into ra-next
2013-08-20module/lsm: Have apparmor module parameters work with no argsSteven Rostedt
The apparmor module parameters for param_ops_aabool and param_ops_aalockpolicy are both based off of the param_ops_bool, and can handle a NULL value passed in as val. Have it enable the new KERNEL_PARAM_FL_NOARGS flag to allow the parameters to be set without having to state "=y" or "=1". Cc: John Johansen <john.johansen@canonical.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-14apparmor: add the ability to report a sha1 hash of loaded policyJohn Johansen
Provide userspace the ability to introspect a sha1 hash value for each profile currently loaded. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2013-08-14apparmor: export set of capabilities supported by the apparmor moduleJohn Johansen
Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
2013-08-14apparmor: add the profile introspection file to interfaceJohn Johansen
Add the dynamic namespace relative profiles file to the interace, to allow introspection of loaded profiles and their modes. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Kees Cook <kees@ubuntu.com>