summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2016-01-04tracing: Fix setting of start_index in find_next()Qiu Peiyang
When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel panic at t_show. general protection fault: 0000 [#1] PREEMPT SMP CPU: 0 PID: 2957 Comm: sh Tainted: G W O 3.14.55-x86_64-01062-gd4acdc7 #2 RIP: 0010:[<ffffffff811375b2>] [<ffffffff811375b2>] t_show+0x22/0xe0 RSP: 0000:ffff88002b4ebe80 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004 RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1 RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0 R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570 FS: 0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0 Call Trace: [<ffffffff811dc076>] seq_read+0x2f6/0x3e0 [<ffffffff811b749b>] vfs_read+0x9b/0x160 [<ffffffff811b7f69>] SyS_read+0x49/0xb0 [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13 ---[ end trace 5bd9eb630614861e ]--- Kernel panic - not syncing: Fatal exception When the first time find_next calls find_next_mod_format, it should iterate the trace_bprintk_fmt_list to find the first print format of the module. However in current code, start_index is smaller than *pos at first, and code will not iterate the list. Latter container_of will get the wrong address with former v, which will cause mod_fmt be a meaningless object and so is the returned mod_fmt->fmt. This patch will fix it by correcting the start_index. After fixed, when the first time calls find_next_mod_format, start_index will be equal to *pos, and code will iterate the trace_bprintk_fmt_list to get the right module printk format, so is the returned mod_fmt->fmt. Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com Cc: stable@vger.kernel.org # 3.12+ Fixes: 102c9323c35a8 "tracing: Add __tracepoint_string() to export string pointers" Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-08Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree includes four core perf fixes for misc bugs, three fixes to x86 PMU drivers, and two updates to old email addresses" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Do not send exit event twice perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell perf: Fix PERF_EVENT_IOC_PERIOD deadlock treewide: Remove old email address perf/x86: Fix LBR call stack save/restore perf: Update email address in MAINTAINERS perf/core: Robustify the perf_cgroup_from_task() RCU checks perf/core: Fix RCU problem with cgroup context switching code
2015-12-01tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filterSteven Rostedt (Red Hat)
The set_event_pid filter relies on attaching to the sched_switch and sched_wakeup tracepoints to see if it should filter the tracing on schedule tracepoints. By adding the callbacks to sched_wakeup, pids in the set_event_pid file will trace the wakeups of those tasks with those pids. But sched_wakeup_new and sched_waking were missed. These two should also be traced. Luckily, these tracepoints share the same class as sched_wakeup which means they can use the same pre and post callbacks as sched_wakeup does. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24ring-buffer: Put back the length if crossed page with add_timestampSteven Rostedt (Red Hat)
Commit fcc742eaad7c "ring-buffer: Add event descriptor to simplify passing data" added a descriptor that holds various data instead of passing around several variables through parameters. The problem was that one of the parameters was modified in a function and the code was designed not to have an effect on that modified parameter. Now that the parameter is a descriptor and any modifications to it are non-volatile, the size of the data could be unnecessarily expanded. Remove the extra space added if a timestamp was added and the event went across the page. Cc: stable@vger.kernel.org # 4.3+ Fixes: fcc742eaad7c "ring-buffer: Add event descriptor to simplify passing data" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24ring-buffer: Update read stamp with first real commit on pageSteven Rostedt (Red Hat)
Do not update the read stamp after swapping out the reader page from the write buffer. If the reader page is swapped out of the buffer before an event is written to it, then the read_stamp may get an out of date timestamp, as the page timestamp is updated on the first commit to that page. rb_get_reader_page() only returns a page if it has an event on it, otherwise it will return NULL. At that point, check if the page being returned has events and has not been read yet. Then at that point update the read_stamp to match the time stamp of the reader page. Cc: stable@vger.kernel.org # 2.6.30+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-23treewide: Remove old email addressPeter Zijlstra
There were still a number of references to my old Red Hat email address in the kernel source. Remove these while keeping the Red Hat copyright notices intact. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12Merge tag 'trace-v4.4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull trace cleanups from Steven Rostedt: "This contains three more clean up patches. One patch is needed to make tracing work without debugfs now that tracing uses its own tracefs. The second is removing an unused variable. The third is fixing a warning about unused variables when MAX_TRACER is not configured. Note, this warning shows up in gcc 6.0, but does not show up in gcc 4.9, as it seems that gcc does not complain about constants not being used" * tag 'trace-v4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: #ifdef out uses of max trace when CONFIG_TRACER_MAX_TRACE is not set tracing: Remove unused ftrace_cpu_disabled per cpu variable tracing: Make tracing work when debugfs is not configured in
2015-11-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet. 2) Several spots need to get to the original listner for SYN-ACK packets, most spots got this ok but some were not. Whilst covering the remaining cases, create a helper to do this. From Eric Dumazet. 3) Missiing check of return value from alloc_netdev() in CAIF SPI code, from Rasmus Villemoes. 4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich. 5) Use after free in mvneta driver, from Justin Maggard. 6) Fix race on dst->flags access in dst_release(), from Eric Dumazet. 7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd Bergmann. 8) Fix multicast getsockopt deadlock, from WANG Cong. 9) Fix deadlock in btusb, from Kuba Pawlak. 10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6 counter state. From Sabrina Dubroca. 11) Fix packet_bind() race, which can cause lost notifications, from Francesco Ruggeri. 12) Fix MAC restoration in qlcnic driver during bonding mode changes, from Jarod Wilson. 13) Revert bridging forward delay change which broke libvirt and other userspace things, from Vlad Yasevich. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) Revert "bridge: Allow forward delay to be cfgd when STP enabled" bpf_trace: Make dependent on PERF_EVENTS qed: select ZLIB_INFLATE net: fix a race in dst_release() net: mvneta: Fix memory use after free. net: Documentation: Fix default value tcp_limit_output_bytes macvtap: Resolve possible __might_sleep warning in macvtap_do_read() mvneta: add FIXED_PHY dependency net: caif: check return value of alloc_netdev net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA drivers: net: xgene: fix RGMII 10/100Mb mode netfilter: nft_meta: use skb_to_full_sk() helper net_sched: em_meta: use skb_to_full_sk() helper sched: cls_flow: use skb_to_full_sk() helper netfilter: xt_owner: use skb_to_full_sk() helper smack: use skb_to_full_sk() helper net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid() bpf: doc: correct arch list for supported eBPF JIT dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put" bonding: fix panic on non-ARPHRD_ETHER enslave failure ...
2015-11-10bpf_trace: Make dependent on PERF_EVENTSSteven Rostedt
Arnd Bergmann reported: In my ARM randconfig tests, I'm getting a build error for newly added code in bpf_perf_event_read and bpf_perf_event_output whenever CONFIG_PERF_EVENTS is disabled: kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read': kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu' if (event->oncpu != smp_processor_id() || ^ kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu' event->pmu->count) This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT is disabled. I'm not sure if that is a configuration we care about, otherwise we could prevent this case from occuring by adding Kconfig dependencies. Looking at this further, it's really that UPROBE_EVENT enables PERF_EVENTS. By just having BPF_EVENTS depend on PERF_EVENTS, then all is fine. Link: http://lkml.kernel.org/r/4525348.Aq9YoXkChv@wuerfel Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-10tracing: #ifdef out uses of max trace when CONFIG_TRACER_MAX_TRACE is not setChen Gang
tracing_max_lat_fops is used only when TRACER_MAX_TRACE enabled, so also swith the related code. The related warning with defconfig under x86_64: CC kernel/trace/trace.o kernel/trace/trace.c:5466:37: warning: ‘tracing_max_lat_fops’ defined but not used [-Wunused-const-variable] static const struct file_operations tracing_max_lat_fops = { Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-07tracing: Remove unused ftrace_cpu_disabled per cpu variableDmitry Safonov
Since the ring buffer is lockless, there is no need to disable ftrace on CPU. And no one doing so: after commit 68179686ac67cb ("tracing: Remove ftrace_disable/enable_cpu()") ftrace_cpu_disabled stays the same after initialization, nothing changes it. ftrace_cpu_disabled shouldn't be used by any external module since it disables only function and graph_function tracers but not any other tracer. Link: http://lkml.kernel.org/r/1446836846-22239-1-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-06Merge tag 'trace-v4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracking updates from Steven Rostedt: "Most of the changes are clean ups and small fixes. Some of them have stable tags to them. I searched through my INBOX just as the merge window opened and found lots of patches to pull. I ran them through all my tests and they were in linux-next for a few days. Features added this release: ---------------------------- - Module globbing. You can now filter function tracing to several modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov) - Tracer specific options are now visible even when the tracer is not active. It was rather annoying that you can only see and modify tracer options after enabling the tracer. Now they are in the options/ directory even when the tracer is not active. Although they are still only visible when the tracer is active in the trace_options file. - Trace options are now per instance (although some of the tracer specific options are global) - New tracefs file: set_event_pid. If any pid is added to this file, then all events in the instance will filter out events that are not part of this pid. sched_switch and sched_wakeup events handle next and the wakee pids" * tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits) tracefs: Fix refcount imbalance in start_creating() tracing: Put back comma for empty fields in boot string parsing tracing: Apply tracer specific options from kernel command line. tracing: Add some documentation about set_event_pid ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark tracing: Allow dumping traces without tracking trace started cpus ring_buffer: Fix more races when terminating the producer in the benchmark ring_buffer: Do no not complete benchmark reader too early tracing: Remove redundant TP_ARGS redefining tracing: Rename max_stack_lock to stack_trace_max_lock tracing: Allow arch-specific stack tracer recordmcount: arm64: Replace the ignored mcount call into nop recordmcount: Fix endianness handling bug for nop_mcount tracepoints: Fix documentation of RCU lockdep checks tracing: ftrace_event_is_function() can return boolean tracing: is_legal_op() can return boolean ring-buffer: rb_event_is_commit() can return boolean ring-buffer: rb_per_cpu_empty() can return boolean ring_buffer: ring_buffer_empty{cpu}() can return boolean ring-buffer: rb_is_reader_page() can return boolean ...
2015-11-06tracing: Make tracing work when debugfs is not configured inJiaxing Wang
Currently tracing_init_dentry() returns -ENODEV when debugfs is not configured in, which causes tracefs not populated with tracing files and directories, so we will get an empty directory even after we manually mount tracefs. We can make tracing_init_dentry() return NULL if debugfs is not configured in and can manually mount tracefs. But return -ENODEV if debugfs is configured in but not initialized or failed to create automount point as that would break backward compatibility with older tools. Link: http://lkml.kernel.org/r/1446797056-11683-1-git-send-email-hello.wjx@gmail.com Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-04Merge branch 'for-4.4/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "This is the core block pull request for 4.4. I've got a few more topic branches this time around, some of them will layer on top of the core+drivers changes and will come in a separate round. So not a huge chunk of changes in this round. This pull request contains: - Enable blk-mq page allocation tracking with kmemleak, from Catalin. - Unused prototype removal in blk-mq from Christoph. - Cleanup of the q->blk_trace exchange, using cmpxchg instead of two xchg()'s, from Davidlohr. - A plug flush fix from Jeff. - Also from Jeff, a fix that means we don't have to update shared tag sets at init time unless we do a state change. This cuts down boot times on thousands of devices a lot with scsi/blk-mq. - blk-mq waitqueue barrier fix from Kosuke. - Various fixes from Ming: - Fixes for segment merging and splitting, and checks, for the old core and blk-mq. - Potential blk-mq speedup by marking ctx pending at the end of a plug insertion batch in blk-mq. - direct-io no page dirty on kernel direct reads. - A WRITE_SYNC fix for mpage from Roman" * 'for-4.4/core' of git://git.kernel.dk/linux-block: blk-mq: avoid excessive boot delays with large lun counts blktrace: re-write setting q->blk_trace blk-mq: mark ctx as pending at batch in flush plug path blk-mq: fix for trace_block_plug() block: check bio_mergeable() early before merging blk-mq: check bio_mergeable() early before merging block: avoid to merge splitted bio block: setup bi_phys_segments after splitting block: fix plug list flushing for nomerge queues blk-mq: remove unused blk_mq_clone_flush_request prototype blk-mq: fix waitqueue_active without memory barrier in block/blk-mq-tag.c fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct read fs/mpage.c: forgotten WRITE_SYNC in case of data integrity write block: kmemleak: Track the page allocations for struct request
2015-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: Changes of note: 1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell. 2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from David Ahern. 3) Allow the user to ask for the statistics to be filtered out of ipv4/ipv6 address netlink dumps. From Sowmini Varadhan. 4) More work to pass the network namespace context around deep into various packet path APIs, starting with the netfilter hooks. From Eric W Biederman. 5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas Richter. 6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng. 7) Support Very High Throughput in wireless MESH code, from Bob Copeland. 8) Allow setting the ageing_time in switchdev/rocker. From Scott Feldman. 9) Properly autoload L2TP type modules, from Stephen Hemminger. 10) Fix and enable offload features by default in 8139cp driver, from David Woodhouse. 11) Support both ipv4 and ipv6 sockets in a single vxlan device, from Jiri Benc. 12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning Opstad. 13) Fix IPSEC flowcache overflows on large systems, from Steffen Klassert. 14) Convert bridging to track VLANs using rhashtable entries rather than a bitmap. From Nikolay Aleksandrov. 15) Make TCP listener handling completely lockless, this is a major accomplishment. Incoming request sockets now live in the established hash table just like any other socket too. From Eric Dumazet. 15) Provide more bridging attributes to netlink, from Nikolay Aleksandrov. 16) Use hash based algorithm for ipv4 multipath routing, this was very long overdue. From Peter Nørlund. 17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann. 18) Allow non-root execution of EBPF programs, from Alexei Starovoitov. 19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This influences the port binding selection logic used by SO_REUSEPORT. 20) Add ipv6 support to VRF, from David Ahern. 21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko. 22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen. 23) Implement RACK loss recovery in TCP, from Yuchung Cheng. 24) Support multipath routes in MPLS, from Roopa Prabhu. 25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric Dumazet. 26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and Sudarsana Kalluru. 27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic Sowa. 28) Support ipv6 geneve tunnels, from John W Linville. 29) Add flood control support to switchdev layer, from Ido Schimmel. 30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from Hannes Frederic Sowa. 31) Support persistent maps and progs in bpf, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits) sh_eth: use DMA barriers switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion net: sched: kill dead code in sch_choke.c irda: Delete an unnecessary check before the function call "irlmp_unregister_service" net: dsa: mv88e6xxx: include DSA ports in VLANs net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports net/core: fix for_each_netdev_feature vlan: Invoke driver vlan hooks only if device is present arcnet/com20020: add LEDS_CLASS dependency bpf, verifier: annotate verbose printer with __printf dp83640: Only wait for timestamps for packets with timestamping enabled. ptp: Change ptp_class to a proper bitmask dp83640: Prune rx timestamp list before reading from it dp83640: Delay scheduled work. dp83640: Include hash in timestamp/packet matching ipv6: fix tunnel error handling net/mlx5e: Fix LSO vlan insertion net/mlx5e: Re-eanble client vlan TX acceleration net/mlx5e: Return error in case mlx5e_set_features() fails net/mlx5e: Don't allow more than max supported channels ...
2015-11-03tracing: Put back comma for empty fields in boot string parsingSteven Rostedt (Red Hat)
Both early_enable_events() and apply_trace_boot_options() parse a boot string that may get parsed later on. They both use strsep() which converts a comma into a nul character. To still allow the boot string to be parsed again the same way, the nul character gets converted back to a comma after the token is processed. The problem is that these two functions check for an empty parameter (two commas in a row ",,"), and continue the loop if the parameter is empty, but fails to place the comma back. In this case, the second parsing will end at this blank field, and not process fields afterward. In most cases, users should not have an empty field, but if its going to be checked, the code might as well be correct. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Apply tracer specific options from kernel command line.Jiaxing Wang
Currently, the trace_options parameter is only applied in tracer_alloc_buffers() when global_trace.current_trace is nop_trace, so a tracer specific option will not be applied even when the specific tracer is also enabled from kernel command line. For example, the 'func_stack_trace' option can't be enabled with the following kernel parameter: ftrace=function ftrace_filter=kfree trace_options=func_stack_trace We can enable tracer specific options by simply apply the options again if the specific tracer is also supplied from command line and started in register_tracer(). To make trace_boot_options_buf can be parsed again, a comma and a space is put back if they were replaced by strsep and strstrip respectively. Also make register_tracer() be __init to access the __init data, and in fact register_tracer is only called from __init code. Link: http://lkml.kernel.org/r/1446599669-9294-1-git-send-email-hello.wjx@gmail.com Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle were: - sched/fair load tracking fixes and cleanups (Byungchul Park) - Make load tracking frequency scale invariant (Dietmar Eggemann) - sched/deadline updates (Juri Lelli) - stop machine fixes, cleanups and enhancements for bugs triggered by CPU hotplug stress testing (Oleg Nesterov) - scheduler preemption code rework: remove PREEMPT_ACTIVE and related cleanups (Peter Zijlstra) - Rework the sched_info::run_delay code to fix races (Peter Zijlstra) - Optimize per entity utilization tracking (Peter Zijlstra) - ... misc other fixes, cleanups and smaller updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop() sched: Start stopper early stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark() stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() stop_machine: Ensure that a queued callback will be called before cpu_stop_park() sched/x86: Fix typo in __switch_to() comments sched/core: Remove a parameter in the migrate_task_rq() function sched/core: Drop unlikely behind BUG_ON() sched/core: Fix task and run queue sched_info::run_delay inconsistencies sched/numa: Fix task_tick_fair() from disabling numa_balancing sched/core: Add preempt_count invariant check sched/core: More notrace annotations sched/core: Kill PREEMPT_ACTIVE sched/core, sched/x86: Kill thread_info::saved_preempt_count sched/core: Simplify preempt_count tests sched/core: Robustify preemption leak checks sched/core: Stop setting PREEMPT_ACTIVE ...
2015-11-03ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmarkSteven Rostedt (Red Hat)
wake_up_process() has a memory barrier before doing anything, thus adding a memory barrier before calling it is redundant. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Allow dumping traces without tracking trace started cpusSasha Levin
We don't init iter->started when dumping the ftrace buffer, and there's no real need to do so - so allow skipping that check if the iter doesn't have an initialized ->started cpumask. Link: http://lkml.kernel.org/r/1441385156-27279-1-git-send-email-sasha.levin@oracle.com Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03ring_buffer: Fix more races when terminating the producer in the benchmarkPetr Mladek
The commit b44754d8262d3aab8 ("ring_buffer: Allow to exit the ring buffer benchmark immediately") added a hack into ring_buffer_producer() that set @kill_test when kthread_should_stop() returned true. It improved the situation a lot. It stopped the kthread in most cases because the producer spent most of the time in the patched while cycle. But there are still few possible races when kthread_should_stop() is set outside of the cycle. Then we do not set @kill_test and some other checks pass. This patch adds a better fix. It renames @test_kill/TEST_KILL() into a better descriptive @test_error/TEST_ERROR(). Also it introduces break_test() function that checks for both @test_error and kthread_should_stop(). The new function is used in the producer when the check for @test_error is not enough. It is not used in the consumer because its state is manipulated by the producer via the "reader_finish" variable. Also we add a missing check into ring_buffer_producer_thread() between setting TASK_INTERRUPTIBLE and calling schedule_timeout(). Otherwise, we might miss a wakeup from kthread_stop(). Link: http://lkml.kernel.org/r/1441629518-32712-3-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03ring_buffer: Do no not complete benchmark reader too earlyPetr Mladek
It seems that complete(&read_done) might be called too early in some situations. 1st scenario: ------------- CPU0 CPU1 ring_buffer_producer_thread() wake_up_process(consumer); wait_for_completion(&read_start); ring_buffer_consumer_thread() complete(&read_start); ring_buffer_producer() # producing data in # the do-while cycle ring_buffer_consumer(); # reading data # got error # set kill_test = 1; set_current_state( TASK_INTERRUPTIBLE); if (reader_finish) # false schedule(); # producer still in the middle of # do-while cycle if (consumer && !(cnt % wakeup_interval)) wake_up_process(consumer); # spurious wakeup while (!reader_finish && !kill_test) # leaving because # kill_test == 1 reader_finish = 0; complete(&read_done); 1st BANG: We might access uninitialized "read_done" if this is the the first round. # producer finally leaving # the do-while cycle because kill_test == 1; if (consumer) { reader_finish = 1; wake_up_process(consumer); wait_for_completion(&read_done); 2nd BANG: This will never complete because consumer already did the completion. 2nd scenario: ------------- CPU0 CPU1 ring_buffer_producer_thread() wake_up_process(consumer); wait_for_completion(&read_start); ring_buffer_consumer_thread() complete(&read_start); ring_buffer_producer() # CPU3 removes the module <--- difference from # and stops producer <--- the 1st scenario if (kthread_should_stop()) kill_test = 1; ring_buffer_consumer(); while (!reader_finish && !kill_test) # kill_test == 1 => we never go # into the top level while() reader_finish = 0; complete(&read_done); # producer still in the middle of # do-while cycle if (consumer && !(cnt % wakeup_interval)) wake_up_process(consumer); # spurious wakeup while (!reader_finish && !kill_test) # leaving because kill_test == 1 reader_finish = 0; complete(&read_done); BANG: We are in the same "bang" situations as in the 1st scenario. Root of the problem: -------------------- ring_buffer_consumer() must complete "read_done" only when "reader_finish" variable is set. It must not be skipped due to other conditions. Note that we still must keep the check for "reader_finish" in a loop because there might be spurious wakeups as described in the above scenarios. Solution: ---------- The top level cycle in ring_buffer_consumer() will finish only when "reader_finish" is set. The data will be read in "while-do" cycle so that they are not read after an error (kill_test == 1) or a spurious wake up. In addition, "reader_finish" is manipulated by the producer thread. Therefore we add READ_ONCE() to make sure that the fresh value is read in each cycle. Also we add the corresponding barrier to synchronize the sleep check. Next we set the state back to TASK_RUNNING for the situation where we did not sleep. Just from paranoid reasons, we initialize both completions statically. This is safer, in case there are other races that we are unaware of. As a side effect we could remove the memory barrier from ring_buffer_producer_thread(). IMHO, this was the reason for the barrier. ring_buffer_reset() uses spin locks that should provide the needed memory barrier for using the buffer. Link: http://lkml.kernel.org/r/1441629518-32712-2-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Remove redundant TP_ARGS redefiningDmitry Safonov
TP_ARGS is not used anywhere in trace.h nor trace_entries.h Firstly, I left just #undef TP_ARGS and had no errors - remove it. Link: http://lkml.kernel.org/r/1446576560-14085-1-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Rename max_stack_lock to stack_trace_max_lockSteven Rostedt (Red Hat)
Now that max_stack_lock is a global variable, it requires a naming convention that is unlikely to collide. Rename it to the same naming convention that the other stack_trace variables have. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03tracing: Allow arch-specific stack tracerAKASHI Takahiro
A stack frame may be used in a different way depending on cpu architecture. Thus it is not always appropriate to slurp the stack contents, as current check_stack() does, in order to calcurate a stack index (height) at a given function call. At least not on arm64. In addition, there is a possibility that we will mistakenly detect a stale stack frame which has not been overwritten. This patch makes check_stack() a weak function so as to later implement arch-specific version. Link: http://lkml.kernel.org/r/1446182741-31019-5-git-send-email-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: ftrace_event_is_function() can return booleanYaowei Bai
Make ftrace_event_is_function() return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-9-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: is_legal_op() can return booleanYaowei Bai
Make is_legal_op() return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-8-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02ring-buffer: rb_event_is_commit() can return booleanYaowei Bai
Make rb_event_is_commit() return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-7-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02ring-buffer: rb_per_cpu_empty() can return booleanYaowei Bai
Makes rb_per_cpu_empty() return bool to improve readability. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-6-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02ring_buffer: ring_buffer_empty{cpu}() can return booleanYaowei Bai
Make ring_buffer_empty() and ring_buffer_empty_cpu() return bool. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-5-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02ring-buffer: rb_is_reader_page() can return booleanYaowei Bai
Make rb_is_reader_page() return bool to improve readability due to this particular function only using either true or false as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-4-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: report_latency() in trace_irqsoff.c can return booleanYaowei Bai
This patch makes report_latency return bool due to this particular function only using either one or zero as its return value. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-3-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: report_latency() in trace_sched_wakeup.c can return booleanYaowei Bai
This patch makes report_latency return bool to improve readability, indicating whether this new latency should be reported/recorded. No functional change. Link: http://lkml.kernel.org/r/1443537816-5788-2-git-send-email-bywxiaobai@163.com Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: Update instance_rmdir() to use tracefs_remove_recursiveJiaxing Wang
Update instancd_rmdir to use tracefs_remove_recursive instead of debugfs_remove_recursive.This was left in the transition from debugfs to tracefs. Link: http://lkml.kernel.org/r/1445169490-18315-2-git-send-email-hello.wjx@gmail.com Cc: stable@vger.kernel.org # 4.1+ Fixes: 8434dc9340cd2 ("tracing: Convert the tracing facility over to use tracefs") Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: Only benchmark the time tracepoints take if tracing is onChunyan Zhang
There's no need to record the time tracepoints take when tracing is off. This is because: 1) We cannot see these records since ring_buffer record is off at that moment. 2) If tracing is off and benchmark tracepoint is enabled, the time tracepoint takes is fewer than the same situation when tracing is on, since the tracepoints need to be wrote into ring_buffer, it would take more time. If turn on tracing at this moment, the average and standard deviation cannot exactly present the time that tracepoints take to write data into ring_buffer. Link: http://lkml.kernel.org/r/1445947933-27955-1-git-send-email-zhang.chunyan@linaro.org Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02tracing: Call on_each_cpu() when adding or removing single pids from ↵Steven Rostedt (Red Hat)
set_event_pid For the case where pids are already in set_event_pid, and one is added or removed then each CPU should be checked to make sure that the new or old pid is on or not on a CPU. For example: # echo 123 >> set_event_pid or # echo '!123' >> set_event_pid Link: http://lkml.kernel.org/r/20151030061643.GA19480@cac Suggested-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-10-30blktrace: re-write setting q->blk_traceDavidlohr Bueso
This is really about simplifying the double xchg patterns into a single cmpxchg, with the same logic. Other than the immediate cleanup, there are some subtleties this change deals with: (i) While the load of the old bt is fully ordered wrt everything, ie: old_bt = xchg(&q->blk_trace, bt); [barrier] if (old_bt) (void) xchg(&q->blk_trace, old_bt); [barrier] blk_trace could still be changed between the xchg and the old_bt load. Note that this description is merely theoretical and afaict very small, but doing everything in a single context with cmpxchg closes this potential race. (ii) Ordering guarantees are obviously kept with cmpxchg. (iii) Gets rid of the hacky-by-nature (void)xchg pattern. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> eviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-26bpf: make tracing helpers gpl onlyAlexei Starovoitov
exported perf symbols are GPL only, mark eBPF helper functions used in tracing as GPL only as well. Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-26bpf: fix bpf_perf_event_read() helperAlexei Starovoitov
Fix safety checks for bpf_perf_event_read(): - only non-inherited events can be added to perf_event_array map (do this check statically at map insertion time) - dynamically check that event is local and !pmu->count Otherwise buggy bpf program can cause kernel splat. Also fix error path after perf_event_attrs() and remove redundant 'extern'. Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-26tracing: Fix sparse RCU warningSteven Rostedt (Red Hat)
p_start() and p_stop() are seq_file functions that match. Teach sparse to know that rcu_read_lock_sched() that is taken by p_start() is released by p_stop. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-25tracing: Check all tasks on each CPU when filtering pidsSteven Rostedt (Red Hat)
My tests found that if a task is running but not filtered when set_event_pid is modified, then it can still be traced. Call on_each_cpu() to check if the current running task should be filtered and update the per cpu flags of tr->data appropriately. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-25tracing: Implement event pid filteringSteven Rostedt (Red Hat)
Add the necessary hooks to use the pids loaded in set_event_pid to filter all the events enabled in the tracing instance that match the pids listed. Two probes are added to both sched_switch and sched_wakeup tracepoints to be called before other probes are called and after the other probes are called. The first is used to set the necessary flags to let the probes know to test if they should be traced or not. The sched_switch pre probe will set the "ignore_pid" flag if neither the previous or next task has a matching pid. The sched_switch probe will set the "ignore_pid" flag if the next task does not match the matching pid. The pre probe allows for probes tracing sched_switch to be traced if necessary. The sched_wakeup pre probe will set the "ignore_pid" flag if neither the current task nor the wakee task has a matching pid. The sched_wakeup post probe will set the "ignore_pid" flag if the current task does not have a matching pid. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-25tracing: Add set_event_pid directory for future useSteven Rostedt (Red Hat)
Create a tracing directory called set_event_pid, which currently has no function, but will be used to filter all events for the tracing instance or the pids that are added to the file. The reason no functionality is added with this commit is that this commit focuses on the creation and removal of the pids in a safe manner. And tests can be made against this change to make sure things are correct before hooking features to the list of pids. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-22bpf: introduce bpf_perf_event_output() helperAlexei Starovoitov
This helper is used to send raw data from eBPF program into special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event. User space needs to perf_event_open() it (either for one or all cpus) and store FD into perf_event_array (similar to bpf_perf_event_read() helper) before eBPF program can send data into it. Today the programs triggered by kprobe collect the data and either store it into the maps or print it via bpf_trace_printk() where latter is the debug facility and not suitable to stream the data. This new helper replaces such bpf_trace_printk() usage and allows programs to have dedicated channel into user space for post-processing of the raw data collected. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tracing: Remove {start,stop}_branch_traceDmitry Safonov
Both start_branch_trace() and stop_branch_trace() are used in only one location, and are both static. As they are small functions there is no need to keep them separated out. Link: http://lkml.kernel.org/r/1445000689-32596-1-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-20tracing: gpio: Add Kconfig option for enabling/disabling trace eventsTal Shorer
Add a new options to trace Kconfig, CONFIG_TRACING_EVENTS_GPIO, that is used for enabling/disabling compilation of gpio function trace events. Link: http://lkml.kernel.org/r/1438432079-11704-4-git-send-email-tal.shorer@gmail.com Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Tal Shorer <tal.shorer@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-20tracing: Do not allow stack_tracer to record stack in NMISteven Rostedt (Red Hat)
The code in stack tracer should not be executed within an NMI as it grabs spinlocks and stack tracing an NMI gives the possibility of causing a deadlock. Although this is safe on x86_64, because it does not perform stack traces when the task struct stack is not in use (interrupts and NMIs), it may be an issue for NMIs on i386 and other archs that use the same stack as the NMI. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-20ftrace: add module globbingDmitry Safonov
Extend module command for function filter selection with globbing. It uses the same globbing as function filter. sh# echo '*alloc*:mod:*' > set_ftrace_filter Will trace any function with the letters 'alloc' in the name in any module but not in kernel. sh# echo '!*alloc*:mod:ipv6' >> set_ftrace_filter Will prevent from tracing functions with 'alloc' in the name from module ipv6 (do not forget to append to set_ftrace_filter file). sh# echo '*alloc*:mod:!ipv6' > set_ftrace_filter Will trace functions with 'alloc' in the name from kernel and any module except ipv6. sh# echo '*alloc*:mod:!*' > set_ftrace_filter Will trace any function with the letters 'alloc' in the name only from kernel, but not from any module. sh# echo '*:mod:!*' > set_ftrace_filter or sh# echo ':mod:!' > set_ftrace_filter Will trace every function in the kernel, but will not trace functions from any module. sh# echo '*:mod:*' > set_ftrace_filter or sh# echo ':mod:' > set_ftrace_filter As the opposite will trace all functions from all modules, but not from kernel. sh# echo '*:mod:*snd*' > set_ftrace_filter Will trace your sound drivers only (if any). Link: http://lkml.kernel.org/r/1443545176-3215-4-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> [ Made format changes ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-20ftrace: Introduce ftrace_glob structureDmitry Safonov
ftrace_match parameters are very related and I reduce the number of local variables & parameters with it. This is also preparation for module globbing as it would introduce more realated variables & parameters. Link: http://lkml.kernel.org/r/1443545176-3215-3-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> [ Made some formatting changes ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>