summaryrefslogtreecommitdiff
path: root/arch/s390/kernel/perf_cpum_sf.c
AgeCommit message (Collapse)Author
2020-01-12s390/cpum_sf: Avoid SBD overflow condition in irq handlerThomas Richter
[ Upstream commit 0539ad0b22877225095d8adef0c376f52cc23834 ] The s390 CPU Measurement sampling facility has an overflow condition which fires when all entries in a SBD are used. The measurement alert interrupt is triggered and reads out all samples in this SDB. It then tests the successor SDB, if this SBD is not full, the interrupt handler does not read any samples at all from this SDB The design waits for the hardware to fill this SBD and then trigger another meassurement alert interrupt. This scheme works nicely until an perf_event_overflow() function call discards the sample due to a too high sampling rate. The interrupt handler has logic to read out a partially filled SDB when the perf event overflow condition in linux common code is met. This causes the CPUM sampling measurement hardware and the PMU device driver to operate on the same SBD's trailer entry. This should not happen. This can be seen here using this trace: cpumsf_pmu_add: tear:0xb5286000 hw_perf_event_update: sdbt 0xb5286000 full 1 over 0 flush_all:0 hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0 above shows 1. interrupt hw_perf_event_update: sdbt 0xb5286008 full 1 over 0 flush_all:0 hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0 above shows 2. interrupt ... this goes on fine until... hw_perf_event_update: sdbt 0xb5286068 full 1 over 0 flush_all:0 perf_push_sample1: overflow one or more samples read from the IRQ handler are rejected by perf_event_overflow() and the IRQ handler advances to the next SDB and modifies the trailer entry of a partially filled SDB. hw_perf_event_update: sdbt 0xb5286070 full 0 over 0 flush_all:1 timestamp: 14:32:52.519953 Next time the IRQ handler is called for this SDB the trailer entry shows an overflow count of 19 missed entries. hw_perf_event_update: sdbt 0xb5286070 full 1 over 19 flush_all:1 timestamp: 14:32:52.970058 Remove access to a follow on SDB when event overflow happened. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12s390/cpum_sf: Adjust sampling interval to avoid hitting sample limitsThomas Richter
[ Upstream commit 39d4a501a9ef55c57b51e3ef07fc2aeed7f30b3b ] Function perf_event_ever_overflow() and perf_event_account_interrupt() are called every time samples are processed by the interrupt handler. However function perf_event_account_interrupt() has checks to avoid being flooded with interrupts (more then 1000 samples are received per task_tick). Samples are then dropped and a PERF_RECORD_THROTTLED is added to the perf data. The perf subsystem limit calculation is: maximum sample frequency := 100000 --> 1 samples per 10 us task_tick = 10ms = 10000us --> 1000 samples per task_tick The work flow is measurement_alert() uses SDBT head and each SBDT points to 511 SDB pages, each with 126 sample entries. After processing 8 SBDs and for each valid sample calling: perf_event_overflow() perf_event_account_interrupts() there is a considerable amount of samples being dropped, especially when the sample frequency is very high and near the 100000 limit. To avoid the high amount of samples being dropped near the end of a task_tick time frame, increment the sampling interval in case of dropped events. The CPU Measurement sampling facility on the s390 supports only intervals, specifiing how many CPU cycles have to be executed before a sample is generated. Increase the interval when the samples being generated hit the task_tick limit. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04s390/cpum_sf: Check for SDBT and SDB consistencyThomas Richter
[ Upstream commit 247f265fa502e7b17a0cb0cc330e055a36aafce4 ] Each SBDT is located at a 4KB page and contains 512 entries. Each entry of a SDBT points to a SDB, a 4KB page containing sampled data. The last entry is a link to another SDBT page. When an event is created the function sequence executed is: __hw_perf_event_init() +--> allocate_buffers() +--> realloc_sampling_buffers() +---> alloc_sample_data_block() Both functions realloc_sampling_buffers() and alloc_sample_data_block() allocate pages and the allocation can fail. This is handled correctly and all allocated pages are freed and error -ENOMEM is returned to the top calling function. Finally the event is not created. Once the event has been created, the amount of initially allocated SDBT and SDB can be too low. This is detected during measurement interrupt handling, where the amount of lost samples is calculated. If the number of lost samples is too high considering sampling frequency and already allocated SBDs, the number of SDBs is enlarged during the next execution of cpumsf_pmu_enable(). If more SBDs need to be allocated, functions realloc_sampling_buffers() +---> alloc-sample_data_block() are called to allocate more pages. Page allocation may fail and the returned error is ignored. A SDBT and SDB setup already exists. However the modified SDBTs and SDBs might end up in a situation where the first entry of an SDBT does not point to an SDB, but another SDBT, basicly an SBDT without payload. This can not be handled by the interrupt handler, where an SDBT must have at least one entry pointing to an SBD. Add a check to avoid SDBTs with out payload (SDBs) when enlarging the buffer setup. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-28s390/perf: Return error when debug_register failsThomas Richter
[ Upstream commit ec0c0bb489727de0d4dca6a00be6970ab8a3b30a ] Return an error when the function debug_register() fails allocating the debug handle. Also remove the registered debug handle when the initialization fails later on. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-05-22s390/cpum_sf: ensure sample frequency of perf event attributes is non-zeroHendrik Brueckner
commit 4bbaf2584b86b0772413edeac22ff448f36351b1 upstream. Correct a trinity finding for the perf_event_open() system call with a perf event attribute structure that uses a frequency but has the sampling frequency set to zero. This causes a FP divide exception during the sample rate initialization for the hardware sampling facility. Fixes: 8c069ff4bd606 ("s390/perf: add support for the CPU-Measurement Sampling Facility") Cc: stable@vger.kernel.org # 3.14+ Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-29Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the next part of the hotplug rework. - Convert all notifiers with a priority assigned - Convert all CPU_STARTING/DYING notifiers The final removal of the STARTING/DYING infrastructure will happen when the merge window closes. Another 700 hundred line of unpenetrable maze gone :)" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) timers/core: Correct callback order during CPU hot plug leds/trigger/cpu: Move from CPU_STARTING to ONLINE level powerpc/numa: Convert to hotplug state machine arm/perf: Fix hotplug state machine conversion irqchip/armada: Avoid unused function warnings ARC/time: Convert to hotplug state machine clocksource/atlas7: Convert to hotplug state machine clocksource/armada-370-xp: Convert to hotplug state machine clocksource/exynos_mct: Convert to hotplug state machine clocksource/arm_global_timer: Convert to hotplug state machine rcu: Convert rcutree to hotplug state machine KVM/arm/arm64/vgic-new: Convert to hotplug state machine smp/cfd: Convert core to hotplug state machine x86/x2apic: Convert to CPU hotplug state machine profile: Convert to hotplug state machine timers/core: Convert to hotplug state machine hrtimer: Convert to hotplug state machine x86/tboot: Convert to hotplug state machine arm64/armv8 deprecated: Convert to hotplug state machine hwtracing/coresight-etm4x: Convert to hotplug state machine ...
2016-07-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Unified UDP encapsulation offload methods for drivers, from Alexander Duyck. 2) Make DSA binding more sane, from Andrew Lunn. 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli. 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar. 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX packets as soon as the device sees them, with the option to mirror the packet on TX via the same interface. From Brenden Blanco and others. 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet. 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli. 8) Simplify netlink conntrack entry layout, from Florian Westphal. 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido Schimmel, Yotam Gigi, and Jiri Pirko. 10) Add SKB array infrastructure and convert tun and macvtap over to it. From Michael S Tsirkin and Jason Wang. 11) Support qdisc packet injection in pktgen, from John Fastabend. 12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy. 13) Add NV congestion control support to TCP, from Lawrence Brakmo. 14) Add GSO support to SCTP, from Marcelo Ricardo Leitner. 15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni. 16) Support MPLS over IPV4, from Simon Horman. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) xgene: Fix build warning with ACPI disabled. be2net: perform temperature query in adapter regardless of its interface state l2tp: Correctly return -EBADF from pppol2tp_getname. net/mlx5_core/health: Remove deprecated create_singlethread_workqueue net: ipmr/ip6mr: update lastuse on entry change macsec: ensure rx_sa is set when validation is disabled tipc: dump monitor attributes tipc: add a function to get the bearer name tipc: get monitor threshold for the cluster tipc: make cluster size threshold for monitoring configurable tipc: introduce constants for tipc address validation net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() MAINTAINERS: xgene: Add driver and documentation path Documentation: dtb: xgene: Add MDIO node dtb: xgene: Add MDIO node drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset drivers: net: xgene: Use exported functions drivers: net: xgene: Enable MDIO driver drivers: net: xgene: Add backward compatibility drivers: net: phy: xgene: Add MDIO driver ...
2016-07-15perf, events: add non-linear data support for raw recordsDaniel Borkmann
This patch adds support for non-linear data on raw records. It extends raw records to have one or multiple fragments that will be written linearly into the ring slot, where each fragment can optionally have a custom callback handler to walk and extract complex, possibly non-linear data. If a callback handler is provided for a fragment, then the new __output_custom() will be used instead of __output_copy() for the perf_output_sample() part. perf_prepare_sample() does all the size calculation only once, so perf_output_sample() doesn't need to redo the same work anymore, meaning real_size and padding will be cached in the raw record. The raw record becomes 32 bytes in size without holes; to not increase it further and to avoid doing unnecessary recalculations in fast-path, we can reuse next pointer of the last fragment, idea here is borrowed from ZERO_OR_NULL_PTR(), which should keep the perf_output_sample() path for PERF_SAMPLE_RAW minimal. This facility is needed for BPF's event output helper as a first user that will, in a follow-up, add an additional perf_raw_frag to its perf_raw_record in order to be able to more efficiently dump skb context after a linear head meta data related to it. skbs can be non-linear and thus need a custom output function to dump buffers. Currently, the skb data needs to be copied twice; with the help of __output_custom() this work only needs to be done once. Future users could be things like XDP/BPF programs that work on different context though and would thus also have a different callback function. The few users of raw records are adapted to initialize their frag data from the raw record itself, no change in behavior for them. The code is based upon a PoC diff provided by Peter Zijlstra [1]. [1] http://thread.gmane.org/gmane.linux.network/421294 Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14s390/perf: Convert the hotplug notifier to state machine callbacks (Sampling)Sebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-s390@vger.kernel.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153334.518084858@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-28s390/perf: remove perf_release/reserver_sampling functionsHeiko Carstens
Now that the oprofile sampling code is gone there is only one user of the sampling facility left. Therefore the reserve and release functions can be removed. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-03s390/cpum_sf: Remove superfluous SMP function callAnna-Maria Gleixner
Since commit 3b9d6da67e11 ("cpu/hotplug: Fix rollback during error-out in __cpu_disable()") it is ensured that callbacks of CPU_ONLINE and CPU_DOWN_PREPARE are processed on the hotplugged CPU. Due to this SMP function calls are no longer required. Replace smp_call_function_single() with a direct call of setup_pmc_cpu(). To keep the calling convention, interrupts are explicitly disabled around the call. Cc: linux-s390@vger.kernel.org Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-17s390/cpum_sf: Fix cpu hotplug notifier transitionsAnna-Maria Gleixner
The cpumf_pmu_notfier() hotplug callback lacks handling of the CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE failes, the PMC of the CPU is not setup again. Furthermore the CPU_ONLINE_FROZEN case will never be processed because of masking the switch expression with CPU_TASKS_FROZEN. Add handling for CPU_DOWN_FAILED transition to setup the PMC of the CPU. Remove CPU_ONLINE_FROZEN case. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02s390/cpumf: Improve guest detection heuristicsChristian Borntraeger
commit e22cf8ca6f75 ("s390/cpumf: rework program parameter setting to detect guest samples") requires guest changes to get proper guest/host. We can do better: We can use the primary asn value, which is set on all Linux variants to compare this with the host pp value. We now have the following cases: 1. Guest using PP host sample: gpp == 0, asn == hpp --> host guest sample: gpp != 0 --> guest 2. Guest not using PP host sample: gpp == 0, asn == hpp --> host guest sample: gpp == 0, asn != hpp --> guest As soon as the host no longer sets CR4, we must back out this heuristics - let's add a comment in switch_to. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14s390/cpumf: rework program parameter setting to detect guest samplesChristian Borntraeger
The program parameter can be used to mark hardware samples with some token. Previously, it was used to mark guest samples only. Improve the program parameter doubleword by combining two parts, the leftmost LPP part and the rightmost PID part. Set the PID part for processes by using the task PID. To distinguish host and guest samples for the kernel (PID part is zero), the guest must always set the program paramater to a non-zero value. Use the leftmost bit in the LPP part of the program parameter to be able to detect guest kernel samples. [brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced __LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts. And updated the commit message accordingly. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-08-03KVM: s390: use pid of cpu thread for sampling taggingChristian Borntraeger
Right now we use the address of the sie control block as tag for the sampling data. This is hard to get for users. Let's just use the PID of the cpu thread to mark the hardware samples. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-05-28kernel/params: constify struct kernel_param_ops usesLuis R. Rodriguez
Most code already uses consts for the struct kernel_param_ops, sweep the kernel for the last offending stragglers. Other than include/linux/moduleparam.h and kernel/params.c all other changes were generated with the following Coccinelle SmPL patch. Merge conflicts between trees can be handled with Coccinelle. In the future git could get Coccinelle merge support to deal with patch --> fail --> grammar --> Coccinelle --> new patch conflicts automatically for us on patches where the grammar is available and the patch is of high confidence. Consider this a feature request. Test compiled on x86_64 against: * allnoconfig * allmodconfig * allyesconfig @ const_found @ identifier ops; @@ const struct kernel_param_ops ops = { }; @ const_not_found depends on !const_found @ identifier ops; @@ -struct kernel_param_ops ops = { +const struct kernel_param_ops ops = { }; Generated-by: Coccinelle SmPL Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Junio C Hamano <gitster@pobox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: cocci@systeme.lip6.fr Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-13s390/cpum_sf: add diagnostic sampling event only if it is authorizedHendrik Brueckner
The SF_CYCLES_BASIC_DIAG is always registered even if it is turned of in the current hardware configuration. Because diagnostic-sampling is typically not turned on in the hardware configuration, do not register this perf event by default. Enable it only if the diagnostic-sampling function is authorized. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "The most notable change for this pull request is the ftrace rework from Heiko. It brings a small performance improvement and the ground work to support a new gcc option to replace the mcount blocks with a single nop. Two new s390 specific system calls are added to emulate user space mmio for PCI, an artifact of the how PCI memory is accessed. Two patches for the memory management with changes to common code. For KVM mm_forbids_zeropage is added which disables the empty zero page for an mm that is used by a KVM process. And an optimization, pmdp_get_and_clear_full is added analog to ptep_get_and_clear_full. Some micro optimization for the cmpxchg and the spinlock code. And as usual bug fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits) s390/cputime: fix 31-bit compile s390/scm_block: make the number of reqs per HW req configurable s390/scm_block: handle multiple requests in one HW request s390/scm_block: allocate aidaw pages only when necessary s390/scm_block: use mempool to manage aidaw requests s390/eadm: change timeout value s390/mm: fix memory leak of ptlock in pmd_free_tlb s390: use local symbol names in entry[64].S s390/ptrace: always include vector registers in core files s390/simd: clear vector register pointer on fork/clone s390: translate cputime magic constants to macros s390/idle: convert open coded idle time seqcount s390/idle: add missing irq off lockdep annotation s390/debug: avoid function call for debug_sprintf_* s390/kprobes: fix instruction copy for out of line execution s390: remove diag 44 calls from cpu_relax() s390/dasd: retry partition detection s390/dasd: fix list corruption for sleep_on requests s390/dasd: fix infinite term I/O loop s390/dasd: remove unused code ...
2014-11-03s390/cpum_sf: Remove initialization of PMU event indexHendrik Brueckner
The git commit c719f56092add9b3d4192f57c64ce7af11105130 "perf: Fix and clean up initialization of pmu::event_idx" removed the PMU event index callback for all architectures but x86, remove the initialization of the event index as well. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-28perf: Fix and clean up initialization of pmu::event_idxPeter Zijlstra
Andy reported that the current state of event_idx is rather confused. So remove all but the x86_pmu implementation and change the default to return 0 (the safe option). Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Cody P Schafer <dev@codyps.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Himangi Saraogi <himangi774@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com> Cc: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux390@de.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-26s390: Replace __get_cpu_var usesChristoph Lameter
__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to this_cpu_inc(y) Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: linux390@de.ibm.com Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-04-03s390/irq: Use defines for external interruption codesThomas Huth
Use the new defines for external interruption codes to get rid of "magic" numbers in the s390 source code. And while we're at it, also rename the (un-)register_external_interrupt function to something shorter so that this patch does not exceed the 80 columns all over the place. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-13s390: delete new instances of __cpuinit usagePaul Gortmaker
The patch "s390/perf: add support for the CPU-Measurement Sampling Facility" added a new instance of the __cpuinit macro usage. We removed this a couple versions ago; we now want to remove the compat no-op stubs. Introducing new users is not what we want to see at this point in time, as it will break once the stubs are gone. Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/cpum_sf: Add flag to process full SDBs onlyHendrik Brueckner
Add the PERF_CPUM_SF_FULL_BLOCKS flag to process only sample-data-blocks that have the block-full-indicator bit set. Sample-data-blocks that are partially filled are discarded. Use this flag if the sampling buffer is likely to be shared among perf events that use different sampling modes. In such environments, flushing sample-data-blocks that are not completely filled, might cause invalid-data-formats. Setting PERF_CPUM_SF_FULL_BLOCKS prevents potentially invalid sampling data to be processed but, in contrast, also discards valid samples in partially filled sample-data-blocks. Note that sample-data-blocks might not become full for small sampling frequencies or for workload that is scheduled for tiny intervals. To sample with the PERF_CPUM_SF_FULL_BLOCKS flag, set the perf->attr.config1 to 0x0004. For example: perf record -e cpum_sf/config=0xB000,config1=0x0004/ Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/cpum_sf: Add raw data sampling to support the diagnostic-sampling functionHendrik Brueckner
Also support the diagnostic-sampling function in addition to the basic-sampling function. Diagnostic-sampling data entries contain hardware model specific sampling data and additional programs are required to analyze the data. To deliver diagnostic-sampling, as well, as basis-sampling data entries to user space, introduce support for sampling "raw data". If this particular perf sampling type (PERF_SAMPLE_RAW) is used, sampling data entries are copied to user space. External programs can then analyze these data. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/cpum_sf: Filter perf events based event->attr.exclude_* settingsHendrik Brueckner
Introduce the perf_exclude_event() function to filter perf samples according to event->attr.exclude_* settings. During event initialization, reset event exclude settings that are not supported. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/cpum_sf: Detect KVM guest samplesHendrik Brueckner
The host-program-parameter (hpp) value of basic sample-data-entries designates a SIE control block that is set by the LPP instruction in sie64a(). Non-zero values indicate guest samples, a value of zero indicates a host sample. For perf samples, host and guest samples are distinguished using particular PERF_MISC_* flags. The perf layer calls perf_misc_flags() to set the flags based on the pt_regs content. For each sample-data-entry, the cpum_sf PMU creates a pt_regs structure with the sample-data information. An additional flag structure is added to easily distinguish between host and guest samples. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/cpum_sf: Add helper to read TOD from trailer entriesHendrik Brueckner
The trailer entry contains a timestamp of the time when the sample-data-block became full. The timestamp specifies a TOD (time-of-day) value in either the STCK or STCKE format. Provide a helper function to return the TOD value depending on the setting of time format indicator. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/cpum_sf: Atomically reset trailer entry fields of sample-data-blocksHendrik Brueckner
Ensure to reset the sample-data-block full indicator and the overflow counter at the same time. This must be done atomically because the sampling hardware is still active while full sample-data-block is processed. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/cpum_sf: Dynamically extend the sampling buffer if overflows occurHendrik Brueckner
Improve the sampling buffer allocation and add a function to reallocate and increase the sampling buffer structure. The number of allocated buffer elements (sample-data-blocks) are accounted. You can control the minimum and maximum number these sample-data-blocks through the cpum_sfb_size kernel parameter. The number hardware sample overflows (if any) are also accounted and stored per perf event. During the PMU disable/enable calls, the accumulated overflow counter is analyzed and, if necessary, the sampling buffer is dynamically increased. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/perf,oprofile: Share sampling facilityHendrik Brueckner
Introduce reserve/release functions to share the sampling facility between perf and oprofile. Also improve error handling for the sampling facility support in perf. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/perf: Improve PMU selection for PERF_COUNT_HW_CPU_CYCLES eventsHendrik Brueckner
The cpum_cf (counter facility) PMU does not support sampling events. With cpum_sf (sampling facility), a PMU for sampling CPU cycles is available. Make cpum_sf the "default" PMU for PERF_COUNT_HW_CPU_CYCLES sampling events but use the more precise cpum_cf PMU for non-sampling events. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16s390/perf: add support for the CPU-Measurement Sampling FacilityHendrik Brueckner
Introduce a perf PMU, "cpum_sf", to support the CPU-Measurement Sampling Facility. You can control the sampling facility through this perf PMU interfaces. Perf sampling events are created for hardware samples. For details about the CPU-Measurement Sampling Facility, see "The Load-Program-Parameter and the CPU-Measurement Facilities" (SA23-2260). Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>