                        RAMSTER HOW-TO

Author: Dan Magenheimer
Ramster maintainer: Konrad Wilk <konrad.wilk@oracle.com>

This is a HOWTO document for ramster which, as of this writing, is in
the kernel as a subdirectory of zcache in drivers/staging, called ramster.
(Zcache can be built with or without ramster functionality.) If enabled
and properly configured, ramster allows memory capacity load balancing
across multiple machines in a cluster. Further, the ramster code serves
as an example of asynchronous access for zcache (as well as cleancache and
frontswap) that may prove useful for future transcendent memory
implementations, such as KVM and NVRAM. While ramster works today on
any network connection that supports kernel sockets, its features may
become more interesting on future high-speed fabrics/interconnects.

Ramster requires both kernel and userland support. The userland support,
called ramster-tools, is known to work with EL6-based distros, but it is a
set of poorly-hacked, slightly-modified cluster tools based on ocfs2, which
includes an init file, a config file, and a userland binary that interfaces
to the kernel. This state of userland support reflects the abysmal userland
skills of this suitably-embarrassed author; any help/patches to turn
ramster-tools into more distributable rpms/debs useful for a wider range
of distros would be appreciated. The source RPM that can be used as a
starting point is available at:
    http://oss.oracle.com/projects/tmem/files/RAMster/

As a result of this author's ignorance, the userland setup described in
this HOWTO assumes an EL6 distro and is described in EL6 syntax. Apologies
if this offends anyone!

Kernel support has only been tested on x86_64. Systems with an active
ocfs2 filesystem should work, but since ramster leverages a lot of
code from ocfs2, there may be latent issues. A kernel configuration that
includes CONFIG_OCFS2_FS should build OK, and should certainly run OK
if no ocfs2 filesystem is mounted.

This HOWTO demonstrates memory capacity load balancing for a two-node
cluster, where one node, called the "local" node, becomes overcommitted
and the other node, called the "remote" node, provides additional RAM
capacity for use by the local node. Ramster is capable of more complex
topologies; see the last section, "ADVANCED RAMSTER TOPOLOGIES".

If you find any terms in this HOWTO unfamiliar or don't understand the
motivation for ramster, the following LWN reading is recommended:
- Transcendent Memory in a Nutshell (lwn.net/Articles/454795)
- The future calculus of memory management (lwn.net/Articles/475681)
And since ramster is built on top of zcache, this article may be helpful:
- In-kernel memory compression (lwn.net/Articles/545244)

Now that you've memorized the contents of those articles, let's get started!

A. PRELIMINARY

1) Install two x86_64 Linux systems that are known to work when
   upgraded to a recent upstream Linux kernel version.

On each system:

2) Configure, build, and install the new kernel, then boot it, just to
   ensure this can be done with an unmodified upstream kernel. Confirm
   you booted the upstream kernel with "uname -a".

3) Unless you plan to test only swapping, the "WasActive" patch is
   highly recommended for any performance testing. (Search lkml.org
   for WasActive, apply the patch, and rebuild your kernel.) For a
   demo or simple testing, the patch can be ignored.

4) Install ramster-tools as root. An x86_64 rpm for EL6-based systems
   can be found at:
    http://oss.oracle.com/projects/tmem/files/RAMster/
   (Sorry, but for now, non-EL6 users must recreate ramster-tools on
   their own from source. See above.)
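
   For EL6 users, installation might look like the following; the rpm
   filename here is illustrative, so substitute whatever version you
   actually downloaded:

      # rpm -ivh ramster-tools-<version>.el6.x86_64.rpm
      # rpm -q ramster-tools    # verify the package installed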

5) Ensure that debugfs is mounted at each boot. Examples below assume it
   is mounted at /sys/kernel/debug.
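
   One way to do this (a minimal sketch; adjust to your distro's
   conventions) is to mount it once by hand and add an fstab entry so
   it persists across boots:

      # mount -t debugfs none /sys/kernel/debug
      # echo "debugfs /sys/kernel/debug debugfs defaults 0 0" >> /etc/fstab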

B. BUILDING RAMSTER INTO THE KERNEL

Do the following on each system:

1) Using the kernel configuration mechanism of your choice, change
   your config to include:

      CONFIG_CLEANCACHE=y
      CONFIG_FRONTSWAP=y
      CONFIG_STAGING=y
      CONFIG_CONFIGFS_FS=y   # NOTE: MUST BE y, not m
      CONFIG_ZCACHE=y
      CONFIG_RAMSTER=y

   For a linux-3.10 or later kernel, you should also set:

      CONFIG_ZCACHE_DEBUG=y
      CONFIG_RAMSTER_DEBUG=y

   Before building the kernel, please double-check your kernel config
   file to ensure all of these settings are correct.
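
   A quick sanity check (a sketch, assuming your config file is .config
   in the kernel source tree):

      # grep -E "CLEANCACHE|FRONTSWAP|STAGING=|CONFIGFS_FS|ZCACHE|RAMSTER" .config

   All of the options listed above should appear with "=y".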

2) Build this kernel and change your boot file (e.g. /etc/grub.conf)
   so that the new kernel will boot.

3) Add "zcache" and "ramster" as kernel boot parameters for the new kernel.
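
   For example, an EL6-style grub.conf stanza might look like the
   following (the kernel version and root device here are hypothetical;
   only the trailing "zcache ramster" parameters matter):

      title Linux with ramster
            root (hd0,0)
            kernel /vmlinuz-3.10.0-ramster ro root=/dev/sda1 zcache ramster
            initrd /initramfs-3.10.0-ramster.img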

4) Reboot both systems at approximately the same time.

5) Check dmesg to ensure there are some messages from ramster, prefixed
   by "ramster:":

      # dmesg | grep ramster

   You should also see a lot of files in:

      # ls /sys/kernel/debug/zcache
      # ls /sys/kernel/debug/ramster

   These are mostly counters for various zcache and ramster activities.
   You should also see files in:

      # ls /sys/kernel/mm/ramster

   These are the sysfs files that control ramster, as we shall see.

   Ramster will now act as a single-system zcache on each system, but
   it doesn't yet know anything about the cluster, so it can't yet do
   anything remotely.

C. CONFIGURING THE RAMSTER CLUSTER

This part can be error-prone unless you are familiar with clustering
filesystems. We need to describe the cluster in an /etc/ramster.conf
file, and the init scripts that parse it are extremely picky about
the syntax.

1) Create an /etc/ramster.conf file and ensure it is identical on both
   systems. This file mimics the ocfs2 format, and a good amount of
   documentation can be found by searching for ocfs2.conf, but you can
   use:

      cluster:
            name = ramster
            node_count = 2
      node:
            name = system1
            cluster = ramster
            number = 0
            ip_address = my.ip.ad.r1
            ip_port = 7777
      node:
            name = system2
            cluster = ramster
            number = 1
            ip_address = my.ip.ad.r2
            ip_port = 7777

   You must ensure that the "name" field in the file exactly matches
   the output of "hostname" on each system; if "hostname" shows a
   fully-qualified hostname, ensure the name is fully qualified in
   /etc/ramster.conf. Obviously, substitute my.ip.ad.rx with the proper
   IP addresses.
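
   A quick way to check the name match on each node (a sketch; the
   "system1" output shown is this example's, so expect your own name):

      # hostname
      system1
      # grep "name = $(hostname)" /etc/ramster.conf
            name = system1

   If the grep prints nothing, the "name" field does not match and the
   cluster will not come up.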

2) Enable the ramster service and configure it. If you used the
   EL6 ramster-tools, this would be:

      # chkconfig --add ramster
      # service ramster configure

   Set "load on boot" to "y", the cluster to start to "ramster" (or
   whatever name you chose in ramster.conf), the heartbeat dead
   threshold to "500", and the network idle timeout to "1000000".
   Leave the others at their defaults.
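
   Since ramster-tools is derived from the ocfs2 cluster tools, the
   configure dialog may resemble the following o2cb-style prompts (the
   exact wording is an assumption and depends on your ramster-tools
   build; the values are the ones recommended above):

      # service ramster configure
      Load driver on boot (y/n) [n]: y
      Cluster to start on boot (Enter "none" to clear) []: ramster
      Specify heartbeat dead threshold (>=7) [31]: 500
      Specify network idle timeout in ms (>=5000) [30000]: 1000000
      Specify network keepalive delay in ms (>=1000) [2000]: <enter>
      Specify network reconnect delay in ms (>=2000) [2000]: <enter>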

3) Reboot both systems. After reboot, try (assuming EL6 ramster-tools):

      # service ramster status

   You should see "Checking RAMSTER cluster "ramster": Online". If you
   do not, something is wrong and ramster will not work. Note that you
   should also see that the driver for "configfs" is loaded and mounted,
   that the driver for ocfs2_dlmfs is not loaded, and some numbers for
   network parameters. You will also see "Checking RAMSTER heartbeat:
   Not active". That's all OK.

4) Now you need to start the cluster heartbeat; the cluster is not "up"
   until all nodes detect a heartbeat. In a real cluster, heartbeat
   detection is done via a cluster filesystem, but ramster doesn't
   require one. Some hack-y kernel code in ramster can start the
   heartbeat for you, though, if you tell it which nodes are "up". To
   enable the heartbeat, do:

      # echo 0 > /sys/kernel/mm/ramster/manual_node_up
      # echo 1 > /sys/kernel/mm/ramster/manual_node_up

   This must be done on BOTH nodes and, to avoid timeouts, must be done
   approximately concurrently on both nodes. On an EL6 system, it is
   convenient to put these lines in /etc/rc.local; a consolidated sketch
   follows step 6 below. To confirm that the cluster is now up, on both
   systems do:

      # dmesg | grep ramster

   You should see ramster "Accepted connection" messages in dmesg on both
   nodes after this. Note that if you check userland status again with

      # service ramster status

   you will still see "Checking RAMSTER heartbeat: Not active". That's
   still OK... the ramster kernel heartbeat hack doesn't communicate to
   userland.

5) You now must tell each node the node to which it should "remotify"
   pages. On this two-node cluster, we will assume the "local" node,
   node 0, has its memory overcommitted and will use ramster to utilize
   RAM capacity on the "remote" node, node 1. To configure this, on
   node 0, you do:

      # echo 1 > /sys/kernel/mm/ramster/remote_target_nodenum

   You should see "ramster: node 1 set as remotification target" in dmesg
   on node 0. Again, on EL6, /etc/rc.local is a good place to put this
   on node 0 so you don't forget to do it at each boot.

6) One more step: By default, the ramster code does not "remotify" any
   pages; this is primarily for testing purposes, but sometimes it is
   useful. This may change in the future, but for now, on node 0, do:

      # echo 1 > /sys/kernel/mm/ramster/pers_remotify_enable
      # echo 1 > /sys/kernel/mm/ramster/eph_remotify_enable

   The first enables remotifying of swap (persistent, aka frontswap)
   pages; the second enables remotifying of page cache (ephemeral, aka
   cleancache) pages.

   On EL6, these lines can also be put in /etc/rc.local (AFTER the
   node_up lines), or at the beginning of a script that runs a workload.
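
   Putting steps 4 through 6 together, node 0's /etc/rc.local might end
   with something like this (a sketch assuming the two-node layout
   above; node 1 would get only the two manual_node_up lines):

      # start the kernel heartbeat hack for both nodes
      echo 0 > /sys/kernel/mm/ramster/manual_node_up
      echo 1 > /sys/kernel/mm/ramster/manual_node_up
      # node 0 remotifies to node 1
      echo 1 > /sys/kernel/mm/ramster/remote_target_nodenum
      # enable remotification of frontswap and cleancache pages
      echo 1 > /sys/kernel/mm/ramster/pers_remotify_enable
      echo 1 > /sys/kernel/mm/ramster/eph_remotify_enable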

7) Note that most testing has been done with both/all machines booted
   roughly simultaneously to avoid cluster timeouts. Ideally, you should
   do this too unless you are trying to break ramster rather than just
   use it. ;-)

D. TESTING RAMSTER

1) Note that ramster has no value unless pages get "remotified". For
   swap/frontswap/persistent pages, this doesn't happen unless/until
   the workload would cause swapping to occur, at which point pages
   are put into frontswap/zcache, and the remotification thread starts
   working. To get to the point where the system swaps, you either
   need a workload for which the working set exceeds the RAM in the
   system, or you need to somehow reduce the amount of RAM one of
   the systems sees. The latter is easy when testing in a VM, but
   harder on physical systems. In some cases, "mem=xxxM" on the
   kernel command line restricts memory, but for some values of xxx
   the kernel may fail to boot. One may also try creating a fixed
   RAMdisk, doing nothing with it, but ensuring that it eats up a fixed
   amount of RAM.
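
   Two sketches of the latter approaches (the 1 GB sizes here are
   arbitrary examples; tune them to your system, and note the RAMdisk
   variant assumes brd is built as a module):

      # restrict the kernel to 1 GB at boot, e.g. in grub.conf:
      #   kernel /vmlinuz-... ro root=/dev/sda1 zcache ramster mem=1024M

      # or pin 1 GB of RAM in a fixed RAMdisk; brd allocates pages as
      # they are written, so write every page once:
      # modprobe brd rd_nr=1 rd_size=1048576
      # dd if=/dev/zero of=/dev/ram0 bs=1M count=1024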

2) To see if ramster is working, on the "remote" node, node 1, try:

      # grep . /sys/kernel/debug/ramster/foreign_*
      # # note, that is space-dot-space between grep and the pathname

   to monitor the number (and max) of ephemeral and persistent pages
   that ramster has sent to it. If these stay at zero, ramster is not
   working, either because the workload on the local node (node 0) isn't
   creating enough memory pressure or because "remotifying" isn't
   working. On the local system, node 0, you can also watch lots of
   useful information. Try:

      grep . /sys/kernel/debug/zcache/*pageframes* \
            /sys/kernel/debug/zcache/*zbytes* \
            /sys/kernel/debug/zcache/*zpages* \
            /sys/kernel/debug/ramster/*remote*

   Of particular note are the remote_*_pages_succ_get counters. These
   show how many disk reads and/or disk writes have been avoided on the
   overcommitted local system by storing pages remotely using ramster.

   At the risk of information overload, you can also grep:

      /sys/kernel/debug/cleancache/* and /sys/kernel/debug/frontswap/*

   These show, for example, how many disk reads and/or disk writes have
   been avoided by using zcache to optimize RAM on the local system.
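
   For continuous monitoring while a workload runs, something as simple
   as the following works (a sketch; adjust the interval to taste):

      # watch -n 5 'grep . /sys/kernel/debug/ramster/foreign_*'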

AUTOMATIC SWAP REPATRIATION

You may notice that while the systems are idle, the foreign persistent
page count on the remote machine slowly decreases. This is because
ramster implements "frontswap selfshrinking": when possible, swap
pages that have been remotified are slowly repatriated to the local
machine. This is done so that local RAM can be used when possible and
so that, in case of a remote machine crash, the probability of data
loss is reduced.

REBOOTING / POWEROFF

If a system is shut down while some of its swap pages still reside
on a remote system, the system may lock up during the shutdown
sequence. This will occur if the network is shut down before the
swap mechanism is shut down, which is the default ordering on many
distros. To avoid this annoying problem, simply shut off the swap
subsystem before starting the shutdown sequence, e.g.:

    # swapoff -a
    # reboot

Ideally, this swapoff-before-ifdown ordering should be enforced
permanently using shutdown scripts.
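
On an EL6 (SysV init) system, one way to do this is a small stop-only
init script (a sketch; the script name and priorities are hypothetical,
chosen so its stop action runs before K90network at shutdown, and the
subsys lock file is what makes EL6 run the stop action at all):

    # cat > /etc/init.d/swapoff-early <<'EOF'
    #!/bin/sh
    # swapoff-early: turn off swap before the network goes down
    # chkconfig: 2345 99 01
    case "$1" in
      start) touch /var/lock/subsys/swapoff-early ;;
      stop)  swapoff -a ;;
    esac
    EOF
    # chmod +x /etc/init.d/swapoff-early
    # chkconfig --add swapoff-early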

KNOWN PROBLEMS

1) You may periodically see messages such as:

      ramster_r2net, message length problem

   This is harmless, but it indicates that a node is sending messages
   containing compressed pages that exceed the maximum for zcache
   (PAGE_SIZE*15/16, i.e. 3840 bytes with 4KB pages). The sender side
   needs to be fixed.

2) If you see a "No longer connected to node..." message or a "No
   connection established with node X after N seconds" message, it is
   possible you may be in an unrecoverable state. If you are certain
   that all of the appropriate cluster configuration steps described
   above have been performed, try rebooting the two servers concurrently
   to see if the cluster starts.

   Note that "Connection to node... shutdown, state 7" is an intermediate
   connection state. As long as you later see "Accepted connection", the
   intermediate states are harmless.

3) There are known issues in counting certain values. As a result,
   you may see periodic warnings from the kernel, almost always of the
   form "ramster: bad accounting for XXX". There are also "WARN_ONCE"
   messages. If you see kernel warnings with a tombstone, please report
   them. They are harmless, but they reflect bugs that eventually need
   to be fixed.

ADVANCED RAMSTER TOPOLOGIES

The kernel code for ramster can support up to eight nodes in a cluster,
but no testing has been done with more than three nodes.

In the example described above, the "remote" node serves as a RAM
overflow for the "local" node. This can be made symmetric by appropriate
settings of the sysfs remote_target_nodenum file. For example, by setting:

    # echo 1 > /sys/kernel/mm/ramster/remote_target_nodenum

on node 0, and

    # echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

on node 1, each node can serve as a RAM overflow for the other.

For more than two nodes, a "RAM server" can be configured. For a
three-node system, set:

    # echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

on node 1, and

    # echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

on node 2. Then node 0 is a RAM server for both node 1 and node 2.

In this implementation of ramster, any remote node is potentially a
single point of failure (SPOF). Though the probability of failure is
reduced by automatic swap repatriation (see above), a proposed future
enhancement to ramster improves high-availability for the cluster by
sending a copy of each page of data to two other nodes. Patches welcome!