Notice
This document is for a development version of Ceph.
Crimson developer documentation
See Crimson’s User Guide for more information.
Building Crimson
Crimson is not enabled by default. Enable it at build time by running:
$ WITH_CRIMSON=true ./install-deps.sh
$ ./do_cmake.sh -DWITH_CRIMSON=ON
Please note that ASan is enabled by default if Crimson is built from a source
tree cloned using git.
vstart.sh
Note
vstart.sh enables the Crimson Required Flags automatically when --crimson is used.
The following options can be used with vstart.sh.
--crimson
    Start crimson-osd instead of ceph-osd.
--nodaemon
    Do not daemonize the service.
--redirect-output
    Redirect the stdout and stderr to out/$type.$num.stdout.
--osd-args
    Pass extra command line options to crimson-osd or ceph-osd. This is useful for passing Seastar options to crimson-osd. For example, one can supply --osd-args "--memory 2G" to set the amount of memory to use. See the output of crimson-osd --help-seastar for additional Seastar-specific command line options.
--crimson-smp
    The number of cores to use for each OSD. If BlueStore is used, the balance of available cores (as determined by nproc) will be assigned to the object store.
--bluestore
    Use the alienized BlueStore as the object store backend. This is the default (see the section on Crimson's Object Store Backends for more details).
--cyanstore
    Use CyanStore as the object store backend.
--memstore
    Use the alienized MemStore as the object store backend.
--seastore
    Use SeaStore as the back end object store.
--seastore-devs
    Specify the block device used by SeaStore.
--seastore-secondary-devs
    Optional. SeaStore supports multiple devices. Enable this feature by passing the block device to this option.
--seastore-secondary-devs-type
    Optional. Specify the type of secondary devices. When the secondary device is slower than the main device passed to --seastore-devs, cold data on the faster device will be migrated to the slower devices over time. Valid types include HDD, SSD (default), ZNS, and RANDOM_BLOCK_SSD. Note that secondary devices should not be faster than the main device.
To start a cluster with a single Crimson node, run:
$ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh \
--without-dashboard --bluestore --crimson \
--redirect-output
An example using SeaStore with a secondary device:
$ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \
--without-dashboard --seastore \
--crimson --redirect-output \
--seastore-devs /dev/sda \
--seastore-secondary-devs /dev/sdb \
--seastore-secondary-devs-type HDD
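The options above can also be combined, for instance to give each Crimson OSD a fixed number of cores and a Seastar memory limit. The following is a minimal sketch of a hypothetical wrapper function, start_crimson_seastore, assuming it is run from the build directory like the examples above; the DRY_RUN switch is an illustrative addition, not a vstart.sh feature, and only flags documented above are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a one-OSD Crimson cluster on SeaStore via vstart.sh.
# Assumes the current directory is the build directory (as in the examples above).
# Arguments: <seastore device> [cores per OSD, default 2] [Seastar memory, default 2G]
# DRY_RUN=1 prints the assembled command (quoting is not preserved) instead of running it.
start_crimson_seastore() {
    local dev=$1 smp=${2:-2} mem=${3:-2G}
    local -a cmd=(../src/vstart.sh -n -x
        --without-dashboard --crimson --seastore
        --seastore-devs "$dev"
        --crimson-smp "$smp"
        --osd-args "--memory $mem"   # passed through to crimson-osd as Seastar options
        --redirect-output)
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        MGR=1 MON=1 OSD=1 MDS=0 RGW=0 "${cmd[@]}"
    fi
}
```

For example, DRY_RUN=1 start_crimson_seastore /dev/sdb 4 4G shows the command line without starting anything; dropping DRY_RUN runs it.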
Stop this vstart cluster by running:
$ ../src/stop.sh --crimson
daemonize
Unlike ceph-osd, crimson-osd does not daemonize itself even if the
daemonize option is enabled. In order to read this option, crimson-osd
needs its config sharded service to be ready, but this sharded service lives
in the Seastar reactor. If we fork a child process and exit the parent after
starting the Seastar engine, the child is left with a single thread, a
replica of the thread that called fork(). Tackling this problem in Crimson
would unnecessarily complicate the code.
Since supported GNU/Linux distributions use systemd, which is able to
daemonize processes, there is no need to daemonize ourselves.
Those using sysvinit can use start-stop-daemon to daemonize crimson-osd.
If this does not work out, a helper utility may be devised.
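A minimal sketch of how start-stop-daemon could be used from a sysvinit script follows. It assumes Debian's start-stop-daemon, a crimson-osd binary at /usr/bin/crimson-osd, and that crimson-osd accepts -i <id> to select the OSD as ceph-osd does; the function name crimson_osd_ctl, the pid file path, and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sysvinit helper: crimson-osd never daemonizes itself, so let
# start-stop-daemon run it in the background and track it with a pid file.
# Assumptions: binary at /usr/bin/crimson-osd, "-i <id>" selects the OSD,
# and the pid file location below is made up for illustration.
# DRY_RUN=1 prints the command instead of running it.
crimson_osd_ctl() {
    local action=$1 id=$2
    local pidfile=/run/ceph/crimson-osd.$id.pid
    local -a cmd
    case $action in
        start)
            cmd=(start-stop-daemon --start --background --make-pidfile
                 --pidfile "$pidfile" --exec /usr/bin/crimson-osd -- -i "$id") ;;
        stop)
            # --oknodo: succeed even if nothing is running; --retry: wait up to 30s
            cmd=(start-stop-daemon --stop --oknodo --retry 30 --pidfile "$pidfile") ;;
        *)
            echo "usage: crimson_osd_ctl start|stop <id>" >&2
            return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Since --background detaches the process, start-stop-daemon cannot report whether crimson-osd started successfully; check the OSD's status separately.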
logging
crimson-osd currently uses the logging utility offered by Seastar. See
src/common/dout.h for the mapping from Ceph logging levels to
the severity levels in Seastar. For instance, messages sent to derr
will be issued using logger::error(), and messages with a debug level
greater than 20 will be issued using logger::trace().
ceph     | seastar
< 0      | error
0        | warn
[1, 6)   | info
[6, 20]  | debug
> 20     | trace
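To make the level boundaries in the table concrete, here is a small illustrative function that maps a Ceph debug level to the Seastar severity it ends up logged at. It only mirrors the table; the function name is hypothetical, and src/common/dout.h remains the authoritative mapping.

```shell
#!/usr/bin/env bash
# Illustrative only: mirror the Ceph-level -> Seastar-severity table.
# The real mapping lives in src/common/dout.h.
ceph_to_seastar_level() {
    local lvl=$1
    if (( lvl < 0 )); then
        echo error          # e.g. derr
    elif (( lvl == 0 )); then
        echo warn
    elif (( lvl < 6 )); then
        echo info           # [1, 6)
    elif (( lvl <= 20 )); then
        echo debug          # [6, 20]
    else
        echo trace          # > 20
    fi
}
```

Note the asymmetric ranges: level 6 is already debug, while level 20 is still debug and only 21 and above become trace.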
Note that crimson-osd
does not send log messages directly to a specified log_file. It writes
the logging messages to stdout and/or syslog. This behavior can be
changed using --log-to-stdout and --log-to-syslog command line
options. By default, --log-to-stdout is enabled, and --log-to-syslog is disabled.
Profiling Crimson
Fio
crimson-store-nbd exposes configurable FuturizedStore internals as an
NBD server for use with fio.
To use fio to test crimson-store-nbd, perform the steps below.
You will need to install libnbd, and compile it into fio:
$ apt-get install libnbd-dev
$ git clone git://git.kernel.dk/fio.git
$ cd fio
$ ./configure --enable-libnbd
$ make
Build crimson-store-nbd:
$ cd build
$ ninja crimson-store-nbd
Run the crimson-store-nbd server. To test with a block device, specify the path to the raw device, for example /dev/nvme1n1, in place of the created file:
$ export disk_img=/tmp/disk.img
$ export unix_socket=/tmp/store_nbd_socket.sock
$ rm -f $disk_img $unix_socket
$ truncate -s 512M $disk_img
$ ./bin/crimson-store-nbd \
  --device-path $disk_img \
  --smp 1 \
  --mkfs true \
  --type transaction_manager \
  --uds-path ${unix_socket} &
Below are descriptions of these command line arguments:
--smp
    The number of CPU cores to use (Symmetric MultiProcessor).
--mkfs
    Initialize the device first.
--type
    The back end to use. If transaction_manager is specified, SeaStore's TransactionManager and BlockSegmentManager are used to emulate a block device. Otherwise, this option is used to choose a backend of FuturizedStore, where the whole "device" is divided into multiple fixed-size objects whose size is specified by --object-size. So, if you are only interested in testing the lower-level implementation of SeaStore, like the logical address translation layer and garbage collection, without the object store semantics, transaction_manager would be a better choice.
Create a fio job file named nbd.fio:
[global]
ioengine=nbd
uri=nbd+unix:///?socket=${unix_socket}
rw=randrw
time_based
runtime=120
group_reporting
iodepth=1
size=512M

[job0]
offset=0
Test the Crimson object store using the custom fio built just now:
$ ./fio nbd.fio
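Because crimson-store-nbd is started in the background with &, fio may be launched before the server has created its Unix domain socket. A minimal sketch of a hypothetical helper, wait_for_socket, that polls once per second for the socket before running the job (the polling interval and default timeout are arbitrary choices):

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a Unix domain socket exists, or give up.
# Arguments: <socket path> [timeout in seconds, default 30]
# Returns 0 once the socket exists, 1 on timeout.
wait_for_socket() {
    local sock=$1 timeout=${2:-30} waited=0
    until [ -S "$sock" ]; do
        if (( waited >= timeout )); then
            echo "timed out waiting for $sock" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

With unix_socket exported as in the steps above, wait_for_socket "$unix_socket" && ./fio nbd.fio only starts fio once the server is listening.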
CBT
We can use cbt for performance tests:
$ git checkout main
$ make crimson-osd
$ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/baseline ../src/test/crimson/cbt/radosbench_4K_read.yaml
$ git checkout yet-another-pr
$ make crimson-osd
$ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/yap ../src/test/crimson/cbt/radosbench_4K_read.yaml
$ ~/dev/cbt/compare.py -b /tmp/baseline -a /tmp/yap -v
19:48:23 - INFO - cbt - prefill/gen8/0: bandwidth: (or (greater) (near 0.05)):: 0.183165/0.186155 => accepted
19:48:23 - INFO - cbt - prefill/gen8/0: iops_avg: (or (greater) (near 0.05)):: 46.0/47.0 => accepted
19:48:23 - WARNING - cbt - prefill/gen8/0: iops_stddev: (or (less) (near 0.05)):: 10.4403/6.65833 => rejected
19:48:23 - INFO - cbt - prefill/gen8/0: latency_avg: (or (less) (near 0.05)):: 0.340868/0.333712 => accepted
19:48:23 - INFO - cbt - prefill/gen8/1: bandwidth: (or (greater) (near 0.05)):: 0.190447/0.177619 => accepted
19:48:23 - INFO - cbt - prefill/gen8/1: iops_avg: (or (greater) (near 0.05)):: 48.0/45.0 => accepted
19:48:23 - INFO - cbt - prefill/gen8/1: iops_stddev: (or (less) (near 0.05)):: 6.1101/9.81495 => accepted
19:48:23 - INFO - cbt - prefill/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.325163/0.350251 => accepted
19:48:23 - INFO - cbt - seq/gen8/0: bandwidth: (or (greater) (near 0.05)):: 1.24654/1.22336 => accepted
19:48:23 - INFO - cbt - seq/gen8/0: iops_avg: (or (greater) (near 0.05)):: 319.0/313.0 => accepted
19:48:23 - INFO - cbt - seq/gen8/0: iops_stddev: (or (less) (near 0.05)):: 0.0/0.0 => accepted
19:48:23 - INFO - cbt - seq/gen8/0: latency_avg: (or (less) (near 0.05)):: 0.0497733/0.0509029 => accepted
19:48:23 - INFO - cbt - seq/gen8/1: bandwidth: (or (greater) (near 0.05)):: 1.22717/1.11372 => accepted
19:48:23 - INFO - cbt - seq/gen8/1: iops_avg: (or (greater) (near 0.05)):: 314.0/285.0 => accepted
19:48:23 - INFO - cbt - seq/gen8/1: iops_stddev: (or (less) (near 0.05)):: 0.0/0.0 => accepted
19:48:23 - INFO - cbt - seq/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.0508262/0.0557337 => accepted
19:48:23 - WARNING - cbt - 1 tests failed out of 16
Here we compile and run the same test against two branches: main and yet-another-pr.
We then compare the results. Along with every test case, a set of rules is defined to check for
performance regressions when comparing the sets of test results. If a possible regression is found, the rule and
corresponding test results are highlighted.
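When many test cases are compared, it can help to keep only the rejected results. The following is a minimal sketch of a hypothetical filter, list_rejected, assuming compare.py keeps the log line format shown in the sample output above (which may differ between cbt versions). It prints the test, the metric, and the pair of values exactly as compare.py reported them.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print "<test> <metric> <values>" for each rejected
# comparison in compare.py output. Reads files given as arguments, or stdin.
# Assumes lines like:
#   19:48:23 - WARNING - cbt - prefill/gen8/0: iops_stddev: (or ...):: 10.4/6.6 => rejected
list_rejected() {
    sed -nE 's/.* - cbt - ([^:]+): ([^:]+): .*:: ([^ ]+) => rejected$/\1 \2 \3/p' "$@"
}
```

For example, ~/dev/cbt/compare.py -b /tmp/baseline -a /tmp/yap -v 2>&1 | list_rejected captures the messages whether they are logged to stdout or stderr. On the sample output above, it prints the single line prefill/gen8/0 iops_stddev 10.4403/6.65833.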
Hacking Crimson
Seastar Documents
See the Seastar Tutorial, or build a browsable version and start an HTTP server:
$ cd seastar
$ ./configure.py --mode debug
$ ninja -C build/debug docs
$ python3 -m http.server -d build/debug/doc/html
Install pandoc and other dependencies beforehand.
Debugging Crimson
Debugging with GDB
The tips for debugging Scylla also apply to Crimson.
Human-readable backtraces with addr2line
When a Seastar application crashes, it leaves us with a backtrace of addresses, like:
Segmentation fault.
Backtrace:
0x00000000108254aa
0x00000000107f74b9
0x00000000105366cc
0x000000001053682c
0x00000000105d2c2e
0x0000000010629b96
0x0000000010629c31
0x00002a02ebd8272f
0x00000000105d93ee
0x00000000103eff59
0x000000000d9c1d0a
/lib/x86_64-linux-gnu/libc.so.6+0x000000000002409a
0x000000000d833ac9
Segmentation fault
The seastar-addr2line utility provided by Seastar can be used to map these
addresses to functions. The script expects input on stdin,
so we need to copy and paste the above addresses, then send EOF by inputting
control-D in the terminal. One might use echo or cat instead:
$ ../src/seastar/scripts/seastar-addr2line -e bin/crimson-osd
0x00000000108254aa
0x00000000107f74b9
0x00000000105366cc
0x000000001053682c
0x00000000105d2c2e
0x0000000010629b96
0x0000000010629c31
0x00002a02ebd8272f
0x00000000105d93ee
0x00000000103eff59
0x000000000d9c1d0a
0x00000000108254aa
[Backtrace #0]
seastar::backtrace_buffer::append_backtrace() at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1136
seastar::print_with_backtrace(seastar::backtrace_buffer&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1157
seastar::print_with_backtrace(char const*) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1164
seastar::sigsegv_action() at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5119
seastar::install_oneshot_signal_handler<11, &seastar::sigsegv_action>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5105
seastar::install_oneshot_signal_handler<11, &seastar::sigsegv_action>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5101
?? ??:0
seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5418
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/app-template.cc:173 (discriminator 5)
main at /home/kefu/dev/ceph/build/../src/crimson/osd/main.cc:131 (discriminator 1)
Note that seastar-addr2line is able to extract addresses from
its input, so you can also paste the log messages as below:
2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:Backtrace:
2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e78dbc
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e7f0
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e8b8
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e985
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: /lib64/libpthread.so.0+0x0000000000012dbf
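A full teuthology log usually contains many unrelated hex values, so it can help to cut out just the backtrace before symbolizing. The following is a minimal sketch of a hypothetical helper, extract_backtrace, assuming each backtrace starts with a line containing "Backtrace:" and is followed by lines that each contain an address, as in the samples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each "Backtrace:" line plus the address lines
# that immediately follow it; stop at the first line without a 0x address.
# Reads files given as arguments, or stdin.
extract_backtrace() {
    awk '
        /Backtrace:/           { inbt = 1; print; next }
        inbt && /0x[0-9a-f]+/  { print; next }
                               { inbt = 0 }
    ' "$@"
}
```

Its output can then be piped into ../src/seastar/scripts/seastar-addr2line -e bin/crimson-osd instead of pasting by hand.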
Unlike the classic ceph-osd, Crimson does not print a human-readable backtrace when it
handles fatal signals like SIGSEGV or SIGABRT. Symbolizing the addresses is also more
complicated with a stripped binary. So instead of planting a signal handler for
those signals in Crimson, we can use script/ceph-debug-docker.sh to map the
addresses in the backtrace:
# assuming you are under the source tree of ceph
$ ./src/script/ceph-debug-docker.sh --flavor crimson master:27e237c137c330ebb82627166927b7681b20d0aa centos:8
....
[root@3deb50a8ad51 ~]# wget -q https://raw.githubusercontent.com/scylladb/seastar/master/scripts/seastar-addr2line
[root@3deb50a8ad51 ~]# dnf install -q -y file
[root@3deb50a8ad51 ~]# python3 seastar-addr2line -e /usr/bin/crimson-osd
# paste the backtrace here
Code Walkthroughs
Brought to you by the Ceph Foundation
The Ceph Documentation is a community resource funded and hosted by the non-profit Ceph Foundation. If you would like to support this and our other efforts, please consider joining now.