Birch Street Computing -

How I work on go-ceph

Or, how I go-ceph myself.

This write-up is something I promised a coworker and I thought that I might as well make it public. I didn't make it part of go-ceph's documentation or anything because I feel this is a little to specific to how I work. I do hope that anyone reading this can make use of parts of this, or be inspired by it, or perhaps just laugh at it. I just don't want to put it on go-ceph and make it sound like this is the way you are supposed to work on go-ceph.

First off, I tend toward a somewhat minimal setup. For most projects I use vim with very few plugins and mostly just some vimscript stuff I hacked together myself. Vim is my editor and bash is my IDE. :-) So I don't do much more fancy than using ctags and grep for code navigation.

On the average go-ceph PR we're adding either one or some small number of function calls to go-ceph. I will often open my local copy of the ceph library headers for reference. We have a project standard to copy the C function declaration under a "Implements:" line in our doc comments. So, keeping the file open makes it easy to copy that over too.

The more interesting parts are the build and test workflow. The repo includes a Makefile that can build and run test containers. These containers are used by our CI but are pretty easily to run locally. The makefile will automatically select between podman or docker CLI commands. I prefer podman of course. The make ci-image command will create a new container image, based on a ceph-containers image. You can choose what version of ceph by setting CEPH_VERSION to something like "nautilus" or "octopus". I made it possible to use different images in parallel. This didn't matter in our CI but is helpful when running locally.

Now, if you want to run the entire test suite you can run make test-container and run all the test suites for rados, rbd, and cephfs, as well as our internal helper packages. It also runs a minimalistic ceph cluster before executing the tests. This is convenient as you can basically do exactly as the CI locally, but it's a bit slow if you're iterating on a certain subset of things.

I've adapted my workflow to do a customized version of the command run by make test-container. This is something I started off doing by hand, and then made a simple shell script, and eventually a more complex tool in Python. That's a pretty normal progression for me. I won't share the script here because it's pretty me-specific but I will talk a bit about how it works. Effectively, it just runs a tweaked version of the docker/podman command seen in the makefile with a few volume mount options (-v) that I want. In particular I make the results directory, which includes the coverage report and some of the ceph cluster config. An example follows:

podman run --device /dev/fuse \
  --cap-add SYS_ADMIN \
  --cap-add SYS_PTRACE \
  --security-opt=label=disable \
  --rm \
  -it \
  --network=host \
  --name=go-ceph-ci \
  -v /home/jmulliga/devel/go-ceph:/go/src/github.com/ceph/go-ceph \
  -v /home/jmulliga/devel/go-ceph/_results/ceph:/tmp/ceph \
  -v /home/jmulliga/devel/go-ceph/_results:/results \
  -e GOPROXY=http://localhost:8081 \
  go-ceph-ci:octopus

The coverage report is useful because I like to view the html coverage report to make sure all the non-error conditions are tested and all of the testable error conditions are too. I'll go into how I make use of the ceph config shortly.

The container has an entrypoint.sh script that takes a number of useful options. Currently, the majority of them are about what tests get run. I won't go over every single one. The script has a working (last time I checked) --help option. I call my wrapper script with additional args that are passed on to the container, such as --test-pkg=cephfs. This causes the entrypoint.sh script to only run tests for the cephfs subpackage. If my work is only touching cephfs code then this makes the overall job faster by only testing the relevant package. There's also the --test-run=VALUE option which is passed along to the go test command's -run option. Using this option can now reduce the number of tests to a specific subset. For the vast majority of cases I use --test-pkg with a fair portion also using --test-run. I do generally run at least once without --test-run before pushing the code to create a PR. That's also often the step where I will double check the coverage report and eyeball the coverage percentages and skim over my new or changed functions.

Despite the above working for many cases, let's say 90%, there are times when I want to go even faster or I need to debug things and using the container to run the tests is more hassle than its worth. In these cases I run the container with the options --test-run=NONE --pause. I run this container in the background. This will cause the tiny ceph cluster to get set up, but it will skip running tests, and then the script will just sleep forever. Once I have my little ceph environment going I can start testing stuff against this cluster from outside the container. This is why I have the ceph config dir in a shared directory. I can now set the environment variable in my shell to export CEPH_CONF=$PWD/_results/ceph/ceph.conf and then run the tests that make use of the ceph cluster using the standard go tooling, such as go test -v ./rados. Now, I don't need to wait for ceph to start every time I want to execute my test(s).

One word of warning, if you do want to try this at home. Not all our tests are clean and properly idempotent. I certainly want that to be true, but there are times where I hit a condition that leaves state behind on the cluster which might interfere to later test runs. Caveat emptor, patches welcome, and all that. :-)

This is getting longer than I expected it would so I want to wrap up, but I will mention one more thing I've found handly lately. Sometimes if I can't figure out what's going on with something in the Ceph code itself, say with an enexpected test failure, I want to enable debug logging in the ceph code. For this, I usually combine the above techinque of starting Ceph in the container and then I'll edit the shared ceph.conf file. To the [global] section I'll add stuff like:

log to stderr = true
err to stderr = true
debug client = 20
debug mgrc = 20

I refer to the Logging and Debugging page in the ceph docs to help me pick what subsystems to enable debug for.

It's turned into a bit of a long-winded but general overview of a few techniques I use when working on the go-ceph project. I hope it was interesting.

Respond to How I work on go-ceph
Created: 2020-08-26 14:57 EDT
Modified: 2020-08-27 11:16 EDT
Tags: Programming, Technology, Work

Birch Street Computing -

hotspots

about me

How I work on go-ceph

Birch Street Computing -

hotspots

about me

promo

How I work on go-ceph