Friday, March 28, 2014

Benchmarking Disk Latency: Setup

One of the topics that comes up over and over at Datastax is how to choose the right disk systems to go with Cassandra. While Datastax provides guidance on this issue, a lot of customers still want to know why: why not use 3TB SATA drives? Why are SSDs so great? What is the difference between 15k NL-SAS and a consumer-grade SSD? To answer these questions, I'm running a bunch of benchmarks on the disks with a focus on latency.

One of the most important restrictions I've placed on the project is that all of the tests in the first series are going to be run on the same machine. It is a single-socket Xeon workstation I've had for a couple years. I moved most of my personal data off of this machine for this project because this system is the closest thing to a server that I can tolerate sharing a room with.



Component     Quantity  Brand/Model
Case          1         Cooler Master HAF XB EVO
Power Supply  1         COOLMAX CU series CU-700B 700W
Motherboard   1         Intel S1200BTL
CPU           1         Intel Xeon E31270 3.4GHz
Memory        4         Kingston KVR1333D3E9SK2
Root Drive    2         Seagate ST9500530NS Enterprise SATA
Graphics      1         XFX Radeon 6450 2GB
PCIe SSD      1         FusionIO ioDrive II
PCIe SAS      1         LSI Logic LSI00346 9300-4i SGL
SAS chassis   1         Thermaltake RC1400101A MAX-1542

Here's /proc/cpuinfo, lspci -v, and dmesg

That's the base spec. For SATA drives, I attached an eSATA port to one of the 6Gb/s ports and have a tray set up for quick swapping. So far, the hotplugging in Linux seems to work fine. In the picture above, it's sitting left of the red air can.

On my machines at home, I run Arch Linux. This machine is not an exception. It might make sense to use a more common server distro, but for now I'm sticking with Arch and will not be updating the system once testing begins.

I don't have all the drives yet.  The FusionIO card is a loaner courtesy of FusionIO. Most of the spinning drives are from my personal collection. Datastax paid for the Samsung SSD. I'll be buying a few more once I get through the drives I have so I can cover a couple more SAS options and a couple more SSDs. Here's the lineup so far:

Samsung 840 Pro 128GB (2.5" SATA SSD)
FusionIO ioDrive II (PCIe 8x SSD)
Western Digital WD2500KS (3.5" SATA 7,200RPM)
Seagate ST9500430SS (2.5" SAS, will also test RAID10 on these)
Western Digital Velociraptor WD3000BLFS (2.5" SATA 10,000RPM)
Western Digital WD5002AALX (3.5" SATA 7,200RPM, also RAID1)

For giggles, I'm going to add in a couple USB drives, a CF card, and some older 5400RPM laptop drives.

At this point everything is ready to go. I'll be tweaking the machine a little more over the weekend and start testing on Monday. I figure it'll take a couple days to dial in the benchmarks. If you have any benchmark or drive suggestions, let me know at @AlTobey or tobert@gmail.com.





Monday, March 24, 2014

Mirrored FAT32 EFI Boot Partitions

Most modern hardware ships with UEFI firmware and can be booted in EFI mode automatically when a disk with an EFI setup is found. When I recently put Arch Linux on my desktop machine, I went with gummiboot to try something new and I really like the results.

Now that I'm bringing my Xeon machine back online for some benchmarking, I want to use gummiboot there too. The twist is that while my desktop has a single SSD for root+data, this machine has a few more drives installed.


The root drives actually aren't visible in the photo. They're 2.5" SATA drives behind the side panel on the left side, directly behind the red and black SAS tray.

Since I'm installing a mirrored root on btrfs and using EFI, I want to have /boot mirrored to both drives so the system will still boot if one of them fails. The easy way would be to format both and rsync with a cron job. While that would catch 99% of updates, I figure since I'm using this machine for crazy disk stuff I might as well try mirroring the EFI filesystem.

Because of the way EFI works, FAT32 is pretty much the only decent choice for a filesystem on the EFI partition (code ef00). Since /boot only needs to hold initramfs, kernels, and the EFI configuration, I'll simply mount it on /boot as vfat.

This is how it's set up on my desktop. But now I want mirroring. I tried mdraid, but even with metadata 1.0 the FAT filesystem can't be mounted directly. No big deal. The Linux LVM is actually a frontend to a kernel disk abstraction called device mapper. It includes a mirror target, so all I had to do was spin up a quick mirrored LV then dump the device mapper table.

Here's the breakdown of what the device mapper table says in English:

0 8192 mirror core 1 1024 2 253:0 0 253:1 0 1 handle_errors

Present 8192 sectors starting at sector 0 as a mirror with an in-core replication log that takes 1 argument, a region size of 1024, across 2 devices, 253:0 and 253:1, both starting at offset 0, with one feature argument of 'handle_errors'.  The syntax is terse and the documentation is incomplete, so that's as far as I can tell. Device mapper can do a lot more than this, but this is all I need for now.
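
To make the field order concrete, here is a minimal Python sketch that splits a mirror table line into named fields. It assumes the usual mirror target layout: start and length in sectors, the log type, a log argument count followed by that many arguments, a device count followed by device/offset pairs, then an optional feature count and features. parse_mirror_table is a hypothetical helper name, not part of any device mapper tooling.

```python
def parse_mirror_table(line):
    """Split a device mapper 'mirror' table line into named fields."""
    tok = line.split()
    start, length, target = int(tok[0]), int(tok[1]), tok[2]
    if target != "mirror":
        raise ValueError("not a mirror target: %r" % target)

    log_type = tok[3]
    n_log_args = int(tok[4])
    pos = 5
    log_args = tok[pos:pos + n_log_args]
    pos += n_log_args

    # Device count, then one (device, offset) pair per mirror leg.
    n_devs = int(tok[pos])
    pos += 1
    devices = []
    for _ in range(n_devs):
        devices.append((tok[pos], int(tok[pos + 1])))
        pos += 2

    # Optional trailing feature count and feature names.
    features = []
    if pos < len(tok):
        n_features = int(tok[pos])
        pos += 1
        features = tok[pos:pos + n_features]

    return {
        "start": start,
        "length": length,
        "log_type": log_type,
        "log_args": log_args,
        "devices": devices,
        "features": features,
    }


if __name__ == "__main__":
    table = "0 8192 mirror core 1 1024 2 253:0 0 253:1 0 1 handle_errors"
    for key, value in parse_mirror_table(table).items():
        print(key, "=", value)
```

Read this way, the "1" after "core" is a count of log arguments rather than part of the size, which is why the 1024 stands alone as the region size.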

With the knowledge of what an LVM-created device mapper table looks like, writing a script that sets up the mirror is pretty easy. I'll throw this into a systemd unit file when I'm done with the setup.
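
The script itself isn't reproduced here. As a rough sketch of the idea (not the original script), the Python below builds the same kind of table line for a set of devices and the dmsetup command that would load it. The names mirror_table, dmsetup_create_argv, and "bootmirror" are made up for illustration, and the fixed region size of 1024 is just copied from the LVM-generated table above. It only prints the command; actually running it requires root and real devices.

```python
import shlex


def mirror_table(sectors, devices, region_size=1024):
    """Build a device mapper mirror table line for the given devices.

    sectors is the mapped length in 512-byte sectors; every device is
    used from offset 0, matching the LVM-generated table above.
    """
    legs = " ".join("%s 0" % dev for dev in devices)
    return "0 %d mirror core 1 %d %d %s 1 handle_errors" % (
        sectors, region_size, len(devices), legs)


def dmsetup_create_argv(name, table):
    """Return the argv for 'dmsetup create NAME --table TABLE'."""
    return ["dmsetup", "create", name, "--table", table]


if __name__ == "__main__":
    # Hypothetical mapping name and devices; substitute the real partitions.
    table = mirror_table(8192, ["253:0", "253:1"])
    argv = dmsetup_create_argv("bootmirror", table)
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the table construction separate from the dmsetup call makes it easy to eyeball the table against the LVM-generated one before anything touches the disks.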

With that script written the rest is mostly by the book (wiki), but I'll go ahead and test that the partitions are usable alone.

And with that, my workstation has redundant boot drives and can be set up to boot with gummiboot per the wiki instructions.

Edit: I may have spoken too soon. Will update again when I figure out why gummiboot won't run.

Friday, March 21, 2014

Running at Night

I've been running regularly for more than 6 months and see no reason why that will ever stop. With a 10k behind me, I'm training to do a half marathon. The twist in my regimen is that I usually run at night with my samoyed.


Running at night is a different experience from running under the day star. There are fewer people, there's less traffic, it's cooler, and it's more dangerous.

Tip 1: stay on the right-hand side of the road.

If you're going against traffic, the headlights will constantly ruin your night vision. Most cars are set up to shine the headlights a little to the right, which means you're getting far more lights shined in your eyes if you're on the left side of the road.

Tip 2: thin-soled shoes are dangerous.

Since I started running, I've become a huge fan of Vibram Five-fingers shoes. My first model was the Treksport. I step on a few liquidambar seed pods every time I go out. With the Treksports I feel them but it does not hurt much. The one time I wore my pair of KSOs, I hit a rock at full force and have not worn them since. When the strap on my Treksports broke, I bought a pair of Speeds and have found the protection satisfactory, but not quite as good as the Treksport.

Tip 3: try using a metronome instead of music.

I tried listening to upbeat music when I started out, but it kept distracting me. For me, running is more mental than physical, so this is rather important. I installed a metronome app on my phone and started at around 80BPM, taking two steps to a beat to reach 160 steps per minute. It's hard at first, but once you get into it, it becomes easy to line up your steps & breathing and really pack the miles on. It's also handy on race day to keep yourself from overdoing it early in the race.

Another advantage over music is that the tick of the metronome doesn't drown out the sounds around you. The additional sensory information can be crucial at night.

Tip 4: have a regular route and stick to it.

This is an unusual one for me. Any time I run during the day, I take off in a random direction and try to find new routes every time. At night, I pretty much always take the same 5 mile route. I know where all of the sidewalk bulges are. I know where people smoke in their driveways. It really helps lighten the mental load and keep me safe.


Wednesday, March 19, 2014

Primitive Tooling

I regularly see posts on Twitter about some new whiz-bang shell or vim plugin that automates some task, making it more pleasant. I almost never use any of them.

The reasons why can be traced back to my first job as a systems administrator, where I had to learn Solaris on the fly. I learned most of my Unix-fu on Linux, where I learned on bash. The shop I was working at used the Bourne shell that shipped with Solaris 2.6 and 2.7. This was a frustrating time because many things that were easy in bash didn't work at all in /bin/sh, so I started 'downgrading' to the Bourne shell way of doing things.

A couple years passed and I had gotten to the point where I pretty much used only basic /bin/sh and was mostly happy with that. Then along came HP-UX and its POSIX shell, which is just like Bourne shell only maddeningly different in a few places. Since the environment I was working in had both Linux and HP-UX in production, I usually wrote scripts to support both and it was tedious. I talked my team into switching to pdksh everywhere and things were mostly better. I got a stable environment across operating systems while the experienced HP-UX admins got an experience more like what they were used to on Linux. Win-win for the win.

Eventually, AT&T opened up the code to the original Korn shell and the pdksh project stagnated. As far as I know, it still compiles on Linux, but it did not work very well on HP-UX / Itanium without patches. I wrote the patches, but if you know anything about HP-UX shops, you know that they tend to be conservative so pdksh was out. And now I'm back to writing POSIX shell code.

That's just one part of the story, but I think it demonstrates why I still stick to primitives a decade later. With every move between versions of shells, Linux distributions, FreeBSD, OpenSolaris, etc. I've simply found it easier to stick with primitives and avoid any heavily modified environments. These days, I do default to bash and use bash syntax regularly. I don't work on any operating systems that don't ship bash. I also start my shell scripts with #!/bin/bash instead of #!/bin/sh so the dash thing never bothered me a bit.

My vim setup is similarly minimal. My ~/.vimrc is around 30 lines and has changed very little in the last decade. I added Pathogen a couple years ago, but I only really use it for filetype and syntax plugins. I also only use about 5% of vim's features, mostly things that work in vi. Just like with the shells, I can log into almost any Unix system around and get up and running with vi without installing anything.

Over the years I have applied the approach to more things. I've always used the default clients for social media. I don't use browser plugins (except for rikaikun). I only install a half-dozen or so applications on my Mac. My Windows machines have no OS-level plugins. My preferred desktop on Linux is Xfce with a basically default config, no compositing, all shininess disabled. My phone has a stock ROM and I use the default UI settings most of the time.

It's not for everybody, but it does have some advantages:

1.) it takes me about 5 minutes to get set up on a new workstation
2.) I rarely have to install anything extra on machines I admin
3.) the few rcfiles I use are easy to move around with `cat > ~/.rcfile` and ctrl-d
4.) the few aliases I use are noops when not present (e.g. ls --color)
5.) I don't have to remember or think about trivial details very often

Like every decision, it's a tradeoff. I'm trading off small efficiencies every day to avoid having to memorize lots of things. My editor isn't as powerful as yours, but I always have my editor even when I'm not root. My shell is boring, but whether I'm root, atobey, or any other account it works the same.

I also tend to apply similar thinking to code, which is probably one of the reasons I like Go so much. A small syntax is easy to remember and apply consistently. I also don't put as much emphasis on code reuse as I see others do. The backflips required to make code actually reusable are rarely worth the effort and increase the testing burden, so why bother? It's another tradeoff obviously and one I default to with increasing frequency as I grow in experience.


Monday, March 10, 2014

Open Source Video Production

I'm always looking for new ways to educate my audience and have recently started setting up my workstation for recording video to post on Youtube. While I could have gone down the easy road and used my Mac laptop for all of this, I wanted to see what it would take to make it all work on my Linux workstation.

There are lots of application choices for capturing and recording video on Linux. Unfortunately, most of the user interfaces I tried were buggy or totally inflexible. Being a long-time Unix user, I started looking around for more powerful tools on the command line. I was not disappointed.

There's VLC and cvlc for command-line work. In fact, VLC is probably the easiest choice for any kind of media and generally works out of the box. I made a couple of test recordings with VLC and it works fine. Since figuring out what to do with VLC took less than five minutes, I figured I'd look for something more challenging and, hopefully, a little more flexible.

Gstreamer is a framework and set of libraries for working with media on lots of platforms, but it really shines on Linux. It's meant to be used as a library, but it also comes with a useful pair of CLI tools in the form of gst-inspect and gst-launch. Working with these is really confusing at first, since the intent behind the syntax of the available examples is hard to follow without in-depth knowledge of how gstreamer works. Initial experiments showed that I could do what I want and more. For example, here's how to record your webcam to a file:
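
The original command isn't reproduced here. As a stand-in, this minimal Python sketch assembles a gst-launch-1.0 command line for the same kind of job; it does not use the gstreamer Python API at all. It assumes gstreamer 1.x with the v4l2src, videoconvert, x264enc, matroskamux, and filesink elements installed, and the device path and output filename are placeholders. webcam_record_cmd is a made-up helper name.

```python
import shlex


def webcam_record_cmd(device="/dev/video0", outfile="webcam.mkv"):
    """Build an argv list that records a V4L2 webcam to a Matroska file.

    Elements are chained with "!" the same way they would be typed on
    the gst-launch command line.
    """
    pipeline = [
        "v4l2src", "device=%s" % device, "!",   # capture from the webcam
        "videoconvert", "!",                    # convert to a format the encoder accepts
        "x264enc", "!",                         # encode to H.264
        "matroskamux", "!",                     # wrap in a Matroska container
        "filesink", "location=%s" % outfile,    # write to disk
    ]
    # -e sends end-of-stream on Ctrl-C so the muxer can finish the file.
    return ["gst-launch-1.0", "-e"] + pipeline


if __name__ == "__main__":
    print(" ".join(shlex.quote(arg) for arg in webcam_record_cmd()))
```

Printing the command rather than running it makes it easy to paste into a shell and tweak one element at a time, which is most of what learning gst-launch involves.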


I had this idea of recording an overhead camera simultaneously with a forward camera, so I decided to invest some time into learning gstreamer well enough to use it. I was having a hard time figuring out how to do complex flows in gst-launch because it requires an understanding of how the gstreamer API works. I decided to try the Python API to see if it was any easier. It's not Python's fault but that way lies madness. There are currently two major versions of the gstreamer API available, 0.10 and 1.0. This creates problems for gst-launch and python alike. The vast majority of examples are for 0.10. Packages are available for both. I couldn't find a usable combination so I gave up and Googled "golang gstreamer".

Even though the Go bindings for gstreamer are minimal, I did manage to get them working with gstreamer 1.2 and got an example program going.



My modified bindings for Go/gstreamer are available on Github at https://github.com/tobert/gst.

While I love programming in Go, this was another dead end. The bindings don't have all of the API endpoints I need and, more importantly, 'go build' with cgo involved was really slowing me down, so I decided to try gst-launch again using what I'd learned.



That works, but what I really want is to do a whole lot more. How about capturing both webcams, audio, and a terminal window all in one go? Yup, it can do that.


And finally, here's the result. I'll switch to a larger font in the terminal in the future, but you get the idea.