22 Jun 2015, 13:02

Postgres search in preview

It’s been stewing for way too long, moving in fits and starts over the last several months as I’ve been busy with work stuff, real life stuff, and switching jobs stuff, but I’m pleased to announce that a new search backend is available for goiardi. Instead of the ersatz Solr trie-based search that goiardi’s had for a while, you can now use a Postgres-based search backend for Chef searches.

Since this is a pretty big thing, I’m giving it some time to stew in testing before merging it into master and making a release. Right now, it’s in the goiardi 0.10.0-dev branch.

Motivation

Goiardi’s built-in search is fine for smaller workloads, but once a few hundred or so nodes are in place it starts chugging: searches get very slow, and memory usage climbs too high. Solr, on the other hand, while a fine product, never seemed like quite the right tool for the job here. Doing the search in Postgres seemed like the way forward.

Testing it out

The postgres search is not yet in master (obviously), so you’ll need to install goiardi from source. Follow the installation notes in the goiardi documentation. After you run go get, cd into the goiardi directory (something like cd $HOME/go/src/github.com/ctdk/goiardi), check out the 0.10.0-dev branch with git checkout 0.10.0-dev, and then run go install github.com/ctdk/goiardi.
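Collected into one place, the build steps look roughly like this (adjust the paths for your own GOPATH layout):

    go get github.com/ctdk/goiardi
    cd $HOME/go/src/github.com/ctdk/goiardi
    git checkout 0.10.0-dev
    go install github.com/ctdk/goiardi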

Once goiardi is installed, you’ll need to create the configuration file; use the sample file in etc/ in the goiardi repository as a starting point. The important settings are the postgres options (obviously) and the “Postgres and advanced search” options in that file. You’ll need to set local-filestore-dir, and comment out or remove the data-file and index-file options.
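The relevant part of the config might end up looking something like the excerpt below. This is a sketch from memory of the sample file, so treat the option names and values as assumptions and double-check them against etc/goiardi.conf-sample:

    use-postgresql = true
    pg-search = true
    local-filestore-dir = "/var/lib/goiardi/lfs"
    # data-file and index-file are commented out; the postgres search
    # doesn't use them
    # data-file = "/var/lib/goiardi/goiardi-data.bin"
    # index-file = "/var/lib/goiardi/goiardi-index.bin"

    [postgresql-options]
        username = "goiardi"
        password = "s3kr1t"
        host = "localhost"
        port = "5432"
        dbname = "goiardi"
        sslmode = "disable"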

To set up the database properly, install sqitch and check out the pg-search branch of goiardi-schema (or use the not-yet-finalized sqitch postgres bundle inside the goiardi source), as described in goiardi’s postgres documentation.
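Assuming the schema repo lives at github.com/ctdk/goiardi-schema and the goiardi database already exists, deploying the schema might go something like this (the sqitch target URI is an assumption; swap in your own credentials):

    # install sqitch from CPAN if your package manager doesn't have it
    cpan App::Sqitch
    git clone https://github.com/ctdk/goiardi-schema.git
    cd goiardi-schema
    git checkout pg-search
    # deploy the schema to the (already created) goiardi database
    sqitch deploy db:pg://goiardi:s3kr1t@localhost/goiardi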

Once it’s up and running and you can connect to the goiardi instance, it’s time to start playing around. You could slowly create a bunch of nodes, roles, and what have you, but an even easier way to go about it is to use this node builder ruby script. It requires the chef-api (version 0.5.0) and fauxhai gems.
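Installing the script’s dependencies is just a matter of:

    gem install chef-api -v 0.5.0
    gem install fauxhai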

Take that script, fill in your hostname and path to your admin client’s key (making sure you can read it, of course), customize how many nodes you want to create on line 31, and let ‘er rip. It will run for a while, but once it’s done you can start throwing searches at it to see how it performs.
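For example, queries along these lines are good for kicking the tires (the exact fields and values worth searching on will depend on what fauxhai generated for your nodes, so treat these as plausible guesses):

    knife search node 'platform:debian' -i
    knife search node 'platform:ubuntu AND memory_total:*' -i
    knife search node 'name:*' -i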

It’s also totally possible to just use it normally, as you’d use any Chef server, of course.

Caveats

  • Obviously this requires Postgres. The in-mem/file and MySQL data stores have to use the old search.
  • This should be able to handle all normal use cases for Chef search, and almost all of the weird ones, but it’s still possible to contrive situations where you get odd results back. The big ones are that fuzzy and distance searches will not behave the way you may have hoped, and that because ltree indices in Postgres only accept alphanumeric characters (plus ‘_’), with ‘.’ as a path separator, goiardi has to convert attribute and search-field names containing forbidden characters to an acceptable alternative. This shouldn’t normally be a problem, but if you had attributes named both “/dev/xvda1” and “dev_xvda1” you might not get the search results you expect (there’s a sketch of this after the list). My advice at the moment is “don’t do that”.
  • The goiardi postgres search has been tested with up to 10,000 nodes generated with fauxhai without problems, but it’s very new, and it’s quite likely there are situations where the traditional Chef search with Solr is the better choice. Right now you need to use erchef for Solr search, but now that it’s possible to add arbitrary search backends to goiardi, real Solr support may come someday.
  • Ideally, though, the pg-search should be able to handle any search query you toss at it (excepting the issues above with key names and with fuzzy and distance searches). If the Solr query parser chokes on a totally legitimate query, please report it as a bug.
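To make the key-conversion caveat above concrete, here’s a hypothetical illustration (the exact converted names depend on goiardi’s conversion rules, so this is only a sketch):

    # suppose one node has an attribute named "/dev/xvda1" and another has
    # one named "dev_xvda1"; after ltree-safe conversion both can end up
    # under the same label, so a search like this may match both nodes:
    knife search node 'dev_xvda1:*'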

Future

This search needn’t be limited to goiardi, either. Per the proof of concept seen in the standalone goiardi universe server, the goiardi postgres search could also be carved out and run as a standalone service. Unfortunately that won’t really work until it gets integrated back into the 1.0.0 branch and development there restarts, but it’s an exciting concept. For a multi-organization implementation, each organization will probably get its own search schema, rather than keeping them all together. It may or may not ever be a good fit for, say, Hosted Chef, but it might work well in a self-hosted installation.

Shoutouts

Major props are due to both @oker1, for providing the initial impetus to finally get this done and for contributing code to split the indexer and search apart from the backends, and @coderanger, for giving me ideas and pointing me towards using the ltree index instead of jsonb for searching.

05 Sep 2014, 19:58

Another Shovey Preview: The Shovey-Jobs Cookbook

Shovey still isn’t finished, but it’s come a long way. To make it easier to play with and to find potential issues, I’ve released a shovey-jobs cookbook to install and configure shovey on a node.

It’s a preliminary cookbook, but since shovey itself is still pretty preliminary I’m OK with that. To use the shovey-jobs cookbook, first set up goiardi and the knife-shove plugin as explained in this previous post on shovey. Shovey still only works with goiardi running in in-memory mode; the SQL support for shovey hasn’t been finished yet. After goiardi and serf on the goiardi server are set up, spin up a node and run the shovey-jobs cookbook on it, following the instructions in the shovey-jobs README. The cookbook has only been tested on Debian, but it may work with Ubuntu as well. At the moment it’s unlikely to work with RHEL and its derivatives or other operating systems, mostly because the init script is Debian-specific.
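Getting the node set up might look roughly like this; the node name is made up, and this assumes the node is already bootstrapped against your goiardi server:

    knife cookbook upload serf golang shovey-jobs
    knife node run_list add shovey-test.local \
      'recipe[serf],recipe[golang],recipe[shovey-jobs]'
    # then run chef-client on the node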

This cookbook depends on having the serf and golang cookbooks uploaded and in the node’s run list. Serf may also need some extra configuration; I had to add this to the test node’s attributes to get it working right.

"serf" => {
    "agent" => { 
      "node_name" => "goiardi-test.local",
      "start_join" => [ "10.250.55.108" ],
      "advertise" => "10.34.10.15"
    }
  }

The main serf points are that the serf node’s name needs to be the same as the node’s chef client name, that it needs to join the same serf cluster goiardi’s serf agent is running on, and that it needs to advertise the correct address. The last one came up in testing for me because goiardi was running on my dev box while the shovey node’s serf was advertising the internal vagrant address rather than the shared network address. That should be enough to get you going running commands with shovey; see the knife-shove docs for the possible commands there.
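As a taste, running a job and checking on it looks something like the lines below; the subcommand names here are from my memory of the knife-shove docs, so verify them there before relying on them:

    # run a command on a node via shovey, then check up on the job
    knife job start 'uptime' goiardi-test.local
    knife job list
    knife job status <job-id>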

On another note, there’s now some documentation for the shovey API. The new thing there is that the shovey client now streams job output back to the server, and the server can in turn stream it to a client. Unfortunately the knife-shove plugin does not yet let you watch the job output stream by, but that’s on the list of things to do before the formal release.
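If you want to poke at the streaming side directly in the meantime, something along these lines might work; the endpoint path and placeholders are assumptions on my part rather than confirmed details, so check them against the shovey API docs, and note that a real request would also need Chef-style signed auth headers unless you’re running goiardi with auth disabled for testing:

    # hypothetical: verify the endpoint against the shovey API docs first
    curl 'http://goiardi.local:4545/shovey/stream/<job-id>/<node-name>'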