22 Jun 2015, 13:02

Postgres search in preview

It’s been stewing for way too long, moving in fits and starts over the last several months as I’ve been busy with work stuff, real life stuff, and switching-jobs stuff. But I’m pleased to announce that a new search backend is available for goiardi: instead of just the ersatz Solr trie-based search that goiardi’s had for a while, you can now use a Postgres-based search backend for Chef searches.

Since this is a pretty big change, I’m giving it some time in testing before merging it into master and making a release. Right now, it lives in the goiardi 0.10.0-dev branch.

Motivation

Goiardi’s search is fine for smaller workloads, but once a few hundred or so nodes are in place it starts chugging: searches get very slow and memory usage climbs too high. Solr, on the other hand, while a fine product, never seemed like quite the right tool for the job here. Moving search into Postgres looked like the way forward.

Testing it out

The postgres search is not yet in master (obviously), so you’ll need to install goiardi from source. Follow the installation notes in the goiardi documentation. After you run go get, cd into the goiardi directory (something like cd $HOME/go/src/github.com/ctdk/goiardi), check out the 0.10.0-dev branch with git checkout 0.10.0-dev, and then run go install github.com/ctdk/goiardi.
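
For reference, the whole dance looks something like this (assuming the $HOME/go GOPATH from above; adjust paths to taste):

    go get github.com/ctdk/goiardi
    cd $HOME/go/src/github.com/ctdk/goiardi
    git checkout 0.10.0-dev
    go install github.com/ctdk/goiardi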

Once goiardi is installed, you’ll need to create the configuration file. Use the sample file in etc/ in the goiardi repository. The important settings are the postgres options (obviously) and the “Postgres and advanced search” options. You’ll need to set local-filestore-dir, and comment out or remove the data-file and index-file options.
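
As a rough sketch, the relevant chunk of the config file ends up looking something like this. Aside from local-filestore-dir, data-file, and index-file, which are mentioned above, the option names here are assumptions drawn from the sample config, so check the file in etc/ for the real ones:

    use-postgresql = true
    pg-search-enabled = true          # assumed name for the new search toggle
    local-filestore-dir = "/var/lib/goiardi/lfs"

    # comment out or remove the in-memory persistence options
    # data-file = "/var/lib/goiardi/goiardi-data.bin"
    # index-file = "/var/lib/goiardi/goiardi-index.bin"

    [postgresql]
        username = "goiardi"
        password = "s3kr1t"
        host = "localhost"
        port = "5432"
        dbname = "goiardi"
        ssl-mode = "disable"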

To set up the database properly, install sqitch and check out the pg-search branch of goiardi-schema (or use the not-yet-finalized sqitch postgres bundle inside the goiardi source) as described in goiardi’s postgres documentation.
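
In shell terms, that’s roughly the following (the repository URL and the connection string are placeholders; substitute your own database credentials):

    git clone https://github.com/ctdk/goiardi-schema.git
    cd goiardi-schema
    git checkout pg-search
    sqitch deploy db:pg://goiardi:s3kr1t@localhost/goiardi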

Once it’s up and running and you can connect to the goiardi instance, it’s time to start playing around. You could slowly create a bunch of nodes, roles, and what have you, but an even easier way to go about it is to use this node builder Ruby script. It requires the chef-api (version 0.5.0) and fauxhai gems.
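
Installing the gems is the usual affair:

    gem install chef-api -v 0.5.0
    gem install fauxhai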

Take that script, fill in your hostname and path to your admin client’s key (making sure you can read it, of course), customize how many nodes you want to create on line 31, and let ‘er rip. It will run for a while, but once it’s done you can start throwing searches at it to see how it performs.
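
Any ordinary Chef search works for kicking the tires; for example, with knife pointed at your goiardi instance (the attribute, role, and environment names here are just illustrative):

    knife search node "platform:ubuntu"
    knife search node "name:node*"
    knife search node "role:webserver AND chef_environment:production"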

It’s also totally possible to just use it normally as you’d use a Chef server, of course.

Caveats

  • Obviously this requires Postgres. The in-mem/file and MySQL data stores have to use the old search.
  • This should be able to handle all normal use cases for Chef search, and almost all of the weird ones, though it’s still possible to contrive situations where you get odd results back. The big ones: fuzzy and distance searches will not behave the way you might hope, and because ltree indices in Postgres cannot accept arbitrary characters, only alphanumerics (plus ‘_’) with ‘.’ as a path separator, goiardi has to convert attributes and search fields containing forbidden characters to an acceptable alternative (see the illustration after this list). This shouldn’t usually be a problem, but if you were to have attributes named both “/dev/xvda1” and “dev_xvda1” you might not get the search results you expect. My advice at this moment is “don’t do that”.
  • The goiardi postgres search has been tested with up to 10,000 nodes generated with fauxhai without problems, but it’s very new, and there are likely situations where the traditional Chef search with Solr is the better choice. Right now you need to use erchef for Solr search, but now that it’s possible to add arbitrary search backends to goiardi, real Solr support may come someday.
  • Ideally, though, the pg-search should be able to handle any search query you toss at it (excepting the issues above with key names and with fuzzy and distance searches). If the Solr query parser chokes on a totally legitimate query, please report it as a bug.
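
To make that key-conversion caveat concrete, here’s a hypothetical sketch of how the mapping can collide (the exact rewriting is internal to goiardi; the constraint itself comes from ltree):

    # ltree labels may only contain alphanumerics and '_', with '.' as the
    # path separator, so a key like "/dev/xvda1" has to be rewritten:
    filesystem./dev/xvda1.size_kb  =>  filesystem.dev_xvda1.size_kb
    # ...which then looks the same as a literal "dev_xvda1" attribute:
    filesystem.dev_xvda1.size_kb   =>  filesystem.dev_xvda1.size_kb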

Future

This search needn’t be limited to goiardi. Per the proof of concept seen in the standalone goiardi universe server, the goiardi postgres search could also be carved out and run as a standalone service. That won’t really work until it’s integrated back into the 1.0.0 branch and development there restarts, but it’s an exciting concept. For a multi-organization implementation, each organization will probably get its own search schema, rather than keeping them all together. It may or may not ever be a good fit for, say, Hosted Chef, but it could work well in a self-hosted installation.

Shoutouts

Major props are due to both @oker1, for providing the initial impetus to finally get this done and for the code to split the indexer and search apart from the backends, and to @coderanger, for giving me ideas and pointing me toward using the ltree index instead of jsonb for searching.
