Odd Map-Reduce failure with MongoDB and Mongoid

by Martin Westin in


I just want to document this for future reference. Google was very unhelpful so this is likely a rare error condition.


After I made an innocent change to the query portion of a Mongoid map-reduce job, it started throwing this exception:

failed with error "ns doesn't exist"

Internet wisdom suggested a missing collection. One hit suggested a missing field as the cause. But my collection most decidedly was there. Other map-reduce jobs ran over it just fine. Other map-reduce jobs even ran fine using the exact same query "constraint".

The Rails log showed all being well. Even the exception (which displays the map-reduce) showed the expected query. Nothing indicated what was wrong.

Not to go too deep into the troubleshooting: the cause turned out to be a laziness and scope issue. The value for the query was out of scope by the time the lazy Mongoid processing got to it.

This would throw the exception.

Modelname.mongo_session.command(
  mapreduce: Modelname.collection.name,
  map: map_function,
  reduce: reduce_function,
  query: {"value.deep_value" => current_user.associated_thing.some_value},
  out: {inline: 1}
)

While this would not. 

deep_value = current_user.associated_thing.some_value
Modelname.mongo_session.command(
  mapreduce: Modelname.collection.name,
  map: map_function,
  reduce: reduce_function,
  query: {"value.deep_value" => deep_value},
  out: {inline: 1}
)

My guess is that it failed because the results were just put into an instance variable and not accessed until we were out in the view. Weird, yes. Reading Mongoid's and Moped's source might confirm this or reveal some other reason. From a practical standpoint, though, assigning a local variable worked, and I have better things to do right now :)


Rails link_to with urls

by Martin Westin in


Being something I seldom do, this little gotcha caught me today.

So, you want to put a link into an email template, a rendered PDF or something similar that requires the full url of your route. Being an email or a PDF, it is also likely that the contents will be printed, so just rendering out the entire url in the template can be a good thing. However...

# the basic
link_to something_url
# is not the same as
link_to something_url, something_url

The first one looks good. You get the full url, including https:// and all. The gotcha is that Rails, being the opinionated and free-thinking framework it is, will strip out the url part and put just the path into the href. To make the visible link match the href you actually need to tell Rails explicitly that that's what you want.
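
To make the difference concrete, here is a small sketch of what the two calls produce, assuming a hypothetical something_url helper that returns https://example.com/something (the hrefs reflect the behaviour described above):

# Sketch, assuming a hypothetical something_url helper that
# returns "https://example.com/something".

link_to something_url
# visible text: https://example.com/something
# href ends up as just the path: /something

link_to something_url, something_url
# visible text: https://example.com/something
# href keeps the full url: https://example.com/something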


Writing Specifications

by Martin Westin in


I have been looking for a good workflow for writing structured text. Technical specifications and other texts that will eventually become a PDF, Word document or similar. This post will be an overview of what I have that works pretty well. I will use Omni Outliner, MultiMarkdown, Pandoc and Rake in glorious harmony to produce the final document.

What I want is...

  • Good control over the structure
  • Flexible options for the writing environment
  • Automatically generated and updated table of contents
  • Separation of content and styling
  • Version control using Git
  • A painless "build process"

Let's see how I ended up working.

Omni Outliner

I have been outlining, and writing, in Omni Outliner for over 5 years and love it as a general-purpose list-maker and organizer. In this context, each item is a heading and the notes for that heading are the body text. This works really well but is limiting on larger documents. I installed Fletcher Penney's MultiMarkdown exporter. This spits out an MD file more or less exactly as I want it. Any Markdown syntax I add in Omni Outliner will of course be treated as Markdown. Bold, italic, lists... all can be marked up in Omni Outliner.

Sadly, the iOS version of Omni Outliner is more or less useless to me because of its limited file sharing support. I keep everything in Dropbox, and without support for that the workflow becomes very problematic. I hope this arrives in a new version soon. (Yes, I have used DropDAV. No good for me.)

Markdown Editors

I am still evaluating the iOS options. My favorite Markdown app for the Mac, though, is ByWord. It just feels like I am typing fast and am being productive when I am in ByWord.

Another essential (as will become evident soon) app is MultiMarkdown Composer by the aforementioned Fletcher. This app has the killer feature of being able to understand MultiMarkdown and to faithfully export that to many formats. It also has a very nice structure view in the sidebar that makes an OO-refugee feel more at home.

You can also do the trick of exporting your Markdown back to OO by exporting OPML from MMC and opening that again in OO... great stuff. Let's say that again: using OPML as the file format allows me to instantly jump back and forth between MultiMarkdown Composer (for writing long form) and Omni Outliner (for structuring the document). Shuffling sections (chapters) around has never been easier.

The Multi in Markdown

MultiMarkdown is best explained as Markdown with a few things added to better support what I have found I need in technical documents. One main feature is support for internal references, allowing me to reference a section elsewhere in a large specification. E.g. "This uses Feature X which is described in [Common UI Features][]". Those brackets generate an internal link to that section, and the extra brackets even allow me to be more casual in my naming of the reference and put the actual heading name inside the second pair of brackets.

Getting It Into Word, or not

I have found a good workflow where I can completely avoid using Microsoft Word. It has been a big win for me to be able to instead use a toolchain more familiar to a software developer.

If you really want to end up in Word

MultiMarkdown Composer does not handle internal references when exporting to a Word document. You have to do a round-trip through LibreOffice first if you want to keep them. Yes, LibreOffice. OpenOffice.org couldn't open the file MMC exported. Neither could Pages or Word itself. NeoOffice costs money these days and I had no desire to pay just to try this one thing.

You open the Flat OpenDocument file that MMC exports in LibreOffice and immediately save it again as a Word document, and it is ready for Word. The version I downloaded crashed every time when saving a modern Word docx document. Saving as the older doc format (for Word 2003) worked. Good Times™

Pandoc Pipeline With Rake

If you are not a Ruby developer you are probably wondering how I ended up writing about gardening. Rake is actually a Ruby framework for running tasks or macros. In this context it allows you to write pretty advanced automation for converting your Markdown texts into formatted PDF documents with a table of contents, nice design and all sorts of good stuff.

Pandoc is a document format converter. It can take Markdown, for example, and turn it into HTML, PDF and even a Word document. Notably, with a little help it can give you a pretty solid e-book build process, processing the same source into EPUB and MOBI formats for iPads and Kindles respectively.

My Rake task locates all Markdown documents in a certain folder and processes them all with Pandoc, finally outputting nice-looking PDFs in a separate folder.
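
As a rough illustration, a minimal version of such a task could look like the sketch below. The source/ and build/ folder names and the Pandoc flags are assumptions for the example, not my exact pipeline:

# Rakefile: minimal sketch, assuming pandoc (and a LaTeX engine for PDF
# output) is installed and on the PATH. The source/ and build/ folder
# names are hypothetical.
desc "Convert every Markdown file in source/ into a PDF in build/"
task :pdf do
  mkdir_p "build"
  Dir.glob("source/*.md").each do |md|
    pdf = File.join("build", File.basename(md, ".md") + ".pdf")
    sh "pandoc", md, "--toc", "-o", pdf
  end
end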

Standing On The Shoulders of Giants

I have based my Rake and Pandoc pipeline heavily on the one used by Thoughtbot for their e-books. Go to their products page and buy any one of their books and you will get access to the "source code": their entire Markdown source text and the Rake tools they use to build the books. Awesome.

LaTeX, The Not So Fun Part

I leaned heavily on the LaTeX documentation and examples to hack my way towards a document that looks the way I want. LaTeX is a horrible, nasty markup language for electronic book layouts. It is very powerful but oh so strange and hard to read. If you grit your teeth enough you will be able to style your documents to look pretty much the way you want them to. It is just not a lot of fun, but once done, that template can be applied to any future document you write.

Version Control

Markdown is just text. Rake is just source code. LaTeX is also code/text. All of this is very well suited to version control. Using Git I get all the conflict resolution help you can ask for and a solid system for tracking changes over time.

Quick Note On Pages

Pages does have a structure mode that looks pretty good. I haven't played with it extensively, but if you don't want the massive conversion mess described above, then Pages may support enough structure manipulation for you to be happy there... if the iOS integration is to your liking, that is. Which is my other reason for going with Markdown.

Disclaimer

Some readers may shake their heads and say: I can do all that, and more, in Microsoft Word. Well, good for you. I couldn't figure out that program if my life depended on it. If you know it well and can operate on the structure of your document, move things around, automatically re-assign formatting based on the relative position of something in the overall hierarchy... I'm happy for you and would actually love to know how you work. Personally, I cannot even get Word to paste text properly without going through the menu using 4-5 clicks.


Shelling out from a ruby app to a ruby app without bundler conflicts

by Martin Westin in


My case is that I have a Rails app. It uses Bundler to manage its Gems. I also need to run some processing using an old version of one of my own libraries which also has dependencies and gems that need resolving. I chose to make this a small command line app with an executable ruby file. This little tool uses Bundler to manage its Gems.

I will shell out to this executable from my Rails app. Simple, right? I thought it would be, but there was a gotcha. The Rails app executes the external Ruby file within the same "bundle" as the Rails app, i.e. I got the current versions of my lib and all gems.

After much Googling I combined one Stack Overflow answer with a note on some blog (lost both references).

some_result_value = ""
# Run outside the Rails app's own bundle so the tool can resolve its own Gemfile
Bundler.with_clean_env do
  # chdir with a block restores the original working directory afterwards
  Dir.chdir "/path/to/rubytool" do
    some_result_value = `./bin/rubytool param1 param2`
  end
end

The key details here are the clean environment AND that I change directory before executing. I don't fully understand why changing the folder would be significant; most likely, with the environment cleaned, Bundler locates the tool's Gemfile starting from the current working directory. It may even have been bad late-night mojo.

Anyway, as long as the command line tool being called uses Bundler correctly, it works as intended. It is run with its own bundle of gems.

I would like to find a way to control this from the receiving ruby script but I have not looked into that yet.


Looping through large and slow datasets in Ruby

by Martin Westin in


I recently had the pleasure of needing to load 30'000 records from MongoDB and then perform slow and memory-intensive processing on them. Basically, you can imagine it as a database of videos: MongoDB was holding the metadata and other bits, but the actual video files were on disk somewhere. My parsing involved loading the entire video into memory and doing "stuff" with it as part of my model object. This is how I did it.

Take 1

At first I just did the normal Model.all.each... This worked fine for smaller datasets, but on larger sets the whole thing would crash after 40 to 60 minutes (I never timed this in detail). MongoDB had timed out, and I figured out that my ODM (Mongoid) was keeping an open cursor in MongoDB and fetching one document at a time from the DB... and after an hour or so the DB had had enough.

Take 2

It was of course trivial to force Mongoid to load the whole dataset in one go using Model.all.to_a.each... Before thinking further I set this version going. It crashed a lot faster than the first one. The reason is that each of my objects stays in the array, and in memory, and adding anywhere from 5 to 500 MB of video data to each quickly ate all the RAM I had.

Take 3

A small and funky change fixed this, making my script both time- and RAM-"proof". This is how I will start out next time I have a long-running task.

all = Model.all.to_a # these are just simple Rails models
while one = all.pop # this is memory management
    one.do_heavy_processing # this loads in a ton of crap
end

Because they are popped off one by one and the local variable is re-used, Ruby's GC takes pretty good care of keeping the memory to a minimum.


Transitioning to more secure passwords

by Martin Westin in


With all the news of hacked databases (mostly at Sony) and the clear-text or poorly hashed passwords in their datasets, I thought I might offer my standard trick for transitioning to a more secure form of hashing. I think some sites don't change their password security for fear of annoying users, or because of the workload involved in managing a transition. This simple technique is completely invisible to the user and very low maintenance for the developer.

I will be giving examples from the Devise library for Rails apps, since I recently implemented it there.

The technique is very very simple

You configure your authentication to check passwords against both the old and the new form of hashed password, and when you find a match for the old hash you update your database with a version of the password encoded using the new hash. You keep this dual check in place until all (or, more likely, most) of your users have logged in and had their passwords upgraded. The unlucky few can use your password recovery feature, if you have one.

Metacode of the basic principle:

if new_hash(password) == stored_password
  // ALLOW LOGIN USING AN UP-TO-DATE PASS
else
  if old_hash(password) == stored_password
    // UPDATE PASSWORD IN DB
    // ALLOW LOGIN
  else
    // DISALLOW LOGIN
  end
end

How to implement this transition in Devise

I implemented this by overriding the valid_password? method that Devise injects into your User model.

class User

  def valid_password?(incoming_password)
    result = super incoming_password
    if !result
      # try old encryptor during transition
      digest = Devise::Encryptors::LegacyEncryptor.digest(incoming_password, self.class.stretches, self.password_salt, self.class.pepper)
      result = Devise.secure_compare(digest, self.encrypted_password)
      if result
        # update password to use new encryptor when there is a match
        self.password = incoming_password
        self.save
      end
    end
    result
  end

end

Fairly simple. You may need to hard-code some parameters (salt, stretching, pepper) if they cause problems.

If you are changing from, say, SHA-1 to SHA-256, you can easily check the character lengths of the hashed passwords in your database to track the "adoption rate" of the new hashes.
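
As a sketch of what that check can look like, assuming an ActiveRecord-backed User model with an encrypted_password column storing hex digests (SHA-1 digests are 40 hex characters, SHA-256 digests are 64):

# Sketch, assuming a User model with an encrypted_password column of hex digests.
legacy   = User.where("LENGTH(encrypted_password) = 40").count  # old SHA-1 hashes
upgraded = User.where("LENGTH(encrypted_password) = 64").count  # new SHA-256 hashes
puts "#{upgraded} of #{legacy + upgraded} users are on the new hash"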

Implications on Security

You should realize that you ARE lowering your security level slightly by effectively allowing two different password checks. In reality this problem is small and only really matters if you are transitioning from plain-text passwords (and you really shouldn't have any). The problem then becomes real, since I could log in by supplying a stolen, new (supposedly secure) hash as the password. In this case I would definitely disallow any password of the same length as, or matching a simple regex for, your new hashing system, to close this hole.
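
A simple guard along those lines could look like the sketch below. It assumes the new scheme stores 64-character hex digests (e.g. SHA-256); the helper name is just an example:

# Sketch: reject candidate passwords that themselves look like one of our
# stored hashes, assuming the new scheme stores 64-character hex digests.
HASH_LIKE_PASSWORD = /\A[0-9a-f]{64}\z/i

def acceptable_password?(candidate)
  candidate !~ HASH_LIKE_PASSWORD
end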

You will also not fully benefit from the new hashing system until you remove the "dual check" after a reasonable period of time.

If you can live with that to gain the benefits of a clean migration for you and your users, this is a nice technique. I know from reading and talking to developers that I am far from the only one, or the first, to come up with something like this. Many apps and sites have used, and continue to use, this kind of technique to beef up password hash strength without bothering users.


Graylog2 on Mac OS X

by Martin Westin in


I have been playing with Graylog2 on my Mac today. Since the setup guides are all for Debian and not fully compatible with Mac OS X, I thought I'd mention the changes I needed to make to get things rolling smoothly. The guides are good, so go read them in the wikis on GitHub. I won't re-iterate them, only point out the minor changes and tweaks I had to make.

Graylog2 comes in two main parts. The server and the web interface. I'll start with the server component.

Install The Server

https://github.com/Graylog2/graylog2-server/wiki/Installing

Mac OS X has java bundled with the OS (for now). There is no need to install anything. The configuration file needs one non-obvious tweak.


mongodb_host = 127.0.0.1 # localhost

Java resolves localhost to the strangest thing. It tries to connect to the Bonjour name and external IP (e.g. Martin's Mac/192.168.0.2) instead of 127.0.0.1, which is what you want. Instead of opening MongoDB up to external access, I changed the configuration to point to the loopback IP directly.

Starting The Server

https://github.com/Graylog2/graylog2-server/wiki/Starting-the-server

I didn't get the daemon script to start and did not investigate it, since I run Graylog2 for evaluation and development and like seeing the output. Starting by running the jar file requires sudo.


sudo java -jar graylog2-server.jar debug

That gets Graylog2 running and spitting out a lot of fun info so you know you are logging things as you expect.

Installing The Web Interface

https://github.com/Graylog2/graylog2-web-interface/wiki/Installing-the-web-interface-on-Debian-5.0

You can follow most of those steps if you don't already have Rails, Bundler and that stuff installed. For testing and development, I would suggest running the interface using Passenger Standalone instead of Apache. And you install Passenger as a gem, not via apt, of course.

http://www.modrails.com/documentation/Users%20guide%20Standalone.html

The cool thing about installing Passenger Standalone is that it will compile and run itself the first time you call passenger start. That first time will take a few minutes, but after that it will start instantly.

Logging from your Rails app

https://github.com/Graylog2/graylog2_exceptions

In the Rails app I want to log from, I installed Graylog2 Exceptions. It is a small Rack middleware with practically no configuration. The only problem is that it has not been updated to comply with the current version of the Graylog2 server. Until it is updated, you have to modify its source. A very small mod. For me this is ok as long as I am still on my Mac and not on a server.

First:

> cd /to/my/app/dir
> bundle open graylog2_exceptions

This should open the installed gem in your editor. In the file lib/graylog2_exceptions.rb you need to add the version parameter to the notification message. Possibly this should be added to the gelf gem instead; I am not sure how that version string is supposed to be used.

Here is the modified method that does the actual notification:

  def send_to_graylog2 err
    begin
      notifier = GELF::Notifier.new(@args[:hostname], @args[:port])
      puts notifier.notify!(
        :version => "1.0", # <- this line is new!!!
        :short_message => err.message,
        :full_message => err.backtrace.join("\n"),
        :level => @args[:level],
        :host => @args[:local_app_name],
        :file => err.backtrace[0].split(":")[0],
        :line => err.backtrace[0].split(":")[1]
      )
    rescue => i_err
      puts "Graylog2 Exception logger. Could not send message: " + i_err.message
    end
  end

So, that is it. Finally I get all my exceptions in Graylog2. To try it out you can just raise some dummy exception – raise "Dummy Exception Error" – here and there and see them pop up in Graylog2.


SOAP with Attachments in Ruby

by Martin Westin in


I found myself once again facing SOAP. This abomination of a protocol, which they even have the nerve to call "web services", is not my favorite type of API to interface with (how did you guess?). I think probably the only languages with any decent support are Java and possibly .NET. Neither ranks among my favorite languages either. Funny, that. My bigger problem is that the service I am interfacing with is nothing as simple as sending an integer and getting an integer back. It requires that I post a multipart/MIME SOAP message (aka SOAP with Attachments, afaik). This is something most SOAP libraries are not too keen on supporting.

What are multipart SOAP messages?

In short, they are encoded kind of like email messages and their attachments, but sent over HTTP to a SOAP endpoint. The normal SOAP message becomes one of the MIME parts, and any other parts are called attachments and are usually referenced from inside the SOAP message.

A little history

A few years ago, in PHP, I was stuck using NuSOAP and ended up basically bypassing most of it, encoding the attachments and doing all of that myself. The code was a real mess.

Last week I got to do it all over again, this time in Ruby. At work we are porting our entire platform to Ruby, but detailing that process might be a post in itself. I was so happy when I found that soap4r has support for MIME messages. Then I tried to use soap4r. Long story short: I liked it so much that I chose to go with Savon instead... which has no MIME support.

What I ended up with

The result of my efforts is not pretty by Ruby standards, but it is a lot better than my old code in PHP. I patched Savon in two places. One patch enables any namespace on the SOAP body (which is otherwise hard-coded to "wsdl") and has little to do with MIME messages.

The other patch intercepts the output and checks whether the SOAP object has had any attachments (parts) added to it. If so, it takes the intended output and encodes it as a MIME part, then encodes the other parts and puts it all together as one nice big HTTP packet ready for posting.

I think it best if I just show the code now.

Any questions posted to the gist, or here, will be addressed to the best of my abilities.