Odd Map-Reduce failure with MongoDB and Mongoid

by Martin Westin


I just want to document this for future reference. Google was very unhelpful so this is likely a rare error condition.

While I was making an innocent change to the query portion of a Mongoid map-reduce job, it started throwing this exception:

failed with error "ns doesn't exist"

Internet wisdom suggested a missing collection. One hit suggested a missing field as the cause. But my collection most decidedly was there. Other map-reduce jobs ran over it just fine. Other map-reduce jobs even ran fine using the exact same query "constraint".

The Rails log showed all being well. Even the exception (which displays the map-reduce command) showed the expected query. Nothing indicated what was wrong.

Not to go too deep into the troubleshooting, the cause turned out to be a laziness and scope issue. The value for the query was out of scope when the lazy Mongoid processing got to it.

This would throw the exception.

Modelname.mongo_session.command(
  mapreduce: Modelname.collection.name,
  map: map_function,
  reduce: reduce_function,
  query: {"value.deep_value" => current_user.associated_thing.some_value},
  out: {inline: 1}
)

While this would not. 

deep_value = current_user.associated_thing.some_value
Modelname.mongo_session.command(
  mapreduce: Modelname.collection.name,
  map: map_function,
  reduce: reduce_function,
  query: {"value.deep_value" => deep_value},
  out: {inline: 1}
)

My guess is that it failed because the results were just put into an instance variable and not accessed until we were out in the view. Weird, yes. Reading Mongoid's and Moped's source might confirm or provide some other reason. From a practical standpoint though, assigning a local variable worked and I have better things to do right now :)


Rails link_to with urls

by Martin Westin


Since this is something I seldom do, this little gotcha caught me today.

So, you want to put a link into an email template or a rendered pdf or something similar that requires the full url of your route. Being an email or a pdf, it is also likely that the contents will be printed, so just rendering out the entire url in the template can be a good thing. However...

# the basic
link_to something_url
# is not the same as
link_to something_url, something_url

The first one looks good. You get the full url including https:// and all. The gotcha is that Rails, being the opinionated and free-thinking framework it is, will strip out the url-part and just put the path part into the href. To make the visible link match the href you actually need to tell Rails explicitly that's what you want.


Shelling out from a ruby app to a ruby app without bundler conflicts

by Martin Westin


My case is that I have a Rails app. It uses Bundler to manage its Gems. I also need to run some processing using an old version of one of my own libraries which also has dependencies and gems that need resolving. I chose to make this a small command line app with an executable ruby file. This little tool uses Bundler to manage its Gems.

I will shell out to this executable from my Rails app. Simple, right? I thought it would be, but there was a gotcha. The Rails app executes the external ruby file within the same "bundle" as the Rails app, i.e. I got the current versions of my lib and all gems.

After much Googling I combined one Stack Overflow answer with a note on some blog (lost both references).

some_result_value = ""
Bundler.with_clean_env do
  Dir.chdir "/path/to/rubytool"
  some_result_value = `./bin/rubytool param1 param2`
end

The key details here are the clean environment AND that I change directory before executing. I don't understand why changing the folder would be significant. It may even have been bad late-night mojo.

Anyway, as long as the command line tool being called uses bundler correctly, it works as intended. It is run with its own bundle of gems.

I would like to find a way to control this from the receiving ruby script but I have not looked into that yet.
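
A possible variation on the snippet above, offered as a sketch rather than a verified fix. The block form of Dir.chdir restores the previous working directory when the block ends, so the calling process does not stay in the tool's folder. Bundler 2.1 and later deprecate with_clean_env in favour of with_unbundled_env, so the sketch prefers that when it is available. The helper name run_in_dir is made up for illustration, and the example runs pwd in the system temp directory as a stand-in for the real tool.

```ruby
require "tmpdir"

# Hypothetical helper (not from the original post): run a shell command
# inside `dir` and return whatever it printed to stdout.
def run_in_dir(dir, command)
  output = ""
  runner = lambda do
    # Block form: the previous working directory is restored afterwards.
    Dir.chdir(dir) { output = `#{command}` }
  end

  if defined?(Bundler) && Bundler.respond_to?(:with_unbundled_env)
    Bundler.with_unbundled_env(&runner) # Bundler 2.1 and later
  elsif defined?(Bundler)
    Bundler.with_clean_env(&runner)     # older Bundler, as in the post
  else
    runner.call                         # no Bundler loaded: nothing to clean
  end
  output
end

before = Dir.pwd
result = run_in_dir(Dir.tmpdir, "pwd") # stand-in for ./bin/rubytool param1 param2
puts result
puts Dir.pwd == before                  # the caller's directory is unchanged
```

In the Rails app the call would look something like run_in_dir("/path/to/rubytool", "./bin/rubytool param1 param2").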


Looping through large and slow datasets in Ruby

by Martin Westin


I recently had the pleasure of needing to load 30'000 records from MongoDB and then perform slow and memory-intensive processing on them. Basically, you can imagine it as a database of videos: MongoDB was holding the metadata and other bits, but the actual video files were on disk somewhere. My parsing involved loading the entire video into memory and doing "stuff" with it as part of my model object. This is how I did it.

Take 1

At first I just did the normal Model.all.each... This worked fine for smaller datasets but on larger sets the whole thing would crash after 40 to 60 minutes (I never timed this in detail). MongoDB had timed out and I figured out that my ODM (Mongoid) was keeping an open iterator in MongoDB and fetching one document at a time from the DB... and after an hour or so the DB had had enough.

Take 2

It was of course trivial to force Mongoid to load the whole dataset in one go using Model.all.to_a.each... Before thinking further I set this version going. It crashed a lot faster than the first version. The reason is that each of my objects stays in the array, and in memory, and adding anywhere from 5 to 500 MB of video data to each quickly ate all the RAM I had.

Take 3

A small and funky change fixed this, making my script both time and RAM "proof". This is how I will start out next time I have a long-running task.

all = Model.all.to_a # these are just simple Rails models
while one = all.pop # this is memory management
    one.do_heavy_processing # this loads in a ton of crap
end

By popping them off one by one and re-using the local variable, each processed object is no longer referenced anywhere, and Ruby's GC takes pretty good care of keeping the memory to a minimum.
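
A plain-Ruby sketch of the same pattern, with a made-up Video struct standing in for the Mongoid model and a string standing in for the video data. It only shows the mechanics: pop removes the last element, so the array shrinks as the loop runs, and once the local variable is reassigned neither the array nor the loop refers to the previous object. Note that pop works from the end of the array; shift would go through it in the original order.

```ruby
# Hypothetical stand-in for the model; `payload` plays the role of the video data.
Video = Struct.new(:id, :payload) do
  def do_heavy_processing
    self.payload = "x" * 1_000 # pretend to load something big
    id
  end
end

all = (1..5).map { |i| Video.new(i) } # in the post: Model.all.to_a
processed = []

# pop returns nil once the array is empty, which ends the loop
while (one = all.pop)
  processed << one.do_heavy_processing
end

puts processed.inspect # ids in reverse order, because pop takes from the end
puts all.empty?
```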


Quick Tip: iOS and web graphics in Illustrator

by Martin Westin


I wanted to make a note of one of these things I keep forgetting in Adobe Illustrator. It is very simple, obvious and keeps me sane while drawing UI elements.

Make sure "Align to Pixel Grid" is actually selected!

That gets you out of so much trouble. Problem is, you need to keep an eye on the transform pane since this is a per-object setting. It is not global to everything on your canvas.

Of course the old-school technique of placing the strokes outside (or inside) the object bounds still works and can actually help you in transferring dimensions from the canvas into CSS (which puts borders "outside" when working correctly).

(I can't believe they still haven't fixed the rounding errors that randomly occur when drawing and moving objects around.)


Rails migration of indexes

by Martin Westin


A small gotcha when changing indexes in a migration. To change an index one has to first remove it and then add it again. Removing an index is the tricky part. The documentation states: remove_index(table_name, index_name): Removes the index specified by index_name.

This is not strictly true as it turns out. The docs should probably say: remove_index(table_name, column_name)

The crux is that one cannot use this syntax to remove a named index. Rails assumes the index is named something like "tablename_columnname_index" or something similar.

To remove a named index one has to use the block syntax afaik:

change_table :tablename do |t|
  t.remove_index :name => :indexname
  t.index ["columnname"], :name => "indexname", :unique => true
end

Nginx + Wordpress caching that actually works

by Martin Westin


I spent a lot of time yesterday trying to enable WP Super Cache, and subsequently W3 Total Cache, for this website. Since none of the hits I got on Google did the trick I thought I'd post my working settings for page caching with W3 Total Cache. I went with this plugin mainly because it uses a logical hierarchy of readable folders and files. WP Super Cache did not, which is why I eventually dropped it and tried the Total Cache plugin.

I have Nginx and PHP FastCGI. No Apache. The "problem" with this setup is the rewrite rules needed to point visitors to cached pages if they exist. Installing the plugin is as simple as anything in WP these days so I won't go into that.

Here it is:

## W3 Total CACHE BEGIN
set $totalcache_file '';
set $totalcache_uri $request_uri;

if ($request_method = POST) {
  set $totalcache_uri '';
}

# Using pretty permalinks, so bypass the cache for any query string
if ($query_string) {
  set $totalcache_uri '';
}

if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_" ) {
  set $totalcache_uri '';
}

# if we haven't bypassed the cache, specify our totalcache file
if ($totalcache_uri ~ ^(.+)$) {
  set $totalcache_file /wp-content/w3tc/pgcache/$1/_index.html;
}

# only rewrite to the totalcache file if it actually exists
if (-f $document_root$totalcache_file) {
  rewrite ^(.*)$ $totalcache_file;
  break;
}

## W3 Total CACHE END

If you are wondering what to do with these lines of code... I did not come up with them myself. I made a minor change to make them work with the current version of W3 Total Cache and my installation.

The blueprint came from here: http://wpveda.com/nginx-rewrite-rules-for-w3-total-cache-plugin/

What I did was to change the filename. Also, if you, like me, have WP in a subfolder, you would add that to the path as well.

# WP in the web root
set $totalcache_file /wp-content/w3tc/pgcache/$1/_index.html;
# WP in a subfolder
set $totalcache_file /your-wp-folder/wp-content/w3tc/pgcache/$1/_index.html;

There are a dozen other sites with variations on this code. None worked for me straight away... If you have had a similar experience, maybe my version will work for you.

When, or if, I get some of the other rewrite-dependent features working I'll add another post.


Javascript variable definitions

by Martin Westin


The variable scope in javascript can still surprise me. This is a nice one I debugged today: A variable that is not defined with "var" has global scope. This can be a real pain since the globalization is implicit and easy to forget.

I forgot the "var" when writing a for loop and could not for the life of me figure out what happened when the i-variable got mysteriously reset.

This is an example of a for loop (in one()) that will never finish since i is being reset in two(). Define it with "var i=0" in both loops and all is well.

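function one() {
	// no "var": i is an implicit global, shared with two()
	for (i=0;i<20;i++){
		if (i > 18) {
			two(); // leaves the global i at 5, so this loop never reaches 20
		}
	}
}
function two() {
	// same global i as in one()
	for (i=0;i<5;i++){
	}
	alert('two');
}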

Finding ordered position of a row in MySQL

by Martin Westin


This is slightly modified from something I found @ http://www.kirupa.com/forum/archive/index.php/t-263260.html. With high-score lists or similar you often have a large number of rows where you want to know who is in 7th place or how well id:254 is doing. This eliminates the slow looping in PHP over big result-sets by making a sub-query in MySQL.

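-- Reset the row counter before each run
SET @rownum := 0;

-- "rank" is a reserved word from MySQL 8.0 on, so the alias is named "place" here
SELECT * FROM (
  SELECT @rownum := @rownum+1 AS place, id, points
  FROM highscores
  ORDER BY points DESC
) AS highscores WHERE id = 254;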