A quick coding style suggestion
Posted: April 14th, 2009 | Author: Jim Thomason | Filed under: Programming
I am going to keep this short and sweet and wax poetic about a certain programming idiom that irks me to no end, and provide my preferred alternative.
Now, most Perl programmers know that it’s very useful to pass in a hash (or hashref!) of named parameters to functions whose argument lists are very, very long. It’s useful and keeps things tidy.
Compare. Which would you rather read:
    function(0, 'true', 'apple', undef, undef, undef, undef, undef, 'porcupine');
or
    function(
        'reptile_count'   => 0,
        'in_forest'       => 'true',
        'preferred_fruit' => 'apple',
        'alternative'     => undef,
        'nomenclature'    => undef,
        'predator'        => undef,
        'food_source'     => undef,
        'known_zoos'      => undef,
        'animal'          => 'porcupine',
    );
Even better, all of those undefs are superfluous, since a key that was never passed in already reads as undef inside the function. We don’t need to pass them in. Well, unless of course there’s a default value that we instead want to explicitly set to undef, but more on that in a second.
    function(
        'reptile_count'   => 0,
        'in_forest'       => 'true',
        'preferred_fruit' => 'apple',
        'animal'          => 'porcupine',
    );
And it’s much tidier. Just by looking at the function, you can probably guess that the animal you’re dealing with is a porcupine, he’s in the forest, he likes apples, and there are no reptiles about. Good luck figuring that out from the original version of the function with all the unnamed parameters. Very tidy, very neat, very compact.
But then, almost inevitably, I see functions that have huge blocks of code at the top of them to copy out the hash arguments and set defaults.
    sub function {
        my %args = @_;
        my $reptile_count   = $args{'reptile_count'} || 0;
        my $in_forest       = $args{'in_forest'} || 'false';
        my $preferred_fruit = $args{'preferred_fruit'};
        my $alternative     = $args{'alternative'};
        my $nomenclature    = $args{'nomenclature'};
        my $predator        = $args{'predator'} || 'humans';
        my $food_source     = $args{'food_source'};
        my $known_zoos      = $args{'known_zoos'};
        my $animal          = $args{'animal'};
        . . .
        do_something($animal);
    }
And it just gets worse as more and more arguments are added. The block grows gigantically and it’s just a huge messy wart. Line after line after line of useless setup information.
Plus, there’s a subtle issue which sometimes pops up. Take a look at the predator variable in the example function. If no value is passed in, then it uses the default of “humans”. But what if you actually wanted to set it to undef? Say the porcupine actually has no predators (does it?). This function won’t even allow it, because undef is false and the || falls straight through to the default. Admittedly, that may be a good thing. Or it may not. It can be a subtle bug to track down.
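To make the pitfall concrete, here is a minimal sketch. The helper predator_of is a made-up name for illustration; it applies the same || default as the block above to a single argument.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original function): applies the
# "|| default" idiom to one argument and returns the result.
sub predator_of {
    my %args = @_;
    my $predator = $args{'predator'} || 'humans';
    return $predator;
}

# No value passed: the default is used, as intended.
print predator_of(), "\n";

# Explicit undef: the caller's choice is silently replaced by the default.
print predator_of('predator' => undef), "\n";

# Any other false value (0 or the empty string) is replaced as well.
print predator_of('predator' => ''), "\n";
```

All three calls print "humans", which is only what the caller asked for in the first case.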
Second, tell me quickly, at a glance: which variables have defaults, and what are they? You have to scan the whole block to see them. Admittedly, aligning your defaults in a column on the right side will help, but still.
My preference is to just leave all of the passed-in arguments in a hash and be done with it. No need to copy out into separate variables. Sure, you have a smidgen more typing later in your function, but you also have the added advantage of knowing exactly when you’re operating with an argument to your function as opposed to a variable you created yourself.
Compare my version:
    sub function {
        my %args = @_;
        . . .
        do_something($args{'animal'});
    }
Done! Much simpler. “But wait!”, you protest. “You’re no longer setting the defaults!” Easily done. Make them part of your hash definition:
    sub function {
        my %args = (
            'reptile_count' => 0,
            'in_forest'     => 'false',
            'predator'      => 'humans',
            @_
        );
        . . .
        do_something($args{'animal'});
    }
Ta da! You end up with much cleaner, simpler code. At a glance, you can see at the top of your function just which arguments have defaults set for them.
As an added bonus, you can now explicitly set “predator” to undef and that’s what your function will get. Here, you’re interpreting a list as a hash, so whichever key definition occurs last wins. Early on, it sets ‘predator’ to ‘humans’ (the default), but then inside of @_ we may have passed in ‘predator’ => undef, so it happily uses that instead.
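The "last key wins" behaviour is easy to check for yourself. This is a small sketch; merged_args is a made-up helper that just hands back the merged hash so we can inspect it.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: merges the defaults with the
# caller's arguments and returns the resulting hash.
sub merged_args {
    my %args = (
        'reptile_count' => 0,
        'in_forest'     => 'false',
        'predator'      => 'humans',
        @_,    # caller's pairs come last, so they win over the defaults
    );
    return %args;
}

my %plain       = merged_args('animal' => 'porcupine');
my %no_predator = merged_args('animal' => 'porcupine', 'predator' => undef);

for my $result (\%plain, \%no_predator) {
    my $predator = $result->{'predator'};
    print 'predator: ', (defined $predator ? $predator : 'undef'), "\n";
}
```

The first call prints "predator: humans" and the second prints "predator: undef", while the other defaults are untouched in both.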
Okay, but maybe you wanted to ignore keys specifically set to undef. We can still easily do this; we’re just going to define 2 separate hashes.
Finally:
    sub function {
        my %defaults = (
            'reptile_count' => 0,
            'in_forest'     => 'false',
            'predator'      => 'humans',
        );
        my %args = @_;

        # use the default for any argument that is missing or undef
        %args = map {
            ( $_ => defined $args{$_} ? $args{$_} : $defaults{$_} )
        } ( keys %defaults, keys %args );
        . . .
        do_something($args{'animal'});
    }
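If you want to try the two-hash idea on its own, here is a self-contained sketch. It assumes a default should apply whenever the argument is missing or undef, so it tests with defined (which leaves 0 and the empty string alone) and walks the keys of %defaults. The name args_with_defaults is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: merges the caller's arguments with the defaults,
# treating an explicit undef the same as a missing argument.
sub args_with_defaults {
    my %defaults = (
        'reptile_count' => 0,
        'in_forest'     => 'false',
        'predator'      => 'humans',
    );
    my %args = @_;

    # Walk the defaults, not the passed keys, so that arguments the
    # caller left out entirely still pick up their default.
    for my $key (keys %defaults) {
        $args{$key} = $defaults{$key} unless defined $args{$key};
    }
    return %args;
}

my %args = args_with_defaults(
    'animal'        => 'porcupine',
    'predator'      => undef,    # ignored, falls back to 'humans'
    'reptile_count' => 3,        # kept as passed
);

print "$args{'animal'}: predator=$args{'predator'}, ",
      "reptiles=$args{'reptile_count'}, in_forest=$args{'in_forest'}\n";
# prints: porcupine: predator=humans, reptiles=3, in_forest=false
```

A plain loop is used here instead of map purely for readability; either works.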
In my opinion, the resulting code is substantially tidier. You have a small block of defaults at the top, and the rest of the function specifically identifies which variables were arguments to begin with. No big, messy, superfluous wart to wade through.

OK, I’m gonna say I really like this idea. I wonder if it’s some of my code that you’ve been looking at that sets all those defaults in a big block at the top of the function.
map in every function…tidy, but can cause slowness