2012-09-27

Counting weekdays between dates

Need a method to accurately count the number of weekdays between two dates?  [The key here is "accurately" - it is a bit harder than it seems at first.]  In Python there are several ways to do this, but most involve some iteration or a list comprehension.  In my opinion, if you have to do that, you are probably violating the Python idiom of "use the batteries".
A better way to solve this is to use recurrence rules - used every day in scheduling software, groupware, and anything that supports iCalendar.  Recurrence rules in Python are handled by the dateutil module's rrule component.  Here is the code:

from datetime import date, timedelta
from dateutil.rrule import rrule, MO, TU, WE, TH, FR, DAILY
start = date.today( ) - timedelta( days=90 )
end = date.today( ) + timedelta( days=3 )
rule = rrule( DAILY,
              byweekday=( MO, TU, WE, TH, FR ),
              dtstart=start,
              until=end )
print( 'Days: {0}'.format( rule.count( ) ) )

The date range here is inclusive: it includes both the start and end dates.  One caveat is that if your end date is prior to the start date you will not get an error or exception - you'll just get a recurrence with zero elements.
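If you want a sanity check on the rrule result without pulling in dateutil, the same inclusive count can be had from the standard library with a little date arithmetic.  This is a sketch of my own [the weekdays_between name is invented for illustration], not anything dateutil provides:

```python
from datetime import date, timedelta

def weekdays_between( start, end ):
    """Count Mon-Fri dates in the inclusive range [start, end]."""
    if end < start:
        return 0  # mirror rrule's behavior: an empty result, not an error
    days = ( end - start ).days + 1          # inclusive span length
    full_weeks, extra = divmod( days, 7 )
    count = full_weeks * 5                   # each full week has five weekdays
    # check the at-most-six leftover days individually
    for i in range( extra ):
        if ( start + timedelta( days=full_weeks * 7 + i ) ).weekday( ) < 5:
            count += 1
    return count

# Mon 2012-09-24 through Fri 2012-09-28 is five weekdays
print( weekdays_between( date( 2012, 9, 24 ), date( 2012, 9, 28 ) ) )  # → 5
```

It agrees with the rrule count, including the zero-element result for a reversed range.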

2012-09-25

Idjit's Guide To Installing RabbitMQ On openSUSE 12.2

The RabbitMQ team provides a generic SUSE RPM which works on openSUSE 11.x, openSUSE 12.1, and, I presume, on the pay-to-play versions of SuSE Enterprise Server.  About the only real dependency for RabbitMQ is the Erlang platform, which is packaged in the erlang language repo.  So the only real trick is getting the RabbitMQ package itself [from this page].  Then installation is as simple as:
zypper ar http://download.opensuse.org/repositories/devel:/languages:/erlang/openSUSE_12.2 erlang
zypper in erlang
wget http://www.rabbitmq.com/releases/rabbitmq-server/v2.8.6/rabbitmq-server-2.8.6-1.suse.noarch.rpm
rpm -Uvh rabbitmq-server-2.8.6-1.suse.noarch.rpm
Now before you start the rabbitmq-server you need to modify the /etc/init.d/rabbitmq-server file, changing "LOCK_FILE=/var/lock/subsys/$NAME" to "LOCK_FILE=/var/run/rabbitmq/$NAME".  The directory /var/lock/subsys doesn't exist on openSUSE, so this change puts the lock file over under /var/run/rabbitmq along with the PID file.  Alternatively, you can create the /var/lock/subsys directory with the appropriate permissions.
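If you script your installs, that one-line edit can be done with sed.  Here is a sketch demonstrated against a scratch copy; substitute /etc/init.d/rabbitmq-server for the demo file when making the real change:

```shell
# Demonstrate the LOCK_FILE edit on a scratch copy first; swap in
# /etc/init.d/rabbitmq-server when doing it for real.
script=/tmp/rabbitmq-server.demo
printf 'LOCK_FILE=/var/lock/subsys/$NAME\n' > "$script"
sed -i 's|/var/lock/subsys/\$NAME|/var/run/rabbitmq/$NAME|' "$script"
grep '^LOCK_FILE=' "$script"
# LOCK_FILE=/var/run/rabbitmq/$NAME
```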

Every time you modify a service script in /etc/init.d you need to then run systemctl --system daemon-reload so that systemd knows to anticipate the changed file.
If you want to use Rabbit's management interface you now need to enable the appropriate plugins:
rabbitmq-plugins enable rabbitmq_management
rabbitmq-plugins enable rabbitmq_management_visualiser

By default RabbitMQ will listen on all your host's interfaces for Erlang kernel, AMQP, and HTTP (management interface) connections.  Especially in the case of a development host you may want to restrict the availability of one or all of these services to the local machine.

In order to keep the Erlang kernel and Rabbit's AMQP listeners restricted to the local host you'll need to add two exported environment variables to the service script - just put them in after the definition of PID_FILE.
export RABBITMQ_NODENAME=rabbit@localhost
export ERL_EPMD_ADDRESS=127.0.0.1
For the management interface and other components you'll need to modify [and possibly create] the /etc/rabbitmq/rabbitmq.config configuration file.  RabbitMQ violates the only-one-way-to-configure rule of system administration; this is in part due to its reliance on the Erlang runtime - controlling the behavior of the runtime is a [poorly documented] black art.  Both the environment variables and the configuration file are required to restrict all the components to the local interface.  The following configuration file restricts HTTP [management interface] and AMQP services to the localhost and informs the RabbitMQ application that it should find the Erlang kernel at the address 127.0.0.1.
[
  {mnesia, [{dump_log_write_threshold, 1000}]},
  {kernel,[{inet_dist_use_interface,{127,0,0,1}}]},
  {rabbit, [{tcp_listeners, [{"127.0.0.1", 5672}]}]},
  {rabbitmq_management,  [ {http_log_dir,   "/tmp/rabbit-mgmt"} ] },
  {rabbitmq_management_agent, [ {force_fine_statistics, true} ] },
  {rabbitmq_mochiweb, [ {listeners, [{mgmt, [{port, 55672},
                                             {ip, "127.0.0.1"}]}]},
                        {default_listener, [{port, 60000} ] } ] }
 ].
Always modify the configuration file when the RabbitMQ service is shutdown.  A botched configuration file can render the broker unable to shutdown properly leaving you to have to manually kill the processes old-school.

With RABBITMQ_NODENAME defined in the service file you will either need to add that same variable to the administrator's and application's environment or specify the node name when attempting to connect to or manage the RabbitMQ broker service [your application probably already refers to a configured broker, but you'll certainly have to deal with this when using the rabbitmqctl command].

Now the service should start:
service rabbitmq-server start
The broker service should now be running and you can see the components' open TCP connections using the netstat command.  The management interface should also be available on TCP/55672 [via your browser of choice] unless you specified an alternative port in the rabbitmq.config file.

linux-nysu:/etc/init.d # netstat --listen --tcp --numeric --program
Active Internet connections (only servers)
Proto Local Address    Foreign Address State  PID/Program name  
tcp   127.0.0.1:5672   0.0.0.0:*       LISTEN 23180/beam.smp     
tcp   127.0.0.1:60712  0.0.0.0:*       LISTEN 23180/beam.smp     
tcp   127.0.0.1:4369   0.0.0.0:*       LISTEN 22681/epmd         
tcp   127.0.0.1:55672  0.0.0.0:*       LISTEN 23180/beam.smp     
Now you probably want to do some configuration and provisioning using the rabbitmqctl command; but your RabbitMQ instance is up and running.

2012-09-24

Deduplicating with group_by, func.min, and having

You have a text file with four million records and you want to load this data into a table in an SQLite database.  But some of these records are duplicates (based on certain fields) and the file is not ordered.  Due to the size of the data, loading the entire file into memory doesn't work very well.  And due to the number of records, doing a check-at-insert when loading the data is also prohibitively slow.  But what does work pretty well is just to load all the data and then deduplicate it.  Having an auto-increment record id is what makes this possible.

from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base

scratch_base = declarative_base( )

class VendorSKU(scratch_base):

    __tablename__ = 'sku'
    id      = Column( Integer, primary_key=True, autoincrement=True )
    ...
   
Once all the data gets loaded into the table the deduplication is straight-forward using minimum and group by.

        query = scratch.query( func.min( VendorCross.id ),
                               VendorCross.sku,
                               VendorCross.oem,
                               VendorCross.part ).\
                    filter( VendorCross.source == source ).\
                    group_by( VendorCross.sku,
                              VendorCross.oem,
                              VendorCross.part ).\
                    having( func.count( VendorCross.id ) > 1 )
        counter = 0
        for ( id, sku, oem, part ) in query.all( ):
            counter += 1
            scratch.query( VendorCross ).\
                filter( and_( VendorCross.source == source,
                              VendorCross.sku == sku,
                              VendorCross.oem == oem,
                              VendorCross.part == part,
                              VendorCross.id != id ) ).delete( )
            if not ( counter % 1000 ):
                scratch.commit( )
        scratch.commit( )

       
This incantation removes all the records from each group except for the one with the lowest id.  The trick for good performance is to batch many deletes into each transaction - only commit every so many [in this case 1,000] groups processed; just also remember to commit at the end to catch the deletes from the last iteration.
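For the curious, the same keep-the-minimum-id approach can be expressed in plain SQL.  Here is a self-contained sketch using Python's built-in sqlite3 module; the table layout and sample data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect( ":memory:" )
conn.execute( "CREATE TABLE sku (id INTEGER PRIMARY KEY AUTOINCREMENT, "
              "source TEXT, sku TEXT, oem TEXT, part TEXT)" )
rows = [ ( "acme", "A1", "O1", "P1" ),
         ( "acme", "A1", "O1", "P1" ),   # duplicate
         ( "acme", "A2", "O2", "P2" ),
         ( "acme", "A1", "O1", "P1" ) ]  # duplicate
conn.executemany( "INSERT INTO sku (source, sku, oem, part) "
                  "VALUES (?, ?, ?, ?)", rows )

# Delete every row whose id is not the minimum of its (sku, oem, part) group.
conn.execute( """
    DELETE FROM sku
     WHERE source = ?
       AND id NOT IN ( SELECT MIN( id ) FROM sku
                        WHERE source = ?
                        GROUP BY sku, oem, part )""", ( "acme", "acme" ) )
conn.commit( )
print( conn.execute( "SELECT COUNT(*) FROM sku" ).fetchone( )[ 0 ] )  # → 2
```

The ORM version above does the same thing, batched into transactions for performance.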

2012-09-22

Recommended GNOME3 Extensions

I'm a GNOME3 / GNOME Shell user, and a big time fan of this elegant new desktop environment.  Gone are the clumsy panels, task bars, and applets.  GNOME Shell replaces all that with an extension system that allows developers to extend and modify the working environment using only JavaScript and CSS [leave your compiler at home].  Extensions can be installed on the fly, and enabled or disabled at will.  Extensions can be browsed and installed just by visiting the extensions.gnome.org website with your Epiphany or Firefox web browser.

This is a list of extensions I find most useful.

Tracker Search
Tracker is an efficient and fast desktop search engine.  Open Source desktop search suffered a painful setback when faux Open Source advocates, ignorantly crusading against the Mono project, bludgeoned the reputation of the Beagle project based on a few bugs experienced in early releases [as if every project and product doesn't have those].  Tracker stepped in to replace Beagle and, being implemented in C, avoided the ire of the trolls [or at least that set of trolls].  It has taken a l-o-n-g time for Tracker to match Beagle's level of awesome, but that day has arrived.  And putting this amazing little search engine to work for you is the Tracker Search extension.  This extension adds search results derived from all your data to the search feature of Shell's overview mode; you can see applications, recent items, and the top matches from your data all in one dynamic view.  This extension is like having your own personal secretary with a degree in library science - and who doesn't want that?

Disable Hot Corners
GNOME Shell features hot corners so that it can claim to support the hip new thing known as "gestures".  Gestures are an awful idea and impede usability.  This extension disables hot corners - win!  If you have a keyboard you can get to overview mode using either Alt-F1 or the Windows key; what could be faster?  Nothing.  If you do not have a keyboard you almost certainly are not doing anything productive anyway - go outside, get some exercise, make some friends who don't live in their mother's basement.

Journal
Zeitgeist is the activity hub of the Open Source desktop.  It correlates and records your activity and the data you access.  In conjunction with Tracker and the Tracker Search extension this provides a nearly full-fledged secretarial service.  Often I can resume what I was working on yesterday directly from the GNOME Activity Journal.  This extension adds all that knowledge and context to Shell's overview mode.

One-Click-Terminal
Frequently I just need to run something, or check something, and to do so I need a terminal window.  This extension puts an icon on Shell's top bar that with a single click always gives me a shiny new shell.  Simple.

Advanced Settings in UserMenu
GNOME hackers haven't quite settled on where settings belong.  It appears that between all the various work environments that may just be an eternal question.  And people have strong opinions about it.  So GNOME Shell provides "System Settings" in the drop down menu.  But... a lot of settings aren't there.  Including the ability to enable and disable extensions.  This extension just adds an "Advanced Settings" option which shortcuts to the gnome-tweak-tool where numerous [officially unsupported] settings can be tweaked (hence the name).  In gnome-tweak-tool it is also possible to enable and disable extensions.  This extension just makes it faster to get to the tool.  Once you have things the way you really want you won't use it much, but getting to that point you'll possibly be searching for and running gnome-tweak-tool on a regular basis.

Dash to Dock

This extension makes the dash [the dashboard for launching favorite applications] a bit more like a dock or toolbar.  The dash will stick around, even when not in overview, until a window presses it out of the way.  Sometimes it is a bit too sticky but most of the time it works as expected.  The best part of the modified behavior is that every time you navigate to a new [empty] workspace the dash is ready and waiting for you to summon some applications.

Connection Manager

This great little extension drops a new drop-down menu into the top bar of the Shell from which you can create, via one click, a new SSH session from a predefined list of hosts.  And there is no froggin' about in a configuration file to set up the hosts - the extension provides a handy configuration dialog to add and remove host entries.  It even integrates with the GNOME Terminal profiles so that you can select what profile you'd like for each SSH host entry.  This is a must-have for the beleaguered system administrator.

2012-09-21

Changing FAT Labels

I use a lot of SD cards and USB thumb-drives; when plugged in, these devices automount in /media as either the file-system label (if set) or some arbitrary name like "/media/disk46".  So how can one modify or set the label on an existing FAT filesystem?  Easy as:

# mlabel -i /dev/mmcblk0p1 -s ::WMMI06
Volume has no label
# mlabel -i /dev/mmcblk0p1 ::WMMI06
# mlabel -i /dev/mmcblk0p1 -s ::
Volume label is WMMI06

# mlabel -i /dev/sdb1 -s ::
Volume label is Cruzer
# mlabel -i /dev/sdb1 ::DataCruzer
# mlabel -i /dev/sdb1 -s ::
Volume label is DataCruzer (abbr=DATACRUZER )

mlabel is provided in the mtools package.  Since we don't have a drive letter, the "::" is used to refer to the actual device specified using the "-i" directive.  The "-s" directive means show; otherwise the command attempts to set the label to the value immediately following (no space!) the drive designation [the default behavior is to set, not show].

2012-09-20

If a record exists

A common action when synchronizing data between some source and a database is to check if such-and-such record already exists and needs to be updated, or if a new record needs to be created.  SQLAlchemy's one() method [of the query object] provides an easy way to check whether such-and-such record exists; but it doesn't return either an ORM object or None - if no record is found it raises an exception.  This is surprising at first, as x=do;if-not-x is possibly the most common of all Python constructs.  The corresponding SQLAlchemy construct is just to catch the NoResultFound exception.

from sqlalchemy import and_
from sqlalchemy.orm.exc import NoResultFound

try:
    record = db.query( VendorCross ).\
        filter( and_( VendorCross.vendor_code == vendor_code,
                      VendorCross.oem_code == oem_code,
                      VendorCross.oem_partcode == oem_partcode ) ).one( )
except NoResultFound:
    # no such record exists
    pass
else:
    # record exists, bound to "record"
    pass