Securely Sharing State Between Instances Of Unrelated JavaScript Classes

Securely sharing state between instances of unrelated JavaScript classes can easily be accomplished by using a simple variation on the sub-class pattern of the “insider properties” model.

Designate one of your classes to handle the role of the “base” class. This class is responsible for creating the shared state object and controlling distribution of access. This class should use the standard “base class” pattern.

All other classes should implement a variation on the standard “sub-class” pattern that accepts (and stores) a base-class instance-reference and uses that for access instead of this.

The variation looks like this:

import { Base } from './insider-trusted.js';

export const Partner = (() => {
    const cls = Object.freeze(class Partner { // ** Unrelated to Base **
        static #insiderBaton;
        #baseInstance; // Reference instance (instead of `this`)
        #insider;

        constructor (baseInstance) {
            // Accept base instance parameter instead of using `this`
            this.#baseInstance = baseInstance;
            Base._getInsider(cls, baseInstance, () => this.#insider = cls.#insiderBaton);
            // Insider properties (this.#insider.prop) now available here
        }

        // Standard handoff-pattern class-method
        static _passInsider (insider, receiver) {
            this.#insiderBaton = insider;
            receiver();
            this.#insiderBaton = null;
        }
    });
    Object.freeze(cls.prototype);
    return cls;
})();

Make sure the Partner class is included in the trust configuration (represented by the insider-trusted.js file in this example).
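
As a rough illustration, the sketch below collapses the trust configuration, the base class, and the partner class into a single file so it can be run directly. The counter property and the increment and readCounter methods are hypothetical additions for the demo; in the actual pattern the three pieces live in separate modules.

```javascript
// Single-file sketch; the real pattern splits these into separate modules.
let trusted; // Stands in for the trust configuration (insider-trusted.js)
const getTrusted = () => (trusted ||= Object.freeze([Partner]));

const Base = (() => {
    const cls = Object.freeze(class Base {
        static #trusted;
        #insider = { counter: 0 }; // Hypothetical shared property

        constructor () {
            cls.#trusted ||= getTrusted();
        }

        increment () { return ++this.#insider.counter; } // Demo-only method

        static _getInsider (reqCls, instance, receiver) {
            if (!cls.#trusted.includes(reqCls)) throw new Error('Untrusted request');
            const passProps = Object.getOwnPropertyDescriptor(reqCls, '_passInsider');
            if (typeof passProps?.value !== 'function' || passProps.writable !== false || passProps.configurable !== false) throw new Error('Unsafe handoff');
            reqCls._passInsider(instance.#insider, receiver);
        }
    });
    Object.freeze(cls.prototype);
    return cls;
})();

const Partner = (() => {
    const cls = Object.freeze(class Partner { // Unrelated to Base
        static #insiderBaton = null;
        #baseInstance;
        #insider;

        constructor (baseInstance) {
            this.#baseInstance = baseInstance;
            Base._getInsider(cls, baseInstance, () => this.#insider = cls.#insiderBaton);
        }

        readCounter () { return this.#insider.counter; } // Demo-only method

        static _passInsider (insider, receiver) {
            cls.#insiderBaton = insider;
            receiver();
            cls.#insiderBaton = null;
        }
    });
    Object.freeze(cls.prototype);
    return cls;
})();

const base = new Base();
const partner = new Partner(base);
base.increment();
base.increment();
console.log(partner.readCounter()); // 2 (both objects see the same state)
```

The partner never extends Base; it only holds a reference to a Base instance and a cached reference to that instance's shared state object.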

Accessing #insider For Other Instances

The #insider state of other instances can be accessed in one of two ways (via another Partner object or via a Base object):

export const cls = /* ... */ class Partner {
    // ...
    #getInsiderFor (other) {
        // Native JS cross-instance private #insider access
        if (other instanceof cls) return other.#insider;

        // Cross-class cross-instance #insider access
        if (other instanceof Base) {
            let insider;
            Base._getInsider(cls, other, () => insider = cls.#insiderBaton);
            return insider;
        }
    }
} // ...

Resources

This code is also available on GitHub at https://github.com/bkatzung/insider-js.

Related

JavaScript Object Property Encapsulation: Beyond Public, Protected, And Private (Insider Properties)

JavaScript Object Property Encapsulation: Beyond Public, Protected, And Private

Background

JavaScript object properties are public by default.

Since ECMAScript (ES) 2022, there is native support for private properties (and methods) using the hash (#) prefix.

In my previous post, I demonstrate a way to implement natively-enforced (not based on the underscore (_) convention) protected-style properties that doesn’t require shared lexical scope or public accessors (i.e. it behaves very much like you would expect a native implementation to).

The historical way to simulate protected properties was to use a lexically-scoped WeakMap of instances to properties (or, more efficiently, to a protected-state object).

The way protected properties work, all sub-classes automatically get access to all protected properties of the current object instance. A method also gains access to other object instances if the other instance is at least as derived in the same class hierarchy (i.e. a method defined in class C has access to other instances if they are instanceof C).

The “legacy” lexically-scoped WeakMap approach is more like trusted-class access. A method at any class level can access any type of instance stored in the WeakMap, even if it’s a less-or-differently-derived instance.
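
A minimal sketch of that trusted-class behavior, assuming hypothetical Shape, Circle, and Label classes that share one WeakMap in a single scope:

```javascript
const stateMap = new WeakMap(); // Visible to every class in this scope

class Shape {
    constructor () { stateMap.set(this, { secret: 'shape' }); }
}

class Circle extends Shape {
    // A Circle method can read the state of any instance in the map,
    // including less-derived or unrelated ones
    peek (other) { return stateMap.get(other)?.secret; }
}

class Label { // Unrelated class in the same scope
    constructor () { stateMap.set(this, { secret: 'label' }); }
}

const circle = new Circle();
console.log(circle.peek(new Shape())); // "shape"
console.log(circle.peek(new Label())); // "label"
```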

What if you want the trusted-class capability, but without the lexically-shared-scope limitation (having to have classes appear within the same source file in order to share data)?

Enter “insider” properties.

“Insider” Properties

The insider-properties model allows you to share private state objects across trusted classes. The effect is similar to sharing access to a WeakMap, but classes can be in separate files without publicly-usable accessors for access distribution.

This model has three types of components:

  1. A trust component, which defines the scope of trust
  2. A base-class component, which creates the shared state and controls distribution
  3. One or more sub-class components, which request and cache access to the shared state

The Trust Component

The trust component is a stand-alone file that contains the “trust configuration”. It is also a JavaScript “barrel file” that exports the related base and trusted sub-classes.

This approach groups trusted classes in a way that can be deployment- or application-specific without modifying the class files themselves. It also ensures trusted-version consistency and proper dependency-loading resolution.

Here is the trust-component pattern:

// "Barrel" bundling of base + trusted sub-classes
import { Base } from './insider-base.js';
import { Sub } from './insider-sub.js';
export { Base, Sub };

let trusted;

// Return the trusted sub-class list
// (but not before the classes are initialized)
export const getTrusted = () => {
    // Adjust the trusted-class array below as required
    trusted ||= Object.freeze([Sub]);
    return trusted;
};

Base-Class Component

In addition to your core base-class behavior, the base class is also responsible for creating the initial, per-instance #insider state object and controlling distribution of access to trusted sub-classes (which will store a reference to the same shared object in their own #insider private fields).

Here is the base-component pattern:

import { getTrusted } from './insider-trusted.js';

export const Base = (() => {
    const cls = Object.freeze(class Base {
        static #trusted; // Base-class cached trusted-sub-class array
        static #insiderBaton = null; // Per-class handoff baton
        #insider = { /* Instance insider properties, e.g. insider: true */ };

        // Base-class constructor
        constructor () {
            // Cache trusted sub-class list upon first instantiation
            cls.#trusted ||= getTrusted();
            // Insider instance properties (this.#insider.prop) are ready here
        }

        /*
         * Base-class-only class-method to pass #insider access
         * @param {Class} reqCls - The requesting method's class (for proper handoff)
         * @param {Object} instance - The instance whose #insider is requested
         * @param {Function} receiver - Baton-receiver/instance-#insider-setter function
         */
        static _getInsider (reqCls, instance, receiver) {
            // Request must be for a class on the trusted list
            if (!cls.#trusted.includes(reqCls)) throw new Error('Untrusted request');
            // Make sure the handoff class-method is a frozen function
            const passProps = Object.getOwnPropertyDescriptor(reqCls, '_passInsider');
            if (typeof passProps?.value !== 'function' || passProps.writable !== false || passProps.configurable !== false) throw new Error('Unsafe handoff');
            // Use the supplied class-level handoff method to pass #insider to the receiver
            reqCls._passInsider(instance.#insider, receiver);
        }
    });
    Object.freeze(cls.prototype);
    return cls;
})();

Sub-Class Components

Sub-class constructors are responsible for requesting insider-property access and caching it in their private #insider field for use within each sub-class level.

They do this by calling a known, frozen, class-method of a known class (the base class), passing it their class object, the instance for which #insider properties are desired, and a “baton receiver function”.

The base class passes the #insider state object and the supplied receiver function to a known, frozen, “handoff” class-method of the specified class. The handoff function uses a class-specific baton (every trusted sub-class must have one), which the receiver function (as an instance method of the same class) is able to receive.

The #insider state of other instances can be received using the same mechanism, but by supplying an instance other than this and a receiver function that sets something other than this.#insider (typically a variable in the requesting method’s local scope).

The sub-class pattern looks like this:

 import { Base } from './insider-trusted.js';

 export const Sub = (() => {
    const cls = Object.freeze(class Sub extends Base {
        static #insiderBaton = null; // Handoff baton (in every sub-class)
        #insider; // Per-class-level private view of shared #insider state

        constructor () {
            super();
            Base._getInsider(cls, this, () => this.#insider = cls.#insiderBaton);
            // Insider instance properties (this.#insider.prop) are ready here
        }

        /*
         * Baton handoff function (all sub-classes); called by Base._getInsider
         * @param {Object} insider - The requested instance's #insider
         * @param {Function} receiver - The receiver function to call
         */
        static _passInsider (insider, receiver) {
            cls.#insiderBaton = insider;
            receiver(); // Receiver must be a cls method to accept the baton
            cls.#insiderBaton = null;
        }
    });
    Object.freeze(cls.prototype);
    return cls;
})();

Accessing Another Instance’s #insider

There are two ways to access another instance’s #insider.

The first method uses native JavaScript cross-instance, direct private-field access (e.g. otherInstance.#insider). This only works if the other instance is at least as derived as the accessing method’s class, in the same class hierarchy. This will always work from base-class methods, since trusted sub-classes are derived from the base class.

The second method uses Base._getInsider with a custom receiver function. Since it’s invoking a base-class method, it can be used between any instances (as long as the requesting method is in one of the trusted sub-classes).

        // Two ways to get another instance's #insider
        #getOtherInsider (other) {
            // #1: Only works if other instanceof cls (at least as derived in same hierarchy)
            // Should always work from Base
            const insider1 = other.#insider;
            // return insider1;

            // #2: Works for any instance (if called from a trusted-sub-class method)
            let insider2;
            Base._getInsider(cls, other, () => insider2 = cls.#insiderBaton);
            // return insider2;
        }

Security

Most forms of type checking in JavaScript, including those based on an object’s constructor or new.target, can be misdirected through code manipulation. Private element (#) access is managed directly at the JavaScript-engine level, however, and therefore does not have that problem. This model leverages that mechanism to verify that only methods of actually-trusted classes can gain access.

Base._getInsider will only ever pass #insider via a pre-determined method on a class it is configured to trust. A method in an untrusted class has two options:

  1. Follow the pattern, creating its own handoff function and supplying its own class to Base._getInsider
  2. “Lie”, passing a trusted class to Base._getInsider instead

In the first case, the class will not be on the trusted list, so Base._getInsider will throw an error without even attempting to distribute access. This result will be typical for trust misconfiguration (a sub-class that should be trusted wasn’t added to the trust configuration, or the wrong trust configuration is being loaded).

In the second case, Base._getInsider will distribute #insider access to the specified (trusted class) baton, but the requesting method, being of a different class, will have no access to the trusted-class baton. This might happen as the result of malicious code, or if “hard-wired” class names are being used instead of the boilerplate approach in the pattern as documented.
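
Both cases can be tried out with a single-file sketch like the one below. The Intruder class, the secret property, and the method names followPattern and lie are hypothetical; Base and Sub are condensed versions of the patterns above, with the trust list inlined.

```javascript
let trusted;
const getTrusted = () => (trusted ||= Object.freeze([Sub]));

const Base = (() => {
    const cls = Object.freeze(class Base {
        static #trusted;
        #insider = { secret: 42 }; // Hypothetical insider property
        constructor () { cls.#trusted ||= getTrusted(); }
        static _getInsider (reqCls, instance, receiver) {
            if (!cls.#trusted.includes(reqCls)) throw new Error('Untrusted request');
            const d = Object.getOwnPropertyDescriptor(reqCls, '_passInsider');
            if (typeof d?.value !== 'function' || d.writable !== false || d.configurable !== false) throw new Error('Unsafe handoff');
            reqCls._passInsider(instance.#insider, receiver);
        }
    });
    Object.freeze(cls.prototype);
    return cls;
})();

const Sub = (() => {
    const cls = Object.freeze(class Sub extends Base {
        static #insiderBaton = null;
        #insider;
        constructor () {
            super();
            Base._getInsider(cls, this, () => this.#insider = cls.#insiderBaton);
        }
        secret () { return this.#insider.secret; }
        static _passInsider (insider, receiver) {
            cls.#insiderBaton = insider;
            receiver();
            cls.#insiderBaton = null;
        }
    });
    Object.freeze(cls.prototype);
    return cls;
})();

class Intruder { // Not on the trusted list
    static #insiderBaton = null;

    static _passInsider (insider, receiver) {
        Intruder.#insiderBaton = insider;
        receiver();
        Intruder.#insiderBaton = null;
    }

    // Case 1: follow the pattern with its own (untrusted) class
    static followPattern (target) {
        try {
            Base._getInsider(Intruder, target, () => {});
        } catch (err) {
            return err.message;
        }
        return 'no error';
    }

    // Case 2: "lie" by naming a trusted class; the baton that gets
    // filled belongs to Sub, so only Intruder's own (empty) baton is readable
    static lie (target) {
        let stolen = 'nothing received';
        Base._getInsider(Sub, target, () => stolen = Intruder.#insiderBaton);
        return stolen;
    }
}

const target = new Sub();
console.log(target.secret());                // 42
console.log(Intruder.followPattern(target)); // "Untrusted request"
console.log(Intruder.lie(target));           // null
```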

The patterns here include code to freeze class and prototype objects. This is largely performative (only protecting against some types of coding mistakes, as opposed to actual attacks), however, unless you own the execution context and can ensure that Object.freeze(Object) is run before any untrusted code executes.

Resources

The code is also available on GitHub at https://github.com/bkatzung/insider-js.

Related

Implementing Cross-File JavaScript Protected Properties And Methods
Securely Sharing State Between Instances Of Unrelated JavaScript Classes

Implementing Cross-File JavaScript Protected Properties And Methods

Background

It’s often desirable to be able to control the visibility of an object’s properties. Sometimes it’s convenient for an object’s properties to be publicly accessible, sometimes base-classes and derived-classes need to share access, and sometimes you don’t want to allow any access outside of the defining class.

Many languages, including the TypeScript derivative of JavaScript, include access control keywords such as public, protected, and private for this. The options available natively within JavaScript are more limited (and TypeScript’s protections cannot protect against non-TypeScript-generated JavaScript).

JavaScript supports public object properties (the default), an unenforced convention of using an underscore (_) prefix before protected/private properties, and, since ECMAScript 2022, “private elements” (fields, methods, properties) via hash (#) name-prefixes.

Private elements are accessible by class. Any code within the defining class can access the private elements of any instance of that class. Private-element names must be unique across all of the private elements within a class, but are available for reuse in other classes. Code does not have access to the private elements of other classes within the same class hierarchy.
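
A short sketch of these rules, using hypothetical Account and Savings classes:

```javascript
class Account {
    #balance;
    constructor (balance) { this.#balance = balance; }

    // Any Account method can read #balance on any Account instance
    hasMoreThan (other) { return this.#balance > other.#balance; }

    // Brand check: true only for objects that carry Account's #balance
    static isAccount (obj) { return #balance in obj; }
}

class Savings extends Account {
    #balance = 'unrelated'; // A separate element; no clash with Account's #balance
    describe () { return this.#balance; }
}

const a = new Account(10), s = new Savings(50);
console.log(s.hasMoreThan(a));      // true (compares the Account-level values)
console.log(s.describe());          // "unrelated"
console.log(Account.isAccount(s));  // true
console.log(Account.isAccount({})); // false
```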

Intended Scope

The goal of this implementation is to make data accessible to all of the methods within an instance’s class hierarchy, and inaccessible (except via class-provided interfaces) to all other code.

class A {}
const a1 = new A(), a2 = new A();
class B extends A {}
const b1 = new B(), b2 = new B();
class C extends B {}
const c1 = new C(), c2 = new C();
class D {}
const d1 = new D();
function f () {}
  • a1 and a2 will have access to each other’s protected properties
  • Class A methods of b1 and b2 will have access to a1, a2, b1, and b2 (all instanceof A) protected properties
  • Class B methods of b1 and b2 will have access to b1 and b2 (instanceof B) but not to a1 or a2 protected properties
  • Class A methods of c1 and c2 will have access to a1, a2, b1, b2, c1, and c2 (all instanceof A) protected properties
  • Class B methods of c1 and c2 will have access to b1, b2, c1, and c2 (all instanceof B), but not to a1 or a2 protected properties
  • Class C methods of c1 and c2 will have access to c1 and c2 (instanceof C) but not to a1, a2, b1, or b2 protected properties
  • d1 and f will not have access to a1, a2, b1, b2, c1, or c2 protected data

Notice that you cannot gain additional access to an existing instance of an existing type by creating a new sub-class with additional methods (the additional methods only have access to instances of the new sub-class or of its own sub-classes).

Historical Approach: Lexical Scoping

One historical approach for storing protected (and before ES2022, private) properties is through the use of scoped storage, like this (note that I am using “guarded” instead of “protected” to avoid TypeScript (and possibly future JavaScript) keyword confusion):

const guardedMap = new WeakMap(); // <instance, protectedProperties>

class A {
  #guarded; // Class A private cached lookup result

  constructor () {
    const guarded = this.#guarded = { /* protected properties here */ };
    guardedMap.set(this, guarded);
    // Public properties: this.prop
    // Protected properties: this.#guarded.prop (or guarded.prop)
    // Private properties: this.#prop
  }
}

class B extends A { // In the same source file
  #guarded; // Class B private cached lookup result

  constructor () {
    super();
    const guarded = this.#guarded = guardedMap.get(this);
    // The same protected properties are now visible in class B methods
  }
}

However, requiring all of the related classes for a hierarchy to exist within a single file is often impractical for a number of reasons (file size, different authors, different development timeframes, etc.).

You can include accessor methods in the base class to allow sub-classes in other files to gain access, but then there’s nothing to prevent code outside of the class hierarchy from using the accessors to gain access too.
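
A minimal sketch of that leak, assuming a hypothetical _getGuarded accessor on the base class:

```javascript
class A {
  #guarded = { secret: 'hierarchy only' };
  // Accessor intended for sub-classes defined in other files
  _getGuarded () { return this.#guarded; }
}

class B extends A {
  reveal () { return this._getGuarded().secret; }
}

const b = new B();
console.log(b.reveal());             // "hierarchy only"
// Nothing stops unrelated code from calling the same accessor
console.log(b._getGuarded().secret); // "hierarchy only"
```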

Fortunately, with just a bit more effort, we can use a more tightly-controlled approach.

Goals For A Better Implementation

  • Related classes within a class hierarchy must have shared access to protected properties
  • Related classes must not need to reside within the same source file (i.e. support multiple lexical scopes)
  • Access from outside the class hierarchy should be prevented at the language level
  • Protected properties should be available for use as soon as possible during object construction
  • Avoid TypeScript (and maybe future JavaScript?) “protected” keyword confusion (I’ll continue to use “guarded” instead)
  • Some form of protected methods (methods that can only be invoked from within the class hierarchy)
  • Note: This implementation does not include generating nested protected scopes (a single protected scope will be shared across the class hierarchy)

“Threaded-Access” Strategy

Let’s use a different solution (one that doesn’t risk public access) by “threading” access between classes in the hierarchy via a common method defined in each class level, with each method invoking the next using “super”. Conceptually, the approach looks something like this:

// ** CONCEPT ONLY - CODE WILL NOT WORK **

class A {
  #guarded; // Class A private access to shared protected properties

  constructor () {
    const guarded = { /* protected properties here */ };
    this._setGuarded(guarded); // Initiate #guarded threading at target class
    // #guarded is now threaded; constructor returns to sub-class constructors
  }

  _setGuarded (guarded) { // Final stop for #guarded threading
    if (!this.#guarded) { // Only set once (during construction)
      this.#guarded = guarded;
    }
  }
}

class B extends A { // Can be in a different file
  #guarded; // Class B private access to same shared protected properties

  constructor () {
    super();
    // this.#guarded is now threaded and available for use
  }

  _setGuarded (guarded) {
    if (!this.#guarded) {
      this.#guarded = guarded;
      super._setGuarded(guarded); // Thread access up through the class hierarchy
    }
  }
}

The code above won’t work as-is, because private elements aren’t associated with the object until after the super call has completed. In this specific example, the B class this.#guarded does not yet exist at the time the B class _setGuarded is called from the A constructor, because the A constructor has not yet returned to the B constructor.
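
The timing problem can be observed directly. In this sketch (the trySet method and the result and after properties are hypothetical), the base constructor calls an overridden method before the sub-class private field exists:

```javascript
class A {
  constructor () {
    // Runs before B's private fields have been attached to `this`
    this.result = this.trySet({ shared: true });
  }
  trySet () { return 'base'; }
}

class B extends A {
  #guarded;

  constructor () {
    super();
    // B's private fields exist now that super() has returned
    this.after = this.trySet({ shared: true });
  }

  trySet (guarded) {
    try {
      this.#guarded = guarded;
      return 'set';
    } catch (err) {
      return err instanceof TypeError ? 'TypeError' : 'other';
    }
  }
}

const b = new B();
console.log(b.result); // "TypeError" (called from the A constructor)
console.log(b.after);  // "set" (called after super() returned)
```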

We can get around that problem by using a subscription-based, “pull model” that operates strictly within the class hierarchy. Working protected state also provides a way to offer pseudo-protected methods that can only be invoked from within the class hierarchy. The details are covered in the following section.

Cross-File, “Threaded” JavaScript Protected Properties
(Final Implementation)

// ** File 1 **

export class A { // Base class
  #guarded; // Class A private access to shared protected properties
  #guardedSubs = new Set(); // Protected-property subscriptions (setter functions)

  constructor () {
    const guarded = this.#guarded = { /* protected properties */ };
    this._subGuarded(this.#guardedSubs); // Invite subscribers
    // Public props: this.prop
    // Protected props: this.#guarded.prop (or guarded.prop)
    // Private props: this.#prop
  }

  // Distribute protected property access to ready subscribers
  // (base instance method)
  _getGuarded () {
    const guarded = this.#guarded, subs = this.#guardedSubs;
    try {
      for (const sub of subs) {
        sub(guarded); // Attempt guarded distribution to subscriber
        subs.delete(sub); // Remove successfully-completed subscriptions
      }
    }
    catch (_) { }
  }    

  // Optional base-class stub for sub-class interface consistency
  _subGuarded () { }

  method () { // Example consumer
    const guarded = this.#guarded;
    // Public props: this.prop
    // Protected props: this.#guarded.prop (or guarded.prop)
    // Private props: this.#prop
  }

  // A pseudo-protected (publicly visible, but access-controlled) method
  // Callers must supply their private #guarded to authenticate
  guardedMethod (guarded) {
    if (guarded !== this.#guarded) throw new Error('Unauthorized method call');
    // Caller is now confirmed to be in the class hierarchy for this instance
  }
}

// ** File 2 **

// import { A } from '...';

class B extends A {
  #guarded; // Class B private access to same shared protected properties

  constructor () {
    super();
    this._getGuarded(); // Obtain protected property access
    // this.#guarded is now populated and available for use here
    const guarded = this.#guarded;
  }

  _subGuarded (subs) { // Subscribe to protected properties
    super._subGuarded(subs); // Optional if super-class is the base
    subs.add((g) => { this.#guarded ||= g; }); // Set this.#guarded once (braces keep the setter from returning #guarded)
  }

  // A method within any class in the hierarchy can call a
  // pseudo-protected method on its own instance (all classes
  // in the hierarchy see the same #guarded)
  callGuardedMethod () {
    const guarded = this.#guarded;
    this.guardedMethod(guarded);
  }

  // A method can also call a pseudo-protected method on another
  // instance if the called instance is instanceof the calling
  // method's class (i.e. at least as derived in the same hierarchy)
  callOtherGuardedMethod (other) {
    try {
      other.guardedMethod(other.#guarded);
    } catch (_err) {
      // TypeError thrown if there is no <this class>-level #guarded
    }
  }
}

Explanation: Protected Properties

During construction, the base class invites sub-classes to subscribe to receive the protected properties (base #guarded object). Only classes in the hierarchy receive the invitation (it’s never externally accessible).

Classes wanting protected-property access respond to the invitation (they subscribe) by calling the super-method and then adding a setter function (which accepts a protected-properties object and sets their private #guarded) to the subscription-set (subs) passed to _subGuarded.

Important: The super-method (super._subGuarded(subs)) must be called before adding the setter function to the subscription-set so that setter functions get added in least-derived-class-to-most-derived-class (i.e. top-to-bottom) order.

Each sub-class constructor calls this._getGuarded() to set its private this.#guarded to the shared protected-properties object. This works by attempting to execute each setter function in the subscription-set collected by the base-class constructor. A setter will complete (and be removed from the subscription-set) only if the associated class has returned from its constructor’s super() call.

In any class in which the super() call has not yet returned, attempting to set its this.#guarded in its setter function will throw an exception (with the side effect of leaving the setter function in the subscription-set to be attempted again in a subsequent call).

The net effect is that the private this.#guarded gets set, class-by-class, right after each super() call completes.

The setter-function subscriptions are idempotent. It’s possible to recreate the subscription-set by calling _subGuarded post-construction and run all the setter functions again (attempting to set a different protected-properties object), but as the setters have already set each this.#guarded during construction, running them again has no effect.
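
To watch the class-by-class distribution happen, here is a three-level, single-file sketch. The helper methods pendingSubs, isShared, levelB, and levelC and the pendingAfterB property are hypothetical diagnostics, not part of the pattern. The setters use a braced body so that they return nothing to whoever invokes them.

```javascript
class A {
  #guarded;
  #guardedSubs = new Set();

  constructor () {
    this.#guarded = { id: 'shared' }; // Hypothetical protected property
    this._subGuarded(this.#guardedSubs);
  }

  _getGuarded () {
    const guarded = this.#guarded, subs = this.#guardedSubs;
    try {
      for (const sub of subs) {
        sub(guarded);
        subs.delete(sub);
      }
    } catch (_) { } // The first not-yet-ready level ends this pass
  }

  _subGuarded () { }
  pendingSubs () { return this.#guardedSubs.size; }        // Diagnostic only
  isShared (guarded) { return guarded === this.#guarded; } // Diagnostic only
}

class B extends A {
  #guarded;
  constructor () {
    super();
    this._getGuarded();
    this.pendingAfterB = this.pendingSubs(); // C's setter is still waiting
  }
  _subGuarded (subs) {
    super._subGuarded(subs);
    subs.add((g) => { this.#guarded ||= g; });
  }
  levelB () { return this.isShared(this.#guarded); }
}

class C extends B {
  #guarded;
  constructor () {
    super();
    this._getGuarded();
  }
  _subGuarded (subs) {
    super._subGuarded(subs);
    subs.add((g) => { this.#guarded ||= g; });
  }
  levelC () { return this.isShared(this.#guarded); }
}

const c = new C();
console.log(c.pendingAfterB);          // 1
console.log(c.pendingSubs());          // 0
console.log(c.levelB(), c.levelC());   // true true

// Re-running the setters post-construction with a forged object
const forged = new Set();
c._subGuarded(forged);
const results = [...forged].map((sub) => sub({ id: 'forged' }));
console.log(results);                  // [ undefined, undefined ]
console.log(c.levelB(), c.levelC());   // true true (unchanged)
```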

Explanation: Pseudo-Protected Methods

Once we have successfully created protected state (visible only to code within the class hierarchy), we can use that as a form of authentication token for “pseudo-protected” methods.

These are publicly visible methods (so not truly protected in the traditional sense) that throw an exception (or return a different, presumably innocuous, value) if not called from within the class hierarchy.

Since the calling code and the called code both independently know the non-public this.#guarded value (object) from the construction phase, the caller simply needs to pass it as a parameter. The called code compares the passed value to its own this.#guarded, confirming the caller is within the class hierarchy if they match.

Note that an instance cannot cross-instance call a pseudo-protected method on a less-derived instance than the calling method’s class. Given instances:

const a = new A(), b = new B(); // where B extends A

a can call protected methods on b (and vice-versa for A-class methods of b) because a and b both have an A-level #guarded. B-class methods of b, however, cannot call protected methods on a because there is no B-level #guarded for a.

If you need “real” protected methods, you can add bound methods/functions to the protected state so that they’re only visible within the class hierarchy, but these would need to be bound and added per instance.
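
One possible shape for this, sketched on top of the threaded pattern (the #audit method, the #log field, and the save and logSize methods are hypothetical names for the demo):

```javascript
class A {
  #guarded;
  #guardedSubs = new Set();
  #log = [];

  constructor () {
    const guarded = this.#guarded = {};
    // Bound per instance so sub-classes can call guarded.audit(...)
    guarded.audit = this.#audit.bind(this);
    this._subGuarded(this.#guardedSubs);
  }

  // Private to A, but reachable within the hierarchy via the protected state
  #audit (msg) {
    this.#log.push(msg);
    return this.#log.length;
  }

  _getGuarded () {
    const guarded = this.#guarded, subs = this.#guardedSubs;
    try {
      for (const sub of subs) {
        sub(guarded);
        subs.delete(sub);
      }
    } catch (_) { }
  }

  _subGuarded () { }
  logSize () { return this.#log.length; }
}

class B extends A {
  #guarded;

  constructor () {
    super();
    this._getGuarded();
  }

  _subGuarded (subs) {
    super._subGuarded(subs);
    subs.add((g) => { this.#guarded ||= g; });
  }

  save () { return this.#guarded.audit('saved'); }
}

const b = new B();
console.log(b.save());       // 1
console.log(b.logSize());    // 1
console.log(typeof b.audit); // "undefined" (not publicly visible)
```

The cost is one bound function per protected method per instance, which is the trade-off mentioned above.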

Resources

This code is also available on GitHub at https://github.com/bkatzung/protected-js.

Related

JavaScript Object Property Encapsulation: Beyond Public, Protected, And Private

JavaScript Pattern For Deferred/Just-In-Time Subclass Prototype Initialization

Impetus And Use Case

The impetus for this code pattern is to be able to support class hierarchies spanning multiple “async defer”-loaded JavaScript files.

Developing A Solution

A typical SubClass.prototype = new SuperClass or Object.create(SuperClass.prototype) won’t work because a super-class may not have finished loading when a subclass is defined.

To avoid order-of-execution issues, just-in-time initialization of the prototype is performed upon the first constructor invocation. The prototype of the default new instance is already bound by the time the constructor function executes, so the constructor function must return a new “new” instance after switching prototypes.

The constructor calls a helper class-method, $c, to perform the just-in-time initialization. This method replaces itself during initialization to prevent reinitialization in subsequent calls.

Both versions of the helper method call a second helper method, $i, to (potentially) perform instance initialization. This method is registered as both an instance method (for polymorphic invocation) and a class method (as a shortcut for prototype.$i, to facilitate super-initialization in subclasses).

To prevent any initialization of instances for subclass prototypes and duplicate initialization of replacement new objects, the constructor accepts its class object as a sentinel value to indicate that no initialization should be applied.

When the sentinel value is supplied to the constructor, the single parameter false is passed from the constructor to $c and from $c to $i. Otherwise, the constructor’s arguments object is passed as the only parameter instead.

Sample Pseudo-Trace

Here’s a simplified view of what the flow of execution might look like creating the first instance of a subclass using a previously initialized super-class for its prototype.

instance = new Sub(...parameters) // Initial super is Object
  Sub.$c_1x(Arguments [...parameters])
    Sub.prototype = new Super(Super)
      Super.$c(false)
        Super.$i(false)
    new Sub(Sub) // New super is Super
      Sub.$c(false)
        Sub.$i(false)
          Super.$i(false)
    Sub.$i(Arguments [...parameters])
      Super.$i(Arguments [...parameters])

Code Pattern

function Class () {
    var c = Class;
    return c.$c.call(this, arguments[0] !== c && arguments);
}
Class.$c = function (args) { // JIT 1x class init helper
    // var c = Class, p = c.prototype; // "Base" classes (Object subclasses)
    var s = SuperClass, c = Class, p = c.prototype = new s(s); // Subclasses
    p.constructor = c; // Subclasses
    c.$c = function (args) { return this.$i(args); }; // Post-init helper
    p.$i = c.$i = function (args) { // Instance init helper
        s.$i.call(this, args); // Subclasses
        if (!args) return this; // Skip init on false
        // Add this.properties here
        return this;
    };
    // Add p.methods here
    // return this.$i(args); // Base classes (original prototype)
    /*
     * We need to return a new "new" to pick up the new subclass prototype.
     * Note that new c(c) invokes $c(false) which invokes $i(false)
     * before returning here for (possible) initialization.
     */
    return new c(c).$i(args); // Subclasses
};
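
As a concrete (and hypothetical) instance of the pattern, the sketch below fills in the template for a base class Animal and a subclass Dog. Dog is deliberately defined first to mimic a subclass file that loads before its super-class file.

```javascript
// Subclass defined before its super-class has been set up
function Dog () {
    var c = Dog;
    return c.$c.call(this, arguments[0] !== c && arguments);
}
Dog.$c = function (args) { // JIT 1x class init helper (subclass form)
    var s = Animal, c = Dog, p = c.prototype = new s(s);
    p.constructor = c;
    c.$c = function (args) { return this.$i(args); };
    p.$i = c.$i = function (args) {
        s.$i.call(this, args);
        if (!args) return this;
        this.tricks = args[1];
        return this;
    };
    p.speak = function () { return this.name + ' knows ' + this.tricks + ' tricks'; };
    return new c(c).$i(args);
};

// Base class (Object subclass) using the "Base" variant lines
function Animal () {
    var c = Animal;
    return c.$c.call(this, arguments[0] !== c && arguments);
}
Animal.$c = function (args) { // JIT 1x class init helper (base form)
    var c = Animal, p = c.prototype;
    c.$c = function (args) { return this.$i(args); };
    p.$i = c.$i = function (args) {
        if (!args) return this;
        this.name = args[0];
        return this;
    };
    p.describe = function () { return this.name + ' is an animal'; };
    return this.$i(args);
};

var rex = new Dog('Rex', 3); // First instantiation triggers both JIT inits
var fido = new Dog('Fido', 1); // Later instantiations use the post-init helper
console.log(rex.speak());            // "Rex knows 3 tricks"
console.log(fido.describe());        // "Fido is an animal"
console.log(rex instanceof Animal);  // true
console.log(rex.constructor === Dog); // true
```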

Ruby Sub-Classes/Inheritance, Include, And Extend

Overview

Ruby Objects, Modules, and Classes

  • In Ruby, an object is a collection of (zero or more) instance variables. It also has a class (see below) and possibly a lazily-created singleton class to hold object-specific methods.
  • A module is an object containing a collection of (zero or more) constants, class variables, instance methods, and included modules. You can include a module in another module and you can extend most objects with a module. Since Ruby 2, you can also prepend a module to a module.
    # Parts of a module
    CONSTANT = "I'm a constant"
    @@class_var = "I'm a class variable"
    @class_inst_var = "I'm a class instance variable" # in a class/module definition
    def self.method; "I'm a class method"; end
    class << self
      def another_method; "I'm a class method too"; end
    end
    def method
      @inst_var = "I'm an instance variable" # inside an instance method
      "I'm an instance method"
    end
  • A class is a sub-class of module.
    • Each class has a parent class called a super-class. The child class is called a sub-class. The class inherits the behaviors of the super-class. New classes are sub-classes of the Object class unless you specify otherwise.
    • Classes can typically be instantiated via the new method.
    • Classes are not valid parameters for include or extend.
  • A “def method” adds a method to the “currently open” class or module. A “def object.method” adds a method to the singleton class for the object.
  • When you include a module (let’s call it M1) in another module (let’s call it M2), M1’s constants and instance methods become visible in M2 (as constants and instance methods), and M1 will appear in M2’s included_modules list. M1’s class methods are not added to M2 (but see Including Class Methods below).
  • When you extend an object with a module, the module’s instance methods are added to the object via an automatically-generated anonymous super-class of the singleton class (one for each extending module). In the case where the extended object is a module, the added methods are class methods, not instance methods. The object is unaffected by the module’s constants or class methods.

Confirming The Effects Of include And extend In Modules

The following program can be used to see the effect of using include and extend in modules (and classes):

module Inner
    INNER = "Inner constant"
    def self.inner_cm; "Inner class method"; end
    def inner_im; "Inner instance method"; end
end

module Outer
    include Inner;
    OUTER = "Outer constant"
    def self.outer_cm; "Outer class method"; end
    def outer_im; "Outer instance method"; end
end

module Extension
    EXT = "Extension constant"
    def self.ext_cm; "Extension class method"; end
    def ext_im; "Extension instance method"; end
end

class MyClass; include Outer; extend Extension; end

puts "Constants: " +
    (MyClass.constants(true) - Object.constants(true)).inspect
puts "Class methods: " + (MyClass.methods - Object.methods).inspect
puts "Instance methods: " +
  (MyClass.instance_methods - Object.instance_methods).inspect

The output is as follows:

Constants: [:OUTER, :INNER]
Class methods: [:ext_im]
Instance methods: [:outer_im, :inner_im]

Method Resolution Order

The following program can be used to show the class/module hierarchy and order of method resolution for sub-classing (inheritance), include, and extend:

module Mod1; def m; puts "Mod 1"; super; end; end
module Mod2; def m; puts "Mod 2"; super; end; end
module Mod3; def m; puts "Mod 3"; super; end; end
module Mod4; def m; puts "Mod 4"; super; end; end
module Mod5; def m; puts "Mod 5"; super; end; end
module Mod6; def m; puts "Mod 6"; super; end; end
class Base; def m; puts "Base"; end; end
class Sub < Base
    include Mod1, Mod2; include Mod3
    def m; puts "Sub"; super; end
end
o = Sub.new.extend(Mod4, Mod5).extend Mod6
puts "Sub ancestors: " + o.class.ancestors.inspect
o.m

Regrettably, the include and extend methods process their parameters from last to first. Within a single call, the first-listed module is therefore searched first, while across separate calls the most recent call is searched first. The output is as follows:

Sub ancestors: [Sub, Mod3, Mod1, Mod2, Base, Object, Kernel, BasicObject]
Mod 6
Mod 4
Mod 5
Sub
Mod 3
Mod 1
Mod 2
Base

Pictorially, it looks like this (with the number in parentheses indicating the search order):

[Figure: Ruby extend/include/Sub-class Method Resolution Order]
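
The last-to-first parameter handling can be checked in isolation. This is a small sketch using hypothetical names (ModA, ModB, Multi, Split); it compares one include call with two modules against the two separate calls that give the same ordering:

```ruby
module ModA; end
module ModB; end

class Multi; include ModA, ModB; end            # one call, two modules
class Split; include ModB; include ModA; end    # same result, written as two calls

puts Multi.ancestors.take(3).inspect
puts Split.ancestors.take(3).inspect
```

Both classes search ModA before ModB, because "include ModA, ModB" inserts ModB first and then places ModA in front of it.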

Including Class Methods

It is also possible to add class methods as part of an include or to add instance methods as part of an extend using the included or extended callbacks, respectively:

module Inc_Me
  def inst_m; end
  module ClassMethods; def class_m1; end; end
  def self.included (base)
    base.class_exec do
      extend ClassMethods     # method 1 - extend with named sub-module
      Module.new do           # method 2 - extend with anonymous module
        def class_m2; end
      end.tap { |mod| extend mod }
      def self.class_m3; end  # method 3 - add directly to the class
    end
  end
end

module Ext_Me
  def class_m; end            # instance method here, class there
  module InstanceMethods; def inst_m1; end; end
  def self.extended (base)
    base.class_exec do
      include InstanceMethods # method 1
      Module.new do           # method 2
        def inst_m2; end
      end.tap { |mod| include mod }
      def inst_m3; end        # method 3
    end
  end
end

module M1; include Inc_Me; end
puts "M1 class methods: " + (M1.methods - Object.methods).inspect
puts "M1 instance methods: " +
  (M1.instance_methods - Object.instance_methods).inspect
puts "M1 included modules: " + M1.included_modules.inspect, ''

module M2; extend Ext_Me; end
puts "M2 class methods: " + (M2.methods - Object.methods).inspect
puts "M2 instance methods: " +
  (M2.instance_methods - Object.instance_methods).inspect
puts "M2 included modules: " + M2.included_modules.inspect

which produces:

M1 class methods: [:class_m3, :class_m2, :class_m1]
M1 instance methods: [:inst_m]
M1 included modules: [Inc_Me]

M2 class methods: [:class_m]
M2 instance methods: [:inst_m3, :inst_m2, :inst_m1]
M2 included modules: [#<Module:0x00000000cbd108>, Ext_Me::InstanceMethods]

It is better to use the include-with-extend method (as in module Inc_Me) than the extend-with-include method (as in module Ext_Me), as the primary module name gets included in the included_modules list.

It is also better to extend with a sub-module (methods 1 or 2) rather than adding the class methods directly (method 3), since the extended modules are each added to a separate, invisible super-class instead of to the including module itself. The benefit here is that the behaviors can be chained using super if desired, as shown by this code:

module Inc1
  module ClassMethods; def m1; puts "Inc1 m1"; super rescue nil; end; end
  def self.included (base)
    base.class_exec do
      extend ClassMethods
      Module.new do
        def m2; puts "Inc1 m2"; super rescue nil; end
      end.tap { |mod| extend mod }
      def self.m3; puts "Inc1 m3"; super rescue nil; end
    end
  end
end

module Inc2
  module ClassMethods; def m1; puts "Inc2 m1"; super rescue nil; end; end
  def self.included (base)
    base.class_exec do
      extend ClassMethods
      Module.new do
        def m2; puts "Inc2 m2"; super rescue nil; end
      end.tap { |mod| extend mod }
      def self.m3; puts "Inc2 m3"; super rescue nil; end
    end
  end
end

module M; include Inc2, Inc1; end
M.m1; M.m2; M.m3

which produces:

Inc2 m1
Inc1 m1
Inc2 m2
Inc1 m2
Inc2 m3

The included Callback And Nested Includes

If your module includes other modules, the included callbacks for the other modules (if present) will be called when they are included in your module, but not when your module is included elsewhere. This code shows the problem:

module M1
  CONST1 = 'M1 constant'
  module ClassMethods; def cm1; 'M1 class method'; end; end
  def im1; 'M1 instance method'; end
  def self.included (base)
    puts "#{self} included in #{base}"
    base.class_exec { extend ClassMethods }
  end
end

module M2
  include M1
  def self.included (base); puts "#{self} included in #{base}"; end
end

module M3; include M2; end

puts "M2 class methods: " + (M2.methods - Object.methods).inspect
puts M3::CONST1
puts "M3 class methods: " + (M3.methods - Object.methods).inspect
puts "M3 instance methods: " +
  (M3.instance_methods - Object.instance_methods).inspect

which produces:

M1 included in M2
M2 included in M3
M2 class methods: [:included, :cm1]
M1 constant
M3 class methods: []
M3 instance methods: [:im1]

The including module’s included callback should therefore call the included callback for any included modules if none of the base object’s ancestors have previously included the other modules:

def M2.included (base)
  puts "#{self} included in #{base}"
  M1.included base if M1.respond_to?(:included) &&
   (!base.respond_to?(:superclass) || !base.superclass.include?(M1))
end

which, after the change, produces:

M1 included in M2
M2 included in M3
M1 included in M3
M2 class methods: [:included, :cm1]
M1 constant
M3 class methods: [:cm1]
M3 instance methods: [:im1]
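
The superclass check in the revised callback matters when a class and its sub-class both include the module. The sketch below assumes that situation; CALLS, Parent, and Child are hypothetical names added here to log which callbacks run:

```ruby
CALLS = []  # hypothetical log of [module, base] pairs, one per callback invocation

module M1
  module ClassMethods; def cm1; 'M1 class method'; end; end
  def self.included (base)
    CALLS << [self, base]
    base.extend ClassMethods
  end
end

module M2
  include M1
  def self.included (base)
    CALLS << [self, base]
    # Forward to M1 unless base's superclass has already included M1
    M1.included base if M1.respond_to?(:included) &&
      (!base.respond_to?(:superclass) || !base.superclass.include?(M1))
  end
end

class Parent; include M2; end
class Child < Parent; include M2; end   # Parent already includes M1

CALLS.each { |mod, base| puts "#{mod} included in #{base}" }
puts Child.cm1   # inherited from Parent's singleton class
```

M1's callback is forwarded for Parent but skipped for Child, which still gets cm1 through inheritance.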

Download It

A Ruby gem (called extended_include) based on this posting is available at rubygems.org.

Ruby Gem Sarah Version 2.0.1 Released

Ruby Gem Sarah version 2.0.1 has just been released.

What Is It?

Sarah is a combination sequential array, sparse array, and (“random access”) hash.

Ruby’s own array literal and method calling syntaxes allow you to specify a list of sequential values followed by either an implicit or explicit hash of name/value pairs stored at the end of the array. Sarah takes this concept a few steps further.

Values with sequential indexes beginning at 0 are typically stored in the sequential array for efficiency. You can also assign values with non-sequential indexes, and these values are stored in the sparse array (which is actually implemented as a hash). The sequential and sparse arrays work together like a traditional Ruby array, except that there can really be empty holes with no values (as opposed to having nil values as place-holders where no other value has been set in the case of a traditional Ruby array). You can perform most of the typical array operations, including pushing, popping, shifting, unshifting, and deleting. These result in the re-indexing of sparse values in addition to sequential values after the point of insertion or deletion, just as if they had all been stored in a traditional Ruby array.
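
To see the nil place-holders that a traditional Ruby array creates, consider this plain-Ruby sketch. It does not use Sarah; the Hash named sparse is only an assumed stand-in for the idea of sparse storage, not Sarah's actual internals:

```ruby
array = ['first']
array[5] = 'second'        # assigning past the end pads the gap with nil values

sparse = { 0 => 'first', 5 => 'second' }   # a Hash keeps only the indexes that were set

puts array.inspect         # the padded array, including the nil place-holders
puts array.length          # counts the place-holders too
puts sparse.size           # counts only the stored values
puts sparse.key?(3)        # a true hole: no entry exists at index 3
```

In the array, index 3 holds a nil that cannot be told apart from a nil stored deliberately, whereas the hash records no entry at all.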

Values stored with non-integer keys are stored in a separate “random access” (i.e. unordered) hash. Re-indexing of the sequential and sparse arrays does not affect these key/value pairs.

Instead of accessing sparse and random-access values through a hash at the end of the array first, these values all appear at the same level. Compare:

# Traditional Ruby array with implicit hash
a = ['first', 5 => 'second', :greeting => 'hello']
# a[0] = 'first'
# a[1] is a hash
# a[1][5] = 'second'
# a[1][:greeting] = 'hello'

# Using a Sarah
s = Sarah['first', 5 => 'second', :greeting => 'hello']
# s[0] = 'first'
# s[5] = 'second'
# s[:greeting] = 'hello'

Why Should I Use It?

Sarah provides a pure-Ruby sparse array implementation, and can easily be the basis for a pure-Ruby sparse matrix implementation. It also provides efficient linear storage and manipulation in case you don’t know in advance if your data will be sequential or sparse in nature (i.e. it can vary significantly based on user input).

By default, negative indexes are interpreted relative to the end of the array. However, if it’s appropriate to your problem domain, Sarah also has a mode that supports negative indexes as actual indexes. In this mode, insertions and deletions do not result in value re-indexing.

Ruby Gem XKeys Version 2.0.0 Released

Ruby Gem XKeys version 2.0.0 has just been released.

What Is It?

XKeys is a module that can be included in Ruby classes or used to extend Ruby objects to provide convenient handling of nested arrays or hashes, including Perl-like auto-vivification, PHP-like auto-indexing, and per-access default values.

Perl-Like Auto-Vivification For Ruby

A fairly common Ruby programming question, especially for current and former Perl programmers, is how to automatically generate intermediate nodes in nested array and hash structures.

Say, for example, that you want to keep some sort of running tally grouped by year, month, and day. In Perl, this is easily accomplished as follows:

my %tally; # top-level hash of tallies
# and later...
++$tally{$year}{$month}{$day}; # increment tally by year/month/day

Perl will automatically create nested arrays or hashes as you attempt to write to them. They just “spring to life” when you need them; the process is called auto-vivification.

In straight Ruby, implementing the example is more cumbersome…

tally = {} # top-level hash of tallies
# and later...
tally[year] ||= {} # make sure year hash exists
tally[year][month] ||= {} # make sure month hash exists
tally[year][month][day] ||= 0 # make sure day value exists
tally[year][month][day] += 1 # increment tally by year/month/day

Alternatively, you can give the top-level hash a default block that creates new hashes whenever a non-existent node is referenced. However, the nodes are then created when reading (getting) the nested structure rather than only when writing (setting) it, so you get new nodes even when you’re “just looking”.
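
A minimal sketch of that block-based approach follows, using only core Ruby. The sample keys are made up for illustration. Note that the leaf values here default to hashes, so a counter would still need its own initialization:

```ruby
# Each missing key gets a new hash that reuses the same default block
tally = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }

tally[2024][1][15] = 1     # writing creates the intermediate year and month hashes
tally[1999][12]            # merely reading creates nodes as well

puts tally.keys.inspect    # 1999 now appears even though nothing was stored under it
puts tally[1999].inspect
```

The read of tally[1999][12] leaves behind both a year node and an empty month node, which is the side effect described above.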

Using the XKeys gem, the code becomes easier again:

require 'xkeys'
tally = {}.extend XKeys::Hash
# and later...
tally[year, month, day, :else => 0] += 1

The “:else” value is used when the value doesn’t exist yet (this avoids generating an error trying to add 1 to nil on the first tally of each day). Missing nodes are automatically added, but only on write, not on read.

PHP-Like Auto-Indexing For Ruby

PHP allows you to auto-index items being added to the end of an array by leaving the array subscript empty. For example:

$languages = array();
$languages[] = 'Perl'; # assigned to $languages[0]
$languages[] = 'PHP'; # assigned to $languages[1]
$languages[] = 'Ruby'; # assigned to $languages[2]

XKeys allows you to do something similar using the symbol :[] with arrays or other types of containers supporting the #push method. This is called “push mode”. In Ruby using XKeys, it looks like this:

require 'xkeys'
languages = [].extend XKeys::Auto
languages[:[]] = 'Perl' # languages.push 'Perl' ==> languages[0]
languages[:[]] = 'PHP' # languages.push 'PHP' ==> languages[1]
languages[:[]] = 'Ruby' # languages.push 'Ruby' ==> languages[2]