Active File

A Scripting Pattern

Problem

How can we save data from a scripted application in a form which is easy to read and parse in both the scripting language and other languages?

Context

A script or component needs to write its internal data structures into a file and read them back later.

Forces

  1. Data must be translated between various internal data structures and a clean and readable file format.
  2. Files must often be read by other programs written in both the scripting language and other languages.
  3. Writing code to parse complex data formats is tedious.
  4. Regular expressions are powerful parsing tools but are often hard to read and maintain, especially for people for whom programming is not their primary occupation.
  5. Sometimes humans need to edit data files by hand.

Solution

Most scripting languages provide meta-programming capabilities in which code can be treated as data, allowing a script to dynamically build and execute new scripts. By using meta-programming "back-to-front" - treating data as code - a script can write its internal data structures into a file as a script using script commands to describe the structure of the data. An application that wants to read the data file must define the commands that are used in the file such that they build internal data structures. When the data file is executed, the commands are invoked and the internal data structures are recreated.
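As a minimal sketch of this idea in Tcl, the writer below emits one command per record and the reader defines that command before executing the file. The names employee, SaveStaff, LoadStaff, the staff array and the file staff.dat are all hypothetical, chosen only for illustration.

```tcl
# Writer: emit one "employee" command per record.
proc SaveStaff {path records} {
    set stream [open $path w]
    foreach {name dept} $records {
        # list quotes each field so it reads back as exactly one argument
        puts $stream [list employee $name $dept]
    }
    close $stream
}

# The command used in the data file: it rebuilds the internal structure.
proc employee {name dept} {
    global staff
    set staff($name) $dept
}

# Reader: clear the old state, then execute the data file.
proc LoadStaff {path} {
    global staff
    array unset staff
    source $path
}

# Round trip through a file in the current directory.
SaveStaff staff.dat {{Ada Lovelace} Engineering {Grace Hopper} Research}
LoadStaff staff.dat

set key "Ada Lovelace"
puts $staff($key)    ;# prints Engineering
```

The reader contains no parsing code at all: the interpreter splits each line into words and dispatches it to employee.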

Consequences

The simple syntax of most scripting languages means it is relatively easy to parse such files in other languages. Alternatively, one can embed a script interpreter into a program to process these data files.

There is a danger that the file format can be too tightly coupled to the implementation of a particular application. Make sure that the commands are an abstraction of the data held in the file and do not expose any implementation details.

Because an Active File is executed by the programs that process it, it is important to take security into account when deciding whether or not to use this pattern. In general, it is dangerous to allow multiple users to read from and write to Active Files because a malicious user would be able to write arbitrary script commands into a file to be run by other users, probably without their knowledge. However, the pattern is safe when files are used only by individual users, or are written by a trusted user and read by other users. Users must use operating-system security mechanisms to ensure that files created using this idiom are secure. Alternatively, a program could execute an Active File in a safe interpreter that has restricted capabilities and so can trap operations that are potentially unsafe.
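In Tcl, the safe-interpreter option might look like the sketch below. It assumes the data file uses a single command, CreateTask. The body of CreateTask shown here is a hypothetical stand-in, and ReadFileSafely is an illustrative name rather than part of any existing program. The file is read by the trusted interpreter and evaluated in a safe one, in which commands such as exec, open and source are hidden.

```tcl
# Stand-in for the application's real command; this body is hypothetical.
proc CreateTask {mnemonic urgency importance description} {
    global tasks
    lappend tasks(list) $mnemonic
    set tasks($mnemonic,description) $description
}

# Evaluate an Active File in a safe interpreter whose only link to the
# application is the CreateTask command.
proc ReadFileSafely {path} {
    # Read the text in the trusted interpreter: a safe one cannot open files.
    set stream [open $path r]
    set script [read $stream]
    close $stream

    set safe [interp create -safe]
    # Calls to CreateTask inside the safe interpreter run the real proc here.
    interp alias $safe CreateTask {} CreateTask
    set failed [catch {interp eval $safe $script} message]
    interp delete $safe

    if {$failed} {
        error "refused to load $path: $message"
    }
}
```

Two limits of this sketch are worth noting. Commands that appear before an offending one have already run by the time the error is raised, so an application may prefer to build into a temporary structure and commit only on success. A safe interpreter also does not by itself bound how long a script runs or how much memory it uses.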

Known Uses

The Object-Tcl library links Tcl and C++ and allows C++ objects to be controlled by Tcl scripts. The programmer defines the interface of the C++ objects to be scripted using a class definition language (CDL). This CDL is actually a Tcl script that the CDL compiler evaluates to collect information about the C++ class before generating code that implements Tcl commands to create and invoke objects of that class.

The TaskSpace todo-list program stores the user's tasks as a script which is evaluated when the program starts to recreate the last saved state of the program.

Steve Crane's 8-bit Microcomputer Emulator uses Active Files to store configurations of emulated devices. Configurations are saved as scripts that are sourced by the Tcl interpreter to rebuild the configuration.

Example

Here's the Tcl code for file input/output from my TaskSpace program. I think it is a good illustration of how this pattern simplifies file parsing.

# Global variables used by the program

...
set todo(file)		~/.tasks
...

# Info about each task is stored in the tasks array.  Element tasks(list)
# contains a list of all tasks, while elements of the form tasks($task,???)
# store information about individual tasks

set tasks(list) {}

...


# Create new tasks - this proc is called in response to user input
#                    and when tasks are read in from the task file

proc CreateTask {mnemonic urgency importance description} {
    ...
}


# File I/O

proc ReadFile {} {
    global todo
    
    if {[file readable $todo(file)]} {
        source $todo(file)
    }
}

proc SaveFile {} {
    global todo tasks
    
    set stream [open $todo(file) w]
    
    foreach t $tasks(list) {
        puts $stream "CreateTask [list $tasks($t,short) $tasks($t,urgency) \
        	$tasks($t,importance) $tasks($t,description)]"
    }
    
    close $stream
}
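
The list command in SaveFile is what keeps the format robust: it quotes each field so that a description containing spaces, quotes, brackets or dollar signs is read back as a single, unmodified argument. The sketch below illustrates that round trip. The task data is made up, and CreateTask is replaced by a hypothetical stand-in that only records its arguments.

```tcl
# Stand-in for CreateTask: just remember the arguments it was called with.
proc CreateTask {mnemonic urgency importance description} {
    global received
    set received [list $mnemonic $urgency $importance $description]
}

# A description full of characters that are special to Tcl.
set description {Pay $20 to "Bob" [urgent] {today}}

# Build the line the same way SaveFile does, then execute it as source would.
set line "CreateTask [list bills 1 2 $description]"
puts $line
eval $line

# The description survives the round trip unchanged.
puts [expr {[lindex $received 3] eq $description}]    ;# prints 1
```

Writing the fields without list, for example by interpolating them directly into the string, would let the interpreter substitute $20 and [urgent] when the file is read back.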

Nat Pryce (np2@doc.ic.ac.uk)