ShellSort: A Multi-Level Array Sorting Method in 4D Code

  by Naranker Dulay,  nd@doc.ic.ac.uk,  December 1996
  

Introduction

  As seasoned 4D programmers will know, multi-level array sorts are not easy
  to achieve with 4D's SORT ARRAY command; they require that the programmer
  build, and then sort on, a string or text array that holds a concatenation
  of the sort keys. The key values need to be converted to string type, 
  padded to fixed character-widths and the value inverted if a descending 
  order is required (inverting real and date values is also somewhat tricky).
  The popular alternative is, of course, to buy a plug-in such as
  Foresight's DisplayList or DynArray both of which include a multi-level 
  array sorting method.
  
  This note presents yet another means of performing a fast multi-level 
  array sort, namely by the implementation of a multi-level sorting 
  algorithm directly in 4D code.  We've called the resulting method 
  ShellSort and hope that some of you find it useful.


Uses

  Here are a few suggestions on when you might wish to consider using 
  ShellSort:
  
  1) When you don't wish to purchase or deploy a plug-in for multi-level 
     array sorts.
  2) When you require sorts (single or multi-level) on built arrays.
  3) When you require sorts (single or multi-level) on hundreds or even 
     thousands of arrays in one call.
  4) When ShellSort outperforms other sorting methods.
  5) When you wish to modify or extend (or debug) the ShellSort source code.
  
  Like other code-intensive methods, we cannot recommend using ShellSort
  in interpreted databases where "large" arrays may need to be sorted. 
  Interpreted 4D is just too slow for such tasks. If you need fast 
  multi-level array sorting in interpreted 4D databases consider purchasing
  a plug-in.
  
  ShellSort is fast compiled. Performance, however, depends on many factors, 
  such as the sorting algorithm, the processor speed, the sizes of processor
  caches, the types and sizes of arrays being sorted, the pre-sortedness of
  the data, the number of distinct values in the sort data, the quality of 
  4D compiled code.
 

The Interface(s)

  There are two interfaces for ShellSort:
  
  Call Interface 1 (modelled on da_sort from Foresight's DynArray plug-in)
  ----------------
  
    ShellSort (sort directions; arraypointer1 {;...; arraypointerN})

    The parameters for this call are:
      
    $1 (sort directions)
       a text string consisting of '>' (ascending) and '<' (descending) 
       characters.  The number of sort levels (sort keys), K, is given by 
       length($1).  
       
       At least one sort direction character must be specified.
          
    $2; ...; $N
       pointers to the arrays to be sorted.  Arrays can be of type INTEGER,
       LONGINT, REAL, STRING, TEXT, DATE or BOOLEAN.  PICTURE and POINTER
       arrays can be passed as non-key arrays.
     
       At least one array pointer must be passed.
     
  Call Interface 2
  ----------------
  
    ShellSort (sort directions + "A"; builtarraypointer)

    The parameters for this call are:
      
    $1 (sort directions) 
       a text string consisting of '>' (ascending) and '<' (descending) 
       characters followed by the letter A (uppercase or lowercase). The 
       number of sort levels (sort keys), K, is given by length($1)-1 for
       this call interface.
     
       At least one sort direction character must be specified.
          
    $2
       a pointer to an array of pointers to the arrays to be sorted. The 
       arrays to be sorted can be of type INTEGER, LONGINT, REAL, STRING, 
       TEXT, DATE or BOOLEAN.  PICTURE and POINTER arrays can be passed as 
       non-key arrays.
     
       The built array must have at least one element, i.e. there must be at
       least one array to sort.
     
  Description
  -----------
    
    The first K arrays act as sort keys. Any additional arrays are sorted
    relative to the final sort order of the K key arrays.  The ordering of 
    each sort key is defined by the corresponding sort direction character 
    in $1. 
  
    The number of arrays that can be passed to ShellSort in the first call
    interface above, is limited only by the number of parameters that 4D 
    will allow a user-defined method to have. This limit is not defined in
    4D's reference manuals (we suspect the limit is 24 for interpreted 4D 
    and 64 for compiled 4D). If in doubt, use the 2nd call interface where
    you can sort hundreds or even thousands of arrays.
    
  Example 1:
  ----------
    
    ShellSort ("<>>"; >>Age; >>Surname; >>Firstname; >>Score; >>Class)
       
    This sorts the specified arrays by Age (descending), then by Surname
    (ascending), and then by Firstname (ascending). Score and Class are
    sorted relative to the first three arrays.
  
  Example 2:
  ----------
  
    ShellSort ("<>>A"; >>BuiltArray)
    
    This sorts the specified arrays by BuiltArray{1}>> (descending), then by
    BuiltArray {2}>> (ascending), then by BuiltArray 3}>> (ascending).  
    Additional arrays (i.e. BuiltArray {4}>> onwards) are sorted relative to
    the first three arrays
    

The Caveats

  1) Do not sort on PICTURE or POINTER arrays.  Elements of such arrays
     cannot be ordered in 4D.  PICTURE and POINTER arrays can be passed as
     non-key arrays however.
          
  2) Do not pass arrays of different sizes. 
   
  3) Do not pass the same array via two different parameters for the same 
     call or via two elements of a built array sort i.e. pointers to passed
     arrays must be distinct. 
   
  4) Sorts on string & text arrays with embedded '@' wildcard characters 
     may cause unexpected results. ShellSort uses 4D's built-in comparators, 
     which treat the '@' character as a wildcard when embedded within the
     right-side string of a string comparison operator.
        
  5) Do not assume that the zero'th element of arrays will retain their 
     values after ShellSort returns. ShellSort uses the zero'th element of 
     each array as a temporary variable when moving array elements to their 
     final sorted positions.

  6) Do not pass a pointer to a local array.  4D does not handle such 
     pointers in a consistent way, e.g. in interpreted databases 4D's TYPE 
     function returns UNDEFINED (5) on dereferenced pointers to local 
     arrays. In compiled databases, the TYPE function types dereferenced
     pointers to local arrays correctly, i.e. in our tests ShellSort 
     _appeared_ to work without problem in such circumstances.
        
  7) Do not pass pointers to 2-dimensional arrays. Direct sorting on 
     2-dimensional arrays is not currently supported.  You can however pass
     pointers to the 1st dimension of a 2-dimensional array, except in one 
     case. The 1st dimension of a 2-dimensional boolean array cannot be used
     as a sort key because 4D does not allow us to distinguish such arrays 
     from non-boolean 2-dimensional arrays and we need to be able 
     distinguish them in order to compare boolean values (4D's comparators
     '>' and '<' are not defined for booleans). 
     
  8) If you wish to sort on REAL arrays were values are "relatively close",
     then consider changing the '<' and '>' key comparison code to override
     the (undefined) "closeness" criteria used by 4D for '<' and '>' REAL
     comparisons.  Do not rely on a sensible result from ShellSort if a sort
     is performed on a REAL array that holds NaN (Not-A-Number) real values.
       
  9) In order to yield time to other running processes, ShellSort issues an
     IDLE call every 50,000 iterations of the main loop.  This should be 
     fine for most databases. Change $idle_every if your database would 
     benefit from more or less frequent process switching. To get ShellSort
     to issue an IDLE once in a "blue moon", set $idle_every to -1. If you 
     are sorting inter-process arrays do not use such arrays in other
     processes while the sort is being performed. In particular do not try
     to sort the same inter-process array(s) in different processes at the
     same time.  If you wish ShellSort to 'run-to-completion' in a compiled
     application, set $idle_every to -1.  Consider also protecting the code
     with a semaphore in this circumstance.
          
 10) ShellSort declares a process array of type pointer called SS_BUILTARR. 
     Do not use this identifier in your own code.
     
 11) Finally, when developing and testing using interpreted 4D, keep the 
     arrays to be sorted as small as possible.  Array sorting is a 
     time-consuming operation and running ShellSort on "non-small" arrays 
     with interpreted 4D will invariably lead to frustration.
     

The Code
 
  The sorting algorithm we have chosen to implement is D.L. Shell's
  ShellSort.  This is a reasonably fast algorithm and is straightforward to 
  code in 4D. The time complexity of our particular shellsort variant is 
  about O(N^1.2).  QuickSort is a potentially faster algorithm but requires 
  more careful coding to ensure optimal performance and to ensure bounded 
  process stack usage for recursive calls. (We've also implemented a 
  multi-level QuickSort in 4D, but found that ShellSort was quicker in our 
  tests).

  To avoid many potentially costly array element moves, our method uses 
  a local longint array ($position) to "point" to array elements. Array 
  elements are accessed indirectly via this array, and only elements of this
  array are moved in the main sorting loop. On completion of the sort, the
  elements of the passed arrays are moved to their new positions as defined
  by the "sorted" $position array.
  
  The method for ShellSort is very long.  We apologise in advance to readers
  who dislike lengthy methods - much of the code is devoted to validating 
  parameters and comparing sort keys.
  
  
Installation

  Copy and paste the ShellSort method from the shellsort database into
  your own database (you will need to create a new (empty) datafile when the 
  shellsort structure file is first opened).  If you use compiler methods
  copy and paste the method Compiler_ShellS as well. The shellsort database 
  also includes a small test method Test_ShellSort which can be executed 
  from the User Environment to gauge the performance of ShellSort (don't 
  forget to compile the shellsort database first though!)
    

Some Exercises for the Reader
     
 1) Extend ShellSort to support row-wise and/or column-wise multi-level 
    sorts on 2-dimensional arrays. 
    
 2) Add an option to ShellSort to implement stable sorts. A stable sort
    preserves the original order of the arrays when two sets of keys have
    identical values. The easiest way of accomplishing this would be to 
    initialise a local array with an ascending sequence of values, e.g. 1, 
    2, 3, ..., $size and then to use this array as an extra implicit sort 
    key after the last sort key passed into the method.


Some Exercises for ACI

  1) Extend the SORT ARRAY command to support sorts on multiple-levels, as
     well as "built" array sorts.
      
  2) Speed up multi-level sorting of record selections (it's currently too 
     slow).  Provide a "built" option for ORDER BY (SORT SELECTION in v3).
       
  3) Make 2-dimensional arrays "first-class", in particular extend the TYPE 
     function to return the correct element type of a 2-dimensional array, 
     and the correct array type for the 1st dimension of a 2-dimensional
     array. Allow formation of pointers to array elements access via another 
     pointer, e.g. allow operations like  >> (PTR >> {$k})
     
  4) Implement pointers to local variables and local arrays.  Yes, we all 
     know that such pointers could be dangerous when left in the hands of 
     inexperienced programmers, but 4D needs to cater for experienced 
     programmers as well. Ensure also, that such pointers when dereferenced, 
     can be passed to plug-in methods.
     
  5) Add to 4D Compiler the capability to produce compiled code libraries
     that can hold generic methods like ShellSort.  Allow such libraries
     to be used in interpreted databases.
     

A Reference

  For more details on the shellsort algorithm (and other interesting
  algorithms) seek out a copy of Practical Algorithms for Programmers by
  Andrew Binstock and John Rex, Addison-Wesley, 1995 (the algorithms are 
  written in C).  Most textbooks that cover sorting algorithms will include
  shellsort however.
  
  
Revision History

  July 1996: Written.
  Sept 1996: Minor cleanup of code and documentation. Made available on ftp.
  Dec  1996: Added support for built array sorts. Code to move elements 
             to final positions rewritten. Diminishing gaps now based on 
             (5n-1)/11.
  Oct  1998: ShellSort mentioned on NUG-list. ShellSort made available on 
             ftp in v3 and v6, Macintosh and Windows versions. This note 
             edited to use v6 terminology. 
  	     
  	     
The End
 
  Happy Sorting and Best Wishes, 
  
    Naranker Dulay


