DistribSort: A multi-level array sorting method for 4D
by N. Dulay, nd@doc.ic.ac.uk, October 1999.
Download:
Windows,
Mac
(4D v6.0 versions)
Acknowledgements
-
DistribSort uses code & documentation from the earlier ShellSort.
-
Coding ideas are also taken from the sorts of Michael Weisenberger and Tony
Ringsmuth.
Introduction
As seasoned 4D programmers will know, 4D does not provide a multi-level SORT
ARRAY command. One solution would be to use a plug-in. Another is to
implement a multi-level sorting algorithm directly in 4D code. This note
documents one such method called DistribSort, and we hope that some of you
find it useful.
Uses
Here are a few suggestions on when you might wish to consider using
DistribSort:
-
When you don't wish to purchase or deploy a plug-in for multi-level array
sorts (or can't find one to buy!)
-
When you need to perform sorts (single or multi-level) on built arrays.
-
When DistribSort happens to outperform other sorting methods.
-
When you wish to modify or extend (or debug) the DistribSort source code.
DistribSort is reasonably quick. Performance however, depends on many
factors, such as the sorting algorithm, the processor speed, the sizes of
processor caches, the types and sizes of the arrays being sorted, the
pre-sortedness of the data, the number of distinct values in the data, the
quality of 4D compiled code.
The Interfaces
There are two ways of calling DistribSort. The first is modelled on
Foresight's defunct dynarray plugin. The second supports sorts on built
arrays.
Call Interface 1
DistribSort (sort directions; arraypointer1 {;...; arraypointerN})
The parameters for this call are:
Call Interface 2
DistribSort (sort directions + "A"; builtarraypointer)
The parameters for this call are:
-
$1 (sort directions)
a text string consisting of '>' (ascending) and '<' (descending)
characters followed by the letter A (uppercase or lowercase). The
number of sort levels (sort keys), K, is given by length($1)-1 for this
call interface.
At least one sort direction character must be specified.
-
$2 (build array pointer)
a pointer to an array of pointers to the arrays to be sorted. The
arrays to be sorted can be of type INTEGER, LONGINT, REAL, STRING,
TEXT, DATE or BOOLEAN. PICTURE and POINTER arrays can be passed as
non-key arrays.
The built array must have at least one element, i.e. there must be at
least one array to sort.
Description
The first K arrays act as key arrays. Any additional arrays are sorted
relative to the final sort order of the K key arrays. The ordering of
each key array is defined by the corresponding sort direction character in
$1.
Example 1:
DistribSort ("<>>"; ->Age; ->Surname; ->Firstname; ->Score; ->Class)
This sorts the specified arrays by Age (descending), then by Surname
(ascending), and then by Firstname (ascending). Score and Class are sorted
relative to the first three arrays.
Example 2:
DistribSort ("<>>A"; ->BuiltArray)
This sorts the specified arrays by BuiltArray {1}-> (descending), then by
BuiltArray {2}-> (ascending), then by BuiltArray {3}-> (ascending).
Additional arrays (i.e. BuiltArray {4}-> onwards) are sorted relative to
the first three arrays
The Caveats
-
To sort arrays of size N, DistribSort uses 1 longint array of size N and
1 "position" array of size N and type longint or string or text. This
position array holds elements that have log10(N)*(number of sort levels)
digits encoded either into a longint or string or text value.
For example to sort 3 or more 40,000 element arrays on 3 levels requires
one 40,000 element longint array (=160,000 bytes) plus one 40,000 element
string array (=40,000x5*3=600,000 bytes).
The array type used for the position array is determined by keylength: a
longint array is used for keylengths <= 9, a string12 array is used for
keylengths <= 12, a string16 array is used for keylengths <= 16, a
string20 array is used for keylengths <= 20, a text array is used for
keylengths > 20. In addition, for speed, a blob is used for temporarily
saving and restoring key arrays.
-
Do not sort on PICTURE or POINTER arrays. Elements of such arrays cannot
be ordered in 4D. PICTURE and POINTER arrays can be passed as non-key
arrays however.
-
Do not pass arrays of different sizes.
-
The number of arrays that can be passed to DistribSort is limited only by
the number of parameters that 4D will allow a user-defined method to
have. This limit is not defined in 4D's reference manuals (we suspect the
limit is 24 for interpreted 4D and 64 for compiled 4D). If you pass a
large number of arrays consider changing the stack size of the calling
process. When more than 10 arrays are passed, DistribSort uses 4D's
EXECUTE command to build a SORT ARRAY command of all passed arrays and
the key array that DistribSort builds. Again the number of parameters
that can be passed to SORT ARRAY is not defined, nor are its stack
requirements known.
-
Do not pass the same array via two different parameters for the same call
or via two elements of a built array sort i.e. pointers to passed arrays
must be distinct.
-
Sorts on string & text arrays with embedded '@' wildcard characters may
cause unexpected results. DistribSort uses 4D's SORT ARRAY command, which
may treat the '@' character as a wildcard character.
-
DistribSort does not sort the zero'th element of arrays.
-
Do not pass pointers to local arrays.
-
Do not pass pointers to 2-dimensional arrays. Direct sorting on
2-dimensional arrays is not supported by 4D's SORT ARRAY command. You can
however pass pointers to the 1st dimension of a 2-dimensional array.
-
Do not rely on a sensible result from DistribSort if a sort is performed
on a REAL array that holds NaN (Not-A-Number) or INFINITY values.
-
DistribSort declares a number of process arrays and variables:
DS_BUILTARRAY, DS_KEY12, DS_KEY16, DS_KEY16, DS_KEYTEXT, DS_KEYPTR,
DS_ARRAYPTR. Do not use these identifiers in your own code.
-
Finally, when developing and testing using interpreted 4D, keep the
arrays to be sorted as small as possible.
The Code
The sorting algorithm we have chosen to implement is Distribution Sort (aka
Radix Sort). Distribution Sort is potentially a very fast linear time
algorithm. In our implementation, we use 4D's SORT ARRAY command on each
sort level to build a concatenated position key. This key is then used to
sort all arrays into the correct order. Thus sorting is actually performed
by multiple calls to the SORT ARRAY command (which is thought to use the
Quicksort algorithm internally).
The method for DistribSort is long. We apologise in advance to readers who
dislike lengthy methods - much of the code is devoted to checking the
parameters passed to DistribSort.
Installation
Download:
Windows,
Mac
(4D v6.0 versions)
Copy and paste the DistribSort method from the
DistribSort database into your own database (you will need to create a new
(empty) datafile when the DistribSort structure file is first opened). If
you use compiler methods copy and paste the method
Compiler_DistribSort as well. The DistribSort
database also includes a small test method
Test_DistribSort which can be
executed from the User Environment to gauge the performance of DistribSort.
Some Exercises for ACI
-
Extend the SORT ARRAY command to support sorts on multiple-levels, as
well as "built" array sorts.
-
Speed up multi-level sorting of record selections (it's currently too
slow). Provide a "built" option for ORDER BY.
-
Implement pointers to local variables and local arrays. Of course such
pointers could be dangerous when left in the hands of inexperienced
programmers, but 4D needs to cater for experienced programmers as well.
Ensure also, that such array pointers when dereferenced, can be passed to
plug-in methods.
-
Add to 4D Compiler the capability to produce compiled code libraries that
can hold generic methods like DistribSort. Allow such libraries to be
used in interpreted databases.
Finally
Happy sorting!
Revision History
-
Oct 1999:
- Sorting algorithm changed to original distribution sort and
renamed DistribSort. Enhancements from the sorts of Michael
Weisenberger and Tony Ringsmuth incorporated.
DistribSort is very similar to Tony's Radixsort. The idea to use
a blob to save and restore key arrays is taken from Tony's code,
as well as the use of Command Name (229) to get the name of the
SORT ARRAY command for use in the EXECUTE command.
Michael's Radixsort probably pre-dates both this and Tony's sort.
It has a similar-ish approach to DistribSort but different
memory requirements. The idea of using a longint key array to
encode small keys is taken from Michael's sort.
-
Oct 1998:
- ShellSort mentioned on NUG-list. ShellSort made available
on ftp in v3 and v6, Macintosh and Windows versions. This note
edited to use more v6 terminology.
-
Dec 1996:
- Added support for built array sorts. Code to move elements to
final positions rewritten. Diminishing gaps changed to
(5n-1)/11.
-
Sept 1996:
- Minor cleanup of code and documentation. Made available on ftp.
-
July 1996:
- ShellSort written (plus experimental versions of distribution
sort, quicksort and heapsort).