For teaching and research in the area of databases, it is useful to have available medium-sized databases (meaning data measured in MB, rather than KB or GB) containing 'real-world' information, so that languages and tools can be tested on practical problems. This is a collection of information on how to prepare relational databases from various datasets that have been made publicly available. The first two (relational) databases listed are available online within the Computing Department at Imperial College.
The 1990 Census Gazetteer data has been made publicly available as a set of text files. The technical report Importing the US 1990 Census Data into a Relational DBMS (pdf) details how the Java program BuildUSCensus1990.java can import this data into an RDB, together with a file of US states supplied here.
The USGS has made publicly available a set of text files containing comprehensive geographical data about the US. The technical report Importing the US Geographical Survey Data into a Relational DBMS (pdf) details how the Java program BuildUSGS.java can import the summary data files into an RDB, together with a file of US states supplied here.
Collected from a number of sources, the Mondial database is available in both XML and relational form. It is often cited as a sample database in recent research work on data integration.
DBLP is a large bibliography of computer science research work. It may be accessed online or downloaded as a large XML file, for which a simple DTD is available. Like Mondial, DBLP is often cited as a sample database in recent research work on data integration.
The complete works of William Shakespeare are available as an XML file.
A wide range of literary works is available from the Oxford Text Archive in a number of formats. The XML format chosen is of interest because it uses mixed-content XML elements (i.e. XML elements that contain both character data and nested elements).
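To make the notion of mixed content concrete, the following minimal sketch parses one small element with the standard Java DOM API and lists its direct children. The sample XML string, the class name MixedContentDemo and the method describeChildren are invented for illustration and are not taken from the Oxford Text Archive files.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows that a mixed-content element has both
// text nodes and element nodes as direct children.
final class MixedContentDemo {

    /** Lists the direct children of the root element, tagged as text or element. */
    static List<String> describeChildren(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            List<String> parts = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE) {
                    // Character data sitting directly inside the element
                    parts.add("text:" + child.getNodeValue());
                } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // A nested element interleaved with that character data
                    parts.add("element:" + child.getNodeName());
                }
            }
            return parts;
        } catch (Exception e) {
            // Parser configuration, I/O and well-formedness errors end up here
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample, not a line from any archive file
        String xml = "<line>To be, or <emph>not</emph> to be</line>";
        for (String part : describeChildren(xml)) {
            System.out.println(part);
        }
    }
}
```

The root element here has three children: a text node, an emph element, and another text node. This interleaving is what makes such documents awkward to map onto flat relational tables, since the order of text and elements carries meaning.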
The import programs for US Geographical Survey and US Census require that the dblibrary.jar Toolkit be in your CLASSPATH. They are also part of the Toolkit. Thus, in a bash shell, with the required data files in place, you would only need to run:
export CLASSPATH=dblibrary.jar

The ... should be replaced by appropriate connection details for your database.
The Java JDBC API allows database systems to be accessed from Java. To access a particular database, you need a JDBC driver for that database system. Some drivers that I have used for common databases are linked below:
A number of tools are available that use JDBC drivers to provide database-independent access to databases:
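Independently of any such tool, JDBC access of the kind described above can be sketched directly in Java. The following is a minimal illustration, not the Toolkit's own code: the class name JdbcProbe and the method describe are hypothetical, and the JDBC URL, user name and password must be supplied for your own database, with the matching driver jar on the CLASSPATH.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: opens a JDBC connection and reports which
// database product answered, or why the connection failed.
final class JdbcProbe {

    static String describe(String url, String user, String password) {
        // try-with-resources closes the connection even if a call below fails
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            DatabaseMetaData meta = conn.getMetaData();
            return meta.getDatabaseProductName() + " "
                + meta.getDatabaseProductVersion();
        } catch (SQLException e) {
            // Raised, for example, when no driver on the CLASSPATH accepts the URL
            return "connection failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java JdbcProbe <jdbc-url> <user> <password>");
            return;
        }
        System.out.println(describe(args[0], args[1], args[2]));
    }
}
```

Only the JDBC URL and the driver jar change from one database system to another; the Java code stays the same, which is the database independence that JDBC is meant to provide.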