A little data handling, csv table data to SQLite database

Reading time ~2 minutes

Thought I'd post a little step-by-step on how I imported the .csv table data from the Open Data Enschede into an SQLite database and prepare it for my EnschedeLocator. This is mostly for my own sake, so that I have it as a reference, but it might also be of interest to others.

- Address data set [.csv 8.3MB]

Store your data set in a working folder. I am on a Mac and have SQLite3 installed and will use the terminal. The following steps on Windows and Linux should be similar.

The first thing I do is to open the file in Excel and remove the columns I don't need. I only need the street name, house number, latitude and longitude. Delete the other columns and save. You don't need the first row with the column names, so just delete that too.

Then insert a new column in front of street name. Here we will keep each address' primary key. Just use the formula =ROW() and double-click the bottom right corner of the box to apply to all cells in that column. Save the .csv file. It should now be about 3.8 MB.

Then open Terminal and move to your working folder. Create your SQLite database by executing:

[code]
sqlite3 databasename.db
[/code]

To make the database Android friendly, you have to create a new table called android_metadata and fill it with some columns. Then create a table for the addresses.

[code lang="sql"]
CREATE TABLE "android_metadata"("locale" TEXT DEFAULT 'en_US');
CREATE TABLE "addresses"(_id integer primary key,
straatnaam text, huisnummer integer, latitude double, longitude double);
[/code]

Now we're gonna use the import the data.

[code lang="sql"]
.separator ","
.import enschede_addresses.csv addresses
[/code]

And that should be enough. Note that if you have a .tsv-file or something else, you need to change the .separator command accordingly.

The first time I tried, I only got the first row imported. This is due to the line-break format of the csv-file, SQLite requires the LF-format. This can be fixed in a text editor, such as TextWrangler.

If this happened to you too, run a DELETE FROM ADDRESSES; to erase the data you imported, change the line break format and import it again.

Here are the complete steps in terminal:

Due to removing some columns in the dataset, the database now contains some redundant entries, and since we're storing the data locally on the device, we want to minimize the size.

Here is one way of removing all but one row containing the same street name and house number:

[code lang="sql"]
DELETE FROM addresses WHERE _id NOT IN(SELECT MIN(_id) FROM addresses GROUP BY straatnaam, huisnummer);
vacuum;
[/code]

Only deleting rows doesn't reduce the file size of the db, to do so, run the vacuum command in addition. The size of the database now went from 3.8 to 3.1 MB.

That's it! The database file is now ready to be read by an Android application.

CloudMerger - Java based application for merging (kinect) point clouds [Final project]

For my last three weeks at the Faculty of Geo-Information Science and Earth Observation (ITC) at the University of Twente in May 2012, I&...… Continue reading