104 million places from Foursquare in Oracle

February 27, 2025

apex, Oracle, parquet, spatial, SQL

Foursquare open sourced over 104 million points of interest (POIs) collected over several years. This is a large dataset perfect for testing and learning more about Spatial in the database. Oracle APEX has what you need to explore and display these places in your own browser.

Meet Foursquare OS Places

Foursquare used to be very popular. People where checking in everywhere with their mobile phones. In November last year Foursquare made these POIs available for download. You can read more about the dataset here, and you can explore the places with Foursquare Studio where the graphics above is taken from.

The files are delivered in Parquet format from an S3 bucket at AWS. They are also available from Huggingface and Snowflake.

In Oracle Autonomous Database you can read parquet files via an external table, created with the DBMS_CLOUD package, example below.1

You can access the files directly from the S3 bucket, but I thought downloading them first is a good idea to avoid unnecessary load on their infrastructure. You need the aws client and an AWS account to download, look at the userguide on how to install the client. Foursquare has published updated versions of these files, in this post I am using the parquet files at s3://fsq-os-places-us-east-1/release/dt=2025-02-06/places/parquet/, the categories are stored in one file located at s3://fsq-os-places-us-east-1/release/dt=2025-02-06/categories/parquet/

This is how you list the files in the S3 bucket with the aws cli:

aws s3 ls s3://fsq-os-places-us-east-1/release/dt=2025-02-06/places/parquet/
2025-02-06 17:40:35  110079156 places-00000.zstd.parquet
2025-02-06 17:40:35  117677765 places-00001.zstd.parquet
:
:
2025-02-06 17:41:47  108197266 places-00085.zstd.parquet
2025-02-06 17:41:59  122402515 places-00098.zstd.parquet
2025-02-06 17:42:00  108789011 places-00099.zstd.parquet

External table on Parquet files

I downloaded the files and then uploaded them to a bucket in OCI object storage. You can read more about object storage in this blogpost .

With all the files in one bucket in OCI I created an external table like this:

begin
    dbms_cloud.create_external_table (
       table_name =>'foursquare_categories',
       credential_name =>'OCI_CRED',
       file_uri_list =>'https://frgurbagurbavdn.objectstorage.eu-frankfurt-1.oci.customer-oci.com/n/frgurbagurbavdn/b/Foursquare/o/categories.zstd.parquet',
       format =>  '{"type": "parquet",  "schema": "first"}'
    );
end;
/

The table above has all the categories. The POIs resides in 100 parquet files and can be accessed in one external table with this syntax, note the use of * to include all the files:

begin
    dbms_cloud.create_external_table (
       table_name =>'foursquare',
       credential_name =>'OCI_CRED',
       file_uri_list =>'https://frgurbagurbavdn.objectstorage.eu-frankfurt-1.oci.customer-oci.com/n/frgurbagurbavdn/b/Foursquare/o/places-*.zstd.parquet',
       format =>  '{"type": "parquet",  "schema": "first"}'
    );
end;
/

I created the credential used above with the dbms_cloud.create_credential procedure, look at the post Use OCI object storage from PL/SQL for an explanation.

The main table has 26 columns including a reference to the categories table:

Now with this in place I can look for places around me with a query like this:

select *
from foursquare
where country='NO' 
    and latitude is not null 
    and longitude is not null 
    and sdo_within_distance(
    sdo_geometry(2001,8307,sdo_point_type(longitude,latitude,null),null,null), 
    sdo_geometry(2001,8307,sdo_point_type(8.758214057686658,58.4493290890655,null),null,null),
    'distance=1000') = 'TRUE';

But that is not really… fast.

A note on coordinate systems

As is common with positions collected with GPS, Foursquare uses WGS84 (Wikipedia ). When you define a column with datatype SDO_GEOMETRY you usually specify the coordinate system aka spatial reference system with a specific SRID. It is a study in itself (and it might cure your insomnia) with a large number of different systems around the world2. Originally WGS84 was defined by EPSG with SRID 4326 (see Wikipedia ) and coordinates order (latitude,longitude), but later some computer scientists came along and preferred the order that reminded them about (x,y)3. Since early days, Oracle has used SRID 8307 for “Longitude / Latitude (WGS 84)”, but to add to the confusion they later added SRID 4326, and in the Oracle Spatial manual, section 6.2.4, an example has this comment beside SRID 4326: “– SRID for WGS 84 longitude/latitude”. According to some sources SRID 4326 in Oracle is “EPSG SRID 4326 with axes reversed” (see this thread on Oracle Forums for an interesting discussion.) It’s like chili con carne for vegetarians (chili con carne sin carne?) I won’t say the Oracle Spatial manual is hysterical clear on these matters.

OK, I may have gotten the history wrong here, I promise to update this if my friend in Hamburg corrects me.

The order longitude/latitude is what is used in many web applications and databases today, including APEX (more on this later). If your house is displayed in the middle of the ocean somewhere it is a sign that you inverted the order. (I found mine in Turkey the first time I tried maps in APEX.) I have decided to use SRID 8307 in order to not contribute more to the confusion, and since the reference table SDO_COORD_REF_SYS gives it the name Longitude / Latitude (WGS 84).

Bring it in

Lets create a normal table so we can add a spatial index later:

create table foursquare_geom
as select f.*, 
    sdo_geometry(2001
        ,8307
        ,sdo_point_type(f.longitude,f.latitude,null),null,null) geometry
from foursquare f;

The number 2001 is the kind of geometry (SDO_GTYPE) which is a point in two dimensions. I could have used the constant SDO_POINT2D, see Spatial Developer’s Guide, 2.2.1. Section 2.2 goes into detail about the SDO_GEOMETRY type.

This is just one way to create the spatial datatype SDO_GEOMETRY. The column GEOM in the table has the position in WKB format (see section 2.4 in the manual), and you can achieve the same result with:

create table foursquare_geometry
as select f.*, sdo_geometry(geom,8307) geometry
from foursquare f;

Spatial metadata

Spatial is special and needs some metadata for the spatial column. This means updating a view USER_SDO_GEOM_METADATA (some triggers take care of updating the underlying tables). You either do this in SQL Developer as shown below:

or you can just write the insert like this:

Insert into user_sdo_geom_metadata (TABLE_NAME,COLUMN_NAME,DIMINFO,SRID)
    values ('FOURSQUARE_GEOM','GEOMETRY'
        ,MDSYS.SDO_DIM_ARRAY(
                MDSYS.SDO_DIM_ELEMENT('longitude', -180, 180, 0.000001)
                , MDSYS.SDO_DIM_ELEMENT('latitude', -90, 90, 0.000001))
                ,8307);

Depending on your patience, you may experiment with different levels of tolerance (0.000001 above.)

Spatial index

With the metadata in place you can create the index:

CREATE INDEX GEOM_SI ON FOURSQUARE_GEOM(GEOMETRY)
    INDEXTYPE IS MDSYS.SPATIAL_INDEX_V2;

Note the indextype. The indextype SPATIAL_INDEX_V2 is system-managed and Oracle recommends it over the previous type.

A simplified version of the query I ran on the external table, but on the internal table with a spatial index:

select *
from foursquare_geom
where sdo_within_distance(
    geometry, 
    sdo_geometry(2001,8307,sdo_point_type(8.75821,58.44932,	null),null,null),
    'distance=1000') = 'TRUE';

returns in “no time”, even without partitioning in place.

Maps in APEX

Creating maps in APEX is easy with the map region introduced some versions ago. But with so many points presenting them can be a challenge (there are over 160 in a radius of 1000 meter from me). First attempt just showed a completely overcrowded map. But the layer type Heat Map works great as you zoom out (up to a level):

Another layer of type Points is set to be visible for specific zoom levels where it makes sense to show individual points.

Visually inspection of these points shows that some points are inaccurate (or the guy who went to Øyvinds pub4 was walking on water when he got the location fix.)

Further explorations

I had some fun with simple analysis of the data, like how many points where created each year. The first was created at the end of 2003, and a simple group by query shows you the popularity over time (it has been declining).

I did not go much into the details about the datatypes and the data model in Oracle Spatial. It takes a bit to get used to, but makes really sense when you work with it. There are plenty of features I want to explore further like Spatial Analysis and Mining (chapter 9 in the manual).

Also I did a minimal conversion when I created the internal table, e.g. the date columns should be converted with to_date(), and there is no need to preserve the BLOB column. The data set is also perfect for experimenting with partitioning.


  1. For more details on parquet files and Oracle, see the article Working with Parquet files with Oracle Database.  ↩︎

  2. Have a look at this: select * from SDO_COORD_REF_SYS; Currently 6194 entries. ↩︎

  3. The JSON document in the BBOX column was probably not defined by someone who worries about correct naming, example: {"xmin":-121.75161480903625,"ymin":38.539413226528,"xmax":-121.75161480903625,"ymax":38.539413226528} ↩︎

  4. Yes, there was a pub with that name in Arendal, I went there a few years ago, I did not ask for a discount. Current status unknown. ↩︎