dataria.DATA

Functions

sparql_to_dataframe(endpoint_url, query[, csv_filename])

Execute a SPARQL query and convert the results into a Pandas DataFrame.

iso_to_period(iso_string)

Convert an ISO date string (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) to a Pandas Period.

parse_xsd_date_or_datetime(iso_string, dtype[, unix_year])

Parse xsd:date or xsd:dateTime strings into Pandas datetime or period values.

get_token_matrix(series[, sep, dummies])

Generate a token matrix (either binary or count-based) from a Pandas Series.

Module Contents

dataria.DATA.sparql_to_dataframe(endpoint_url, query, csv_filename='query_result.csv')

Execute a SPARQL query and convert the results into a Pandas DataFrame.

Parses geometry (WKT, GeoJSON), numeric, and xsd:date/xsd:dateTime values. The results are also written to csv_filename unless it is None.

Parameters:
  • endpoint_url (str) – SPARQL endpoint URL.

  • query (str) – SPARQL query string.

  • csv_filename (str or None) – Path of the CSV file to write (default: 'query_result.csv'). If None, no file is written.

Returns:

The query results as a DataFrame with parsed values.

Return type:

pd.DataFrame
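The conversion step can be pictured with a small sketch. It is not the library's implementation: it assumes the endpoint answers in the W3C SPARQL 1.1 Query Results JSON format, skips the HTTP request entirely, and handles only numeric casts. The names bindings_to_dataframe and sample, and the sample values, are hypothetical.

```python
import pandas as pd

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical response body in the SPARQL 1.1 Query Results JSON format.
sample = {
    "head": {"vars": ["name", "population", "founded"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Town A"},
         "population": {"type": "literal", "value": "1200",
                        "datatype": XSD + "integer"},
         "founded": {"type": "literal", "value": "1128-01-01",
                     "datatype": XSD + "date"}},
        {"name": {"type": "literal", "value": "Town B"},
         "population": {"type": "literal", "value": "800",
                        "datatype": XSD + "integer"}},
    ]},
}

def bindings_to_dataframe(response, csv_filename=None):
    """Turn SPARQL JSON bindings into a DataFrame (illustrative sketch only)."""
    columns = response["head"]["vars"]
    rows = []
    for binding in response["results"]["bindings"]:
        row = {}
        for var in columns:
            cell = binding.get(var)
            if cell is None:          # unbound variable in this result row
                row[var] = None
                continue
            value = cell["value"]
            datatype = cell.get("datatype", "")
            if datatype in (XSD + "integer", XSD + "int"):
                value = int(value)
            elif datatype in (XSD + "decimal", XSD + "double", XSD + "float"):
                value = float(value)
            row[var] = value          # other datatypes stay as strings here
        rows.append(row)
    df = pd.DataFrame(rows, columns=columns)
    if csv_filename is not None:      # mirrors "If None, no file is written"
        df.to_csv(csv_filename, index=False)
    return df

df = bindings_to_dataframe(sample)
```

With the library itself, passing csv_filename=None to sparql_to_dataframe is the documented way to skip writing the CSV file.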

dataria.DATA.iso_to_period(iso_string)

Convert an ISO date string (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) to a Pandas Period.

Intended for historical date handling when datetime parsing is not viable.

Parameters:

iso_string (str) – ISO 8601 date string.

Returns:

A Period with daily frequency, or NaT if parsing fails.

Return type:

pd.Period or pd.NaT
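One reason datetime parsing may not be viable is that nanosecond-resolution pd.Timestamp values cannot represent dates before 1677-09-21, while a daily pd.Period can. The sketch below shows one way such a conversion might look. It is an assumption about the approach, not the library's code, and iso_to_period_sketch is a hypothetical name. It relies on datetime.date for validation, so it only covers years 1 to 9999.

```python
from datetime import date

import pandas as pd

def iso_to_period_sketch(iso_string):
    """Convert 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SS' to a daily Period."""
    try:
        # Drop any time component, then let datetime.date validate the rest.
        d = date.fromisoformat(iso_string.split("T")[0])
        return pd.Period(year=d.year, month=d.month, day=d.day, freq="D")
    except (ValueError, AttributeError, TypeError):
        # Malformed strings and non-string inputs end up here.
        return pd.NaT

battle = iso_to_period_sketch("1066-10-14T09:00:00")
broken = iso_to_period_sketch("not a date")
```

Here battle is a daily Period for 14 October 1066 and broken is pd.NaT.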

dataria.DATA.parse_xsd_date_or_datetime(iso_string, dtype, unix_year=1950)

Parse xsd:date or xsd:dateTime strings into Pandas datetime or period values.

Dates earlier than unix_year (default: 1950) are treated as historical and converted to Periods to avoid datetime parsing issues.

Parameters:
  • iso_string (str) – ISO-formatted date or datetime string.

  • dtype (str) – Expected data type (e.g. xsd:date or xsd:dateTime).

  • unix_year (int) – Dates earlier than this year are treated as historical (default: 1950).

Returns:

Parsed date/time object, or NaT on failure.

Return type:

pd.Timestamp, pd.Period, or pd.NaT
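The year-based branching described above might be sketched as follows. This is a guess at the general shape rather than the library's code: parse_date_sketch is a hypothetical name, the daily Period frequency is an assumption, and the dtype argument is left out because its effect on the result is not described here.

```python
from datetime import date

import pandas as pd

def parse_date_sketch(iso_string, unix_year=1950):
    """Return a Period for years before unix_year, otherwise a Timestamp."""
    try:
        # Validate the date part first; this also yields the year to branch on.
        d = date.fromisoformat(iso_string.split("T")[0])
        if d.year < unix_year:
            # Historical date: keep only the day, as a Period (assumed freq "D").
            return pd.Period(year=d.year, month=d.month, day=d.day, freq="D")
        # Recent date: keep the full date or datetime as a Timestamp.
        return pd.Timestamp(iso_string)
    except (ValueError, AttributeError, TypeError):
        return pd.NaT

old = parse_date_sketch("1848-03-13")           # Period
new = parse_date_sketch("2001-09-11T08:46:00")  # Timestamp
bad = parse_date_sketch("garbage")              # NaT
```

Because the return type depends on the input year, a column parsed this way can mix Periods and Timestamps, which is worth checking before doing date arithmetic on it.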

dataria.DATA.get_token_matrix(series, sep=' ', dummies=True)

Generate a token matrix (either binary or count-based) from a Pandas Series.

Parameters:
  • series (pd.Series) – The input column containing string data.

  • sep (str) – Separator used to split tokens.

  • dummies (bool) – If True, return binary (0/1) presence; if False, return token counts.

Returns:

A DataFrame with one column per token and one row per entry.

Return type:

pd.DataFrame
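The difference between the binary and the count-based matrix is easiest to see on a tiny input. The sketch below reproduces the documented behavior with plain pandas. It is not the library's implementation, and token_matrix_sketch is a hypothetical name. Sorting the columns alphabetically and treating empty or missing entries as all-zero rows are assumptions.

```python
from collections import Counter

import pandas as pd

def token_matrix_sketch(series, sep=" ", dummies=True):
    """Build a token matrix: one row per entry, one column per token."""
    rows = []
    for value in series:
        text = "" if pd.isna(value) else str(value)
        # Count each non-empty token in this entry.
        rows.append(dict(Counter(tok for tok in text.split(sep) if tok)))
    counts = pd.DataFrame(rows, index=series.index).fillna(0).astype(int)
    counts = counts.reindex(sorted(counts.columns), axis=1)
    # Binary mode collapses every positive count to 1.
    return (counts > 0).astype(int) if dummies else counts

colors = pd.Series(["red blue red", "blue green", ""])
binary = token_matrix_sketch(colors)                 # row 0: blue=1, green=0, red=1
counted = token_matrix_sketch(colors, dummies=False) # row 0: blue=1, green=0, red=2
```

The two results differ only where a token repeats within one entry, as "red" does in the first row.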