dataria.DATA
Functions
| sparql_to_dataframe | Execute a SPARQL query and convert the results into a Pandas DataFrame. |
| iso_to_period | Convert an ISO date string (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) to a Pandas Period. |
| parse_xsd_date_or_datetime | Parse xsd:date or xsd:dateTime strings into Pandas datetime or period values. |
| get_token_matrix | Generate a token matrix (either binary or count-based) from a Pandas Series. |
Module Contents
- dataria.DATA.sparql_to_dataframe(endpoint_url, query, csv_filename='query_result.csv')
Execute a SPARQL query and convert the results into a Pandas DataFrame.
Parses geometry (WKT, GeoJSON), numeric types, and xsd:date/xsd:dateTime fields. By default the results are also saved to query_result.csv; pass csv_filename=None to skip writing a file.
- Parameters:
endpoint_url (str) – SPARQL endpoint URL.
query (str) – SPARQL query string.
csv_filename (str) – Path to save the CSV result. If None, no file is written.
- Returns:
The query results as a DataFrame with parsed values.
- Return type:
pd.DataFrame
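The conversion step can be illustrated without a live endpoint. The following is a minimal sketch, not the library's implementation: it assumes the endpoint answers in the standard SPARQL 1.1 JSON results format, and the helper name `bindings_to_dataframe` and the sample response are hypothetical. Only numeric typing is shown; geometry and date parsing are left out.

```python
import pandas as pd

XSD = "http://www.w3.org/2001/XMLSchema#"

def bindings_to_dataframe(response):
    """Flatten a SPARQL 1.1 JSON result into a DataFrame (hypothetical helper)."""
    columns = response["head"]["vars"]
    rows = []
    for binding in response["results"]["bindings"]:
        row = {}
        for var in columns:
            cell = binding.get(var)  # unbound variables are simply absent
            if cell is None:
                row[var] = None
                continue
            value = cell["value"]
            datatype = cell.get("datatype", "")
            # Cast numeric literals based on their declared XSD datatype.
            if datatype in (XSD + "integer", XSD + "int"):
                value = int(value)
            elif datatype in (XSD + "decimal", XSD + "double", XSD + "float"):
                value = float(value)
            row[var] = value
        rows.append(row)
    return pd.DataFrame(rows, columns=columns)

# Made-up response, for illustration only.
sample = {
    "head": {"vars": ["label", "count"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "value": "Example A"},
         "count": {"type": "literal", "datatype": XSD + "integer", "value": "1000"}},
        {"label": {"type": "literal", "value": "Example B"}},
    ]},
}
df = bindings_to_dataframe(sample)
print(df)
```

With the documented function, the corresponding call would take the form `sparql_to_dataframe(endpoint_url, query, csv_filename=None)` when no CSV file is wanted.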
- dataria.DATA.iso_to_period(iso_string)
Convert an ISO date string (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) to a Pandas Period.
Intended for historical date handling when datetime parsing is not viable.
- Parameters:
iso_string (str) – ISO 8601 date string.
- Returns:
A Period with daily frequency, or NaT if parsing fails.
- Return type:
pd.Period or pd.NaT
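The behaviour described above can be approximated with the standard library and pandas alone. This is a rough sketch under stated assumptions, not the function's actual source: the name `iso_to_period_sketch` is hypothetical, and it assumes the time part of a datetime string is simply discarded.

```python
from datetime import datetime

import pandas as pd

def iso_to_period_sketch(iso_string):
    """Return a daily Period for an ISO date string, or NaT if it cannot be parsed."""
    try:
        parsed = datetime.fromisoformat(iso_string)
    except (TypeError, ValueError):
        return pd.NaT
    # Building the Period from components avoids going through pd.Timestamp,
    # which cannot represent dates before 1677.
    return pd.Period(year=parsed.year, month=parsed.month, day=parsed.day, freq="D")

early = iso_to_period_sketch("1215-06-15")
with_time = iso_to_period_sketch("1848-03-13T09:30:00")
invalid = iso_to_period_sketch("not a date")
print(early, with_time, invalid)
```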
- dataria.DATA.parse_xsd_date_or_datetime(iso_string, dtype, unix_year=1950)
Parse xsd:date or xsd:dateTime strings into Pandas datetime or period values.
Dates earlier than unix_year (1950 by default) are automatically converted into Periods to avoid datetime parsing issues with early historical dates.
- Parameters:
iso_string (str) – ISO-formatted date or datetime string.
dtype (str) – Expected data type (e.g. xsd:date or xsd:dateTime).
unix_year (int) – Dates earlier than this year are treated as historical (default: 1950).
- Returns:
Parsed date/time object, or NaT on failure.
- Return type:
pd.Timestamp, pd.Period, or pd.NaT
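The year-threshold dispatch can be sketched as follows. This is an illustration only: `parse_xsd_sketch` is a hypothetical name, the `dtype` argument is omitted because its exact effect is not documented here, and the cut-off logic is an assumption based on the parameter description above.

```python
from datetime import datetime

import pandas as pd

def parse_xsd_sketch(iso_string, unix_year=1950):
    """Return a Timestamp for recent dates, a daily Period for earlier ones, NaT on failure."""
    try:
        parsed = datetime.fromisoformat(iso_string)
    except (TypeError, ValueError):
        return pd.NaT
    if parsed.year < unix_year:
        # Assumed rule: anything before the threshold is treated as historical.
        return pd.Period(year=parsed.year, month=parsed.month, day=parsed.day, freq="D")
    return pd.Timestamp(parsed)

modern = parse_xsd_sketch("1980-05-01T12:00:00")
historical = parse_xsd_sketch("1789-07-14")
failed = parse_xsd_sketch("14 July 1789")
print(type(modern).__name__, type(historical).__name__, failed)
```

Because the result can be either a Timestamp or a Period, a column parsed this way may hold mixed types; downstream code should check the type before doing date arithmetic.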
- dataria.DATA.get_token_matrix(series, sep=' ', dummies=True)
Generate a token matrix (either binary or count-based) from a Pandas Series.
- Parameters:
series (pd.Series) – The input column containing string data.
sep (str) – Separator used to split tokens.
dummies (bool) – If True, return binary (0/1) presence; if False, return token counts.
- Returns:
A DataFrame with one column per token and one row per entry.
- Return type:
pd.DataFrame
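The difference between the binary and the count-based matrix is easiest to see on a small example. The sketch below is an assumed equivalent built on `collections.Counter`, not the library's code; `token_matrix_sketch` is a hypothetical name, and it assumes every entry in the Series is a string.

```python
from collections import Counter

import pandas as pd

def token_matrix_sketch(series, sep=" ", dummies=True):
    """Build a token matrix: one row per entry, one column per token."""
    counts = (
        pd.DataFrame([Counter(text.split(sep)) for text in series], index=series.index)
        .fillna(0)            # tokens absent from a row count as zero
        .astype(int)
        .sort_index(axis=1)   # stable, alphabetical column order
    )
    # Binary mode collapses every positive count to 1.
    return (counts > 0).astype(int) if dummies else counts

colours = pd.Series(["red blue red", "blue", "green red"])
binary = token_matrix_sketch(colours)
counted = token_matrix_sketch(colours, dummies=False)
print(binary)
print(counted)
```

In the first row, `red` appears twice, so the count-based matrix holds 2 where the binary matrix holds 1.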