dataria.DATA
============

.. py:module:: dataria.DATA


Functions
---------

.. autoapisummary::

   dataria.DATA.sparql_to_dataframe
   dataria.DATA.iso_to_period
   dataria.DATA.parse_xsd_date_or_datetime
   dataria.DATA.get_token_matrix


Module Contents
---------------

.. py:function:: sparql_to_dataframe(endpoint_url, query, csv_filename='query_result.csv')

   Execute a SPARQL query and convert the results into a Pandas DataFrame.

   Parses geometry values (WKT, GeoJSON), numeric types, and ``xsd:date``/``xsd:dateTime`` fields.
   By default, the results are also written to ``query_result.csv``; pass ``csv_filename=None`` to skip the file.

   :param endpoint_url: SPARQL endpoint URL.
   :type endpoint_url: str
   :param query: SPARQL query string.
   :type query: str
   :param csv_filename: Path of the CSV file to write the results to. If None, no file is written.
   :type csv_filename: str or None

   :returns: The query results as a DataFrame with parsed values.
   :rtype: pd.DataFrame
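
The conversion step can be pictured with the standard SPARQL 1.1 JSON results layout (``head.vars`` plus ``results.bindings``). The following is a minimal, dependency-free sketch and not the ``dataria`` implementation: the helper names ``convert_binding_value``, ``bindings_to_rows`` and ``rows_to_csv`` are hypothetical, the sample result is made up for illustration, and only a few numeric datatypes are coerced. Dates and geometries are left as strings here.

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER_TYPES = {XSD + "integer", XSD + "int", XSD + "long"}
FLOAT_TYPES = {XSD + "decimal", XSD + "double", XSD + "float"}


def convert_binding_value(cell):
    """Convert one SPARQL JSON binding cell into a Python value (hypothetical helper)."""
    value = cell["value"]
    datatype = cell.get("datatype", "")
    if datatype in INTEGER_TYPES:
        return int(value)
    if datatype in FLOAT_TYPES:
        return float(value)
    # URIs, plain literals, dates and geometries stay as strings in this sketch.
    return value


def bindings_to_rows(result):
    """Flatten a SPARQL JSON result into (variables, list of row dicts)."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # A variable that is unbound in a solution is simply absent from it.
        rows.append({
            var: convert_binding_value(binding[var]) if var in binding else None
            for var in variables
        })
    return variables, rows


def rows_to_csv(variables, rows):
    """Serialise the rows as CSV text; None becomes an empty field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=variables)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample in the SPARQL 1.1 JSON results format.
sample_result = {
    "head": {"vars": ["label", "count"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "value": "Example A"},
         "count": {"type": "literal", "value": "1200", "datatype": XSD + "integer"}},
        {"label": {"type": "literal", "value": "Example B"}},
    ]},
}

variables, rows = bindings_to_rows(sample_result)
csv_text = rows_to_csv(variables, rows)
```

A list of dicts like ``rows`` is the shape that ``pd.DataFrame(rows)`` accepts, which is why it is a convenient intermediate form.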


.. py:function:: iso_to_period(iso_string)

   Convert an ISO date string (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) to a Pandas Period.

   Intended for historical dates that cannot be parsed reliably as datetimes (for example, dates outside the range a ``pd.Timestamp`` can represent).

   :param iso_string: ISO 8601 date string.
   :type iso_string: str

   :returns: A Period with daily frequency, or NaT if parsing fails.
   :rtype: pd.Period or pd.NaT
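
One well-known reason datetime parsing can fail for historical dates is that nanosecond-resolution timestamps are stored as signed 64-bit offsets from 1970-01-01, which limits them to roughly the years 1677 to 2262. Whether this is the exact motivation behind ``iso_to_period`` is an assumption; the sketch below only derives that range using the standard library, with the hypothetical helper ``ns_to_datetime``.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1
# The lowest int64 value is reserved as the NaT marker, so the usable minimum is one higher.
INT64_MIN_USABLE = -(2**63) + 1


def ns_to_datetime(nanoseconds):
    """Convert a nanosecond offset from the epoch to a datetime (hypothetical helper).

    timedelta only has microsecond resolution, so the sub-microsecond part is dropped.
    """
    return EPOCH + timedelta(microseconds=nanoseconds // 1000)


earliest = ns_to_datetime(INT64_MIN_USABLE)
latest = ns_to_datetime(INT64_MAX)

print(earliest.date())  # 1677-09-21
print(latest.date())    # 2262-04-11
```

A Period counts whole spans (days, in this case) rather than nanoseconds, so it can cover dates far outside this window.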


.. py:function:: parse_xsd_date_or_datetime(iso_string, dtype, unix_year=1950)

   Parse xsd:date or xsd:dateTime strings into Pandas datetime or period values.

   Dates earlier than ``unix_year`` (default: 1950) are treated as historical and converted
   into Periods instead of Timestamps to avoid datetime parsing issues.

   :param iso_string: ISO-formatted date or datetime string.
   :type iso_string: str
   :param dtype: Expected data type (e.g. xsd:date or xsd:dateTime).
   :type dtype: str
   :param unix_year: Dates earlier than this year are treated as historical (default: 1950).
   :type unix_year: int

   :returns: Parsed date/time object, or NaT on failure.
   :rtype: pd.Timestamp, pd.Period, or pd.NaT
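
The year-threshold rule described for ``unix_year`` can be sketched without pandas. This is an illustration of the documented behaviour and not the function's actual code: ``pick_representation`` is a hypothetical name, it returns a label instead of a pandas object, ``None`` stands in for NaT, and the ``dtype`` argument is left out for brevity.

```python
from datetime import datetime


def pick_representation(iso_string, unix_year=1950):
    """Return "period", "timestamp", or None for an ISO date/datetime string.

    Hypothetical helper: dates before unix_year map to "period",
    everything else to "timestamp", unparseable input to None.
    """
    try:
        # Accepts both YYYY-MM-DD and YYYY-MM-DDTHH:MM:SS.
        parsed = datetime.fromisoformat(iso_string)
    except (TypeError, ValueError):
        return None
    if parsed.year < unix_year:
        return "period"
    return "timestamp"


early = pick_representation("1889-05-06")
modern = pick_representation("1999-12-31T23:59:59")
broken = pick_representation("not-a-date")
custom = pick_representation("1960-01-01", unix_year=2000)
```

With the default threshold, ``early`` is ``"period"`` and ``modern`` is ``"timestamp"``; raising ``unix_year`` to 2000 moves the 1960 date into the period branch as well.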


.. py:function:: get_token_matrix(series, sep=' ', dummies=True)

   Generate a token matrix (either binary or count-based) from a Pandas Series.

   :param series: The input column containing string data.
   :type series: pd.Series
   :param sep: Separator used to split tokens.
   :type sep: str
   :param dummies: If True, return binary (0/1) presence; if False, return token counts.
   :type dummies: bool

   :returns: A DataFrame with one column per token and one row per entry.
   :rtype: pd.DataFrame
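
The difference between the binary and the count-based matrix is easiest to see on a small input. The sketch below is a dependency-free illustration of the documented behaviour, not the ``dataria`` code: ``token_matrix`` is a hypothetical name, it returns plain lists instead of a DataFrame, and the alphabetical column order is an assumption made for this example.

```python
from collections import Counter


def token_matrix(values, sep=" ", dummies=True):
    """Return (columns, rows) with one column per token and one row per entry.

    Hypothetical helper: with dummies=True each cell is 0/1 (token present),
    with dummies=False each cell is the number of occurrences.
    """
    # Count tokens per entry, skipping empty strings produced by split().
    counts = [Counter(tok for tok in value.split(sep) if tok) for value in values]
    columns = sorted(set().union(*counts))
    rows = [
        [min(c[tok], 1) if dummies else c[tok] for tok in columns]
        for c in counts
    ]
    return columns, rows


entries = ["red blue", "blue blue green", ""]

columns, binary_rows = token_matrix(entries, dummies=True)
_, count_rows = token_matrix(entries, dummies=False)

print(columns)      # ['blue', 'green', 'red']
print(binary_rows)  # [[1, 0, 1], [1, 1, 0], [0, 0, 0]]
print(count_rows)   # [[1, 0, 1], [2, 1, 0], [0, 0, 0]]
```

In pandas itself, ``Series.str.get_dummies(sep=...)`` produces a comparable binary matrix directly from a string column.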