dataria.DATA
============

.. py:module:: dataria.DATA


Functions
---------

.. autoapisummary::

   dataria.DATA.sparql_to_dataframe
   dataria.DATA.iso_to_period
   dataria.DATA.parse_xsd_date_or_datetime
   dataria.DATA.get_token_matrix


Module Contents
---------------

.. py:function:: sparql_to_dataframe(endpoint_url, query, csv_filename='query_result.csv')

   Execute a SPARQL query and convert the results into a Pandas DataFrame.

   Supports parsing of geometry (WKT, GeoJSON), numeric types, and
   xsd:date/xsd:dateTime fields. The results are also saved as CSV unless
   ``csv_filename`` is None.

   :param endpoint_url: SPARQL endpoint URL.
   :type endpoint_url: str
   :param query: SPARQL query string.
   :type query: str
   :param csv_filename: Path to save the CSV result. If None, no file is written.
   :type csv_filename: str or None

   :returns: The query results as a DataFrame with parsed values.
   :rtype: pd.DataFrame


.. py:function:: iso_to_period(iso_string)

   Convert an ISO date string (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) to a Pandas Period.

   Intended for historical date handling when datetime parsing is not viable.

   :param iso_string: ISO 8601 date string.
   :type iso_string: str

   :returns: A Period with daily frequency, or NaT if parsing fails.
   :rtype: pd.Period or pd.NaT


.. py:function:: parse_xsd_date_or_datetime(iso_string, dtype, unix_year=1950)

   Parse xsd:date or xsd:dateTime strings into Pandas datetime or period values.

   Early historical dates (by default, those before 1950) are converted into
   Periods automatically to avoid datetime parsing issues.

   :param iso_string: ISO-formatted date or datetime string.
   :type iso_string: str
   :param dtype: Expected data type (e.g. xsd:date or xsd:dateTime).
   :type dtype: str
   :param unix_year: Dates earlier than this year are treated as historical (default: 1950).
   :type unix_year: int

   :returns: Parsed date/time object, or NaT on failure.
   :rtype: pd.Timestamp or pd.Period


.. py:function:: get_token_matrix(series, sep=' ', dummies=True)

   Generate a token matrix (either binary or count-based) from a Pandas Series.

   :param series: The input column containing string data.
   :type series: pd.Series
   :param sep: Separator used to split tokens.
   :type sep: str
   :param dummies: If True, return binary (0/1) presence; if False, return token counts.
   :type dummies: bool

   :returns: A DataFrame with one column per token and one row per entry.
   :rtype: pd.DataFrame
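
The difference between the binary and count-based output of ``get_token_matrix`` may be easier to see on a small input. The sketch below is not the library's implementation: ``token_matrix_sketch`` is a hypothetical stand-in written only from the description above. It assumes that missing values yield an all-zero row and that columns are sorted alphabetically, neither of which is confirmed for ``dataria`` itself.

.. code-block:: python

   from collections import Counter

   import pandas as pd


   def token_matrix_sketch(series, sep=" ", dummies=True):
       """Hypothetical stand-in for get_token_matrix, for illustration only."""
       rows = []
       for value in series:
           if pd.notna(value):
               # Count every non-empty token in this entry.
               tokens = [t for t in str(value).split(sep) if t]
               rows.append(dict(Counter(tokens)))
           else:
               # Assumption: a missing entry contributes no tokens.
               rows.append({})
       # One column per token; tokens absent from an entry become 0.
       counts = pd.DataFrame(rows, index=series.index).fillna(0).astype(int)
       counts = counts.sort_index(axis=1)
       if dummies:
           # Collapse counts to 0/1 presence flags.
           return (counts > 0).astype(int)
       return counts


   tags = pd.Series(["red blue red", "blue green", None])
   binary = token_matrix_sketch(tags)
   counts = token_matrix_sketch(tags, dummies=False)
   print(counts)

In this sketch the first entry gets ``red = 2`` in the count matrix and ``red = 1`` in the binary one. For the binary case alone, pandas' own ``Series.str.get_dummies`` produces a comparable result.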