MSSQL

The mssql module contains a variety of functions to interact with MSSQL databases through Python and Pandas.

Reading tables

pdsql.mssql.rd_sql(server, database, table=None, col_names=None, where_col=None, where_val=None, where_op='AND', geo_col=False, from_date=None, to_date=None, date_col=None, rename_cols=None, stmt=None, export_path=None)

Function to import data from an MSSQL database.

Parameters:
  • server (str) – The server name. e.g.: ‘SQL2012PROD03’
  • database (str) – The specific database within the server. e.g.: ‘LowFlows’
  • table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’
  • col_names (list of str) – The column names that should be retrieved. e.g.: [‘SiteID’, ‘BandNo’, ‘RecordNo’]
  • where_col (str or dict) – Must be either a string with an associated where_val list or a dictionary of strings to lists.’. e.g.: ‘SnapshotType’ or {‘SnapshotType’: [‘value1’, ‘value2’]}
  • where_val (list) – The WHERE query values for the where_col. e.g. [‘value1’, ‘value2’]
  • where_op (str) – If where_col is a dictionary and there are more than one key, then the operator that connects the where statements must be either ‘AND’ or ‘OR’.
  • geo_col (bool) – Is there a geometry column in the table?.
  • from_date (str) – The start date in the form ‘2010-01-01’.
  • to_date (str) – The end date in the form ‘2010-01-01’.
  • date_col (str) – The SQL table column that contains the dates.
  • rename_cols (list of str) – List of strings to rename the resulting DataFrame column names.
  • stmt (str) – Custom SQL statement to be directly passed to the database. This will ignore all prior arguments except server and database.
  • export_path (str) – The export path for a csv file if desired. If None, then nothing is exported.
Returns:

Return type:

DataFrame

pdsql.mssql.rd_sql_ts(server, database, table, groupby_cols, date_col, values_cols, resample_code=None, period=1, fun='mean', val_round=3, where_col=None, where_val=None, where_op='AND', from_date=None, to_date=None, min_count=None, export_path=None)

Function to specifically read and possibly aggregate time series data stored in MSSQL tables.

Parameters:
  • server (str) – The server name. e.g.: ‘SQL2012PROD03’
  • database (str) – The specific database within the server. e.g.: ‘LowFlows’
  • table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’
  • groupby_cols (str or list of str) – The columns in the SQL table to grouped and returned with the time series data.
  • date_col (str) – The date column in the table.
  • values_cols (str or list of str) – The column(s) of the value(s) that should be resampled.
  • resample_code (str or None) – The Pandas time series resampling code. e.g. ‘D’ for day, ‘W’ for week, ‘M’ for month, etc.
  • period (int) – The number of resampling periods. e.g. period = 2 and resample = ‘D’ would be to resample the values over a 2 day period.
  • fun (str) – The resampling function. i.e. mean, sum, count, min, or max. No median yet…
  • val_round (int) – The number of decimals to round the values.
  • where_col (str or dict) – Must be either a string with an associated where_val list or a dictionary of strings to lists.’. e.g.: ‘SnapshotType’ or {‘SnapshotType’: [‘value1’, ‘value2’]}
  • where_val (list) – The WHERE query values for the where_col. e.g. [‘value1’, ‘value2’]
  • where_op (str) – If where_col is a dictionary and there are more than one key, then the operator that connects the where statements must be either ‘AND’ or ‘OR’.
  • from_date (str) – The start date in the form ‘2010-01-01’.
  • to_date (str) – The end date in the form ‘2010-01-01’.
  • min_count (int) – The minimum number of values required to return groupby_cols. Only works when groupby_cols and vlue_cols are str.
  • export_path (str) – The export path for a csv file if desired. If None, then nothing is exported.
Returns:

Pandas DataFrame with MultiIndex of groupby_cols and date_col

Return type:

DataFrame

pdsql.mssql.rd_sql_geo(server, database, table, col_stmt, where_lst=None)

Function to extract the geometry and coordinate system from an SQL geometry field. Returns a shapely geometry object and a proj4 str.

Parameters:
  • server (str) – The server name. e.g.: ‘SQL2012PROD03’
  • database (str) – The specific database within the server. e.g.: ‘LowFlows’
  • table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’
  • where_lst (list) – A list of where statements to be passed and added to the final SQL statement.
Returns:

  • list of shapely geometry objects – The main output is a list of shapely geometry objects for all queried rows of the SQL table.
  • str – The second output is a proj4 str of the projection system.

Creating tables

pdsql.mssql.create_mssql_table(server, database, table, dtype_dict, primary_keys=None, foreign_keys=None, foreign_table=None, drop_table=False)

Function to create a table in an mssql database.

Parameters:
  • server (str) – The server name. e.g.: ‘SQL2012PROD03’
  • database (str) – The specific database within the server. e.g.: ‘LowFlows’
  • table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’
  • dtype_dict (dict of str) – Dictionary of df columns to the associated sql data type. Examples below.
  • primary_keys (str or list of str) – Index columns to define uniqueness in the data structure.
  • foreign_keys (str or list of str) – Columns to link to another table in the same database.
  • foreign_table (str) – The table in the same database with the identical foreign key(s).
  • drop_table (bool) – If the table already exists, should it be dropped?
Returns:

Return type:

None

Writing to tables

pdsql.mssql.to_mssql(df, server, database, table, index=False, dtype=None)

Function to append a DataFrame onto an existing mssql table.

Parameters:
  • df (DataFrame) – DataFrame to be saved. The DataFrame column/index names must match those on the mssql table exactly.
  • server (str) – The server name. e.g.: ‘SQL2012PROD03’
  • database (str) – The specific database within the server. e.g.: ‘LowFlows’
  • table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’
  • index (bool) – Should the index be added as a column?
  • dtype (dict of column name to SQL type, default None) – Optional specifying the datatype for columns. The SQL type should be an SQLAlchemy type.
Returns:

Return type:

None

Updating tables

pdsql.mssql.update_mssql_table_rows(df, server, database, table, on, append=True)

Function to selectively delete rows from an mssql table.

Parameters:
  • df (DataFrame) – DataFrame with data to be overwritten in SQL table.
  • server (str) – The server name. e.g.: ‘SQL2012PROD03’
  • database (str) – The specific database within the server. e.g.: ‘LowFlows’
  • table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’
  • on (str or list) – The columns for the df and sql table to join to to make the update.
  • stmt (str) – SQL delete statement. Will override everything except server and database.
Returns:

Return type:

None

Deleting rows in tables

pdsql.mssql.del_mssql_table_rows(server, database, table=None, pk_df=None, stmt=None, **kwargs)

Function to selectively delete rows from an mssql table.

Parameters:
  • server (str) – The server name. e.g.: ‘SQL2012PROD03’
  • database (str) – The specific database within the server. e.g.: ‘LowFlows’
  • table (str or None if stmt is a str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’
  • pk_df (DataFrame) – A DataFrame of the primary keys of the table for the rows that should be removed. Will override anything in the kwargs.
  • stmt (str) – SQL delete statement. Will override everything except server and database.
  • **kwargs – Any kwargs that can be passed to sql_where_stmts.
Returns:

Return type:

None

Notes

Using the pk_df is the only way to ensure that specific rows will be deleted from composite keys. The column data types and names of pk_df must match the equivelant columns in the SQL table. The procedure creates a temporary table from the pk_df then deletes the rows in the target table based on the temp table. Then finally deletes the temp table.

Helper functions

pdsql.mssql.sql_where_stmts(where_col=None, where_val=None, where_op='AND', from_date=None, to_date=None, date_col=None)

Function to take various input parameters and convert them to a list of where statements for SQL.

Parameters:
  • where_col (str or dict) – Either a str with an associated where_val list or a dictionary of string keys to list values. If a str, it should represent the table column associated with the ‘where’ condition.
  • where_val (list or None) – If where_col is a str, then where_val must be a list of associated condition values.
  • where_op (str of either 'AND' or 'OR') – The binding operator for the where conditions.
  • from_date (str or None) – The start date in the form ‘2010-01-01’.
  • to_date (str or None) – The end date in the form ‘2010-01-01’.
  • date_col (str or None) – The SQL table column that contains the dates.
Returns:

Returns a list of str where conditions to be passed to an SQL execution function. The function needs to bind it with ” where ” + ” and “.join(where_lst)

Return type:

list of str or None

pdsql.mssql.sql_ts_agg_stmt(table, groupby_cols, date_col, values_cols, resample_code, period=1, fun='mean', val_round=3, where_lst=None)

Function to create an SQL statement to pass to an SQL driver to resample a time series table.

Parameters:
  • table (str) – The SQL table name.
  • groupby_cols (str or list of str) – The columns in the SQL table to grouped and returned with the time series data.
  • date_col (str) – The date column in the table.
  • values_cols (str or list of str) – The column(s) of the value(s) that should be resampled.
  • resample_code (str) – The Pandas time series resampling code. e.g. ‘D’ for day, ‘W’ for week, ‘M’ for month, etc.
  • period (int) – The number of resampling periods. e.g. period = 2 and resample = ‘D’ would be to resample the values over a 2 day period.
  • fun (str) – The resampling function. i.e. mean, sum, count, min, or max. No median yet…
  • val_round (int) – The number of decimals to round the values.
  • where_lst (list or None) – A list of where statements to be passed and added to the final SQL statement.
Returns:

A full SQL statement that can be passed directly to an SQL connection driver like pymssql through pandas read_sql function.

Return type:

str

API Pages