MSSQL¶

The mssql module contains a variety of functions to interact with MSSQL databases through Python and Pandas.

Reading tables¶

pdsql.mssql.rd_sql(server, database, table=None, col_names=None, where_col=None, where_val=None, where_op='AND', geo_col=False, from_date=None, to_date=None, date_col=None, rename_cols=None, stmt=None, export_path=None)¶

Function to import data from an MSSQL database.

Parameters:	server (str) – The server name. e.g.: ‘SQL2012PROD03’ database (str) – The specific database within the server. e.g.: ‘LowFlows’ table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’ col_names (list of str) – The column names that should be retrieved. e.g.: [‘SiteID’, ‘BandNo’, ‘RecordNo’] where_col (str or dict) – Must be either a string with an associated where_val list or a dictionary of strings to lists.’. e.g.: ‘SnapshotType’ or {‘SnapshotType’: [‘value1’, ‘value2’]} where_val (list) – The WHERE query values for the where_col. e.g. [‘value1’, ‘value2’] where_op (str) – If where_col is a dictionary and there are more than one key, then the operator that connects the where statements must be either ‘AND’ or ‘OR’. geo_col (bool) – Is there a geometry column in the table?. from_date (str) – The start date in the form ‘2010-01-01’. to_date (str) – The end date in the form ‘2010-01-01’. date_col (str) – The SQL table column that contains the dates. rename_cols (list of str) – List of strings to rename the resulting DataFrame column names. stmt (str) – Custom SQL statement to be directly passed to the database. This will ignore all prior arguments except server and database. export_path (str) – The export path for a csv file if desired. If None, then nothing is exported.
Returns:
Return type:	DataFrame

pdsql.mssql.rd_sql_ts(server, database, table, groupby_cols, date_col, values_cols, resample_code=None, period=1, fun='mean', val_round=3, where_col=None, where_val=None, where_op='AND', from_date=None, to_date=None, min_count=None, export_path=None)¶

Function to specifically read and possibly aggregate time series data stored in MSSQL tables.

Parameters:	server (str) – The server name. e.g.: ‘SQL2012PROD03’ database (str) – The specific database within the server. e.g.: ‘LowFlows’ table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’ groupby_cols (str or list of str) – The columns in the SQL table to grouped and returned with the time series data. date_col (str) – The date column in the table. values_cols (str or list of str) – The column(s) of the value(s) that should be resampled. resample_code (str or None) – The Pandas time series resampling code. e.g. ‘D’ for day, ‘W’ for week, ‘M’ for month, etc. period (int) – The number of resampling periods. e.g. period = 2 and resample = ‘D’ would be to resample the values over a 2 day period. fun (str) – The resampling function. i.e. mean, sum, count, min, or max. No median yet… val_round (int) – The number of decimals to round the values. where_col (str or dict) – Must be either a string with an associated where_val list or a dictionary of strings to lists.’. e.g.: ‘SnapshotType’ or {‘SnapshotType’: [‘value1’, ‘value2’]} where_val (list) – The WHERE query values for the where_col. e.g. [‘value1’, ‘value2’] where_op (str) – If where_col is a dictionary and there are more than one key, then the operator that connects the where statements must be either ‘AND’ or ‘OR’. from_date (str) – The start date in the form ‘2010-01-01’. to_date (str) – The end date in the form ‘2010-01-01’. min_count (int) – The minimum number of values required to return groupby_cols. Only works when groupby_cols and vlue_cols are str. export_path (str) – The export path for a csv file if desired. If None, then nothing is exported.
Returns:	Pandas DataFrame with MultiIndex of groupby_cols and date_col
Return type:	DataFrame

pdsql.mssql.rd_sql_geo(server, database, table, col_stmt, where_lst=None)¶

Function to extract the geometry and coordinate system from an SQL geometry field. Returns a shapely geometry object and a proj4 str.

Parameters:

server (str) – The server name. e.g.: ‘SQL2012PROD03’
database (str) – The specific database within the server. e.g.: ‘LowFlows’
table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’
where_lst (list) – A list of where statements to be passed and added to the final SQL statement.

Returns:

list of shapely geometry objects – The main output is a list of shapely geometry objects for all queried rows of the SQL table.
str – The second output is a proj4 str of the projection system.

Creating tables¶

pdsql.mssql.create_mssql_table(server, database, table, dtype_dict, primary_keys=None, foreign_keys=None, foreign_table=None, drop_table=False)¶

Function to create a table in an mssql database.

Parameters:	server (str) – The server name. e.g.: ‘SQL2012PROD03’ database (str) – The specific database within the server. e.g.: ‘LowFlows’ table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’ dtype_dict (dict of str) – Dictionary of df columns to the associated sql data type. Examples below. primary_keys (str or list of str) – Index columns to define uniqueness in the data structure. foreign_keys (str or list of str) – Columns to link to another table in the same database. foreign_table (str) – The table in the same database with the identical foreign key(s). drop_table (bool) – If the table already exists, should it be dropped?
Returns:
Return type:	None

Writing to tables¶

pdsql.mssql.to_mssql(df, server, database, table, index=False, dtype=None)¶

Function to append a DataFrame onto an existing mssql table.

Parameters:	df (DataFrame) – DataFrame to be saved. The DataFrame column/index names must match those on the mssql table exactly. server (str) – The server name. e.g.: ‘SQL2012PROD03’ database (str) – The specific database within the server. e.g.: ‘LowFlows’ table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’ index (bool) – Should the index be added as a column? dtype (dict of column name to SQL type, default None) – Optional specifying the datatype for columns. The SQL type should be an SQLAlchemy type.
Returns:
Return type:	None

Updating tables¶

pdsql.mssql.update_mssql_table_rows(df, server, database, table, on, append=True)¶

Function to selectively delete rows from an mssql table.

Parameters:	df (DataFrame) – DataFrame with data to be overwritten in SQL table. server (str) – The server name. e.g.: ‘SQL2012PROD03’ database (str) – The specific database within the server. e.g.: ‘LowFlows’ table (str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’ on (str or list) – The columns for the df and sql table to join to to make the update. stmt (str) – SQL delete statement. Will override everything except server and database.
Returns:
Return type:	None

Deleting rows in tables¶

pdsql.mssql.del_mssql_table_rows(server, database, table=None, pk_df=None, stmt=None, **kwargs)¶

Function to selectively delete rows from an mssql table.

Parameters:	server (str) – The server name. e.g.: ‘SQL2012PROD03’ database (str) – The specific database within the server. e.g.: ‘LowFlows’ table (str or None if stmt is a str) – The specific table within the database. e.g.: ‘LowFlowSiteRestrictionDaily’ pk_df (DataFrame) – A DataFrame of the primary keys of the table for the rows that should be removed. Will override anything in the kwargs. stmt (str) – SQL delete statement. Will override everything except server and database. **kwargs – Any kwargs that can be passed to sql_where_stmts.
Returns:
Return type:	None

Notes

Using the pk_df is the only way to ensure that specific rows will be deleted from composite keys. The column data types and names of pk_df must match the equivelant columns in the SQL table. The procedure creates a temporary table from the pk_df then deletes the rows in the target table based on the temp table. Then finally deletes the temp table.

Helper functions¶

pdsql.mssql.sql_where_stmts(where_col=None, where_val=None, where_op='AND', from_date=None, to_date=None, date_col=None)¶

Function to take various input parameters and convert them to a list of where statements for SQL.

Parameters:	where_col (str or dict) – Either a str with an associated where_val list or a dictionary of string keys to list values. If a str, it should represent the table column associated with the ‘where’ condition. where_val (list or None) – If where_col is a str, then where_val must be a list of associated condition values. where_op (str of either 'AND' or 'OR') – The binding operator for the where conditions. from_date (str or None) – The start date in the form ‘2010-01-01’. to_date (str or None) – The end date in the form ‘2010-01-01’. date_col (str or None) – The SQL table column that contains the dates.
Returns:	Returns a list of str where conditions to be passed to an SQL execution function. The function needs to bind it with ” where ” + ” and “.join(where_lst)
Return type:	list of str or None

pdsql.mssql.sql_ts_agg_stmt(table, groupby_cols, date_col, values_cols, resample_code, period=1, fun='mean', val_round=3, where_lst=None)¶

Function to create an SQL statement to pass to an SQL driver to resample a time series table.

Parameters:	table (str) – The SQL table name. groupby_cols (str or list of str) – The columns in the SQL table to grouped and returned with the time series data. date_col (str) – The date column in the table. values_cols (str or list of str) – The column(s) of the value(s) that should be resampled. resample_code (str) – The Pandas time series resampling code. e.g. ‘D’ for day, ‘W’ for week, ‘M’ for month, etc. period (int) – The number of resampling periods. e.g. period = 2 and resample = ‘D’ would be to resample the values over a 2 day period. fun (str) – The resampling function. i.e. mean, sum, count, min, or max. No median yet… val_round (int) – The number of decimals to round the values. where_lst (list or None) – A list of where statements to be passed and added to the final SQL statement.
Returns:	A full SQL statement that can be passed directly to an SQL connection driver like pymssql through pandas read_sql function.
Return type:	str