dvha.tools¶

Name Prediction¶

Implementation of rapidfuzz for ROI name prediction

class dvha.tools.name_prediction.ROINamePredictor(roi_map, weight_simple=1.0, weight_partial=0.6, threshold=0.0)[source]¶

Bases: object

ROI Name Prediction class object

Parameters

roi_map (DatabaseROIs) – ROI map object
weight_simple (float, optional) – Scaling factor for fuzz.ratio for combined score
weight_partial (float, optional) – Scaling factor for fuzz.partial_ratio for combined score
threshold (float, optional) – Set a minimum score for a prediction to be returned

static combine_scores(score_1, score_2, mode='average')[source]¶

Get a combined fuzz score

Parameters

score_1 (float) – A fuzz ratio score
score_2 (float) – Another fuzz ratio score
mode (str, optional) – Method for combining score_1 and score_2. Options are ‘geom_mean’, ‘product’, and ‘average’

Returns

Combined score

Return type

float
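As a rough illustration of the three modes, the sketch below combines two scores on a 0-100 scale. It is not DVHA's implementation: the name combine_scores_sketch and the rescaling of the 'product' mode by 100 are assumptions made for this example.

```python
import math


def combine_scores_sketch(score_1, score_2, mode="average"):
    """Combine two 0-100 similarity scores (hypothetical helper, not DVHA code)."""
    if mode == "geom_mean":
        # Geometric mean: square root of the product
        return math.sqrt(score_1 * score_2)
    if mode == "product":
        # Assumption: divide by 100 so the result stays on a 0-100 scale
        return score_1 * score_2 / 100.0
    if mode == "average":
        return (score_1 + score_2) / 2.0
    raise ValueError(f"Unknown mode: {mode}")
```

Compared with the average, the geometric mean and the product both penalize a pair in which one of the two scores is low.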

get_best_roi_match(roi, physician, return_score=False)[source]¶

Check all ROI variations for best match, return physician ROI

Parameters

roi (str) – An ROI name
physician (str) – Physician as stored in ROI Map
return_score (bool, optional) – If true, return a tuple: prediction, score

Returns

The physician ROI associated with the ROI variation that has the highest combined fuzz score for roi

Return type

str
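The lookup can be pictured as follows. This is a sketch only: it uses difflib.SequenceMatcher from the standard library in place of rapidfuzz, and the dictionary layout (physician ROI mapped to a list of variations) and the helper names are hypothetical stand-ins for the ROI map.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """0-100 similarity from difflib; a stand-in for a rapidfuzz score."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_roi_match_sketch(roi, variations_by_physician_roi, threshold=0.0):
    """Return (physician_roi, score) for the closest variation, or (None, 0.0)."""
    best_name, best_score = None, 0.0
    for physician_roi, variations in variations_by_physician_roi.items():
        for variation in variations:
            score = similarity(roi, variation)
            if score > best_score and score >= threshold:
                best_name, best_score = physician_roi, score
    return best_name, best_score


# Hypothetical ROI map for a single physician
roi_map = {
    "heart": ["heart", "hrt", "cardiac"],
    "lung_left": ["lung_l", "left lung", "lt lung"],
}
prediction, score = best_roi_match_sketch("Lung_Lt", roi_map)
```

Here prediction is the physician ROI whose variation is closest to the query, mirroring the return_score=True form described above.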

get_combined_fuzz_score(a, b, mode='geom_mean')[source]¶

Return combine_scores for strings a and b

Parameters

a (str) – Any string
b (str) – Another string for comparison
mode (str, optional) – Method for combining fuzz.ratio and fuzz.partial_ratio. Options are ‘geom_mean’, ‘product’, and ‘average’

Returns

Results from combine_scores for a and b

Return type

float

get_combined_fuzz_scores(string, list_of_strings)[source]¶

Compare a string against many

Parameters

string (str) – A string to compare against each string in list_of_strings
list_of_strings (list) – A list of strings for comparison

Returns

A list of tuples (score, string) in order of score

Return type

list
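A minimal sketch of ranking one string against many is shown below. It assumes a descending sort and a geometric-mean combination, and it replaces the rapidfuzz calls with difflib-based stand-ins; the partial_ratio helper here is a simplified approximation, not rapidfuzz's algorithm.

```python
import math
from difflib import SequenceMatcher


def ratio(a, b):
    """0-100 similarity of two whole strings (difflib stand-in for fuzz.ratio)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def partial_ratio(a, b):
    """Best 0-100 similarity of the shorter string against any equal-length
    window of the longer one (a simplified take on fuzz.partial_ratio)."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    n = len(short)
    return max(ratio(short, long_[i:i + n]) for i in range(len(long_) - n + 1))


def combined_scores_sketch(string, list_of_strings):
    """Return (score, string) tuples, highest geometric-mean score first."""
    scored = [
        (math.sqrt(ratio(string, s) * partial_ratio(string, s)), s)
        for s in list_of_strings
    ]
    return sorted(scored, reverse=True)


ranked = combined_scores_sketch("ptv_70", ["ptv70", "ptv_70", "spinal_cord"])
```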

MLC Analyzer¶

The code for DVHA’s MLC analysis has been exported to a stand-alone library.

GitHub: mlca.dvhanalytics.com
Docs: dvha-mlca.readthedocs.io

ROI Formatter¶

Formatting tools for roi data (dicompyler, Shapely, DVHA)

dvha.tools.roi_formatter.dicompyler_roi_coord_to_db_string(coord)[source]¶

Parameters: coord – dicompyler structure coordinates from GetStructureCoordinates()
Returns: roi string representation of an roi as formatted in the SQL database (roi_coord_string)
Return type: str

dvha.tools.roi_formatter.dicompyler_roi_to_sets_of_points(coord)[source]¶

Parameters: coord – dicompyler structure coordinates from GetStructureCoordinates()
Returns: a “sets of points” formatted dictionary
Return type: dict

dvha.tools.roi_formatter.get_contour_sample(polygon, dth_res=0.5) → tuple[source]¶

Get 3D points uniformly distributed in the perimeter space

Parameters

polygon (Polygon) – shapely object
dth_res (int, float) – Sampling distance in perimeter space (mm)

Returns

np.ndarray – x coordinates of sampled contour
np.ndarray – y coordinates of sampled contour

dvha.tools.roi_formatter.get_planes_from_string(roi_coord_string)[source]¶

Parameters: roi_coord_string (str) – string representation of an roi as formatted in the SQL database
Returns: a “sets of points” formatted dictionary
Return type: dict

dvha.tools.roi_formatter.get_roi_coordinates_from_planes(sets_of_points)[source]¶

Parameters: sets_of_points (dict) – a “sets of points” formatted dictionary
Returns: a list of numpy arrays, each array is the x, y, z coordinates of the given point
Return type: list

dvha.tools.roi_formatter.get_roi_coordinates_from_shapely(shapely_dict, sample_res=None)[source]¶

Parameters

shapely_dict (dict) – output from get_shapely_from_sets_of_points
sample_res (int, float) – If set to a numeric value, sample each polygon with this resolution (mm)

Returns

a list of numpy arrays, each array is the x, y, z coordinates of the given point

Return type

list

dvha.tools.roi_formatter.get_roi_coordinates_from_string(roi_coord_string)[source]¶

Parameters: roi_coord_string (str) – string representation of an roi as formatted in the SQL database
Returns: a list of numpy arrays, each array is the x, y, z coordinates of the given point
Return type: list

dvha.tools.roi_formatter.get_shapely_from_sets_of_points(sets_of_points, tolerance=None, preserve_topology=True)[source]¶

Parameters

sets_of_points (dict) – a “sets of points” formatted dictionary
tolerance (float, optional) – If set to a number, Shapely’s simplify is applied to each contour with the given tolerance
preserve_topology (bool, optional) – Passed to Shapely’s simplify if tolerance is set

Returns

roi_slices which is a dictionary of lists of z, thickness, and a Shapely Polygon class object

Return type

dict

dvha.tools.roi_formatter.points_to_shapely_polygon(sets_of_points)[source]¶

Parameters: sets_of_points (dict) – a “sets of points” formatted dictionary
Returns: a composite polygon as a shapely object (either polygon or multipolygon)
Return type: shapely Polygon or MultiPolygon

ROI Geometry¶

Tools for geometric calculations

dvha.tools.roi_geometry.centroid(roi)[source]¶

Parameters: roi – a “sets of points” formatted dictionary
Returns: centroid of the roi in x, y, z dicom coordinates (mm)
Return type: list

dvha.tools.roi_geometry.cross_section(roi)[source]¶

Calculate the cross section of a given roi

Parameters: roi (dict) – a “sets of points” formatted dictionary
Returns: max and median cross-sectional area of all slices in cm^2
Return type: dict
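The calculation can be sketched with the shoelace formula. The layout assumed here for a “sets of points” dictionary (z value as a string key, mapped to a list of contours, each a list of [x, y, z] points in mm) and the 'max' and 'median' key names are assumptions for illustration; holes are ignored.

```python
from statistics import median


def polygon_area_mm2(points):
    """Shoelace formula on the x, y components of a closed contour (mm^2)."""
    area = 0.0
    for i, (x1, y1, *_) in enumerate(points):
        x2, y2, *_ = points[(i + 1) % len(points)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def cross_section_sketch(roi):
    """Max and median slice area in cm^2; each slice sums its contours."""
    slice_areas = [
        sum(polygon_area_mm2(contour) for contour in contours) / 100.0  # mm^2 -> cm^2
        for contours in roi.values()
    ]
    return {"max": max(slice_areas), "median": median(slice_areas)}


def square(side, z):
    """Hypothetical test contour: a square with one corner at the origin."""
    return [[0, 0, z], [side, 0, z], [side, side, z], [0, side, z]]


# Three square slices, 10, 20 and 30 mm on a side
roi = {"0.0": [square(10, 0.0)], "3.0": [square(20, 3.0)], "6.0": [square(30, 6.0)]}
areas = cross_section_sketch(roi)
```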

dvha.tools.roi_geometry.dth(min_distances)[source]¶

Parameters: min_distances – the output from min_distances_to_target
Returns: histogram of distances in 1mm bin widths
Return type: numpy.array
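A 1-mm histogram of this kind can be sketched in plain Python. The example assumes the distances are already in mm and returns the left bin edges alongside the counts; both choices, and the helper name, are illustrative rather than taken from DVHA.

```python
import math
from collections import Counter


def dth_sketch(min_distances):
    """Histogram distances into 1-mm-wide bins.

    Returns (counts, left_edges); bin i covers [left_edges[i], left_edges[i] + 1).
    """
    binned = Counter(math.floor(d) for d in min_distances)
    left_edges = list(range(min(binned), max(binned) + 1))
    counts = [binned.get(edge, 0) for edge in left_edges]
    return counts, left_edges


counts, edges = dth_sketch([-1.5, 0.2, 0.7, 1.5, 3.1])
```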

dvha.tools.roi_geometry.is_point_inside_roi(point, roi)[source]¶

Check if a point is within an ROI

Parameters

point (list) – x, y, z
roi (dict) – roi: a “sets of points” formatted dictionary

Returns

Whether or not the point is within the roi

Return type

bool
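One way to picture this check is a ray-casting test on the slice that contains the point. The sketch assumes a “sets of points” layout of z value (string key) mapped to a list of contours of [x, y, z] points, requires the point's z to match a slice within a tolerance, and treats nested contours as holes; none of this is confirmed to be DVHA's approach.

```python
def point_in_polygon(x, y, contour):
    """Ray casting: count edge crossings of a ray going in the +x direction."""
    inside = False
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i][0], contour[i][1]
        x2, y2 = contour[(i + 1) % n][0], contour[(i + 1) % n][1]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_point_inside_roi_sketch(point, roi, z_tol=1e-3):
    """True if point lies inside a contour on the slice matching its z value."""
    x, y, z = point
    for z_key, contours in roi.items():
        if abs(float(z_key) - z) <= z_tol:
            # An odd number of enclosing contours means the point is inside
            hits = sum(point_in_polygon(x, y, c) for c in contours)
            return hits % 2 == 1
    return False  # no slice at this z


# Hypothetical single-slice ROI: a 10 mm square at z = 0
roi = {"0.0": [[[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0]]]}
```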

dvha.tools.roi_geometry.min_distances_to_target(oar_coordinates, target_coordinates, factors=None)[source]¶

Calculate all OAR-point-to-Target-point euclidean distances

Parameters

oar_coordinates (list) – numpy arrays of 3D points defining the surface of the OAR
target_coordinates (list) – numpy arrays of 3D points defining the surface of the PTV
factors – (Default value = None)

Returns

min_distances: all minimum distances (cm) of OAR-point-to-Target-point pairs

Return type

list
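The idea can be sketched as a brute-force nearest-neighbour search. The example assumes input coordinates in mm, converts the result to cm, and ignores the undocumented factors argument; it uses plain tuples rather than numpy arrays to stay dependency-free.

```python
import math


def min_distances_sketch(oar_coordinates, target_coordinates):
    """For each OAR point, the distance (cm) to its nearest target point."""
    return [
        min(math.dist(oar_pt, tgt_pt) for tgt_pt in target_coordinates) / 10.0  # mm -> cm
        for oar_pt in oar_coordinates
    ]


oar = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
target = [(30.0, 40.0, 0.0), (100.0, 0.0, 20.0)]
distances = min_distances_sketch(oar, target)
```

This loop costs one distance evaluation per OAR-target pair, so a vectorized routine is a more practical choice for dense surfaces.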

dvha.tools.roi_geometry.overlap_volume(oar, tv)[source]¶

Calculate the overlap volume of two rois

Parameters

oar (dict) – organ-at-risk as a “sets of points” formatted dictionary
tv (dict) – treatment volume as a “sets of points” formatted dictionary

dvha.tools.roi_geometry.planes_to_voxel_centers(planes, res=1, max_progress=None)[source]¶

Convert a “sets of points” dictionary into 3D voxel centers within the ROI

Parameters

planes (dict) – a “sets of points” dictionary representing the union of the rois
res (float) – resolution factor for voxels
max_progress (float) – if not None, set the maximum progress bar value (with update_dvh_progress)

Returns

A list of 3D points inside the ROI defined by planes

Return type

list

dvha.tools.roi_geometry.process_dth_string(dth_string)[source]¶

Convert a dth_string from the database into data and bins. DVHA stores 1-mm binned surface DTHs with an odd number of bins; the middle bin is 0.

Parameters: dth_string – a value from the dth_string column
Returns: counts, bin positions (mm)
Return type: tuple

dvha.tools.roi_geometry.spread(roi)[source]¶

Parameters: roi – a “sets of points” formatted dictionary
Returns: x, y, z dimensions of a rectangular prism encompassing roi
Return type: list
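A bounding-box calculation of this kind can be sketched as below. The “sets of points” layout (z key mapped to a list of contours of [x, y, z] points) is assumed, and the result is left in the units of the input coordinates.

```python
def spread_sketch(roi):
    """x, y, z extents of the axis-aligned bounding box around all contour points."""
    points = [pt for contours in roi.values() for contour in contours for pt in contour]
    return [max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)]


# Hypothetical ROI with two slices 3 mm apart
roi = {
    "0.0": [[[0, 0, 0.0], [10, 0, 0.0], [10, 10, 0.0], [0, 10, 0.0]]],
    "3.0": [[[0, 0, 3.0], [20, 0, 3.0], [20, 20, 3.0], [0, 20, 3.0]]],
}
dimensions = spread_sketch(roi)
```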

dvha.tools.roi_geometry.surface_area(coord, coord_type='dicompyler')[source]¶

Calculate the surface area of a given roi

Parameters

coord – dicompyler structure coordinates from GetStructureCoordinates() or a sets_of_points dictionary
coord_type – either ‘dicompyler’ or ‘sets_of_points’ (Default value = ‘dicompyler’)

Returns

surface_area in cm^2

Return type

float

dvha.tools.roi_geometry.union(rois)[source]¶

Calculate the geometric union of the provided rois

Parameters: rois (list) – rois formatted as “sets of points” dictionaries
Returns: a “sets of points” dictionary representing the union of the rois
Return type: dict

dvha.tools.roi_geometry.volume(roi)[source]¶

Parameters: roi – a “sets of points” formatted dictionary
Returns: volume in cm^3 of roi
Return type: float

Stats¶

The code from DVHA’s statistical modules has been exported to a stand-alone library; however, DVHA does not use it internally (yet).

GitHub: stats.dvhanalytics.com
Docs: dvha-stats.readthedocs.io

Take numerical data from main app and convert to a format suitable for statistical analysis in Regression and Control Chart tabs

class dvha.tools.stats.MultiVariableRegression(X, y, saved_reg=None)[source]¶

Bases: object

Perform a multi-variable regression using sklearn

Parameters

X (np.array) – independent data
y (list) – dependent data

class dvha.tools.stats.StatsData(dvhs, table_data, group=1)[source]¶

Bases: object

Class used to collect data for Regression and Control Chart. This process is different from the one for Time Series since regressions require all variables to be the same length.

Parameters

dvhs (DVH) – data from DVH query
table_data (dict) – table data other than from DVHs. Has keys of ‘Plans’, ‘Rxs’, ‘Beams’ with values of QuerySQL objects

add_variable(variable, values, units='')[source]¶

Add a new variable to StatsData.data. An existing variable of the same name is not overwritten.

Parameters

variable (str) – variable name to be used as a key and plot title
values (list) – values to be stored for variable
units (str, optional) – Define units for display on plot

del_variable(variable)[source]¶

Delete a variable from StatsData.data

Parameters: variable (str) – variable name

get_X_and_y(y_variable, x_variables, include_patient_info=False)[source]¶

Collect data for input into multi-variable regression

Parameters

y_variable (str) – dependent variable
x_variables (list) – independent variables
include_patient_info (bool) – If True, return mrn, uid, dates with X and y

Returns

X, y or X, y, mrn, uid, dates

Return type

tuple

get_axis_title(variable)[source]¶

Get the plot axis title for variable

Parameters: variable (str) – A key of StatsData.data
Returns: variable with units if stored
Return type: str

get_beam_indices(uid)[source]¶

Get the indices of the Beams table with uid

Parameters: uid (str) – StudyInstanceUID as stored in the SQL database
Returns: Beams table indices that match uid
Return type: list

get_bokeh_data(x, y)[source]¶

Get data in a format compatible with bokeh’s ColumnDataSource.data

Parameters

x (str) – x-variable name
y (str) – y-variable name

Returns

x and y data

Return type

dict

get_corr_matrix_data(options, included_vars=None, extra_vars=None)[source]¶

Get a Pearson-R correlation matrix

Parameters

options (dvha.options.Options) – DVHA options class object. Used to get colors.
included_vars (list, optional) – variables to be included in matrix
extra_vars (list, optional) – variables to be excluded from the matrix

Returns

The dictionary has keys of ‘source_data’, ‘x_factors’ and ‘y_factors’. source_data is used for bokeh plotting; the factors are for axis tick labels. The second element of the tuple is a list of removed MRNs.

Return type

tuple (dict, list)

get_plan_index(uid)[source]¶

Get the index of uid from the Plans table

Parameters: uid (str) – StudyInstanceUID as stored in the SQL database
Returns: Plans table index for uid
Return type: int

property mrns¶

MRNs from DVH object

Returns: DVH.mrn
Return type: list

set_variable_data(variable, data, units=None)[source]¶

Replace the data for the given variable in StatsData.data

Parameters

variable (str) – variable name
data (list) – new data
units (str, optional) – Define units for display on plot

set_variable_units(variable, units)[source]¶

Set the units for the given variable in StatsData.data

Parameters

variable (str) – variable name
units (str) – units for display on plot

property sim_study_dates¶

Simulation dates from Plans table

Returns: Simulation dates
Return type: list

property uids¶

StudyInstanceUIDs from DVH object

Returns: DVH.study_instance_uid
Return type: list

update_endpoints_and_radbio()[source]¶

Update endpoint and radbio data in self.data. This function is needed because these values are calculated after a query and the user may change them.

property variables¶

Get variable names for plotting

Returns: keys of StatsData.data sans ‘Simulation Date’
Return type: list

property vars_with_nan_values¶

Find variable names that contain non-numerical values

Returns: Variable names that cannot be converted to float
Return type: list

dvha.tools.stats.get_control_limits(y, std_devs=3)[source]¶

Calculate control limits for Control Chart

Parameters

y (list) – data
std_devs (int or float) – values greater than std_devs away are out-of-control (Default value = 3)

Returns

center line, upper control limit, and lower control limit

Return type

tuple
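For orientation, one common way to obtain such limits for an individuals control chart is to estimate sigma from the average moving range. The sketch below follows that convention; whether DVHA estimates sigma this way is not stated in this reference, so treat the method and the helper name as assumptions.

```python
from statistics import mean


def control_limits_sketch(y, std_devs=3):
    """Return (center_line, ucl, lcl) for an individuals control chart."""
    center_line = mean(y)
    # Absolute differences between consecutive observations (needs len(y) >= 2)
    moving_ranges = [abs(b - a) for a, b in zip(y, y[1:])]
    # 1.128 is the d2 constant for subgroups of size 2
    sigma = mean(moving_ranges) / 1.128
    return center_line, center_line + std_devs * sigma, center_line - std_devs * sigma


center_line, ucl, lcl = control_limits_sketch([10, 12, 11, 13, 12])
```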

dvha.tools.stats.get_index_of_nan(numpy_array)[source]¶

Find indices of np.nan values

Parameters: numpy_array (np.ndarray) – A numpy array
Returns: indices of numpy_array that are np.nan
Return type: list

dvha.tools.stats.get_p_values(X, y, predictions, params)[source]¶

Get p-values using sklearn based on https://stackoverflow.com/questions/27928275/find-p-value-significance-in-scikit-learn-linearregression

Parameters

X (np.ndarray) – independent data
y (np.ndarray) – dependent data
predictions – output from linear_model.LinearRegression.predict
params – np.array([y_intercept, slope])

Returns

p-values

Return type

list

dvha.tools.stats.str_starts_with_any_in_list(string_a, string_list)[source]¶

Check if string_a starts with any string in the provided list of strings

Parameters

string_a (str) – Any string
string_list (list) – A list of strings

Returns

True if string_a starts with any string in string_list

Return type

bool
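The same check can be written compactly with str.startswith, which accepts a tuple of prefixes. This is an equivalent sketch under that reading of the docstring, not necessarily how DVHA implements it.

```python
def str_starts_with_any_in_list_sketch(string_a, string_list):
    """True if string_a begins with at least one of the given prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches
    return string_a.startswith(tuple(string_list))
```

An empty string_list yields False, since there is no prefix to match.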

dvha.tools.stats.sync_variables_in_stats_data_objects(stats_data_1, stats_data_2)[source]¶

Ensure both stats_data objects have the same variables

Parameters

stats_data_1 (StatsData) – A StatsData object (e.g., Group 1)
stats_data_2 (StatsData) – Another StatsData object (e.g., Group 2)