dvha.tools

Name Prediction

Implementation of rapidfuzz for ROI name prediction

class dvha.tools.name_prediction.ROINamePredictor(roi_map, weight_simple=1.0, weight_partial=0.6, threshold=0.0)

Bases: object

ROI Name Prediction class object

Parameters
  • roi_map (DatabaseROIs) – ROI map object

  • weight_simple (float, optional) – Scaling factor for fuzz.ratio for combined score

  • weight_partial (float, optional) – Scaling factor for fuzz.partial_ratio for combined score

  • threshold (float, optional) – Set a minimum score for a prediction to be returned

static combine_scores(score_1, score_2, mode='average')

Get a combined fuzz score

Parameters
  • score_1 (float) – A fuzz ratio score

  • score_2 (float) – Another fuzz ratio score

  • mode (str, optional) – Method for combining score_1 and score_2. Options are ‘geom_mean’, ‘product’, and ‘average’

Returns

Combined score

Return type

float
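A minimal sketch of the three combination modes. It assumes 'geom_mean' is the geometric mean sqrt(score_1 * score_2), 'product' is the plain product, and 'average' is the arithmetic mean of two scores on a 0-100 scale; DVHA's own implementation may scale or weight these differently.

```python
import math


def combine_scores(score_1: float, score_2: float, mode: str = "average") -> float:
    """Combine two fuzz scores (assumed 0-100) using one of three modes.

    Assumption: these formulas illustrate the mode names only; they are not
    taken from DVHA's source.
    """
    if mode == "geom_mean":
        return math.sqrt(score_1 * score_2)
    if mode == "product":
        return score_1 * score_2
    if mode == "average":
        return (score_1 + score_2) / 2.0
    raise ValueError(f"Unknown mode: {mode!r}")
```

For example, scores of 90 and 40 give 60.0 with 'geom_mean' and 65.0 with 'average'. The geometric mean penalizes disagreement between the two scores more strongly than the average does.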

get_best_roi_match(roi, physician, return_score=False)

Check all ROI variations for best match, return physician ROI

Parameters
  • roi (str) – An ROI name

  • physician (str) – Physician as stored in ROI Map

  • return_score (bool, optional) – If True, return a tuple (prediction, score)

Returns

The physician ROI associated with the ROI variation that has the highest combined fuzz score for roi

Return type

str, or tuple if return_score is True
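The search can be sketched as a loop over every variation in the ROI map, keeping the physician ROI whose variation scores highest. This is not DVHA's code. rapidfuzz is swapped out for Python's difflib so the sketch runs with the standard library only: ratio and partial_ratio below are rough stand-ins for fuzz.ratio and fuzz.partial_ratio. The map layout {physician_roi: [variations]} and the weighted geometric-mean scoring are hypothetical.

```python
import math
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Stand-in for fuzz.ratio: similarity on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def partial_ratio(a: str, b: str) -> float:
    """Rough stand-in for fuzz.partial_ratio: best ratio of the shorter
    string against every same-length window of the longer string."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    width = len(short)
    return max(ratio(short, long_[i:i + width]) for i in range(len(long_) - width + 1))


def get_best_roi_match(roi, roi_map, weight_simple=1.0, weight_partial=0.6, threshold=0.0):
    """Return (physician_roi, score), or (None, score) if below threshold.

    roi_map is a hypothetical {physician_roi: [variation, ...]} dictionary.
    """
    roi = roi.lower()
    best_roi, best_score = None, 0.0
    for physician_roi, variations in roi_map.items():
        for variation in variations:
            v = variation.lower()
            # Assumed scoring: geometric mean of the two weighted scores
            score = math.sqrt(weight_simple * ratio(roi, v) * weight_partial * partial_ratio(roi, v))
            if score > best_score:
                best_roi, best_score = physician_roi, score
    if best_score < threshold:
        return None, best_score
    return best_roi, best_score
```

With the default weights, an exact match scores sqrt(1.0 * 100 * 0.6 * 100), about 77.5. A threshold above that value therefore rejects every prediction.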

get_combined_fuzz_score(a, b, mode='geom_mean')

Return the combine_scores result of fuzz.ratio and fuzz.partial_ratio for strings a and b

Parameters
  • a (str) – Any string

  • b (str) – Another string for comparison

  • mode (str, optional) – Method for combining fuzz.ratio and fuzz.partial_ratio. Options are ‘geom_mean’, ‘product’, and ‘average’

Returns

Results from combine_scores for a and b

Return type

float

get_combined_fuzz_scores(string, list_of_strings)

Compare a string against each string in a list

Parameters
  • string (str) – A string to compare against each string in list_of_strings

  • list_of_strings (list) – A list of strings for comparison

Returns

A list of tuples (score, string) in order of score

Return type

list

MLC Analyzer

The code for DVHA’s MLC analysis has been exported to a stand-alone library.

ROI Formatter

Formatting tools for roi data (dicompyler, Shapely, DVHA)

dvha.tools.roi_formatter.dicompyler_roi_coord_to_db_string(coord)
Parameters

coord – dicompyler structure coordinates from GetStructureCoordinates()

Returns

string representation of an roi as formatted in the SQL database (roi_coord_string)

Return type

str

dvha.tools.roi_formatter.dicompyler_roi_to_sets_of_points(coord)
Parameters

coord – dicompyler structure coordinates from GetStructureCoordinates()

Returns

a “sets of points” formatted dictionary

Return type

dict

dvha.tools.roi_formatter.get_contour_sample(polygon, dth_res=0.5) → tuple

Get 3D points uniformly distributed in the perimeter space

Parameters
  • polygon (Polygon) – shapely object

  • dth_res (int, float) – Sampling distance in perimeter space (mm)

Returns

  • np.ndarray – x coordinates of sampled contour

  • np.ndarray – y coordinates of sampled contour
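Uniform sampling along a perimeter can be sketched by walking each edge of the closed ring and emitting a point every dth_res millimetres, carrying any leftover distance into the next edge. This sketch takes a plain list of (x, y) vertices instead of a Shapely Polygon. The name sample_perimeter is hypothetical. With Shapely, a similar result could be obtained from polygon.exterior.interpolate(distance) at evenly spaced distances.

```python
import math


def sample_perimeter(vertices, dth_res=0.5):
    """Return x and y lists of points spaced dth_res (mm) apart along a closed ring.

    vertices: list of (x, y) tuples; the ring is closed automatically.
    """
    ring = list(vertices) + [vertices[0]]
    xs, ys = [], []
    carry = 0.0  # distance into the current edge where the next sample falls
    for (x0, y0), (x1, y1) in zip(ring[:-1], ring[1:]):
        edge = math.hypot(x1 - x0, y1 - y0)
        d = carry
        while d < edge:
            t = d / edge  # fraction of the way along this edge
            xs.append(x0 + t * (x1 - x0))
            ys.append(y0 + t * (y1 - y0))
            d += dth_res
        carry = d - edge  # overshoot carried into the next edge
    return xs, ys
```

A 10 mm × 10 mm square sampled at 1 mm yields 40 points, one per millimetre of its 40 mm perimeter.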

dvha.tools.roi_formatter.get_planes_from_string(roi_coord_string)
Parameters

roi_coord_string (str) – string representation of an roi as formatted in the SQL database

Returns

a “sets of points” formatted dictionary

Return type

dict

dvha.tools.roi_formatter.get_roi_coordinates_from_planes(sets_of_points)
Parameters

sets_of_points (dict) – a “sets of points” formatted dictionary

Returns

a list of numpy arrays, each holding the x, y, z coordinates of one point

Return type

list

dvha.tools.roi_formatter.get_roi_coordinates_from_shapely(shapely_dict, sample_res=None)
Parameters
  • shapely_dict (dict) – output from get_shapely_from_sets_of_points

  • sample_res (int, float) – If set to a numeric value, sample each polygon with this resolution (mm)

Returns

a list of numpy arrays, each holding the x, y, z coordinates of one point

Return type

list

dvha.tools.roi_formatter.get_roi_coordinates_from_string(roi_coord_string)
Parameters

roi_coord_string (str) – string representation of an roi as formatted in the SQL database

Returns

a list of numpy arrays, each holding the x, y, z coordinates of one point

Return type

list

dvha.tools.roi_formatter.get_shapely_from_sets_of_points(sets_of_points, tolerance=None, preserve_topology=True)
Parameters
  • sets_of_points (dict) – a “sets of points” formatted dictionary

  • tolerance (float, optional) – If set to a number, Shapely’s simplify is applied to each contour with the given tolerance

  • preserve_topology (bool, optional) – Passed to Shapely’s simplify if tolerance is set

Returns

roi_slices which is a dictionary of lists of z, thickness, and a Shapely Polygon class object

Return type

dict

dvha.tools.roi_formatter.points_to_shapely_polygon(sets_of_points)
Parameters

sets_of_points (dict) – a “sets of points” formatted dictionary

Returns

a composite polygon as a shapely object (either polygon or multipolygon)

Return type

shapely.geometry.Polygon or shapely.geometry.MultiPolygon

ROI Geometry

Tools for geometric calculations

dvha.tools.roi_geometry.centroid(roi)
Parameters

roi – a “sets of points” formatted dictionary

Returns

centroid of the roi in x, y, z DICOM coordinates (mm)

Return type

list

dvha.tools.roi_geometry.cross_section(roi)

Calculate the cross section of a given roi

Parameters

roi (dict) – a “sets of points” formatted dictionary

Returns

max and median cross-sectional area of all slices in cm^2

Return type

dict
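One way to compute these statistics is to measure each slice's area with the shoelace formula, convert from mm² to cm² (divide by 100), and then take the max and median. This sketch assumes a "sets of points" dictionary maps a z-coordinate string to a list of contours, each a list of points whose first two entries are x and y in mm. It also assumes a slice's area is the sum of its contour areas, ignoring holes. The returned keys 'max' and 'median' are hypothetical.

```python
import statistics


def polygon_area(points):
    """Shoelace formula; points is a list of (x, y, ...) in mm. Returns mm^2."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x0, y0 = points[i][0], points[i][1]
        x1, y1 = points[(i + 1) % n][0], points[(i + 1) % n][1]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def cross_section(sets_of_points):
    """Return the max and median slice area in cm^2 (holes are not subtracted)."""
    areas = [sum(polygon_area(plane) for plane in planes) / 100.0
             for planes in sets_of_points.values()]
    return {"max": max(areas), "median": statistics.median(areas)}
```
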

dvha.tools.roi_geometry.dth(min_distances)
Parameters

min_distances – the output from min_distances_to_target

Returns

histogram of distances in 1 mm bins

Return type

numpy.array
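A histogram with 1 mm bins can be sketched by rounding each distance to the nearest integer millimetre and counting. The bins are made symmetric about 0 so that their number is odd and the middle bin is 0, matching the storage convention described for process_dth_string. The name dth_histogram and the assumption that distances are signed and in mm are hypothetical. DVHA's own dth may choose its bin range differently.

```python
import math


def dth_histogram(min_distances_mm):
    """Count distances in 1 mm bins centred on integers, symmetric about 0.

    Returns (counts, bin_centers); len(counts) is odd and the middle bin is 0 mm.
    min_distances_mm must be non-empty.
    """
    half = max(int(math.floor(abs(d) + 0.5)) for d in min_distances_mm)
    centers = list(range(-half, half + 1))
    counts = [0] * len(centers)
    for d in min_distances_mm:
        idx = int(math.floor(d + 0.5)) + half  # index of the nearest bin centre
        counts[idx] += 1
    return counts, centers
```
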

dvha.tools.roi_geometry.is_point_inside_roi(point, roi)

Check if a point is within an ROI

Parameters
  • point (list) – x, y, z

  • roi (dict) – a “sets of points” formatted dictionary

Returns

Whether or not the point is within the roi

Return type

bool
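A common way to implement this check is an even-odd ray-casting test on the slice nearest the point's z. This sketch assumes the "sets of points" layout {z_string: [contour, ...]} with x and y as the first two entries of each point. Parity is counted across all contours on the slice, so an inner contour acts as a hole. The max_dz cut-off is a hypothetical parameter that rejects points lying far from any slice.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y, ...) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i][0], polygon[i][1]
        x1, y1 = polygon[(i + 1) % n][0], polygon[(i + 1) % n][1]
        if (y0 > y) != (y1 > y):  # this edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def is_point_inside_roi(point, sets_of_points, max_dz=1.5):
    """Test the point against the contours on the slice closest to point[2]."""
    x, y, z = point
    z_key = min(sets_of_points, key=lambda k: abs(float(k) - z))
    if abs(float(z_key) - z) > max_dz:
        return False
    # Odd number of enclosing contours means inside (inner contours are holes)
    return sum(point_in_polygon(x, y, plane) for plane in sets_of_points[z_key]) % 2 == 1
```
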

dvha.tools.roi_geometry.min_distances_to_target(oar_coordinates, target_coordinates, factors=None)

Calculate all OAR-point-to-Target-point Euclidean distances

Parameters
  • oar_coordinates (list) – numpy arrays of 3D points defining the surface of the OAR

  • target_coordinates (list) – numpy arrays of 3D points defining the surface of the PTV

  • factors – (Default value = None)

Returns

min_distances: all minimum distances (cm) of OAR-point-to-Target-point pairs

Return type

list
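The core computation can be sketched as a brute-force nearest-neighbour search: for each OAR surface point, take the minimum Euclidean distance to any target point. This O(n·m) sketch returns distances in the same units as the input. If the coordinates are in mm, divide by 10 to get the cm reported above. The factors argument is not modelled. For large point sets, a spatial index such as scipy.spatial.cKDTree and its query method would scale better.

```python
import math


def min_distances_to_target(oar_points, target_points):
    """For each OAR point, the Euclidean distance to the nearest target point.

    Points are (x, y, z) sequences in the same units; brute force O(n * m).
    """
    return [min(math.dist(p, q) for q in target_points) for p in oar_points]
```
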

dvha.tools.roi_geometry.overlap_volume(oar, tv)

Calculate the overlap volume of two rois

Parameters
  • oar (dict) – organ-at-risk as a “sets of points” formatted dictionary

  • tv (dict) – treatment volume as a “sets of points” formatted dictionary

dvha.tools.roi_geometry.planes_to_voxel_centers(planes, res=1, max_progress=None)

Convert a “sets of points” dictionary into 3D voxel centers within the ROI

Parameters
  • planes (dict) – a “sets of points” dictionary representing the union of the rois

  • res (float) – resolution factor for voxels

  • max_progress (float) – if not None, set the maximum progress bar value (with update_dvh_progress)

Returns

A list of 3D points inside the ROI defined by planes

Return type

list
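Voxelization of a contour stack can be sketched by laying a regular grid over each slice's bounding box and keeping the grid centres that fall inside the slice's contours, using an even-odd ray-casting test. This sketch assumes the {z_string: [contour, ...]} layout and treats res as the grid spacing in mm. Both are assumptions, since the parameter above is described only as a "resolution factor". The progress-bar argument is omitted, and the grid is anchored at each slice's bounding-box corner rather than at a global origin.

```python
def _inside(x, y, polygon):
    """Even-odd ray casting for a list of (x, y, ...) vertices."""
    inside, n = False, len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i][:2], polygon[(i + 1) % n][:2]
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside


def planes_to_voxel_centers(sets_of_points, res=1.0):
    """Return [x, y, z] grid centres (spacing res) lying inside each slice's contours."""
    centers = []
    for z_key, planes in sets_of_points.items():
        z = float(z_key)
        xs = [p[0] for plane in planes for p in plane]
        ys = [p[1] for plane in planes for p in plane]
        x_min, y_min = min(xs), min(ys)
        nx = int((max(xs) - x_min) / res) + 1
        ny = int((max(ys) - y_min) / res) + 1
        for i in range(nx):
            for j in range(ny):
                x, y = x_min + (i + 0.5) * res, y_min + (j + 0.5) * res
                # Odd parity across contours means inside (inner contours are holes)
                if sum(_inside(x, y, plane) for plane in planes) % 2:
                    centers.append([x, y, z])
    return centers
```
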

dvha.tools.roi_geometry.process_dth_string(dth_string)

Convert a dth_string from the database into data and bins. DVHA stores 1-mm binned surface DTHs with an odd number of bins; the middle bin is 0.

Parameters

dth_string – a value from the dth_string column

Returns

counts, bin positions (mm)

Return type

tuple
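Given the storage convention above (1 mm bins, an odd count, the middle bin at 0), bin positions follow directly from the number of counts. This sketch assumes dth_string is a comma-separated list of counts. That format is a guess, not confirmed by the documentation.

```python
def process_dth_string(dth_string):
    """Parse comma-separated counts into (counts, bin_positions_mm).

    Assumes an odd number of 1 mm bins with the middle bin at 0 mm.
    """
    counts = [float(c) for c in dth_string.split(",") if c.strip()]
    if len(counts) % 2 == 0:
        raise ValueError("expected an odd number of bins")
    half = len(counts) // 2
    bins = list(range(-half, half + 1))  # e.g. 5 bins -> -2, -1, 0, 1, 2
    return counts, bins
```
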

dvha.tools.roi_geometry.spread(roi)
Parameters

roi – a “sets of points” formatted dictionary

Returns

x, y, z dimensions of a rectangular prism encompassing roi

Return type

list
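The encompassing rectangular prism is the axis-aligned bounding box, so its dimensions are max minus min along each axis. This sketch assumes the {z_string: [contour, ...]} layout and takes the z extent from the slice keys. The slice thickness at either end is not added, which is an assumption DVHA may or may not share.

```python
def spread(sets_of_points):
    """Return [dx, dy, dz] of the axis-aligned box around all contour points (mm)."""
    xs = [p[0] for planes in sets_of_points.values() for plane in planes for p in plane]
    ys = [p[1] for planes in sets_of_points.values() for plane in planes for p in plane]
    zs = [float(z) for z in sets_of_points]
    return [max(v) - min(v) for v in (xs, ys, zs)]
```
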

dvha.tools.roi_geometry.surface_area(coord, coord_type='dicompyler')

Calculate the surface area of a given roi

Parameters
  • coord – dicompyler structure coordinates from GetStructureCoordinates() or a sets_of_points dictionary

  • coord_type – either ‘dicompyler’ or ‘sets_of_points’ (Default value = ‘dicompyler’)

Returns

surface_area in cm^2

Return type

float

dvha.tools.roi_geometry.union(rois)

Calculate the geometric union of the provided rois

Parameters

rois (list) – rois formatted as “sets of points” dictionaries

Returns

a “sets of points” dictionary representing the union of the rois

Return type

dict

dvha.tools.roi_geometry.volume(roi)
Parameters

roi – a “sets of points” formatted dictionary

Returns

volume in cm^3 of roi

Return type

float
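A slice-based volume estimate sums each contour's area times the slice thickness and converts mm³ to cm³ (divide by 1000). This sketch assumes the {z_string: [contour, ...]} layout with coordinates in mm and evenly spaced slices. Unless thickness is given, it is taken from the spacing of the first two slices. The thickness parameter is hypothetical, and inner contours are not subtracted.

```python
def polygon_area(points):
    """Shoelace formula for a list of (x, y, ...) vertices in mm; returns mm^2."""
    n = len(points)
    s = sum(points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
            for i in range(n))
    return abs(s) / 2.0


def volume(sets_of_points, thickness=None):
    """Approximate volume in cm^3 as sum(contour area * slice thickness)."""
    z_values = sorted(float(z) for z in sets_of_points)
    if thickness is None:
        if len(z_values) < 2:
            raise ValueError("thickness is required for a single-slice ROI")
        thickness = z_values[1] - z_values[0]  # assumes uniform slice spacing
    total_mm3 = sum(polygon_area(plane) * thickness
                    for planes in sets_of_points.values() for plane in planes)
    return total_mm3 / 1000.0
```

Three 10 mm × 10 mm slices spaced 2 mm apart give 3 × 100 mm² × 2 mm = 600 mm³, or 0.6 cm³.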

Stats

The code from DVHA’s statistical modules has been exported to a stand-alone library; however, DVHA does not use it internally (yet).

Take numerical data from the main app and convert it to a format suitable for statistical analysis in the Regression and Control Chart tabs

class dvha.tools.stats.MultiVariableRegression(X, y, saved_reg=None)

Bases: object

Perform a multi-variable regression using sklearn

Parameters
  • X (np.array) – independent data

  • y (list) – dependent data

class dvha.tools.stats.StatsData(dvhs, table_data, group=1)

Bases: object

Class used to collect data for the Regression and Control Chart tabs. This process differs from the one used for Time Series because regressions require all variables to be the same length.

Parameters
  • dvhs (DVH) – data from DVH query

  • table_data (dict) – table data other than from DVHs. Has keys of ‘Plans’, ‘Rxs’, ‘Beams’ with values of QuerySQL objects

add_variable(variable, values, units='')

Add a new variable to StatsData.data; an existing variable with the same name is not overwritten

Parameters
  • variable (str) – variable name to be used as a key and plot title

  • values (list) – values to be stored for variable

  • units (str, optional) – Define units for display on plot

del_variable(variable)

Delete a variable from StatsData.data

Parameters

variable (str) – variable name

get_X_and_y(y_variable, x_variables, include_patient_info=False)

Collect data for input into multi-variable regression

Parameters
  • y_variable (str) – dependent variable

  • x_variables (list) – independent variables

  • include_patient_info (bool) – If True, return mrn, uid, dates with X and y

Returns

X, y or X, y, mrn, uid, dates

Return type

tuple

get_axis_title(variable)

Get the plot axis title for variable

Parameters

variable (str) – A key of StatsData.data

Returns

variable with units if stored

Return type

str

get_beam_indices(uid)

Get the indices of the Beams table with uid

Parameters

uid (str) – StudyInstanceUID as stored in the SQL database

Returns

Beams table indices that match uid

Return type

list

get_bokeh_data(x, y)

Get data in a format compatible with bokeh’s ColumnDataSource.data

Parameters
  • x (str) – x-variable name

  • y (str) – y-variable name

Returns

x and y data

Return type

dict

get_corr_matrix_data(options, included_vars=None, extra_vars=None)

Get a Pearson-R correlation matrix

Parameters
  • options (dvha.options.Options) – DVHA options class object. Used to get colors.

  • included_vars (list, optional) – variables to be included in matrix

  • extra_vars (list, optional) – variables to be excluded from the matrix

Returns

A tuple. Its first element is a dictionary with keys ‘source_data’, ‘x_factors’, and ‘y_factors’: source_data is used for bokeh plotting, and the factors are the axis tick labels. Its second element is a list of removed MRNs

Return type

tuple (dict, list)
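The Pearson-R values behind such a matrix can be sketched with the standard library: r is the covariance of two variables divided by the product of their standard deviations, computed for every pair. This sketch assumes a plain {variable: [values]} dictionary with no missing values. The documented method also returns removed MRNs, which presumably reflects rows dropped for missing data; that step and the bokeh formatting are not modelled. The names pearson_r and corr_matrix are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corr_matrix(data):
    """Return {(var_x, var_y): r} for every ordered pair of variables in data."""
    names = list(data)
    return {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}
```
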

get_plan_index(uid)

Get the index of uid from the Plans table

Parameters

uid (str) – StudyInstanceUID as stored in the SQL database

Returns

Plans table index for uid

Return type

int

property mrns

MRNs from DVH object

Returns

DVH.mrn

Return type

list

set_variable_data(variable, data, units=None)

Replace the data for the given variable in StatsData.data

Parameters
  • variable (str) – variable name

  • data (list) – new data

  • units (str, optional) – Define units for display on plot

set_variable_units(variable, units)

Set the units for the given variable in StatsData.data

Parameters
  • variable (str) – variable name

  • units (str) – units for display on plot

property sim_study_dates

Simulation dates from Plans table

Returns

Simulation dates

Return type

list

property uids

StudyInstanceUIDs from DVH object

Returns

DVH.study_instance_uid

Return type

list

update_endpoints_and_radbio()

Update endpoint and radbio data in self.data. This function is needed because these values are calculated after a query and the user may change them.

property variables

Get variable names for plotting

Returns

keys of StatsData.data sans ‘Simulation Date’

Return type

list

property vars_with_nan_values

Find variable names that contain non-numerical values

Returns

Variable names that cannot be converted to float

Return type

list

dvha.tools.stats.get_control_limits(y, std_devs=3)

Calculate control limits for Control Chart

Parameters
  • y (list) – data

  • std_devs (int or float) – values more than std_devs standard deviations from the center line are out of control (Default value = 3)

Returns

center line, upper control limit, and lower control limit

Return type

tuple
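A common way to set limits for an individuals (X) control chart estimates sigma from the average moving range divided by the bias constant d2 = 1.128 (for ranges of two consecutive points). The limits are then the mean ± std_devs · sigma. This is a standard textbook approach and is not confirmed here as DVHA's exact method.

```python
from statistics import mean


def get_control_limits(y, std_devs=3):
    """Individuals-chart limits from the average moving range.

    sigma is estimated as mean(|y[i] - y[i-1]|) / 1.128 (d2 for n = 2).
    Returns (center_line, upper_control_limit, lower_control_limit).
    Requires at least two data points.
    """
    center_line = mean(y)
    avg_moving_range = mean(abs(b - a) for a, b in zip(y, y[1:]))
    sigma = avg_moving_range / 1.128
    return center_line, center_line + std_devs * sigma, center_line - std_devs * sigma
```
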

dvha.tools.stats.get_index_of_nan(numpy_array)

Find indices of np.nan values

Parameters

numpy_array (np.ndarray) – A numpy array

Returns

indices of numpy_array that are np.nan

Return type

list

dvha.tools.stats.get_p_values(X, y, predictions, params)

Get p-values using sklearn based on https://stackoverflow.com/questions/27928275/find-p-value-significance-in-scikit-learn-linearregression

Parameters
  • X (np.ndarray) – independent data

  • y (np.ndarray) – dependent data

  • predictions – output from linear_model.LinearRegression.predict

  • params – np.array([y_intercept, slope])

Returns

p-values

Return type

list

dvha.tools.stats.str_starts_with_any_in_list(string_a, string_list)

Check if string_a starts with any string in the provided list of strings

Parameters
  • string_a (str) – Any string

  • string_list (list) – A list of strings

Returns

True if string_a starts with any string in string_list

Return type

bool
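In Python this check needs no explicit loop, because str.startswith accepts a tuple of prefixes. The sketch below shows that idiom; it is not necessarily how DVHA implements the function.

```python
def str_starts_with_any_in_list(string_a, string_list):
    """True if string_a starts with any prefix in string_list."""
    # str.startswith accepts a tuple and returns True if any prefix matches
    return string_a.startswith(tuple(string_list))
```

An empty list yields False, since no prefix can match.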

dvha.tools.stats.sync_variables_in_stats_data_objects(stats_data_1, stats_data_2)

Ensure both stats_data objects have the same variables

Parameters
  • stats_data_1 (StatsData) – A StatsData object (e.g., Group 1)

  • stats_data_2 (StatsData) – Another StatsData object (e.g., Group 2)