dvha.tools
Name Prediction
Implementation of rapidfuzz for ROI name prediction
- class dvha.tools.name_prediction.ROINamePredictor(roi_map, weight_simple=1.0, weight_partial=0.6, threshold=0.0)
Bases: object
ROI Name Prediction class object
- Parameters
roi_map (DatabaseROIs) – ROI map object
weight_simple (float, optional) – Scaling factor for fuzz.ratio for combined score
weight_partial (float, optional) – Scaling factor for fuzz.partial_ratio for combined score
threshold (float, optional) – Set a minimum score for a prediction to be returned
- static combine_scores(score_1, score_2, mode='average')
Get a combined fuzz score
- Parameters
score_1 (float) – A fuzz ratio score
score_2 (float) – Another fuzz ratio score
mode (str, optional) – Method for combining score_1 and score_2. Options are ‘geom_mean’, ‘product’, and ‘average’
- Returns
Combined score
- Return type
float
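As a rough illustration of the three modes, here is a minimal sketch. The function name combine_scores_sketch is hypothetical, the formulas are the textbook definitions rather than DVHA's confirmed implementation, and the 0-100 score scale (and the rescaling in 'product') is an assumption.

```python
import math


def combine_scores_sketch(score_1, score_2, mode="average"):
    """Combine two similarity scores (hypothetical stand-in for combine_scores)."""
    if mode == "geom_mean":
        return math.sqrt(score_1 * score_2)
    if mode == "product":
        # Assumption: scores are on a 0-100 scale, so divide by 100 to stay in range
        return score_1 * score_2 / 100.0
    if mode == "average":
        return (score_1 + score_2) / 2.0
    raise ValueError(f"Unknown mode: {mode}")


print(combine_scores_sketch(80.0, 45.0, mode="average"))    # 62.5
print(combine_scores_sketch(80.0, 45.0, mode="geom_mean"))  # 60.0
print(combine_scores_sketch(80.0, 45.0, mode="product"))    # 36.0
```

The geometric mean and the product both penalize a pair in which one score is low, whereas the average lets a high score compensate for a low one.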
- get_best_roi_match(roi, physician, return_score=False)
Check all ROI variations for best match, return physician ROI
- Parameters
roi (str) – An ROI name
physician (str) – Physician as stored in ROI Map
return_score (bool, optional) – If true, return a tuple: prediction, score
- Returns
The physician ROI associated with the ROI variation that has the highest combined fuzz score for roi
- Return type
str
- get_combined_fuzz_score(a, b, mode='geom_mean')
Return combine_scores for strings a and b
- Parameters
a (str) – Any string
b (str) – Another string for comparison
mode (str, optional) – Method for combining fuzz.ratio and fuzz.partial_ratio. Options are ‘geom_mean’, ‘product’, and ‘average’
- Returns
Results from combine_scores for a and b
- Return type
float
- get_combined_fuzz_scores(string, list_of_strings)
Compare a string against many
- Parameters
string (str) – A string to compare against each string in list_of_strings
list_of_strings (list) – A list of strings for comparison
- Returns
A list of tuples (score, string) in order of score
- Return type
list
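The one-against-many comparison can be sketched as below. To keep the example dependency-free it swaps in the standard library's difflib.SequenceMatcher for rapidfuzz, so the scores will differ from DVHA's. The helper names, the sample ROI variations, and the best-first ordering are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity on a 0-100 scale (difflib stand-in, not rapidfuzz)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(string, list_of_strings):
    """Return (score, candidate) tuples, best match first."""
    scores = [(similarity(string, s), s) for s in list_of_strings]
    return sorted(scores, key=lambda pair: pair[0], reverse=True)


variations = ["spinal_cord", "brainstem", "parotid_l"]  # made-up ROI variations
ranked = rank_candidates("SpinalCord", variations)
best_score, best_name = ranked[0]
print(best_name)  # spinal_cord
```

A threshold such as the class's threshold parameter could then be applied to best_score before accepting the prediction.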
MLC Analyzer
The code for DVHA’s MLC analysis has been exported to a stand-alone library.
GitHub: mlca.dvhanalytics.com
Docs: dvha-mlca.readthedocs.io
ROI Formatter
Formatting tools for roi data (dicompyler, Shapely, DVHA)
- dvha.tools.roi_formatter.dicompyler_roi_coord_to_db_string(coord)
- Parameters
coord – dicompyler structure coordinates from GetStructureCoordinates()
- Returns
roi string representation of an roi as formatted in the SQL database (roi_coord_string)
- Return type
str
- dvha.tools.roi_formatter.dicompyler_roi_to_sets_of_points(coord)
- Parameters
coord – dicompyler structure coordinates from GetStructureCoordinates()
- Returns
a “sets of points” formatted dictionary
- Return type
dict
- dvha.tools.roi_formatter.get_contour_sample(polygon, dth_res=0.5) → tuple
Get 3D points uniformly distributed in the perimeter space
- Parameters
polygon (Polygon) – shapely object
dth_res (int, float) – Sampling distance in perimeter space (mm)
- Returns
np.ndarray – x coordinates of sampled contour
np.ndarray – y coordinates of sampled contour
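Sampling at a fixed spacing along a contour's perimeter can be sketched in plain Python as follows. This is not DVHA's implementation: it takes a list of (x, y) vertices instead of a Shapely Polygon, returns lists instead of numpy arrays, and the name sample_perimeter is hypothetical.

```python
import math


def sample_perimeter(vertices, step=0.5):
    """Sample points every `step` units along a closed polygon's perimeter."""
    # Pair each vertex with the next one; the last vertex wraps to the first
    edges = list(zip(vertices, vertices[1:] + vertices[:1]))
    lengths = [math.dist(p, q) for p, q in edges]
    n_samples = int(sum(lengths) / step)

    xs, ys = [], []
    edge_index, edge_start = 0, 0.0  # current edge and the arc length where it begins
    for i in range(n_samples):
        s = i * step  # arc-length position of this sample
        # Advance to the edge that contains arc length s
        while s >= edge_start + lengths[edge_index] and edge_index < len(edges) - 1:
            edge_start += lengths[edge_index]
            edge_index += 1
        (x0, y0), (x1, y1) = edges[edge_index]
        length = lengths[edge_index]
        t = (s - edge_start) / length if length > 0 else 0.0
        xs.append(x0 + t * (x1 - x0))
        ys.append(y0 + t * (y1 - y0))
    return xs, ys


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # 10 mm square
xs, ys = sample_perimeter(square, step=0.5)
print(len(xs))  # 80
```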
- dvha.tools.roi_formatter.get_planes_from_string(roi_coord_string)
- Parameters
roi_coord_string (str) – roi string representation of an roi as formatted in the SQL database
- Returns
a “sets of points” formatted dictionary
- Return type
dict
- dvha.tools.roi_formatter.get_roi_coordinates_from_planes(sets_of_points)
- Parameters
sets_of_points (dict) – a “sets of points” formatted dictionary
- Returns
a list of numpy arrays, each array is the x, y, z coordinates of the given point
- Return type
list
- dvha.tools.roi_formatter.get_roi_coordinates_from_shapely(shapely_dict, sample_res=None)
- Parameters
shapely_dict (dict) – output from get_shapely_from_sets_of_points
sample_res (int, float) – If set to a numeric value, sample each polygon with this resolution (mm)
- Returns
a list of numpy arrays, each array is the x, y, z coordinates of the given point
- Return type
list
- dvha.tools.roi_formatter.get_roi_coordinates_from_string(roi_coord_string)
- Parameters
roi_coord_string (str) – roi string representation of an roi as formatted in the SQL database
- Returns
a list of numpy arrays, each array is the x, y, z coordinates of the given point
- Return type
list
- dvha.tools.roi_formatter.get_shapely_from_sets_of_points(sets_of_points, tolerance=None, preserve_topology=True)
- Parameters
sets_of_points (dict) – a “sets of points” formatted dictionary
tolerance (float, optional) – If set to a number, will use Shapely’s simplify on each contour with the given tolerance
preserve_topology (bool, optional) – Passed to Shapely’s simplify if tolerance is set
- Returns
roi_slices which is a dictionary of lists of z, thickness, and a Shapely Polygon class object
- Return type
dict
ROI Geometry
Tools for geometric calculations
- dvha.tools.roi_geometry.centroid(roi)
- Parameters
roi – a “sets of points” formatted dictionary
- Returns
centroid of the roi in x, y, z dicom coordinates (mm)
- Return type
list
- dvha.tools.roi_geometry.cross_section(roi)
Calculate the cross section of a given roi
- Parameters
roi (dict) – a “sets of points” formatted dictionary
- Returns
max and median cross-sectional area of all slices in cm^2
- Return type
dict
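A minimal sketch of the max and median calculation, assuming one simple polygon per slice given as (x, y) vertices in mm. The input format, the function names, and the dictionary keys 'max' and 'median' are illustrative assumptions and not DVHA's actual structures.

```python
from statistics import median


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def cross_section_sketch(slices):
    """Max and median slice area in cm^2; `slices` holds one polygon (mm) per slice."""
    areas_cm2 = [polygon_area(p) / 100.0 for p in slices]  # 100 mm^2 = 1 cm^2
    return {"max": max(areas_cm2), "median": median(areas_cm2)}


def square(side):
    """Axis-aligned square with one corner at the origin."""
    return [(0.0, 0.0), (side, 0.0), (side, side), (0.0, side)]


result = cross_section_sketch([square(10.0), square(20.0), square(30.0)])
print(result)  # {'max': 9.0, 'median': 4.0}
```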
- dvha.tools.roi_geometry.dth(min_distances)
- Parameters
min_distances – the output from min_distances_to_target
- Returns
histogram of distances in 1mm bin widths
- Return type
numpy.array
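Binning distances into 1 mm wide bins can be sketched as below. This assumes the distances are supplied in mm, with negative values for points inside the target. It returns plain lists, not a numpy array, and the name distance_histogram is hypothetical.

```python
import math
from collections import Counter


def distance_histogram(min_distances_mm):
    """Count distances in 1 mm bins; returns (bin left edges, counts)."""
    # floor(d) is the left edge of the 1 mm bin containing d
    binned = Counter(math.floor(d) for d in min_distances_mm)
    edges = list(range(min(binned), max(binned) + 1))
    counts = [binned.get(edge, 0) for edge in edges]
    return edges, counts


edges, counts = distance_histogram([0.2, 0.7, 1.5, 3.1, -0.4])
print(edges)   # [-1, 0, 1, 2, 3]
print(counts)  # [1, 2, 1, 0, 1]
```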
- dvha.tools.roi_geometry.is_point_inside_roi(point, roi)
Check if a point is within an ROI
- Parameters
point (list) – x, y, z
roi (dict) – a “sets of points” formatted dictionary
- Returns
Whether or not the point is within the roi
- Return type
bool
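For a single slice, a point-in-polygon test can be done with ray casting, sketched below. How DVHA selects the slice for the point's z value and which test it applies are not stated here, so treat this as an illustration only; point_in_polygon is a hypothetical name.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside the simple polygon."""
    inside = False
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        # Does this edge straddle the horizontal line through the point?
        if (y0 > y) != (y1 > y):
            # x coordinate where the edge crosses that line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


contour = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(point_in_polygon(5.0, 5.0, contour))   # True
print(point_in_polygon(15.0, 5.0, contour))  # False
```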
- dvha.tools.roi_geometry.min_distances_to_target(oar_coordinates, target_coordinates, factors=None)
Calculate all OAR-point-to-Target-point euclidean distances
- Parameters
oar_coordinates (list) – numpy arrays of 3D points defining the surface of the OAR
target_coordinates (list) – numpy arrays of 3D points defining the surface of the PTV
factors – (Default value = None)
- Returns
min_distances: all minimum distances (cm) of OAR-point-to-Target-point pairs
- Return type
list
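The core idea, the nearest target point for each OAR point, can be sketched with a brute-force search. This sketch ignores the factors argument and performs no unit conversion, so the distances come back in the units of the input coordinates; min_distances_sketch is a hypothetical name.

```python
import math


def min_distances_sketch(oar_points, target_points):
    """For each OAR point, the Euclidean distance to its nearest target point."""
    return [min(math.dist(p, q) for q in target_points) for p in oar_points]


oar = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
target = [(0.0, 0.0, 3.0), (10.0, 4.0, 0.0)]
print(min_distances_sketch(oar, target))  # [3.0, 4.0]
```

The brute-force search costs one distance evaluation per OAR-target pair, which is why sampling the contours at a coarser resolution reduces run time.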
- dvha.tools.roi_geometry.overlap_volume(oar, tv)
Calculate the overlap volume of two rois
- Parameters
oar (dict) – organ-at-risk as a “sets of points” formatted dictionary
tv (dict) – treatment volume as a “sets of points” formatted dictionary
- dvha.tools.roi_geometry.planes_to_voxel_centers(planes, res=1, max_progress=None)
Convert a “sets of points” dictionary into 3D voxel centers within the ROI
- Parameters
planes (dict) – a “sets of points” dictionary representing the union of the rois
res (float) – resolution factor for voxels
max_progress (float) – if not None, set the maximum progress bar value (with update_dvh_progress)
- Returns
A list of 3D points inside the ROI defined by planes
- Return type
list
- dvha.tools.roi_geometry.process_dth_string(dth_string)
Convert a dth_string from the database into data and bins. DVHA stores 1-mm binned surface DTHs with an odd number of bins; the middle bin is 0.
- Parameters
dth_string – a value from the dth_string column
- Returns
counts, bin positions (mm)
- Return type
tuple
- dvha.tools.roi_geometry.spread(roi)
- Parameters
roi – a “sets of points” formatted dictionary
- Returns
x, y, z dimensions of a rectangular prism encompassing roi
- Return type
list
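The dimensions of an enclosing rectangular prism follow from the minimum and maximum of each coordinate. The sketch below assumes a flat list of (x, y, z) points in place of a “sets of points” dictionary, and keeps the input units; spread_sketch is a hypothetical name.

```python
def spread_sketch(points):
    """x, y, z extents of the axis-aligned box enclosing `points`."""
    xs, ys, zs = zip(*points)
    return [max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs)]


points = [(0.0, 0.0, 0.0), (10.0, 5.0, 0.0), (4.0, 20.0, 3.0)]
print(spread_sketch(points))  # [10.0, 20.0, 3.0]
```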
- dvha.tools.roi_geometry.surface_area(coord, coord_type='dicompyler')
Calculate the surface area of a given roi
- Parameters
coord – dicompyler structure coordinates from GetStructureCoordinates() or a sets_of_points dictionary
coord_type – either ‘dicompyler’ or ‘sets_of_points’ (Default value = ‘dicompyler’)
- Returns
surface_area in cm^2
- Return type
float
Stats
The code from DVHA’s statistical modules has been exported to a stand-alone library; however, DVHA does not use this internally (yet).
GitHub: stats.dvhanalytics.com
Take numerical data from main app and convert to a format suitable for statistical analysis in Regression and Control Chart tabs
- class dvha.tools.stats.MultiVariableRegression(X, y, saved_reg=None)
Bases: object
Perform a multi-variable regression using sklearn
- Parameters
X (np.array) – independent data
y (list) – dependent data
- class dvha.tools.stats.StatsData(dvhs, table_data, group=1)
Bases: object
Class used to collect data for Regression and Control Chart. This process is different than for Time Series since regressions require all variables to be the same length
- Parameters
dvhs (DVH) – data from DVH query
table_data (dict) – table data other than from DVHs. Has keys of ‘Plans’, ‘Rxs’, ‘Beams’ with values of QuerySQL objects
- add_variable(variable, values, units='')
Add a new variable to StatsData.data, will not over-write
- Parameters
variable (str) – variable name to be used as a key and plot title
values (list) – values to be stored for variable
units (str, optional) – Define units for display on plot
- del_variable(variable)
Delete a variable from StatsData.data
- Parameters
variable (str) – variable name
- get_X_and_y(y_variable, x_variables, include_patient_info=False)
Collect data for input into multi-variable regression
- Parameters
y_variable (str) – dependent variable
x_variables (list) – independent variables
include_patient_info (bool) – If True, return mrn, uid, dates with X and y
- Returns
X, y or X, y, mrn, uid, dates
- Return type
tuple
- get_axis_title(variable)
Get the plot axis title for variable
- Parameters
variable (str) – A key of StatsData.data
- Returns
variable with units if stored
- Return type
str
- get_beam_indices(uid)
Get the indices of the Beams table with uid
- Parameters
uid (str) – StudyInstanceUID as stored in the SQL database
- Returns
Beams table indices that match uid
- Return type
list
- get_bokeh_data(x, y)
Get data in a format compatible with bokeh’s ColumnDataSource.data
- Parameters
x (str) – x-variable name
y (str) – y-variable name
- Returns
x and y data
- Return type
dict
- get_corr_matrix_data(options, included_vars=None, extra_vars=None)
Get a Pearson-R correlation matrix
- Parameters
options (dvha.options.Options) – DVHA options class object. Used to get colors.
included_vars (list, optional) – variables to be included in matrix
extra_vars (list, optional) – variables to be excluded from the matrix
- Returns
The dictionary has keys of ‘source_data’, ‘x_factors’ and ‘y_factors’. source_data is used for bokeh plotting; the factors are for axis tick labels. The second element of the tuple is a list of removed mrns
- Return type
tuple (dict, list)
- get_plan_index(uid)
Get the index of uid from the Plans table
- Parameters
uid (str) – StudyInstanceUID as stored in the SQL database
- Returns
Plans table index for uid
- Return type
int
- property mrns
MRNs from DVH object
- Returns
DVH.mrn
- Return type
list
- set_variable_data(variable, data, units=None)
Replace the data for the given variable in StatsData.data
- Parameters
variable (str) – variable name
data (list) – new data
units (str, optional) – Define units for display on plot
- set_variable_units(variable, units)
Set the units for the given variable in StatsData.data
- Parameters
variable (str) – variable name
units (str) – units for display on plot
- property sim_study_dates
Simulation dates from Plans table
- Returns
Simulation dates
- Return type
list
- property uids
StudyInstanceUIDs from DVH object
- Returns
DVH.study_instance_uid
- Return type
list
- update_endpoints_and_radbio()
Update endpoint and radbio data in self.data. This function is needed since all of these values are calculated after a query and user may change these values.
- property variables
Get variable names for plotting
- Returns
keys of StatsData.data sans ‘Simulation Date’
- Return type
list
- property vars_with_nan_values
Find variable names that contain non-numerical values
- Returns
Variable names that cannot be converted to float
- Return type
list
- dvha.tools.stats.get_control_limits(y, std_devs=3)
Calculate control limits for Control Chart
- Parameters
y (list) – data
std_devs (int or float) – values greater than std_devs away are out-of-control (Default value = 3)
- Returns
center line, upper control limit, and lower control limit
- Return type
tuple
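A simple way to picture control limits is the mean plus or minus a multiple of the standard deviation, sketched below. How DVHA estimates sigma is not stated here (control charts often derive it from the moving range instead), so the use of the population standard deviation is an assumption, and control_limits_sketch is a hypothetical name.

```python
from statistics import mean, pstdev


def control_limits_sketch(y, std_devs=3):
    """Return (center line, upper control limit, lower control limit)."""
    center_line = mean(y)
    sigma = pstdev(y)  # Assumption: population standard deviation of the data
    return center_line, center_line + std_devs * sigma, center_line - std_devs * sigma


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
center, ucl, lcl = control_limits_sketch(data, std_devs=3)
print(center, ucl, lcl)  # 5.0 11.0 -1.0
```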
- dvha.tools.stats.get_index_of_nan(numpy_array)
Find indices of np.nan values
- Parameters
numpy_array (np.ndarray) – A numpy array
- Returns
indices of numpy_array that are np.nan
- Return type
list
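The same lookup on a plain Python sequence can be sketched with math.isnan. This is a stand-in that avoids the numpy dependency, not DVHA's code, and index_of_nan is a hypothetical name.

```python
import math


def index_of_nan(values):
    """Indices of the entries that are NaN."""
    return [i for i, v in enumerate(values) if math.isnan(v)]


print(index_of_nan([1.0, float("nan"), 3.5, float("nan")]))  # [1, 3]
```

An equality test such as v == float("nan") would not work here, because NaN never compares equal to anything, including itself.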
- dvha.tools.stats.get_p_values(X, y, predictions, params)
Get p-values using sklearn based on https://stackoverflow.com/questions/27928275/find-p-value-significance-in-scikit-learn-linearregression
- Parameters
X (np.ndarray) – independent data
y (np.ndarray) – dependent data
predictions – output from linear_model.LinearRegression.predict
params – np.array([y_intercept, slope])
- Returns
p-values
- Return type
list
- dvha.tools.stats.str_starts_with_any_in_list(string_a, string_list)
Check if string_a starts with any string in the provided list of strings
- Parameters
string_a (str) – Any string
string_list (list) – A list of strings
- Returns
True if string_a starts with any string in string_list
- Return type
bool
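One idiomatic way to express this check relies on str.startswith accepting a tuple of prefixes. The sketch below illustrates that behavior and is not necessarily how DVHA implements it; starts_with_any is a hypothetical name.

```python
def starts_with_any(string_a, string_list):
    """True if string_a starts with any string in string_list."""
    # str.startswith accepts a tuple of candidate prefixes
    return string_a.startswith(tuple(string_list))


print(starts_with_any("PTV_7000", ["PTV", "GTV"]))  # True
print(starts_with_any("Bladder", ["PTV", "GTV"]))   # False
```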