mousestyles.data package
Submodules
mousestyles.data.utils module
Data utilities.
-
mousestyles.data.utils.
day_to_mouse_average
(features, labels, num_strains=16, stdev=False, stderr=False)
The first three columns of labels are strain number, mouse number, and day number. features is an M x N matrix of mouse days x features.
Returns: new data matrix with a mean and stdev/stderr for each mouse over mouse days
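The averaging described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `average_over_days` is a hypothetical helper, the return layout (pairs, means, optional spread) is an assumption, and the data are made up.

```python
import numpy as np

def average_over_days(features, labels, stdev=False, stderr=False):
    """Hypothetical helper: average the mouse-day rows of each (strain, mouse) pair."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Unique (strain, mouse) pairs; `inverse` maps each mouse day to its pair.
    pairs, inverse = np.unique(labels[:, :2], axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard: some NumPy releases return a 2-D inverse
    means = np.zeros((len(pairs), features.shape[1]))
    spread = np.zeros_like(means)
    for k in range(len(pairs)):
        rows = features[inverse == k]
        means[k] = rows.mean(axis=0)
        sd = rows.std(axis=0)  # population standard deviation (ddof=0)
        spread[k] = sd / np.sqrt(len(rows)) if stderr else sd
    if stdev or stderr:
        return pairs, means, spread
    return pairs, means

# Made-up data: four mouse days, two features.
labels = np.array([[0, 0, 5], [0, 0, 6], [0, 1, 5], [1, 0, 5]])
features = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [7.0, 8.0]])
pairs, means = average_over_days(features, labels)
```

Here the two days of strain 0, mouse 0 collapse into a single row of means, while mice with a single day keep their values.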
-
mousestyles.data.utils.
idx_restrict_to_rectangles
(TXY, rects=[(0, 0)], xlims=(0, 1), ylims=(0, 1), xbins=2, ybins=4, eps=0.01)
Given a (3 x T) array TXY whose 0th row is an array of times [ASSUMED SORTED] and whose rows 1 and 2 are the x, y coordinates,
returns a new interval array which is E minus those things occurring outside of the given rectangle.
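As a rough illustration of restricting samples to grid rectangles, the sketch below bins the x and y rows into an xbins x ybins grid and keeps the column indices that fall in the requested cells. It rests on assumptions the documentation does not confirm: `indices_in_cells` is a hypothetical name, a rectangle is taken to be an (x_bin, y_bin) pair, and the `eps` parameter is ignored.

```python
import numpy as np

def indices_in_cells(TXY, cells, xlims=(0.0, 1.0), ylims=(0.0, 1.0), xbins=2, ybins=4):
    """Hypothetical helper: column indices of TXY whose (x, y) lies in any listed cell."""
    x, y = TXY[1], TXY[2]
    # Scale each coordinate to a bin index; clipping puts out-of-range points in the edge bins.
    xi = np.clip(((x - xlims[0]) / (xlims[1] - xlims[0]) * xbins).astype(int), 0, xbins - 1)
    yi = np.clip(((y - ylims[0]) / (ylims[1] - ylims[0]) * ybins).astype(int), 0, ybins - 1)
    keep = np.zeros(x.shape, dtype=bool)
    for cx, cy in cells:  # assumed convention: a cell is (x_bin, y_bin)
        keep |= (xi == cx) & (yi == cy)
    return np.flatnonzero(keep)

# Made-up trajectory: row 0 holds times, rows 1 and 2 hold x and y.
TXY = np.array([[0.0, 1.0, 2.0, 3.0],
                [0.1, 0.6, 0.9, 0.2],
                [0.1, 0.1, 0.9, 0.3]])
inside = indices_in_cells(TXY, [(0, 0), (1, 3)])
```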
-
mousestyles.data.utils.
map_xbins_ybins_to_cage
(rectangle=(0, 0), xbins=2, ybins=4, YLower=1.0, YUpper=43.0, XUpper=3.75, XLower=-16.25)
Converts a rectangle in the xbins x ybins grid to the corresponding rectangle in cage coordinates.
The format is [[p1, p2], [p3, p4]], where pi = (cage_height_location, cage_length_location).
Caveat from the source comments (-chris): this gives WRONG cage locations for top, bottom, left, and right; xbins and ybins do NOT reflect the cage geometry perfectly.
-
mousestyles.data.utils.
mouse_to_strain_average
(features, labels, num_strains=16, stdev=False, stderr=False)
The first two columns of the M x N data matrix are strain number (1 - num_strains) and mouse number; the other columns are features.
Returns: new data matrix with a mean and stdev/stderr for each strain over mice
-
mousestyles.data.utils.
pull_locom_tseries_subset
(M, start_time=0, stop_time=300)
Given an (m x n) numpy array M where the 0th row is an array of times [ASSUMED SORTED],
returns a new array (copy) that is the subset of M corresponding to start_time, stop_time.
Returns [] if the times are not in the array.
(The difficulty is that if the mouse does not move, nothing gets registered, so we should artificially create start_time and stop_time movement events at the boundaries.)
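Because the time row is assumed sorted, a binary search can locate the column range. The sketch below is an assumption-laden illustration rather than the library's code: `subset_by_time` is a hypothetical helper, both bounds are treated as inclusive, and an empty (m x 0) array stands in for the documented `[]`.

```python
import numpy as np

def subset_by_time(M, start_time=0, stop_time=300):
    """Hypothetical helper: copy the columns of M whose time lies in [start_time, stop_time]."""
    times = M[0]
    # Sorted times allow binary search for the first and one-past-last column.
    lo = np.searchsorted(times, start_time, side="left")
    hi = np.searchsorted(times, stop_time, side="right")
    return M[:, lo:hi].copy()

# Made-up series: row 0 holds times, row 1 holds one coordinate.
M = np.array([[0.0, 100.0, 200.0, 400.0, 500.0],
              [1.0, 2.0, 3.0, 4.0, 5.0]])
sub = subset_by_time(M, 100, 400)
```

Note that this sketch does not create the artificial boundary events mentioned above; a window with no recorded movement simply comes back empty.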
-
mousestyles.data.utils.
split_data_in_half_randomly
(features, labels)
Given an array of the form:
- features = M x A x B x C x ...
where M is the number of mouse days, and an array of labels for this data of the form:
- labels = M x 2
where labels[:, 0] are strain numbers and labels[:, 1] are mouse numbers, returns:
- bootstrap_data_1 = a random half of the mouse days
- bootstrap_labels_1
- bootstrap_data_2 = the other half
- bootstrap_labels_2
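One way to picture the split is a random permutation of the mouse-day indices cut in two. The following is only a sketch under stated assumptions: `split_in_half` and its `seed` argument are hypothetical, and when M is odd the second half receives the extra row.

```python
import numpy as np

def split_in_half(features, labels, seed=None):
    """Hypothetical helper: randomly split mouse days (axis 0) into two halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))  # shuffled mouse-day indices
    half = len(features) // 2
    first, second = order[:half], order[half:]
    # Index features and labels with the same indices so they stay aligned.
    return features[first], labels[first], features[second], labels[second]

# Made-up data: 10 mouse days, 3 features, labels = (strain, mouse).
features = np.arange(30).reshape(10, 3)
labels = np.column_stack([np.arange(10) // 5, np.arange(10) % 5])
data_1, labels_1, data_2, labels_2 = split_in_half(features, labels, seed=0)
```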
Module contents
-
mousestyles.data.
distances
(strain, mouse, day, step=50)
Return a numpy array object of project movement data for the specified combination of strain, mouse and day.
At regular timesteps, defined by the step parameter, compute the Euclidean distance between the positions of the mouse at two consecutive times.
More specifically:
- let \(\delta_t\) be the step parameter.
- let \(t_n\) be the sequence of nonnegative numbers such that \(t_0 = 0\) and \(t_{n+1} = t_n + \delta_t\). The sequence is defined for all \(n\) such that \(n \geq 0\) and \(t_n\) does not exceed the time of the experiment.
- let \(d_n\) be the sequence of nonnegative numbers such that \(d_0 = 0\) and, for \(n \geq 1\), \(d_n\) equals the Euclidean distance between the positions of the mouse on a particular day at times \(t_{n-1}\) and \(t_n\). \(d_n\) is then defined on the same set of integers as the sequence \(t_n\).
- The function returns the sequence \(d_n\).
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (int) – nonnegative integer indicating the mouse number
- day (int) – nonnegative integer indicating the day number
- step (float) – positive float defining the time between two observations; the default corresponds to 1 second
Returns: movement
Return type: numpy array
Examples
>>> dist = distances(0, 0, 0, step=1e2)
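The sampling scheme described above can be sketched on plain arrays. This is an illustration under assumptions, not the package's implementation: `stepwise_distances` is a hypothetical helper, sampling times are measured from the first record, and the position at a sampling time is taken to be the last record at or before it.

```python
import numpy as np

def stepwise_distances(t, x, y, step=50.0):
    """Hypothetical helper: distance travelled between consecutive sampling times."""
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x, y))
    # Sampling times t_n = n * step, measured from the first recorded time.
    n_steps = int(np.floor((t[-1] - t[0]) / step))
    grid = t[0] + step * np.arange(n_steps + 1)
    # Assumed position at t_n: the last record at or before t_n.
    idx = np.searchsorted(t, grid, side="right") - 1
    px, py = x[idx], y[idx]
    d = np.zeros(len(grid))  # d_0 = 0 by convention
    d[1:] = np.hypot(np.diff(px), np.diff(py))
    return d

# Made-up track: the mouse moves by (3, 4), pauses, then moves by (3, 4) again.
d = stepwise_distances([0, 1, 2, 3, 4], [0, 3, 3, 3, 6], [0, 4, 4, 4, 8], step=2)
```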
-
mousestyles.data.
distances_bymouse
(strain, mouse, step=50, verbose=False)
Aggregates ‘distances’ for all days of recorded data for one particular mouse.
More specifically:
- let \(d^1,...,d^D\) be the sequence of distances for one particular mouse for days \(1\) to \(D\).
- The function returns the concatenation of the \(d^i\).
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (int) – nonnegative integer indicating the mouse number
- step (float) – positive float defining the time between two observations; the default corresponds to 1 second
Returns: movement
Return type: numpy array
Examples
>>> dist = distances_bymouse(0, 0, step=1e2)
-
mousestyles.data.
distances_bystrain
(strain, step=50, verbose=False)
Aggregates distances_bymouse for all mice in one given strain.
More specifically:
- let \(d^1,...,d^M\) be the sequence of distances for one particular strain for mice \(1\) to \(M\).
- The function returns the concatenation of the \(d^i\).
Parameters: - strain (int) – nonnegative integer indicating the strain number
- step (float) – positive float defining the time between two observations; the default corresponds to 1 second
Returns: movement
Return type: numpy array
Examples
>>> dist = distances_bystrain(0, step=1e2)
-
mousestyles.data.
load_all_features
()
Returns a (21131, 13) size pandas.DataFrame object corresponding to 9 features over each mouse’s 2-hour time bin. The first four columns index each mouse’s 2-hour bin:
- Column 0: the strain of the mouse (0-15)
- Column 1: the mouse number (number depends on strain)
- Column 2: the day number (5-16)
- Column 3: the 2-hour time bin (e.g., value 4 corresponds to hours 4 to 6)
The remaining 9 columns are the computed features.
Returns: features_data_frame – A dataframe of computed features.
Return type: pandas.DataFrame
-
mousestyles.data.
load_intervals
(feature)
Return a pandas.DataFrame object of project interval data for the specified feature.
There are 5 columns in the dataframe:
- strain: the strain number of the mouse
- mouse: the mouse number in its strain
- day: the day number
- start: the start time
- stop: the stop time
Parameters: feature ({"AS", "F", "IS", "M_AS", "M_IS", "W"})
Returns: intervals – All data of the specified feature as a dataframe
Return type: pandas.DataFrame
Examples
>>> AS = load_intervals('AS')
>>> IS = load_intervals('IS')
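To show how a frame with these five columns might be used, the sketch below builds a small synthetic stand-in (the values are made up, and `duration` is a column added for illustration) and totals the interval durations per mouse day.

```python
import pandas as pd

# Synthetic stand-in for a frame such as load_intervals('AS'); values are made up.
intervals = pd.DataFrame({
    "strain": [0, 0, 0],
    "mouse": [0, 0, 1],
    "day": [5, 5, 5],
    "start": [100.0, 400.0, 50.0],
    "stop": [160.0, 430.0, 80.0],
})

# Length of each interval, then the total per (strain, mouse, day).
intervals["duration"] = intervals["stop"] - intervals["start"]
total = intervals.groupby(["strain", "mouse", "day"])["duration"].sum()
```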
-
mousestyles.data.
load_mouseday_features
(features=None)
Returns a (1921, 3+11*n) size pandas.DataFrame object corresponding to each 2-hour time bin of the n input features over each mouse. The first three columns index each mouse:
- Column 0: the strain of the mouse (0-15)
- Column 1: the mouse number (number depends on strain)
- Column 2: the day number (5-16)
The remaining 11*n columns are the values for each 2-hour time bin of the n input features.
Parameters: features (list, optional) – A list of one or more features chosen from {“ASProbability”, “ASNumbers”, “ASDurations”, “Food”, “Water”, “Distance”, “ASFoodIntensity”, “ASWaterIntensity”, “MoveASIntensity”}. Defaults to all features when omitted.
Returns: features_data_frame – A dataframe of computed features.
Return type: pandas.DataFrame
Examples
>>> mouseday = load_mouseday_features()
>>> mouseday = load_mouseday_features(["Food"])
>>> mouseday = load_mouseday_features(["Food", "Water", "Distance"])
-
mousestyles.data.
load_movement
(strain, mouse, day)
Return a pandas.DataFrame object of project movement data for the specified combination of strain, mouse and day.
There are 4 columns in the dataframe:
- t: Time coordinates (in seconds)
- x: X coordinates indicating the left-right position of the cage
- y: Y coordinates indicating the front-back position of the cage
- isHB: Boolean indicating whether the point is in the home base or not
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (int) – nonnegative integer indicating the mouse number
- day (int) – nonnegative integer indicating the day number
Returns: movement – CT, CX, CY coordinates and home base status of the combination of strain, mouse and day
Return type: pandas.DataFrame
Examples
>>> movement = load_movement(0, 0, 0)
>>> movement = load_movement(1, 2, 1)
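As an example of working with these four columns, the sketch below estimates the fraction of recorded time spent in the home base. It uses a synthetic frame with made-up values and assumes that each position holds until the next record, which the documentation does not state.

```python
import pandas as pd

# Synthetic stand-in for a frame such as load_movement(0, 0, 0); values are made up.
movement = pd.DataFrame({
    "t": [0.0, 1.0, 3.0, 6.0],
    "x": [-6.0, -5.5, -5.0, -5.0],
    "y": [35.0, 34.0, 33.0, 33.0],
    "isHB": [True, False, True, True],
})

# Time until the next record; the last row has no successor, so its dwell is NaN.
dwell = movement["t"].diff().shift(-1)
hb_time = dwell[movement["isHB"]].sum()  # NaN is skipped by sum()
hb_fraction = hb_time / dwell.sum()
```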
-
mousestyles.data.
load_movement_and_intervals
(strain, mouse, day, features=[u'AS', u'F', u'IS', u'M_AS', u'M_IS', u'W'])
Return a pandas.DataFrame object of project movement and interval data for the specified combination of strain, mouse and day.
There are 4 + len(features) columns in the dataframe:
- t: Time coordinates (in seconds)
- x: X coordinates indicating the left-right position of the cage
- y: Y coordinates indicating the front-back position of the cage
- isHB: Boolean indicating whether the point is in the home base or not
- Additional columns taking their names from features: Boolean indicating whether the time point is in an interval of behavior of the given feature.
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (int) – nonnegative integer indicating the mouse number
- day (int) – nonnegative integer indicating the day number
- features (list (or other iterable) of strings) – list of features from {“AS”, “F”, “IS”, “M_AS”, “M_IS”, “W”}
Returns: movement – CT, CX, CY coordinates, home base status, and feature interval information for a given strain, mouse and day
Return type: pandas.DataFrame
Examples
>>> m1 = load_movement(1, 1, 1)
>>> m2 = load_movement_and_intervals(1, 1, 1, []) # don't add any features
>>> np.all(m1 == m2)
True
>>> m3 = load_movement_and_intervals(1, 1, 1, ['AS'])
>>> m3.shape[1] == m1.shape[1] + 1 # adds one column
True
>>> m3.shape[0] == m1.shape[0] # same number of rows
True
>>> m3[29:32]
            t      x       y   isHB     AS
29  56448.333 -6.289  34.902  False  False
30  56448.653 -5.509  34.173   True   True
31  56449.273 -5.048  33.284   True   True
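The extra Boolean columns can be pictured as a membership test of each time point against a set of intervals. The sketch below is one possible way to compute such a column, not necessarily how the package does it: `flag_in_intervals` is a hypothetical helper, the intervals are treated as closed, and all values are made up.

```python
import numpy as np
import pandas as pd

def flag_in_intervals(times, starts, stops):
    """Hypothetical helper: True where a time lies in any [start, stop] interval."""
    times = np.asarray(times, dtype=float)[:, None]    # shape (T, 1)
    starts = np.asarray(starts, dtype=float)[None, :]  # shape (1, K)
    stops = np.asarray(stops, dtype=float)[None, :]
    # Broadcast to a (T, K) table of memberships, then reduce over intervals.
    return ((times >= starts) & (times <= stops)).any(axis=1)

# Made-up movement times and two made-up 'AS' intervals.
movement = pd.DataFrame({"t": [10.0, 20.0, 30.0, 40.0]})
movement["AS"] = flag_in_intervals(movement["t"], [15.0, 38.0], [25.0, 39.0])
```

With no intervals at all, every flag is False, which mirrors the doctest above where an empty feature list adds no information.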
-
mousestyles.data.
load_start_time_end_time
(strain, mouse, day)
Returns the start and end times recorded for the mouse-day. The first number is the start time, given as the number of seconds elapsed since midnight; the second number is the time at which the cage is closed for cleaning. In other words, this is the interval during which all sensors are active.
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (int) – nonnegative integer indicating the mouse number
- day (int) – nonnegative integer indicating the day number
Returns: times – the start time and end time
Return type: a tuple of (float, float)
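For readability, a returned tuple can be converted to clock time and a recording length. The sketch below uses a made-up tuple and assumes that both numbers are in seconds on the same clock; `seconds_to_clock` is a hypothetical helper.

```python
def seconds_to_clock(seconds):
    """Hypothetical helper: format seconds since midnight as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

# Made-up stand-in for a tuple such as load_start_time_end_time(0, 0, 5).
times = (46800.0, 126000.0)
start_clock = seconds_to_clock(times[0])
recorded_hours = (times[1] - times[0]) / 3600.0  # assumes both values are in seconds
```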