mousestyles.ultradian package
Module contents
-
mousestyles.ultradian.aggregate_data(feature, bin_width, nmouse=4, nstrain=3)
Aggregate the data for all strains and mice for a given feature into one dataframe. It combines the results of aggregate_movement and aggregate_interval, and returns a dataframe with four variables: mouse, strain, hour and the feature value.
Parameters: - feature – {“AS”, “F”, “IS”, “M_AS”, “M_IS”, “W”, “Distance”}
- bin_width (int) – Number of minutes, the time interval for data aggregation.
Returns: - pandas.dataframe
- describe – Column 0: the mouse number (0-3; the number depends on the strain) Column 1: the strain of the mouse (0-2) Column 2: hour (numeric values below 24, according to bin_width) Column 3: feature values
Examples
>>> test = aggregate_data("Distance", 20)
>>> print(np.mean(test["Distance"]))
531.4500177747973
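The hour column described above can be illustrated with a small sketch. It assumes, as a guess about the layout rather than a description of the package's code, that bins are numbered consecutively from midnight and that bin_width divides 1440 evenly. `hour_label` and `fold_by_hour` are hypothetical helpers, not part of mousestyles.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60


def hour_label(bin_index, bin_width):
    """Hour of day (a float below 24) at which the given bin starts."""
    # Work in whole minutes so equal clock times give identical floats.
    minute_of_day = (bin_index * bin_width) % MINUTES_PER_DAY
    return minute_of_day / 60


def fold_by_hour(values, bin_width):
    """Average a multi-day binned series by time of day (hypothetical helper)."""
    groups = defaultdict(list)
    for i, value in enumerate(values):
        groups[hour_label(i, bin_width)].append(value)
    return {hour: sum(v) / len(v) for hour, v in sorted(groups.items())}


# Two days of 6-hour bins: each clock time is averaged over both days.
profile = fold_by_hour([1, 2, 3, 4, 3, 4, 5, 6], bin_width=360)
```

With a 20-minute bin width the labels would be 0, 1/3, 2/3, 1, and so on, which is one way to read "numeric values below 24, according to bin_width".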
-
mousestyles.ultradian.aggregate_interval(strain, mouse, feature, bin_width)
Aggregate the interval data into n-minute bins (n = bin_width) and return a time series.
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (int) – nonnegative integer indicating the mouse number
- feature ({"AS", "F", "M_AS", "M_IS", "W"}) – “AS”: Active state probability “F”: Food consumed (g) “M_AS”: Movement outside homebase “M_IS”: Movement inside homebase “W”: Water consumed (g)
- bin_width (int) – number of minutes, the time interval for data aggregation
Returns: ts – a pandas time series of length 12 (days) * 24 (hours) * 60 (minutes) / n, where n is bin_width
Return type: pandas.tseries
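To make the binning and the stated series length concrete, here is a minimal pure-Python sketch. It assumes the raw interval data is a list of (start, end) pairs in seconds and that a bin's value is the fraction of the bin covered by intervals (one plausible reading of "active state probability"). `bin_interval_coverage` is a hypothetical helper, not the package's implementation.

```python
def bin_interval_coverage(intervals, bin_width, total_minutes):
    """Fraction of each bin_width-minute bin covered by (start, end) intervals.

    intervals: iterable of (start_sec, end_sec) pairs (assumed input layout).
    """
    bin_sec = bin_width * 60
    n_bins = (total_minutes * 60) // bin_sec
    covered = [0.0] * n_bins
    for start, end in intervals:
        first = int(start // bin_sec)
        last = min(int(end // bin_sec), n_bins - 1)
        for b in range(first, last + 1):
            # Overlap between the interval and bin b, in seconds.
            overlap = min(end, (b + 1) * bin_sec) - max(start, b * bin_sec)
            if overlap > 0:
                covered[b] += overlap
    return [c / bin_sec for c in covered]


# Two intervals inside a 30-minute recording, aggregated into 15-minute bins.
coverage = bin_interval_coverage([(0, 600), (900, 1800)], 15, 30)

# A 12-day recording with 15-minute bins has 12 * 24 * 60 / 15 entries.
n_entries = len(bin_interval_coverage([], 15, 12 * 24 * 60))
```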
-
mousestyles.ultradian.aggregate_movement(strain, mouse, bin_width)
Aggregate the movement data into n-minute bins (n = bin_width) and return a time series.
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (int) – nonnegative integer indicating the mouse number
- bin_width (int) – number of minutes, the time interval for data aggregation
Returns: ts – a pandas time series of length (number of days) * 24 (hours) * 60 (minutes) / n, where n is bin_width
Return type: pandas.tseries
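As an illustration only, movement data can be aggregated by summing the distance travelled within each bin. The sketch below assumes a track given as (time in seconds, x, y) samples and assigns each step to the bin containing its end time. Both the input layout and the helper name `bin_distance` are assumptions, not the package's actual code.

```python
import math


def bin_distance(track, bin_width, total_minutes):
    """Sum the Euclidean step lengths of a track per bin_width-minute bin.

    track: list of (t_sec, x, y) samples sorted by time (assumed layout).
    """
    bin_sec = bin_width * 60
    n_bins = (total_minutes * 60) // bin_sec
    totals = [0.0] * n_bins
    # Pair each sample with its successor to obtain the individual steps.
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        b = int(t1 // bin_sec)
        if 0 <= b < n_bins:
            totals[b] += math.hypot(x1 - x0, y1 - y0)
    return totals


track = [(0, 0, 0), (60, 3, 4), (1000, 3, 4), (1100, 6, 8)]
distance = bin_distance(track, bin_width=15, total_minutes=30)
```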
-
mousestyles.ultradian.find_cycle(feature, strain, mouse=None, bin_width=15, methods=u'LombScargleFast', disturb_t=False, gen_doc=False, plot=True, search_range_fit=None, nyquist_factor=3, n_cycle=10, search_range_find=(2, 26), sig=array([ 0.05]))
Use the Lomb-Scargle method on the data of different strains and mice to find the best possible periods, i.e. those with the highest Lomb-Scargle power and the smallest p-values. The function can be applied to specific strains and specific mice, or to specific strains alone without specifying a mouse number. We use the O(N log N) fast implementation of Lomb-Scargle from the gatspy package, and also provide a way to visualize the result.
Note that plotting the L-S power and finding the best cycle do not use the same search method. The former can use a user-specified search range (search_range_fit), while the latter uses the default two-grid search within search_range_find.
Parameters: - feature (string in {"AS", "F", "M_AS", "M_IS", "W", "Distance"}) – “AS”: Active state probability “F”: Food consumed (g) “M_AS”: Movement outside homebase “M_IS”: Movement inside homebase “W”: Water consumed (g) “Distance”: Distance traveled
- strain (int) – nonnegative integer indicating the strain number
- mouse (int, default is None) – nonnegative integer indicating the mouse number
- bin_width (int, minute unit, default is 15 minutes) – number of minutes, the time interval for data aggregation
- methods (string in {"LombScargleFast", "LombScargle"}) – the method used to determine the periods and the best cycle. If ‘LombScargle’ is chosen, ‘disturb_t’ must be True.
- disturb_t (boolean, default is False) – If True, add uniformly distributed noise to the time sequence that is used to fit the Lomb-Scargle model. This avoids the singular matrix error that can sometimes occur.
- plot (boolean, default is True) – If True, call the visualization function to plot the Lomb-Scargle power versus periods. First the data (either strain specific or strain-mouse specific) is used to fit the LS model, then search_range_fit is used as the time sequence to predict the corresponding LS power, and finally the plot is drawn. Stars and horizontal lines indicate the significance: three stars for a p-value in [0,0.001], two stars for a p-value in [0.001,0.01], one star for a p-value in [0.01,0.05]. The horizontal line is the LS power that has a p-value of 0.05.
- search_range_fit (list, numpy array or numpy arange, hours unit, default is None) – list of numbers used as the time sequence to predict the corresponding Lomb-Scargle power. If plot is True, these are drawn on the x-axis. Note that search_range_fit must not contain too few points, or the smoothed prediction line will not be accurate. However, the plot always marks the right periods and their LS power with 1, 2 or 3 stars, so a mismatch between the stars and the line is a sign that search_range_fit is too sparse to draw the correct plot. We recommend the default None, which is easy to use.
- nyquist_factor (int) – If search_range_fit is None, the algorithm automatically chooses the sequence of periods. 5 * nyquist_factor * length(time sequence) / 2 gives the number of power and period values used to make the LS prediction and plot the graph.
- n_cycle (int, default is 10) – number of periods to return: those with the highest Lomb-Scargle power and the smallest p-values.
- search_range_find (list, tuple or numpy array of length 2, hours unit, default is (2,26)) – range of periods to be searched for the best cycle. Note that the minimum must be strictly larger than 0 to avoid division by zero.
- sig (list or numpy array, default is [0.05]) – significance levels at which horizontal lines are drawn on the plot.
- gen_doc (boolean, default is False) – If True, return the parameters needed to visualize the LS power versus periods.
Returns: - cycle (numpy array of length ‘n_cycle’) – The best periods, i.e. those with the highest LS power and the smallest p-values.
- cycle_power (numpy array of length ‘n_cycle’) – The corresponding LS power of ‘cycle’.
- cycle_pvalue (numpy array of length ‘n_cycle’) – The corresponding p-value of ‘cycle’.
- periods (numpy array of the same length as ‘power’) – used as the time sequence in the LS model to make predictions. Only returned when gen_doc is True.
- power (numpy array of the same length as ‘periods’) – the corresponding predicted power of periods. Only returned when gen_doc is True.
- sig (list, tuple or numpy array, default is [0.05]) – significance levels used for the horizontal lines on the plot. Only returned when gen_doc is True.
- N (int) – the length of the time sequence in the fitted model. Only returned when gen_doc is True.
Examples
>>> a, b, c = find_cycle(feature='F', strain=0, mouse=0, plot=False)
>>> print(a, b, c)
[ 23.98055016 4.81080233 12.00693952 6.01216335 8.0356203 3.4316698 2.56303353 4.9294791 21.37925713 3.5697756 ]
[ 0.11543449 0.05138839 0.03853218 0.02982237 0.02275952 0.0147941 0.01151601 0.00998443 0.00845883 0.0082382 ]
[ 0.00000000e+00 3.29976046e-10 5.39367189e-07 8.10528027e-05 4.71001953e-03 3.70178834e-01 9.52707020e-01 9.99372657e-01 9.99999981e-01 9.99999998e-01]
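For readers unfamiliar with the method, the idea behind the period search can be sketched with the classical Lomb-Scargle periodogram evaluated on a plain grid of trial periods. This is not the gatspy fast implementation that find_cycle uses, nor its two-grid search. The function names and the synthetic signal are illustrative assumptions.

```python
import math


def lomb_scargle_power(times, values, period):
    """Classical Lomb-Scargle power at one trial period, scaled by the variance."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    variance = sum(c * c for c in centred) / (n - 1)  # must be nonzero
    omega = 2.0 * math.pi / period
    # The offset tau makes the sine and cosine terms orthogonal.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(c * k for c, k in zip(centred, cos_terms))
    ys = sum(c * k for c, k in zip(centred, sin_terms))
    cc = sum(k * k for k in cos_terms)
    ss = sum(k * k for k in sin_terms)
    return 0.5 * (yc * yc / cc + ys * ys / ss) / variance


def best_period(times, values, periods):
    """Trial period with the highest power (simple grid search)."""
    return max(periods, key=lambda p: lomb_scargle_power(times, values, p))


# Synthetic 24-hour rhythm sampled every 15 minutes for five days.
times = [0.25 * i for i in range(480)]                      # hours
values = [math.sin(2.0 * math.pi * t / 24.0) for t in times]
periods = [2.0 + 0.5 * k for k in range(49)]                # 2 to 26 hours
best = best_period(times, values, periods)                  # expected near 24
```

The grid here mirrors the default search_range_find of (2, 26) hours. A real analysis would use a finer grid and attach p-values to the peaks, as find_cycle does.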
-
mousestyles.ultradian.mix_strain(data, feature, print_opt=True, nstrain=3, search_range=(3, 12), degree=1)
Fit a linear mixed model to the aggregated data. The fixed effects are the hour, the strain, and the interactions between hour and strain. The random effect is the mouse, which accounts for differences between individual mice. Two dummy variables, strain0 and strain1, are added as fixed effects.
Parameters: - data (pandas dataframe) – output of the aggregate_data function
- feature ({"AS", "F", "IS", "M_AS", "M_IS", "W", "Distance"}) –
- print_opt (True or False) –
- nstrain (positive integer) –
- search_range (array containing two elements) –
- degree (positive integer) –
Returns: - Two mixed model regression results, which include all the coefficients, t statistics and p-values for the corresponding coefficients. The first model includes the interaction terms, while the second model does not.
- The likelihood ratio test p-value. If it is below the significance level, we can conclude that the different strains have significantly different time patterns.
Examples
>>> result = mix_strain(data=aggregate_data("F", 30), feature="F",
...                     print_opt=False, degree=2)
>>> print(result)
2.5025846540930469e-09
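The likelihood ratio test in the return value compares the model with interaction terms against the one without. A minimal sketch of that comparison follows, assuming the two fitted log-likelihoods are already available and that the degrees of freedom equal the number of extra interaction coefficients. Both are assumptions. `chi2_sf` and `lr_test` are hypothetical helpers, not mousestyles functions, and the log-likelihood values are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed forms for df = 2 and df = 1, then the recurrence
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(llf_full, llf_reduced, df_diff):
    """Likelihood ratio statistic and p-value for two nested models."""
    stat = 2.0 * (llf_full - llf_reduced)
    return stat, chi2_sf(stat, df_diff)


# Illustrative log-likelihoods only; not output from mix_strain.
stat, p_value = lr_test(llf_full=-100.0, llf_reduced=-103.0, df_diff=2)
```

A p-value below the chosen significance level would favour the model with interactions, which is how the result of mix_strain is interpreted above.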
-
mousestyles.ultradian.seasonal_decomposition(strain, mouse, feature, bin_width, period_length)
Apply a seasonal decomposition model to the time series of the specified strain, mouse, feature and bin_width.
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (int) – nonnegative integer indicating the mouse number
- feature ({"AS", "F", "M_AS", "M_IS", "W", "Distance"}) – “AS”: Active state probability “F”: Food consumed (g) “M_AS”: Movement outside homebase “M_IS”: Movement inside homebase “W”: Water consumed (g) “Distance”: Distance traveled
- bin_width (int) – number of minutes, the time interval for data aggregation
- period_length (int or float) – number of hours, usually the significant period length indicated by the Lomb-Scargle model
Returns: res – seasonal decomposition result for the mouse. View the seasonal decomposition plot with res.plot(), and access the seasonal term and trend term through res.seasonal and res.trend respectively.
Return type: statsmodels seasonal decomposition object
Examples
>>> res = seasonal_decomposition(strain=0, mouse=0, feature="W", bin_width=30, period_length = 24)
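What a classical additive decomposition computes can be sketched in pure Python. The sketch assumes the period in bins is period_length * 60 / bin_width and estimates the trend with a centered moving average. It is an illustration of the general technique, not the statsmodels code that this function returns results from, and all helper names are hypothetical.

```python
def period_in_bins(period_length, bin_width):
    """Number of bins per cycle for a period in hours and a bin width in minutes."""
    return int(round(period_length * 60 / bin_width))


def centered_moving_average(values, m):
    """Trend estimate; None where the window does not fit."""
    half = m // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        total = sum(window)
        if m % 2 == 0:
            # Even period: the two end points get half weight each.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / m
    return trend


def seasonal_component(values, m):
    """Mean detrended value per phase of the cycle, centered at zero."""
    if len(values) < 2 * m:
        raise ValueError("need at least two full periods")
    trend = centered_moving_average(values, m)
    sums, counts = [0.0] * m, [0] * m
    for i, (v, t) in enumerate(zip(values, trend)):
        if t is not None:
            sums[i % m] += v - t
            counts[i % m] += 1
    means = [s / c for s, c in zip(sums, counts)]
    centre = sum(means) / m
    return [x - centre for x in means]


# Four days of 6-hour bins: a linear trend plus a repeating daily pattern.
m = period_in_bins(period_length=24, bin_width=360)
pattern = [1.0, -1.0, 2.0, -2.0]
series = [0.1 * i + pattern[i % m] for i in range(16)]
seasonal = seasonal_component(series, m)
```

On this synthetic series the recovered seasonal term should match the daily pattern up to rounding, because the moving average removes the linear trend exactly.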
-
mousestyles.ultradian.strain_seasonal(strain, mouse, feature, bin_width, period_length)
Apply a seasonal decomposition model to the time series of the specified strain, mice, feature and bin_width. Return the seasonal term and plot the seasonal term for each mouse in a set of mice from one strain.
Parameters: - strain (int) – nonnegative integer indicating the strain number
- mouse (list, set or tuple) – nonnegative integers indicating the mouse numbers
- feature ({"AS", "F", "M_AS", "M_IS", "W", "Distance"}) – “AS”: Active state probability “F”: Food consumed (g) “M_AS”: Movement outside homebase “M_IS”: Movement inside homebase “W”: Water consumed (g) “Distance”: Distance traveled
- bin_width (int) – number of minutes, the time interval for data aggregation
- period_length (int or float) – number of hours, usually the significant period length indicated by the Lomb-Scargle model
Returns: seasonal_all – the seasonal term for every mouse indicated by the input parameter
Return type: numpy array
Examples
>>> res = strain_seasonal(strain=0, mouse={0, 1, 2, 3}, feature="W", bin_width=30, period_length = 24)