mousestyles.dynamics package

Module contents

mousestyles.dynamics.create_time_matrix(combined_gap=4, time_gap=1, days_index=137, verbose=False)

Return a time matrix used to estimate the MLE transition probabilities. Each row is one of the 137 mouse days. Each column is a time point within the day. Each entry is the mouse's activity at that time: 0 represents IS (inactive state), 1 represents eating, 2 represents drinking, and 3 represents other activity in AS (active state).

Parameters:
  • combined_gap (nonnegative float or int) – The threshold for combining small intervals. If the next start time minus the previous stop time is smaller than combined_gap, the two intervals are combined.
  • time_gap (positive float or int) – The time step between consecutive columns of the time series.
  • days_index (nonnegative int) – The number of days to process, from day 0 to day days_index.
  • verbose (bool) – If True, print helpful information to the screen.
Returns:

time – a matrix whose entries give the activity for a given mouse day at a given time.

Return type:

pandas.DataFrame

Examples

>>> from mousestyles.dynamics import create_time_matrix
>>> time = create_time_matrix(combined_gap=4, time_gap=1).iloc[0, 0:10]
>>> time
strain    0
mouse     0
day       0
48007     0
48008     0
48009     0
48010     0
48011     0
48012     0
48013     0
Name: 0, dtype: float64
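
The combined_gap rule described above can be illustrated with a small sketch. This is not the mousestyles implementation; the helper name combine_intervals and the (start, stop) tuple representation are assumptions made for illustration only.

```python
def combine_intervals(intervals, combined_gap=4):
    """Merge (start, stop) intervals separated by less than combined_gap.

    Hypothetical helper, not part of mousestyles.
    """
    merged = []
    for start, stop in sorted(intervals):
        # Gap = next start time minus the previous stop time.
        if merged and start - merged[-1][1] < combined_gap:
            prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop))
        else:
            merged.append((start, stop))
    return merged


intervals = [(0, 10), (12, 20), (30, 35)]
merged = combine_intervals(intervals, combined_gap=4)
print(merged)
```

Here the first two intervals are 2 time units apart, which is below the threshold of 4, so they are merged; the third starts 10 units after the merged interval ends and stays separate.
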
mousestyles.dynamics.find_best_interval(df, strain_num, interval_length_initial=array([ 600, 1200, 1800, 2400, 3000, 3600, 4200, 4800, 5400, 6000, 6600, 7200]))

Return the optimized time interval length, the corresponding simulated ("fake") mouse behavior string, and its evaluation score for a particular mouse strain. The inputs are the pandas DataFrame generated by create_time_matrix, which contains information on all strains; the number of the strain to model; and the candidate time interval lengths to optimize over. The outputs are the optimized time interval length, the simulated mouse states generated with that interval length, and the evaluation score comparing the simulated mouse with the real mice of the strain under the same interval length.

Parameters:
  • df (pandas.DataFrame) – a large data frame containing the strain, mouse number, mouse day, and the states at the chosen time points.
  • strain_num (int) – an integer (0, 1, or 2) specifying the desired mouse strain.
  • interval_length_initial (numpy.ndarray) – a numpy.ndarray specifying the candidate time interval lengths to optimize over. The default is a sequence from 600 s to 7200 s in steps of 600 s, since 10 min to 2 h is a reasonable range.
Returns:

  • best_interval_length (int) – an integer indicating the optimized time interval in terms of the evaluation score
  • best_fake_mouse (list) – a list of around 88,283 integers indicating the states simulated using best_interval_length
  • best_score (float) – a float between 0 and 1 representing the evaluation score comparing best_fake_mouse with the real mice behavior in the strain under the same optimized time interval length. The higher the score, the better the simulation.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from mousestyles.dynamics import find_best_interval
>>> row_i = np.zeros(40)
>>> time_df_eg = np.vstack((row_i, row_i, row_i))
>>> time_df_eg = pd.DataFrame(time_df_eg)
>>> time_df_eg.rename(columns={0:'strain'}, inplace=True)
>>> find_best_interval(time_df_eg, 0, np.arange(10, 40, 10))
(10, array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]), 0.98245614035087725)
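
The search over candidate interval lengths can be pictured as a simple grid search: score every candidate and keep the best one. The sketch below is only an illustration under that assumption. pick_best_interval and toy_score are hypothetical names, and toy_score stands in for the real pipeline of estimating matrices, simulating, and scoring.

```python
def pick_best_interval(candidates, score_fn):
    """Return the (length, score) pair with the highest score.

    Hypothetical helper; score_fn maps an interval length to a score.
    """
    best_length, best_score = None, float("-inf")
    for length in candidates:
        score = score_fn(length)
        if score > best_score:
            best_length, best_score = length, score
    return best_length, best_score


def toy_score(length):
    # Made-up score that peaks at 1800 s, for demonstration only.
    return 1.0 / (1.0 + abs(length - 1800) / 600)


best_length, best_score = pick_best_interval(range(600, 7201, 600), toy_score)
print(best_length, best_score)
```
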
mousestyles.dynamics.get_prob_matrix_list(time_df, interval_length=1000)

Return a list of probability transition matrices that are later used to characterize and simulate the behavior dynamics of different strains of mice. The input is the pandas DataFrame generated by create_time_matrix with default parameters. The output is a list of numpy arrays, each being a transition matrix characterizing one small time interval. The interval length can be chosen.

Parameters:
  • time_df (pandas.DataFrame) – a large data frame containing the strain, mouse number, mouse day, and the states at the chosen time points.
  • interval_length (int) – an integer specifying the desired length of each small time interval.
Returns:

matrix_list – a list of the MLE estimates of the probability transition matrices, one for each small time interval. Each element of the list is a numpy array.

Return type:

list

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from mousestyles.dynamics import get_prob_matrix_list
>>> row_i = np.hstack((np.zeros(13), np.ones(10),
...                    np.ones(10)*2, np.ones(10)*3))
>>> time_df_eg = np.vstack((row_i, row_i, row_i))
>>> time_df_eg = pd.DataFrame(time_df_eg)
>>> mat_list = get_prob_matrix_list(time_df_eg,
...                                 interval_length=10)
>>> mat_list[0]
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> mat_list[1]
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> mat_list[2]
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> mat_list[3]
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
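
One way to picture the per-interval estimation is the following sketch. It is not the library's code: transition_matrices_by_interval is a hypothetical name, and it assumes the input holds only state columns (the strain, mouse, and day columns already removed) and that transitions are counted within each interval only.

```python
import numpy as np


def transition_matrices_by_interval(states, interval_length, n_states=4):
    """Estimate one transition matrix per block of interval_length columns.

    Hypothetical helper. states: 2-D array, rows = mouse days,
    columns = time points, entries = integer states 0..n_states-1.
    """
    states = np.asarray(states, dtype=int)
    matrices = []
    for start in range(0, states.shape[1], interval_length):
        chunk = states[:, start:start + interval_length]
        counts = np.zeros((n_states, n_states))
        # Count transitions between consecutive time points in this chunk.
        np.add.at(counts, (chunk[:, :-1], chunk[:, 1:]), 1)
        row_sums = counts.sum(axis=1, keepdims=True)
        # Normalize each row; rows with no observations stay all zero.
        probs = np.divide(counts, row_sums,
                          out=np.zeros_like(counts), where=row_sums > 0)
        matrices.append(probs)
    return matrices


row = np.repeat([0, 1, 2, 3], 10)   # 10 time points in each state
states = np.vstack([row, row, row])
mats = transition_matrices_by_interval(states, interval_length=10)
print(mats[1])
```

With this toy input each interval contains a single state, so the k-th matrix has a 1 at position (k, k) and zeros elsewhere, the same pattern as in the example above.
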
mousestyles.dynamics.get_prob_matrix_small_interval(string_list, verbose=False)

Return the MLE estimate of the transition probability matrix of the Markov chain model. The input is a list of strings that record the state transitions of the mouse behavior. The output is a matrix stored as a numpy array, where the (i, j)-th entry is the probability of transitioning from state i to state j.

Parameters:
  • string_list (list) – a list of strings of the states in the given time slot.
  • verbose (bool) – If True, print helpful information to the screen.
Returns:

M – the MLE estimate of the probability transition matrix. Each entry M_ij represents the probability of transitioning from state i to state j.

Return type:

numpy.ndarray

Examples

>>> from mousestyles.dynamics import get_prob_matrix_small_interval
>>> time_list = ['002', '001', '012']
>>> get_prob_matrix_small_interval(time_list)
array([[ 0.4,  0.4,  0.2,  0. ],
       [ 0. ,  0. ,  1. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ]])
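
For a Markov chain, the MLE of each transition probability is the number of observed i-to-j transitions divided by the number of transitions leaving state i. The sketch below applies that counting rule to the strings from the example. The name mle_transition_matrix is hypothetical, and leaving rows for unobserved states as zeros is an assumption chosen to match the output shown above.

```python
import numpy as np


def mle_transition_matrix(string_list, n_states=4):
    """Count consecutive-state transitions and normalize each row.

    Hypothetical helper, not the mousestyles implementation.
    """
    counts = np.zeros((n_states, n_states))
    for states in string_list:
        # Each adjacent pair of characters is one observed transition.
        for a, b in zip(states[:-1], states[1:]):
            counts[int(a), int(b)] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


M = mle_transition_matrix(['002', '001', '012'])
print(M)
```

The three strings contain five transitions out of state 0 (two to 0, two to 1, one to 2) and one transition out of state 1 (to 2), which gives the first row 0.4, 0.4, 0.2, 0 and the second row 0, 0, 1, 0.
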
mousestyles.dynamics.get_score(true_day, simulated_day, weight=[1, 10, 50, 1])

Return the evaluation score for a simulated day, which is later used to choose the best time interval for each strain. The inputs are two numpy arrays, possibly of different lengths, and one list. The first array holds the activities of one particular mouse on one particular day; the second is the simulated day for that mouse produced by the mcmc_simulation function. The list gives the weight for each state, so that correct simulations of different states earn different rewards. For example, the average mouse day has 21200 timestamps: 10000 are IS, 1000 are EAT, 200 are DRINK, and the remaining 10000 are OTHERS. The rarer states, drinking and eating, should therefore weigh more. The frequency ratio is 10000:1000:200:10000 = 1:0.1:0.02:1, and taking the inverse gives the default weights 1:10:50:1. The output is a number between 0 and max(weight) indicating the similarity between the true day of a mouse and a simulated day of the same mouse. This function is used to measure the performance of the simulation and then choose the appropriate time interval.

Parameters:
  • true_day (numpy.ndarray) – an array containing the activities for one particular mouse on a specific day.
  • simulated_day (numpy.ndarray) – an array containing the simulated activities for this particular mouse.
  • weight (list) – a list of positive numbers giving the reward for a correct prediction of each state.
Returns:

score – a float from 0 to max(weight), indicating the similarity of the simulated data with the actual value, and therefore, the performance of the simulation, with max(weight) being the most similar, and 0 being the least similar.

Return type:

float

Examples

>>> import numpy as np
>>> from mousestyles.dynamics import get_score
>>> true_day_1 = np.zeros(13)
>>> simulated_day_1 = np.ones(13)
>>> get_score(true_day_1, simulated_day_1)
0.0
>>> true_day_2 = np.ones(13)
>>> simulated_day_2 = np.ones(13)
>>> get_score(true_day_2, simulated_day_2)
10.0
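
The weight arithmetic can be checked directly: dividing the state counts by the largest count gives 1:0.1:0.02:1, and inverting gives 1:10:50:1. The scoring function below is only a guess at one formula consistent with the two examples above (the average over timestamps of the weight earned by each correct match); weighted_match_score is a hypothetical name and the library's actual normalization may differ.

```python
import numpy as np

# Counts of IS, EAT, DRINK, OTHERS timestamps from the description above.
counts = np.array([10000, 1000, 200, 10000])
ratio = counts / counts.max()   # 1 : 0.1 : 0.02 : 1
weights = 1 / ratio             # 1 : 10 : 50 : 1


def weighted_match_score(true_day, simulated_day, weight=(1, 10, 50, 1)):
    """Average weight earned per timestamp by correct matches.

    Hypothetical formula, assumed for illustration only.
    """
    # Compare only the overlapping part when lengths differ.
    n = min(len(true_day), len(simulated_day))
    true = np.asarray(true_day[:n], dtype=int)
    sim = np.asarray(simulated_day[:n], dtype=int)
    w = np.asarray(weight, dtype=float)
    return float(np.sum(w[true] * (true == sim)) / n)


score_miss = weighted_match_score(np.zeros(13), np.ones(13))
score_hit = weighted_match_score(np.ones(13), np.ones(13))
print(weights, score_miss, score_hit)
```
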
mousestyles.dynamics.mcmc_simulation(mat_list, n_per_int)

This function gives the Monte Carlo simulation of the stochastic process modeling the dynamic changes of states of the mice behavior. The input is a list of probability transition matrices and an integer indicating how many outputs to generate for each matrix. This number is related to the interval_length parameter of the function get_prob_matrix_list. The output is an array of numbers, each indicating one state.

Parameters:
  • mat_list (list) – a list of numpy arrays storing the probability transition matrices for each small time interval chosen.
  • n_per_int (int) – an integer specifying the desired output length for each probability transition matrix. This is the same as the parameter interval_length used in the function get_prob_matrix_list.
Returns:

simu_result – an array of integers indicating the simulated states given a list of probability transition matrices.

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> from mousestyles.dynamics import mcmc_simulation
>>> mat0 = np.zeros(16).reshape(4, 4)
>>> np.fill_diagonal(mat0, val=1)
>>> mat1 = np.zeros(16).reshape(4, 4)
>>> mat1[0, 1] = 1
>>> mat1[1, 0] = 1
>>> mat1[2, 2] = 1
>>> mat1[3, 3] = 1
>>> mat_list_example = [mat0, mat1]
>>> mcmc_simulation(mat_list_example, 10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
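
A piecewise Markov chain simulation of this kind can be sketched as follows. This is an illustration, not the library's code: simulate_piecewise_chain is a hypothetical name, and starting in state 0, seeding the generator, and staying in place when a row is all zeros are all assumptions.

```python
import numpy as np


def simulate_piecewise_chain(mat_list, n_per_int, start_state=0, seed=0):
    """Draw n_per_int states from each transition matrix in turn.

    Hypothetical helper; the chain carries its state across matrices.
    """
    rng = np.random.default_rng(seed)
    state = start_state
    path = []
    for mat in mat_list:
        for _ in range(n_per_int):
            row = np.asarray(mat[state], dtype=float)
            if row.sum() > 0:
                # Sample the next state from the current state's row.
                state = int(rng.choice(len(row), p=row / row.sum()))
            # Assumption: with an all-zero row, remain in the current state.
            path.append(state)
    return np.array(path)


mat0 = np.eye(4)            # every state stays where it is
mat1 = np.eye(4)
mat1[[0, 1]] = mat1[[1, 0]]  # states 0 and 1 swap on every step
result = simulate_piecewise_chain([mat0, mat1], 10)
print(result)
```

Because both matrices are deterministic, the result does not depend on the seed: ten 0s from the identity matrix, followed by an alternating 1, 0 pattern from the swap matrix, as in the example above.
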