python - Pandas report top-n in group and pivot -


I am trying to summarize a dataframe by grouping on one dimension, D1, and reporting summary statistics for each element of D1. Specifically, I am interested in the top n (index and value) for several metrics. What I want to output is one line for each element of D1.

Say I have two dimensions, D1 and D2, and four metrics, M1, M2, M3 and M4.

1) What is the suggested method for grouping by D1 and finding the top N D2 values and the corresponding metric value, for each of the metrics M1 to M4?

In Wes's book Python for Data Analysis (page 35) he suggests:

  def get_top1000(group):
      # as printed in the book; newer pandas releases use sort_values(by=...)
      return group.sort_index(by='births', ascending=False)[:1000]

  grouped = names.groupby(['year', 'sex'])
  top1000 = grouped.apply(get_top1000)

Is this still the recommended way (I am only interested in, say, the top 5 of D2, across thousands of D1 values and many metrics)?

2) The next problem is that I want to pivot the top 5, i.e. so that I have one line for each element of D1.

So for dimensions D1, D2 and metric M1, the resulting dataframe should have D1 as the index, with the top 5 values of D2 and the related values of M1 as columns:

D1 D2-1 D2-2 D2-3 D2-4 D2-5 M1-1 M1-2 M1-3 M1-4 M1-5

....

So for the pivot I want to create a ranking on D2 (i.e. 1 to 5; this will be my column field). If there were always 5 entries this would be easy, but sometimes there are fewer than 5 elements of D2 for a given value of D1.

Any suggestions on how to add a ranking to the grouping, so that I have the right column index for pivoting?
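One possible way to get such a ranking is sketched below, using a per-group `rank` followed by `pivot`. The toy frame, the column name `rank`, and the value `N = 2` are assumptions made up for illustration; only D1, D2 and M1 come from the question. Group `b` deliberately has fewer than N rows, to show that the missing slots simply come out as NaN.

```python
import pandas as pd

# Hypothetical toy data: D1 and D2 are dimensions, M1 is a metric.
df = pd.DataFrame({
    "D1": ["a", "a", "a", "b"],
    "D2": ["x", "y", "z", "x"],
    "M1": [3, 9, 5, 7],
})

N = 2  # number of top entries to keep per D1 value (assumed)

# Rank rows within each D1 group by M1, largest first; ties broken by order.
df["rank"] = (
    df.groupby("D1")["M1"].rank(method="first", ascending=False).astype(int)
)
top = df[df["rank"] <= N]

# One row per D1; the columns form a (field, rank) MultiIndex.
wide = top.pivot(index="D1", columns="rank", values=["D2", "M1"])

# Flatten the column MultiIndex into names such as D2-1 or M1-2.
wide.columns = [f"{field}-{rank}" for field, rank in wide.columns]
print(wide)
```

The same steps could be repeated for M2 to M4 and the results joined on D1, although whether that scales well to thousands of D1 values would need to be measured.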

I don't have any toy data to work with, or an expected result to compare against, but I think you want the following:

  N = 1000
  names = my_fake_data_loader()
  grouped = names.groupby(['year', 'sex'])
  # sort_values replaces the older sort_index(by=...) spelling
  grouped.apply(lambda g: g.sort_values(by='births', ascending=False).head(N))

This will give you the first 1000 elements of each group.
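As a rough illustration, the same idea can be written without `apply` by sorting once and then calling `head` on the groupby object. This is a sketch on a small made-up frame: the rows, the `name` column, and `N = 2` are assumptions for the example, not data from the question.

```python
import pandas as pd

# Hypothetical stand-in for the names data: births per (year, sex, name).
names = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001],
    "sex": ["F", "F", "F", "M", "M"],
    "name": ["Ann", "Bea", "Cat", "Dan", "Eli"],
    "births": [50, 80, 20, 10, 30],
})

N = 2  # assumed; the answer above uses 1000

# Sort the whole frame once, then keep the first N rows of each group.
top_n = (
    names.sort_values("births", ascending=False)
    .groupby(["year", "sex"])
    .head(N)
)
print(top_n)
```

Groups with fewer than N rows are returned whole, so no special handling is needed for short groups.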
