count - How do you summarize columns based on unique IDs without knowing the IDs in R?


I have been going through posts about summarizing data, but I have not found what I am looking for.

I would like to create an abstract "count table" that shows how many times patients were given a specific medication. Some patients received several medications at once, but that does not matter: I only want the total across all medications, and then the percentage that each drug class accounts for. The problem is that the drug names are not known in advance; they are somewhere in the "hidden" data.frame. So I have to tell R which columns to use to build a "look-up list", which it can then use to summarize the columns.

I suspect this can be done with the plyr package, but I have not managed to get it working yet.

My df looks something like this:

  x   

As you can see, there are three columns in the data.frame.

What I want to do now is create a list of the unique characters,

  unique(x)
  unique(y)
  unique(z)

which serves as my reference list, through which R can summarize the counts in each column.

  summary(df)

summarizes the counts of each column, but without the ID of each value and without the percentage across all the unique values.

I also tried the following, which goes in the right direction, but ideally I would have a list of unique characters that I can feed into the length logic:

  ddply(df, .(x), summarise, count = length(unique(y)))

Any ideas on how I can do this? Help is very much appreciated.

If you want a count for the entire dataframe, you can use table(unlist(df)) (see also Guckler's answer), and if you also want the proportions: prop.table(table(unlist(df))). When you want counts for the individual columns, it becomes a bit more difficult.
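As a minimal sketch of the whole-dataframe count in base R: the example data below is made up for illustration (three character columns named x, y and z), and the percentages are simply the proportions scaled to 100.

```r
# Hypothetical example data: three columns of drug codes (values are made up)
df <- data.frame(
  x = c("a", "b", "a", "c"),
  y = c("b", "b", "d", "a"),
  z = c("c", "a", "a", "d"),
  stringsAsFactors = FALSE
)

# Flatten all columns into one vector and count every distinct value;
# the IDs do not need to be known beforehand
counts <- table(unlist(df))

# Convert the counts to percentages of all entries
percentages <- round(100 * prop.table(counts), 1)

print(counts)
print(percentages)
```

Because table() discovers the distinct values itself, no reference list has to be supplied by hand.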

To compute the count for each column as well as the total count, I wrote the following function:

  # Some reproducible data:
  set.seed(1)
  x <- sample(letters[1:4], 20, replace = TRUE)
  y

Running the function with func(df) creates the dataframe dat in your global environment (through assign("dat", x2, envir = .GlobalEnv)):

  > dat
    id x y z total
  1  a 4 4 3    11
  2  b 5 5 2    12
  3  c 5 4 4    13
  4  d 6 4 5    15
  5  e 0 3 5     8
  6  f 0 0 1     1
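A per-column counter of this kind could be sketched in base R as follows. This is an illustration under assumptions, not the function from the answer: the name count_by_column and the example data are hypothetical, and the result is returned rather than assigned into the global environment.

```r
# Sketch of a per-column counter; `count_by_column` is a hypothetical name,
# not the function from the original answer.
count_by_column <- function(df) {
  # Reference list of every distinct value found in any column
  ids <- sort(unique(unlist(lapply(df, as.character))))

  # Count each id per column; factor() with fixed levels keeps zero counts
  counts <- sapply(df, function(col) {
    as.vector(table(factor(as.character(col), levels = ids)))
  })

  # sapply() returns a plain vector when there is only one id, so force a matrix
  counts <- matrix(counts, nrow = length(ids),
                   dimnames = list(NULL, names(df)))

  out <- data.frame(id = ids, counts, stringsAsFactors = FALSE)
  out$total <- rowSums(counts)
  out
}

# Hypothetical example data (values are made up)
df <- data.frame(
  x = c("a", "b", "a", "c"),
  y = c("b", "b", "d", "a"),
  z = c("c", "a", "a", "d"),
  stringsAsFactors = FALSE
)

dat <- count_by_column(df)
print(dat)
```

Returning the table and assigning it at the call site (dat <- count_by_column(df)) avoids the side effect of writing into the global environment, at the cost of one extra assignment.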

Adding the percentages with the dplyr package:

  library(dplyr)
  dat <- dat %>%
    mutate(xperc = round(100 * x / sum(total), 1),
           yperc = round(100 * y / sum(total), 1),
           zperc = round(100 * z / sum(total), 1),
           perc = round(100 * total / sum(total), 1))

resulting in:

  > dat
    id x y z total xperc yperc zperc perc
  1  a 4 4 3    11   6.7   6.7   5.0 18.3
  2  b 5 5 2    12   8.3   8.3   3.3 20.0
  3  c 5 4 4    13   8.3   6.7   6.7 21.7
  4  d 6 4 5    15  10.0   6.7   8.3 25.0
  5  e 0 3 5     8   0.0   5.0   8.3 13.3
  6  f 0 0 1     1   0.0   0.0   1.7  1.7
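The percentages above are all relative to the grand total of 60 entries. If the share of each ID within its own column is wanted instead, one possible base R sketch uses prop.table() with margin = 2; the counts are copied from the table above, and col_perc is a name introduced here only for illustration.

```r
# Count table with the values shown in the output above
dat <- data.frame(
  id = c("a", "b", "c", "d", "e", "f"),
  x = c(4, 5, 5, 6, 0, 0),
  y = c(4, 5, 4, 4, 3, 0),
  z = c(3, 2, 4, 5, 5, 1)
)

# Share of each id within its own column (margin = 2 normalises by column)
col_perc <- round(100 * prop.table(as.matrix(dat[, c("x", "y", "z")]), margin = 2), 1)
rownames(col_perc) <- dat$id

print(col_perc)
```

Each column of col_perc then adds up to 100, which makes columns with different numbers of entries directly comparable.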
