I have a data frame (I'll call it 'df') with a fair number of variables (numeric, logical and character) representing an experiment in which different cell types were moved from one specific medium to another, and the activity of the cells was quantified at specific times.
The first and second columns hold the name of the 'source' medium and the name of the medium the cells were moved to, respectively;
the third column describes the time at which the activity was quantified, the fourth is the cell type, the fifth is the activity measured, and this is where it gets funny.
I have two main questions. The first is whether there is an 'R-esque' way to do what I did to obtain the sixth column, which contains the increase/decrease (as a percentage) of the value in 'Activity' relative to the value in the previous row, computed per group (each group consists of a combination of Cell.Type, Pre.Medium and Time); that is why its value is NA every time the value of Time is zero.
Assuming this is my data frame (I've simplified it to make my question clearer):
df <- structure(list(Pre.Medium = c("Medium1", "Medium1", "Medium1",
"Medium2", "Medium2", "Medium2", "Medium1", "Medium1", "Medium1",
"Medium2", "Medium2", "Medium2"), Pos.Medium = c("Medium2", "Medium2",
"Medium2", "Medium1", "Medium1", "Medium1", "Medium2", "Medium2",
"Medium2", "Medium1", "Medium1", "Medium1"), Time = c(0, 2, 4,
0, 2, 4, 0, 2, 4, 0, 2, 4), Cell.Type = c("Cell_A", "Cell_A",
"Cell_A", "Cell_A", "Cell_A", "Cell_A", "Cell_B", "Cell_B", "Cell_B",
"Cell_B", "Cell_B", "Cell_B"), Activity = c(0.5, 1, 2, 2, 1,
0.5, 0.2, 0.8, 0.2, 0.2, 0.2, 0.4), Percent.Increase = c(NA,
100, 100, NA, -50, -50, NA, 300, -75, NA, 0, 100), Primary.Increase = c(NA,
TRUE, FALSE, NA, TRUE, FALSE, NA, TRUE, FALSE, NA, FALSE, FALSE
), Secondary.Increase = c(NA, FALSE, FALSE, NA, FALSE, FALSE,
NA, FALSE, FALSE, NA, FALSE, TRUE)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L), problems = structure(list(
row = 1L, col = NA_character_, expected = "8 columns", actual = "9 columns",
file = "'new 2'"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), spec = structure(list(cols = list(Pre.Medium = structure(list(), class = c("collector_character",
"collector")), Pos.Medium = structure(list(), class = c("collector_character",
"collector")), Time = structure(list(), class = c("collector_double",
"collector")), Cell.Type = structure(list(), class = c("collector_character",
"collector")), Activity = structure(list(), class = c("collector_double",
"collector")), Percent.Increase = structure(list(), class = c("collector_double",
"collector")), Primary.Increase = structure(list(), class = c("collector_logical",
"collector")), Secondary.Increase = structure(list(), class = c("collector_logical",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
### Pre.Med  Pos.Med  Time  Cell.Type  Activity  Percent.Increase  Primary.Increase  Secondary.Increase
### Medium1  Medium2  0     Cell_A     0.5       NA                NA                NA
### Medium1  Medium2  2     Cell_A     1         100               TRUE              FALSE
### Medium1  Medium2  4     Cell_A     2         100               FALSE             FALSE
### Medium2  Medium1  0     Cell_A     2         NA                NA                NA
### Medium2  Medium1  2     Cell_A     1         -50               TRUE              FALSE
### Medium2  Medium1  4     Cell_A     0.5       -50               FALSE             FALSE
### Medium1  Medium2  0     Cell_B     0.2       NA                NA                NA
### Medium1  Medium2  2     Cell_B     0.8       300               TRUE              FALSE
### Medium1  Medium2  4     Cell_B     0.2       -75               FALSE             FALSE
### Medium2  Medium1  0     Cell_B     0.2       NA                NA                NA
### Medium2  Medium1  2     Cell_B     0.2       0                 FALSE             FALSE
### Medium2  Medium1  4     Cell_B     0.4       100               FALSE             TRUE
I did this using the group_by and mutate functions, together with the lag function, to calculate the increase/decrease from the previous row and from the row before that. Was there a better way to do so?
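For reference, the approach described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code used: it assumes dplyr is installed, rebuilds only the first five columns of the example data, and assumes the groups are defined by Cell.Type and Pre.Medium (Time varies within a group, which is why the first row of each group comes out as NA).

```r
library(dplyr)

# Same values as the first five columns of the example data in the question.
df <- data.frame(
  Pre.Medium = rep(c("Medium1", "Medium2"), each = 3, times = 2),
  Pos.Medium = rep(c("Medium2", "Medium1"), each = 3, times = 2),
  Time       = rep(c(0, 2, 4), times = 4),
  Cell.Type  = rep(c("Cell_A", "Cell_B"), each = 6),
  Activity   = c(0.5, 1, 2, 2, 1, 0.5, 0.2, 0.8, 0.2, 0.2, 0.2, 0.4)
)

result <- df %>%
  # Assumption: one group per cell type and source medium.
  group_by(Cell.Type, Pre.Medium) %>%
  # Make sure rows are in time order inside each group before lagging.
  arrange(Time, .by_group = TRUE) %>%
  # Percent change relative to the previous row of the same group;
  # lag() returns NA for the first row, so Time == 0 gives NA.
  mutate(Percent.Increase = (Activity / lag(Activity) - 1) * 100) %>%
  ungroup()

print(result)
```

Sorting by Time before calling lag() is a defensive choice: lag() works on row order, so the result only means "previous time point" if the rows of each group are ordered by time.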
For my specific case, lag was enough, but what if I had more than three time measurements in each 'group' and needed to go much further back to calculate it?
With my approach, at some point I would have had to use something like lag(lag(lag(lag(lag((Activity / lag(Activity)) - 1) * 100)))) etc.
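One possible way around nesting calls, sketched under the assumption that the goal is to compare each value with one a fixed number of rows back (or with the first value of its group): dplyr's lag() takes an `n` argument, and first() returns the first value of the group. The column names `Pct.vs.TwoBack` and `Pct.vs.Baseline` are made up for this illustration, and the grouping by Cell.Type and Pre.Medium is an assumption.

```r
library(dplyr)

# Same values as the first five columns of the example data in the question.
df <- data.frame(
  Pre.Medium = rep(c("Medium1", "Medium2"), each = 3, times = 2),
  Pos.Medium = rep(c("Medium2", "Medium1"), each = 3, times = 2),
  Time       = rep(c(0, 2, 4), times = 4),
  Cell.Type  = rep(c("Cell_A", "Cell_B"), each = 6),
  Activity   = c(0.5, 1, 2, 2, 1, 0.5, 0.2, 0.8, 0.2, 0.2, 0.2, 0.4)
)

res <- df %>%
  group_by(Cell.Type, Pre.Medium) %>%      # assumed grouping
  arrange(Time, .by_group = TRUE) %>%
  mutate(
    # lag(x, n = 2) looks two rows back, equivalent to lag(lag(x)).
    Pct.vs.TwoBack  = (Activity / lag(Activity, n = 2) - 1) * 100,
    # first(x) is the first value of the group, i.e. the Time == 0 row here.
    Pct.vs.Baseline = (Activity / first(Activity) - 1) * 100
  ) %>%
  ungroup()

print(res)
```

With more time points per group, only the value of `n` would need to change, rather than the depth of nesting.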
The other thing is something I have not been able to figure out at all: turning my 'wide' dataset into a long one, by collapsing my columns 'Primary.Increase' and 'Secondary.Increase' into a single column named 'Increase.Type' whose value, for each group (combination of Cell.Type, Pre.Med and Time), is the name of the column (either Primary.Increase or Secondary.Increase) in which the value for one of the group's members was TRUE.
It should look something like this:
df <- structure(list(Pre.Med = c("Medium1", "Medium1", "Medium1", "Medium2",
"Medium2", "Medium2", "Medium1", "Medium1", "Medium1", "Medium2",
"Medium2", "Medium2"), Pos.Med = c("Medium2", "Medium2", "Medium2",
"Medium1", "Medium1", "Medium1", "Medium2", "Medium2", "Medium2",
"Medium1", "Medium1", "Medium1"), Time = c(0, 2, 4, 0, 2, 4,
0, 2, 4, 0, 2, 4), Cell.Type = c("Cell_A", "Cell_A", "Cell_A",
"Cell_A", "Cell_A", "Cell_A", "Cell_B", "Cell_B", "Cell_B", "Cell_B",
"Cell_B", "Cell_B"), Activity = c(0.5, 1, 2, 2, 1, 0.5, 0.2,
0.8, 0.2, 0.2, 0.2, 0.4), Percent.Inc = c(NA, 100, 100, NA, -50,
-50, NA, 300, -75, NA, 0, 100), Increase.Type = c("Primary.Increase",
"Primary.Increase", "Primary.Increase", "Primary.Increase", "Primary.Increase",
"Primary.Increase", "Primary.Increase", "Primary.Increase", "Primary.Increase",
"Secondary.Increase", "Secondary.Increase", "Secondary.Increase"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-12L), spec = structure(list(cols = list(Pre.Med = structure(list(), class = c("collector_character",
"collector")), Pos.Med = structure(list(), class = c("collector_character",
"collector")), Time = structure(list(), class = c("collector_double",
"collector")), Cell.Type = structure(list(), class = c("collector_character",
"collector")), Activity = structure(list(), class = c("collector_double",
"collector")), Percent.Inc = structure(list(), class = c("collector_double",
"collector")), Increase.Type = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
### Pre.Med  Pos.Med  Time  Cell.Type  Activity  Percent.Inc  Increase.Type
### Medium1  Medium2  0     Cell_A     0.5       NA           Primary.Increase
### Medium1  Medium2  2     Cell_A     1         100          Primary.Increase
### Medium1  Medium2  4     Cell_A     2         100          Primary.Increase
### Medium2  Medium1  0     Cell_A     2         NA           Primary.Increase
### Medium2  Medium1  2     Cell_A     1         -50          Primary.Increase
### Medium2  Medium1  4     Cell_A     0.5       -50          Primary.Increase
### Medium1  Medium2  0     Cell_B     0.2       NA           Primary.Increase
### Medium1  Medium2  2     Cell_B     0.8       300          Primary.Increase
### Medium1  Medium2  4     Cell_B     0.2       -75          Primary.Increase
### Medium2  Medium1  0     Cell_B     0.2       NA           Secondary.Increase
### Medium2  Medium1  2     Cell_B     0.2       0            Secondary.Increase
### Medium2  Medium1  4     Cell_B     0.4       100          Secondary.Increase
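A rough sketch of one way such a table might be produced, under some assumptions: dplyr is installed, groups are defined by Cell.Type and Pre.Medium, and 'Primary.Increase' takes precedence if both logical columns contain a TRUE in the same group (the example data never has that case, so this tie-break is a guess). Groups with no TRUE at all get NA.

```r
library(dplyr)

# Same values as the wide example data in the question
# (Percent.Increase omitted; it plays no role in this step).
df <- data.frame(
  Pre.Medium = rep(c("Medium1", "Medium2"), each = 3, times = 2),
  Pos.Medium = rep(c("Medium2", "Medium1"), each = 3, times = 2),
  Time       = rep(c(0, 2, 4), times = 4),
  Cell.Type  = rep(c("Cell_A", "Cell_B"), each = 6),
  Activity   = c(0.5, 1, 2, 2, 1, 0.5, 0.2, 0.8, 0.2, 0.2, 0.2, 0.4),
  Primary.Increase   = c(NA, TRUE, FALSE, NA, TRUE, FALSE,
                         NA, TRUE, FALSE, NA, FALSE, FALSE),
  Secondary.Increase = c(NA, FALSE, FALSE, NA, FALSE, FALSE,
                         NA, FALSE, FALSE, NA, FALSE, TRUE)
)

long <- df %>%
  group_by(Cell.Type, Pre.Medium) %>%      # assumed grouping
  mutate(Increase.Type = case_when(
    # any() collapses each group to a single TRUE/FALSE; NAs are ignored.
    any(Primary.Increase, na.rm = TRUE)   ~ "Primary.Increase",
    any(Secondary.Increase, na.rm = TRUE) ~ "Secondary.Increase",
    TRUE                                  ~ NA_character_
  )) %>%
  ungroup() %>%
  # Drop the two logical columns now that they are summarised.
  select(-Primary.Increase, -Secondary.Increase)

print(long)
```

Because the conditions inside case_when() are single values per group, mutate() recycles the resulting label across every row of that group, which matches the layout shown above.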
Is there a way to do this in the first place?
I'd assume so, but so far I've not been able to do it :/ I'm an undergraduate in biology, relatively new to R. I'm loving what you can do with it, but I'm still a long way from being good at it.
Any help is greatly appreciated.
Asked by John Sandman; translated from SO.