---
title: "Light Cue Manuscript Statistics Overview"
author: "Brandon Güell"
date: "11/11/2017"
output:
  pdf_document:
    toc: yes
  html_document:
    toc: yes
    toc_float: yes
---

> The statistical analyses shown here are from the Güell & Warkentin light cue manuscript. Analyses are shown in chronological order as reported in the manuscript. I show details on importing, manipulating, summarizing, analyzing, and visualizing the data. See the manuscript for details on background, research questions, hypotheses, methods, etc.

# Hatching Timing Experiment Data Analysis

## Data importing

First, we can load the data straight from my computer as follows...

```{r}
#library(dplyr)
#library(readxl)
#photopattern <- read_excel("/Volumes/EggScience 1/Warkentin Lab/2015 Data/Hatching Phototaxis Data/Diel Hatching Pattern Data.xlsx")
#photopattern
```

Or, we can also access this dataset from my Github repository as follows:

```{r}
library(curl)
f <- curl("https://raw.githubusercontent.com/bguell/Light-Cue-Manuscript-Final/master/Diel%20Hatching%20Pattern%20Data.csv")
photopattern <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(photopattern)
```

**EXCLUDE clutch 238**

```{r}
library(dplyr)
photopattern = filter(photopattern, clutch != 238) # exclude clutch 238
photopattern %>% group_by(clutch) %>% summarise(n()) # check how many clutches and rows per clutch
```

## GLMM model generations

I created the following models using a GLMM. In the **full, most complete model**, the proportion hatched is the response variable, and treatment, age (time), and their interaction are the predictor variables. I also added clutch as a random effect in these models. The models are run using a **binomial family distribution** with a **logit link function**. I also supplied the variable "initial # eggs" as the weights, so that the proportion response is treated as a binomial outcome out of that many eggs. I will call this full complex model "rana".
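In symbols, the full model described above can be sketched as follows. The notation is mine and is only an assumption about how to write it down, not something taken from the manuscript: $y_i$ is the proportion hatched in observation $i$, $n_i$ the initial number of eggs, $t[i]$ and $c[i]$ the treatment and clutch of that observation, and age is treated as numeric.

$$
n_i\,y_i \sim \mathrm{Binomial}(n_i,\,p_i), \qquad
\operatorname{logit}(p_i) = \beta_0 + \beta_{t[i]} + \beta_A\,\mathrm{age}_i + \gamma_{t[i]}\,\mathrm{age}_i + b_{c[i]}, \qquad
b_c \sim \mathcal{N}(0,\,\sigma_c^2)
$$

Here $\beta_{t}$ is the treatment effect, $\beta_A$ the age slope, $\gamma_{t}$ the treatment-by-age interaction, and $b_c$ the random intercept for clutch. The nested models drop $\gamma_t$, then $\beta_A$ or $\beta_t$.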
```{r}
library(lme4)
# full model with treatment, time, and interaction
rana = glmer(percentage ~ treatment*age + (1 | clutch), family = "binomial", data = photopattern, weights = intial...eggs)
summary(rana)
```

We can then make a series of simpler, nested models as follows:

```{r}
# treatment and time
rana2 = glmer(percentage ~ treatment + age + (1 | clutch), family = "binomial", data = photopattern, weights = intial...eggs)
summary(rana2)

# just treatment
rana3 = glmer(percentage ~ treatment + (1 | clutch), family = "binomial", data = photopattern, weights = intial...eggs)
summary(rana3)

# just time
rana4 = glmer(percentage ~ age + (1 | clutch), family = "binomial", data = photopattern, weights = intial...eggs)
summary(rana4)
```

*************************

## Model comparison using **Likelihood Ratio Tests (LRT)**

> Note: The logic of the likelihood ratio test is to compare the likelihood of two models with each other, i.e., a model that includes the factor that we are interested in versus a reduced, nested model with that factor excluded.

To get our p-values for each predictor variable, we will compare models using the anova() function. We always put the reduced model first and the full model second, making sure to use a chi-squared test.

First, we can use an information-theoretic approach to compare our models. This helps us decide which model fits the data best: the model with the **lowest AIC**.

```{r}
library(AICcmodavg)
print(aictab(list(rana, rana2, rana3, rana4), c("rana", "rana2", "rana3", "rana4")), LL = FALSE)
```

We can see that indeed "rana", the full complex model with interactions, is the best model for our data.

Next, we can compare the variance explained by the fixed effects alone (marginal R^2) with the variance explained by the fixed and random effects together (conditional R^2)...

```{r}
library(MuMIn)
r.squaredGLMM(rana)
```

Now, we can use the "rana" model as our base model to obtain our p-values using the LRTs...
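To make the LRT logic concrete, here is a minimal sketch of what `anova(reduced, full, test = "Chisq")` computes. It assumes simulated toy data and plain `glm()` fits (no random effect) so that it runs with base R only; the names `toy`, `reduced`, and `full` are hypothetical and are not part of the manuscript analysis.

```r
# Toy binomial data (hypothetical, NOT the manuscript data)
set.seed(1)
toy <- data.frame(x = rep(c(0, 1), each = 20), n = 10)
toy$y <- rbinom(nrow(toy), size = toy$n, prob = plogis(-0.5 + 1.0 * toy$x))

# Reduced model (factor of interest excluded) and full model (factor included)
reduced <- glm(cbind(y, n - y) ~ 1, family = binomial, data = toy)
full    <- glm(cbind(y, n - y) ~ x, family = binomial, data = toy)

# LRT statistic: twice the difference in log-likelihoods
chisq <- as.numeric(2 * (logLik(full) - logLik(reduced)))
# Degrees of freedom: difference in the number of estimated parameters
df <- attr(logLik(full), "df") - attr(logLik(reduced), "df")
# p-value from the upper tail of the chi-squared distribution
p <- pchisq(chisq, df = df, lower.tail = FALSE)

# The same comparison through anova(), reduced model first
tab <- anova(reduced, full, test = "Chisq")
```

The hand-computed `chisq` and `p` should match the `Deviance` and `Pr(>Chi)` entries of `tab`. Note that R prints very small p-values as `< 2.2e-16` rather than as an exact number.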
### **EFFECT OF AGE**

```{r}
anova(rana3,rana, test = "Chisq")
```

* *X^2* == 10317
* *p-value* < 2.2e-16 ***

### **EFFECT OF TREATMENT**

```{r}
anova(rana4,rana, test = "Chisq")
```

* *X^2* == 272.56
* *p-value* < 2.2e-16 ***

### **EFFECT OF TREATMENT*AGE INTERACTION**

```{r}
anova(rana2,rana, test = "Chisq")
```

* *X^2* == 92.442
* *p-value* < 2.2e-16 ***

## Model summary alt.

Alternatively, we could use the Anova() function to get p-values using Wald chi-square tests for each of our main effects...

```{r}
library(car) # load car package for Anova function on model
Anova(rana) # Analysis of Deviance Table (Type II Wald chisquare tests)
```

Regardless, we can see that the results are extremely similar... Thus we will **report results from the LRT** in our manuscript.

*********************

## **POST HOC** OF TREATMENTS DIFFERENT TO EACH OTHER

```{r}
library(multcomp)
summary(glht(rana, linfct=mcp(treatment="Tukey")))
```

* light-dark **p-value** == 0.236 **NS**
* photo-dark **p-value** == <0.001 ***
* photo-light **p-value** == <0.001 ***

> Interpretation: This post hoc test shows that the photo treatment differed significantly from both the light and the dark treatments, while light and dark did not differ from each other.

*********************************************

## **Visualization of Hatching Timing Experiment**

> In order to visualize this data, we will plot a proportion hatched curve for each treatment. Data points will be the mean proportion hatched at each time point across clutches (with standard error bars between clutches).
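As a small worked example of the summary statistic plotted here, the sketch below computes a mean and standard error (SD divided by the square root of the sample size) per treatment. The data frame `toy` and its values are made up for illustration; only the column names mirror the real dataset.

```r
# Hypothetical toy data: three clutches per treatment at a single time point
toy <- data.frame(
  treatment  = rep(c("dark", "light"), each = 3),
  percentage = c(0.2, 0.4, 0.6, 0.1, 0.1, 0.4)
)

# Standard error of the mean: SD divided by the square root of the sample size
se <- function(x) sd(x) / sqrt(length(x))

# Split the proportions by treatment, then summarise each group
by_trt <- split(toy$percentage, toy$treatment)
means  <- sapply(by_trt, mean)
ses    <- sapply(by_trt, se)

means
ses
```

The ribbon in the plot spans `mean - SE` to `mean + SE` at each age, so wider ribbons indicate more variation among clutches at that time point.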
To build this plot, we first need to "wrangle" our data a bit:

```{r}
head(photopattern)
library(dplyr) # allows for data manipulation
k = photopattern %>% group_by(treatment, time, age) # group proportion hatched data by treatment, time, and age
k
k = k %>% summarise(sample_size = n(), mean = mean(percentage), SE = sd(percentage)/sqrt(length(percentage))) # sample size, mean, and SE of percentage by time and treatment
k # new dataset with the variables we created
```

### Final plot with SE ribbons (shaded error areas)

```{r}
library(ggplot2)
library(scales)
legend_title="Light \ntreatment"
g1=ggplot(k, aes(x=age, y=mean)) + # add the axes
  geom_line(aes(color=treatment), size=1.5) + # lines drawn between the means, colored by treatment; size must be outside of aes
  geom_ribbon(aes(ymin=mean-SE, ymax=mean+SE, group=treatment, fill=treatment), alpha=.5) + # SE ribbon limits, one ribbon per treatment, filled by treatment; alpha sets transparency
  labs(y="Mean proportion hatched", x="Age (days)") + # create labels
  theme_classic(base_size=21) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position="right", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black")) +
  scale_x_continuous(breaks= pretty_breaks(n=6)) + # "pretty" breaks on the x axis
  scale_y_continuous(breaks= pretty_breaks(n=6)) + # "pretty" breaks on the y axis
  scale_color_manual(legend_title, values = c("black", "darkorange", "darkgreen")) +
  scale_fill_manual(guide = FALSE, values = c("black", "darkorange", "darkgreen")) + # no separate legend for the shading colors
  geom_rect(aes(xmin = -Inf, xmax = 4.25, ymin = -Inf, ymax = Inf), alpha = 0.009) +
  geom_rect(aes(xmin = 4.75, xmax = 5.25, ymin = -Inf, ymax = Inf), alpha = 0.009) +
  geom_rect(aes(xmin = 5.75, xmax = 6.25, ymin = -Inf, ymax = Inf), alpha = 0.009) +
  geom_rect(aes(xmin = 6.75, xmax = 7.25, ymin = -Inf, ymax = Inf), alpha = 0.009)
g1
```

*********************************************

## **PEAK HATCHING**

Take data from time points 34 and 38, just before and after time point 36 when the lights went off, to see peak hatching after the onset of darkness...

```{r}
library(dplyr)
peak = photopattern %>% filter(time == 34 | time == 38)
peak
```

Now find the min and max proportion hatched at time points 34 and 38:

```{r}
minmax_peak = peak %>%
  group_by(clutch, treatment) %>%
  summarise(min=min(percentage), max=max(percentage)) %>%
  mutate(diff = max - min)
minmax_peak
```

### Stats:

```{r}
m=lm(data=minmax_peak, diff~treatment)
summary(m)
library(car)
Anova(m)
aov(m)
summary(aov(m))
# post hoc
TukeyHSD(aov(m))
```

* F value == 20.596; df= 2,30
* overall treatment **p-value** == 1.86e-05 ***
* light-dark **p-value** == 0.7229111 **NS**
* photo-dark **p-value** == 0.0000054
* photo-light **p-value** == 0.0000467

> Interpretation: Here we see that treatment has an effect on the difference between the proportion hatched before and after darkness; specifically, the onset of darkness affected the proportion hatched in the photo treatment much more strongly than in the other treatments.

Check for normality

```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```

* shapiro test **p-value** == 0.05205 **normally distributed**, so stats are good!
### Visualization

```{r}
mpeak = minmax_peak %>%
  group_by(treatment) %>%
  summarise(mean_diff= mean(diff), SE= sd(diff)/sqrt(length(diff)), sample_size=n())
mpeak
```

```{r}
library(ggplot2)
library(scales) # for rescale_none and pretty_breaks
legend_title="Light \ntreatment" # make legend title here
g = ggplot(data=mpeak, aes(x= treatment, y= mean_diff)) +
  geom_point(aes(color= treatment), size= 3) +
  geom_errorbar(aes(ymin=mean_diff-SE, ymax=mean_diff+SE, color = treatment), width= .4, size= 1) +
  labs(y="Proportion hatched\nbetween 5.67-5.83 days", x="Light treatment") +
  scale_y_continuous(limits=c(0,0.85),oob = rescale_none, breaks= pretty_breaks(n=4)) +
  scale_x_discrete(limits= c("dark", "light", "photoperiod")) + # put the x axis categories in the order that I want
  scale_color_manual(legend_title,values = c("black", "darkorange", "darkgreen")) +
  geom_text(aes(y = c(0.23, 0.3, 0.8), label = c("a", "a", "b")), position = position_dodge(width = .8), color= "black", vjust = 0, size = 7) +
  theme_classic(base_size = 21) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position="none", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black"))
g
```

********************************

## **SYNCHRONY WITHIN clutches** Analysis

> Here, let's find the maximum proportion hatched between any two consecutive time points (i.e., the largest difference between two time points)...
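The idea of "largest jump between two consecutive time points" can be sketched in base R as follows. The values in `toy` are invented for illustration, and sampling every 2 hours is an assumption made for this example only.

```r
# Hypothetical cumulative proportion hatched for one clutch, sampled every 2 h
toy <- data.frame(
  time       = c(30, 32, 34, 36, 38, 40),
  percentage = c(0.00, 0.05, 0.10, 0.15, 0.70, 0.90)
)

# Change in proportion hatched between each pair of consecutive time points
jumps <- diff(toy$percentage)

# Index of the largest jump and the time window it spans
best <- which.max(jumps)
best_window <- c(from = toy$time[best], to = toy$time[best + 1])

max(jumps)
best_window
```

In this toy example the largest jump falls in the 36 to 38 window. The chunks that follow apply the same idea to the real data, using the window with the largest mean jump in each treatment.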
```{r}
synchrony = photopattern %>%
  group_by(treatment) %>%
  filter(time == 36 | time == 38)
synchrony

synch_diff = synchrony %>%
  group_by(treatment, clutch) %>%
  summarise(mean= mean(percentage), min=min(percentage), max=max(percentage)) %>%
  mutate(diff = max - min)
synch_diff

synch_sum = synch_diff %>%
  group_by(treatment) %>%
  summarise(mean_diff= mean(diff), SE= sd(diff)/sqrt(length(diff)), sample_size=n())
synch_sum
```

* dark: 40-->42 ---------- 0.29870130
* light: 30-->32 ---------- 0.140259740
* photo: 36-->38 ---------- 0.52987013

> Now let's extract just the data for those time windows

```{r}
synch = photopattern %>%
  group_by(treatment) %>%
  filter(time == 40 & treatment == "dark" | time == 42 & treatment == "dark" |
         time == 30 & treatment == "light" | time == 32 & treatment == "light" |
         time == 36 & treatment == "photoperiod" | time == 38 & treatment == "photoperiod")
synch

synch_diff = synch %>%
  group_by(treatment, clutch) %>%
  summarise(mean= mean(percentage), min=min(percentage), max=max(percentage)) %>%
  mutate(diff = max - min)
synch_diff

synch_sum = synch_diff %>%
  group_by(treatment) %>%
  summarise(mean_diff= mean(diff), SE= sd(diff)/sqrt(length(diff)), sample_size=n())
synch_sum
```

### Stats:

```{r}
m=lm(data=synch_diff, diff~treatment)
summary(m)
library(car)
Anova(m)
aov(m)
summary(aov(m))
# post hoc
TukeyHSD(aov(m))
```

* *F value* == 5.6252; df== 2,30
* overall treatment **p-value** == 0.008422 **
* light-dark **p-value** == 0.3762132 **NS**
* photo-dark **p-value** == 0.1349544 **NS**
* photo-light **p-value** == 0.0062725 **

> Interpretation: Here we see that treatment has an effect on the maximum proportion hatched in any given 2 hour time period. The photo treatment was more synchronous than the light treatment, but not more synchronous than the dark treatment.

Check for normality

```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```

* shapiro test **p-value** == 0.6777 **normally distributed**, so stats are good!
### Visualization

```{r}
library(ggplot2)
library(scales)
legend_title="Light \ntreatment" # make legend title here
g = ggplot(data=synch_sum, aes(x= treatment, y= mean_diff)) +
  geom_point(aes(color= treatment), size= 3) +
  geom_errorbar(aes(ymin=mean_diff-SE, ymax=mean_diff+SE, color = treatment), width= .4, size= 1) +
  labs(y="Max proportion hatched\nin any 2 hour period", x="Light treatment") +
  scale_y_continuous(limits=c(0,0.8),oob = rescale_none, breaks= pretty_breaks(n=4)) +
  scale_x_discrete(limits= c("dark", "light", "photoperiod")) + # put the x axis categories in the order that I want
  scale_color_manual(legend_title,values = c("black", "darkorange", "darkgreen")) +
  geom_text(aes(y = c(0.42, 0.21, 0.64), label = c("ab", "a", "b")), position = position_dodge(width = .8), color= "black", vjust = 0, size = 7) +
  theme_classic(base_size = 21) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position = "none", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black"))
g
```

********************************

## **SYNCHRONY ACROSS CLUTCHES** TRIAL

```{r}
library(dplyr)
df = photopattern %>%
  group_by(clutch, treatment) %>%
  mutate(twohour = percentage - lag(percentage, n= 2, default = 0), maximum=max(twohour)) %>% # new column with differences of lagged percentages by clutch for four hour periods
  filter(twohour == maximum) %>% # include only the maximum values
  arrange(clutch) %>%
  dplyr::select(clutch, treatment, age, percentage,twohour, maximum) %>%
  filter(age==min(age)) # keep the earliest age with max prop hatching in 4 hours
df
```

### Visualization

```{r}
library(ggplot2)
library(scales)
legend_title="Treatment" # make legend title here
g= ggplot(data=df, aes(x=age, fill=treatment)) +
  geom_histogram(position= "dodge")+
  labs(y="Number of clutches", x="Age (days)") +
  scale_y_continuous(limits=c(0,11), expand = c(0, 0), oob = rescale_none, breaks= pretty_breaks(n=5)) + # expand = c(0, 0) forces the axis to start at 0
  scale_x_continuous(limits=c(5,6.25), breaks= pretty_breaks(n=4)) +
  scale_fill_manual(legend_title,values = c("black", "darkorange", "darkgreen")) +
  theme_classic(base_size = 21) +
  theme(panel.background = element_rect(colour = "black", size=1), axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black")) # this creates a border around the whole panel
g
```

*******

### Stats: Kurtosis value for dist. of ages of maximum hatching in 4h period for each treatment

```{r}
library(e1071)
library(moments)
dark_synch= df %>% filter(treatment == "dark")
dark_synch
light_synch= df %>% filter(treatment == "light")
light_synch
photo_synch= df %>% filter(treatment == "photoperiod")
photo_synch
kurtosis(dark_synch$age)
kurtosis(light_synch$age)
kurtosis(photo_synch$age)
```

* dark == 3.003346
* light == 3.232073
* photoperiod == 9.1

### Levene Test for Homogeneity of Variance

```{r}
dark_light = df %>% filter(treatment == "light" | treatment == "dark")
dark_light
photo_dark = df %>% filter(treatment == "photoperiod" | treatment == "dark")
photo_dark
photo_light = df %>% filter(treatment == "photoperiod" | treatment == "light")
photo_light
# overall:
leveneTest(data=df, age~treatment)
# dark vs light
leveneTest(data=dark_light, age~treatment)
# photo vs dark
leveneTest(data=photo_dark, age~treatment)
# photo vs light
leveneTest(data=photo_light, age~treatment)
```

* *F value* == 4.6012; df== 2,30
* overall treatment **p-value** == 0.01808 *
* light-dark *F value* == 0.093; df== 1,20 **p-value** == 0.7635 **NS**
* photo-dark *F value* == 7.0203; df== 1,20 **p-value** == 0.01538 *
* photo-light *F value* == 14.76; df== 1,20 **p-value** == 0.001018 **

********************************

## **EARLY HATCHING in DARK treatment**

```{r}
head(photopattern)
early = photopattern %>%
  group_by(treatment) %>%
  filter(age == 5.75)
early
test1 = early %>%
  group_by(treatment) %>%
  summarise(mean= mean(percentage), SE= sd(percentage)/sqrt(length(percentage)), sample_size=n())
test1
```

Stats

```{r}
m=lm(data=early, percentage~treatment)
summary(m)
library(car)
Anova(m)
aov(m)
summary(aov(m))
# post hoc
TukeyHSD(aov(m))
```

* F value == 0.0019; df= 2,30
* overall treatment **p-value** == 0.9981 **NS**
* light-dark **p-value** == 0.9979927 **NS**
* photo-dark **p-value** == 0.9998360 **NS**
* photo-light **p-value** == 0.9989753 **NS**

> Interpretation: **proportion** hatched at 5.75 days old; same in all treatments.... dark does not elicit early hatching...

Check for normality

```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```

* shapiro test **p-value** == 0.01537 **NOT normally distributed**, so stats are NOT good....

```{r}
library(FSA)
kwt=kruskal.test(data=early, percentage~as.factor(treatment))
kwt
dunnTest(data=early, percentage~as.factor(treatment), method = "bh")
```

* X^2 == 0.096347
* overall treatment **p-value** == 0.953 **NS**
* light-dark **p-value** == 1.0000000 **NS**
* photo-dark **p-value** == 0.8766644 **NS**
* photo-light **p-value** == 1.0000000 **NS**

> Interpretation: the proportion hatched at 5.75 days old did not differ among treatments....
darkness does not induce early hatching

********************************

## **HATCHING after day 6 (LATE hatching in LIGHT treatment)**

```{r}
head(photopattern)
delay = photopattern %>%
  group_by(treatment) %>%
  filter(time == 42) %>%
  mutate(prop_hatched=1-percentage)
delay
test = delay %>%
  group_by(treatment) %>%
  summarise(mean= mean(prop_hatched), SE= sd(prop_hatched)/sqrt(length(prop_hatched)), sample_size=n())
test
```

Stats

```{r}
m=lm(data=delay, prop_hatched~treatment)
summary(m)
library(car)
Anova(m)
aov(m)
summary(aov(m))
# post hoc
TukeyHSD(aov(m))
```

* F value == 6.9731; df==2,30
* overall treatment **p-value** == 0.003258 **
* light-dark **p-value** == 0.0291367 *
* photo-dark **p-value** == 0.6589382 **NS**
* photo-light **p-value** == 0.0033054 **

> Interpretation: **proportion** hatched after 6 days old; more embryos hatched after 6 d in light than in other treatments

Check for normality

```{r}
plot(fitted(m), residuals(m))
hist(residuals(m))
qqnorm(residuals(m))
shapiro.test(m$residuals)
```

* shapiro test **p-value** == 0.001358 **NOT normally distributed**, so stats are NOT good....
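When the residuals fail the normality check, a rank-based test is the fallback used in this document. The sketch below shows that decision on simulated data using base R only. The data frame `toy`, the 0.05 cutoff, and the group size of 11 are assumptions for illustration and are not taken from the manuscript.

```r
# Simulated, skewed toy data (hypothetical, NOT the manuscript data)
set.seed(42)
toy <- data.frame(
  treatment = factor(rep(c("dark", "light", "photoperiod"), each = 11)),
  value     = c(rexp(11, rate = 5), rexp(11, rate = 2), rexp(11, rate = 5))
)

# Fit the linear model and test its residuals for normality
fit <- lm(value ~ treatment, data = toy)
sw  <- shapiro.test(residuals(fit))

# Fall back to Kruskal-Wallis when normality is rejected at the assumed 0.05 level
if (sw$p.value < 0.05) {
  res <- kruskal.test(value ~ treatment, data = toy)
} else {
  res <- anova(fit)
}
res
```

A Kruskal-Wallis test compares ranks rather than raw values, so it does not rely on normally distributed residuals. Pairwise follow-up comparisons then need a rank-based post hoc such as Dunn's test.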
```{r}
library(FSA)
kwt=kruskal.test(data=delay, prop_hatched~as.factor(treatment))
kwt
dunnTest(data=delay, prop_hatched~as.factor(treatment), method = "bh")
```

* X^2 == 11.476
* overall treatment **p-value** == 0.003222 **
* light-dark **p-value** == 0.027008223 *
* photo-dark **p-value** == 0.359018578 **NS**
* photo-light **p-value** == 0.003084079 **

> Interpretation: **number of embryos** hatched after 6 days old; more embryos hatched after 6 d in light than in other treatments

### Visualization

```{r}
legend_title="Light \ntreatment" # make legend title here
g = ggplot(data=test, aes(x= treatment, y= mean)) +
  geom_point(aes(color= treatment), size= 3) +
  geom_errorbar(aes(ymin=mean-SE, ymax=mean+SE, color = treatment), width= .4, size= 1) +
  labs(y="Proportion hatched\nafter age 6.0 days", x="Light treatment") +
  scale_y_continuous(limits=c(0,0.6),oob = rescale_none, breaks= pretty_breaks(n=4)) +
  scale_x_discrete(limits= c("dark", "light", "photoperiod")) + # put the x axis categories in the order that I want
  scale_color_manual(legend_title,values = c("black", "darkorange", "darkgreen")) +
  geom_text(aes(y = c(.17, .45, .07), label = c("a", "b", "a")), position = position_dodge(width = .8), color= "black", vjust = 0, size = 7) +
  theme_classic(base_size = 21) +
  theme(panel.background = element_rect(colour = "black", size=1), legend.position = "none", axis.text.x = element_text(color="black"), axis.text.y = element_text(color="black"))
g
```

# **Hatching Orientation Experiment Data Analysis**

## Data importing

First, we can load this straight from my computer as follows...

```{r}
#library(readxl)
#phototaxis <- read_excel("/Volumes/EggScience 1/Warkentin Lab/2015 Data/Hatching Phototaxis Data/Hatching Orientation Data.xlsx", sheet = "phototaxis")
#str(phototaxis) #structure of the data; column names, etc.
#head(phototaxis) #preview the data
```

Or, anyone can access this dataset from my personal Github repository as follows:

```{r}
library(curl)
f <- curl("https://raw.githubusercontent.com/bguell/Light-Cue-Manuscript-Final/master/Hatching%20Orientation%20Data.csv")
phototaxis <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(phototaxis)
```

*****

## 4d Hatching Complications Analysis

### Open water whole clutch vs natural surface area extremes 2.5% and 5%

```{r}
prop.test(x=182, n=205, p=.025)
```

* **p-value** < 2.2e-16

```{r}
prop.test(x=182, n=205, p=.050)
```

* **p-value** < 2.2e-16

> Question: Should this test be one-sided, to test whether the proportion is GREATER than .034? It doesn't change the significance since the proportions are so different; it is just a formality of the tests we are running... I'll show both below:

### Jelly whole clutch vs open water individual cup

Two Sided:

```{r}
prop.test(x=6,n=21,p=.034)
```

* **p-value** == 8.272e-09

One Sided:

```{r}
prop.test(x=6,n=21,p=.034, alternative = "greater")
```

* **p-value** == 4.136e-09

*****

## Phototaxis Analysis

> Note: Here, for all tests, we are testing the null hypothesis that hatching direction was random (i.e. 50% to each side)

Control experiment

```{r}
prop.test(9,20)
```

* **p-value** == 0.8231

Iteration 1: First set of trials:

```{r}
prop.test(30,45)
```

* **p-value** == 0.03689

Iteration 2: Second set of trials:

```{r}
prop.test(29,40)
```

* **p-value** == 0.00719

Comparison of two trials:

```{r}
# x = successes in each iteration, n = number of trials in each iteration
prop.test(x=c(30,29), n=c(45,40))
```

* **p-value** == 0.4343 (as reported from the original call `prop.test(x=c(29,40), n=c(30,45))`, which mixed successes and totals; the corrected call above is also non-significant)

Since the two sets of data are not significantly different from each other, I've pooled the data and run the test on the whole combined set...

Pooled analysis:

```{r}
prop.test(59,85)
```

* **p-value** == 0.0005187

*****

## GLMMs

> I created the following models using a GLMM where the "hatching direction" data is the response variable and the side of insertion is the predictor variable.
I also added clutch as a random effect in these models. The models are run using a **binomial family distribution** with a **logit link function**. Again we will obtain a p-value using LRTs.

```{r}
# some data adjustment so the model can run properly
phototaxis$Hatching.direction = as.factor(phototaxis$Hatching.direction)
phototaxis$test.number = as.character(phototaxis$test.number)

# the models:
p1=glmer(Hatching.direction ~ Position.Entered + (1|clutch), data=phototaxis, family="binomial")
summary(p1)

# make a second, simpler model without our variable of interest:
p2=glmer(Hatching.direction ~ (1|clutch), data=phototaxis, family="binomial")
summary(p2)
```

### Model comparison using **Likelihood Ratio Tests (LRT)**

#### GLM EFFECT OF INSERTION SIDE ON HATCHING DIRECTION

```{r}
anova(p2, p1)
```

* **p-value** == 0.4068 **NS**

Here we can clearly see that there was no significant effect of position entered on hatching direction.

Again, we can use the Anova() function to run Wald chi-square tests with a single call...

```{r}
library(car)
Anova(p1)
```

* **p-value** == 0.4159 **NS**

The Wald test gives a very similar result. Thus we will **report results from the LRT** in our manuscript.

*************************

## Visualization of Hatching Orientation Experiment

First a simple barplot

```{r}
library(dplyr) # allows for data manipulation
l= phototaxis %>%
  group_by(experiment, Hatching.direction) %>% # group hatching direction data by experiment and hatching direction
  summarise(count= n()) # sample size per experiment and hatching direction
l
```

```{r}
legend_title="Hatching \ndirection" # make legend title here
g = ggplot(data=l, aes(x= experiment, y= count, fill=Hatching.direction)) + # use the summarised counts
  geom_bar(stat="identity", position= "dodge", width = .8) + # dodged bars, filled by hatching direction
  labs(y="Number of embryos", x="Experiment") +
  scale_y_continuous(limits=c(0,62),oob = rescale_none, breaks= pretty_breaks(n=4)) +
  scale_fill_manual(legend_title,values = c("black", "gray")) +
  geom_text(aes(y = c(13, 11, 28, 61), label = c("a", "a", "b", "c")), position = position_dodge(width = .8), color= "black", vjust = 0, size = 8) +
  geom_text(aes(y= c(1, 1, 1, 1), label= l$count), position = position_dodge(width = .8), color= "white", vjust = 0, size = 4) +
  theme_gray(base_size = 22)
g
```