
 Alternate forms reliability is a measure of reliability between two different forms of the same test. Two equivalent (but different) tests are administered, scores are correlated, and a reliability coefficient is calculated. A test would be deemed reliable if differences in one test’s observed scores correlate with differences in an equivalent test’s scores.

Parallel forms are very similar, but with one major difference: the observed scores have the same mean and variance. This isn’t a requirement for alternate forms reliability, which just uses different versions of the same test. That said, you can only interpret correlation between tests in a meaningful way if the alternate forms are also parallel. Proving that two tests are parallel is practically impossible (Furr & Bacharach, 2008); although interpreting correlations is theoretically possible, it isn’t usually a feasible “real life” option. In addition, although two tests might seem equivalent, a different question here and there might result in the tests measuring completely different constructs.

As noted above, it’s extremely challenging to interpret reliability estimates when you can’t prove that the forms are parallel. However, you can take several steps to ensure that your reliability estimate is as good as possible:

 Practice and transfer effects can be eliminated if half the subjects take test A followed by test B, and half the subjects take test B followed by test A. Note that although this seems a little strange (what’s the point in subjects taking two different tests instead of one?), remember that you’re assessing reliability here, not subject performance. Once you’ve determined that the tests are reliable, you can administer test A or test B to a subject, with the knowledge that the two tests are equivalent in every way.
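As a concrete (if simplified) example, here’s a minimal Python sketch of the correlation step. The scores are invented for illustration, and a real study would use many more subjects:

```python
# A minimal sketch of estimating alternate forms reliability.
# The scores below are invented; in practice you'd use each
# subject's observed scores on form A and form B.
import numpy as np
from scipy.stats import pearsonr

# Observed scores for ten subjects on two forms of the same test.
# To control for practice and transfer effects, subjects 0-4 took
# A then B, and subjects 5-9 took B then A (counterbalanced order).
form_a = np.array([82, 75, 91, 68, 88, 79, 95, 71, 84, 77])
form_b = np.array([80, 78, 89, 70, 85, 81, 93, 69, 86, 75])

r, p = pearsonr(form_a, form_b)
print(f"Alternate forms reliability coefficient: r = {r:.3f}")
```

A high correlation suggests the two forms rank subjects consistently; it doesn’t by itself prove the forms are parallel.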

If you’re working on data analysis, there are many tools available to provide insights into your data, including ANOVA and regression analysis. At first glance, the two methods may look similar. So similar, in fact, that you wouldn’t be the first to completely confuse the two.

Both result in continuous output (Y) variables, and both can take either continuous or categorical variables as (X) inputs. If you use exactly the same structure for both tests (see the demonstration of dummy coding here for an example), they are effectively the same; in fact, ANOVA is a “special case” of multiple regression.

The preferred inputs for ANOVA are categorical variables; you can think of ANOVA as a regression with categorical predictors (Pruim, n.d.). However, you can choose to use continuous variables. The opposite is true for regression: continuous variables are preferred, with categorical variables as a second option. The reason categorical variables are a second option in regression analysis is that you can’t just plug categorical data into your regression model; you have to code dummy variables first. Dummy coding is where you give your categorical variables a numeric value, like “1” for black and “0” for white.
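To make the “special case” claim concrete, here is a minimal sketch (with invented data) showing that a one-way ANOVA and a regression on dummy-coded group indicators produce the same F statistic:

```python
# Sketch: one-way ANOVA vs. regression on dummy-coded groups.
# The data are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(10, 2, 30)   # group 1 scores
g2 = rng.normal(12, 2, 30)   # group 2 scores
g3 = rng.normal(11, 2, 30)   # group 3 scores

# One-way ANOVA
f_anova, p_anova = stats.f_oneway(g1, g2, g3)

# Same test as a regression: dummy-code the groups
# (group 1 is the reference category).
y = np.concatenate([g1, g2, g3])
d2 = np.r_[np.zeros(30), np.ones(30), np.zeros(30)]   # 1 if group 2
d3 = np.r_[np.zeros(30), np.zeros(30), np.ones(30)]   # 1 if group 3
X = np.column_stack([np.ones_like(y), d2, d3])

beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss[0] / tss
# Overall regression F test: (R^2/k) / ((1-R^2)/(n-k-1)), k = 2 dummies
f_reg = (r2 / 2) / ((1 - r2) / (len(y) - 3))

print(f"ANOVA F = {f_anova:.4f}, regression F = {f_reg:.4f}")  # identical
```

The two F values match exactly because the dummy-coded regression and the one-way ANOVA fit the same underlying model.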

A bounded region has either a boundary or some set of constraints placed upon it. In other words, a bounded shape cannot have an infinitely large area; it’s defined by a set of measurements or parameters. A square, drawn on a Cartesian plane, has a natural boundary (four sides). Other shapes and surfaces can be more challenging to visualize. For example, the surface area of a cylinder has constraints of length, height and circumference.

In general, look for regular shapes (triangles, rectangles, squares), or as close to regular shapes as you can get (like the “curvy” triangle A). They should have bases that follow a line parallel to the x-axis; these shapes are easier to integrate. Caution: sometimes it’s actually easier to divide the shape up horizontally, instead of using the vertical slice shown above. Refer to this pdf (from MIT) for an example of when you would want to slice horizontally.

The left-hand bound is easy to see from the graph (x = 0). The right-hand bound is x = 4; you can find it with the intersection feature of a graphing calculator or with algebra (see: How to find the intersection of two lines).

When you integrate √(x) + 1 along the x-axis, you’ll get the entire area on the left. But you need to find the area A on the right; in order to do that, you also have to integrate the function y = 1 and then subtract the two areas.

Solving the integral (using the power rule, along with the fact that the integral of a constant function c is cx; for example, the integral of f(x) = 10 is 10x), we get an area of 16/3.
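You can check that arithmetic with a computer algebra system; here’s a quick sympy sketch of the same subtraction:

```python
# Verify area A: the region between y = sqrt(x) + 1 and y = 1
# from x = 0 to x = 4.
from sympy import symbols, sqrt, integrate, Rational

x = symbols('x')
area_a = integrate((sqrt(x) + 1) - 1, (x, 0, 4))
print(area_a)                      # 16/3
print(area_a == Rational(16, 3))   # True
```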

 Step 5: Repeat steps 3 and 4 for the remaining shapes. For this example we only have one remaining shape (with integral bounds of 4 to 6). Integrating area B, we get 2.

Ascertainment bias happens when the results of your study are skewed due to factors you didn’t account for, like a researcher’s knowledge of which patients are getting which treatments in clinical trials, or poor data collection methods that lead to non-representative samples.

Ascertainment bias in clinical trials happens when one or more people involved in the trial know which treatment each participant is getting. This can result in patients receiving different treatments or co-treatments, which will distort the results from the trial. A patient who knows they are receiving a placebo might be less likely to report perceived benefits (the “placebo effect”).

 The effect isn’t limited to the person giving the treatment and the person receiving it: even the person writing up the results of the trial can introduce ascertainment bias if they know which people are getting which treatments. The best way to prevent this from happening is by using blinding and allocation concealment.

 Ascertainment bias can happen in experiments during data collection; it is a failure to collect a representative sample, which skews the results of your studies. For example, the sex ratio[1] for the entire world population is approximately 101 males to 100 females. Let’s say you wanted to recalculate this figure by taking a sample of 1,000 women at your women-only college and asking them how many male and female children are in their family. The result of this survey will show a heavy bias towards women, because of the simple fact that all the women have at least one female (themselves) in their family. The survey excludes any family where there are only male children. Although this is an extreme example, having uneven numbers (i.e. 400 women and 600 men) will still introduce bias into your results.
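To see the mechanics of that bias, here’s a toy simulation in Python. Fixing family size at two children and using the 101:100 birth ratio are simplifying assumptions for illustration:

```python
# Toy simulation of the women-only college example.
import numpy as np

rng = np.random.default_rng(42)
p_male = 101 / 201  # roughly 101 males per 100 females at birth

# Simulate many two-child families: True = male, False = female.
families = rng.random((100_000, 2)) < p_male

# Unbiased estimate: count children across ALL families.
all_children = families.ravel()
males = all_children.sum()
print("All families, males per 100 females:",
      round(100 * males / (len(all_children) - males), 1))

# Biased sample: only families with at least one daughter can be
# surveyed at a women-only college; all-male families are excluded.
sampled = families[~families.all(axis=1)].ravel()
males = sampled.sum()
print("Daughter-containing families, males per 100 females:",
      round(100 * males / (len(sampled) - males), 1))
```

The second estimate comes out well below 101 because every all-male family silently drops out of the sample.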

 The assumption of independence is used for T Tests, in ANOVA tests, and in several other statistical tests. It’s essential to getting results from your sample that reflect what you would find in a population. Even the smallest dependence in your data can turn into heavily biased results (which may be undetectable) if you violate this assumption.

 A dependence is a connection between your data. For example, how much you earn depends upon how many hours you work. Independence means there isn’t a connection. For example, how much you earn isn’t connected to what you ate for breakfast. The assumption of independence means that your data isn’t connected in any way (at least, in ways that you haven’t accounted for in your model).


The observations between groups should be independent, which basically means the groups are made up of different people. You don’t want the same person appearing in two different groups, as it could skew your results.

The observations within each group must be independent. If two or more data points in one group are connected in some way, this could also skew your data. For example, let’s say you were taking a snapshot of how many donuts people ate, and you took snapshots every morning at 9, 10, and 11 a.m. You might conclude that office workers eat 25% of their daily calories from donuts. However, you made the mistake of timing the snapshots too closely together in the morning, when people were more likely to bring bags of donuts in to share (making the observations dependent). If you had taken your measurements at 7 a.m., noon and 4 p.m., this would probably have made your measurements independent.

 Unfortunately, looking at your data and trying to see if you have independence or not is usually difficult or impossible. The key to avoiding violating the assumption of independence is to make sure your data is independent while you are collecting it. If you aren’t an expert in your field, this can be challenging. However, you may want to look at previous research in your area and see how the data was collected.
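As a rough illustration of how hidden dependence biases results, the sketch below (all numbers invented) simulates two groups with no true difference in means, but with hidden clusters of connected observations, and counts how often a standard t test falsely detects a difference:

```python
# Simulation: clustered (dependent) observations inflate the
# false positive rate of a t test. All numbers are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
false_positives = 0
trials = 2000

for _ in range(trials):
    # Two groups with NO true difference, but each group is built
    # from 5 clusters of 6 people who share a cluster effect
    # (e.g., coworkers who share donuts).
    cluster_a = np.repeat(rng.normal(0, 1, 5), 6)
    cluster_b = np.repeat(rng.normal(0, 1, 5), 6)
    a = cluster_a + rng.normal(0, 1, 30)
    b = cluster_b + rng.normal(0, 1, 30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

# With truly independent data this should be close to 5%;
# the clustering pushes it far higher.
print(f"False positive rate: {false_positives / trials:.1%}")
```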

An autoregressive (AR) model predicts future behavior based on past behavior. It’s used for forecasting when there is some correlation between values in a time series and the values that precede and succeed them. You only use past data to model the behavior, hence the name autoregressive (the Greek prefix auto- means “self”). The process is basically a linear regression of the data in the current series against one or more past values in the same series.

In an AR model, the value of the outcome variable (Y) at some point t in time is directly related to the predictor variable (X), just as in “regular” linear regression. Where simple linear regression and AR models differ is that Y is dependent on X and on previous values of Y.

The AR process is an example of a stochastic process, which has a degree of uncertainty or randomness built in. The randomness means that you might be able to predict future trends pretty well with past data, but you’re never going to get 100 percent accuracy. Usually, the process gets “close enough” for it to be useful in most scenarios.

 An AR(p) model is an autoregressive model where specific lagged values of yt are used as predictor variables. Lags are where results from one time period affect following periods.

 The value for “p” is called the order. For example, an AR(1) would be a “first order autoregressive process.” The outcome variable in a first order AR process at some point in time t is related only to time periods that are one period apart (i.e. the value of the variable at t – 1). A second or third order AR process would be related to data two or three periods apart.
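Here’s a minimal AR(1) sketch in Python (the coefficient values are arbitrary choices). It simulates yt = c + φ·yt−1 + noise, then recovers φ by regressing the series on its own lagged values, which is exactly the “regression against past values of the same series” described above:

```python
# Minimal AR(1) example: y_t = c + phi * y_(t-1) + noise.
import numpy as np

rng = np.random.default_rng(7)
phi, c, n = 0.7, 1.0, 500  # arbitrary true parameters

y = np.zeros(n)
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + rng.normal(0, 1)

# "Auto"-regression: regress y_t on y_(t-1), one period apart.
X = np.column_stack([np.ones(n - 1), y[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(f"true phi = {phi}, estimated phi = {phi_hat:.3f}")

# One-step-ahead forecast from the fitted model
forecast = c_hat + phi_hat * y[-1]
print(f"forecast for the next period: {forecast:.3f}")
```

An AR(2) model would simply add a second lagged column, y two periods back, to the regression.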

 An axis of rotation (also called an axis of revolution) is a line around which an object rotates. In calculus and physics, that line is usually imaginary. The radius of rotation is the length from the axis of rotation to the outer edge of the object being rotated.

A simple example is an axle or hinge, which allows rotation but not translation (movement). The following image shows a two-dimensional shape (a half bell) rotating around a single, vertical axis of rotation. If the shape travels 360 degrees, the result is a three-dimensional bell.

The disc method and the washer method are used to find the volume of objects of revolution in calculus. The disc method is used for solid objects, while the washer method is a modified disc method for objects with holes.
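As a quick sketch of both methods (the curve y = √x and the bounds are our own choices for illustration), sympy can evaluate the volume integral V = π∫y² dx directly:

```python
# Disc method: rotate y = sqrt(x) around the x-axis from x = 0 to 4.
# Each thin "disc" contributes volume pi * y^2 dx.
from sympy import symbols, sqrt, pi, integrate

x = symbols('x')
disc = integrate(pi * (sqrt(x)) ** 2, (x, 0, 4))
print(disc)    # 8*pi

# Washer method: same solid, but with a cylindrical hole of radius 1
# along the axis; subtract pi * (inner radius)^2 from each slice.
# The bounds start at x = 1, where sqrt(x) first reaches the hole radius.
washer = integrate(pi * ((sqrt(x)) ** 2 - 1 ** 2), (x, 1, 4))
print(washer)  # 9*pi/2
```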

 Basis functions (called derived features in machine learning) are building blocks for creating more complex functions. In other words, they are a set of k standard functions, combined to estimate another function—one which is difficult or impossible to model exactly.

For example, individual powers of x (the basis functions 1, x, x², x³, …) can be strung together to form a polynomial function. The set of basis functions used to create the more complex function is called a basis set.

It’s possible to create many complex functions by hand; ideally, you’ll want to work with as few basis functions as possible. However, many real-life scenarios involve thousands of basis functions, necessitating the use of a computer.

 B-Spline basis: a set of k polynomial functions, each of a specified order d. An order is the number of constants required to define the function (Ramsay and Silverman, 2005; Ramsay et al., 2009). Popular for non-periodic data.

Fourier basis: a set of sine and cosine functions: 1, sin(ωx), cos(ωx), sin(2ωx), cos(2ωx), sin(3ωx), cos(3ωx), …. These are often used to form periodic functions. Derivatives for these functions are easy to calculate, but they aren’t suitable for modeling discontinuous functions (Svishcheva et al., 2015).
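As a small illustration of the basis-function idea (the target function cos(x) and the four-term polynomial basis are arbitrary choices), here’s a least-squares fit in Python:

```python
# Approximate cos(x) with the polynomial basis set {1, x, x^2, x^3}
# by ordinary least squares.
import numpy as np

x = np.linspace(0, np.pi, 100)
target = np.cos(x)  # the function we want to approximate

# Design matrix: one column per basis function
basis_set = np.column_stack([x**0, x**1, x**2, x**3])

coeffs, *_ = np.linalg.lstsq(basis_set, target, rcond=None)
approx = basis_set @ coeffs

print("coefficients:", np.round(coeffs, 3))
print("max absolute error:", np.abs(approx - target).max())
```

Swapping the columns of the design matrix for sines and cosines would turn this into a Fourier basis fit; the fitting machinery stays the same.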
