目的：生成某特定随机变量的随机数

简介

XR1 <- rnorm(100000)
hist(XR1)

XR2 <- runif(100000)
hist(XR2)

R 中生成随机数的函数：

rnorm: generate random Normal variates with a given mean and standard deviation
dnorm: evaluate the Normal probability density (with a given mean/SD) at a point (or vector of points)
pnorm: evaluate the cumulative distribution function for a Normal distribution
rpois: generate random Poisson variates with a given rate

Probability distribution functions usually have four functions associated with them. The functions are prefixed with a

d for density
r for random number generation
p for cumulative distribution
q for quantile function

 Nsim=10^4 #number of random numbers
 x=runif(Nsim)
 x1=x[-Nsim] #vectors to plot
 x2=x[-1] #adjacent pairs
 par(mfrow=c(1,3))
 hist(x)
 plot(x1,x2)
 acf(x)

随机数的生成都依赖均匀分布的随机数，但均匀分布的随机数所有的算法都是生成伪随机数，只要生产函数的种子确定了，随机数的序列就确定了。

set.seed(1)
runif(5)

## [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819

set.seed(1)
runif(5)

## [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819

set.seed(2)
runif(5)

## [1] 0.1848823 0.7023740 0.5733263 0.1680519 0.9438393

反函数法

下面假定 \(F(x)\) 严格单调：

原理

随机变量 \(X\sim F(x)\)，则 \(F(X)\) 服从均匀分布 \(U(0,1)\)，也就是: \(F^{-1}(U)\sim F(x)\)。

证明：设\(X\sim F^{-1}(U)\)，其分布函数为： \[\begin{align*} F_X (x)&=P\{X\le x\}\\ &=P\{F^{-1}(U)\le x\}\\ &=P\{F(F^{-1}(U)) \le F(x)\}\\ &=P\{U\le F(x) \}\\ &=F(x) \end{align*}\]

例1

假定 \(X\sim Exp(1)\)， \(F(x)=1-e^{-x}\)，反函数为\(x=-log(1-u)\)，所以：假定 \(U\sim U(0,1)\) 则：\(X=-logU\sim Exp(1)\)

set.seed(30)
Nsim=10^6
U=runif(Nsim)
X=-log(U) #transforms of uniforms
Y=rexp(Nsim) #exponentials from R
par(mfrow=c(1,2)) #plots
hist(X,freq=F,breaks=18,main="Exp from Uniform")
hist(Y,freq=F,breaks=18,main="Exp from R")

练习：

假定 \(X_i\sim Exp(1)\) 且相互独立，求下列分布： \[\begin{align*} Y=&\sum_{j=1}^n X_j\\ Y=&b\sum_{j=1}^a X_j\\ Y=&\frac{\sum_{j=1}^a X_j}{\sum_{j=1}^{a+b} X_j}\\ \end{align*}\]

正态分布的生成

Box-Muller 算法：

假定： \(U_1,U_2\sim U(0,1)\) 且相互独立，则：

\[\begin{align*} X_1&=\sqrt{-2log(U_1)} cos (2\pi U_2)\\ X_2&=\sqrt{-2log(U_1)} sin (2\pi U_2) \end{align*}\] 相互独立且 \(\sim N(0,1)\) 。

筛法

假设我们已经有了一种生成密度函数为\(g(x)\) 的随机变量的方法，现在利用这种方法生成密度函数为 \(f(x)\) 的随机变量。

筛法步骤

step 1: Generate \(Y\) having density \(g\).
step 2: Generate a random number \(U\).
step 3: If \(U\le \frac{f(Y)}{Mg(Y)}\), set \(X = Y\) . Otherwise, return to Step

筛法原理

我们先生成来自 \(g(x)\) 的随机变量 \(Y\)，然后以正比于 \(F(Y)\over g(Y)\) 的概率接收此值。

设： \[\frac{{f(y)}}{{g(y)}} \le M,\quad \forall y\]

证明： sim2

例2：

使用参数为1的指数分布，利用筛法求下列密度函数（半对数分布）的随机数：已知，参数为 \(\lambda\) 的指数分布的密度为：可得：对于所有的 \(x\ge 0\) : 此处输入图片的描述上述函数在 \(\lambda=1\) 时取最小值：此时： \[M= \sqrt{2\over{\pi }}exp({1\over 2})\]

设计R 程序实现上述过程：

set.seed(20)

f <- function(x) {
  return((x>0) * 2 * dnorm(x,0,1))
}
g <- function(x) { return(dexp(x,1)) }
c <- sqrt(2 * exp(1) / pi)

rhalfnormal <- function(n) {
  res <- numeric(length=n)
  i <- 0
  while (i<n) {
    U <- runif(1, 0, 1)
    X <- rexp(1, 1)
    if (c * g(X) * U <= f(X)) {
      i <- i+1
      res[i] <- X;
    }
  }
  return(res)
}

X <- rhalfnormal(10000)
hist(X, breaks=50, prob=TRUE, ylim=c(0,1),
     main=NULL, col="gray80", border="gray20")
curve(f, min(X), max(X), n=500,
      ylim=c(0,1), ylab="f", add=TRUE)

混合分布

\[ X = \left\{ \begin{array}{ll} X_1 & \alpha\\ X_2 & 1-\alpha \end{array} \right. \]

某个随机变量以 \(\alpha\) 的概率来自随机变量 \(X_1\)，而以 \(1-\alpha\) 的概率来自随机变量 \(X_2\)

算法：

产生随机变量\(X_1\)
产生随机变量\(X_2\)
产生均匀分布随机数 \(U\): 如果 \(U\le \alpha\)，则令 \(X=X_1\)，否则令 \(X=X_2\).

课堂作业

使用system.time比较生成正态随机数的效率：

中心极限定理
Box-Muller方法
使用rnorm生成

system.time(rnorm(1000000))

##    user  system elapsed 
##    0.07    0.00    0.08

结果：

中心极限定理：

system.time(f1(nsim))

##    user  system elapsed 
##    3.79    0.00    3.79

Box-Mulur:

system.time(f2(nsim))

##    user  system elapsed 
##    0.17    0.00    0.17

rnorm:

system.time(rnorm(nsim))

##    user  system elapsed 
##    0.07    0.01    0.08

par(mfrow=c(1,3))
hist(f1(nsim))
hist(f2(nsim))
hist(rnorm(nsim))

返回课程主页。

随机变量的生成

王宁宁

简介