*------------------------------------------------------------------------------------
 BIOS 524: Biostatistical Computing
 Example of selecting a stratified random sample using SURVEYSELECT.
*-----------------------------------------------------------------------------------*;

libname classlib "c:\bios524\classlib";
options nodate pageno=1;

*** Get all data from the class Health Survey, sort it by income level. ***;
proc sort data=classlib.hlthsrvy out=hlthsrvy;
  by income;
run;

*** Create the Physical Component Score (PCS) value for each subject. ***;
data hlthsrvy;
  set hlthsrvy;
  PCS=0.1825*PF+0.1039*RP+0.1348*BP+0.1237*GH+.0138*VT
      -.0034*SF-.0582*RE-.1225*MH+20.1360;
run;

*** What is the mean PCS for each income level? ***;
proc means data=hlthsrvy missing;
  class income;
  var PCS;
run;

*** What is the overall PCS mean? ***;
proc means data=hlthsrvy;
  var PCS;
run;

*** Select a stratified random sample, using income-levels as strata ***;
proc surveyselect data=hlthsrvy out=srvysamp  /* Specify input and output data sets */
  /**/ samprate=100 nmax=10 /**/              /* Select all subjects in each stratum, up to 10 */
  /* samprate=0.25 /**/;                      /* Or, use a sampling rate of 25% */
  strata income;  ** Specify stratification variable;
  id PCS;         ** Keep this variable in the output data set;
run;

*** How many subjects did we select from each stratum? ***;
proc freq data=srvysamp;
  tables income/missing;
run;

*** What are the mean and standard error of PCS?
    What are the 95% confidence limits for the mean? ***;
proc means data=srvysamp n mean stderr clm;
  var PCS;  ** Are the results biased?;
run;

*** Use sampling weights to make the sample better represent the population ***;
proc means data=srvysamp n mean stderr clm;
  var PCS;                ** Yields an unbiased mean estimate.
                             What about the STDERR?;
  weight samplingweight;  ** Adjusts the mean to better represent the population;
run;

*** Use V8 SAS Survey Means to estimate the mean and provide proper standard errors;
proc surveymeans data=srvysamp missing  /* Missing, includes missing values for Strata */
  rate=srvysamp(rename=(selectionprob=_rate_));  ** Use selection rates to adjust the STDERR;
  strata income;
  var PCS;
  weight samplingweight;
run;
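
/* A minimal sketch, not part of the original example: the weighted mean that
   the WEIGHT statement reports can be checked by hand as
   sum(weight*PCS) / sum(weight). This assumes srvysamp still contains the
   SamplingWeight variable written by PROC SURVEYSELECT above. The column
   name wtd_mean_PCS is made up for illustration. */
proc sql;
  select sum(samplingweight*PCS)/sum(samplingweight) as wtd_mean_PCS
    from srvysamp
    where PCS is not missing;  /* drop missing PCS from both sums, as PROC MEANS does */
quit;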