SUDAAN version 10 and higher offers analysis capabilities that include crosstabulation, frequency, ratio, and multiple regression modeling techniques. This can be the result of stratified sampling, cluster sampling, subpopulation oversampling, designed unequal probability sampling, and so on. This paper presents practical guidance on three common survey data analysis techniques. This example was run in SAS-callable SUDAAN, and the SAS program and. This paper presents how to conduct a general subpopulation analysis for jackknife replicate weight designs using the Health Information National. It is recommended that you use SAS to do as many of the problems as possible. Estimating Lorenz and concentration curves in Stata. Calculating subset weighted analysis using PROC SURVEYFREQ and. In some survey data analysis programs, such as SUDAAN, this information will be.
The subpopulation is defined as white (MRACRPI2=1) and at least 25 years old. The SAS-callable code for this example is presented in Exhibit 1. How to prepare and analyze pair data in the National. Estimating Lorenz and concentration curves in Stata, Ben Jann. This paper is forthcoming in the Stata Journal. Without the RLABEL statement, SAS variable labels would be produced if already defined. How can I analyze a subpopulation of my survey data in. Basic statistical and modeling procedures using SAS: one-sample tests. The statistical procedures illustrated in this handout use two datasets. Description: subpopulation estimation focuses on part of the population. Hypothesis testing for contrasts of subpopulation estimates. In SAS, the DOMAIN statement is used to identify records to include in analyses. This entry discusses subpopulation estimation and explains why you should use the subpop option instead of if and in for your survey data analysis.
Variances will be computed using the Taylor linearization method, assuming a with-replacement (WR) design and a sample weight. Comparing results from Cox proportional hazards models. Each analysis application emphasizes procedure selection, syntax generation, and interpretation of results. How to prepare and analyze pair data in the National Survey on Drug Use and Health. Substance Abuse and Mental Health Services Administration, Center for Behavioral Health Statistics and Quality. In statistics, a population is a set of similar items or events which is of interest for some question or experiment. The SUBPOPX statement is used to subset the data set to adults with arthritis, because these are the only. A guide to design, analysis, and discovery, chapter 15. A statistical population can be a group of existing objects, e.g. Cox's semiparametric model is widely used in the analysis of survival data to explain the effect of explanatory variables on survival times. The remainder of the paper consists of examples that demonstrate these procedures.
Subpopulation Treatment Effect Pattern Plot (STEPP) for. NCES 2014-364 will help you use subsetting commands in SAS and Stata to. However, the subpopulation analysis for jackknife repeated replication is not available in SAS 9. Using Stata for subpopulation analysis of complex sample. Analysis of complex survey data with SAS, Lex Jansen. These sample survey procedures use SAS syntax that will be familiar to those. Using Stata for subpopulation analysis of complex sample survey data: in this presentation, I provide an overview of important considerations that analysts of large public-use. We chose to take 200 patients in each subpopulation and 199 overlapping patients in tfd analysis. Several statistical packages, including Stata, SAS, SUDAAN. Subpopulation estimation from the technical point of view. As stated in the Stata survey manual, when the subpop option is used, the subpopulation is actually defined by the 0s (false), which indicate those cases to be excluded from the subpopulation.
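The coding convention described above can be illustrated with a short Python sketch. It is not Stata or SAS code: the function name `subpop_flags` is hypothetical, and the handling of missing values (treated here as excluded) is an assumption made for illustration. The point is only that 0 marks a case as outside the subpopulation while any other nonmissing value marks it as inside, and that no case is dropped from the data.

```python
import math

def subpop_flags(values):
    """Turn a raw indicator variable into True/False subpopulation flags.

    Assumed convention for this sketch: 0 means "excluded", any other
    nonmissing value means "included", and missing (None or NaN) is
    treated as excluded. The list keeps one flag per case, so the full
    sample is still available for variance estimation.
    """
    flags = []
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        flags.append((not missing) and v != 0)
    return flags
```

A call such as `subpop_flags([1, 0, None, 2.5])` keeps all four cases and flags only the first and last as subpopulation members.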
The formation of these subpopulations of interest might be unrelated to the sample design. Jackknife and balanced repeated replication methods for variance estimation, subpopulation analysis with use of the DOMAIN option and the subsetting approach, and the use of the. The paper analyzes a simulated data set and considers questions related to estimating the effect. The DOMAIN statement lists the variables that define domains for subpopulation analysis. The analysis of subpopulations is one place where survey data and.
The first, pulse, has information collected in a classroom setting, where students were asked to take their pulse two times. Number of observations for subpopulation estimation. In general, first a data file must be created using a DATA step. STEPP analysis results were suggestive of heterogeneous relative treatment effects, although this was not statistically significant p. An introduction to the SAS survey analysis PROCs, Lex Jansen. The survival time of each member of a population is assumed to follow its own hazard. To include a finite population correction (FPC) in Taylor series or bootstrap variance estimation, you can input either the sampling rate or the population total by using the RATE= or TOTAL= option, respectively, in the PROC SURVEYMEANS statement. Subdomain analyses with PROC GLIMMIX, Apr 2, 2019 1. July 16 Thurs lecture 89 West regression analysis of complex sample survey data.
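The finite population correction mentioned above can be sketched in Python as follows. This is an illustration of the arithmetic only, not SAS code: the function name `fpc_factor` and its arguments are hypothetical, and the rule that exactly one of the two inputs must be supplied is an assumption that mirrors the either/or wording of the text.

```python
def fpc_factor(n_sampled, rate=None, total=None):
    """Return the finite population correction factor (1 - f).

    f is the sampling fraction. It is either given directly as `rate`
    or derived from the population `total` as n_sampled / total.
    Exactly one of `rate` and `total` must be supplied (assumed rule).
    """
    if (rate is None) == (total is None):
        raise ValueError("specify exactly one of rate or total")
    f = rate if rate is not None else n_sampled / total
    if not 0.0 <= f <= 1.0:
        raise ValueError("sampling fraction must lie between 0 and 1")
    return 1.0 - f
```

A with-replacement variance estimate would be multiplied by this factor; when the sampling fraction is small the factor is close to 1 and the correction has little effect.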
SUDAAN version 10 and higher is designed to analyze data from complex surveys and experimental studies. A closer examination of subpopulation analysis of complex. Continuous: factor analysis, LISREL; discrete: FA, IRT (item response); discrete: latent profile, growth mixture, latent class analysis, regression; general software. Introduction to survey sampling and analysis procedures, book excerpt. In Stata, this command is subpop, while in SAS the command is DOMAIN.
Unequal probability of selection is an inevitable feature of complex sampling surveys. Before the introduction of the survey procedures, SAS could not be used for variance. Part III contains appendices dealing with more advanced features of SAS, such as matrix algebra. PROC SURVEYMEANS also performs poststratification, which adjusts the sampling weights so that their distribution matches known auxiliary information.
Applications focus on key SAS survey procedures including PROC SURVEYMEANS, PROC SURVEYFREQ, PROC SURVEYREG, PROC SURVEYLOGISTIC, and PROC SURVEYPHREG. Domain analysis takes this variability into account by using the entire sample to estimate the variance of domain estimates. UCLA further adds that using if in the subpop option does not remove cases from the analysis. We are assuming the SAS-callable SUDAAN program is installed on your computer. Most of the code shown in this seminar will work in earlier versions of SAS and SAS/STAT. Poststratification is often used to improve the efficiency of the analysis and adjust for. SAS manual for Introduction to the Practice of Statistics. If only one subpopulation analysis is desired, a SAS technique that assigns a minuscule weight to all nondomain cases in a DATA step prior to the analysis limits the analysis to a selected domain and eliminates unnecessary processing. Integrating the PDF over a range of survival times gives the probability of observing a survival time within that interval. Appendices A and B are based on more advanced material from references 1 and 2 in Appendix E. CDC produced a report, Software for Analysis of YRBS Data, describing the use of SAS, SUDAAN, Stata, SPSS, and Epi Info for analyzing data. The correct bibliographic citation for this manual is as follows. Keeping n1 close to n2 produces limited patient turnover from window to window, thus sufficiently smoothing the resulting curve.
The correct bibliographic citation for the complete manual is as follows. The CLUSTER statement specifies cluster identification variables in a clustered sample design. Remarks and examples: subpopulation estimation involves computing point and variance estimates for part of the. Analysis of survey data: although some survey data analysis procedures are available in SAS 9. The PHREG procedure performs regression analysis of survival data based on the Cox proportional hazards model.
Comparing results from Cox proportional hazards models using. For the subpopulation with the highest Ki-67 LI, the hazard ratio of a DFS event for patients in the letrozole group was less than half that for patients in the tamoxifen group. You cannot specify both of these options in the same PROC SURVEYMEANS statement. The SAS-callable SUDAAN code for this example is displayed in Exhibit 1. The SAS manual suggests that using a WHERE statement or creating a dataset with. Analysis of survey data using the SAS survey procedures. This document is an individual chapter from SAS/STAT 9. With complex survey data, the analyst explicitly acknowledges that the data were. SAS versions 8 and higher include special sample survey procedures that are appropriate for analyzing complex survey data. The purpose of this workshop is to explore some issues in the analysis of survey data using SAS 9.
For many organizations, the complexity and volume of their data has outgrown the capabilities of other statistical software. How to prepare and analyze pair data in the National Survey. The formulas for the estimates of a total and its variance for a simple sampling design with no clusters or stratification are shown in the entry for subpopulation estimation in the [SVY] Survey Data Reference Manual and are transcribed below. Introduction to SAS for Data Analysis, UNCG Quantitative Methodology Series. Composing a program: SAS requires that a complete module of code be executed in order to create and manipulate data files and perform data analysis. The WEIGHT statement names the sampling weight variable. To avoid confusion between the two uses, the term analysis class will replace the term analysis domain in. Software for Analysis of YRBS Data, Centers for Disease. The following SAS statements perform the analysis of the survey data. Subpopulation command, NCES, US Department of Education. Half the class was asked to run in place between the two readings and the other.
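For the simple design mentioned above (no clusters, no stratification, sampling treated as with replacement), the estimate of a subpopulation total and its variance can be sketched in Python as follows. The sketch assumes the usual form in which each case contributes z_j = d_j * w_j * y_j, with d_j equal to 1 inside the subpopulation and 0 outside, and in which the variance is n/(n-1) times the sum of squared deviations of z_j from its mean over all n sampled cases. The function names and the toy data are hypothetical.

```python
def subpop_total(y, w, in_subpop):
    """Estimate a subpopulation total and its variance from the full sample.

    Cases outside the subpopulation stay in the sample with a
    contribution of zero, so n is the full sample size.
    """
    n = len(y)
    z = [wi * yi if d else 0.0 for yi, wi, d in zip(y, w, in_subpop)]
    total = sum(z)
    zbar = total / n
    variance = n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return total, variance

def subset_total(y, w, in_subpop):
    """Same estimator applied after physically deleting nonmembers."""
    ys = [yi for yi, d in zip(y, in_subpop) if d]
    ws = [wi for wi, d in zip(w, in_subpop) if d]
    return subpop_total(ys, ws, [True] * len(ys))

# Toy data: four sampled cases, the first two belong to the subpopulation.
y = [2, 4, 6, 8]
w = [10, 10, 10, 10]
member = [True, True, False, False]
full_total, full_var = subpop_total(y, w, member)
cut_total, cut_var = subset_total(y, w, member)
```

Both approaches return the same point estimate, but the variance differs because deleting cases changes both n and the deviations that enter the sum.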
SAS/STAT software fact sheet. Organizations in every field depend on data and analysis to provide new insights, gain competitive advantage and make informed decisions. Multilevel modeling with complex survey data, analysis. The MIXTRAN, DISTRIB, and INDIVINT SAS macros are provided as separate programs. Nonzero values are included in the analysis, except for missing values, which are excluded from the analysis. [Figure: group sequential design with interim analyses; labels: randomization, interim 1, interim 2, final, chemotherapy, new drug.] Subpopulation, definition of subpopulation. See Chapter 85, The SURVEYMEANS Procedure, for more information. The SURVEYMEANS procedure can also apply poststratification. SAS-callable SUDAAN code for data manipulation and crosstab. These programs can be saved in a SAS macro library, or called using the %include statement in a SAS program. Subpopulation command, National Center for Education. This scenario provides relatively precise estimates of treatment effect and a large number of windows.
A part or subdivision of a population, especially one originating from some other population. Three-level analysis where time is the first level, individual is the second level, and cluster is the. Analysis for domains that uses the entire sample is called domain analysis (also called subgroup analysis, subpopulation analysis, or subdomain analysis). See Chapter 96, The SURVEYMEANS Procedure, for more information.
The DOMAIN statement is available only for the Taylor series method. The RATE= and TOTAL= options apply only to Taylor series or. SAS manual for Introduction to the Practice of Statistics, third. How can I analyze a subpopulation of my survey data in Stata. Thus, when data on an analysis variable are missing for all respondents in one. To find out what version of SAS and SAS/STAT you are running, open SAS and look at the information in the log file. In this article, we aim to focus in more detail on the practical issues underlying subpopulation analysis of complex-sample survey data.
The RATIO statement requests ratio analysis for means or proportions of analysis variables. Commonly used techniques such as subpopulation analyses and hypothesis tests are included. We use the SUBPOPX statement to subset the analysis to the subpopulation of interest. When the subpopulation option is used, only the cases defined by the subpopulation are used in the calculation of the estimate, but all cases are used in the calculation of the standard errors. Longitudinal models are two-level models in conventional multilevel programs, whereas they are single-level models in Mplus. The 2x2 table to be used for estimation of odds ratios and relative risks is defined by the last two variables. SAS does not have a statement that allows a subpopulation e. In January 2009, CDC produced a report, Software for. Group sequential trial with a biomarker subpopulation. The analysis of domains, sometimes called subpopulations, is discussed. Basic statistical and modeling procedures using SAS. For more information about domain analysis, see Lohr, Cochran, Fuller et al.
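The statement above, that the estimate uses only subpopulation cases while the standard error uses all cases, can be illustrated with a Python sketch of a Taylor-linearized domain mean in a stratified, clustered, with-replacement design. This is a simplified illustration and not the code of any package: the function name, the row layout (stratum, PSU, weight, y, in_domain), the toy data, and the choice to skip strata left with fewer than two PSUs are all assumptions.

```python
from collections import defaultdict

def domain_mean_se(rows, subset_first=False):
    """Weighted domain mean and linearized standard error.

    rows: list of (stratum, psu, weight, y, in_domain) tuples.
    subset_first=True mimics deleting nondomain cases before analysis.
    """
    if subset_first:
        rows = [r for r in rows if r[4]]
    domain = [r for r in rows if r[4]]
    wsum = sum(r[2] for r in domain)
    mean = sum(r[2] * r[3] for r in domain) / wsum

    # PSU totals of the linearized values; a PSU with no domain cases
    # stays in its stratum with a total of zero.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y, in_domain in rows:
        psu_totals[stratum][psu] += w * (y - mean) / wsum if in_domain else 0.0

    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            continue  # assumption: single-PSU strata contribute nothing here
        zbar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psus.values())
    return mean, variance ** 0.5

# Toy data: one stratum, three PSUs; PSU 3 has no domain members.
rows = [
    ("A", 1, 10, 2, True), ("A", 1, 10, 4, True),
    ("A", 2, 10, 6, True), ("A", 2, 10, 5, False),
    ("A", 3, 10, 8, False), ("A", 3, 10, 7, False),
]
mean_full, se_full = domain_mean_se(rows)
mean_cut, se_cut = domain_mean_se(rows, subset_first=True)
```

In this toy data the point estimate is identical either way, while the standard error changes because subsetting removes PSU 3 and so alters the number of PSUs the variance formula sees.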