Message-ID: <165655F6-AE63-4519-9534-E19D77DA588B@comcast.net>
Date: 2011-12-27T03:38:43Z
From: David Winsemius
Subject: Summary tables of large datasets including character and numerical variables
In-Reply-To: <1324896293510-4234296.post@n4.nabble.com>

On Dec 26, 2011, at 5:44 AM, sparandekar wrote:

> Hello!
>
> I am attempting to switch from being a long-time SAS user to R, and would
> really appreciate a bit of help! The first thing I do on getting a large
> dataset (thousands of observations and hundreds of variables) is to run the
> SAS command PROC CONTENTS VARNUM, which gives me a table with the name of
> each variable, its type and length; then I run PROC MEANS, which for
> numerical variables gives me a table with the number of non-missing
> values, min, max, mean and std. dev. My data usually has errors, and this
> first step helps me to spot the errors and 'clean' the dataset.
>
> The 'summary' function in R and other functions in the Hmisc or psych
> packages do not work for me.
>
> How can I get a table from an R data.frame that has the following structure
> (header row and example)?
>
> Rowname  Character/Integer  Length  Non-Missing  Minimum       Maximum       Mean           SD
>
> HHID     Integer            12      32,344       114455007701  514756007812  2.345 x 10^10  1.456 x 10^10
> Head     Character          38      24,566       -             -             -              -

I generally use (in order of increasing information content and
increasing length of output):

names(dfrm)

str(dfrm)

Hmisc::describe(dfrm)

(Several other packages have their own versions of 'describe'.)
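
If a single table laid out like the one in the question is wanted, something along these lines might serve as a starting point. It is only a rough sketch in base R: `contents_table` is a made-up name, not a function from any package, and "Length" is assumed here to mean the widest printed value in characters, which is not the same thing as SAS storage length.

```r
# Hypothetical helper (not part of base R or Hmisc): one row per column of dfrm.
contents_table <- function(dfrm) {
  is_num <- vapply(dfrm, is.numeric, logical(1))
  # Apply f to numeric columns that have at least one non-missing value;
  # return NA for character/factor columns and for all-missing columns.
  num_stat <- function(f) {
    vapply(names(dfrm), function(nm) {
      x <- dfrm[[nm]]
      if (is_num[[nm]] && any(!is.na(x))) as.numeric(f(x, na.rm = TRUE)) else NA_real_
    }, numeric(1))
  }
  data.frame(
    Variable   = names(dfrm),
    Type       = vapply(dfrm, function(x) class(x)[1], character(1)),
    # Assumption: "Length" = widest non-missing value, counted in characters
    Length     = vapply(dfrm, function(x) {
      n <- nchar(as.character(x[!is.na(x)]))
      if (length(n)) max(n) else 0L
    }, integer(1)),
    NonMissing = vapply(dfrm, function(x) sum(!is.na(x)), integer(1)),
    Minimum    = num_stat(min),
    Maximum    = num_stat(max),
    Mean       = num_stat(mean),
    SD         = num_stat(sd),
    row.names  = NULL,
    stringsAsFactors = FALSE
  )
}

# Small made-up data frame, only to show the call
dfrm <- data.frame(HHID = c(101, 102, NA),
                   Head = c("Ann", NA, "Roberto"),
                   stringsAsFactors = FALSE)
contents_table(dfrm)
```

Character columns get NA rather than "-" in the numeric columns, so the result stays an ordinary data frame that can be sorted or filtered when hunting for errors.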

-- 

David Winsemius, MD
West Hartford, CT