Data initialization {sjPlot/sjmisc}

This document shows basic usage of the sjmisc package and how to prepare data labels for use with the functions of the sjPlot package.

Reading and preparing data

When visualizing data (for instance, frequencies of labelled factor variables), plots automatically derive their axis labels from the factor levels.

x <- factor(sample(1:2, 200, replace = TRUE, prob = c(0.6, 0.4)), labels = c("female", "male"))
plot(x)

The same applies to functions from the sjPlot package.

library(sjPlot)
sjp.frq(x)

However, when reading data files, especially SPSS data, variables have numeric values and are not labelled factors. Instead, the imported variables carry additional attributes for the value and variable labels. See the following example, taken from the sample data set in the sjmisc package, which contains data read from an SPSS file:

library(sjmisc)
data(efc)
str(efc$e42dep)
##  atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
##  - attr(*, "label")= chr "elder's dependency"
##  - attr(*, "labels")= Named num [1:4] 1 2 3 4
##   ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"

The functions of the sjPlot package make use of these attributes: most functions automatically read them and use them as labels for plots and tables, which saves a lot of work when annotating a figure.
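To make the role of these attributes concrete, here is a minimal base R sketch that builds a small vector with the same "label" and "labels" attributes as efc$e42dep and reads them back with attr(). The vector x and its five values are made up for illustration; this is not how sjPlot itself is implemented.

```r
# A numeric vector that mimics an SPSS-imported variable (illustrative values)
x <- c(3, 3, 4, 1, 2)
attr(x, "label") <- "elder's dependency"
attr(x, "labels") <- c("independent" = 1,
                       "slightly dependent" = 2,
                       "moderately dependent" = 3,
                       "severely dependent" = 4)

# Variable label: a single string describing the whole variable
var_label <- attr(x, "label")

# Value labels: a named numeric vector, names are the label texts
value_labels <- attr(x, "labels")

# Translate each numeric value into its label text
labelled_values <- names(value_labels)[match(x, value_labels)]
print(var_label)
print(labelled_values)
```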

Example: Reading SPSS data

Thus, before using functions of the sjPlot-package, it might be useful to assign value and variable labels to variables (vectors) or data frames. If you are using the sjmisc::read_spss function to import SPSS data (or the related read_* functions), value and variable labels are automatically attached to the data frame.

my_dataframe <- read_spss("path/to/spss-file.sav")

The sjmisc::read_* functions are convenient wrapper functions for the haven and foreign packages. By default, sjmisc::read_* uses the haven package to read data. This package adds both value and variable label attributes to each variable in the imported data frame. If you prefer the foreign package, variable labels are added as an attribute of the data frame, but the variables themselves do not carry variable label attributes. With the autoAttachVarLabels argument set to TRUE, variable labels are attached to the variables as well.

my_dataframe <- read_spss("path/to/spss-file.sav",    # file path to sav-file
                          option = "foreign",         # force to use foreign-package
                          enc = "UTF-8",              # may be necessary
                          autoAttachVarLabels = TRUE) # attach variable labels
                                                      # to vectors as well
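Conceptually, attaching variable labels means copying them from the data frame attribute onto each column. The following base R sketch illustrates the idea on a hand-made data frame. It assumes the data frame attribute is named "variable.labels", as foreign::read.spss names it; the helper attach_var_labels is hypothetical and not part of sjmisc.

```r
# Hand-made data frame that mimics the result of a foreign-based import
df <- data.frame(e42dep = c(3, 4, 1), e16sex = c(1, 2, 2))
attr(df, "variable.labels") <- c(e42dep = "elder's dependency",
                                 e16sex = "elder's gender")

# Hypothetical helper: copy each variable label onto its column
attach_var_labels <- function(data) {
  var_labels <- attr(data, "variable.labels")
  for (col in names(var_labels)) {
    if (col %in% names(data)) {
      attr(data[[col]], "label") <- var_labels[[col]]
    }
  }
  data
}

df <- attach_var_labels(df)
print(attr(df$e42dep, "label"))
print(attr(df$e16sex, "label"))
```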

With attached value and variable labels, most functions of the sjPlot package automatically detect labels and use them as axis, legend or title labels in plots (sjp.-functions), or as column or row headers in table outputs (sjt.-functions).

Note that factor variables do not necessarily need to be converted to numeric vectors. Factor levels will automatically be used as axis labels (see the very first example above).
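Going the other way, a numeric vector with value labels can be turned into a labelled factor with base R alone, so that the factor levels carry the label texts. This is a minimal sketch with made-up data, assuming the value labels are stored in the "labels" attribute as shown earlier.

```r
# Numeric vector with value labels (illustrative data)
x <- c(1, 2, 2, 1, 2)
attr(x, "labels") <- c(female = 1, male = 2)

value_labels <- attr(x, "labels")

# Use the labelled values as levels and the label texts as level names
f <- factor(x,
            levels = unname(value_labels),
            labels = names(value_labels))

print(levels(f))
print(table(f))
```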

Overview of read- and labelling-functions in the sjmisc-package

Attaching labels from data frame attributes

When importing a data set with the read_spss function (or the related read_* functions like read_sas), value and variable label attributes will be automatically attached to vectors.

You can retrieve the value and variable labels of a data frame with get_labels and get_label. This works for data imported with the sjmisc, haven or foreign package, so you don’t need to stick to the sjmisc package to read data and benefit from the label detection in sjPlot. Value and variable labels can be attached to a data frame using set_labels and set_label.

Manually attach labels

You can also attach value and variable labels manually with the functions shown above:

# load libraries
library(sjPlot)
library(sjmisc)
# init default theme for plots
sjp.setTheme(geom.label.size = 2.5, axis.title.size = .9, axis.textsize = .9)
# create dummy variable
dummy <- sample(1:4, 200, replace = TRUE)
# show frequency table, w/o value labels
sjp.frq(dummy)

# manually attach value and variable labels
dummy <- set_labels(dummy, c("very low", "low", "mid", "hi"))
dummy <- set_label(dummy, "This is a dummy")
# check structure of dummy
str(dummy)
##  atomic [1:200] 3 1 1 1 2 3 1 4 4 3 ...
##  - attr(*, "labels")= Named num [1:4] 1 2 3 4
##   ..- attr(*, "names")= chr [1:4] "very low" "low" "mid" "hi"
##  - attr(*, "label")= chr "This is a dummy"
# show frequency table, with value labels
# setting title to NULL will automatically use 
# variable label as title
sjp.frq(dummy, title = NULL)

Use case

If the data you use with the sjPlot package has attached value and variable labels, you don’t need to specify this information within function calls. The following example shows how much work you can save if label attributes are attached:

# load sample data set. this data frame has value and variable
# label attributes that can be accessed with "get_labels"
# and "get_label"
data(efc)
# Function call when label attributes are attached
sjp.xtab(efc$e42dep, efc$e16sex)
# Equivalent function call when label attributes are not attached
# and axis labels should still be printed
sjp.xtab(efc$e42dep,
         efc$e16sex,
         axis.labels = c("independent",
                         "slightly dependent",
                         "moderately dependent",
                         "severely dependent"),
         legend.labels = c("male", "female"))

The next two examples demonstrate how you can save time, because labels don’t have to be specified each time you want to plot a figure.

Function call with automatic label detection

Here is a function call that demonstrates the automatic label detection:

sjp.xtab(efc$e42dep, efc$e16sex)

Function call with manually defined axis and legend labels

In this example, the value and variable labels are passed as parameters to the function:

sjp.xtab(efc$e42dep, efc$e16sex, 
         axis.labels = c("independent", "slightly dependent", 
                          "moderately dependent",  "severely dependent"), 
         axis.titles = "how dependent is the elder? - subjective perception of carer",
         legend.labels = c("male", "female"),
         legend.title = "elder's gender")
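The difference between the two calls comes down to a fallback: explicitly passed labels win, otherwise attached attributes are used. The following base R sketch pictures that logic. The helper resolve_axis_labels is hypothetical and only illustrates the idea; it is not sjPlot's actual implementation.

```r
# Hypothetical helper: prefer explicit labels, then attached value labels,
# then fall back to the plain values
resolve_axis_labels <- function(x, axis_labels = NULL) {
  if (!is.null(axis_labels)) {
    return(axis_labels)
  }
  value_labels <- attr(x, "labels")
  if (!is.null(value_labels)) {
    return(names(value_labels))
  }
  as.character(sort(unique(x)))
}

# Illustrative data: one labelled vector, one plain vector
labelled_x <- c(1, 2, 2, 1)
attr(labelled_x, "labels") <- c(male = 1, female = 2)
plain_x <- c(2, 1, 2)

auto_labels   <- resolve_axis_labels(labelled_x)
manual_labels <- resolve_axis_labels(labelled_x, c("m", "f"))
plain_labels  <- resolve_axis_labels(plain_x)
```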

Converting data to sjPlot

Some packages, for instance haven and Hmisc, add specific class attributes to vectors: they create objects of class labelled when creating new (labelled) variables or reading data.

If you encounter any problems with objects of class labelled, or want to avoid incompatibilities with haven-imported data, the unlabel function ‘converts’ labelled objects into an sjPlot-friendly format. When using the sjmisc::read_spss function, this conversion is done automatically.

The original haven-structure of imported data:

str(mydf$e42dep)
## Class 'labelled'  atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
##   ..- attr(*, "label")= chr "how dependent is the elder? - subjective perception of carer"
##   ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
##   .. ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"

The result after conversion:

str(unlabel(mydf$e42dep))
##  atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
##  - attr(*, "label")= chr "how dependent is the elder? - subjective perception of carer"
##  - attr(*, "labels")= Named num [1:4] 1 2 3 4
##   ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"

unlabel accepts either a single vector or a complete data frame as argument and simply removes the labelled class attribute from vectors.
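The effect of dropping the class attribute while keeping the label attributes can be reproduced in base R. This is a rough sketch with made-up data; the helper strip_labelled is hypothetical and is not the actual unlabel source.

```r
# Hypothetical helper: remove the "labelled" class, keep all other attributes
strip_labelled <- function(x) {
  if (is.data.frame(x)) {
    # apply to every column, keeping the data frame structure
    x[] <- lapply(x, strip_labelled)
    return(x)
  }
  remaining <- setdiff(oldClass(x), "labelled")
  oldClass(x) <- if (length(remaining) > 0) remaining else NULL
  x
}

# Vector that mimics a haven-style labelled object (illustrative data)
v <- structure(c(3, 3, 4),
               label = "elder's dependency",
               labels = c("independent" = 1, "slightly dependent" = 2,
                          "moderately dependent" = 3, "severely dependent" = 4),
               class = "labelled")

v_plain <- strip_labelled(v)
df_plain <- strip_labelled(data.frame(a = 1:3))
df_plain$v <- v
df_plain <- strip_labelled(df_plain)
```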

Writing data to SPSS

The haven package makes it easy to write R data frames to other formats; currently SPSS and Stata are supported.

To make sure that value labels are written as well, variables need to be either of class labelled (see haven::labelled()) or labelled factors. However, neither vector type supports variable labels, so data is saved without variable labels.

The write_spss function of the sjmisc package converts the data into a format that also exports variable labels. When writing data to SPSS or Stata, it is recommended to use the sjmisc write functions:

write_spss(my_data_frame, "path/to/spss-file.sav")