Data description
Starting R for the first time
The recommended way to use R is to create a new empty directory (named something like "Digging Numbers") and start R from the command line inside that directory. This way, data and command history are saved just for this workspace. Here is an example for UNIX operating systems.
$ mkdir diggingnumbers # just the first time
$ cd diggingnumbers
$ R
After that, your R session is active in the current directory. You can always check which directory you are working in with the command
> getwd()
Importing data
To read data from the raw data file into R:
> spearheads <- read.csv("spearheads.csv", header=TRUE)
From that moment on, you can access the dataset through the data frame object named spearheads.
You can save yourself some typing by entering
> attach(spearheads)
every time you start a new session in that workspace. This lets you refer to variables directly, such as Maxle instead of spearheads$Maxle.
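As a minimal sketch of what attach() changes, the snippet below uses a small stand-in data frame (the Num and Maxle values are made up for illustration, not taken from spearheads.csv). It also shows with(), which evaluates an expression inside a data frame without attaching it, and detach(), which undoes attach().

```r
# Stand-in data frame; these values are invented for illustration only.
spearheads <- data.frame(Num = 1:3, Maxle = c(12.4, 22.6, 17.9))

# Without attach(), columns are reached through the data frame.
plain_mean <- mean(spearheads$Maxle)

# with() evaluates the expression using the data frame's columns.
with_mean <- with(spearheads, mean(Maxle))

# attach() places the columns on the search path, so Maxle is found directly.
attach(spearheads)
attached_mean <- mean(Maxle)

# detach() removes the data frame from the search path again.
detach(spearheads)
```

All three calls compute the same mean; the difference is only in how the column is looked up.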
Once you have read in a dataset, you can verify the names of the variables using the "names" command:
> names(spearheads)
This displays the column names of the table. It is also a handy way to verify the capitalization and spelling of the field (column) names, since R is case-sensitive and a missing or added capital in a field name will result in an error.
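The capitalization check can be sketched as follows, again with a made-up stand-in data frame. The %in% operator tests whether a name is present. Note that with the $ operator a misspelled column name returns NULL rather than stopping with an error, so an explicit check can be useful.

```r
# Stand-in data frame; column values are invented for illustration only.
spearheads <- data.frame(Num = 1:3, Mat = c(2L, 2L, 1L), Maxle = c(12.4, 22.6, 17.9))

# List the column names exactly as R stores them.
column_names <- names(spearheads)

# Names are case-sensitive: "Maxle" exists, "maxle" does not.
has_maxle <- "Maxle" %in% column_names
has_lowercase <- "maxle" %in% column_names

# A misspelled name used with $ yields NULL instead of the column.
misspelled <- spearheads$maxle
```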
For additional information regarding the data set enter:
> str(spearheads)
This displays a more detailed summary of the data, as follows:
'data.frame': 40 obs. of 14 variables:
$ Num : int 1 2 3 4 5 6 7 8 9 10 ...
$ Mat : int 2 2 2 2 2 2 2 2 2 2 ...
$ Con : int 3 3 3 3 3 3 3 2 2 1 ...
$ Loo : int 1 1 1 1 1 1 1 1 1 1 ...
$ Peg : int 2 2 2 NA 1 2 2 2 2 2 .... etc.
The output shows, first, that the data is stored in memory as a data frame. It also tells you that there are 40 records (observations) of 14 variables. The output then lists each variable's name, its data type, and the first few values stored in it after import. This is particularly important information, since some of the variables listed as "int" are not actually numerical data. Material type (Mat), for example, is categorical data that has been entered as a numeric code. For some commonly used statistical routines, R needs to be told that such a variable contains the levels of a factor (a categorical variable); otherwise it could yield nonsensical results. There is no point, for example, in asking for the average value of Mat.
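One way to tell R that a coded variable is categorical is the factor() function. The sketch below uses a short stand-in Mat column (the codes are made up for illustration, and no meaning is assigned to them here) and converts it from integer to factor, after which table() counts the records per level.

```r
# Stand-in data frame with a numerically coded categorical column.
# The codes below are invented for illustration only.
spearheads <- data.frame(Mat = c(2L, 2L, 1L, 2L, 1L))

# Before conversion the column is an integer vector.
was_integer <- is.integer(spearheads$Mat)

# Convert the codes to a factor so R treats them as categories.
spearheads$Mat <- factor(spearheads$Mat)

# The distinct codes become the levels of the factor.
mat_levels <- levels(spearheads$Mat)

# Counting records per level is a meaningful summary for a factor.
mat_counts <- table(spearheads$Mat)
```

After the conversion, str(spearheads) reports Mat as a Factor rather than an int.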
A note about importing data from external sources
Especially when you are importing files that you haven't produced yourself, always inspect text-format data with a text editor (e.g. vi, emacs, gedit, wordpad). Don't make assumptions based on the file extension (like ".csv"); instead, look at the data first. That's good practice and something any user of external data should keep in mind.
Files produced in a different country may use different locale conventions, such as a comma rather than a point as the decimal separator. By default, read.csv() expects a comma as the field separator and a point as the decimal separator. If your file doesn't load correctly, inspect it and make use of read.csv() options such as sep (the field separator) and dec (the decimal separator).
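As an illustration of sep and dec, the sketch below reads a small semicolon-separated sample with decimal commas. The sample text and its values are made up for illustration; the text argument is used here only so the example runs without a file on disk. read.csv2() is a variant of read.csv() whose defaults are sep = ";" and dec = ",".

```r
# Made-up sample in a "European" layout: semicolons between fields,
# commas as decimal separators.
raw_text <- "Num;Maxle\n1;12,4\n2;22,6\n"

# Tell read.csv() explicitly which separators the data uses.
spearheads_eu <- read.csv(text = raw_text, sep = ";", dec = ",")

# read.csv2() applies the same two settings by default.
spearheads_eu2 <- read.csv2(text = raw_text)

# Maxle is now read as numeric rather than as text.
maxle_is_numeric <- is.numeric(spearheads_eu$Maxle)
```

With a real file, the first argument would be the file name, as in the read.csv() call shown earlier.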
Quitting R
When you are done with your first tutorial, quit the R session with the q() command, and answer y to the Save workspace image question.
> q()
Save workspace image? [y/n/c]:
This leaves all the variables you created as they are for your next session.
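The save-and-restore cycle can be sketched with save() and load(), which write and read the same kind of file that R creates when you answer y. This is only an illustration: the variable total_finds and the file name example.RData are hypothetical, and the file is written to a temporary directory rather than to the workspace.

```r
# Hypothetical file in a temporary directory, so the real workspace is untouched.
workspace_file <- file.path(tempdir(), "example.RData")

# A made-up variable standing in for something created during a session.
total_finds <- 40

# Write the variable to disk, then remove it from memory.
save(total_finds, file = workspace_file)
rm(total_finds)
gone_after_rm <- !exists("total_finds")

# Loading the file brings the variable back under its original name.
load(workspace_file)
restored_value <- total_finds
```

In a new session started in the same directory, ls() lists the objects that were restored from .RData.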
If you want to be sure the R data was actually saved in that directory, run ls -a after quitting R; you should find two files, .RData and .Rhistory.