R Data Frames

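A data frame is a two-dimensional tabular data structure, similar to a spreadsheet or a SQL table.
Each column can store a different data type (for example numeric, character, or logical), but all values within one column share the same type.
Each variable in the data set is stored as a separate column, and every column has the same number of rows.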


# Creating a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  City = c("New York", "San Francisco", "Los Angeles")
)

Accessing Elements

Columns can be accessed by name, using either the $column_name operator or double brackets [["column_name"]]. Rows, columns, and individual cells can also be accessed by position with single brackets, df[row, column].


# Creating a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  City = c("New York", "San Francisco", "Los Angeles")
)

# Accessing columns
name_column <- df$Name
age_column <- df[["Age"]]

# Accessing rows
first_row <- df[1, ]
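
Access by row and column number can be sketched as follows. This is a minimal illustration that rebuilds the same df; the variable names (second_age, city_column, and so on) are made up for the example.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  City = c("New York", "San Francisco", "Los Angeles")
)

# Single cell: row 2, column 2 (Bob's age)
second_age <- df[2, 2]

# Whole column by position; [[ ]] returns a plain vector
city_column <- df[[3]]

# Single cell by row number and column name
first_city <- df[1, "City"]

# Several rows and columns at once; the result is still a data frame
first_two <- df[1:2, c("Name", "Age")]
```

Single brackets with both a row and a column index return a cell or a smaller data frame, while double brackets always return the column itself.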

Summarizing the Data Frame

The summary() function generates a summary of each column in the data frame. For numeric columns it reports the minimum, quartiles, median, mean, and maximum.


# Creating a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  City = c("New York", "San Francisco", "Los Angeles")
)

# Adding a Salary column
df$Salary <- c(50000, 60000, 45000)

# Data Frame Summary
summary(df)

Data Frame Functions

R provides built-in functions that make common data frame operations quick and easy.





Number of Rows


Returns the number of rows in a data frame.

num_rows <- nrow(df)

Number of Columns


Returns the number of columns in a data frame.

num_cols <- ncol(df)

Data Frame Structure


Displays the structure of a data frame.

# str() prints its output and returns NULL, so the result is not assigned
str(df)

Data Types of Columns

sapply(df, class)

Returns the data types of columns in a data frame.

column_types <- sapply(df, class)

Summary Statistics


Generates a summary of the data frame.

summary_df <- summary(df)

Descriptive Statistics


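Computes additional descriptive statistics, such as standard deviation, skew, and kurtosis. This is not part of base R; the describe() function used here comes from the psych add-on package.

# Requires the psych package: install.packages("psych")
# Only the numeric columns are passed to describe()
stats_df <- psych::describe(df[sapply(df, is.numeric)])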

Checking for Missing Values


Checks if there are any missing values in the data frame.

has_missing <- any(is.na(df))
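
Beyond a single TRUE/FALSE answer, it is often useful to know where the missing values are. The sketch below assumes a small data frame with two NA values added purely for illustration; missing_per_column and complete_rows are hypothetical names.

```r
# Example data with two deliberately missing values
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, NA, 22),
  City = c("New York", "San Francisco", NA)
)

# TRUE if at least one value anywhere in the data frame is NA
has_missing <- any(is.na(df))

# Number of missing values in each column
missing_per_column <- colSums(is.na(df))

# Keep only the rows that have no missing values
complete_rows <- df[complete.cases(df), ]
```

Here is.na(df) returns a logical matrix, so colSums() counts the TRUE entries per column, and complete.cases() marks the rows that contain no NA at all.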

Head of Data Frame


Returns the first rows of a data frame (six by default).

head_df <- head(df)

Tail of Data Frame


Returns the last rows of a data frame (six by default).

tail_df <- tail(df)
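
Both head() and tail() accept an n argument that controls how many rows are returned. The sketch below uses a hypothetical ten-row data frame (the ID and Score columns are invented for the example) so that the default of six rows is visible.

```r
# Hypothetical ten-row data frame
df <- data.frame(
  ID = 1:10,
  Score = seq(10, 100, by = 10)
)

# Default: first six rows
default_head <- head(df)

# First three rows
first_three <- head(df, n = 3)

# Last two rows
last_two <- tail(df, n = 2)
```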

Sorting Data Frame

df_sorted <- df[order(df$column_name), ]

Sorts a data frame based on a specific column.

df_sorted <- df[order(df$Age), ]
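
The same order() pattern extends to descending order and to sorting by more than one column. The following is a minimal sketch on the small df used earlier; df_desc and df_multi are names chosen for the example.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  City = c("New York", "San Francisco", "Los Angeles")
)

# Descending order by Age
df_desc <- df[order(df$Age, decreasing = TRUE), ]

# Sort by City first, then by Age within each city
df_multi <- df[order(df$City, df$Age), ]
```

order() returns the row positions in sorted order, so the original row names are carried along with the rows in the result.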

Filtering Rows

subset(df, condition)

Filters rows based on a specified condition.

young_people <- subset(df, Age < 30)
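
Filtering can also be written with logical indexing, and conditions can be combined with & (and) or | (or). This is a small sketch on the same df; young_in_ny is a hypothetical name.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  City = c("New York", "San Francisco", "Los Angeles")
)

# Logical indexing: same result as subset(df, Age < 30)
young_people <- df[df$Age < 30, ]

# Combining two conditions inside subset()
young_in_ny <- subset(df, Age < 30 & City == "New York")
```

One difference to keep in mind: if the condition evaluates to NA for a row, subset() drops that row, while logical indexing returns it as a row of NA values.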

Selecting Columns

df_subset <- df[, c("col1", "col2")]

Selects specific columns from a data frame.

df_subset <- df[, c("Name", "Age")]
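
Selecting a single column with single brackets returns a vector rather than a data frame unless drop = FALSE is given. The sketch below illustrates this, along with one way to exclude a column by name; the variable names are made up for the example.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  City = c("New York", "San Francisco", "Los Angeles")
)

# One column with [ , ]: the result is a plain vector
name_vector <- df[, "Name"]

# drop = FALSE keeps the result as a one-column data frame
name_df <- df[, "Name", drop = FALSE]

# Every column except City
without_city <- df[, setdiff(names(df), "City")]
```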


# Creating a data frame
employee_data <- data.frame(
  EmployeeID = c(101, 102, 103, 104, 105),
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(28, 35, 24, 40, 30),
  Department = c("HR", "IT", "Marketing", "Finance", "IT"),
  Salary = c(60000, 75000, 50000, 90000, 80000),
  stringsAsFactors = FALSE
)

# Displaying the data frame
print("Original Data Frame:")
print(employee_data)

# Number of rows and columns
num_rows <- nrow(employee_data)
num_cols <- ncol(employee_data)
print(paste("Number of Rows:", num_rows))
print(paste("Number of Columns:", num_cols))

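# Structure of the data frame
print("Data Frame Structure:")
str(employee_data)  # str() prints directly and returns NULL

# Data types of columns
print("Data Types of Columns:")
column_types <- sapply(employee_data, class)
print(column_types)

# Summary of the data frame
print("Summary of the Data Frame:")
summary_df <- summary(employee_data)
print(summary_df)

# Descriptive statistics
# describe() comes from the psych package, not base R: install.packages("psych")
print("Descriptive Statistics:")
numeric_cols <- employee_data[sapply(employee_data, is.numeric)]
stats_df <- psych::describe(numeric_cols)
print(stats_df)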

# Checking for missing values
has_missing <- any(is.na(employee_data))
print(paste("Has Missing Values:", has_missing))

# Sorting the data frame by Age
df_sorted <- employee_data[order(employee_data$Age), ]
print("Sorted Data Frame by Age:")
print(df_sorted)

# Filtering rows for employees younger than 30
young_employees <- subset(employee_data, Age < 30)
print("Young Employees:")
print(young_employees)

# Selecting specific columns
selected_cols <- employee_data[, c("Name", "Salary")]
print("Selected Columns:")
print(selected_cols)

Quick Recap - Topics Covered

R Data Frames

