Red Hat Enterprise Linux Troubleshooting Guide
Table of Contents
Red Hat Enterprise Linux Troubleshooting Guide
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Troubleshooting Best Practices
Styles of troubleshooting
The Data Collector
The Educated Guesser
The Adaptor
Choosing the appropriate style
Troubleshooting steps
Understanding the problem statement
Asking questions
Tickets
Humans
Attempting to duplicate the issue
Running investigatory commands
Establishing a hypothesis
Putting together patterns
Is this something that I've encountered before?
Trial and error
Start by creating a backup
Getting help
Books
Team Wikis or Runbooks
Man pages
Reading a man page
Name
Synopsis
Description
Examples
Additional sections
Info documentation
Referencing more than commands
Installing man pages
Red Hat kernel docs
People
Following up
Documentation
Root cause analysis
The anatomy of a good RCA
The problem as it was reported
The actual root cause of the problem
A timeline of events and actions taken
Any key data points to validate the root cause
A plan of action to prevent the incident from reoccurring
Establishing a root cause
Sometimes you must sacrifice a root cause analysis
Understanding your environment
Summary
2. Troubleshooting Commands and Sources of Useful Information
Finding useful information
Log files
The default location
Common log files
Finding logs that are not in the default location
Checking syslog configuration
Checking the application's configuration
Other examples
Using the find command
Configuration files
Default system configuration directory
Finding configuration files
Using the rpm command
Using the find command
The proc filesystem
Troubleshooting commands
Command-line basics
Command flags
Piping command output
Gathering general information
w – show who is logged on and what they are doing
rpm – RPM package manager
Listing all packages installed
Listing all files deployed by a package
Using package verification
df – report file system space usage
Showing available inodes
free – display memory utilization
What is free is not always free
The /proc/meminfo file
ps – report a snapshot of current running processes
Printing every process in long format
Printing a specific user's processes
Printing a process by process ID
Printing processes with performance information
Networking
ip – show and manipulate network settings
Show IP address configuration for a specific device
Show routing configuration
Show network statistics for a specified device
netstat – network statistics
Printing network connections
Printing all ports listening for tcp connections
Delay
Performance
iotop – a simple top-like I/O monitor
iostat – report I/O and CPU statistics
Manipulating the output
vmstat – report virtual memory statistics
sar – collect, report, or save system activity information
Using the sar command
Summary
3. Troubleshooting a Web Application
A small back story
The reported issue
Data gathering
Asking questions
Duplicating the issue
Understanding the environment
Where is this blog hosted?
Lookup IPs with nslookup
What about ping, dig, or other tools?
Ok, it's within our environment; now what?
What services are installed and running?
Validate the web server
Validating the database service
Validating PHP
A summary of installed and running services
Looking for error messages
Apache logs
Finding the location of Apache's logs
Reviewing the logs
Using curl to call our web application
Requesting a non-PHP page
Reviewing generated log entries
What we learned from httpd logs
Verifying the database
Verifying the WordPress database
Finding the installation path for WordPress
Checking the default configuration
Finding the database credentials
Connecting as the WordPress user
Validating the database structure
What we learned from the database validation
Establishing a hypothesis
Resolving the issue
Understanding database data files
Finding the MariaDB data folder
Resolving data file issues
Validating
Final validation
Summary
4. Troubleshooting Performance Issues
Performance issues
It's slow
Performance
Application
CPU
top – a single command to look at everything
What does this output tell us about our issue?
Individual processes from top
Determining the number of CPUs available
Threads and Cores
lscpu – Another way to look at CPU info
ps – Drill down deeper on individual processes with ps
Using ps to determine process CPU utilization
Putting it all together
A quick look with top
Digging deeper with ps
Memory
free – Looking at free and used memory
Linux memory buffers and caches
Swapped memory
What free tells us about our system
Checking for oomkill
ps – Checking individual processes' memory utilization
vmstat – Monitoring memory allocation and swapping
Putting it all together
Taking a look at the system's memory utilization with free
Watch what is happening with vmstat
Finding the processes that utilize the most memory with ps
Disk
iostat – CPU and device input/output statistics
CPU details
Reviewing I/O statistics
Identifying devices
Who is writing to these devices?
ps – Using ps to identify processes utilizing I/O
iotop – A top-like command for disk I/O
Putting it all together
Using iostat to determine whether there is an I/O bandwidth problem
Using iotop to determine which processes are consuming disk bandwidth
Using ps to understand more about processes
Network
ifstat – Review interface statistics
Quick review of what we have identified
Comparing historical metrics
sar – System activity report
CPU
Memory
Disk
Network
Review what we learned by comparing historical statistics
Summary
5. Network Troubleshooting
Database connectivity issues
Data collection
Duplicating the issue
Finding the database server
Testing connectivity
Telnet from blog.example.com
Telnet from our laptop
Ping
Troubleshooting DNS
Checking DNS with dig
Looking up DNS with nslookup
What did dig and nslookup tell us?
A bit about /etc/hosts
DNS summary
Pinging from another location
Testing port connectivity with cURL
Showing current network connections with netstat
Using netstat to watch for new connections
Breakdown of netstat states
Capturing network traffic with tcpdump
Taking a look at the server's network interfaces
What is a network interface?
Viewing device configuration
Specifying the interface with tcpdump
Reading the captured data
A quick primer on TCP
Types of TCP packet
Reviewing collected data
Taking a look on the other side
Identifying the network configuration
Testing connectivity from db.example.com
Looking for connections with netstat
Tracing network connections with tcpdump
Routing
Viewing the routing table
The default route
Utilizing IP to show the routing table
Looking for routing misconfigurations
More specific routes win
Hypothesis
Trial and error
Removing the invalid route
Configuration files
Summary
6. Diagnosing and Correcting Firewall Issues
Diagnosing firewalls
Déjà vu
Troubleshooting from historic issues
Basic troubleshooting
Validating the MariaDB service
Troubleshooting with tcpdump
Understanding ICMP
Understanding connection rejections
A quick summary of what you have learned so far
Managing the Linux firewall with iptables
Verify that iptables is running
Show iptables rules being enforced
Understanding iptables rules
Ordering matters
Default policies
Breaking down the iptables rules
Putting the rules together
Viewing iptables counters
Correcting the iptables rule ordering
How iptables rules are applied
Modifying iptables rules
Testing our changes
Summary
7. Filesystem Errors and Recovery
Diagnosing filesystem errors
Read-only filesystems
Using the mount command to list mounted filesystems
A mounted filesystem
Using fdisk to list available partitions
Back to troubleshooting
NFS – Network Filesystem
NFS and network connectivity
Using the showmount command
NFS server configuration
Exploring /etc/exports
Identifying the current exports
Testing NFS from another client
Making mounts permanent
Unmounting the /mnt filesystem
Troubleshooting the NFS server, again
Finding the NFS log messages
Reading /var/log/messages
Read-only filesystems
Identifying disk issues
Recovering the filesystem
Unmounting the filesystem
Filesystem checks with fsck
The fsck and xfs filesystems
How do these tools repair a filesystem?
Mounting the filesystem
Repairing the other filesystems
Recovering the / (root) filesystem
Validation
Summary
8. Hardware Troubleshooting
Starting with a log entry
What is a RAID?
RAID 0 – striping
RAID 1 – mirroring
RAID 5 – striping with distributed parity
RAID 6 – striping with double distributed parity
RAID 10 – mirrored and striped
Back to troubleshooting our RAID
How RAID recovery works
Checking the current RAID status
Summarizing the key information
Looking at md status with /proc/mdstat
Using both /proc/mdstat and mdadm
Identifying a bigger issue
Understanding /dev
More than just disk drives
Device messages with dmesg
Summarizing what dmesg has provided
Using mdadm to examine the superblock
Checking /dev/sdb2
What we have learned so far
Re-adding the drives to the arrays
Adding a new disk device
When disks are not added cleanly
Another way to watch the rebuild status
Summary
9. Using System Tools to Troubleshoot Applications
Open source versus home-grown applications
When the application won't start
Exit codes
Is the script failing, or the application?
A wealth of information in the configuration file
Watching log files during startup
Checking whether the application is already running
Checking open files
Understanding file descriptors
Getting back to the lsof output
Using lsof to check whether we have a previously running process
Finding out more about the application
Tracing an application with strace
What is a system call?
Using strace to identify why the application will not start
Resolving the conflict
Summary
10. Understanding Linux User and Kernel Limits
A reported issue
Why is the job failing?
Background questions
Is the cron job even running?
User crontabs
Understanding user limits
The file size limit
The max user processes limit
The open files limit
Changing user limits
The limits.conf file
Future proofing the scheduled job
Running the job again
Kernel tunables
Finding the kernel parameter for open files
Changing kernel tunables
Permanently changing a tunable
Temporarily changing a tunable
Running the job one last time
A look back
Too many open files
A bit of clean up
Summary
11. Recovering from Common Failures
The reported problem
Is Apache really down?
Why is it down?
What else was happening at that time?
Searching the messages log
Breaking down this useful one-liner
The cut command
The sort command
The uniq command
Tying it all together
What happens when a Linux system runs out of memory?
Minimum free memory
A quick recap
How oom-kill works
Adjusting the oom score
Determining whether our process was killed by oom-kill
Why did the system run out of memory?
Resolving the issue in the long-term and short-term
Long-term resolution
Short-term resolution
Summary
12. Root Cause Analysis of an Unexpected Reboot
A late night alert
Identifying the issue
Did someone reboot this server?
What do the logs tell us?
Learning about new processes and services
What caused the high load average?
What are the run queue and load average?
Load average
Investigating the filesystem being full
The du command
Why wasn't the queue directory processed?
A checkpoint on what you learned
Sometimes you cannot prove everything
Preventing reoccurrence
Immediate action
Long-term actions
A sample Root Cause Analysis
Problem summary
Problem details
Root cause
Action plan
Further actions to be taken
Summary
Index