Beginner's Guide To Bioinformatics For High Throughput Sequencing

by Eric Cheng-yu Lee, Tin Wee Tan
ISBN-10: 9813231661
ISBN-13: 9789813231665
Pub. Date: 11/29/2018
Publisher: World Scientific Publishing Company, Incorporated
Price: $58.00

Overview

Biologists find computing bewildering, yet they are expected to process the voluminous data produced by the machines they buy and the datasets that have accumulated in genomic databanks worldwide. It is increasingly difficult for them to avoid dealing with large volumes of data, a task that goes beyond manual programming.

Most books in this realm are full of equations and complex code. This book gives a much gentler entry point, particularly for biologists, with code snippets that readers can cut, paste, and run on their Linux or Mac OS X operating system or on a cloud instance. It also provides step-by-step installation instructions that are easy to follow. Those who already work in genome sequencing and are familiar with the analysis procedures may also find this book useful for closing some knowledge gaps.

High throughput sequencing requires high throughput and high performance computing. This book provides a gentle entry to high throughput sequencing by dealing with simple skills that the average biologist is increasingly required to master. You will find this book a breeze to read, and some of its suggestions may be new to you and worth trying out.

Product Details

ISBN-13: 9789813231665
Publisher: World Scientific Publishing Company, Incorporated
Publication date: 11/29/2018
Pages: 276
Product dimensions: 6.69(w) x 9.61(h) x 0.58(d)

Table of Contents

Preface v

Chapter 1 Preparing Your Computing Environment 1

1.1 Buying Your Own Computer 1

1.2 Setting up a Computing Server 4

1.3 Establishing a Remote Connection to a Server 6

Chapter 2 Learning Basic Linux Commands 17

2.1 No Need to be a Linux Guru to use Linux Effectively 17

2.2 Folder (Directory) Operations 18

Controlling your command prompt 25

2.3 File Operations 32

2.4 Assignment of Permissions 39

The path 46

2.5 Understanding System Status 47

UNIX redirection and pipes 52

2.6 Other Useful Commands 54

Chapter 3 Checking Sequence Quality 59

3.1 Basic High-throughput Sequencing 59

3.2 Challenges of High Throughput Genome Sequencing 61

3.3 Standards of Quality Score 62

3.4 Quality Check 65

FastQC 65

FASTX-Toolkit 72

Chapter 4 Sequence Alignment 81

4.1 The Purpose of Sequence Alignment 81

Sequence assembly 83

4.2 Selection of the Sequence Alignment Tools 83

Burrows-Wheeler 85

The BWT encoding-decoding algorithm 88

4.3 Actual Operation of the Sequence Alignment 90

Download and installation of Bowtie 90

Executing sequence alignment 96

4.4 Sequence Alignment Results File Conversion 99

Downloading SAMtools 99

4.5 Using the Genome Browser 108

Chapter 5 Speeding-up with GPUs 117

5.1 Computational Advantages of the Graphics Card 117

5.2 Industry Standards and Usage of GPU Computing 119

5.3 Practical CUDA Applications in Bioinformatics 138

Preparing the reference sequence 140

Alignment with CUSHAW2-GPU 143

5.4 The Reason for the Limited Success of GPUs 145

Chapter 6 Establishing a Research Workflow Pipeline 147

6.1 Automating Your Computational Workflow 147

6.2 Scripting Language 148

Script command 150

6.3 Testing and Debugging 157

Keeping track of the current project 158

Complementing tests of code blocks 159

Calculating the execution time 160

6.4 Implementation Case Studies 162

6.5 Case Study of Common Mistakes 170

Mistake 1 Confusing mess of relative paths 170

Mistake 2 Failure to change the necessary permissions 172

Mistake 3 The disk becomes full during execution 172

Mistake 4 Ignoring cross-platform shell portability considerations 174

Chapter 7 Using a Bioinformatics Cloud Computing Platform 177

7.1 Simple Introduction to the Cloud Computing Platform 177

7.2 Amazon Web Service 178

7.3 Bioinformatics Cloud Computing Platforms 182

Logging in to use Galaxy services 184

Uploading sequence data 187

Sequence quality testing 195

Execution of sequence alignment 202

Selecting other Galaxy servers 205

Design and use of research workflows 207

Establishing new research workflows 207

Sharing and publishing process 209

Execution of research workflows 212

Downloading or exporting research workflows 212

Importing research workflows 214

7.4 Installing and Setting up your Own Galaxy Server 215

Downloading the latest version of the Galaxy 216

Starting your Galaxy server 217

Allowing external execution 220

Installation of bioinformatics tools 220

Adding new reference sequences 229

Appendix Learning Regular Expressions through Practising Simple Data Processing 235

Regular Expressions 236

One character pattern match 236

Numbering a file and printing line number of a hit 236

Counting number of grepped hits 237

UNIX redirection using pipes 237

Grep and output several lines of context around the hit 237

Grepping for non-matching lines 238

Grepping for unwanted characters 238

Mistake of logic 238

Egrep or grep -E extended regular expression grep 239

Egrep and the character class 239

Egrep character class negation 240

Regular expression: Beginning of line anchor ^ 241

Case-sensitive and case-insensitive grep 241

Regular expression: End of line anchor 242

More about regular expressions 242

Even more regular expression 244

Substitution with sed, awk and Perl 244

Using Excel to do data processing 250

Index 259
