idba_ud gene assembly: an introduction to idba-ud, a reference-free assembler for metagenomic sequences, with detailed usage

2024-04-23

Introduction

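IDBA-UD is a de novo assembler: it reconstructs genome sequences from high-throughput sequencing reads without a reference genome. It extends the earlier IDBA assembler and is designed for data with highly uneven sequencing depth, such as metagenomic and single-cell samples.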

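Its main job is to assemble reads into contigs (and, when paired-end reads are supplied, scaffolds) for organisms that have no reference sequence. It takes short reads of up to 600 bp through -r and can additionally use longer sequences (over 600 bp) through -l. It copes with highly uneven coverage, and it runs multithreaded (--num_threads) to make use of the available CPU cores.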

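IDBA-UD grew out of the need for reference-free assembly in biology. For many organisms there is no suitable reference sequence to align reads against, so the genome has to be assembled directly from the reads. Metagenomic samples make this harder: the organisms in one sample differ in abundance, so their reads have very different depths. IDBA-UD was optimized for exactly this kind of diverse, uneven data.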

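IDBA-UD builds on earlier work in de novo assembly. It uses a de Bruijn graph, built from the k-mers of the reads, to join overlapping reads into longer sequences, and it iterates over a range of k values (from --mink up to --maxk): small k values bridge low-coverage regions, while large k values resolve repeats. To handle uneven depth it removes erroneous k-mers with depth-relative thresholds, and it adds local assembly with paired-end reads and error correction, which improves the accuracy and reliability of the assembly.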
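To make the de Bruijn graph idea concrete, the bash function below is a toy illustration written for this article (the name debruijn_edges is hypothetical, and this is not IDBA-UD's own code). Each k-mer of a read becomes an edge from its first k-1 bases to its last k-1 bases, so reads that share (k-1)-mers end up connected in the graph.

```shell
#!/usr/bin/env bash
# Toy illustration only, not IDBA-UD code: print the de Bruijn graph edges
# for a single sequence. Each k-mer is an edge from its (k-1)-prefix node
# to its (k-1)-suffix node.
debruijn_edges() {
  local seq=$1 k=$2
  awk -v s="$seq" -v k="$k" 'BEGIN {
    # Slide a window of length k along the sequence.
    for (i = 1; i + k - 1 <= length(s); i++) {
      kmer = substr(s, i, k)
      print substr(kmer, 1, k - 1) " -> " substr(kmer, 2, k - 1)
    }
  }'
}
```

For example, debruijn_edges ACGTAC 3 prints the four edges AC -> CG, CG -> GT, GT -> TA and TA -> AC. A smaller k gives shorter nodes that connect more easily but become ambiguous in repeats, which is the trade-off IDBA-UD addresses by iterating over k.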

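In short, IDBA-UD assembles genomes without a reference and provides basic sequence data for downstream biological research. It grew out of the demand for reference-free assembly, refines earlier assembly methods, and is tuned for diverse data with uneven sequencing depth, while running in parallel on multiple threads.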

Installation

$ git clone https://github.com/loneknightpy/idba.git

$ cd idba

$ ./configure

$ make

Whether to add the binaries to your PATH is up to you; I simply call them by their absolute paths.
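If you prefer not to type absolute paths, a small bash function like the one below (a hypothetical helper, not part of IDBA) prepends a directory to PATH only once. It assumes the build places executables such as idba_ud and fq2fa in the bin directory of the clone; check where your own build put them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to PATH unless it is already there.
# Assumes the IDBA executables live in <clone>/bin; adjust to your build.
add_to_path() {
  local dir
  # Refuse empty or missing directories instead of silently polluting PATH.
  [ -n "$1" ] && [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
  dir=$(cd -- "$1" && pwd)      # resolve to an absolute path
  case ":$PATH:" in
    *":$dir:"*) ;;              # already on PATH, nothing to do
    *) PATH="$dir:$PATH"; export PATH ;;
  esac
}
```

Usage, with a hypothetical clone location: add_to_path "$HOME/idba/bin". Putting that line in ~/.bashrc makes it persist across shell sessions.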

Usage

Sequence conversion

idba_ud takes FASTA files as input, so FASTQ files, including paired-end FASTQ files, must first be converted with fq2fa:

fq2fa read.fq read.fa

# paired-end conversion: --merge combines read_1 and read_2 into a single FASTA file

fq2fa --merge --filter read_1.fq read_2.fq read.fa
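If fq2fa is not at hand, the paired-end merge can be approximated with standard tools. The bash function below is a hypothetical fallback (not part of IDBA) that interleaves two FASTQ files into one FASTA file, mate 1 then mate 2 for each pair. It assumes both files list mates in the same order with exactly four lines per record, and unlike fq2fa --filter it does not drop any reads.

```shell
#!/usr/bin/env bash
# Hypothetical fallback, not part of IDBA: interleave two paired FASTQ files
# into one FASTA file (pair 1 mate 1, pair 1 mate 2, pair 2 mate 1, ...).
# Assumes 4-line FASTQ records in matching order; performs no read filtering.
interleave_fq_to_fa() {
  local r1=$1 r2=$2 out=$3
  # Collapse each 4-line record to one tab-separated line, then pair them up.
  paste <(paste - - - - < "$r1") <(paste - - - - < "$r2") |
    awk -F '\t' '
      NF != 8 { print "unpaired record at pair " NR > "/dev/stderr"; exit 1 }
      {
        sub(/^@/, ">", $1); sub(/^@/, ">", $5)   # FASTQ "@" header -> FASTA ">"
        print $1; print $2                       # mate 1 header and sequence
        print $5; print $6                       # mate 2 header and sequence
      }' > "$out"
}
```

Usage: interleave_fq_to_fa read_1.fq read_2.fq read.fa. When fq2fa is available, prefer it, since its output is what idba_ud is built to read.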

Sequence assembly

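It is very simple. Keep an eye on the machine's memory, though: IDBA-UD is not especially memory-hungry, but even moderately large datasets can use a lot of RAM.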

idba_ud -r read.fa -o idba_assembly

# -r input reads (FASTA)

# -o output directory
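The conversion and assembly steps can be chained into one script. The bash function below is a hypothetical wrapper (the function name and the location of the merged FASTA are my own choices) that assumes fq2fa and idba_ud are on PATH. It converts a read pair, runs idba_ud with an explicit thread count, and prints the path of contig.fa, the file in the output directory that holds the final contigs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: FASTQ pair -> interleaved FASTA -> idba_ud assembly.
# Assumes fq2fa and idba_ud are on PATH.
# Usage: assemble_pe read_1.fq read_2.fq output_dir [threads]
assemble_pe() {
  local r1=$1 r2=$2 outdir=${3%/}           # drop a trailing slash, if any
  local threads=${4:-$(nproc 2>/dev/null || echo 1)}
  # Write the merged FASTA next to (not inside) the output directory,
  # because idba_ud creates that directory itself.
  local merged="${outdir}.reads.fa"

  # 1) Merge the pair into one FASTA file (same command as above).
  fq2fa --merge --filter "$r1" "$r2" "$merged" || return 1

  # 2) Assemble with an explicit thread count.
  idba_ud -r "$merged" -o "$outdir" --num_threads "$threads" || return 1

  # 3) Print where the final contigs were written.
  echo "$outdir/contig.fa"
}
```

Usage: assemble_pe read_1.fq read_2.fq idba_assembly 8. On Linux with GNU time installed, prefixing the idba_ud call with /usr/bin/time -v also reports the peak memory ("Maximum resident set size"), which helps size the machine for larger datasets.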

Full help output (idba_ud does not recognize --help, but it prints the complete usage message anyway):

idba_ud --help

idba_ud: unrecognized option '--help'

uknown option

IDBA-UD - Iterative de Bruijn Graph Assembler for sequencing data with highly uneven depth.

Usage: idba_ud -r read.fa -o output_dir

Allowed Options:

-o, --out arg (=out) output directory

-r, --read arg fasta read file (<=600)

--read_level_2 arg paired-end reads fasta for second level scaffolds

--read_level_3 arg paired-end reads fasta for third level scaffolds

--read_level_4 arg paired-end reads fasta for fourth level scaffolds

--read_level_5 arg paired-end reads fasta for fifth level scaffolds

-l, --long_read arg fasta long read file (>600)

--mink arg (=20) minimum k value (<=312)

--maxk arg (=100) maximum k value (<=312)

--step arg (=20) increment of k-mer of each iteration

--inner_mink arg (=10) inner minimum k value

--inner_step arg (=5) inner increment of k-mer

--prefix arg (=3) prefix length used to build sub k-mer table

--min_count arg (=2) minimum multiplicity for filtering k-mer when building the graph

--min_support arg (=1) minimum supoort in each iteration

--num_threads arg (=0) number of threads

--seed_kmer arg (=30) seed kmer size for alignment

--min_contig arg (=200) minimum size of contig

--similar arg (=0.95) similarity for alignment

--max_mismatch arg (=3) max mismatch of error correction

--min_pairs arg (=3) minimum number of pairs

--no_bubble do not merge bubble

--no_local do not use local assembly

--no_coverage do not iterate on coverage

--no_correct do not do correction

--pre_correction perform pre-correction before assembly
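The --mink, --maxk and --step options define the series of k values the assembler iterates over. The hypothetical helper below prints that series, assuming k starts at --mink and increases by --step while it stays at or below --maxk. It is only meant for planning parameter choices, so confirm the values against your own idba_ud run.

```shell
#!/usr/bin/env bash
# Hypothetical planning helper, not part of IDBA: list the k values implied by
# --mink/--maxk/--step, ASSUMING k runs mink, mink+step, ... while k <= maxk.
# Defaults match the help text above (20, 100, 20).
k_schedule() {
  local mink=${1:-20} maxk=${2:-100} step=${3:-20}
  if (( mink > maxk || step <= 0 )); then
    echo "invalid k range: mink=$mink maxk=$maxk step=$step" >&2
    return 1
  fi
  local k out=""
  for (( k = mink; k <= maxk; k += step )); do
    out+="${out:+ }$k"                     # space-separate after the first value
  done
  echo "$out"
}
```

With the defaults, k_schedule prints 20 40 60 80 100; k_schedule 20 120 25 prints 20 45 70 95 120.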


Originally published on 金钥匙 on 2024-04-23.
Original link: https://www.51969.com/post/18472980.html
