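    Biochips (5)


    Posted by: teddy (沈小聪聪), Board: Bioinformatics
    Subject: Biochips (5)
    Posting site: 北大未名站 (Wednesday, 14 March 2001, 16:01:19), in-site message

    Posted by: Teddy (real), Board: Electronics
    Subject: Data analysis (repost)
    Posting site: 大话西游站 (Wednesday, 14 March 2001, 15:12:05), in-site message

    [The following text is reposted from the Classroom board]
    [Originally posted by Teddy]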

    Olga Ermolaeva1,2, Mohit Rastogi3, Kim D. Pruitt2, Gregory D. Schuler2,
    Michael L. Bittner1, Yidong Chen1, Richard Simon4, Paul Meltzer1,
    Jeffrey M. Trent1 & Mark S. Boguski2,3

    Microarray technology makes it possible to simultaneously study the
    expression of thousands of genes during a single experiment. We have
    developed an information system, ArrayDB, to manage and analyse
    large-scale expression data. The underlying relational database was
    designed to allow flexibility in the nature and structure of data
    input and also in the generation of standard or customized reports
    through a web-browser interface. ArrayDB provides varied options for
    data retrieval and analysis tools that should facilitate the
    interpretation of complex hybridization results. A sampling of ArrayDB
    storage, retrieval and analysis capabilities is available
    (www.nhgri.nih.gov/DIR/LCG/15K/HTML), along with information on a set
    of approximately 15,000 genes used to fabricate several widely used
    microarrays. Information stored in ArrayDB is used to provide
    integrated gene expression reports by linking array target sequences
    with NCBI's Entrez retrieval system, UniGene and KEGG pathway views.
    The integration of external information resources is essential in
    interpreting intrinsic patterns and relationships in large-scale gene
    expression data.

    Our modern concept of gene expression dates to 1961, when messenger
    RNA was discovered, the genetic code deciphered and the theory of
    genetic regulation of protein synthesis described1-3. The first
    attempts at global surveys of gene expression were undertaken in the
    mid-1970s. Kinetic studies of the hybridization of mRNA pools with
    radioactively labelled cDNA produced the general concepts of varying
    mRNA abundance classes that are related to the functional class
    (structural, catalytic and so on) of the translated proteins4,5.
    These experiments also provided insight into: (i) the number of
    members of these classes; (ii) the presence of a large number of
    ubiquitously expressed (‘house-keeping’) genes thought to be
    necessary for the structural and functional integrity of all cell
    types; and (iii) the existence of significant numbers of genes that
    are apparently cell-type-specific. This period coincided with the
    establishment and popularization of the phrase ‘gene expression’
    through its usage in the titles of a series of influential books6-9.
    Interest in gene expression increased steadily during the 1980s, as
    shown by the fact that the frequency of usage of the phrase increased
    more than 10-fold in the titles of publications over this decade
    (unpub. obs.).

    In the 1990s, a new era of gene expression studies has unfolded as a
    result of data sufficiency (that is, complete genomes or comprehensive
    cDNA surveys) and technological advances10-12. As a consequence of
    large-scale DNA sequencing activities, there are now more DNA
    sequences in GenBank than there are related publications in the
    literature (Fig.1).

    Thus, we have reached a turning point in biomedical research: in the
    past we have had many publications about a relatively small number of
    genes, whereas now, and in the future, single publications will begin
    to encompass aspects of thousands of genes12-17. Large-scale study of
    gene expression is a hallmark of the transition from ‘structural’ to
    ‘functional’ genomics18, where knowing the complete sequence of a
    genome is only the first step in understanding how it works.

    There are several new technologies for studying the simultaneous
    expression of large numbers of genes. These technologies may be
    generally divided into serial and parallel methods. The serial methods
    involve direct, large-scale sequencing of cDNA (for review, see
    ref.19); the parallel approaches are based on hybridization to cDNA
    immobilized on glass (termed ‘microarrays’; ref.11) or to synthetic
    oligonucleotides immobilized on silica wafers or ‘chips’ (termed
    ‘probe arrays’; refs 10,20). In both parallel methods, hybridized
    probes are detected using incorporated fluorescent nucleotide analogs.
    These methods are the conceptual descendants of filter-immobilized
    targets detected by radioactive probes21,22, and filter-based
    technology is undergoing a renaissance as a low-cost alternative to
    the newer methods. Regardless, arrays of hybridization targets,
    generated at high density in small areas (for example, 10,000 cDNAs on
    a 2×2 cm filter or glass slide) are now commonly referred to as
    microarrays. Detailed discussion of these technologies is beyond the
    scope of this article (see
    http://www.ncbi.nlm.nih.gov/ncicgap/expression_tech_info.html and
    http://www.nhgri.nih.gov/), but we note that bioinformatics needs are
    similar and equally essential for all methods.

    Fig.1 Cumulative growth of molecular biology and genetics literature
    (blue) compared with DNA sequences (green). Articles in the "G5"
    (molecular biology and genetics) subset of MEDLINE are plotted
    alongside DNA sequence records in GenBank over the same time period.
    The former data was obtained with the help of R.M. Woodsmall of NCBI
    and the latter data is available
    (ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt). No attempt has been made
    to eliminate data redundancy among either the DNA sequence records or
    information contained in the literature.

    Box 1 · The 10K/15K gene sets

    The initial resources required to design and fabricate gene expression
    microarrays include cDNA sequence data, cDNA clones, or both.
    Identification of genes and clones of interest is problematic due to
    the quantity and redundancy of sequence data available. Some problems
    associated with the large-scale application of genome resources have
    been faced before in the context of building a transcript map of the
    human genome24, and databases consisting of non-redundant collections
    of human and mouse genes and ESTs have been developed25. The UniGene
    collection of human sequences (http://www.ncbi.nlm.nih.gov/UniGene/)
    currently represents more than 45,000 genes and it is possible to
    fabricate arrays containing this entire collection. Initial work in
    our laboratories focused on a smaller, but still significant subset of
    approximately 10,000-15,000 transcribed human sequences referred to as
    the 10K and 15K sets, originally conceived by P. Brown, J.M.T. and
    M.L.B., developed by G.S. and arrayed by J. Hudson. Detailed
    information on the composition of these sets is available
    (http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/). Briefly, the sets were
    designed to include a selection of human genes of known function, ESTs
    on the human transcript map24, ESTs with significant similarities to
    genes in other organisms and some handpicked genes of specific
    research interest.

    Although a great deal of effort has gone into the development of the
    enabling technologies, relatively little attention has been paid to
    the computational biology underlying data analysis and
    interpretation. We describe here some general aspects of gene
    expression informatics as well as our specific implementation of an
    integrated data management and analysis system (ArrayDB), designed as
    a database-backed web site23. Informatics plays an important role at
    every step in the process, from the design of arrays through
    laboratory information management, to the processing and
    interpretation of experimental results. We also discuss the role of
    the public database in this new era of biomedical research.

    Design considerations

    Array-based experiments aim to simultaneously catalogue the expression
    behaviour of thousands of genes in a single experiment. It is also
    expected that comparisons will be carried out across tissues,
    developmental and pathological states, or as temporal responses
    following a defined alteration to cells or their environment. Such
    experiments require the ability to manage large quantities of data
    both before and after the experiment. The design and construction of
    arrays that will detect gene expression requires direct access to all
    sequences, annotations and physical DNA resources for genes of an
    organism (Box 1).

    Following hybridization and readout of relative expression levels
    observed in various sites on an array, the data collected must be
    stored and preserved in a way that makes it readily available for
    image processing26 and statistical and biological analysis. The latter
    includes identifying the changing and unchanging levels of expression
    and correlating these changes to identify sets of genes with similar
    profiles. Easy access to existing biological knowledge of gene
    function and interaction is necessary to fully interpret the
    biological implications of the observed patterns. An information
    system must also be flexible enough to accommodate new statistical
    data mining tools as they become available.

    Laboratory information management systems (LIMS)

    The successful use of large-scale functional genomics technologies
    depends on robust and efficient systems for tracking and managing
    material and information flow. An overview of the types of practical
    problems addressed by our LIMS is shown (Fig.2). The individual
    components and detailed design of a LIMS are tied to specific
    laboratory environments, particularly for those technologies still
    under development, but some general principles have guided our work.
    These include the use of an industry-standard relational database
    management system combined with platform-independent web browser
    interfaces for data entry and retrieval23.

    The microarray LIMS, ArrayDB, was developed to store, retrieve and
    analyse microarray experiment information. The ArrayDB system
    integrates the multiple processes involved in microarray expression
    experiments, including data management, user interface, robotic
    printing, array scanning and image processing26. Data stored in the
    ArrayDB system includes information about the experimental resources,
    experimental parameters and conditions, and raw and processed data.

    The design of ArrayDB allows for flexibility in the exact nature of
    the data stored. This design strategy permits data input from
    different sources. Most clone information stored in ArrayDB is
    extracted from UniGene (for example, sequence definition and accession
    number). However, the design accommodates addition of newly isolated
    clones for which accession numbers or meaningful names are not yet
    available. Many data input and processing tasks are automated.
    Software automatically scans a directory for new intensity data, which
    are uploaded into the database without requiring an operator's
    assistance. Additional automated processes were developed to
    facilitate integration of intensity data with clone data; for example,
    ArrayDB maintains the association between a spot on an image and all
    the data related to the clone located at that position on the
    microarray.

    Fig.2 Schematic overview of the ArrayDB information management system.
    The basic information in the database consists of arrays, ‘probes’
    and images. Arrays of specific cDNA clone inserts (and accompanying
    annotation) are as described in the text, Box 1 and the legend to
    Figure 3. The section labelled ‘probes’ signifies details of a
    particular experiment as described in the text. Details regarding
    image processing are provided elsewhere26. An ad hoc raw image format
    is used for processing, but this is converted to standard formats
    (JPEG, GIF) for subsequent analysis and display. A complete relational
    schema of the database is available on request.

    Fig.3 Screen captures of various data retrieval and analysis tools
    within ArrayDB. a, ArrayViewer histogram (additional details in text).
    b, ArrayViewer image and results. The ArrayViewer Java applet displays
    the scanned array image in the top window. Boxes and the ranking
    number are overlaid on the image for clones that have satisfied the
    query criteria; clones are ranked according to ascending ratio value.
    The boxed clones, and related quantitative data, are listed under the
    image. Quantitative data presented in the lower window include: the
    ranking number, IMAGE clone ID, the ratio, probe A intensity, probe B
    intensity, probe A pixel size, probe B pixel size and the clone title.
    c, ArrayViewer cluster report. The example shows a report for
    Tryptophanyl-tRNA synthetase. ‘Cl_id’ is an internal database
    identifier. The ‘Clone’ field contains the IMAGE clone identifier and
    is hyperlinked to the dbEST records containing the sequences of this
    clone. ‘Flags’ summarizes the criteria by which this sequence was
    included in the 10K/15K sets. ‘Txmap’ refers to the location of an
    STS derived from this sequence on the human transcript map24 and
    ‘Clust’ indicates the UniGene cluster containing this sequence
    (http://www.ncbi.nlm.nih.gov/UniGene). ‘EC’ contains the enzyme
    commission nomenclature number for this enzyme and ‘KEGG’ links it to
    the biochemical pathway reports available through the KEGG web site
    (http://www.genome.ad.jp/kegg/)27. ‘Pl/Row/Col’ refers to the
    microtitre plate and well from which the original clone was obtained.
    The ‘Genes’ field contains GenBank accession numbers for annotated
    (non-EST) versions of the sequence and the ‘3'EST’ and ‘5'EST’ fields
    contain GenBank accession numbers for all ESTs corresponding to the
    cDNA sequence. Lastly, the ‘Sequence’ field contains only those
    accession numbers referring to those EST sequences derived from the
    actual IMAGE clone insert selected for inclusion in the array.

    The web-based user interface to the ArrayDB system allows convenient
    retrieval of distinct types of information, ranging from clone data to
    intensity data to analysis results. ArrayDB supports database queries
    by different fields, such as clone ID, title, experiment number,
    sequence accession number, or microtiter plate number, with a
    resulting report of the relevant clone(s). Additional information
    about each clone is available through hypertext links to other
    databases such as dbEST, GenBank or UniGene. Furthermore, metabolic
    pathway information is also available through links to the Kyoto
    Encyclopedia of Genes and Genomes (KEGG) web site27.

    The inconsistency in gene nomenclature makes it more efficient and
    accurate to search for a gene of interest by doing a sequence
    similarity search. ArrayDB supports BLASTN searches against the
    10K/15K set so that anyone can quickly determine if a gene of interest
    is included on our arrays. Matches against individual sequences are
    linked to a 'cluster report' (Fig.3c), and from there to further
    annotation in external databases via hypertext links as described
    above.
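
    As a rough illustration of the field-based queries and the spot-to-
    clone association described above, the sketch below uses Python's
    built-in sqlite3 module. The table and column names, the clone
    identifiers, the accession string and the ratio values are all
    invented for the example; the actual ArrayDB schema is not given in
    the text and may differ substantially.

```python
import sqlite3

# Hypothetical, heavily simplified schema (NOT the real ArrayDB schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clone (
    clone_id  TEXT PRIMARY KEY,  -- e.g. an IMAGE clone identifier
    title     TEXT,
    accession TEXT,              -- may be NULL for newly isolated clones
    plate     INTEGER            -- microtiter plate number
);
CREATE TABLE spot (
    experiment INTEGER,
    row_idx    INTEGER,
    col_idx    INTEGER,
    clone_id   TEXT REFERENCES clone(clone_id),
    ratio      REAL,
    PRIMARY KEY (experiment, row_idx, col_idx)
);
""")

# Made-up rows, for illustration only.
conn.executemany("INSERT INTO clone VALUES (?, ?, ?, ?)", [
    ("clone-A", "Tryptophanyl-tRNA synthetase", "ACC0001", 3),
    ("clone-B", "newly isolated clone", None, 7),
])
conn.executemany("INSERT INTO spot VALUES (?, ?, ?, ?, ?)", [
    (1, 0, 0, "clone-A", 2.5),
    (1, 0, 1, "clone-B", 0.4),
])


def clones_matching(conn, field, value):
    """Return clone rows whose `field` equals `value`."""
    allowed = {"clone_id", "title", "accession", "plate"}
    if field not in allowed:  # whitelist, since the field name is interpolated
        raise ValueError(f"unsupported query field: {field}")
    sql = f"SELECT clone_id, title, accession, plate FROM clone WHERE {field} = ?"
    return conn.execute(sql, (value,)).fetchall()


def clone_at(conn, experiment, row_idx, col_idx):
    """Return (clone_id, title, ratio) for the spot at an array position."""
    sql = """SELECT c.clone_id, c.title, s.ratio
             FROM spot s JOIN clone c ON c.clone_id = s.clone_id
             WHERE s.experiment = ? AND s.row_idx = ? AND s.col_idx = ?"""
    return conn.execute(sql, (experiment, row_idx, col_idx)).fetchone()
```

    Keeping the spot position and the clone annotation in separate tables
    joined on the clone identifier is one simple way to let a position on
    the image lead to everything known about the clone printed there.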

     

    Data analysis

    The ultimate goal of ArrayDB is to identify patterns and relationships
    among intensity ratios both in individual experiments and across
    multiple experiments. The ArrayViewer tool supports retrieval and
    analysis of single experiments; MultiExperiment viewer supports
    analysis of data from multiple experiments. In addition, the option to
    download intensity data, images and some analysis results to a local
    disk adds flexibility to the end-user's analysis options: once
    downloaded, intensity data can be imported into other software
    packages for analysis.
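
    For instance, a downloaded table could be read back with a few lines
    of Python. This is only a sketch: the text does not specify the
    layout of the exported file, so the column names (clone_id, ratio,
    title) and the sample rows below are assumptions.

```python
import csv
import io


def read_ratio_table(handle):
    """Parse a tab-delimited export into a list of dicts.

    Assumes a header row containing a 'ratio' column (hypothetical name);
    that column is converted to float, other columns stay as text.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["ratio"] = float(row["ratio"])
        rows.append(row)
    return rows


# Made-up stand-in for a downloaded file.
sample = (
    "clone_id\tratio\ttitle\n"
    "clone-A\t2.5\tTryptophanyl-tRNA synthetase\n"
    "clone-B\t0.4\texample clone\n"
)
table = read_ratio_table(io.StringIO(sample))
```

    With a real file, the same function would be called on
    open(path, newline="") instead of the in-memory sample.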

    ArrayViewer facilitates identification of statistically significant
    hybridization results in single experiments. The data set for a single
    experiment includes intensity ratio data for two fluorescent
    hybridization probes. However, the inherent flexibility in the ArrayDB
    design strategy is compatible with results derived from single
    intensity (for example, radioactive probe) data. In the case of
    radioactive probes, a single image consists of the intensity data from
    two separate hybridization experiments using two different probes.
    Ratios of the intensity values obtained with each probe, for each
    clone, are determined and stored in the database. (The mathematical
    basis for our image analysis approach is reported elsewhere26.) The
    basic premise of ArrayViewer is that significant hybridization results
    can be determined from the ratio values. Therefore, ArrayViewer
    initially displays a histogram that is created on demand using the
    ratios stored in ArrayDB (Fig.3a).

    Fig.4 MultiExperiment viewer window. a, The main panel of the
    MultiExperiment viewer is divided into three sections. The left side
    is composed of the control panel where the query criteria are
    selected. One also selects the experiments to analyse and other
    filters such as keywords, minimum intensities and minimum pixel sizes.
    The data returned from a query can be downloaded in a tab-delimited
    text file by selecting 'download list' in this panel. The control
    panel can also be used to alter the y-axis format and scale of the
    data represented in the window on the right side. This window is a dot
    plot of the experimental data returned from the query. Selecting
    particular 'dots' with a mouse highlights the ratio data for that
    clone across all selected experiments in both the dot plot and the
    quantitative data in the lower right window. The lower right window
    displays the calibrated ratio of the returned genes (clones).
    Selecting the ranking number highlights that data in the dot plot. The
    IMAGE consortium clone ID is linked to the cluster reports (Fig.3,
    legend). Selecting ratio and title will launch a new window (b) that
    displays the red and green intensities and sizes for that clone. By
    selecting advanced options in the control panel, a new window (c) is
    launched that allows greater flexibility and control in defining a
    query. Greater precision is achieved by allowing one to specify
    experiments where only up-regulated clones or only down-regulated
    clones are of interest.

    From the ArrayViewer histogram, there are three basic ways to query
    the data and return information on the nature and expression of
    specific genes. The first method uses a confidence algorithm26.
    Querying by confidence values will return a list of those genes with
    statistically significant ratio values, that is, ratios that are less
    than a lower confidence limit or greater than an upper confidence
    limit. The default confidence value is 99%, but this can be changed
    and the lower and upper confidence limits re-calculated. The second
    method allows the user to select a range of ratios on the histogram
    and will return information on genes with expression ratios in this
    range. The last method is to simply view the image of the
    hybridization results and select spots in the array using a computer
    mouse or other pointing device. One can further refine the ArrayViewer
    query by adjusting optional filters for minimum intensity, maximum
    intensity, minimum size, or keyword.
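
    The confidence algorithm itself is described in ref.26 and is not
    reproduced here. Purely to illustrate the shape of such a query, the
    sketch below substitutes a much simpler rule: it assumes that the
    logarithms of the ratios are roughly normally distributed and flags
    ratios falling outside a two-sided interval around their mean. The
    record layout (clone, ratio, intensity) and the demo values are
    likewise invented.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_limits(ratios, confidence=0.99):
    """Lower and upper ratio limits under an ASSUMED log-normal model.

    This is a stand-in, not the algorithm of ref.26. Ratios must be > 0.
    """
    logs = [math.log(r) for r in ratios]
    mu, sigma = mean(logs), stdev(logs)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided quantile
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)


def query_outliers(spots, confidence=0.99, min_intensity=0.0):
    """Return spots whose ratio lies outside the confidence limits.

    `spots` is a list of dicts with 'clone', 'ratio' and 'intensity' keys
    (hypothetical layout). Spots below `min_intensity` are dropped first,
    mirroring the optional minimum-intensity filter. Results are sorted
    by ascending ratio.
    """
    kept = [s for s in spots if s["intensity"] >= min_intensity]
    if len(kept) < 2:  # a standard deviation needs at least two values
        return []
    lo, hi = confidence_limits([s["ratio"] for s in kept], confidence)
    hits = [s for s in kept if s["ratio"] < lo or s["ratio"] > hi]
    return sorted(hits, key=lambda s: s["ratio"])


# Made-up data: 19 spots near a ratio of 1 and one strongly shifted spot.
demo = [
    {"clone": f"c{i}", "ratio": 0.9 if i % 2 == 0 else 1.1, "intensity": 500.0}
    for i in range(19)
]
demo.append({"clone": "c19", "ratio": 20.0, "intensity": 100.0})

flagged = [s["clone"] for s in query_outliers(demo)]
```

    Lowering the confidence value narrows the interval and returns more
    clones; raising min_intensity removes weak spots before the limits
    are computed.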

    Query results are provided in a new window that displays the array
    image and a list of clones with their associated intensity data
    (Fig.3b). Additional information about each data point or clone can be
    obtained by clicking on the ranking number (red) or the clone ID
    number (blue), respectively. Selecting the ranking number opens a new
    window presenting a ×10 magnification of the hybridized target spot
    plus a reiteration of the hybridization result. Selecting the clone ID
    number opens a new window containing a cluster report for that clone
    (Fig.3c). Lastly, the data in the results window can be downloaded to
    a tab-delimited text file by clicking on 'Download List'.

    To realize the full potential of microarray expression analysis,
    MultiExperiment viewer was developed. This web-based tool enables
    users to query the database across multiple experiments to identify
    clones that share some pattern of expression across those experiments.
    For example, one can use this tool to identify genes that are
    up-regulated or down-regulated across a series of experiments. In
    addition, the user can track the behaviour of a particular gene or
    genes of interest by specifying key words from gene descriptions in
    the 15K set. Analysis results are presented in both a graphical and
    tabular format. Also provided is a download option of the result table
    to facilitate storage of results for future reference and/or
    additional analysis.

    The MultiExperiment viewer window (Fig.4) provides a control panel for
    selecting the query criteria, an area to display a dot plot of the
    query results and a section where the table of quantitative
    information is displayed. To develop the query, one must first select
    the experiments from the list in the upper left corner; several
    filters are also provided which enable the user to ‘fine-tune’ the
    query. The MultiExperiment viewer then queries the database to
    identify clones exhibiting ratios that meet the query requirements,
    returns the ratio for each clone and draws a dot plot of the results
    for each experiment selected. This provides a convenient method to
    identify clones with particularly high or low ratios in an
    experimental series, such as a time course. There are two ways to
    visualize the expression pattern shown by an individual clone across
    the selected set of experiments. The position of the clone is
    highlighted in the dot plot diagram (Fig.4, red boxes) for each
    experiment by either clicking on a desired spot in the diagram or by
    clicking on the ranking number (left column) of a clone with
    interesting quantitative data. As previously described, additional
    information about each gene product is readily available in the
    clone's cluster report (Fig.3c) via the hyperlinked clone ID column.
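
    A query of this kind can be sketched in a few lines of Python. The
    data layout (a mapping from clone to per-experiment ratios), the
    two-fold thresholds and the example values are assumptions made for
    illustration; they are not taken from the MultiExperiment viewer.

```python
def consistently_regulated(ratios_by_clone, experiments,
                           min_ratio=None, max_ratio=None):
    """Find clones whose ratio meets the limits in EVERY selected experiment.

    ratios_by_clone: {clone: {experiment: ratio}} (hypothetical layout).
    min_ratio keeps up-regulated clones, max_ratio keeps down-regulated
    ones. Clones not measured in all selected experiments are skipped.
    Returns {clone: [ratio per experiment, in the order given]}.
    """
    hits = {}
    for clone, by_exp in ratios_by_clone.items():
        values = [by_exp.get(e) for e in experiments]
        if any(v is None for v in values):
            continue
        if min_ratio is not None and not all(v >= min_ratio for v in values):
            continue
        if max_ratio is not None and not all(v <= max_ratio for v in values):
            continue
        hits[clone] = values
    return hits


# Made-up ratios for three experiments of a hypothetical time course.
ratios = {
    "clone-A": {1: 2.4, 2: 3.1, 3: 2.2},
    "clone-B": {1: 2.5, 2: 0.8, 3: 2.9},
    "clone-C": {1: 0.3, 2: 0.4, 3: 0.2},
    "clone-D": {1: 2.6, 2: 2.7},  # not measured in experiment 3
}

up = consistently_regulated(ratios, [1, 2, 3], min_ratio=2.0)
down = consistently_regulated(ratios, [1, 2, 3], max_ratio=0.5)
```

    The returned per-experiment lists are what a dot plot would draw, one
    column of points per selected experiment.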

    Box 2 · Public access to expression array data

    As large-scale gene expression data accumulates, public data access
    becomes a critical issue. What is the best forum for making the data
    accessible? Summaries and conclusions of individual experiments will,
    of course, be published in traditional peer-reviewed journals, but
    electronic access to full data sets is essential. There are three
    models for data publication: first, authors can make data available on
    their own web sites (for example,
    http://cmgm.stanford.edu/pbrown/explore)14; second, journals that
    publish the results of these studies can provide the complete data
    sets as electronic supplements (this approach fulfills the traditional
    archival responsibility of the literature); and the third approach is
    to submit the data to a centralized public data repository such as
    GenBank. The primary disadvantages of the first two models are that
    data is widely dispersed and lacks uniform structure and retrieval
    modalities. In addition, the first case is complicated further by an
    uncertain life span for the data and the second case incurs new
    expenses for curating and maintaining this data that journals may not
    wish to bear. Clearly, the successful history of public sequence
    databases provides an attractive model for the most efficient
    management of and convenient access to large-scale expression data.
    However, it would be highly desirable to arrive at some type of data
    format standards that are independent of particular expression
    technology.

    The comparison of data across multiple experiments requires a way of
    normalizing ratio results between experiments; to date, this has only
    been possible by using a single reference state as the source of one
    of the hybridization probe mixtures for all of the experiments to be
    compared. For example, such an approach has been used in comparing
    points along a time course, and in comparing multiple samples of a
    particular type of tumour (unpublished observations). In diauxic shift
    experiments14, the reference sample was cDNA prepared from yeast cells
    harvested at the first interval after inoculation. Although the use of
    such a reference comparator allows ratio comparisons within a series
    of experiments, there is clearly a need for a more broadly applicable
    reference standard to serve as a benchmark for all expression
    experiments. A number of microarray laboratories have given thought to
    formulating such a standard. An ideal standard would provide modest
    signals for every human gene, so that expression of any gene in the
    experimental probe could be assigned a reliable ratio value. The
    standard would also need to be readily and reproducibly generated and
    easily disseminated. Efforts to produce such a reference standard are
    underway.
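
    The arithmetic behind a shared reference is simple: if two samples A
    and B are each hybridized against the same reference R, then for a
    given gene A/B = (A/R) / (B/R), so the reference cancels out. The
    short sketch below applies this to one gene; the function name, the
    sample labels and the ratio values are invented for illustration.

```python
def relative_to(ratios_vs_reference, baseline):
    """Re-express sample/reference ratios relative to one chosen sample.

    ratios_vs_reference: {sample: ratio of that sample to the shared
    reference} for a single gene. Dividing by the baseline sample's ratio
    cancels the common reference term.
    """
    base = ratios_vs_reference[baseline]
    return {name: r / base for name, r in ratios_vs_reference.items()}


# Made-up ratios for one gene at three time points, each vs the reference.
vs_reference = {"t1": 2.0, "t2": 4.0, "t3": 1.0}
vs_t1 = relative_to(vs_reference, "t1")
# vs_t1 == {"t1": 1.0, "t2": 2.0, "t3": 0.5}
```

    This only works when every experiment in the series used the same
    reference, which is why comparisons across unrelated series call for
    a common standard.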

    Discussion

    Given the great potential of large-scale expression analysis, and
    biologists' desire to exploit this new technology, we anticipate a
    deluge of data soon. The capacity to ask questions and perform
    analyses across hundreds, thousands, or tens of thousands of
    experiments should dramatically enhance our ability to identify
    'fingerprints' of gene expression that exemplify particular diseases
    or other biological states. But first we will need to empirically
    define 'housekeeping' genes, identify reproducible artifacts and
    detect subtle patterns through the application of powerful statistical
    analysis techniques.

    This potential cannot be fully realized without efficient data
    management and analysis systems. ArrayDB provides a first-generation,
    convenient, flexible and extendable microarray data management and
    analysis system. Planned future extensions to ArrayDB include more
    sophisticated links between the database and external data sources and
    more powerful data mining capabilities. Currently, querying multiple
    databases such as NCBI's PubMed, GenBank, or dbEST databases can
    assemble a great deal of valuable information, but it can be a tedious
    and time-consuming process to repeatedly query each database for
    information on even a small number of genes. However, by fully
    exploiting the applications programming interfaces in the Entrez
    system, sophisticated 'executive summaries' can, in principle, be
    generated.

    Although these types of reports can be generated by the thoughtful
    integration of external data resources, the larger problem [...]
    itself. In the world outside of biological databases, the term 'data
    mining' has been applied to this type of knowledge discovery28.
    Because of the complexity of the data, data mining tools are essential
    to fully exploit the power of microarray expression analysis. Data
    mining tools, similar to mathematical techniques that identify
    patterns in complex data sets, will enable identification of multiple
    expression profiles in complex biological processes. This will provide
    a means to identify genes that share an expression profile, genes that
    are expressed in succession, or genes showing opposing expression
    profiles. For instance, cluster analysis29 of a time course experiment
    can identify different expression profiles exhibited by groups of
    genes. We are currently developing a data mining tool for the ArrayDB
    system to help address this need.
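
    To make the idea of grouping genes by expression profile concrete,
    here is a deliberately small sketch. It is not the method of ref.29
    and not the ArrayDB data mining tool: it uses Pearson correlation
    between time-course profiles and a greedy single-pass grouping, both
    chosen only for brevity. The gene names, the profile values (imagined
    as log ratios) and the 0.9 threshold are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_profiles(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first group whose seed profile
    correlates with it at or above `threshold`, else it seeds a new group.
    The result depends on input order; real cluster analysis is more
    careful than this.
    """
    groups = []  # list of (seed_profile, [gene names])
    for gene, profile in profiles.items():
        for seed, members in groups:
            if pearson(seed, profile) >= threshold:
                members.append(gene)
                break
        else:
            groups.append((profile, [gene]))
    return [members for _, members in groups]


# Made-up time-course profiles: two rising genes and two falling genes.
profiles = {
    "g1": [0.0, 1.0, 2.0, 3.0],
    "g2": [0.1, 1.1, 2.0, 3.2],
    "g3": [3.0, 2.0, 1.0, 0.0],
    "g4": [2.9, 2.1, 0.8, 0.1],
}
clusters = group_profiles(profiles)
```

    A strongly negative correlation, as between g1 and g3 here, is one
    simple way to spot the opposing profiles mentioned above.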
    --
    ※ Source: 大话西游站 dhxy.dhs.org [FROM: 大话西游站] --
    ※ Reposted: 大话西游站 dhxy.dhs.org [FROM: 大话西游站]

    --
    ※ Source: 北大未名站 bbs.pku.edu.cn [FROM: 159.226.61.251]


    Posted to the forum: 2004/9/23 2:05:00
     