Article: Re: question on large tables (>=800 million records, 10 G
[Board: Database] [Author: babycry], 2007-01-18 21:47:59

From: babycry (babycry), Board: Database
Subject: Re: question on large tables (>=800 million records, 10 G b
Posted: BBS 未名空间站 (Thu Jan 18 21:47:59 2007)



Thanks! I like this suggestion.

This is actually the approach we are currently using.
It is pretty ad hoc, but it saves a lot of software-engineering time.
We prefer to avoid software engineering, since we get no credit for doing it.
The current query time is normally 2~5 minutes,
which is too slow for web apps
but acceptable for data mining.

Since we never update or insert,
the data-integrity issue of keeping several copies of the same data
is not a problem.


If a database storage engine supported a data type like a record/row number
(numbered consecutively from 1 to the maximum number of rows in the table),
and if each record in a table had a fixed length,
then an index on this data type would cost zero storage,
and the access time to any record with a given row number would be constant.
This idea is valid for read-only tables.
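The idea above can be sketched in a few lines. This is a minimal illustration, not a real storage engine; the record layout (`RECORD_FMT`) and the helper names `write_rows`/`read_row` are assumptions made for the example:

```python
import struct

# With fixed-length records, the row number itself acts as a zero-storage
# index: the byte offset of row i is simply i * RECORD_SIZE, so any row
# can be fetched with one seek, regardless of table size.
RECORD_FMT = "<qd32s"                  # int64 key, float64 value, 32-byte tag
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def write_rows(path, rows):
    """Write (key, value, tag) tuples as fixed-length binary records."""
    with open(path, "wb") as f:
        for key, value, tag in rows:
            f.write(struct.pack(RECORD_FMT, key, value,
                                tag.encode().ljust(32, b"\x00")))

def read_row(path, rowno):
    """Constant-time access: seek straight to rowno * RECORD_SIZE."""
    with open(path, "rb") as f:
        f.seek(rowno * RECORD_SIZE)
        key, value, tag = struct.unpack(RECORD_FMT, f.read(RECORD_SIZE))
        return key, value, tag.rstrip(b"\x00").decode()
```

Note that this only works because every record has the same size; variable-length fields would force an explicit offset index, which is exactly the storage cost the scheme avoids.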


A Google search suggests that storage engines like
NitroEDB and BrightHouse look promising,
since they claim unique indexing techniques
and the ability to manage multi-billion-record tables.
They are scheduled to become available sometime in 2007.

http://solutions.mysql.com/engines.html

We hope those techniques will be free of charge for academic use,
and helpful for future data sets.




【 Quoting Assailant (反恐精英 勇救人质 拆弹专家): 】
: how is the data collected, or updated/inserted? do you get a data feed at
: certain time of the day, or is this going to be a static set of data you
: are working with?
: have you considered breaking the data into smaller groups of files, and
: load only the needed data into a temp table when requested.
: cabID_0000.txt
: cabID_0001.txt
: ....
: each of those files would contain all the data associated with that cabID.
: and when a request comes in, read the file(s) and write them to a table
: for ...................
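The per-cabID scheme quoted above can be sketched as follows. The file format (plain CSV) and the column layout (`cab_id`, `ts`, `fare`) are assumptions for illustration only; the original data layout is not specified:

```python
import csv
import sqlite3

def load_cab(conn, cab_id):
    """Read cabID_%04d.txt and load it into a temp table for querying.

    Each small file holds only one cab's rows, so a request touches
    one file instead of scanning the full multi-hundred-million-row table.
    """
    conn.execute("DROP TABLE IF EXISTS temp.trips")
    conn.execute("CREATE TEMP TABLE trips (cab_id INTEGER, ts TEXT, fare REAL)")
    with open("cabID_%04d.txt" % cab_id) as f:
        rows = [(int(c), t, float(fa)) for c, t, fa in csv.reader(f)]
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    return conn
```

A temp table is a reasonable fit here: it lives only for the session handling the request and is dropped automatically, so repeated requests never leave stale copies behind.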



--

※ 来源:·BBS 未名空间站 http://mitbbs.com·[FROM: 18.85.]
