当前在线人数16563
首页 - 分类讨论区 - 电脑网络 - 数据库版 -阅读文章
未名交友
[更多]
[更多]
文章阅读:Re: question on large tables (>=800 million records, 10 G
[同主题阅读] [版面: 数据库] [作者:babycry] , 2007年01月19日09:59:11
babycry
进入未名形象秀
我的博客
[上篇] [下篇] [同主题上篇] [同主题下篇]

发信人: babycry (babycry), 信区: Database
标  题: Re: question on large tables (>=800 million records, 10 G b
发信站: BBS 未名空间站 (Fri Jan 19 09:59:11 2007)


1.
Well, as I said, indexing large tables is not a good idea.
If an index cannot be all loaded into the memory,
then we are going to have O(log n) disk access,
compared with O(n) memory access.
This explains why data base servers require small index files and small
tables.

I can cite this link if you do not understand
why we normally do table scans for large tables.

http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/


2.
: >>Then the index file is going to be more than 5 G bytes, and cannot be
fit
: into the memory.
: Did you index those 5 columns plus the [record_id] column all together in
: one index?

You severely underestimated the size of an index file.
Let us estimate the size of the index file for ONLY ONE field:

redord_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY

According to the manual,

http://dev.mysql.com/doc/refman/5.0/en/estimating-performance.html

the size is 800 million * 8 * 3/2 = 9.6 G bytes.

I halved the size of the index file when I originally asked the question.

Just think how a B-tree works to understand the formula.
You can also serch mysql internals manual to have a better understanding
of the file format of an index file.

Of course an index file of cabId can be much smaller by suffix compression.
But we need to uncompress the file in the memory when we use it.

3.
I also wanted to emphasize that the following three indices are different:
* an index on (cabId, timestamp),
* an index on (timestamp, cabId),
* two indices on (timestamp) and on (cabId) respectively

The index on (cabId, timestamp) means first index on cabId, and
for records with the same cabId, index on timestamp.
The index on (cabId, timestamp) does not necesserily "accelerate"
an query conditioned on timestamp, assuming index CAN accelerate  queries.

4.
The easiest way to test the situation is to make such an table and
play with it.



【 在 tolive (Dream Weaver) 的大作中提到: 】
: >>Then the index file is going to be more than 5 G bytes, and cannot be
fit
: into the memory.
: Did you index those 5 columns plus the [record_id] column all together in
: one index?
: Assume a table with only 2 columns:
: cabId CHAR(8); record_id INT
: with the 800 million records in this table, do you think you can index the
: cabId and search by cabId in this table without problem?



--

※ 来源:·BBS 未名空间站 http://mitbbs.com·[FROM: 18.85.]

[上篇] [下篇] [同主题上篇] [同主题下篇]
[转寄] [转贴] [回信给作者] [修改文章] [删除文章] [同主题阅读] [从此处展开] [返回版面] [快速返回] [收藏] [举报]
 
回复文章
标题:
内 容:

未名交友
将您的链接放在这儿

友情链接


 

Site Map - Contact Us - Terms and Conditions - Privacy Policy

版权所有,未名空间(mitbbs.com),since 1996