From: wyr (遗忘小资), Board: Database
Subject: Re: question on large tables (>=800 million records, 10 G b
Posted from: BBS 未名空间站 (Sat Jan 20 10:12:43 2007), forwarded
customized hash algorithm to help you partition your data based on the
features of your query. I do not know how MySQL implements its algorithm.
Here is my 2 cents, based on my understanding of Teradata.
If you have a primary index (unique or not), start by trying to distribute
your data evenly into several segments using these columns (I am assuming
your query conditions are primarily based on these columns). Let's say 10
segments.
Then, based on your PK, you build a hash algorithm that maps every
combination of your PK column values to one of the 10 buckets.
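A minimal sketch of such a hash in Python, just to make the idea concrete. The function name `bucket_for`, the separator byte, the choice of BLAKE2 and the synthetic keys are all my own assumptions for illustration; this is not how Teradata or MySQL actually hash rows.

```python
import hashlib

NUM_BUCKETS = 10  # the "let's say 10" from the post

def bucket_for(pk_values, num_buckets=NUM_BUCKETS):
    """Map a tuple of PK column values to a bucket number in [0, num_buckets)."""
    # Join the column values with a separator that is unlikely to appear in
    # the data, so that ("ab", "c") and ("a", "bc") produce different keys.
    key = "\x1f".join(str(v) for v in pk_values).encode("utf-8")
    # A stable hash: unlike the built-in hash(), it gives the same value in
    # every process, so a row always lands in the same bucket.
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets

# Rough check of how evenly 100,000 synthetic two-column keys spread out.
counts = [0] * NUM_BUCKETS
for customer_id in range(100_000):
    counts[bucket_for((customer_id, "US"))] += 1
print(counts)
```

If the counts come out badly skewed on your real keys, that is the signal to pick different columns or a different hash.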
For each bucket, build a B-tree within the bucket.
You will have 10 copies of your partition search module, each of which
takes care of only 1 partition.
Now you have a structure that contains 10 partitions, where each partition
has its own B-tree as the index into its own data, plus 10 worker threads.
A query that accesses a specific record goes through your hash algorithm
to determine which partition it belongs to, and is then handed to a
worker thread that searches a smaller index file.
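The point lookup path could look roughly like the sketch below. It is only an illustration: a sorted list searched with `bisect` stands in for the per-partition B-tree, and the names `Partition`, `partition_for` and `lookup`, as well as the sample rows, are made up for this example.

```python
import bisect
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 10

def partition_for(pk):
    """Stable hash of the PK, reduced to a partition number."""
    digest = hashlib.blake2b(str(pk).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS

class Partition:
    """One partition: a sorted key list (stand-in for a B-tree) plus its rows."""

    def __init__(self):
        self.keys = []
        self.rows = []

    def insert(self, pk, row):
        i = bisect.bisect_left(self.keys, pk)
        self.keys.insert(i, pk)
        self.rows.insert(i, row)

    def search(self, pk):
        # Binary search over this partition's keys only.
        i = bisect.bisect_left(self.keys, pk)
        if i < len(self.keys) and self.keys[i] == pk:
            return self.rows[i]
        return None

partitions = [Partition() for _ in range(NUM_PARTITIONS)]
# One single-threaded executor per partition, so each worker thread only
# ever touches its own index.
workers = [ThreadPoolExecutor(max_workers=1) for _ in range(NUM_PARTITIONS)]

def lookup(pk):
    """Hash the PK, then hand the search to that partition's worker."""
    p = partition_for(pk)
    return workers[p].submit(partitions[p].search, pk).result()

# Load some sample rows before any lookups are issued.
for pk in range(1000):
    partitions[partition_for(pk)].insert(pk, {"id": pk, "amount": pk * 2})

print(lookup(123))   # {'id': 123, 'amount': 246}
print(lookup(5000))  # None
```

Each search runs against roughly one tenth of the keys, which is the whole point of the smaller index.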
If an aggregation query comes in that involves data across multiple
partitions, then, depending on the WHERE condition in the query, you may be
able to determine which partitions you need to send your request to.
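As a small sketch of that step: with hash partitioning, an equality or IN condition on the PK tells you exactly which partitions to visit, while anything else (a range condition, for example) generally has to go to all of them. The function `partitions_to_query` and its argument are hypothetical names for this example.

```python
import hashlib

NUM_PARTITIONS = 10

def partition_for(pk):
    """Stable hash of the PK, reduced to a partition number."""
    digest = hashlib.blake2b(str(pk).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS

def partitions_to_query(pk_in_list=None):
    """Return the sorted partition numbers a query has to visit.

    pk_in_list: PK values taken from an equality or IN condition, or None
    when the WHERE clause does not pin down individual PK values.
    """
    if pk_in_list is None:
        # No usable PK condition: every partition must be asked.
        return list(range(NUM_PARTITIONS))
    return sorted({partition_for(pk) for pk in pk_in_list})

print(partitions_to_query([17]))          # exactly one partition
print(partitions_to_query([17, 18, 19]))  # at most three partitions
print(partitions_to_query())              # all ten partitions
```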
Starting from here, use parallelism to make your application run faster.
Whether the cost is n(log n) or whatever, if you can divide the work
linearly across nodes and your n is not infinite, then it is still runnable.
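A rough sketch of the scatter-and-gather shape for an aggregation, assuming a SUM over an `amount` column. The data, the `pk % NUM_PARTITIONS` placement and the function names are invented for illustration. Note that in CPython, threads mostly show the structure here; for a real speedup on CPU-bound work you would put the partitions on separate processes or separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 10

# Hypothetical data: rows are (pk, amount), placed by pk % NUM_PARTITIONS
# only to keep the example short.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for pk in range(100_000):
    partitions[pk % NUM_PARTITIONS].append((pk, pk % 7))

def partial_sum(rows):
    """Work done by one worker, on its own partition only."""
    return sum(amount for _, amount in rows)

def parallel_sum(parts):
    """Scatter one partial_sum per partition, then gather and combine."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return sum(pool.map(partial_sum, parts))

total = parallel_sum(partitions)
print(total)
```

Each worker only scans about n/10 rows, and the final combine step is just a sum of 10 numbers.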
【 babycry (babycry) wrote: 】
: Question # 1:
: Why build a customized B-tree/Hash table ?
: How is it different from the B-tree implementation in a database server?
: Why the B-tree/Hash table implemented in mysql server is NOT good ?
: How can a customized B-tree/Hash table benefit ?
: That somebody cannot drive a car from Boston to S.F. in one hour
: does not necessarily mean you can do it if you drive yourself.
: Question # 2:
: How will upgrading hardware make the application faster ...
: say from 5 minutes per query to 1 minute per query ?
※ Source: ·BBS 未名空间站 mitbbs.com·[FROM: 70.244.]