the hash approach of POSIX key

Wed May 23 15:16:01 UTC 2012

On Wed, May 23, 2012 at 10:21 AM, Ashi <ashi08104 at gmail.com> wrote:

>
>
> 2012/5/22 Gedare Bloom <gedare at rtems.org>
>
>> On Mon, May 21, 2012 at 9:17 PM, Ashi <ashi08104 at gmail.com> wrote:
>> >
>> >
>> > 2012/5/21 Gedare Bloom <gedare at rtems.org>
>> >>
>> >> On Mon, May 21, 2012 at 10:16 AM, Ashi <ashi08104 at gmail.com> wrote:
>> >> >
>> >> >
>> >> > 2012/5/21 Gedare Bloom <gedare at rtems.org>
>> >> >>
>> >> >> On Mon, May 21, 2012 at 9:30 AM, Ashi <ashi08104 at gmail.com> wrote:
>> >> >> > Hi, all.
>> >> >> > I'm working on the POSIX key project(here is a introduction of
>> >> >> > it:[1]).
>> >> >> > Since using hash is a good candidate of the solution, I'm trying
>> to
>> >> >> > do
>> >> >> > the
>> >> >> Hashing is a good candidate solution for what problem? It is not
>> clear
>> >> >> to me what the hash solves, that is why to use a hash at all?
>> >> >
>> >> >
>> >> > Sorry for the obscure description. Here, we use hash table to
>> dynamicly
>> >> > manage POSIX key, then the key can be added and deleted flexibly
>> without
>> >> > losing the speed of key query, add, delete operation. While key and
>> it's
>> >> > data field is fixed in current implementation, more description of
>> the
>> >> > problem of current POSIX implementation is in [1]. This hash
>> approach is
>> >> > an
>> >> > alternative to another approach(the rbtree approach), I have decide
>> >> > which is
>> >> > better.(Probably, both should be implemented).
>> >> >
>> >> A hash solves the problem of giving you a (unique) key based on some
>> >> input data. You then can use that key to map into a searching
>> >> structure in order to lookup a value associated with that key.
>> >
>> >
>> > If a proper hash function applied, then all key is well distributed in
>> the
>> > hash table,  can I search a value associated with that key by
>> traversing all
>> > values? It seems we can't do like that, but I'm not clear about the
>> exact
>> > reason.
>> >
>> >>
>> >> Regardless of whether you use a hash to create the key or derive it
>> >> from some other means, you still need to implement an efficient search
>> >> mechanism. I don't know of any structure better than a balanced search
>> >> tree to provide bounded worst-case performance times at least for
>> >> arbitrary keys.
>> >>
>> >> I wonder if you need to hash anything, or if you can rely on the
>> >> Objects_Id of the Posix_Keys_Control structure itself, or get the key
>> >> from the user. Then the map function can just use the Objects_Id as
>> >> the search key, or for an score implementation can accept a key as an
>> >> argument.
>> >
>> > Is hash not necessary? But this wiki page[3] mentioned we need hash. I
>> need
>> > help to make it clear.
>> >
>> I don't think a hash is necessary. It's not even clear to me what
>> would be hashed.
>
>
> As UThash said anything can be hashed, I think the pair of thread_id and
> POSIX_Keys_Control's Object.id is unique globally, we can use a structure
> contains this two member as the key.
>
> I think it might be good enough to use the POSIX_Keys_Control's Objects_Id
as a key. You could just use Objects_Id to obtain the POSIX_Keys_Control
containing the user's value by indexing directly in an Objects_Information
table. lookup time is O(1), memory requirements scale with the number of
keys, and we would rely on the Object Manager to handle all of the memory
management and data structures.


> The pthread_key_t *key is opaque to the application,
>> so the POSIX_Keys Manager can create any arbitrary key: the trick is
>> to create a unique key that can be used to access the stored value
>> efficiently.
>>
>> If you embed an Objects_Control in each POSIX_Keys_Control then you
>> can use the objects_id field as a key directly. Since every
>> Keys_Control would have a unique object_id, you satisfy one
>> requirement of the key.
>>
> The next requirement is efficient searching. I
>> wonder if you could just use the Object Manager directly. It already
>> implements a way to "get" an object given its id. So if the user is
>> given a pthread_key_t that is just a copy of an objects_id, then you
>> can rely on the Object Manager to handle storing/retrieving the
>> POSIX_Keys_Control for that key (objects_id). This should work fine
>> for posix keys; I'm not sure about notepads.
>>
> The notepad is removed from my project, Joel said investing effort in
> notepads is not worth.
>
> Well, if the POSIX Keys can be solved by using the Object Manager as I
describe above, then you will need to find something else to occupy your
time when you complete it :)


>
> >Searching by hash tables gives good average case performance, but please
> remember that RTEMS is supposed to be a real-time operating system.  Thus
> we are more interested in the worst-case performance and deterministic
> behaviour.
>
> I try to do the analysis on the worst-case performance:
> for the searching of key's value,  I think the worst case is when all hash
> keys are in one bucket in the hash table, then if we use the red black tree
> structure to do the searching, we can get the O(nlgn) performance. Is this
> idea viable? And after read the code of UThash, I find the
> search(HASH_FIND_IN_BKT
>
It would be O(lg n).


> function) in the UThash is done by iterating all item in one bucket. So if
> we modify this part of UThash, let it do searching in the red black tree
> structure, we can get the O(nlgn) worst-case performance. This worse-case
> performance is same as the previous rbtree approach of this project[4], but
> generally if all keys in hash table is well distributed(in the UThash
> doc[5], I find about 90% of keys are well distributed), the n of O(nlgn),
> which is equal to the number of keys in the same bucket, can be much
> smaller than the number of all key, say N. I think this can be the
> advantage of hash. I don't know whether this could be the reason of
> implementing the hash version of POSIX key?
>
> We'd have to weigh the costs of executing the hash function against the
frequency of searches, and also how expensive it would be to maintain one
rbtree per bucket. The latter I think could be bad for memory. Maybe you
could implement one rbtree that captures all of the collisions?


> [4]http://www.rtems.org/pipermail/rtems-devel/2012-May/000998.html
> [5]http://uthash.sourceforge.net/userguide.html
>
>>
>> -Gedare
>>
>> > links:
>> > [3]http://www.rtems.com/wiki/index.php/UseHashOrMapInNotepadsAndKeys
>> >>
>> >>
>> >> -Gedare
>> >>
>> >> >> > design using hash. I'm not clear about how to deal with UThash[2]
>> in
>> >> >> > RTEMS.
>> >> >> > Shall we directly include this lib or reimplemented in RTEMS? If
>> >> >> > assumed
>> >> >> > directly use, I think we can manage all keys in one hash table as
>> >> >> > like
>> >> >> > in
>> >> >> > the rbtree approach. Since UThash is able to use structure as it's
>> >> >> > hash
>> >> >> > key,
>> >> >> > then the POSIX_Keys_Control can be defined as:
>> >> >> >
>> >> >> > typedef struct {
>> >> >> >     Objects_Control Object;
>> >> >> >     void (*destructor) (void *);
>> >> >> >     hash_key_t hash_key; /* use struct hash_key_t as a hash key */
>> >> >> >     void *value;
>> >> >> >     UT_hash_handle hh;
>> >> >> > } POSIX_Keys_Control;
>> >> >> >
>> >> >> > the hash_key_t type is a structure include Object id and
>> >> >> > pthread_key_t:
>> >> >> >
>> >> >> > typedef struct {
>> >> >> >     Objects_Id thread_id;
>> >> >> >     pthread_key_t key;
>> >> >> > } hash_key_t;
>> >> >> >
>> >> >> > Then all the keys' data is managed in one hash table. I've a
>> question
>> >> >> > about
>> >> >> > this design: is it a waste of memory that every key data holds a
>> >> >> > POSIX_Keys_Control structure? If it is, I think we could manage
>> it in
>> >> >> > a
>> >> >> > two
>> >> >> > level structure: first level manages all keys by one hash table,
>> and
>> >> >> > each
>> >> >> > key manages its values in second level by one bash table:
>> >> >> >
>> >> >> > typedef struct {
>> >> >> >     Objects_Control Object;
>> >> >> >     Objects_Id Object.id;    /* use Object.id as hash key, I don't
>> >> >> > know
>> >> >> > whether we could directly use Object as a hash key? I know it's no
>> >> >> > problem
>> >> >> > in UThash  */
>> >> >> >     key_node_t * key_node;
>> >> >> >     UT_hash_handle hh;
>> >> >> > } POSIX_Keys_Control;
>> >> >> >
>> >> >> > typedef struct {
>> >> >> >    Objects_Id thread_id; /* use thread_id as hash key */
>> >> >> >    void *value;
>> >> >> >    void (*destructor) (void *);
>> >> >> >    UT_hash_handle hh;
>> >> >> > } key_node_t;
>> >> >> >
>> >> >> > And as Gedare suggested in my last discuss, the implementation
>> should
>> >> >> > consider do actually work in in the supercore layer and shared
>> >> >> > between
>> >> >> > Posix
>> >> >> > Keys and Classic Task vars, however, I haven't think about it yet,
>> >> >> > I'm a
>> >> >> > little behind my work.
>> >> >> >
>> >> >> >
>> >> >> > links:
>> >> >>   [1]: http://www.rtems.com/ml/rtems-users/2012/april/msg00110.html
>> >> >> > [2]:http://uthash.sourceforge.net/
>> >> >> >
>> >> >> > --
>> >> >> > Best wishes!
>> >> >> > zw_yao
>> >> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Best wishes!
>> >> > zw_yao
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > rtems-devel mailing list
>> >> > rtems-devel at rtems.org
>> >> > http://www.rtems.org/mailman/listinfo/rtems-devel
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > Best wishes!
>> > zw_yao
>> >
>>
>
>
>
> --
> Best wishes!
> zw_yao
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rtems.org/pipermail/devel/attachments/20120523/76480a55/attachment-0001.html>