the hash approach of POSIX key

Thu May 24 09:23:07 UTC 2012

2012/5/23 Gedare Bloom <gedare at rtems.org>

>
>
> On Wed, May 23, 2012 at 10:21 AM, Ashi <ashi08104 at gmail.com> wrote:
>
>>
>>
>> 2012/5/22 Gedare Bloom <gedare at rtems.org>
>>
>>> On Mon, May 21, 2012 at 9:17 PM, Ashi <ashi08104 at gmail.com> wrote:
>>> >
>>> >
>>> > 2012/5/21 Gedare Bloom <gedare at rtems.org>
>>> >>
>>> >> On Mon, May 21, 2012 at 10:16 AM, Ashi <ashi08104 at gmail.com> wrote:
>>> >> >
>>> >> >
>>> >> > 2012/5/21 Gedare Bloom <gedare at rtems.org>
>>> >> >>
>>> >> >> On Mon, May 21, 2012 at 9:30 AM, Ashi <ashi08104 at gmail.com> wrote:
>>> >> >> > Hi, all.
>>> >> >> > I'm working on the POSIX key project(here is a introduction of
>>> >> >> > it:[1]).
>>> >> >> > Since using hash is a good candidate of the solution, I'm trying
>>> to
>>> >> >> > do
>>> >> >> > the
>>> >> >> Hashing is a good candidate solution for what problem? It is not
>>> clear
>>> >> >> to me what the hash solves, that is why to use a hash at all?
>>> >> >
>>> >> >
>>> >> > Sorry for the obscure description. Here, we use hash table to
>>> dynamicly
>>> >> > manage POSIX key, then the key can be added and deleted flexibly
>>> without
>>> >> > losing the speed of key query, add, delete operation. While key and
>>> it's
>>> >> > data field is fixed in current implementation, more description of
>>> the
>>> >> > problem of current POSIX implementation is in [1]. This hash
>>> approach is
>>> >> > an
>>> >> > alternative to another approach(the rbtree approach), I have decide
>>> >> > which is
>>> >> > better.(Probably, both should be implemented).
>>> >> >
>>> >> A hash solves the problem of giving you a (unique) key based on some
>>> >> input data. You then can use that key to map into a searching
>>> >> structure in order to lookup a value associated with that key.
>>> >
>>> >
>>> > If a proper hash function applied, then all key is well distributed in
>>> the
>>> > hash table,  can I search a value associated with that key by
>>> traversing all
>>> > values? It seems we can't do like that, but I'm not clear about the
>>> exact
>>> > reason.
>>> >
>>> >>
>>> >> Regardless of whether you use a hash to create the key or derive it
>>> >> from some other means, you still need to implement an efficient search
>>> >> mechanism. I don't know of any structure better than a balanced search
>>> >> tree to provide bounded worst-case performance times at least for
>>> >> arbitrary keys.
>>> >>
>>> >> I wonder if you need to hash anything, or if you can rely on the
>>> >> Objects_Id of the Posix_Keys_Control structure itself, or get the key
>>> >> from the user. Then the map function can just use the Objects_Id as
>>> >> the search key, or for an score implementation can accept a key as an
>>> >> argument.
>>> >
>>> > Is hash not necessary? But this wiki page[3] mentioned we need hash. I
>>> need
>>> > help to make it clear.
>>> >
>>> I don't think a hash is necessary. It's not even clear to me what
>>> would be hashed.
>>
>>
>> As UThash said anything can be hashed, I think the pair of thread_id and
>> POSIX_Keys_Control's Object.id is unique globally, we can use a structure
>> contains this two member as the key.
>>
>> I think it might be good enough to use the POSIX_Keys_Control's
> Objects_Id as a key. You could just use Objects_Id to obtain the
> POSIX_Keys_Control containing the user's value by indexing directly in an
> Objects_Information table. lookup time is O(1), memory requirements scale
> with the number of keys, and we would rely on the Object Manager to handle
> all of the memory management and data structures.
>

Oh, Gedare, I find if we do like you said, it seems that there is not much
work to do!
To make it clear, all approaches are listed here:

0. The current implementation:
the Keys_Control structure is as below, all POSIX Key is managed by Object
Manager
POSIX_typedef struct {
   Objects_Control     Object;
   void              (*destructor)( void * );
   void              **Values[ OBJECTS_APIS_LAST + 1 ];
}  POSIX_Keys_Control;
the problem of current implementation is:
 - all data(value) of one POXIS Key is pre-allocated, which leads to a
waste of memory, because not all POSIX thread will use POSIX Keys' value
 - since all data is pre-allocated, size of POSIX Keys' value is fixed,
which is a bug when POSIX threads are configured as "unlimited". Because
POSIX thread can be dynamicly added, while its POSIX Key can't.

1. Manage POSIX key data only by Object Manager:
As Gedare said above, then the Keys_Control structure is:
POSIX_typedef struct {
   Objects_Control     Object;
   void              (*destructor)( void * );
   void              *Value; /* not void **values as before */
}  POSIX_Keys_Control;
Each POSIX Key's Value is associated with a Objects_Control, then POSIX
Key's Value can be obtained by indexing directly in an Objects_Information
table, namely all POSIX Key's Value is managed by Objector Manager. No hash
table here, no rbtree here, and performace is O(1). As far as i can see,
there are problems:
 - it seems that the notion of POSIX Key is obscure now, there is only
Thread's value. Is it unacceptable to RTEMS?
 - I think each value associated with a Objects_Control object can be a
heavy overhead in memory usage

2. the rbtree approach:
As discussed before[4], the POSIX_Keys_Control can like this:
typedef struct {
          Objects_Control     Object;
          void     (*destructor)( void * );
         rtems_rbtree_node Node;
         Objects_Id thread_id;
         void *Value;
} POSIX_Keys_Control;
after notice approach 1 above, I realize this design is defective: this is
also can just be managed by Object Manger, because each value is associated
with on Objects_Control Object, then no rbtree is needed. I think this
approach can be modified to reduce memory overhead and still profit from
rbtree:
 - first, let the pthread_key_t get a unique value, maybe can be binded to
an Objects_Control structure
 - second, create new structure like this:
typedef struct {
        rtems_rbtree_node Node;
        void     (*destructor)( void * );
        Objects_Id thread_id;
        pthread_key_t key;
        void *Value;
}my_value_node;
   then the rbtree compare function can compare two node by comparing
thread_id and key. Then all the key's values are also managed in one rbtree
as previous discussed[4].
I think this approach's problem is:
 - we may need to weigh the memory usage of this approach and approach 1,
if approach 1 work well, then this approach is not neccesary.

3. the hash approach:
As Gedare said, hashing the POSIX key may not be neccesary, because if all
POSIX Keys associated with a Object_Control, then all keys can be managed
by Object Mananger in RTEMS. We can hash the POSIX key's value, use
thread_id as the hash key, like:
typedef struct {
       Objects_Id thread_id;
       void (*destructor) (void *);
       void *value;
      UT_hash_handle hh;
} key_value_t;
and to get better worst-case performance, We need modify the UThash's
behavior in find item in same hash table bucket, there are two ideas now:
 - manage one bucket's collisions in one rbtree
 - manage all buckets' collisions in one rbtree
I can't determine which is better now, maybe after implement these two
strategies, then do some test on them to determine which is better.
this approach's problem is:
 - same as approach 2, if approach 1 works well, this approach may also not
be neccesary.
 - this approach seems more complex than previous one, and the worst-case
performance is still same as approach 2, but I think we could get better
average performance than approach 2(but not sure about this). However, the
worst-case is more important in RTEMS.

This reply takes much more time than I expected. I think it also can be my
project's design summary now. Thanks for any comment on this! I hope we can
determine which approaches are viable or best soon, and I can start coding
earlier;-)

>
>> The pthread_key_t *key is opaque to the application,
>>> so the POSIX_Keys Manager can create any arbitrary key: the trick is
>>> to create a unique key that can be used to access the stored value
>>> efficiently.
>>>
>>> If you embed an Objects_Control in each POSIX_Keys_Control then you
>>> can use the objects_id field as a key directly. Since every
>>> Keys_Control would have a unique object_id, you satisfy one
>>> requirement of the key.
>>>
>> The next requirement is efficient searching. I
>>> wonder if you could just use the Object Manager directly. It already
>>> implements a way to "get" an object given its id. So if the user is
>>> given a pthread_key_t that is just a copy of an objects_id, then you
>>> can rely on the Object Manager to handle storing/retrieving the
>>> POSIX_Keys_Control for that key (objects_id). This should work fine
>>> for posix keys; I'm not sure about notepads.
>>>
>> The notepad is removed from my project, Joel said investing effort in
>> notepads is not worth.
>>
>> Well, if the POSIX Keys can be solved by using the Object Manager as I
> describe above, then you will need to find something else to occupy your
> time when you complete it :)
>
Yeah, of course!

>
>
>>
>> >Searching by hash tables gives good average case performance, but please
>> remember that RTEMS is supposed to be a real-time operating system.  Thus
>> we are more interested in the worst-case performance and deterministic
>> behaviour.
>>
>> I try to do the analysis on the worst-case performance:
>> for the searching of key's value,  I think the worst case is when all
>> hash keys are in one bucket in the hash table, then if we use the red black
>> tree structure to do the searching, we can get the O(nlgn) performance. Is
>> this idea viable? And after read the code of UThash, I find the
>> search(HASH_FIND_IN_BKT
>>
> It would be O(lg n).
>
I will learn related knowledge more carefully !

> function) in the UThash is done by iterating all item in one bucket. So if
>> we modify this part of UThash, let it do searching in the red black tree
>> structure, we can get the O(nlgn) worst-case performance. This worse-case
>> performance is same as the previous rbtree approach of this project[4], but
>> generally if all keys in hash table is well distributed(in the UThash
>> doc[5], I find about 90% of keys are well distributed), the n of O(nlgn),
>> which is equal to the number of keys in the same bucket, can be much
>> smaller than the number of all key, say N. I think this can be the
>> advantage of hash. I don't know whether this could be the reason of
>> implementing the hash version of POSIX key?
>>
>> We'd have to weigh the costs of executing the hash function against the
> frequency of searches, and also how expensive it would be to maintain one
> rbtree per bucket. The latter I think could be bad for memory. Maybe you
> could implement one rbtree that captures all of the collisions?
>
>
>> [4]http://www.rtems.org/pipermail/rtems-devel/2012-May/000998.html
>> [5]http://uthash.sourceforge.net/userguide.html
>>
>>>
>>> -Gedare
>>>
>>> > links:
>>> > [3]http://www.rtems.com/wiki/index.php/UseHashOrMapInNotepadsAndKeys
>>> >>
>>> >>
>>> >> -Gedare
>>> >>
>>> >> >> > design using hash. I'm not clear about how to deal with
>>> UThash[2] in
>>> >> >> > RTEMS.
>>> >> >> > Shall we directly include this lib or reimplemented in RTEMS? If
>>> >> >> > assumed
>>> >> >> > directly use, I think we can manage all keys in one hash table as
>>> >> >> > like
>>> >> >> > in
>>> >> >> > the rbtree approach. Since UThash is able to use structure as
>>> it's
>>> >> >> > hash
>>> >> >> > key,
>>> >> >> > then the POSIX_Keys_Control can be defined as:
>>> >> >> >
>>> >> >> > typedef struct {
>>> >> >> >     Objects_Control Object;
>>> >> >> >     void (*destructor) (void *);
>>> >> >> >     hash_key_t hash_key; /* use struct hash_key_t as a hash key
>>> */
>>> >> >> >     void *value;
>>> >> >> >     UT_hash_handle hh;
>>> >> >> > } POSIX_Keys_Control;
>>> >> >> >
>>> >> >> > the hash_key_t type is a structure include Object id and
>>> >> >> > pthread_key_t:
>>> >> >> >
>>> >> >> > typedef struct {
>>> >> >> >     Objects_Id thread_id;
>>> >> >> >     pthread_key_t key;
>>> >> >> > } hash_key_t;
>>> >> >> >
>>> >> >> > Then all the keys' data is managed in one hash table. I've a
>>> question
>>> >> >> > about
>>> >> >> > this design: is it a waste of memory that every key data holds a
>>> >> >> > POSIX_Keys_Control structure? If it is, I think we could manage
>>> it in
>>> >> >> > a
>>> >> >> > two
>>> >> >> > level structure: first level manages all keys by one hash table,
>>> and
>>> >> >> > each
>>> >> >> > key manages its values in second level by one bash table:
>>> >> >> >
>>> >> >> > typedef struct {
>>> >> >> >     Objects_Control Object;
>>> >> >> >     Objects_Id Object.id;    /* use Object.id as hash key, I
>>> don't
>>> >> >> > know
>>> >> >> > whether we could directly use Object as a hash key? I know it's
>>> no
>>> >> >> > problem
>>> >> >> > in UThash  */
>>> >> >> >     key_node_t * key_node;
>>> >> >> >     UT_hash_handle hh;
>>> >> >> > } POSIX_Keys_Control;
>>> >> >> >
>>> >> >> > typedef struct {
>>> >> >> >    Objects_Id thread_id; /* use thread_id as hash key */
>>> >> >> >    void *value;
>>> >> >> >    void (*destructor) (void *);
>>> >> >> >    UT_hash_handle hh;
>>> >> >> > } key_node_t;
>>> >> >> >
>>> >> >> > And as Gedare suggested in my last discuss, the implementation
>>> should
>>> >> >> > consider do actually work in in the supercore layer and shared
>>> >> >> > between
>>> >> >> > Posix
>>> >> >> > Keys and Classic Task vars, however, I haven't think about it
>>> yet,
>>> >> >> > I'm a
>>> >> >> > little behind my work.
>>> >> >> >
>>> >> >> >
>>> >> >> > links:
>>> >> >>   [1]:
>>> http://www.rtems.com/ml/rtems-users/2012/april/msg00110.html
>>> >> >> > [2]:http://uthash.sourceforge.net/
>>> >> >> >
>>> >> >> > --
>>> >> >> > Best wishes!
>>> >> >> > zw_yao
>>> >> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Best wishes!
>>> >> > zw_yao
>>> >> >
>>> >> >
>>> >> > _______________________________________________
>>> >> > rtems-devel mailing list
>>> >> > rtems-devel at rtems.org
>>> >> > http://www.rtems.org/mailman/listinfo/rtems-devel
>>> >> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Best wishes!
>>> > zw_yao
>>> >
>>>
>>
>>
>>
>> --
>> Best wishes!
>> zw_yao
>>
>>
>

-- 
Best wishes!
zw_yao
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rtems.org/pipermail/devel/attachments/20120524/0445904f/attachment-0001.html>