[GSoC]use hash/map in POSIX key and Classic notepad

阿四 ashi08104 at gmail.com
Sun Apr 8 13:52:08 UTC 2012


2012/4/7 Joel Sherrill <joel.sherrill at oarcorp.com>

>  On 04/06/2012 11:24 AM, 阿四 wrote:
>
>
>
> 2012/4/6 Gedare Bloom <gedare at rtems.org>
>
>>  On Fri, Apr 6, 2012 at 11:27 AM, 阿四 <ashi08104 at gmail.com> wrote:
>> >
>> >
>> > 2012/4/6 Gedare Bloom <gedare at rtems.org>
>> >>
>> >> On Thu, Apr 5, 2012 at 12:52 PM, Joel Sherrill
>> >> <joel.sherrill at oarcorp.com> wrote:
>> >> > On 04/05/2012 11:09 AM, Gedare Bloom wrote:
>> >> >
>> >> > On Thu, Apr 5, 2012 at 10:44 AM, 阿四 <ashi08104 at gmail.com> wrote:
>> >> >
>> >> > And another idea of hash + map, if the map part is a sub rbtree, then
>> >> > the
>> >> > size of hash table can be not too big, the search operation could
>> >> > benifit
>> >> > from the rbtree's O(lg(n)) speed, could it be a candidate of the
>> >> > solution?
>> >> >
>> >> > You mean to use a hash-array with an rbtree to handle collisions?
>> >> >
>> >> > Yeah, I mean it.
>> >> >
>> >> > It
>> >> > could work, and be faster on average than just using an rbtree
>> without
>> >> > losing any worst-case time. The space overhead might be quite a bit
>> >> > more to have multiple rbtree control headers and also the extra
>> >> > overhead of rbtree nodes. I think that any rbtree- or chain-based
>> >> > approach shifts the problem of determining the number of Keys to be
>> >> > the problem of determining the number of nodes, since we cannot
>> assume
>> >> > that the Key data has a node embedded inside of it.
>> >> >
>> >> > Gedare, I didn't quite understand this. Does rbtree approach has the
>> >> > problem
>> >> > of determining the number of nodes? I think when the key_create()
>> >> > function
>> >> > called, only a empty rbtree create. And when key_setspecific()
>> function
>> >> > called, a node is added to rbtree. Or you mean create a 'proper' big
>> >> > rbtree
>> >> > when key_create(), which frontloads the insert node cost to
>> key_create()
>> >> > as
>> >> > you mention above? I understand when using the hash approach,
>> determing
>> >> > the
>> >> > number of slots(I'm confused with the Keys you use here, I guess Keys
>> >> > here
>> >> > is the slots in hash table) of hash table is the problem of
>> determining
>> >> > the
>> >> > number of nodes.
>> >> >
>> >> > It depends on the restrictions on key_create and key_setspecific; if
>> >> > creation is allowed to block/fail then dynamic allocation might be
>> ok.
>> >> > I haven't looked closely enough to tell. In that case you can create
>> >> > an rbtree on-the-fly by allocating it when there is a collision and
>> >> > inserting the colliding nodes to it. You could also just as easily
>> use
>> >> > a linked-list (chain). The colliding nodes will need to be embedded
>> >> > inside some other structure like
>> >> > struct {
>> >> >   RBTree_Node r; // or Chain_Node
>> >> >   void * user_data;
>> >> > }
>> >> >
>> >> > this structure would have to be allocated as well whenever inserting
>> a
>> >> > new key (setspecific).  We need to better define the restrictions on
>> >> > the keys in order to understand what kind of solutions we can use.
>> >> > Restrictions means timing and memory constraints, whether operations
>> >> > can block..maybe others.
>> >> >
>> >> >
>> >> >
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_setspecific.html
>> >> > allows pthread_setspecific to return ENOMEM. Is that a hint?
>> >> >
>> >> OK; then it just matters how 'fast' we think it should be on avg or in
>> >> worst-case.
>> >>
>> >> > I don't see why we don't just put a pointer to the Key in the tcb.
>> >> >
>> >> > Does it mean delete the POSIX Key data structure, and let the Thread
>> >> > Specific Data be managed in TCB?
>> >> >
>> >> > Yeah. We could define generic thread-specific data at the score
>> level.
>> >> > I'm not sure if this is a good idea.
>> >> >
>> >> > It isn't if generic. It could be part of the POSIX API thread
>> extension
>> >> > structure.
>> >> >
>> >> > But it helps with unlimited threads but not with unlimited keys
>> >> > if you use a simple array.
>> >> >
>> >> > But a hash/rbtree for "keys used by this thread" might be an
>> >> > implementation
>> >> > option. That is different than "threads using this key" which puts
>> >> > the management structure in keys.
>> >> >
>> >> Yes that makes sense. I could see having an rbtree of the keys in a
>> >> posix thread extension. This would also avoid problems with collisions
>> >> between threads, especially since threads are allowed to specify the
>> >> same key for different data. Holding the keys in separate per-thread
>> >> structures seems the best solution. If the number of keys is expected
>> >> to be small then a linked list/chain would be fine too.
>> >>
>> > Gedare, I haven't thought about the problem of collisions about same
>> thread
>> > specify the same key for different data. In current implementation, if
>> the
>> > same thread specify different data in the same key by
>> pthread_setspecific(),
>> > the old data is replaced by new data, isn't it? And is it allowed?
>> since it
>> > would lead to memory leak. However, I find this project can support this
>> > kind operation by adding some index.
>> >
>>  I was unclear: different threads can use the same key for different
>> data. So you must be able to distinguish them. This would be hard if
>> all keys are managed in a single structure, but if there is a
>> structure of keys per-thread it is less difficult.
>>
>
> Sorry, Gedare, I feel a structure of keys per-thread would make thing
> easy. However, I'm still confused with the hard of all keys are managed in
> a single structure. I need some help to make myself clear about key manager.
>
>  First, in current implementation I know the the key is identified by
> the_key->Object.id (I copy it from keycreate.c, line 96), and the_key's
> type is Key_Control, namely, the key is identified by a Object id, which is
> managed by Chain Manager. So keys can be distinguished by Object id. And
> for different thread data in the same key, they are distinguished by
> _Thread_Executing->Object.id (this is the current executing thread's TCB's
> object id). Then all things are distinguished in current implementation.
>
> Each key has a unique id and is used for a unique purpose. Think of
> it in terms of say a Java run-time where each Java thread is implemented
> as a POSIX thread. The run-time would use a key to hold a pointer to
> the Java run-time specific data.
>
> So a key holds a pointer to the same type of data for the same purpose.
> Each thread can have its own "instance of the library's context"
>
> No thread will have two values for the same key simultaneously.
> Each set operation overwrites the previous value:
>
> The scenarios of interest are:
>
> + key create (what do do)
> + key delete (clean up all references to this key)
> + key get specific - should be very fast
> + key set specific - not so fast because generally only
>    done once when you initialize a thread to use
>    a specific library
> + thread delete - if key data is held in the TCB,
>    then it needs to be freed.  Clean up generally.
>
Joel, thanks. This makes me more clear.

Another question about this projects. Since the accept result of GSoC is
not available until *April 20 at 19:01 UTC. Shall I continue discussing
this project on maillist before april 20. Since I know you may be very busy
dealing with* many GSoC proposals.



>
>
>  And in this project, like in the hash approach, I think different keys
> can be distinguished as before, and very key have a member pointer to hash
> table, and different thread's data in the same key is distinguish by its
> thread id (when in collisions, for different thread, their data can be
> distinguished by thread id again.) But for the different data of same
> thread in same key, I think they need extra information to be
> distinguished.
>
> Am I missed something?
> Or in this project, different keys can't be distinguished as before. I
> feel there's may be an unacceptiable cost of lookup with unlimited key
> configuration.
>
> Best Regard
> --zw_yao
>
>>
>> I suspect you can just use a chain (linked list) of keys for each
>> thread, with the chain control (linked list head) somewhere the thread
>> can get to easily. Converting the Key_Control to represent a single
>> key instead of all keys might be an easy way to go; it already embeds
>> a Chain_Node and some other metadata, and the Object allocation can
>> deal with limits on numbers of keys.
>>
>> >> > You also have to consider the case of deleting threads and deleting
>> >> > keys and what is required to clean up outstanding memory.
>> >> > It is simple now. But it is possible that you have a "management
>> >> > structure" in one object (key or thread) and a "clean up" structure
>> >> > on the other.
>> >> >
>> >> > I am sure you don't want to do a scan of all thread/keys and
>> >> > then see if it is impacted by a key/thread delete.
>> >> >
>> >> > I hope that makes sense.
>> >> > structure/list
>> >> >
>> >> >
>> >> > -Gedare
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Joel Sherrill, Ph.D.             Director of Research&  Development
>> >> > joel.sherrill at OARcorp.com        On-Line Applications Research
>> >> > Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>> >> >     Support Available             (256) 722-9985
>> >> >
>> >
>> >
>>
>
>
>
> --
> Joel Sherrill, Ph.D.             Director of Research&  Developmentjoel.sherrill at OARcorp.com        On-Line Applications Research
> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>     Support Available             (256) 722-9985
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rtems.org/pipermail/devel/attachments/20120408/0173940f/attachment-0001.html>


More information about the devel mailing list