Open
Labels
help wanted, question
Description
Does anyone have insight into the trade-offs of providing an instruction pointer (program counter) to each core, versus only to each compute unit?
Things that occur to me, in favor of only having a single program counter per compute unit:
- one fewer register needed for each core...
- probably easier to cache instructions
- easier to heuristically consolidate memory requests
- but a cache would handle that anyway: if two threads request data from the same cache line, the first thread to make the request loads that cache line, and the second one can use it too, as long as the threads don't diverge in time too much
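To make the memory-consolidation point concrete, here is a rough Python sketch (a toy model, not taken from any real GPU). The 64-byte line size, the LRU policy, and the function names `coalesced_requests` and `lru_misses` are all assumptions made for illustration. It compares merging the loads of threads that hit the same instruction together against relying on a small cache when threads arrive one at a time.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size, purely illustrative


def coalesced_requests(addresses, line_bytes=LINE_BYTES):
    """Memory requests needed when all threads issue the load together:
    one request per distinct cache line touched."""
    return len({addr // line_bytes for addr in addresses})


def lru_misses(addresses, capacity_lines, line_bytes=LINE_BYTES):
    """Misses in a tiny LRU cache when accesses arrive one at a time,
    in the given order (hypothetical model of threads drifting apart)."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses


# Eight threads each load one 4-byte value from consecutive addresses.
lockstep = [4 * t for t in range(8)]
# Two threads want line 0, but unrelated traffic arrives in between.
drifted = [0, 64, 128, 4]
```

In this model, `coalesced_requests(lockstep)` and `lru_misses(lockstep, 2)` both come out to a single request, which matches the "a cache would handle that anyway" point. With `drifted` and a two-line cache, the second access to line 0 misses again because the line was evicted in between, which is the "diverge in time too much" case.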
On the other hand, in favor of separate program counters in each core:
- increases potential parallelization; e.g. one thread might warm up the cache block when it hits the load first, and then the second thread to hit the load will have decreased latency
- easier to handle `if` statements and branching: each thread just executes what it needs to execute. No need for all threads to execute an `if` block that only one thread actually needs, and then throw away their results
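The branching point can be put into numbers with a similarly rough Python sketch. It assumes a shared program counter makes every thread in the compute unit step through each branch that at least one thread takes (with the others masked off), while per-core program counters let each thread run only its own path. The function names and the instruction counts are hypothetical, and the model ignores instruction fetch and cache costs, which is where the single-counter design is expected to win.

```python
def shared_pc_cycles(conditions, if_len, else_len):
    """Cycles for one if/else with a single program counter per compute unit.
    Each branch taken by at least one thread is stepped through by all of them."""
    cycles = 0
    if any(conditions):
        cycles += if_len      # someone needs the if block
    if not all(conditions):
        cycles += else_len    # someone needs the else block
    return cycles


def per_core_pc_cycles(conditions, if_len, else_len):
    """Cycles with one program counter per core: threads run independently,
    so the slowest single path decides. Assumes at least one thread."""
    return max(if_len if taken else else_len for taken in conditions)


# One thread out of four takes a 10-instruction if block;
# the other three take a 2-instruction else block.
conditions = [True, False, False, False]
```

Here `shared_pc_cycles(conditions, 10, 2)` gives 12 cycles, because both blocks are serialized. `per_core_pc_cycles(conditions, 10, 2)` gives 10, and the three threads on the short path are free after 2 cycles. When all threads agree on the branch, the two models give the same count.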