Total found: 707. Displayed: 188.
28-10-2010 дата публикации

CHTUNG UND COMPUTERSYSTEM UM EINE SCHLEIFE IN EINEM PROGRAMM ZU COMPILIEREN

Номер: DE602005023651D1
Принадлежит: NXP BV, NXP B.V.

Подробнее
15-10-2010 дата публикации

COMPOSITION COMPILATION, COMPOSITION DEVICE AND COMPUTER SYSTEM AROUND A LOOP IN A PROGRAM FOR COMPILING

Номер: AT0000481678T
Принадлежит:

Подробнее
15-08-1995 дата публикации

RELAY PROCEDURE FOR THE EXECUTION OF INTERLOCKED LOOPS IN MULTIPROCESSOR COMPUTERS.

Номер: AT0000125966T
Принадлежит:

Подробнее
20-09-2007 дата публикации

GENERAL PURPOSE SOFTWARE PARALLEL TASK ENGINE

Номер: CA0002707680A1
Принадлежит:

A software engine for decomposing work to be done into tasks, and distributing the tasks to multiple, independent CPUs for execution is described. The engine utilizes dynamic code generation, with run-time specialization of variables, to achieve high performance. Problems are decomposed according to methods that enhance parallel CPU operation, and provide better opportunities for specialization and optimization of dynamically generated code. A specific application of this engine, a software three dimensional (3D) graphical image renderer, is described.

Подробнее
06-07-2018 дата публикации

Parallel program generating method and parallelizing compiler apparatus

Номер: CN0108255492A
Автор:
Принадлежит:

Подробнее
18-11-2015 дата публикации

Hardware and software solutions to divergent branches in a parallel pipeline

Номер: CN0105074657A
Автор: YAZDANI REZA
Принадлежит:

Подробнее
28-08-2015 дата публикации

METHOD FOR SECURING A PROGRAM

Номер: SG11201505428TA
Принадлежит:

Подробнее
01-11-1990 дата публикации

PROCESS FOR DETERMINING CYCLES REPLACEABLE BY VECTOR COMMANDS OF A VECTOR COMPUTER INSIDE COMMAND LOOPS OF PROGRAMS WRITTEN FOR VON NEUMANN COMPUTERS

Номер: WO1990013081A1
Автор: RÖSSIG, Stephan
Принадлежит:

When programs written for Von Neumann computers are to be used on supercomputers, e.g. vector computers, the operations in the program which can be executed in parallel must be executed in parallel in the vector computers in order to exploit the capacity of the vector computers. These situations can then be introduced into the program if it contains program loops. A program loop which contains no cycles can be vectorized, i.e., executed by vector commands. If a program loop contains cycles, the vector computer must contain special commands for executing these cycles in parallel. A special command of this type can, for example, be a command for forming the vector sum or for forming an arithmetic series. The command loops are inspected for the presence of cycles which can be replaced by these special commands. To this end, the control flow and data flow determined by the commands in the loop are inspected and control dependencies and data dependencies between the scalar presence of variables ...

Подробнее
30-07-1991 дата публикации

Horizontal computer having register multiconnect for execution of a loop with overlapped code

Номер: US0005036454A1
Принадлежит: Hewlett-Packard Company

A horizontal computer for execution of an instruction loop with overlapped code. The computer includes a plurality of processors, a multiconnect unit for storing operands for the processors, an instruction unit for specifying address offsets and operations to be performed by the processors, and an invariant address unit for combining the address offsets with a modifiable pointer to form source and destination addresses in the multiconnect unit. The instruction unit enables different ones of the processors as a function of which iteration of the loop is being executed, for example by means of processor control circuitry or by selectively providing instructions to the processors, so that different operations are performed during different iterations.

Подробнее
30-08-2011 дата публикации

Parallel programming interface to dynamically allocate program portions

Номер: US0008010954B2

A computing device-implemented method includes receiving a program created by a technical computing environment, analyzing the program, generating multiple program portions based on the analysis of the program, dynamically allocating the multiple program portions to multiple software units of execution for parallel programming, receiving multiple results associated with the multiple program portions from the multiple software units of execution, and providing the multiple results or a single result to the program.

Подробнее
20-08-2013 дата публикации

Multiversioning if statement merging and loop fusion

Номер: US0008516468B2

In one embodiment of the invention, a method fuses a first loop nested in a first IF statement with a second loop nested in a second IF statement without the use of modified and referenced (mod-ref) information to determine whether certain conditional statements in the IF statements retain variable values.
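
The fusion described above can be pictured at the source level. The following C sketch is only an illustration of multiversioning with a fused loop and an unfused fallback; the function names, the guarded loops and the fusion condition are invented for the example and are not taken from the patent.

#include <stddef.h>

/* Original form: two loops, each nested in its own IF statement. */
void original(double *a, double *b, const double *c, size_t n, int p, int q)
{
    if (p) {
        for (size_t i = 0; i < n; i++)
            a[i] = c[i] * 2.0;
    }
    if (q) {
        for (size_t i = 0; i < n; i++)
            b[i] = c[i] + 1.0;
    }
}

/* Multiversioned form: when both guards hold, run a single fused loop;
 * otherwise fall back to the original structure. */
void multiversioned(double *a, double *b, const double *c, size_t n, int p, int q)
{
    if (p && q) {
        for (size_t i = 0; i < n; i++) {   /* fused version */
            a[i] = c[i] * 2.0;
            b[i] = c[i] + 1.0;
        }
    } else {
        original(a, b, c, n, p, q);        /* unfused fallback version */
    }
}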

Подробнее
06-03-2018 дата публикации

Method and apparatus for approximating detection of overlaps between memory ranges

Номер: US0009910650B2
Принадлежит: Intel Corporation, INTEL CORP

A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.

Подробнее
22-06-2023 дата публикации

APPARATUS AND METHOD WITH NEURAL NETWORK COMPUTATION SCHEDULING

Номер: US20230195439A1
Автор: Bernhard EGGER, Hyemi MIN

An apparatus includes a processor configured to generate each of intermediate representation codes corresponding to each of a plurality of loop structures obtained that corresponds to a neural network computation based on an input specification file of hardware; schedule instructions included in each of the intermediate representation codes corresponding to the plurality of loop structures; select, based on latency values predicted according to scheduling results of the intermediate representation codes, any one code among the intermediate representation codes; and allocate, based on a scheduling result of the selected intermediate representation code, instructions included in the selected intermediate representation code to resources of the hardware included in the apparatus.

Подробнее
11-05-2022 дата публикации

Parallel programming method

Номер: RU2771739C1

The invention relates to the field of computer engineering. The technical result is faster program execution on multi-core computing systems. A parallel programming method is disclosed in which parallelization facilities are selected automatically while programs are executed in a computing system comprising a host system and a target device connected by an interface, each of which contains a multi-core central processor, memory and an instruction cache, and which performs the following operations: the program source code is created on the host system; a report of the execution time of the program's loop regions is produced from a test run and stored in memory; the program code is analysed on the basis of the test runs, and the most time-consuming loop regions are identified and numbered; the source program code is modified by inserting into it additional markers for the beginning and end of the time-consuming loop regions and of the parallelization facilities applied to ...

Подробнее
18-08-2004 дата публикации

Scheduling of consumer and producer instructions in a processor with parallel execution units.

Номер: GB2398412A
Принадлежит: PTS Corp

A method of scheduling consumer instructions (c1 & c2) requiring a value produced by a producer instruction (p1) to execution units in a processor having a plurality of execution units comprises scheduling a consumer instruction in a loop kernel block before scheduling a producer instruction using a compiler. In operation the consumer instruction is allocated to a first execution unit before the producer instruction is allocated to a second execution unit. The selected execution unit may be the closest available execution unit to the first execution unit or in one embodiment the same execution unit. Preferably an interface block is used to create an interface between the basic and loop kernel block. Within the interface block a dummy instruction may be created by the scheduling of the consumer instruction to guide the scheduling of the producer instruction.

Подробнее
16-10-2017 дата публикации

Program loop control

Номер: TW0201737060A
Принадлежит:

A data processing system supports a predicated-loop instruction that controls vectorised execution of a program loop body in respect of a plurality of vector elements. When the number of elements to be processed is not a whole number multiple of the number of lanes of processing supported for that element size, then the predicated-loop instruction controls suppression of processing in one or more lanes not required.
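
A scalar C model of the behaviour described above may make the predication easier to see. VEC_LANES and the per-lane predicate are illustrative stand-ins for what the predicated-loop instruction does in hardware; this is not the instruction's actual encoding or semantics.

#include <stddef.h>

#define VEC_LANES 4   /* assumed number of lanes for this element size */

/* Scalar model of a predicated vector loop: each "vector iteration" handles
 * VEC_LANES elements, and a per-lane predicate suppresses the lanes that
 * fall beyond n in the final iteration. */
void scale(float *dst, const float *src, size_t n, float k)
{
    for (size_t base = 0; base < n; base += VEC_LANES) {
        for (size_t lane = 0; lane < VEC_LANES; lane++) {
            int predicate = (base + lane) < n;   /* lane active? */
            if (predicate)
                dst[base + lane] = src[base + lane] * k;
            /* inactive lanes do nothing: no loads, no stores, no faults */
        }
    }
}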

Подробнее
25-05-2021 дата публикации

Neural network operation reordering for parallel execution

Номер: US0011016775B2
Принадлежит: Amazon Technologies, Inc., AMAZON TECH INC

Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.

Подробнее
04-06-2015 дата публикации

METHODS TO OPTIMIZE A PROGRAM LOOP VIA VECTOR INSTRUCTIONS USING A SHUFFLE TABLE

Номер: US20150154008A1
Принадлежит:

According to one embodiment, a code optimizer is configured to receive first code having a program loop implemented with scalar instructions to store values of a first array to a second array based on values of a third array and to generate second code representing the program loop using at least one vector instruction. The second code includes a shuffle instruction to shuffle elements of the first array based on the third array using a shuffle table in a vector manner and a store instruction to store the shuffled elements of the first array in the second array.
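
The shuffle-table idea can be modelled in portable C. In the sketch below, the table maps a 4-lane condition mask to the permutation that packs the selected elements to the front, which is the role a SIMD shuffle would play in the generated vector code; the width of 4, the table layout and all names are assumptions made for the example, not the patent's implementation.

#include <stddef.h>
#include <stdint.h>

#define W 4   /* assumed vector width */

/* Shuffle table: for each 4-bit mask of "selected" lanes, the lane indices
 * that pack the selected elements to the front.  Must be built once before
 * compress_shuffled() is used. */
static uint8_t shuffle_table[16][W];
static uint8_t popcnt4[16];

static void build_shuffle_table(void)
{
    for (unsigned mask = 0; mask < 16; mask++) {
        unsigned k = 0;
        for (unsigned lane = 0; lane < W; lane++)
            if (mask & (1u << lane))
                shuffle_table[mask][k++] = (uint8_t)lane;
        popcnt4[mask] = (uint8_t)k;
    }
}

/* Scalar reference: the kind of loop the code optimizer starts from. */
size_t compress_scalar(int *b, const int *a, const int *c, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (c[i] != 0)
            b[out++] = a[i];
    return out;
}

/* Vector-style version: one table lookup replaces the per-element branch.
 * A real compiler would emit a SIMD shuffle here; this models it in scalar C. */
size_t compress_shuffled(int *b, const int *a, const int *c, size_t n)
{
    size_t out = 0;
    size_t i = 0;
    for (; i + W <= n; i += W) {
        unsigned mask = 0;
        for (unsigned lane = 0; lane < W; lane++)
            if (c[i + lane] != 0)
                mask |= 1u << lane;
        const uint8_t *perm = shuffle_table[mask];  /* "shuffle" control */
        for (unsigned k = 0; k < popcnt4[mask]; k++)
            b[out + k] = a[i + perm[k]];            /* packed store */
        out += popcnt4[mask];
    }
    for (; i < n; i++)                              /* remainder elements */
        if (c[i] != 0)
            b[out++] = a[i];
    return out;
}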

Подробнее
26-03-2015 дата публикации

Predicate Vector Pack and Unpack Instructions

Номер: US20150089189A1
Принадлежит: APPLE INC.

In an embodiment, a processor may implement a vector instruction set including predicate vectors and multiple vector element sizes. The vector instruction set may include predicate vector pack and unpack instructions. Responsive to the predicate vector pack instruction, the processor may pack predicates from multiple predicate vector source registers into a destination predicate vector register. Responsive to the predicate vector unpack instruction, the processor may select a portion of a source predicate vector register and write the result to a destination predicate vector register. Additionally, the predicate vector register may store one or more vector attributes associated with the corresponding vector. The processor may modify the attribute as part of the pack/unpack operation (e.g. based on a pack/unpack factor). Additionally, vector pack/unpack instructions that are controlled by the attribute in a corresponding predicate vector register may be implemented.

Подробнее
30-10-2013 дата публикации

LOOP PARALLELIZATION BASED ON LOOP SPLITTING OR INDEX ARRAY

Номер: EP2656204A2
Принадлежит:

Подробнее
16-04-2014 дата публикации

Interleaving data accesses issued in response to vector access instructions

Номер: GB0201403770D0
Автор:
Принадлежит:

Подробнее
07-02-2014 дата публикации

METHOD FOR OPTIMIZING PARALLEL PROCESSING OF DATA ON A HARDWARE PLATFORM

Номер: FR0002985824B1
Принадлежит: THALES

Подробнее
16-11-2013 дата публикации

Efficient implementation of RSA using GPU/CPU architecture

Номер: TW0201346830A
Принадлежит:

Various embodiments are directed to a heterogeneous processor architecture comprised of a CPU and a GPU on the same processor die. The heterogeneous processor architecture may optimize source code in a GPU compiler using vector strip mining to reduce instructions of arbitrary vector lengths into GPU supported vector lengths and loop peeling. It may be first determined that the source code is eligible for optimization if more than one machine code instruction of compiled source code under-utilizes GPU instruction bandwidth limitations. The initial vector strip mining results may be discarded and the first iteration of the inner loop body may be peeled out of the loop. The type of operands in the source code may be lowered and the peeled out inner loop body of source code may be vector strip mined again to obtain optimized source code.

Подробнее
27-05-2010 дата публикации

SYSTEMS, METHODS, AND APPARATUSES TO DECOMPOSE A SEQUENTIAL PROGRAM INTO MULTIPLE THREADS, EXECUTE SAID THREADS, AND RECONSTRUCT THE SEQUENTIAL EXECUTION

Номер: WO2010060084A3
Принадлежит:

Systems, methods, and apparatuses for decomposing a sequential program into multiple threads, executing these threads, and reconstructing the sequential execution of the threads are described. A plurality of data cache units (DCUs) store locally retired instructions of speculatively executed threads. A merging level cache (MLC) merges data from the lines of the DCUs. An inter-core memory coherency module (ICMC) globally retires instructions of the speculatively executed threads in the MLC.

Подробнее
08-01-2004 дата публикации

Apparatus and method for implementing adjacent, non-unit stride memory access patterns utilizing SIMD instructions

Номер: US20040006667A1
Автор: Aart Bik, Milind Girkar
Принадлежит:

An apparatus and method for implementing adjacent, single non-unit stride memory access patterns are described. In one embodiment, the method includes compiler analysis of a source program to detect vectorizable loops having serial code statements that collectively perform adjacent, non-unit stride memory access. Once a vectorizable loop containing code statements that collectively perform adjacent, non-unit stride memory access is detected, the compiler vectorizes the serial code statements of the detected loop to perform the adjacent, non-unit stride memory access utilizing SIMD instructions. As such, the compiler repeats the analysis and vectorization for each vectorizable loop within the source program code.

Подробнее
20-09-2016 дата публикации

Extracting system architecture in high level synthesis

Номер: US0009449131B2
Принадлежит: XILINX, INC., XILINX INC, Xilinx, Inc.

Extracting a system architecture in high level synthesis includes determining a first function of a high level programming language description and a second function contained within a control flow construct of the high level programming description. The second function is determined to be a data consuming function of the first function. Within a circuit design, a port including a local memory is automatically generated. The port couples a first circuit block implementation of the first function to a second circuit block implementation of the second function within the circuit design.

Подробнее
22-05-2018 дата публикации

Technologies for optimizing sparse matrix code with field-programmable gate arrays

Номер: US0009977663B2
Принадлежит: Intel Corporation, INTEL CORP

Technologies for optimizing sparse matrix code include a target computing device having a processor and a field-programmable gate array (FPGA). A compiler identifies a performance-critical loop in a sparse matrix source code and generates optimized executable code, including processor code and FPGA code. The target computing device executes the optimized executable code, using the processor for the processor code and the FPGA for the FPGA code. The processor executes a first iteration of the loop, generates reusable optimization data in response to executing the first iteration, and stores the reusable optimization data in a shared memory. The FPGA accesses the optimization data in the shared memory, executes additional iterations of the loop, and optimizes the additional iterations of the loop based on the optimization data. The optimization data may include, for example, loop-invariant data, reordered data, or alternate data storage representations. Other embodiments are described and ...

Подробнее
13-05-2014 дата публикации

Pipelined loop parallelization with pre-computations

Номер: US0008726251B2

Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.

Подробнее
14-05-2009 дата публикации

OPTIMUM CODE GENERATION METHOD FOR MULTIPROCESSOR, AND COMPILING DEVICE

Номер: JP2009104422A
Принадлежит:

PROBLEM TO BE SOLVED: To provide a method for generating an adequate parallel code from a source code to a computer system composed of a plurality of processors which share a cache memory or a main memory. SOLUTION: The following procedures are executed: a procedure (406) which reads a preliminarily set code and analyzes an operation amount and contents of processing from the code concerned while dependence and independence are being distinguished, a procedure (407) which analyzes the quantity of data which are reused between processings, and a procedure (408) which analyzes the volume of data in accessing the main memory. Then, a procedure (409) is executed, which receives a parallel code generating plan (412) input by a user, divides the processing of the code, and finds a parallelization method which makes an execution cycle shortest while forecasting an execution cycle from the operation amount, the contents of the processing, the cache usage of the reused data, and the main memory ...

Подробнее
11-06-2014 дата публикации

Interleaving data accesses issued in response to vector access instructions

Номер: GB0002508751A
Принадлежит:

A vector data access unit for accessing data stored within a data store in response to decoded vector data access instructions is disclosed. Each of the vector data access instructions comprise a plurality of elements indicating a data access to be performed, the elements being in an order within the vector data access instruction that the corresponding data access is instructed to be performed in. The vector data access unit comprises data access ordering circuitry for issuing data access requests indicated by the elements to the data store, the data access ordering circuitry being configured in response to receipt of at least two decoded vector data access instructions, an earlier of the at least two decoded vector data access instructions being received before a later of the at least two decoded vector instructions and one of the at least two decoded vector data access instructions being a write instruction and to an indication that data accesses from the at least two decoded vector ...

Подробнее
19-12-2001 дата публикации

Predicated execution of instructions in processors

Номер: GB2363480A
Принадлежит:

A processor, operable to execute instructions on a predicated basis, includes a series of predicate registers (135), a control information holding unit (131) and a plurality of operating units (133). Each predicate register (135) is switchable between at least two states and each is assignable to one or more predicated-execution instructions. The control information holding unit (131) holds items of control information which correspond respectively to the predicate registers, and each operating unit also corresponds individually to one of the predicate registers. Each operating unit receives the control-information items corresponding to its own corresponding predicate register and a further one of the predicate registers, to determine the state of its own predicate register. In one embodiment, the operating units are operable in parallel with one another to perform respective such state determining operations. The state determining operations can be used to bring about state changes required ...

Подробнее
05-06-2002 дата публикации

Predicated execution of instructions in processors

Номер: GB0002367406B
Принадлежит: SIROYAN LTD, * SIROYAN LIMITED

Подробнее
18-08-2004 дата публикации

Scheduling of consumer and producer instructions in a processor with parallel execution units.

Номер: GB2398411A
Принадлежит:

A method of scheduling consumer instructions (c1 and c2) requiring a value produced by a producer instruction (p1) to execution units in a processor having a plurality of execution units comprises scheduling a consumer instruction in a loop kernel block before scheduling a producer instruction using a compiler. In operation the consumer instruction is allocated to a first execution unit before the producer instruction is allocated to a second execution unit. The scheduling of the producer instruction also requires the creation of a move instruction (mv) to create an availability chain such that a value is moved from a first point accessible by the basic block to a second point accessible by the loop block. The point may be a register file accessible by one of the execution units. Preferably an interface block is used to create an interface between the basic and loop kernel block. Within the interface block a dummy instruction may be created by the scheduling of the consumer instruction ...

Подробнее
15-02-2023 дата публикации

Techniques for parallel execution

Номер: GB0002609700A
Принадлежит:

Identification of instructions for advanced execution, the instructions that have been identified by a compiler to be speculatively performed in parallel. The instructions may be identified based on copy operations and the instructions may be performed in response to receiving a command from another processor. The command may be a kernel launch command from a host computer system. The instructions may implement a portion of an inferencing operation using a recurrent neural network.

Подробнее
01-11-2018 дата публикации

Method and system for automated improvement of parallelism in program compilation

Номер: AU2013290313B2
Принадлежит: Phillips Ormonde Fitzpatrick

A method of program compilation to improve parallelism during the linking of the program by a compiler. The method includes converting statements of the program to canonical form, constructing abstract system tree (AST) for each procedure in the program, and traversing the program to construct a graph by making each non-control flow statement and each control structure into at least one node of the graph.

Подробнее
09-11-2010 дата публикации

GENERAL PURPOSE SOFTWARE PARALLEL TASK ENGINE

Номер: CA0002638453C

A software engine for decomposing work to be done into tasks, and distributing the tasks to multiple, independent CPUs for execution is described. The engine utilizes dynamic code generation, with run-time specialization of variables, to achieve high performance. Problems are decomposed according to methods that enhance parallel CPU operation, and provide better opportunities for specialization and optimization of dynamically generated code. A specific application of this engine, a software three dimensional (3D) graphical image renderer, is described.

Подробнее
22-04-2015 дата публикации

CODE VERSIONING FOR ENABLING TRANSACTIONAL MEMORY REGION PROMOTION

Номер: CA0002830605A1
Принадлежит: BERESKIN & PARR LLP/S.E.N.C.R.L.,S.R.L.

An illustrative embodiment of a computer-implemented process for code versioning for enabling transactional memory region promotion receives a portion of candidate source code and outlines the portion of candidate source code received for parallel execution. The computer-implemented process further wraps a critical region with entry and exit routines to enter into a speculation sub-process, wherein the entry and exit routines also gather conflict statistics at runtime. The outlined code portion is executed to determine to use a particular one of multiple loop versions according to the conflict statistics gathered at run time.

Подробнее
17-08-2016 дата публикации

Hierarchical loop instruction

Номер: CN0103530088B
Автор:
Принадлежит:

Подробнее
19-07-2013 дата публикации

METHOD FOR OPTIMIZING PARALLEL PROCESSING OF DATA ON A HARDWARE PLATFORM

Номер: FR0002985824A1
Принадлежит: THALES

The invention relates to a method of optimizing parallel processing of data on a hardware platform comprising at least one computation unit comprising a plurality of processing units able to execute a plurality of executable tasks in parallel, in which the data set to be processed is decomposed into data subsets, the same sequence of operations being performed on each data subset. The method of the invention comprises obtaining (50, 52) the maximum number of data subsets to be processed by the same sequence of operations and a maximum number of tasks executable in parallel by a computation unit of the hardware platform, determining (54) at least two processing splits, each processing split corresponding to splitting the data set into a number of data groups and to assigning at least one executable task, able to execute said sequence of operations, to each data subset of said data group, and selecting (60, 62) the processing split that yields an optimal measurement value according to a predetermined criterion. Programming code instructions implementing said selected processing split are then obtained. One use of the method of the invention is the selection of an optimal hardware platform according to a measure of execution performance.

Подробнее
02-03-2006 дата публикации

Method and system for auto parallelization of zero-trip loops through induction variable substitution

Номер: US2006048119A1
Принадлежит:

A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations when no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallelize the outermost loop is provided.
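
A minimal C sketch of the transformation, assuming a nested loop whose inner trip count n may be zero or negative: the nested basic induction variable is replaced by a closed form that uses max(0, n), after which the outer loop iterations no longer depend on one another. The code and names are illustrative, not the IBM compiler's actual output.

/* Before: a nested basic induction variable k couples the outer iterations,
 * which blocks parallelization of the outer loop. */
void before(double *a, int m, int n)
{
    int k = 0;
    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++) {   /* zero-trip if n <= 0 */
            k = k + 1;                   /* basic induction variable */
            a[k - 1] = 0.0;
        }
}

static int imax(int x, int y) { return x > y ? x : y; }

/* After induction variable substitution: k is expressed in closed form using
 * max(0, n) trips per outer iteration, so the iterations over i are
 * independent and the outer loop can be run in parallel. */
void after(double *a, int m, int n)
{
    int trips = imax(0, n);              /* zero-trip loops contribute 0 */
    for (int i = 1; i <= m; i++)         /* now parallelizable over i */
        for (int j = 1; j <= n; j++)
            a[(i - 1) * trips + (j - 1)] = 0.0;
}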

Подробнее
10-05-2005 дата публикации

Method for software pipelining of irregular conditional control loops

Номер: US0006892380B2

A method for software pipelining of irregular conditional control loops including pre-processing the loops so they can be safely software pipelined. The pre-processing step ensures that each original instruction in the loop body can be over-executed as many times as necessary. During the pre-processing stage, each instruction in the loop body is processed in turn (N 4 ). If the instruction can be safely speculatively executed, it is left alone (N 6 ). If it could be safely speculatively executed except that it modifies registers that are live out of the loop, then the instruction can be pre-processed using predication or register copying (N 7 , N 8 , N 9 ). Otherwise, predication must be applied (N 10 ). Predication is the process of guarding an instruction. When the guard condition is true, the instruction executes as though it were unguarded. When the guard condition is false, the instruction is nullified.
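
At the source level, guarding can be imitated with an explicit predicate, as in the C sketch below: every statement executes on every iteration, but the predicate nullifies its effect once the loop's logical exit condition has been reached, so over-execution by a pipelined schedule is harmless. The example loop and names are invented for illustration.

#include <stddef.h>

/* Source-level picture of predication: once "done" becomes true, later
 * iterations still run (they may be over-executed by the pipelined schedule),
 * but every guarded statement is nullified, so the extra executions are safe. */
size_t sum_until_negative(const int *a, size_t n, long *out_sum)
{
    long sum = 0;
    size_t count = 0;
    int done = 0;                      /* loop's logical exit state */

    for (size_t i = 0; i < n; i++) {
        int p = !done && a[i] >= 0;    /* guard condition for this iteration */
        sum   = p ? sum + a[i] : sum;  /* guarded: takes effect only when p */
        count = p ? count + 1  : count;
        done  = done || a[i] < 0;      /* live-out state updated monotonically */
    }
    *out_sum = sum;
    return count;
}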

Подробнее
15-02-2000 дата публикации

Method and apparatus for optimizing program loops containing omega-invariant statements

Номер: US6026240A
Автор:
Принадлежит:

Apparatus, methods, and computer program products are disclosed for optimizing programs containing single basic block natural loops with a determinable number of iterations. The invention optimizes, for execution speed, such program loops containing statements that are initially variant, but stabilize and become invariant after some number of iterations of the loop. The invention optimizes the loop by unwinding iterations from the loop for which the statements are variant, and by hoisting the stabilized statement from subsequent iterations of the loop.
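
A small C example of the optimization described above, under the assumption that the statement stabilizes after two iterations: the first two (variant) iterations are unwound, and the stabilized value is hoisted out of the remaining loop. The function and array names are illustrative only.

/* Before: "limit = a[i < 2 ? i : 2]" is variant for the first two iterations
 * and invariant afterwards (it always reads a[2] once i >= 2). */
long before(const long *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++) {
        long limit = a[i < 2 ? i : 2];   /* stabilizes at i == 2 */
        s += limit;
    }
    return s;
}

/* After: unwind the two iterations where the statement still varies, then
 * hoist the stabilized value out of the remaining loop. */
long after(const long *a, int n)
{
    long s = 0;
    for (int i = 0; i < n && i < 2; i++)   /* unwound, variant iterations */
        s += a[i];
    if (n > 2) {
        long limit = a[2];                 /* hoisted, now loop-invariant */
        for (int i = 2; i < n; i++)
            s += limit;
    }
    return s;
}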

Подробнее
25-07-1995 дата публикации

Method of generating from source program object program by which final values of variables for parallel execution are guaranteed

Номер: US0005437034A
Автор:
Принадлежит:

In a method of generating an object program for a multiprocessor system from a source program including a loop, a variable in the loop is detected. For the detected variable, first codes providing a one-dimensional work array are added to the source program. The work array has elements whose number is predetermined according to a maximum number of parallel processes to be generated for the loop and is shared among the parallel processes. It is determined whether or not the variable is used outside the loop. When it is determined that the variable is not used at any position outside the loop, the source program with the first codes added is compiled to produce the object program, thereby executing the loop in a parallel fashion by the parallel processes using the elements of the work array as a local variable associated with the variable.
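
The substitution can be sketched with OpenMP standing in for the parallel processes (an assumption made for the example; the patent does not prescribe OpenMP). The scalar temporary of the loop becomes one element of a shared work array per process, indexed by the process number, and because the temporary is not used after the loop no final value needs to be recovered.

#include <omp.h>

#define MAX_PROCS 64   /* assumed upper bound on parallel processes */

/* The scalar temporary of the original loop is replaced by one element of a
 * shared work array per parallel process.  Assumes the runtime uses at most
 * MAX_PROCS threads. */
void scale_rows(double *a, const double *row_max, int n, int m)
{
    double work[MAX_PROCS];                 /* shared one-dimensional work array */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int p = omp_get_thread_num();       /* this process's element */
        work[p] = 1.0 / row_max[i];         /* was: t = 1.0 / row_max[i]; */
        for (int j = 0; j < m; j++)
            a[i * m + j] *= work[p];        /* was: a[i*m+j] *= t;        */
    }
    /* work[] is not used after the loop, so no final value must be restored. */
}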

Подробнее
17-02-2005 дата публикации

Processors and compiling methods for processors

Номер: US2005039167A1
Автор:
Принадлежит:

A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit. When a second consumer instruction (c2), also requiring the same value, is scheduled for execution by an execution unit (23: EXU) other than the first execution unit, at least part of the first availability chain is reused to move the required value to a point (23: DRF) accessible by ...

Подробнее
29-10-2020 дата публикации

COMPILATION TO REDUCE NUMBER OF INSTRUCTIONS FOR DEEP LEARNING PROCESSOR

Номер: US20200341765A1
Принадлежит:

A method performed during execution of a compilation process for a program having nested loops is provided. The method replaces multiple conditional branch instructions for a processor which uses a conditional branch instruction limited to only comparing a value of a general register with a value of a special register that holds a loop counter value. The method generates, in replacement of the multiple conditional branch instructions, the conditional branch instruction limited to only comparing the value of the general register with the value of the special register that holds the loop counter value for the inner-most loop. The method adds (i) a register initialization outside the nested loops and (ii) a register value adjustment to the inner-most loop. The method defines the value for the general register for the register initialization and conditions for the generated conditional branch instruction, responsive to requirements of the multiple conditional branch instructions.
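
A rough source-level analogue of the replacement, with invented names and without modelling the special loop-counter register itself: the separate bound checks of a nested loop are collapsed into a single comparison against one counter that is initialised outside the nest and adjusted inside the innermost loop.

/* Before: two loop-back branches, one per nesting level. */
void before(float *a, int rows, int cols)
{
    for (int i = 0; i < rows; i++)        /* branch on i */
        for (int j = 0; j < cols; j++)    /* branch on j */
            a[i * cols + j] += 1.0f;
}

/* After: one counter, initialised outside the nest and adjusted in the
 * innermost loop, drives the single remaining loop-back comparison. */
void after(float *a, int rows, int cols)
{
    int remaining = rows * cols;          /* register initialisation outside the nest */
    int i = 0, j = 0;
    while (remaining > 0) {               /* the single loop-counter comparison */
        a[i * cols + j] += 1.0f;
        remaining--;                      /* register value adjustment in the inner loop */
        if (++j == cols) { j = 0; i++; }  /* index bookkeeping, no extra loop branch */
    }
}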

Подробнее
05-11-1999 дата публикации

VECTOR REGISTER CONTROL PROCESSOR AND RECORDING MEDIUM

Номер: JP0011306167A
Принадлежит:

PROBLEM TO BE SOLVED: To improve both translation performance and execution performance by vectorizing a multiplex loop of a source program and taking out the invariable vector data to the outside of the loop even when the loop is executed in a vectorized main body part. SOLUTION: An optimization means 4 consists of a vectorization means 5 which optimizes the vectorization based on the analysis result of a source program analysis means 3 and an optimization execution means 6. Then a vector register allocation means 7 of the means 6 vectorizes a multiplex loop of a source program 1 and allocates a vector register. A vector register control processing means 8 takes out the invariable vector data to the front or back part of a vectorized main body even when a loop is executed in this main body part. As a result, both translation performance and execution performance can be improved. COPYRIGHT: (C)1999,JPO ...

Подробнее
22-08-2000 дата публикации

A SYSTEM AND METHOD FOR OPTIMIZING PROGRAM EXECUTION IN A COMPUTER SYSTEM

Номер: CA0002262277A1
Принадлежит:

A method, computer system and article of manufacture for optimizing a computer program, the method comprising the steps of executing an application program and profiling selected loops of the executing program. Characteristics of the profiled loops are then compared to corresponding predetermined threshold values and the results of the comparison are used to select an optimization to be applied to subsequent execution of the selected loops. In a preferred embodiment, the optimization is the selection of either a parallel version or a serial version of the loop. Further embodiments provide for the selection of the number of processors for parallel implemented loops and for the selection of an unroll factor in serially implemented loops.

Подробнее
11-12-2008 дата публикации

PARALLELIZING SEQUENTIAL FRAMEWORKS USING TRANSACTIONS

Номер: WO000002008151045A1
Принадлежит: MICROSOFT CORPORATION

Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. A transactional memory system is provided. A first section of code containing an original sequential loop is transformed into a second section of code containing a parallel loop that uses transactions to preserve an original input to output mapping. For example, the original sequential loop can be transformed into a parallel loop by taking each iteration of the original sequential loop and generating a separate transaction that follows a pre-determined commit order process. At least some of the separate transactions are executed in different threads. When an unhandled exception is detected that occurs in a particular transaction while the parallel loop is executing, state modifications made by the particular transaction and predecessor transactions are committed, and state modifications made by successor transactions are discarded.
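
The commit-order idea can be approximated in C with OpenMP's ordered construct standing in for the transactional-memory machinery (an assumption for the sketch; it does not detect conflicts or squash and re-execute transactions as the described system would). Each iteration does its work privately and then publishes its state modifications in the original iteration order.

#include <omp.h>
#include <stddef.h>

/* Each iteration does its work privately ("the transaction body") and then
 * commits its state modifications in original iteration order. */
void prefix_squares(double *out, const double *in, size_t n)
{
    double running = 0.0;

    #pragma omp parallel for ordered schedule(static, 1)
    for (size_t i = 0; i < n; i++) {
        double local = in[i] * in[i];   /* speculative, thread-private work */

        #pragma omp ordered             /* commit in the original loop order */
        {
            running += local;           /* state modification made visible */
            out[i] = running;
        }
    }
}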

Подробнее
20-02-2014 дата публикации

PARALLEL MEMORY SYSTEMS

Номер: US20140052961A1
Принадлежит:

The invention relates to a multi-core processor memory system, wherein it is provided that the system comprises memory channels between the multi-core processor and the system memory, and that the system comprises at least as many memory channels as processor cores, each memory channel being dedicated to a processor core, and that the memory system at run-time dynamically associates memory blocks with the accessing core, the accessing core having dedicated access to the memory bank via the memory channel.

Подробнее
06-06-2024 дата публикации

VECTORIZING A LOOP

Номер: US20240184554A1
Автор: Gil Rapaport, Ayal Zaks
Принадлежит:

A method includes: receiving input code that comprises a loop that operates on a first array of elements and a second array of elements, wherein during an iteration of the loop a first operation is performed on an element of the first array of elements, or a second operation is performed on an element of the second array of elements; generating a first compound operation that operates on a predetermined number of elements of the first array of elements, the first compound operation resulting in a first intermediate vector; generating a second compound operation that operates on the predetermined number of elements of the second array of elements, the second compound operation resulting in a second intermediate vector; interleaving the first intermediate vector and the second intermediate vector and storing the interleaved result in a temporary vector; and summing the interleaved result in the temporary vector using an order-preserving sum.

Подробнее
25-04-1995 дата публикации

OPTIMIZED PARALLEL COMPILING DEVICE AND OPTIMIZED PARALLEL COMPILING METHOD

Номер: JP0007110800A
Автор: ZAIKI KOUJI
Принадлежит:

PURPOSE: To provide optimized compiling device and method for minimizing a data transfer number at the time of parallelizing program loops and a program converter using them. CONSTITUTION: A source program inputted from an input means 201 is converted to an intermediate code in an intermediate code generation means 202 and a loop and a variable referred to in the loop are detected from the intermediate code by a loop detection means 203 and a reference variable detection means 204. Further, the data transfer number to be required by the parallelization of the loops is calculated for the respective parallelization object loops by a data transfer number detection means 207. A parallelization judgement means 209 decides the parallelization loop whose data transfer number is the minimum and parallelizes the loop. COPYRIGHT: (C)1995,JPO ...

Подробнее
23-10-2019 дата публикации

Program loop control

Номер: GB0002548602B
Принадлежит: ADVANCED RISC MACH LTD, ARM Limited

Подробнее
05-08-2013 дата публикации

SYSTEMS, METHODS, AND APPARATUSES TO DECOMPOSE A SEQUENTIAL PROGRAM INTO MULTIPLE THREADS, EXECUTE SAID THREADS, AND RECONSTRUCT THE SEQUENTIAL EXECUTION

Номер: KR0101292439B1
Принадлежит: Intel Corporation

Systems, methods, and apparatus are described that decompose sequential programs into multiple threads, execute such threads, and reconstruct sequential execution of the threads. A plurality of data cache units (DCUs) store locally retired instructions of speculatively executed threads. A merge level cache (MLC) merges data from the lines of the DCUs. An inter-core memory coherency module (ICMC) globally retires the instructions of the speculatively executed threads in the MLC.

Подробнее
17-05-2016 дата публикации

Method for parallelizing a program loop with a loop-carried dependency

Номер: BR102014023779A2
Принадлежит:

Подробнее
15-12-2016 дата публикации

GENERATING OBJECT CODE FROM INTERMEDIATE CODE THAT INCLUDES HIERARCHICAL SUB-ROUTINE INFORMATION

Номер: US20160364216A1
Автор: Lee Howes, HOWES LEE, Howes Lee
Принадлежит:

Examples are described for a device to receive intermediate code that was generated from compiling source code of an application. The intermediate code includes information generated from the compiling that identifies a hierarchical structure of lower level sub-routines in higher level sub-routines, and the lower level sub-routines are defined in the source code of the application to execute more frequently than the higher level sub-routines that identify the lower level sub-routines. The device is configured to compile the intermediate code to generate object code based on the information that identifies lower level sub-routines in higher level sub-routines, and store the object code.

Подробнее
19-02-2013 дата публикации

Insertion of multithreaded execution synchronization points in a software program

Номер: US0008381203B1

A compiler is configured to determine a set of points in a flow graph for a software program where multithreaded execution synchronization points are inserted to synchronize divergent threads for SIMD processing. MIMD execution of divergent threads is allowed and execution of the divergent threads proceeds until a synchronization point is reached. When all of the threads reach the synchronization point, synchronous execution resumes. The synchronization points are needed to ensure proper execution of the certain instructions that require synchronous execution as defined in some graphics APIs and when synchronous execution improves performance based on a SIMD architecture.

Подробнее
26-05-2015 дата публикации

Optimization of loops and data flow sections in multi-core processor environment

Номер: US0009043769B2
Автор: Martin Vorbach
Принадлежит: Hyperion Core Inc.

The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, chaining the loop partitions generating a list representing the execution sequence of the partitions.

Подробнее
26-09-2019 дата публикации

OPTIMIZE CONTROL-FLOW CONVERGENCE ON SIMD ENGINE USING DIVERGENCE DEPTH

Номер: US20190294444A1
Принадлежит:

There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running Single Program Multiple Data code on a Single Instruction Multiple Data machine. The machine runs an instruction stream over input data streams and machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation and updates the lane-PC of each active lane according to targets of the branch operation. An instruction of the instruction stream includes a barrier indicating a convergence point for all lanes to join. In response to a lane reaching a barrier: evaluating whether all lane-PCs are set to a same thread-PC; and if the lane-PCs are not set to the same thread-PC, selecting an active lane from the plurality of lanes; otherwise, incrementing the lane-PCs of all the lanes, and then selecting an active lane from the plurality of lanes.

Подробнее
27-05-2014 дата публикации

Parallelizing non-countable loops with hardware transactional memory

Номер: US0008739141B2

A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.

Подробнее
14-06-2023 дата публикации

METHOD AND SYSTEM FOR AUTOMATED IMPROVEMENT OF PARALLELISM IN PROGRAM COMPILATION

Номер: EP2872989B1
Автор: Craymer, Loring
Принадлежит: Craymer, Loring

Подробнее
09-08-2000 дата публикации

Predicated execution of instructions in processors

Номер: GB0000014432D0
Автор:
Принадлежит:

Подробнее
28-09-2011 дата публикации

Modulus-scheduling-based compiling method and device for realizing circular instruction scheduling

Номер: CN0102200924A
Принадлежит:

The invention discloses a modulus-scheduling-based compiling method and device for realizing circular instruction scheduling. The method comprises the following steps which are executed by a compiler: reading and analyzing a source program to acquire control flow graph information; establishing data dependence restriction and resource dependence restriction of a loop body structure; and solving according to a back model in accordance with corresponding restriction regarding the data dependence conflict and/or resource conflict happening in detecting the instruction scheduling result in a process that the loop body structure executes the modulus scheduling. By adopting the method, data correlation of adjacent instructions in the loop body can be avoided, and the execution time of generating codes is reduced, so that the instruction-level parallelism can be effectively excavated, and the performance of a processor system even a computer system can be improved.

Подробнее
03-11-2015 дата публикации

Method for protecting a program

Номер: KR1020150123282A
Принадлежит:

... A method of protecting a first program, the first program comprising a finite number of program points and progression rules associated with the program points and defining a path from one program point to another program point, the method comprising: the definition of a plurality of exit cases and, where a second program is used in the definition of the first program, the definition, for each exit case of the second program, of a branch toward a specific program point of the first program or a declaration that branching is impossible; the definition of a set of properties to be proven, each associated with one or more constituent elements of the first program, the set of properties including the impossibility of branching as a particular property; and the establishment of a formal proof of the set of properties.

Подробнее
20-07-2017 дата публикации

PROGRAM OPTIMIZATION BASED ON DIRECTIVES FOR INTERMEDIATE CODE

Номер: US20170206068A1
Принадлежит:

An optimization system to apply directives to a computer program without having to perform repeated front-end compilations of source code of the computer program is provided. In some embodiments, the optimization system performs a first compilation of the source code of the program to generate first front-end code and first back-end code of the computer program. The compilation includes a first front-end compilation and a first back-end compilation. The optimization system identifies a compiler directive to apply to a location within the first front-end code. The optimization system then performs a second back-end compilation of the first front-end code factoring in the compiler directive to generate second back-end code affected by the compiler directive.

Подробнее
25-06-2019 дата публикации

Optimization of loops and data flow sections in multi-core processor environment

Номер: US0010331615B2
Принадлежит: Hyperion Core, Inc., HYPERION CORE INC

The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, chaining the loop partitions generating a list representing the execution sequence of the partitions.

Подробнее
12-08-2014 дата публикации

Program generation device, program production method, and program

Номер: US0008806466B2
Принадлежит: Panasonic Corporation

A program generation apparatus references a source program including a loop for executing a block N times (N≧2) and having such dependence that a variable defined in a statement in the block pertaining to the ith execution (1≦i ...

Подробнее

30-04-2020 дата публикации

AUTOMATIC GENERATION OF MULTI-SOURCE BREADTH-FIRST SEARCH FROM HIGH-LEVEL GRAPH LANGUAGE FOR DISTRIBUTED GRAPH PROCESSING SYSTEMS

Номер: US20200133663A1
Принадлежит:

Techniques are described herein for automatic generation of multi-source breadth-first search (MS-BFS) from high-level graph processing language that can be executed in a distributed computing environment. In an embodiment, a method involves a computer analyzing original software instructions. The original software instructions are configured to perform multiple breadth-first searches to determine a particular result. Each breadth-first search originates at each of a subset of vertices of a graph. Each breadth-first search is encoded for independent execution. Based on the analyzing, the computer generates transformed software instructions configured to perform a MS-BFS to determine the particular result. Each of the subset of vertices is a source of the MS-BFS. In an embodiment, the second plurality of software instructions comprises a node iteration loop and a neighbor iteration loop, and the plurality of vertices of the distributed graph comprise active vertices and neighbor vertices. The node iteration loop is configured to iterate once per each active vertex of the plurality of vertices of the distributed graph, and the node iteration loop is configured to determine the particular result. The neighbor iteration loop is configured to iterate once per each active vertex of the plurality of vertices of the distributed graph, and each iteration of the neighbor iteration loop is configured to activate one or more neighbor vertices of the plurality of vertices for the following iteration of the neighbor iteration loop. 1. A method comprising:analyzing a first plurality of software instructions, wherein the first plurality of software instructions is configured to perform a plurality of breadth-first searches to determine a particular result, wherein each breadth-first search originates at each of a plurality of vertices of a distributed graph, wherein each breadth-first search is encoded for independent execution;based on said analyzing, generating a second plurality ...

Подробнее
09-01-2001 дата публикации

Method of compiling a loop

Номер: US0006173443B1
Автор: Akiyoshi Wakatani
Принадлежит: Matsushita Electric Industrial Co Ltd

In a method of compiling, the contents of registers corresponding to data arrays having the same array names but having different indexes in sequence with the progress of a loop prior to loop return are moved, and only that having the smallest index among those which should be stored is stored. In this manner, the number of Load/Stores is reduced. Moreover, by unrolling loops, register moves may be omitted. Thus, by the application of the method of register allocation and changing the method of register allocation, execution of loops containing calculations of data arrays is speeded up by the extent of unnecessary memory accesses which have been eliminated.
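
The register-move and unrolling ideas can be shown on a simple three-point stencil in C. The rotated version loads only the element with the smallest new index each iteration and moves the others between scalars before the loop returns; unrolling by the rotation period renames the scalars instead, so the moves disappear. The stencil and all names are invented for the illustration.

/* Before: each iteration loads a[i], a[i+1] and a[i+2] from memory. */
void smooth_naive(double *b, const double *a, int n)
{
    for (int i = 0; i + 2 < n; i++)
        b[i] = (a[i] + a[i + 1] + a[i + 2]) / 3.0;
}

/* After: keep the three elements in scalars and "move" them across the back
 * edge, so only one new element is loaded per iteration. */
void smooth_rotated(double *b, const double *a, int n)
{
    if (n < 3) return;
    double r0 = a[0], r1 = a[1], r2;
    for (int i = 0; i + 2 < n; i++) {
        r2 = a[i + 2];                 /* only the newest element is loaded */
        b[i] = (r0 + r1 + r2) / 3.0;
        r0 = r1;                       /* register moves before loop return */
        r1 = r2;
    }
}

/* Unrolling by the rotation period (3) renames the scalars instead of moving
 * them, so the copies above disappear from the unrolled body. */
void smooth_unrolled(double *b, const double *a, int n)
{
    if (n < 3) return;
    double r0 = a[0], r1 = a[1], r2;
    int i = 0;
    for (; i + 4 < n; i += 3) {
        r2 = a[i + 2]; b[i]     = (r0 + r1 + r2) / 3.0;
        r0 = a[i + 3]; b[i + 1] = (r1 + r2 + r0) / 3.0;
        r1 = a[i + 4]; b[i + 2] = (r2 + r0 + r1) / 3.0;
    }
    for (; i + 2 < n; i++) {           /* remainder iterations */
        r2 = a[i + 2];
        b[i] = (r0 + r1 + r2) / 3.0;
        r0 = r1; r1 = r2;
    }
}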

Подробнее
19-12-2001 дата публикации

METHOD OF UPDATING PROGRAM AND COMMUNICATION TERMINAL

Номер: EP0001164471A2
Автор: Topham, Nigel Peter
Принадлежит:

A processor, operable to execute instructions on a predicated basis, includes a series of predicate registers (135), a control information holding unit (131) and a plurality of operating units (133). Each predicate register of the series (135) is switchable between at least respective first and second states and each is assignable to one or more predicated-execution instructions. The control information holding unit (131) holds items of control information which correspond respectively to the predicate registers, and each operating unit also corresponds individually to one of the predicate registers. Each operating unit has a first control input connected to the control information holding unit (131) for receiving the control-information item corresponding to its unit's own corresponding predicate register and also has a second control input connected for receiving the control-information item corresponding to a further one of the predicate registers. Each operating unit is operable to ...

Подробнее
14-10-2004 дата публикации

DETECTION METHOD AND SYSTEM OF REDUCTION VARIABLE IN ASSIGNMENT STATEMENT, AND PROGRAM PRODUCT

Номер: JP2004288163A
Автор: BERA RAJENDRA K
Принадлежит:

PROBLEM TO BE SOLVED: To provide a method, system and program product to detect reduction variables in assignment statements in source codes for enabling parallel execution of program loops. SOLUTION: The reduction variables are tagged to the respective loops and passed to a compiler through compiler directives for parallelizing a reduction operation along with information about each variable's respective associative operator. COPYRIGHT: (C)2005,JPO&NCIPI ...

Подробнее
29-06-1993 дата публикации

ECHELON METHOD FOR EXECUTION OF NESTED LOOPS IN MULTIPLE PROCESSOR COMPUTERS

Номер: CA0001319757C

A compiler for generating code for enabling multiple processors to process programs in parallel. The code enables the multiple processor system to operate in the following manner: one iteration of an outer loop in a set of nested loops is assigned to each processor. If the outer loop contains more iterations than processors in the system, the processors are initially assigned an earlier iteration, and the remaining iterations are assigned to the processors one by one as they finish their earlier iterations. Each processor runs the inner loop iterations serially. In order to enforce dependencies in the loops, each processor reports its progress in its iterations of the inner loop to the processor executing the succeeding outer loop iteration and then waits until the processor computing the preceding outer loop iteration is ahead or behind in processing its inner loop iteration by an amount which guarantees that dependencies will be enforced.

Подробнее
29-04-2015 дата публикации

Code versioning for enabling transactional memory promotion

Номер: CN104572260A
Принадлежит:

Подробнее
06-10-2015 дата публикации

Hardware and software solutions for branches in a parallel pipeline

Номер: KR1020150112017A
Автор: Yazdani Reza
Принадлежит:

... A system and method for efficient processing of instructions in hardware parallel execution lanes within a processor are disclosed. In response to a given branch point within an identified loop, the compiler arranges the instructions within the identified loop into very large instruction words (VLIWs). At least one VLIW contains instructions intermingled from different basic blocks between the given branch point and a corresponding convergence point. The compiler generates code which, when executed, assigns the instructions within a given VLIW at runtime to a plurality of parallel execution lanes within a target processor. The target processor includes a SIMD (single instruction multiple data) micro-architecture. The assignment for a given lane is based on the branch direction found at runtime for that lane at the given branch point. The target processor includes a vector register for storing an indication of the given instruction, within a fetched VLIW, that the associated lane is to execute.

Подробнее
12-11-2019 дата публикации

Method for securing a first program, and computer-readable medium

Номер: BR112015020394A8
Автор: DOMINIQUE BOLIGNANO
Принадлежит:

Подробнее
28-06-2012 дата публикации

LOOP PARALLELIZATION BASED ON LOOP SPLITTING OR INDEX ARRAY

Номер: WO2012087988A2
Принадлежит:

Methods and apparatus to provide loop parallelization based on loop splitting and/or index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on the mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.
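A minimal C sketch, under assumed data, of index-array-based loop splitting: the unconditional work runs in a first subloop, which also records the iterations needing a rare path into an index array; a second subloop then processes only those recorded iterations, so the bulk of the work carries no data-dependent control flow.

#include <stdio.h>

#define N 16

int main(void) {
    int a[N], b[N], idx[N], m = 0;
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 0; }

    /* subloop 1: unconditional work plus collection of special cases */
    for (int i = 0; i < N; i++) {
        b[i] = a[i] * 2;            /* easily parallelized / vectorized part */
        if (a[i] % 5 == 0)          /* rare condition */
            idx[m++] = i;           /* remember iteration for later */
    }

    /* subloop 2: only the iterations recorded in the index array */
    for (int k = 0; k < m; k++)
        b[idx[k]] += 1;

    for (int i = 0; i < N; i++) printf("%d ", b[i]);
    printf("\n");
    return 0;
}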

Подробнее
16-01-2014 дата публикации

METHOD AND SYSTEM FOR AUTOMATED IMPROVEMENT OF PARALLELISM IN PROGRAM COMPILATION

Номер: WO2014011696A1
Автор: CRAYMER, Loring
Принадлежит:

A method of program compilation to improve parallelism during the linking of the program by a compiler. The method includes converting statements of the program to canonical form, constructing an abstract syntax tree (AST) for each procedure in the program, and traversing the program to construct a graph by making each non-control flow statement and each control structure into at least one node of the graph.

Подробнее
07-05-1998 дата публикации

DATA DISTRIBUTION AND ARRANGEMENT DETERMINATION METHOD FOR PARALLEL COMPUTERS AND APPARATUS FOR THE METHOD

Номер: WO1998019249A1
Автор: OTA, Hiroshi
Принадлежит:

The sorting relation between the array dimensions and the loops is determined first, the loop most appropriate as a distribution candidate is selected, and the distribution of the array is determined in accordance with the selected loop. Consequently, the time taken to determine the sorting relation is shortened. The possibility that the optimum sorting relation is finally employed is increased by retaining a plurality of sorting-relation candidates when determining the sorting relation between the array dimensions and the loops.

Подробнее
15-02-1994 дата публикации

Multitasking system for in-procedure loops

Номер: US0005287509A
Автор:
Принадлежит:

A system for multitasking inner loops, such as DO loops, using multiprocessors, provided with a plurality of shared registers each corresponding to one of a plurality of individual processors comprising the multiprocessor system. The plurality of shared registers store start and end values of segments resulting from dividing ranges of loop variables corresponding to the inner loops. The system for multitasking inner loops comprises an executing unit for iteratively executing the processing of the inner loops until the end value is reached. The system also comprises a decision unit for deciding whether or not there remain any unprocessed loops. Finally, the system comprises a continuing unit, responsive to the decision unit for continuing processing of the unprocessed loop or loops by transferring a part of a range which the loop variables corresponding to the unprocessed loop or loops can have.
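A plain C sketch of the range segmentation described above: the loop-variable range is divided into one segment per processor, and the start and end values that the shared registers would hold are computed explicitly. The processor count and trip count are assumptions for illustration.

#include <stdio.h>

#define N_PROC 4
#define N      103   /* total iterations of the inner (DO) loop */

int main(void) {
    int start[N_PROC], end[N_PROC];
    int chunk = (N + N_PROC - 1) / N_PROC;

    /* each "shared register" pair would hold one segment's start and end */
    for (int p = 0; p < N_PROC; p++) {
        start[p] = p * chunk;
        end[p]   = (p + 1) * chunk < N ? (p + 1) * chunk : N;
    }

    long sum = 0;
    for (int p = 0; p < N_PROC; p++)          /* each p would run on its own processor */
        for (int i = start[p]; i < end[p]; i++)
            sum += i;                          /* body of the inner loop */

    printf("%ld\n", sum);                      /* same result as the serial loop */
    return 0;
}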

Подробнее
06-10-1998 дата публикации

Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer

Номер: US0005819088A
Автор: James R. Reinders
Принадлежит: Intel Corp

Improved parallelism in the generated schedules of basic blocks of a program being compiled is advantageously achieved by providing an improved scheduler to the code generator of a compiler targeting a multi-issue architecture computer. The improved scheduler implements the prior-art list scheduling technique with a number of improvements including differentiation of instructions into squeezed and non-squeezed instructions, employing priority functions that factor in the squeezed and non-squeezed instruction distinction for selecting a candidate instruction, tracking only the resources utilized by the non-squeezed instructions, and tracking the scheduling of the squeezed and non-squeezed instructions separately. When software pipelining is additionally employed to further increase parallelism in program loops, the improved scheduler factors only the non-squeezed instructions in the initial minimum schedule (initiation interval) size calculation.

Подробнее
29-10-2013 дата публикации

Methods and apparatus for joint parallelism and locality optimization in source code compilation

Номер: US0008572590B2

Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for both parallelism and locality of operations on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
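A small C sketch of the kind of transformation such an optimizer applies, assuming a tile size chosen for the cache: the outer tile loops expose coarse-grain parallelism while the inner loops keep accesses within a cache-sized block for locality. The OpenMP pragma and tile size are illustrative, not taken from the patent.

#include <stdio.h>

#define N    64
#define TILE 16   /* tile size chosen for cache locality (assumed) */

int main(void) {
    static double a[N][N], b[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i + j;

    #pragma omp parallel for collapse(2)       /* tiles run in parallel */
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; i++)        /* locality inside a tile */
                for (int j = jj; j < jj + TILE; j++)
                    b[j][i] = a[i][j];                   /* blocked transpose */

    printf("%f\n", b[N - 1][0]);
    return 0;
}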

Подробнее
24-03-2004 дата публикации

Processors and compiling methods for processors

Номер: GB0000403623D0
Автор: [UNK]
Принадлежит: PTS Corp

Подробнее
05-12-2001 дата публикации

Processors and compiling methods for processors

Номер: GB0000124553D0
Автор:
Принадлежит:

Подробнее
20-09-2007 дата публикации

GENERAL PURPOSE SOFTWARE PARALLEL TASK ENGINE

Номер: CA0002638453A1
Принадлежит:

Подробнее
20-05-2003 дата публикации

A SYSTEM AND METHOD FOR OPTIMIZING PROGRAM EXECUTION IN A COMPUTER SYSTEM

Номер: CA0002262277C

A method, computer system and article of manufacture for optimizing a computer program, the method comprising the steps of executing an application program and profiling selected loops of the executing program. Characteristics of the profiled loops are then compared to corresponding predetermined threshold values and the results of the comparison are used to select an optimization to be applied to subsequent execution of the selected loops. In a preferred embodiment, the optimization is the selection of either a parallel version or a serial version of the loop. Further embodiments provide for the selection of the number of processors for parallel implemented loops and for the selection of an unroll factor in serially implemented loops.
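A hedged C sketch of the selection step, with an assumed threshold standing in for the profiled characteristics: the same loop exists in a parallel and a serial version, and one of them is chosen at run time from the trip count.

#include <stdio.h>

#define PAR_THRESHOLD 10000   /* assumed threshold derived from profiling */

/* Once profiling has measured the loop's cost, later executions pick
 * either the parallel or the serial version of the same loop. */
static double work(double *a, long n) {
    double s = 0.0;
    if (n >= PAR_THRESHOLD) {
        #pragma omp parallel for reduction(+:s)          /* parallel version */
        for (long i = 0; i < n; i++) s += a[i] * a[i];
    } else {
        for (long i = 0; i < n; i++) s += a[i] * a[i];   /* serial version */
    }
    return s;
}

int main(void) {
    static double a[20000];
    for (long i = 0; i < 20000; i++) a[i] = 1.0;
    printf("%f %f\n", work(a, 100), work(a, 20000));
    return 0;
}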

Подробнее
28-05-2003 дата публикации

METHOD OF PARALLEL LOOP TRANSFORMATION FOR ON-THE-FLY RACE DETECTION IN PARALLEL PROGRAM

Номер: KR20030042319A
Принадлежит:

PURPOSE: A method of parallel loop transformation for on-the-fly race detection in a parallel program is provided to minimize the objects that must be monitored for on-the-fly race detection while the parallel program is executing. CONSTITUTION: The information needed to modify the parallel loop is extracted through a static analysis of the loop body, using each parallel loop as input(S210). If the information needed to modify the parallel loop is extracted, the modified parallel loop is generated by using the extracted information and the parallel loop as the input(S220). A race detection function is set up for each statement in the parallel loop that can generate a race(S230). The race is detected by executing the modified parallel loop program with the race detection function set up(S240). © KIPO 2003 ...
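A rough C sketch of the instrumentation step, with a purely illustrative hook: the modified parallel loop calls a detection function around the accesses identified by the static analysis; a real on-the-fly detector would record and compare access histories rather than just log them.

#include <stdio.h>

#define N 8

/* Illustrative detection hook; a real detector would record access
 * histories per shared location and check them for conflicts. */
static void check_access(int iter, int index, const char *kind) {
    printf("iteration %d: %s x[%d]\n", iter, kind, index);
}

int main(void) {
    int x[N] = {0};
    for (int i = 0; i < N; i++) {      /* parallel loop in the original program */
        check_access(i, i, "write");
        x[i] = i;
        if (i > 0) {
            check_access(i, i - 1, "read");   /* potential race across iterations */
            x[i] += x[i - 1];
        }
    }
    return 0;
}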

Подробнее
20-09-2012 дата публикации

PARALLEL MEMORY SYSTEMS

Номер: WO2012123061A1
Автор: VORBACH, Martin
Принадлежит:

The invention relates to a multi-core processor memory system, wherein it is provided that the system comprises memory channels between the multi-core processor and the system memory, and that the system comprises at least as many memory channels as processor cores, each memory channel being dedicated to a processor core, and that the memory system dynamically assigns memory blocks at run-time to the accessing core, the accessing core having dedicated access to the memory bank via its memory channel.

Подробнее
03-11-1988 дата публикации

PARALLEL-PROCESSING SYSTEM EMPLOYING A HORIZONTAL ARCHITECTURE COMPRISING MULTIPLE PROCESSING ELEMENTS AND INTERCONNECT CIRCUIT WITH DELAY MEMORY ELEMENTS TO PROVIDE DATA PATHS BETWEEN THE PROCESSING ELEMENTS

Номер: WO1988008568A1
Принадлежит:

A computer system (3) including a processing unit (8) having one or more processors (32-1, 32-2, 32-3), for performing operations on input operands and providing output operands (11), a multiconnect unit (6) for storing operands at addressable locations (34-1, 34-2) and for providing said input operands (10-1) from source addresses and for storing said output operands with destination addresses, an instruction unit (9) for specifying operations to be performed by said processing unit (8), for specifying source address offsets and destination address offsets relative to a modifiable pointer, invariant addressing means (12) for providing said modifiable pointer and for combining said address offsets to form said source addresses and said destination addresses in said multiconnect unit (6).

Подробнее
17-02-2005 дата публикации

Processors and compiling methods for processors

Номер: US20050039167A1
Принадлежит:

A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit. When a second consumer instruction (c2), also requiring the same value, is scheduled for execution by an execution unit (23: EXU) other than the first execution unit, at least part of the first availability chain is reused to move the required value to a point (23: DRF) accessible by ...

Подробнее
16-01-2014 дата публикации

Method and System for Automated Improvement of Parallelism in Program Compilation

Номер: US20140019949A1
Принадлежит:

A method of program compilation to improve parallelism during the linking of the program by a compiler. The method includes converting statements of the program to canonical form, constructing an abstract syntax tree (AST) for each procedure in the program, and traversing the program to construct a graph by making each non-control flow statement and each control structure into at least one node of the graph.

Подробнее
31-01-2012 дата публикации

Parallel programming computing system to dynamically allocate program portions

Номер: US0008108845B2

A computing system receives a program created by a technical computing environment, analyzes the program, generates multiple program portions based on the analysis of the program, dynamically allocates the multiple program portions to multiple software units of execution for parallel programming, receives multiple results associated with the multiple program portions from the multiple software units of execution, and provides the multiple results or a single result to the program.

Подробнее
02-04-2019 дата публикации

Alternative loop limits for accessing data in multi-dimensional tensors

Номер: US0010248908B2
Принадлежит: Google LLC, GOOGLE LLC

Methods, systems, and apparatus for accessing a N-dimensional tensor are described. In some implementations, a method includes, for each of one or more first iterations of a first nested loop, performing iterations of a second nested loop that is nested within the first nested loop until a first loop bound for the second nested loop is reached. A number of iterations of the second nested loop for the one or more first iterations of the first nested loop is limited by the first loop bound in response to the second nested loop having a total number of iterations that exceeds a value of a hardware property of the computing system. After a penultimate iteration of the first nested loop has completed, one or more iterations of the second nested loop are performed for a final iteration of the first nested loop until an alternative loop bound is reached.
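A minimal C sketch of the alternative-bound idea with assumed extents: the inner loop normally runs a hardware-limited number of iterations, and on the final outer iteration it runs only the leftover iterations so the total matches the tensor's true extent.

#include <stdio.h>

#define TOTAL        22           /* true number of inner iterations */
#define FIRST_BOUND   8           /* limit imposed by a hardware property (assumed) */
#define OUTER        ((TOTAL + FIRST_BOUND - 1) / FIRST_BOUND)
#define ALT_BOUND    (TOTAL - (OUTER - 1) * FIRST_BOUND)

int main(void) {
    int visited = 0;
    for (int o = 0; o < OUTER; o++) {
        /* the final outer iteration uses the alternative loop bound */
        int bound = (o == OUTER - 1) ? ALT_BOUND : FIRST_BOUND;
        for (int i = 0; i < bound; i++)
            visited++;                      /* access one tensor element */
    }
    printf("%d elements visited\n", visited);   /* prints 22 */
    return 0;
}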

Подробнее
11-07-2002 дата публикации

Predicated execution of instructions in processors

Номер: US2002091996A1
Автор:
Принадлежит:

A processor, operable to execute instructions on a predicated basis, includes a series of predicate registers (135), a control information holding unit (131) and a plurality of operating units (133). Each predicate register of the series (135) is switchable between at least respective first and second states and each is assignable to one or more predicated-execution instructions. The control information holding unit (131) holds items of control information which correspond respectively to the predicate registers, and each operating unit also corresponds individually to one of the predicate registers. Each operating unit has a first control input connected to the control information holding unit (131) for receiving the control-information item corresponding to its unit's own corresponding predicate register and also has a second control input connected for receiving the control-information item corresponding to a further one of the predicate registers. Each operating unit is operable to ...

Подробнее
04-04-2019 дата публикации

METHOD AND APPARATUS FOR MAPPING STATIC SINGLE ASSIGNMENT INSTRUCTIONS ONTO A DATAFLOW GRAPH IN A DATAFLOW ARCHITECTURE

Номер: DE102018214541A1
Принадлежит:

Methods, apparatus, systems, and articles of manufacture for mapping a set of instructions onto a dataflow graph are disclosed herein. An example apparatus includes a variable handler to modify a variable in the set of instructions. The variable is used multiple times in the set of instructions, and the set of instructions is in static single assignment form. The apparatus also includes a PHI handler to replace a PHI instruction contained in the set of instructions with a set of dataflow control instructions, and a dataflow graph generator to map the set of instructions, as modified by the variable handler and the PHI handler, onto a dataflow graph without transforming the instructions out of static single assignment form.

Подробнее
26-01-2012 дата публикации

Parallel loop management

Номер: US20120023316A1
Принадлежит: International Business Machines Corp

The illustrative embodiments comprise a method, data processing system, and computer program product having a processor unit for processing instructions with loops. A processor unit creates a first group of instructions having a first set of loops and second group of instructions having a second set of loops from the instructions. The first set of loops have a different order of parallel processing from the second set of loops. A processor unit processes the first group. The processor unit monitors terminations in the first set of loops during processing of the first group. The processor unit determines whether a number of terminations being monitored in the first set of loops is greater than a selectable number of terminations. In response to a determination that the number of terminations is greater than the selectable number of terminations, the processor unit ceases processing the first group and processes the second group.

Подробнее
11-01-2018 дата публикации

METHODS AND APPARATUS TO ELIMINATE PARTIAL-REDUNDANT VECTOR LOADS

Номер: US20180011693A1
Принадлежит:

Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold. 1. An apparatus to eliminate partial-redundant vector load operations , the apparatus comprising:a node grouper to associate a vector operation with a node group; perform a dependencies test on a subset of the node group; and', 'identify a subset of the node group as a candidate when the subset satisfies the dependencies test;, 'a candidate verifier to determine replacement code based on a characteristic of the candidate in the node group; and', 'compare an estimated cost associated with executing the replacement code to a threshold; and, 'a code optimizer toa code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold, at least one of the node grouper, the candidate verifier, the code optimizer and the code generator implemented by hardware.2. An apparatus as defined in claim 1 , further including a vector load identifier to parse the vector operation to identify a load type of the vector operation.3. An apparatus as defined in claim 2 , wherein the node group is a first node group and the node grouper is to create a second node group corresponding to the load type when the subset of the ...

Подробнее
18-01-2018 дата публикации

INFORMATION PROCESSING DEVICE, STORAGE MEDIUM, AND METHOD

Номер: US20180018153A1
Автор: MUKAI Yuta
Принадлежит: FUJITSU LIMITED

A device includes a processor configured to: divide loop in a program into first loop and second loop when compiling the program, the loop accessing data of an array and prefetching data of the array to be accessed at a repetition after prescribed repetitions at each repetition, the first loop including one or more repetitions from an initial repetition to a repetition immediately before the repetition after the prescribed repetitions, the second loop including one or more repetitions from the repetition after the prescribed repetitions to a last repetition, and generate an intermediate language code configured to access data of the array using a first region in a cache memory and prefetch data of the array using a second region in the cache memory in the first loop, and to access and prefetch data of the array using the second region in the second loop. 1. An information processing device comprising:a memory; and divide loop processing in a program into first loop processing and second loop processing when compiling the program, the loop processing accessing data of an array and prefetching data of the array to be accessed at a repetition processing after prescribed repetition processings at each repetition processing in the loop processing, the first loop processing including one or more repetition processings from an initial repetition processing to a repetition processing immediately before the repetition processing after the prescribed repetition processings, the second loop processing including one or more repetition processings from the repetition processing after the prescribed repetition processings to a last repetition processing, and', 'generate an intermediate language code based on the program when compiling the program, the intermediate language code being configured to access data of the array by using a first region in a cache memory and prefetch data of the array by using a second region in the cache memory in the first loop processing, and to ...
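A simplified C sketch of splitting a loop around a prefetch distance (the cache-region bookkeeping described in the abstract is omitted): the first loop both accesses and prefetches, while the second loop covers the tail where there is nothing left to prefetch. The prefetch distance is an assumed value and __builtin_prefetch is a GCC/Clang builtin.

#include <stdio.h>

#define N   1024
#define PF  16   /* prefetch distance in iterations (assumed) */

int main(void) {
    static double a[N];
    double sum = 0.0;
    for (int i = 0; i < N; i++) a[i] = 1.0;

    int split = N - PF;
    for (int i = 0; i < split; i++) {          /* first loop: access + prefetch */
        __builtin_prefetch(&a[i + PF]);        /* GCC/Clang builtin */
        sum += a[i];
    }
    for (int i = split; i < N; i++)            /* second loop: access only */
        sum += a[i];

    printf("%f\n", sum);
    return 0;
}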

Подробнее
24-04-2014 дата публикации

SYSTEMS AND METHODS FOR PARALLELIZATION OF PROGRAM CODE, INTERACTIVE DATA VISUALIZATION, AND GRAPHICALLY-AUGMENTED CODE EDITING

Номер: US20140115560A1
Автор: Hutchison Luke
Принадлежит:

A system for providing a computer configured to read an immutable value for a variable; read the value of the variable at a specific timestamp, thereby providing an ability to create looping constructs; set a current or next value of a loop variable as a function of previous or current loop variable values; read a set of all values that a variable will assume; push or scattering the values into unordered collections; and reduce the collections into a single value. 1. A computer comprising tangible computer readable storage media and a processor , the storage media comprises instructions to cause the processor to:read an immutable value for a variable;read the value of the variable at a specific timestamp, thereby providing an ability to create looping constructs;set a current or next value of a loop variable as a function of previous or current loop variable values;read a set of all values that a variable will assume;push or scatter the values into unordered collections; andreduce the collections into a single value.2. The storage media of claim 1 , wherein the push or scatter instruction comprises pushing values into bins based on a key.3. The storage media of claim 1 , wherein the collections comprise a type claim 1 , and the storage media provides a type system to constrain the type of any collections that are recipients of push operations to be unordered.4. The storage media of claim 3 , wherein any fold or reduce operations applied to those collections requires the collections to be unordered.5. The storage media of claim 3 , comprising a scatter operation configured to directly map into a MapReduce-style computation.6. A computer comprising tangible computer readable storage media and a processor claim 3 , the storage media comprises instructions to cause the processor to: provide an integrated development environment; the environment comprising:a textual editor for a lattice-based programming language; the textual editor configured to show program source code ...

Подробнее
01-02-2018 дата публикации

Processor for Correlation-Based Loop Detection

Номер: US20180032340A1
Принадлежит:

Processor comprising an execution unit and a detection unit which are functionally connected, wherein the execution unit is configured to execute computer programs, and wherein the detection unit is configured to detect infinite loops during the execution of a computer program in the execution unit during run-time, wherein the computer program comprises a plurality of go-to instructions, wherein each go-to instruction is characterized by a corresponding branch address, wherein the detection unit is configured to calculate a detection function of the branch addresses of a branch sequence, the branch sequence comprising a sequence of executed go-to instructions, and wherein the detection function is chosen such that an increased value of the detection function is characteristic of an infinite loop in the branch sequence in which at least one go-to instruction is repeated. 1. A processor comprising an execution unit and a detection unit that are functionally connected , wherein the execution unit is configured to execute computer programs , and wherein the detection unit is configured to detect infinite loops during the execution of a computer program in the execution unit during the run-time of the computer program;wherein the computer program comprises a plurality of go-to instructions,wherein each go-to instruction is characterized by a corresponding branch address;wherein the detection unit is configured to calculate a detection function,wherein the detection function is a function of the branch addresses of a branch sequence, wherein the branch sequence comprises a sequence of executed go-to instructions; andwherein the detection function is selected such that an increased value of the detection function is characteristic of an infinite loop in which at least one go-to instruction is repeated.2. The processor according to claim 1 , which is further configured to compare the detection function with a threshold value in order to detect the existence of an infinite ...

Подробнее
01-02-2018 дата публикации

LOOP VECTORIZATION METHODS AND APPARATUS

Номер: US20180032342A1
Принадлежит:

Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask. 1. (canceled)2. An apparatus comprising:an array populater to populate an array with a first set of data elements based on a control mask, the control mask indicating whether an operation of a loop is to be performed on the first set of data elements;a register loader to load a first subset of data elements from the array into a register, the first subset of data elements corresponding to a number of data elements stored by the register, the array populator to move a second subset of data elements within the array after the first subset of data elements has been loaded into the register; andcomputation circuitry to perform the operation of the loop on the first subset of data elements in the register.3. The apparatus of claim 2 , further including a control mask generator to generate the control mask for a first set of iterations of the loop by:setting a bit of the control mask to a first value when a condition of the loop indicates that an operation of the loop is to be executed; andsetting the bit of the control mask to a second value when the condition indicates that the operation of the loop is to be bypassed.4. The apparatus of claim 2 , wherein the array populater is to move the second subset of data elements after the computation circuitry performs of the operation of the loop on the data elements in the register. ...
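A minimal C sketch of the mask-and-compress idea under assumed data: a control mask records which iterations must execute the guarded operation, the active indexes are compressed into a dense array, and the operation then runs over that array without branches.

#include <stdio.h>

#define N 16

int main(void) {
    int a[N], active[N], m = 0;
    unsigned mask = 0;
    for (int i = 0; i < N; i++) a[i] = i % 3;

    for (int i = 0; i < N; i++)              /* build the control mask */
        if (a[i] != 0) mask |= 1u << i;      /* 1 = execute, 0 = bypass */

    for (int i = 0; i < N; i++)              /* compress active indexes */
        if (mask & (1u << i)) active[m++] = i;

    for (int k = 0; k < m; k++)              /* branch-free guarded work */
        a[active[k]] *= 10;

    for (int i = 0; i < N; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}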

Подробнее
09-02-2017 дата публикации

COMPILING SOURCE CODE TO REDUCE RUN-TIME EXECUTION OF VECTOR ELEMENT REVERSE OPERATIONS

Номер: US20170039048A1
Принадлежит:

Compiling source code to reduce run-time execution of vector element reverse operations, includes: identifying, by a compiler, a first loop nested within a second loop in a computer program; identifying, by the compiler, a vector element reverse operation within the first loop; moving, by the compiler, the vector element reverse operation from the first loop to the second loop. 17-. (canceled)8. An apparatus for compiling source code to reduce run-time execution of vector element reverse operations , the apparatus comprising a computer processor , a computer memory operatively coupled to the computer processor , the computer memory having disposed within it computer program instructions that , when executed by the computer processor , cause the apparatus to carry out the steps of:identifying, by a compiler, a first loop in a computer program;identifying, by the compiler, at least one vector element reverse operation within the first loop;analyzing, by the compiler, a dataflow graph containing that at least one vector element reverse operation within the first loop, including determining whether all vector operations in a portion of the dataflow graph including the first loop are lane-insensitive and determining whether all vector operations in the portion of the dataflow graph containing the first loop are lane-adjustable; andresponsive to the analysis, replacing, by the compiler, the vector element reverse operations from the first loop by vector element reverse operations outside the first loop.9. The apparatus of wherein:identifying at least one vector element reverse operation within the first loop further comprises identifying t least one vector operation within the first loop having a live-in vector value; andreplacing the vector element reverse operations from the first loop by vector element reverse operations outside the first loop further comprises inserting vector element reverse operations at an incoming perimeter of the first loop.10. The apparatus of ...

Подробнее
06-02-2020 дата публикации

OPTIMIZATION OF LOOPS AND DATA FLOW SECTIONS IN MULTI-CORE PROCESSOR ENVIRONMENT

Номер: US20200042492A1
Автор: Vorbach Martin
Принадлежит: HYPERION CORE, INC.

The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, chaining the loop partitions generating a list representing the execution sequence of the partitions. 1. A method for operating a processor that comprises a multitude of data processing units , the method comprising:dividing a thread into a plurality of partitions executable on the data processing units, each partition including a plurality of instructions; andchaining the partitions together for transferring data at least from a first partition to a second partition of the plurality of partitions,wherein each of the partitions forms a code entity which is processed as a whole such that data is processed in each instruction of the partition after a preceding instruction of the partition without interruption.2. The method according to claim 1 , wherein the processor is a graphics processor.3. The method according to claim 1 , wherein the data processing units process VLIW. This application is a continuation of U.S. patent application Ser. No. 15/601,946, filed May 22, 2017, which is a continuation of U.S. patent application Ser. No. 14/693,793, filed Apr. 22, 2015 (now U.S. Pat. No. 9,672,188), which is a continuation of U.S. patent application Ser. No. 13/519,887, filed Nov. 6, 2012 (now U.S. Pat. No. 9,043,769), which claims priority as a national phase application of International Patent Application No. PCT/EP2010/007950, filed Dec. 28, 2010, which claims priority to European Patent Application No. EP10007074.7, filed Jul. 9, 2010, European Patent Application No. EP10002086.6, filed Mar. 2, 2010, European Patent Application No. EP10000349.0, filed Jan. 15, 2010, and European Patent Application No. EP09016045.8, filed Dec. 28, ...

Подробнее
06-02-2020 дата публикации

BUFFER OVERFLOW DETECTION BASED ON A SYNTHESIS OF ASSERTIONS FROM TEMPLATES AND K-INDUCTION

Номер: US20200042697A1
Принадлежит: ORACLE INTERNATIONAL CORPORATION

A method for buffer overflow detection involves obtaining a program code configured to access memory locations in a loop using a buffer index variable, obtaining an assertion template configured to capture a dependency between the buffer index variable and a loop index variable of the loop in the program code, generating an assertion using the assertion template, verifying that the assertion holds using a k-induction; and determining whether a buffer overflow exists using the assertion. 1. A method for buffer overflow detection comprising:obtaining a program code configured to access memory locations in a loop using a buffer index variable;obtaining an assertion template configured to capture a dependency between the buffer index variable and a loop index variable of the loop in the program code;generating an assertion using the assertion template;verifying that the assertion holds, using a k-induction; anddetermining whether a buffer overflow exists using the assertion.2. The method of claim 1 , wherein determining whether the buffer overflow exists comprises making a determination claim 1 , using the assertion and a memory allocation specified in the program code claim 1 , that an execution of the program results in the buffer index variable to point to a memory location beyond the memory allocation during at least one execution of the loop.3. The method of claim 1 ,wherein the assertion template and the assertion are for an upper bound of the buffer index variable, andwherein a second assertion template is used to generate a second assertion for a lower bound of the buffer index variable.4. The method of claim 1 , wherein the assertion template establishes a linear relationship between the buffer index variable and the loop index variable.5. The method of claim 1 ,wherein the generated assertion establishes a boundary for the buffer index variable, based on the loop index variable, andwherein the generated assertion, prior to verifying the assertion, is assumed ...
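A small C sketch of the dependency such an assertion template captures, with an assumed linear relation: the buffer index stays equal to twice the loop index, so proving the assertions for every iteration bounds the index and rules out an overflow of the allocation.

#include <assert.h>
#include <stdio.h>

#define N 10

int main(void) {
    int buf[2 * N];
    int b = 0;                          /* buffer index variable */
    for (int i = 0; i < N; i++) {       /* i is the loop index variable */
        assert(b == 2 * i);             /* instantiated assertion template */
        assert(b >= 0 && b + 1 < 2 * N);/* bound implied by the relation */
        buf[b] = i;
        buf[b + 1] = -i;
        b += 2;
    }
    printf("%d %d\n", buf[0], buf[2 * N - 1]);
    return 0;
}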

Подробнее
10-03-2022 дата публикации

Hardware Acceleration Method, Compiler, and Device

Номер: US20220075608A1
Принадлежит:

A hardware acceleration method includes obtaining compilation policy information and a source code, where the compilation policy information indicates that a first code type matches a first processor and a second code type matches a second processor; analyzing a code segment in the source code according to the compilation policy information; determining a first code segment belonging to the first code type or a second code segment belonging to the second code type; compiling the first code segment into a first executable code; sending the first executable code to the first processor; compiling the second code segment into a second executable code; and sending the second executable code to the second processor. 1. A hardware acceleration method comprising:obtaining source code;obtaining, according to the source code, first executable code matching a first processor and running in the first processor;receiving, from the first processor, first execution information for executing the first executable code, wherein the first execution information comprises a first execution parameter of the first executable code in the first processor, and wherein the first execution parameter is an execution duration of the first executable code in the first processor;determining that the source code corresponding to the first executable code matches a second processor when the first execution parameter exceeds a first threshold, wherein the first threshold is based on an estimation of a second execution parameter of the source code in the second processor, and wherein the second execution parameter is an estimated execution duration of the source code in the second processor; andobtaining, according to the source code and when the source code matches the second processor, second executable code matching the second processor.2. The hardware acceleration method of claim 1 , further comprising:unloading the first executable code from the first processor; andsending, to the second ...

Подробнее
05-03-2015 дата публикации

CODE PROFILING OF EXECUTABLE LIBRARY FOR PIPELINE PARALLELIZATION

Номер: US20150067663A1
Принадлежит:

A method and system for creating a library method stub in source code form corresponding to an original library call in machine-executable form. The library method stub is created in a predefined programming language by use of a library method signature associated with the original library call, at least one idiom sentence, and a call invoking the original library call. Creating the library method stub includes composing source code of the library method stub by matching the at least one idiom sentence with idiom-stub mappings predefined for each basic idiom of at least one basic idiom. The original library call appears in sequential code. The library method signature specifies formal arguments of the original library call. The at least one idiom sentence summarizes memory operations performed by the original library call on the formal arguments. The created library method stub is stored in a database. 1. A method for creating a library method stub in source code form corresponding to an original library call in machine-executable form , said method comprising:creating, by a computer processor, the library method stub in a predefined programming language by use of a library method signature associated with the original library call, at least one idiom sentence, and a call invoking the original library call, wherein said creating the library method stub comprises composing source code of the library method stub by matching the at least one idiom sentence with idiom-stub mappings predefined for each basic idiom of at least one basic idiom, wherein the original library call appears in sequential code, wherein the library method signature specifies formal arguments of the original library call, wherein the at least one idiom sentence summarizes memory operations performed by the original library call on the formal arguments, and wherein a sentence S of the at least one basic idiom provides at least one rule for generating a composition of literals to generate a complex ...

Подробнее
16-03-2017 дата публикации

METHOD FOR SECURING A PROGRAM

Номер: US20170075788A1
Автор: Bolignano Dominique
Принадлежит:

A method for securing a first program, the first program including a finite number of program points and evolution rules associated to program points and defining the passage of a program point to another, the method including defining a plurality of exit cases and, when a second program is used in the definition of the first program, for each exit case, definition of a branching toward a specific program point of the first program or a declaration of branching impossibility, defining a set of properties to be proven, each associated with one of the constitutive elements of the first program, said set of properties comprising the branching impossibility as a particular property and establishment of the formal proof of the set of properties. 1. (canceled)2. A method for securing a first program , the first program comprising a finite number of program points and evolution rules associated with the program points and defining the passage from one program point to another program point , the method comprising:defining a plurality of exit cases in a non-transitory computer readable medium and, when a second program is used in the definition of the first program, for each exit case of the second program, defining a branching toward a specific program point of the first program or a declaration of branching impossibility, wherein the branching impossibility comprises a normally possible transition proved impossible;defining a set of local properties to be proven, each associated with one or more of the program points and evolution rules of the first program, said set of local properties comprising the branching impossibility as a particular local property; andestablishing a formal proof of the set of properties absent a concrete execution of either the first program or the second program.3. The method according to claim 2 , wherein the evolution rules claim 2 , the exit cases claim 2 , and the branchings define a tree structure of logic traces claim 2 , and wherein a ...

Подробнее
18-03-2021 дата публикации

SYSTEM AND METHOD FOR COMPILING HIGH-LEVEL LANGUAGE CODE INTO A SCRIPT EXECUTABLE ON A BLOCKCHAIN PLATFORM

Номер: US20210081185A1
Принадлежит:

A computer-implemented method (and corresponding system) is provided that enables or facilitates the execution of a portion of source code, written in a high-level language (HLL), on a blockchain platform. The method and system can include a blockchain compiler, arranged to convert a portion of high-level source code into a form that can be used with a blockchain platform. This may be the Bitcoin blockchain or an alternative. The method can include: receiving the portion of source code as input; and generating an output script comprising a plurality of op codes. The op codes are a subset of op codes that are native to a functionally-restricted, blockchain scripting language. The outputted script is arranged and/or generated such that, when executed, the script provides, at least in part, the functionality specified in the source code. The blockchain scripting language is restricted such that it does not natively support complex control-flow constructs or recursion via jump-based loops or other recursive programming constructs. The step of generating the output script may comprise the unrolling at least one looping construct provided in the source code. The method may further comprise providing or using an interpreter or virtual machine arranged to convert the output script into a form that is executable on a blockchain platform. 1. A computer-implemented method comprising the steps:receiving a portion of source code as input, wherein the portion of source code is written in a high-level language (HLL); andgenerating an output script comprising a plurality of op_codes selected from and/or native to a functionally-restricted, blockchain scripting language such that, when executed, the script provides, at least in part, the functionality specified in the portion of source code.2. A method according to and comprising the step of: providing or using a compiler arranged to perform the steps of .3. A method according to wherein the output script is generated by performing ...

Подробнее
31-03-2016 дата публикации

Method and Apparatus for Approximating Detection of Overlaps Between Memory Ranges

Номер: US20160092285A1
Принадлежит: Intel Corp

A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.
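A minimal C sketch of a range-based conflict test of this kind: rather than comparing individual addresses, the loop's written range and read range are compared as half-open intervals, which overlap exactly when each one starts before the other ends. The ranges used in main are illustrative.

#include <stdbool.h>
#include <stdio.h>

/* Half-open ranges [lo, hi) overlap exactly when lo1 < hi2 && lo2 < hi1. */
static bool ranges_overlap(const void *lo1, const void *hi1,
                           const void *lo2, const void *hi2) {
    return (const char *)lo1 < (const char *)hi2 &&
           (const char *)lo2 < (const char *)hi1;
}

int main(void) {
    double a[100];
    /* write range a[0..49], read range a[50..99]: no conflict */
    printf("%d\n", ranges_overlap(&a[0], &a[50], &a[50], &a[100]));
    /* write range a[0..59], read range a[50..99]: conflict */
    printf("%d\n", ranges_overlap(&a[0], &a[60], &a[50], &a[100]));
    return 0;
}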

Подробнее
30-03-2017 дата публикации

Low-Layer Memory for a Computing Platform

Номер: US20170091094A1

The present disclosure relates to low-layer memory for a computing platform. An example embodiment includes a memory hierarchy being directly connectable to a processor. The memory hierarchy includes at least a level 1, referred to as L1, memory structure comprising a non-volatile memory unit as L1 data memory and a buffer structure (L1-VWB). The buffer structure includes a plurality of interconnected wide registers with an asymmetric organization, wider towards the non-volatile memory unit than towards a data path connectable to the processor. The buffer structure and the non-volatile memory unit are arranged for being directly connectable to a processor so that data words can be read directly from either of the L1 data memory and the buffer structure (L1-VWB) by the processor.

Подробнее
19-03-2020 дата публикации

Hardware Acceleration Method, Compiler, and Device

Номер: US20200089480A1
Принадлежит:

A hardware acceleration method includes: obtaining compilation policy information and a source code, where the compilation policy information indicates that a first code type matches a first processor and a second code type matches a second processor, analyzing a code segment in the source code according to the compilation policy information, determining a first code segment belonging to the first code type or a second code segment belonging to the second code type, compiling the first code segment into a first executable code, sending the first executable code to the first processor, compiling the second code segment into a second executable code, and sending the second executable code to the second processor. 1. A hardware acceleration method implemented by a compiler , the hardware acceleration method comprising:obtaining compilation policy information and source code, wherein the compilation policy information indicates that a first code type matches a first processor and a second code type matches a second processor;determining a first code segment belonging to the first code type and a second code segment belonging to the second code type according to the compilation policy information;compiling the first code segment into first executable code;sending the first executable code to the first processor;compiling the second code segment into second executable code; andsending the second executable code to the second processor, stopping the third executable code when a busy degree of the second processor is higher than a first preset threshold;', 'compiling a third code segment corresponding to the third executable code into a fourth executable code matching the first processor; and', 'sending the fourth executable code to the first processor., 'wherein when a priority of a first process corresponding to the second code segment is higher than a priority of a second process corresponding to a third executable code being executed in the second processor, before ...

Подробнее
07-04-2016 дата публикации

Method and system for automated improvement of parallelism in program compilation

Номер: US20160098258A1
Принадлежит: Individual

A method of program compilation to improve parallelism during the linking of the program by a compiler. The method includes converting statements of the program to canonical form, constructing a traversable representation, such as an abstract syntax tree (AST), for each procedure in the program, and traversing the program to construct a graph by making each non-control flow statement and each control structure into at least one node of the graph.

Подробнее
04-04-2019 дата публикации

Processors, methods, and systems with a configurable spatial accelerator having a sequencer dataflow operator

Номер: US20190102338A1
Принадлежит: Intel Corp

Systems, methods, and apparatuses relating to a sequencer dataflow operator of a configurable spatial accelerator are described. In one embodiment, an interconnect network between a plurality of processing elements receives an input of a dataflow graph comprising a plurality of nodes forming a loop construct, wherein the dataflow graph is overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements and at least one dataflow operator controlled by a sequencer dataflow operator of the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements and the sequencer dataflow operator generates control signals for the at least one dataflow operator in the plurality of processing elements.

Подробнее
27-04-2017 дата публикации

CODE PROFILING OF EXECUTABLE LIBRARY FOR PIPELINE PARALLELIZATION

Номер: US20170115974A1
Принадлежит:

A method and system. A library method stub is created in a predefined programming language by use of a library method signature associated with an original library call, at least one idiom sentence, and a call invoking the original library call. Creating the library method stub includes composing source code of the library method stub by matching the at least one idiom sentence with idiom-stub mappings predefined for each basic idiom of at least one basic idiom. The original library call appears in sequential code. The library method signature specifies formal arguments of the original library call. The at least one idiom sentence summarizes memory operations performed by the original library call on the formal arguments. The created library method stub is stored in a database. 1. A method for creating a library method stub in source code form corresponding to an original library call in machine-executable form , said method comprising:creating, by a computer processor, a library method stub in a predefined programming language by use of a library method signature associated with an original library call, at least one idiom sentence, and a call invoking the original library call, wherein said creating the library method stub comprises composing source code of the library method stub by matching the at least one idiom sentence with idiom-stub mappings predefined for each basic idiom of at least one basic idiom, wherein the original library call appears in sequential code, wherein the library method signature specifies formal arguments of the original library call, and wherein the at least one idiom sentence summarizes memory operations performed by the original library call on the formal arguments; andsaid processor storing the created library method stub in a database.2. The method of claim 1 , wherein a sentence S of the at least one basic idiom provides at least one rule for generating a composition of literals to generate a complex idiom.3. The method of claim 2 ...

Подробнее
18-04-2019 дата публикации

Method and system for converting a single-threaded software program into an application-specific supercomputer

Номер: US20190114158A1
Принадлежит: Global Supercomputing Corporation

The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated as a result of applying this method. The compilation method comprises: (a) Converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) Generating interfaces on the hardware and software parts of the application, which (i) Perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) Perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) Maintain memory coherence between the software and hardware memories. If the resulting hardware design is large, it is divided into partitions such that each partition can fit into a single chip. Then, a single union chip is created which can realize any of the partitions. 2. The directory-mapped coherent memory hierarchy of with the write-update protocol claim 1 , further comprising:a set of line states comprising an exclusive state, a shared state, an invalid state but not including a modified state. This application claims priority, as a continuation application, to U.S. patent application Ser. No. 15/257,319 filed on Sep. 6, 2016, which claims priority, as a continuation application, to U.S. patent application Ser. No. 14/581,169 filed on Dec. 23, 2014, now U.S. Pat. No. 9,495,223, which claims priority, as a continuation application, to U.S. patent application Ser. No. 13/296,232 filed on Nov. 15, 2011, now U.S. Pat. No. 8,966,457. Ser. Nos. 15/257,319, 14/581,169, 13/296,232, U.S. Pat. Nos. 9,495,223 and 8,966,457 are hereby incorporated by reference.The invention relates to the conversion of a single-threaded software program into an application-specific supercomputer.It is much more difficult to write parallel ...

Подробнее
03-05-2018 дата публикации

Hardware acceleration method, compiler, and device

Номер: US20180121180A1
Принадлежит: Huawei Technologies Co Ltd

A hardware acceleration method, a compiler, and a device, to improve code execution efficiency and implement hardware acceleration. The method includes: obtaining, by a compiler, compilation policy information and source code, where the compilation policy information indicates that a first code type matches a first processor and a second code type matches a second processor; analyzing, by the compiler, a code segment in the source code according to the compilation policy information, and determining a first code segment belonging to the first code type or a second code segment belonging to the second code type; and compiling, by the compiler, the first code segment into first executable code, and sending the first executable code to the first processor; and compiling the second code segment into second executable code, and sending the second executable code to the second processor.

Подробнее
14-05-2015 дата публикации

Information processing apparatus and compilation method

Номер: US20150135171A1
Принадлежит: Fujitsu Ltd

A storage unit stores source code including loop processing that is written with an array referenced by an index, a loop variable, and a parameter. A computing unit generates a conditional expression indicating that the index of the array satisfies a predetermined condition, using the loop variable and the parameter. The computing unit generates determination information on the parameter, by eliminating the loop variable from the conditional expression through formula manipulation. Then, the computing unit generates object code corresponding to the source code in accordance with the determination information.

Подробнее
01-09-2022 дата публикации

Using hardware-accelerated instructions

Номер: US20220276865A1
Принадлежит: ROBERT BOSCH GMBH

A computer-implemented method of implementing a computation using a hardware-accelerated instruction of a processor system by solving a constraint satisfaction problem. A solution to the constraint satisfaction problem represents a possible invocation of the hardware-accelerated instruction in the computation. The constraint satisfaction problem assigns nodes of a data flow graph of the computation to nodes of a data flow graph of the instruction. The constraint satisfaction problem comprises constraints enforcing that the assigned nodes of the computation data flow graph have equivalent data flow to the instruction data flow graph, and constraints restricting which nodes of the computation data flow graph can be assigned to the inputs of the hardware-accelerated instruction, with restrictions being imposed by the hardware-accelerated instruction and/or its programming interface.

Подробнее
19-05-2016 дата публикации

SYSTEMS, METHODS, AND COMPUTER PROGRAMS FOR PERFORMING RUNTIME AUTO PARALLELIZATION OF APPLICATION CODE

Номер: US20160139901A1
Принадлежит:

Systems, methods, and computer programs are disclosed for performing runtime auto-parallelization of application code. One embodiment of such a method comprises receiving application code to be executed in a multi-processor system. The application code comprises an injected code cost computation expression for at least one loop in the application code defining a serial workload for processing the loop. A runtime profitability check of the loop is performed based on the injected code cost computation expression to determine whether the serial workload can be profitably parallelized. If the serial workload can be profitably parallelized, the loop is executed in parallel using two or more processors in the multi-processor system. 1. A method for performing runtime auto-parallelization of application code , the method comprising:receiving application code to be executed in a multi-processor system, the application code comprising an injected code cost computation expression for at least one loop in the application code defining a serial workload for processing the loop;performing a runtime profitability check of the loop based on the injected code cost computation expression to determine whether the serial workload can be profitably parallelized; andif the serial workload can be profitably parallelized, executing the loop in parallel using two or more processors in the multi-processor system.2. The method of claim 1 , wherein the performing the runtime profitability check comprises:computing a parallelized workload based on an available number of processors; anddetermining whether a sum of the parallelized workload and a parallelization overhead parameter exceeds the serial workload.3. The method of claim 1 , wherein the injected code cost computation expression defines a first static portion of the serial workload defined at compile time and a second dynamic portion of the serial workload to be computed at runtime.4. The method of claim 3 , wherein the performing the ...
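A hedged C sketch of an injected profitability check with assumed cost constants: the serial-workload expression combines a static per-iteration cost with the runtime trip count and is compared against the parallelized workload plus a parallelization overhead.

#include <stdio.h>

#define COST_PER_ITER   5      /* static portion, known at compile time  */
#define PAR_OVERHEAD 2000      /* thread start-up / scheduling (assumed) */
#define NUM_PROCS       4

/* Returns nonzero when the loop's serial workload can be profitably
 * parallelized over NUM_PROCS processors. */
static int profitable(long trip_count) {
    long serial_work   = COST_PER_ITER * trip_count;          /* injected expression */
    long parallel_work = serial_work / NUM_PROCS + PAR_OVERHEAD;
    return parallel_work < serial_work;
}

int main(void) {
    printf("n=100:    %s\n", profitable(100)    ? "parallel" : "serial");
    printf("n=100000: %s\n", profitable(100000) ? "parallel" : "serial");
    return 0;
}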

Подробнее
26-05-2016 дата публикации

SYSTEMS AND METHODS FOR STENCIL AMPLIFICATION

Номер: US20160147514A1
Принадлежит:

In a sequence of major computational steps or in an iterative computation, a stencil amplifier can increase the number of data elements accessed from one or more data structures in a single major step or iteration, thereby decreasing the total number of computations and/or communication operations in the overall sequence or the iterative computation. Stencil amplification, which can be optimized according to a specified parameter such as compile time, rune time, code size, etc., can improve the performance of a computing system executing the sequence or the iterative computation in terms of run time, memory load, energy consumption, etc. The stencil amplifier typically determines boundaries, to avoid erroneously accessing data elements not present in the one or more data structures. 1. A method for improving processing efficiency , the method comprising performing by a processor the steps of:in a computation sequence comprising a plurality of sequence steps, identifying a computation using a stencil comprising a set of stencil points, each stencil point corresponding to a value of a respective element of a data structure in a current sequence step; andmodifying the computation by replacing a stencil point with a first-level substencil comprising a set of first-level substencil points, each first-level substencil point corresponding to a value of a respective element of the data structure from a first previous sequence step, at least one first-level substencil point being associated with a data-structure element that is different from data-structure elements associated with all stencil points.2. The method of claim 1 , wherein modifying the computation comprises generating a loop nest corresponding to at least one stencil point claim 1 , the loop nest comprising a loop corresponding to a parameterized dimension of the data structure claim 1 , the loop comprising a statement accessing an element of the data structure in the parameterized dimension according to a ...

Подробнее
31-05-2018 дата публикации

CROSS-MACHINE BUILD SCHEDULING SYSTEM

Номер: US20180150286A1
Принадлежит: Microsoft Technology Licensing, LLC

Cross-machine build scheduling of a codebase is provided. Systems, methods and computer-readable devices provide for breaking a monolithic codebase into a plurality of tenants. A file containing entries associated with one of the tenants is read, and a selected entry in the file is examined to determine if the entry is requesting the execution of parallel loop. If so, each loop of the parallel loops is executed in parallel, and the selected entry in the file is examined to determine if the entry is an independent loop. If so, the independent loop is executed, and the selected entry in the file is examined to determine if the entry is a dependent loop. If so, execution of the dependent loop is held. 1. A system comprising a computing device , the computing device comprising:a processing device; and breaking a monolithic codebase into a plurality of tenants;', 'reading a file containing entries associated with one of the tenants;', 'examining a selected entry in the file to determine if the entry is requesting the execution of parallel loops, and if so, executing each loop of the parallel loops in parallel;', 'examining the selected entry in the file to determine if the entry is an independent loop, and if so, executing the independent loop; and', 'examining the selected entry in the file to determine if the entry is a dependent loop, and if so, holding execution of the dependent loop., 'a computer readable data storage device storing instructions that, when executed by the processing device are operative to provide2. The system of further comprising claim 1 , releasing for execution the dependent loop once its pre-validation loop has successfully completed execution.3. The system of claim 1 , wherein the loop comprises a request to build the tenant.4. The system of claim 1 , wherein the loop comprises a request to debug the tenant.5. The system of claim 1 , wherein the loop comprises a request to test the tenant.6. The system of claim 2 , wherein the dependent loop ...

Подробнее
07-06-2018 дата публикации

SYSTEMS AND METHODS FOR GENERATING CODE FOR PARALLEL PROCESSING UNITS

Номер: US20180157471A1
Принадлежит:

Systems and methods generate code from a source program where the generated code may be compiled and executed on a Graphics Processing Unit (GPU). A parallel loop analysis check may be performed on regions of the source program identified for parallelization. One or more optimizations also may be applied to the source program that convert mathematical operations into a parallel form. The source program may be partitioned into segments for execution on a host and a device. Kernels may be created for the segments to be executed on the device. The size of the kernels may be determined, and memory transfers between the host and device may be optimized. 1. A method comprising:for a source program having a format for sequential execution, generating one or more in-memory intermediate representations (IRs) for the source program;', 'partitioning, by the processor, the one or more in-memory IRs for the source program into serial code segments and parallel code segments identified as suitable for parallel execution, wherein at least one of the parallel code segments includes a nested loop structure of for-loops, the for-loops of the nested structure including loop bounds;', 'determining, by the processor, a number of thread blocks and a number of threads per thread block for executing the at least one of the parallel code segments, the determining based on an analysis of the at least one of the parallel code segments; and', identifying sets of the for-loops of the nested loop structure that satisfy a parallel loop analysis check and that are contiguous within the nested loop structure;', 'identifying the set from the sets of the for-loops using a criteria;', 'converting the set of the for-loops whose product of the loop bounds is largest to the kernel for parallel execution by the parallel processing device; and', 'adding a kernel launch directive that includes the number of thread blocks and the number of threads per block,, 'converting, by the processor, the at least one ...
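A small C++ sketch of one plausible way the launch configuration could be derived from the flattened iteration space of the selected for-loop nest; the fixed block size and the helper name are assumptions, not taken from the entry.

#include <cstdint>
#include <utility>

// For a set of contiguous parallel for-loops with bounds n1 x n2 x ... the
// total iteration space is their product; a common way to size the launch is
// a fixed number of threads per block and enough blocks to cover the space
// (illustrative values).
std::pair<std::uint64_t, std::uint32_t>
choose_launch(std::uint64_t total_iterations, std::uint32_t threads_per_block = 256) {
    std::uint64_t blocks =
        (total_iterations + threads_per_block - 1) / threads_per_block;  // ceiling division
    return {blocks, threads_per_block};
}

// Example: a nest of for-loops with bounds 480 and 640 flattened into one
// kernel; the kernel launch directive would then carry these two numbers.
// auto [blocks, tpb] = choose_launch(480ull * 640ull);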

Подробнее
18-06-2015 дата публикации

GENERAL PURPOSE SOFTWARE PARALLEL TASK ENGINE

Номер: US20150169305A1
Принадлежит:

A software engine for decomposing work to be done into tasks, and distributing the tasks to multiple, independent CPUs for execution is described. The engine utilizes dynamic code generation, with run-time specialization of variables, to achieve high performance. Problems are decomposed according to methods that enhance parallel CPU operation, and provide better opportunities for specialization and optimization of dynamically generated code. A specific application of this engine, a software three dimensional (3D) graphical image renderer, is described. 1. In a computer system , a parallel task engine for performing tasks on data , the parallel task engine comprising:an input for receiving tasks, each task for performing an operation;a scheduler for decomposing the tasks into one or more new tasks, the decomposing being dependent on at least one policy selected from a given set of policies;a run-time dynamic code generator for generating, for the new tasks, operation routines, the run-time dynamic code generator comprising a dynamic compiler, the dynamic compiler being adapted to output the operation routines for execution;a set of job loops, at least one of the job loops for performing the new tasks on at least part of the data by executing the operation routines, the job loops running in parallel on two or more CPUs;the scheduler for distributing and assigning the new tasks to the at least one of the job loops; andthe scheduler for making the selection of the at least one policy based on general heuristics.2. The parallel task engine of claim 1 , wherein the given set of policies include one or more of: by-domain policies claim 1 , and by-component policies.3. The parallel task engine of claim 2 , wherein the scheduler performs by-domain decomposition on a given task by modifying data pointers or parameters of the given task.4. The parallel task engine of claim 2 , wherein the scheduler performs by-component decomposition on a given task having a full datum ...

Подробнее
28-06-2018 дата публикации

Parallel program generating method and parallelization compiling apparatus

Номер: US20180181380A1
Принадлежит: WASEDA UNIVERSITY

There is provided a parallel program generating method capable of generating a static scheduling enabled parallel program without undermining the possibility of extracting parallelism. The parallel program generating method executed by the parallelization compiling apparatus 100 includes a fusion step (FIG. 2 /STEP 026 ) of fusing, as a new task, a task group including a reference task as a task having a conditional branch, and subsequent tasks as tasks control dependent, extended-control dependent, or indirect control dependent on respective of all branch directions of the conditional branch included in the reference task.

Подробнее
04-06-2020 дата публикации

Method for controlling the flow execution of a generated script of a blockchain transaction

Номер: US20200174762A1
Принадлежит: nChain Holdings Ltd

The invention provides a computer-implemented method (and corresponding system) for generating a blockchain transaction (Tx). This may be a transaction for the Bitcoin blockchain or another blockchain protocol. The method comprises the step of using a software resource to receive, generate or otherwise derive at least one data item; and then insert, at least once, at least one portion of code into a script associated with the transaction. Upon execution of the script, the portion of code provides the functionality of a control flow mechanism, the behaviour of the control flow mechanism being controlled or influenced by the at least one data item. In one embodiment, the code is copied/inserted into the script more than once. The control flow mechanism can be a loop, such as a while or for loop, or a selection control mechanism such as a switch statement. Thus, the invention allows the generation of a more complex blockchain script and controls how the script will execute when implemented on the blockchain. This, in turn, provides control over how or when the output of the blockchain transaction is unlocked.

Подробнее
29-07-2021 дата публикации

SYSTEMS AND METHODS FOR SCALABLE HIERARCHICAL POLYHEDRAL COMPILATION

Номер: US20210232379A1
Принадлежит:

A system for compiling programs for execution thereof using a hierarchical processing system having two or more levels of memory hierarchy can perform memory-level-specific optimizations, without exceeding a specified maximum compilation time. To this end, the compiler system employs a polyhedral model and limits the dimensions of a polyhedral program representation that is processed by the compiler at each level using a focalization operator that temporarily reduces one or more dimensions of the polyhedral representation. Semantic correctness is provided via a defocalization operator that can restore all polyhedral dimensions that had been temporarily removed. 1. A method for optimizing execution of a program by a processing system comprising a hierarchical memory having a plurality of memory levels , the method comprising: removing an iterator corresponding to a loop index associated with the loop dimension being focalized;', (i) removing from a loop condition at another loop dimension of the selected loop nest a subcondition corresponding to the loop index; and', '(ii) removing from a memory access expression of an operand, a reference to the loop index; and, 'at least one of, 'storing the loop index and associated focalization information for that memory level; and, '(a) for each memory level in at least a subset of memory levels in the plurality of memory levels focalizing a loop dimension of a selected loop nest within a program, focalizing comprising adding an iterator based on a reintroduced loop index associated with the loop dimension being defocalized; and', updating a loop condition at another loop dimension of the selected loop nest based on the stored focalization information associated with the loop dimension being defocalized; and', 'updating the memory access expression of the operand based on the reintroduced loop index and the stored focalization information., 'at least one of], '(b) for each focalized dimension, defocalizing that dimension by2. ...

Подробнее
04-07-2019 дата публикации

ALTERNATIVE LOOP LIMITS FOR ACCESSING DATA IN MULTI-DIMENSIONAL TENSORS

Номер: US20190205756A1
Принадлежит:

Methods, systems, and apparatus for accessing a N-dimensional tensor are described. In some implementations, a method includes, for each of one or more first iterations of a first nested loop, performing iterations of a second nested loop that is nested within the first nested loop until a first loop bound for the second nested loop is reached. A number of iterations of the second nested loop for the one or more first iterations of the first nested loop is limited by the first loop bound in response to the second nested loop having a total number of iterations that exceeds a value of a hardware property of the computing system. After a penultimate iteration of the first nested loop has completed, one or more iterations of the second nested loop are performed for a final iteration of the first nested loop until an alternative loop bound is reached. 1. A method for performing computations based on tensor elements of an N-dimensional tensor , comprising: generating a first loop for controlling a number of iterations of a second loop used to traverse the particular dimension;', 'determining a first loop bound for the second loop and an alternative loop bound for the second loop based on the number of tensor elements of the particular dimension and the number of individual computing units of the computing system, wherein the first loop bound controls a number of iterations of the second loop for one or more first iterations of the first loop and the alternative loop bound controls the number of iterations of the second loop for a final iteration of the first loop such that the number of iterations of the second loop does not exceed a number of tensor elements that will be used to perform the computations; and', 'generating code that has the second loop nested within the first loop;, 'determining that a number of tensor elements of a particular dimension of the N-dimensional tensor is not an exact multiple of a number of individual computing units of the computing system ...
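A worked C++ example of the two loop bounds, assuming illustrative numbers (a dimension of 10 tensor elements and 4 computing units, neither taken from the entry): the first loop bound of 4 governs all but the last outer iteration, and the alternative bound of 2 governs the final iteration so no out-of-range element is touched.

#include <cstdio>

int main() {
    const int dim = 10;          // tensor elements along the traversed dimension
    const int units = 4;         // individual computing units
    const int outer = (dim + units - 1) / units;      // 3 outer iterations
    const int first_bound = units;                    // 4
    const int alt_bound = dim - (outer - 1) * units;  // 2, for the final iteration

    for (int i = 0; i < outer; ++i) {
        const int bound = (i == outer - 1) ? alt_bound : first_bound;
        for (int j = 0; j < bound; ++j) {
            int element = i * units + j;
            std::printf("process element %d\n", element);
        }
    }
}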

Подробнее
12-08-2021 дата публикации

SYSTEMS AND METHODS FOR OPTIMIZING NESTED LOOP INSTRUCTIONS IN PIPELINE PROCESSING STAGES WITHIN A MACHINE PERCEPTION AND DENSE ALGORITHM INTEGRATED CIRCUIT

Номер: US20210247981A1
Принадлежит:

In one embodiment, a method for improving a performance of an integrated circuit includes implementing one or more computing devices executing a compiler program that: (i) evaluates a target instruction set intended for execution by an integrated circuit; (ii) identifies one or more nested loop instructions within the target instruction set based on the evaluation; (iii) evaluates whether a most inner loop body within the one or more nested loop instructions comprises a candidate inner loop body that requires a loop optimization that mitigates an operational penalty to the integrated circuit based on one or more executional properties of the most inner loop instruction; and (iv) implements the loop optimization that modifies the target instruction set to include loop optimization instructions to control, at runtime, an execution and a termination of the most inner loop body thereby mitigating the operational penalty to the integrated circuit. 1. A method for improving a performance of an integrated circuit , the method comprising: (i) evaluates a target instruction set;', '(ii) identifies one or more loop instructions within the target instruction set based on the evaluation;', '(iii) evaluates whether an inner loop body within the one or more loop instructions comprises a candidate inner loop body requiring a loop optimization that mitigates an operational penalty to an integrated circuit based on one or more executional properties of the inner loop instruction; and', '(iv) implements the loop optimization that modifies the target instruction set to include loop optimization instructions to control an execution and a termination of the inner loop body thereby mitigating the operational penalty., 'implementing one or more computing devices executing a compiler program that2. A system for improving a performance of an integrated circuit , the system comprising: (i) evaluates a target instruction set;', '(ii) identifies one or more loop instructions within the target ...

Подробнее
20-08-2015 дата публикации

Execution control method and information processing apparatus

Номер: US20150234641A1
Автор: Yoshie Inada
Принадлежит: Fujitsu Ltd

While a first code, in an object code generated from a source code, for a loop included in the source code or a second code in the object code is executed, a feature amount concerning the number of times that a condition of a conditional branch is true is obtained. The loop includes the conditional branch, and the conditional branch is coded in the first code. The second code is a code to perform computation of a branch destination for a case where the condition of the conditional branch is true, only for loop indices that were extracted as the aforementioned case. Then, a processor executes, based on the feature amount, the second code or a third code included in the object code. The third code is a code to write, by using a predicated instruction and into a memory, any computation result of computations of branch destinations.
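A hedged C++ sketch of the two code shapes the entry contrasts; the loop body, the "feature amount" threshold and all names are illustrative.

#include <cstddef>
#include <vector>

// "Second code" shape: collect the loop indices for which the condition holds,
// then run the branch-destination computation only for those indices.
void sparse_form(const std::vector<int>& cond, std::vector<double>& out) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < cond.size(); ++i)
        if (cond[i] != 0) idx.push_back(i);
    for (std::size_t k : idx)
        out[k] = out[k] * 2.0 + 1.0;   // stand-in for the "true" branch work
}

// "Third code" shape: compute unconditionally and keep the result under a
// predicate, the scalar analogue of a predicated (masked) store.
void predicated_form(const std::vector<int>& cond, std::vector<double>& out) {
    for (std::size_t i = 0; i < cond.size(); ++i) {
        double t = out[i] * 2.0 + 1.0;
        out[i] = cond[i] ? t : out[i];
    }
}

// A runtime dispatcher might pick between the two based on how often the
// condition was observed to be true (the "feature amount"); the 10% threshold
// here is purely illustrative.
void run(const std::vector<int>& cond, std::vector<double>& out, double true_ratio) {
    if (true_ratio < 0.10) sparse_form(cond, out);
    else                   predicated_form(cond, out);
}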

Подробнее
30-10-2014 дата публикации

Semi-Automatic Restructuring of Offloadable Tasks for Accelerators

Номер: US20140325495A1
Принадлежит: NEC Laboratories America, Inc.

A computer implemented method entails identifying code regions in an application from which offloadable tasks can be generated by a compiler for heterogenous computing system with processor and accelerator memory, including adding relaxed semantics to a directive based language in the heterogenous computing for allowing a suggesting rather than specifying a parallel code region as an offloadable candidate, and identifying one or more offloadable tasks in a neighborhood of code region marked by the directive. 1. A computer implemented method comprising:identifying code regions in an application from which offloadable tasks can be generated by a compiler for heterogenous computing system with processor and accelerator memory, comprising:adding relaxed semantics to a directive based language in the heterogenous computing for allowing a suggesting rather than specifying a parallel code region as an offloadable candidate; andidentifying one or more offloadable tasks in a neighborhood of code region marked by the directive.2. The method of claim 1 , wherein the offloadable candidate is a sub-offload.3. The method of claim 2 , wherein the sub-offload comprises only part of the code region marked by the directive is offloaded to the accelerator memory while the other part of the code region executes on the processor in parallel.4. The method of claim 2 , wherein the sub-offload comprises splitting an index range of a main parallel loop into two or more parts and declaring one of the subloops as the offloadable task.5. The method of claim 2 , wherein the sub-offload comprises handling reduction variables and critical sections across subloops without additional synchronization.6. The method of claim 2 , wherein the sub-offload comprises enabling concurrent execution of a task on the processor and accelerator memory.7. The method of claim 1 , wherein the offloadable candidate is a super-offload.8. The method of claim 7 , wherein the super-offload comprises declaring a code ...
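A minimal C++ sketch of the sub-offload idea, with a second host thread standing in for the accelerator and the reduction variable handled as two partial sums so no extra synchronization is needed; the 50/50 split and all names are assumptions.

#include <cstddef>
#include <thread>
#include <vector>

// Sub-offload sketch: the index range [0, n) of the main parallel loop is
// split in two; one subloop stands in for the part offloaded to the
// accelerator, the other runs on the host concurrently, and the reduction is
// combined only after both parts finish.
double sum_of_squares(const std::vector<double>& x) {
    std::size_t n = x.size(), mid = n / 2;    // 50/50 split chosen arbitrarily
    double host_part = 0.0, offload_part = 0.0;

    std::thread offload([&] {                 // placeholder for the accelerator
        for (std::size_t i = mid; i < n; ++i) offload_part += x[i] * x[i];
    });
    for (std::size_t i = 0; i < mid; ++i) host_part += x[i] * x[i];
    offload.join();

    return host_part + offload_part;          // combine the partial reductions
}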

Подробнее
16-08-2018 дата публикации

OPTIMIZE CONTROL-FLOW CONVERGENCE ON SIMD ENGINE USING DIVERGENCE DEPTH

Номер: US20180232239A1
Принадлежит:

There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC. 1. A method for selecting an active data stream while running a SPMD (Single Program Multiple Data) program of instructions on a SIMD (Single Instruction Multiple Data) machine , an instruction stream having one thread-PC (Program Counter) , a value of the thread-PC indicating an instruction memory address which stores an instruction to be fetched next for the instruction stream , comprising;running the instruction stream over a plurality of input data streams each of which corresponds to each of a plurality of lanes, each of the plurality of lanes being associated with a corresponding lane depth counter indicating a number of nested branches for each corresponding lane, a corresponding lane-PC of a lane of the plurality of lanes indicating a memory address which stores the instruction to be fetched next for the lane when the lane is activated, and a lane activation bit indicating whether the lane is active or not;incrementing values of the lane depth counters of all active_lanes of the plurality of lanes upon the thread-PC value reaching a branch operation ...

Подробнее
17-09-2015 дата публикации

Interleaving data accesses issued in response to vector access instructions

Номер: US20150261512A1
Автор: Alastair David Reid
Принадлежит: ARM LTD

A vector data access unit includes data access ordering circuitry, for issuing data access requests indicated by elements of earlier and a later vector instructions, one being a write instruction. An element indicating the next data access for each of the instructions is determined. The next data accesses for the earlier and the later instructions may be reordered. The next data access of the earlier instruction is selected if the position of the earlier instruction's next data element is less than or equal to the position of the later instruction's next data element minus a predetermined value. The next data access of the later instruction may be selected if the position of the earlier instruction's next data element is higher than the position of the later instruction's next data element minus a predetermined value. Thus data accesses from earlier and later instructions are partially interleaved.

Подробнее
06-09-2018 дата публикации

COMPILING A PARALLEL LOOP WITH A COMPLEX ACCESS PATTERN FOR WRITING AN ARRAY FOR GPU AND CPU

Номер: US20180253289A1
Автор: Ishizaki Kazuaki
Принадлежит:

Computer-implemented methods are provided for compiling a parallel loop and generating Graphics Processing Unit (GPU) code, and Central Processing Unit (CPU) code for writing an array for the CPU and the CPU. A method includes compiling the parallel loop by (i) checking, based on a range of array elements to be written, whether the parallel loop can update all of the array elements and (ii) checking whether an access order of the array elements that the parallel loop reads or writes is known at compilation time. The method further includes determining an approach, from among a plurality of available approaches, to generate the CPU code and the GPU code based on (i) the range of the array elements to be written and (ii) the access order to the array elements in the parallel loop. 1. A computer-implemented method for compiling a parallel loop and generating Graphics Processing Unit (GPU) code and Central Processing Unit (CPU) code for writing an array for the GPU and the CPU , the method comprising:compiling the parallel loop by (i) checking, based on a range of array elements to be written, whether the parallel loop can update all of the array elements and (ii) checking whether an access order of the array elements that the parallel loop reads or writes is known at compilation time; anddetermining an approach, from among a plurality of available approaches, to generate the CPU code and the GPU code based on (i) the range of the array elements to be written and (ii) the access order to the array elements in the parallel loop.2. The computer-implemented method of claim 1 , wherein the GPU code and the CPU code are generated to be executable in parallel when regions of the array to be written are non-contiguous.3. The computer-implemented method of claim 1 , wherein (i) checking claim 1 , based on a range of array elements to be written claim 1 , whether the parallel loop can update all of the array elements comprises:checking whether the range of array elements to be ...

Подробнее
06-09-2018 дата публикации

COMPILING A PARALLEL LOOP WITH A COMPLEX ACCESS PATTERN FOR WRITING AN ARRAY FOR GPU AND CPU

Номер: US20180253290A1
Автор: Ishizaki Kazuaki
Принадлежит:

Computer-implemented methods are provided for compiling a parallel loop and generating Graphics Processing Unit (GPU) code and Central Processing Unit (CPU) code for writing an array for the GPU and the CPU. A method includes compiling the parallel loop by (i) checking, based on a range of array elements to be written, whether the parallel loop can update all of the array elements and (ii) checking whether an access order of the array elements that the parallel loop reads or writes is known at compilation time. The method further includes determining an approach, from among a plurality of available approaches, to generate the CPU code and the GPU code based on (i) the range of the array elements to be written and (ii) the access order to the array elements in the parallel loop. 1. A computer program product for compiling a parallel loop and generating Graphics Processing Unit (GPU) code and Central Processing Unit (CPU) code for writing an array for the GPU and the CPU , the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith , the program instructions executable by a computer to cause the computer to perform a method comprising:compiling the parallel loop by (i) checking, based on a range of array elements to be written, whether the parallel loop can update all of the array elements and (ii) checking whether an access order of the array elements that the parallel loop reads or writes is known at compilation time; anddetermining an approach, from among a plurality of available approaches, to generate the CPU code and the GPU code based on (i) the range of the array elements to be written and (ii) the access order to the array elements in the parallel loop.2. The computer program product of claim 1 , wherein the GPU code and the CPU code are generated to be executable in parallel when regions of the array to be written are non-contiguous.3. The computer program product of claim 1 , ...

Подробнее
14-09-2017 дата публикации

OPTIMIZATION OF LOOPS AND DATA FLOW SECTIONS IN MULTI-CORE PROCESSOR ENVIRONMENT

Номер: US20170262406A1
Автор: Vorbach Martin
Принадлежит: HYPERION CORE, INC.

The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, chaining the loop partitions generating a list representing the execution sequence of the partitions. 1. A method for executing code on a processor , wherein the method comprises:detecting, by the processor, loop code in source code accessible to the processor, the loop code implementing one or more execution loops of the source code ;partitioning the loop code into a plurality of partitions by the processor;mapping, by the processor, each of the plurality of partitions, as a whole, onto a respective execution unit of an array of execution units for execution; andexecuting, by the processor, the plurality of partitions using the array of execution units.2. The method of wherein a group of the plurality of partitions comprises a sequence of multiple partitions claim 1 , wherein the multiple partitions are mapped and executed sequentially using the array of execution units.3. The method of wherein multiple partitions of the plurality of partitions are mapped and executed in parallel using the array of execution units.4. The method of wherein at least some of the plurality of partitions include a list of required resources according to which processor resources of the processor are allocated. This application is a continuation of U.S. patent application Ser. No. 14/693,793, filed Apr. 22, 2015, which is a continuation of U.S. patent application Ser. No. 13/519,887, filed Nov. 6, 2012 (now U.S. Pat. No. 9,043,769), which claims priority as a national phase application of International Patent Application No. PCT/EP2010/007950, filed Dec. 28, 2010, which claims priority to European Patent Application No. EP10007074.7, filed Jul. 9, ...

Подробнее
16-12-2021 дата публикации

Method and system for converting a single-threaded software program into an application-specific supercomputer

Номер: US20210389936A1
Принадлежит: Global Supercomputing Corporation

The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated as a result of applying this method. The compilation method comprises: (a) Converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) Generating interfaces on the hardware and software parts of the application, which (i) Perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) Perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) Maintain memory coherence between the software and hardware memories. If the resulting hardware design is large, it is divided into partitions such that each partition can fit into a single chip. Then, a single union chip is created which can realize any of the partitions. 1. A hardware component automatically compiled , by a compiler , from a single-threaded software program , where the instructions within the single-threaded software program include:a. at least one FIFO send instruction for sending a message in hardware over a group of pins of the hardware component, where the group of pins implements a sending FIFO interface; andb. at least one FIFO receive instruction for receiving a message in hardware over a group of pins of the hardware component, where the group of pins implements a receiving FIFO interface;where for the purpose of improving performance, the compiler automatically schedules and parallelizes the single-threaded software program including FIFO send and receive instruction, based on dependences imposed by register operands set or used by FIFO send and receive instructions and by register and memory operands set or used by remaining instructions; andwhere the hardware component is functionally equivalent to ...

Подробнее
12-09-2019 дата публикации

METHOD OF MEMORY ESTIMATION AND CONFIGURATION OPTIMIZATION FOR DISTRIBUTED DATA PROCESSING SYSTEM

Номер: US20190278573A1
Принадлежит:

The present invention relates to a method of memory estimation and configuration optimization for a distributed data processing system involves performing match between an application data stream and a data feature library, wherein the application data stream has received analysis and processing on conditional branches and/or loop bodies of an application code in a Java archive of the application, estimating a memory limit for at least one stage of the application based on the successful matching result, optimizing configuration parameters of the application accordingly, and acquiring static features and/or dynamic features of the application data based on running of the optimized application and performing persistent recording. Opposite to machine-learning-based memory estimation that does not ensure accuracy and fails to provide fine-grained estimation for individual stages, this method uses application analysis and existing data feature to estimate overall memory occupation more precisely and to estimate memory use of individual job stages for more fine-grained configuration optimization. 1. A method of memory estimation and configuration optimization for a distributed data processing system , wherein the method comprises the steps of:performing a match between an application data stream and a data feature library, wherein the application data stream has received analysis and processing on conditional branches and loop bodies of an application code in a Java archive of the application;estimating a memory limit for at least one stage of the application based on a successful result of the match:optimizing configuration parameters of the application based on the estimated memory limit;acquiring static features and dynamic features of the application data based on running of the optimized application and performing persistent recording;estimating a memory limit of the at least one stage of the application again based on a feedback result of the static features and ...

Подробнее
12-09-2019 дата публикации

VECTORIZE STORE INSTRUCTIONS METHOD AND APPARATUS

Номер: US20190278577A1
Принадлежит:

Methods, apparatus, and system to optimize compilation of source code into vectorized compiled code, notwithstanding the presence of output dependencies which might otherwise preclude vectorization. 1. An apparatus for computing , comprising:a computer processor and a memory;a compilation optimization module to optimize compilation of the source code, wherein to optimize compilation of the source code, the compilation optimization module is to determine that a loop or function in the source code comprises mutually dependent store instructions; anda vectorization module to vectorize a set of mutually dependent store instructions in the loop, wherein to vectorize the set of mutually dependent store instructions, the vectorization module is to determine a scalar store order for the set of mutually dependent store instructions, determine a vectorized store order for the scalar store order and at least one scatter instruction to store a result of the vectorized store order to a set of non-contiguous or random locations in a target memory.2. The apparatus according to claim 1 , wherein determine the vectorized store order for the scalar store order comprises determine the vectorized store order for the scalar store order based on a number of vector elements in a vector register coupled to a target computer processor and exclude a no-operation store instruction from the vectorized store order.3. The apparatus according to claim 2 , wherein a scalar matrix comprising a number of sequential scalar instruction iterations and a number of sequential store instructions in each iteration in a number of sequential scalar instruction iterations has a different size than a vector matrix comprising the number of elements executed by a SIMD instruction using the vector register.4. The apparatus according to claim 1 , wherein determine the vectorized store order for the scalar store order further comprises transpose each store instruction in the set of mutually dependent store ...
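A small C++ stand-in for the scatter step, assuming the batch of mutually dependent stores gathered from several scalar iterations is applied in the original scalar store order so aliasing stores keep their last-writer-wins behaviour; names and the usage values are illustrative.

#include <cstddef>
#include <vector>

// Software stand-in for a scatter: values[k] is stored to target[index[k]] in
// ascending lane order, so if two lanes alias the same location the store that
// came later in the original scalar order survives.
void ordered_scatter(std::vector<double>& target,
                     const std::vector<std::size_t>& index,
                     const std::vector<double>& values) {
    for (std::size_t k = 0; k < index.size(); ++k)
        target[index[k]] = values[k];
}

// Usage: indices {2, 5, 2} alias location 2; after the scatter, target[2]
// holds 30.0, matching what the scalar store sequence would have produced.
// std::vector<double> t(8, 0.0);
// ordered_scatter(t, {2, 5, 2}, {10.0, 20.0, 30.0});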

Подробнее
03-09-2020 дата публикации

Method and system for converting a single-threaded software program into an application-specific supercomputer

Номер: US20200278848A1
Принадлежит: Global Supercomputing Corp

The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated as a result of applying this method. The compilation method comprises: (a) Converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) Generating interfaces on the hardware and software parts of the application, which (i) Perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) Perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) Maintain memory coherence between the software and hardware memories. If the resulting hardware design is large, it is divided into partitions such that each partition can fit into a single chip. Then, a single union chip is created which can realize any of the partitions.

Подробнее
18-10-2018 дата публикации

Partial connection of iterations during loop unrolling

Номер: US20180300113A1
Принадлежит: International Business Machines Corp

A method and system for partial connection of iterations during loop unrolling during compilation of a program by a compiler. Unrolled loop iterations of a loop in the program are selectively connected during loop unrolling during the compilation, including redirecting, to the head of the loop, undesirable edges of a control flow from one iteration to a next iteration of the loop. Merges on a path of hot code are removed to increase a scope for optimization of the program. The head of the loop and a start of a replicated loop body of the loop are equivalent points of the control flow.

Подробнее
09-11-2017 дата публикации

METHODS AND SYSTEMS TO VECTORIZE SCALAR COMPUTER PROGRAM LOOPS HAVING LOOP-CARRIED DEPENDENCES

Номер: US20170322786A1
Принадлежит:

Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, at runtime, identifying, by executing an instruction with one or more processors, a first loop iteration that cannot be executed in parallel with a second loop iteration due to a set of conflicting scalar loop operations. The first loop iteration is executed after the second loop iteration. The method also includes sectioning, by executing an instruction with one or more processors, a vector loop into vector partitions including a first vector partition. The first vector partition executes consecutive loop iterations in parallel and the consecutive loop iterations start at the second loop iteration and end before the first loop iteration. 1. A method to convert a scalar computer program loop having loop-carried dependences to a vector computer program loop , the method comprising:at runtime, identifying, by executing an instruction with one or more processors, a first loop iteration that cannot be executed in parallel with a second loop iteration due to a set of conflicting scalar loop operations, the first loop iteration being executed after the second loop iteration; andsectioning, by executing an instruction with one or more processors, a vector loop into vector partitions including a first vector partition, the first vector partition to execute consecutive loop iterations in parallel, the consecutive loop iterations to start at the second loop iteration and to end before the first loop iteration.2. The method defined in claim 1 , wherein the consecutive loop iterations are a first set of consecutive loop iterations claim 1 , and the vector partitions include a second vector partition to execute a second set of consecutive loop iterations in parallel claim 1 , the second set of consecutive loop iterations to start at the first loop iteration and to end before a third loop iteration.3. The method ...

Подробнее
01-10-2020 дата публикации

Vehicle entertainment system interactive user interface co-development environment

Номер: US20200310764A1
Автор: Fouzi Djaafri
Принадлежит: Panasonic Avionics Corp

A co-development platform for a graphical user interface to a system interaction application of a terminal device for a vehicle entertainment system. A terminal hardware emulator has an emulated data processor executing software instructions of the system interaction application. A display terminal emulator generates an output corresponding to the graphical user interface, and is selectively targeted to device parameters specific to the terminal device. The display terminal emulator is also receptive to test inputs to one or more input-receptive graphic elements in the graphical user interface. A graphical user interface editor includes one or more interface element settings modifiable by a test user. A test user access controller defines access privilege levels for the test users, which are selectively restricted and permitted to access one or more functionalities.

Подробнее
08-12-2016 дата публикации

Parallel computing apparatus and parallel processing method

Номер: US20160357529A1
Автор: Yuji Tsujimori
Принадлежит: Fujitsu Ltd

Code includes a loop including update processing for updating elements of an array, indicated by a first index, and reference processing for referencing elements of the array, indicated by a second index. At least one of the first index and the second index depends on a parameter whose value is determined at runtime. A processor calculates, based on the value of the parameter determined at runtime, a first range of the elements to be updated by the update processing and a second range of the elements to be referenced by the reference processing prior to the execution of the loop. Then, the processor compares the first range with the second range and outputs a warning indicating that the loop is not parallelizable when the first range and the second range overlap in part.
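A minimal C++ sketch of such a pre-loop range check, assuming an access pattern of the form a[i + k] = ... a[i] ... for 0 <= i < n, where k is the parameter whose value is known only at run time; names and numbers are illustrative.

#include <algorithm>
#include <cstdio>

// The updated range is [k, n + k) and the referenced range is [0, n); if the
// two ranges intersect, the loop is reported as not parallelizable.
bool ranges_overlap(long ub, long ue, long rb, long re) {
    return std::max(ub, rb) < std::min(ue, re);
}

int main() {
    long n = 1000;
    long k = 3;                             // value determined at runtime
    long upd_begin = k, upd_end = n + k;    // elements written by a[i + k]
    long ref_begin = 0, ref_end = n;        // elements read by a[i]
    if (ranges_overlap(upd_begin, upd_end, ref_begin, ref_end))
        std::puts("warning: update and reference ranges overlap; loop is not parallelizable");
}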

Подробнее
20-12-2018 дата публикации

Alternative loop limits

Номер: US20180365561A1
Принадлежит: Google LLC

Methods, systems, and apparatus for accessing a N-dimensional tensor are described. In some implementations, a method includes, for each of one or more first iterations of a first nested loop, performing iterations of a second nested loop that is nested within the first nested loop until a first loop bound for the second nested loop is reached. A number of iterations of the second nested loop for the one or more first iterations of the first nested loop is limited by the first loop bound in response to the second nested loop having a total number of iterations that exceeds a value of a hardware property of the computing system. After a penultimate iteration of the first nested loop has completed, one or more iterations of the second nested loop are performed for a final iteration of the first nested loop until an alternative loop bound is reached.

Подробнее
27-12-2018 дата публикации

LOOP EXECUTION WITH PREDICATE COMPUTING FOR DATAFLOW MACHINES

Номер: US20180373509A1
Принадлежит:

Compilers for compiling computer programs and apparatuses including compilers are disclosed herein. A compiler may include one or more analyzers to parse and analyze source instructions of a computer program including identification of nested loops of the computer program. The compiler may also include a code generator coupled to the one or more analyzers to generate and output executable code for the computer program that executes on a data flow machine, including a data flow graph, based at least in part on results of the analysis. In embodiments, the executable code may include executable code that recursively computes predicates of identified nested loops for use to generate control signal for the data flow graph to allow execution of each loop to start when the loop's predicate is available, independent of whether any other loop is in execution or not. Other embodiments may be disclosed or claimed. 1. A compiler for compiling a computer program , comprising:one or more analyzers to parse and analyze source instructions of a computer program including identification of nested loops of the computer program; anda code generator coupled to the one or more analyzers to generate and output executable code for the computer program that executes on a data flow machine, including a data flow graph, based at least in part on results of the analysis, wherein the executable code includes executable code that recursively computes predicates of identified nested loops for use to generate control signal for the data flow graph to allow execution of each loop to start when the loop's predicate is available, independent of whether any other loop is in execution or not.2. The compiler of claim 1 , wherein the computed predicate of an identified loop is a function of the loop's initial predicate (Pinitial) and the predicate of its backedge (Pbackedge).3. The compiler of claim 2 , wherein the predicate of the identified loop is computed as: Pbackedge∥Pinitial claim 2 , where ∥ is ...
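A tiny C++ sketch of the predicate rule spelled out in the claims (Pbackedge || Pinitial); the way the inner loop's initial predicate is derived from the enclosing loop's predicate is an assumption added only for illustration.

// Per the claims, the predicate that allows a loop to start is a function of
// its initial predicate and the predicate of its backedge.
bool loop_predicate(bool p_initial, bool p_backedge) {
    return p_backedge || p_initial;
}

// Illustrative assumption: for a loop nested inside another, its initial
// predicate combines the enclosing loop's predicate with the inner entry
// condition, so predicates of nested loops can be computed recursively.
bool inner_loop_predicate(bool outer_predicate, bool inner_entry_condition,
                          bool inner_backedge) {
    bool p_initial = outer_predicate && inner_entry_condition;
    return loop_predicate(p_initial, inner_backedge);
}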

Подробнее
26-11-2020 дата публикации

LOOP NEST REVERSAL

Номер: US20200371763A1
Принадлежит:

Systems, apparatuses and methods may provide for technology to identify in user code, a nested loop which would result in cache memory misses when executed. The technology further reverses an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop. Reversing the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed. 125-. (canceled)26. A code utility system , comprising:a design interface to receive user code;a cache memory; anda compiler coupled to the design interface and the cache memory, the compiler to:identify in the user code, a nested loop which would result in cache memory misses of the cache memory when executed, andreverse an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop, wherein reverse the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed.27. The system of claim 26 , wherein the compiler is to unroll an outer loop of the nested loop by a factor claim 26 ,wherein the unroll is to modify an inner loop of the nested loop based upon the factor into the first inner loop and a second inner loop.28. The system of claim 26 , wherein to identify the nested loop claim 26 , the compiler is to identify an inner loop of the nested loop as accessing data which is invariant across iterations of an outer loop of the nested loop.29. The system of claim 26 , wherein to identify the nested loop claim 26 , the compiler is to identify an inner loop of the nested loop as accessing a total size of data which exceeds an available memory of the cache memory.30. The system of claim 26 , wherein to identify the nested loop claim 26 , the compiler is to identify that an inner loop of the nested loop does not have loop-carried dependencies.31. The system of claim 26 , wherein the compiler is to modify the user code by executing an unroll and jam on the nested loop. ...
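A hedged C++ sketch of the transformation, assuming an unroll-and-jam factor of 2 and an even number of rows; which of the two inner-loop copies is reversed, and all names, are illustrative.

#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Before: every row walks b[] front to back, so when one pass ends the front
// of b[] may already have been evicted from cache before the next row starts.
void scale_rows(Matrix& c, const Matrix& a, const std::vector<double>& b) {
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            c[i][j] = a[i][j] * b[j];
}

// After (sketch): the outer loop is unrolled and jammed by a factor of 2 and
// the second copy of the inner loop is reversed, so it starts from the tail of
// b[], which the preceding inner loop has just touched and which is therefore
// still likely to be in cache. Assumes an even number of rows for brevity.
void scale_rows_reversed(Matrix& c, const Matrix& a, const std::vector<double>& b) {
    for (std::size_t i = 0; i + 1 < a.size(); i += 2) {
        for (std::size_t j = 0; j < b.size(); ++j)
            c[i][j] = a[i][j] * b[j];
        for (std::size_t j = b.size(); j-- > 0; )       // reversed iteration order
            c[i + 1][j] = a[i + 1][j] * b[j];
    }
}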

Подробнее
19-12-2019 дата публикации

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER READABLE MEDIUM

Номер: US20190384687A1
Принадлежит: Mitsubishi Electric Corporation

A processing dividing unit extracts, from a function model including one or more loop processes, each of the one or more loop processes. A parameter extracting unit determines the characteristics of each extracted loop process. A performance calculation basic formula selecting unit selects, for each loop process, from a plurality of processing time calculation procedures for calculating a processing time, a processing time calculation procedure for calculating a processing time of each loop process, based on the characteristics of each loop process and the architecture of computational resources executing the function model. A performance estimating unit calculates a processing time of each loop process by using a corresponding processing time calculation procedure selected by the performance calculation basic formula selecting unit. 1. An information processing device comprising: processing circuitry to: extract, from a program including one or more loop processes, each of the one or more loop processes; determine characteristics of each loop process extracted; select, for each loop process, from a plurality of processing time calculation procedures for calculating a processing time, a processing time calculation procedure for calculating a processing time of each loop process, based on the characteristics of each loop process determined and architecture of computational resources executing the program; and calculate a processing time of each loop process by using a corresponding processing time calculation procedure selected. 2. The information processing device according to claim 1, wherein the processing circuitry selects, for each loop process, from a plurality of memory access delay time calculation procedures for calculating a memory access delay time, a memory access delay time calculation procedure for calculating a memory access delay time in each loop process, based on the architecture of computational resources executing the program, ...

Подробнее
03-11-2022 дата публикации

METHOD FOR CONTROLLING THE FLOW EXECUTION OF A GENERATED SCRIPT OF A BLOCKCHAIN TRANSACTION

Номер: US20220350579A1
Принадлежит:

A method and system for generating a transaction for a blockchain protocol are disclosed. The method comprises using a software resource to receive, generate, or derive at least one data item, insert, at least once, a portion of code into a script associated with the transaction, where the script is written in a language that is functionally restricted. Upon execution of the script, the portion of code provides functionality of a control flow mechanism controlled or influenced by the at least one data item. The method further comprises using the software resource to generate the blockchain transaction comprising the script and submit the blockchain transaction to a blockchain network. 1. A computer-implemented method , comprising the steps of:receiving, generating or deriving at least one data item received from an off-chain source and generated by a user of a random or pseudo-random number generator; andinserting, at least once, at least one portion of code into a script associated with a blockchain transaction, wherein the script is written in a language that is functionally restricted such that the script, upon execution, provides the functionality of a control flow mechanism, the behaviour of the control flow mechanism being controlled or influenced by the at least one data item;generating the blockchain transaction comprising the script; andsubmitting the blockchain transaction to a blockchain network.2. The computer-implemented method according to wherein:the script is associated with an input or output of the blockchain transaction.3. The computer-implemented method according to wherein:the blockchain transaction is generated in accordance with, or for use with, a blockchain protocol.4. The computer-implemented method according to wherein the protocol is the Bitcoin protocol or a variant of the Bitcoin protocol.56-. (canceled)7. The computer-implemented method according to wherein:the control flow mechanism is a loop, such as a for loop, or a selection ...

Подробнее
10-11-2022 дата публикации

Computer Implemented Program Specialization

Номер: US20220357933A1
Принадлежит:

A computerized technique for program simplification and specialization combines a partial interpretation of the program based on a subset of program functions to obtain variable states with concrete values at a program “neck.” These concrete values are then propagated as part of an optimization transformation that simplifies the program based on these constant values, for example, by eliminating branches that are never taken based on the constant values. 1. An apparatus for producing compact program versions comprising:at least one computer processor; anda memory coupled to the at least one processor holding a stored program executable by the at least one computer processor to:(a) receive a program implementing multiple functions and a description of a desired subset of functions less than the set of the multiple functions;(b) identify a neck of the program dividing configuration instructions from main logic instructions;(c) partially interpret the program to the neck to establish concrete values of variables at the neck;(d) propagate the concrete values through the main logic instructions; and(e) simplify the program by removing instructions of the main logic instructions that will never execute based on the propagated concrete values.2. The apparatus of wherein (c) uses symbolic execution up to the neck to establish concrete representations of the variable states claim 1 , and (d) uses the concrete representations and the desired subset of functions to perform the constant conversion.3. The apparatus of wherein (e) performs optimizing transformations using the concrete values.4. The apparatus of wherein the optimizing transformations employ at least one of loop unrolling and function in-lining.5. The apparatus of wherein the removed instructions include instruction branches conditioned on expressions which will never be executed based on the propagated concrete values.6. The apparatus of wherein the program is parameterized by command-line switch inputs claim 1 , ...
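A before/after C++ illustration of the idea, assuming a toy program whose only configuration input is a verbosity switch; the placement of the "neck" and all names are illustrative, not taken from the entry.

#include <cstdio>

// Before specialization: behaviour depends on a command-line switch.
int main_generic(int argc, char** argv) {
    bool verbose = (argc > 1 && argv[1][0] == 'v');   // configuration logic, before the neck
    // ---- neck: configuration is now fixed, main logic follows ----
    if (verbose) std::puts("starting in verbose mode");
    std::puts("doing the actual work");
    return 0;
}

// After specialization for the desired subset "never verbose": partial
// interpretation up to the neck established verbose == false, the constant was
// propagated through the main logic, and the branch that can never execute was
// removed.
int main_specialized() {
    std::puts("doing the actual work");
    return 0;
}

int main() { return main_specialized(); }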

Подробнее
31-01-2017 дата публикации

Apparatus and method for dynamically determining the execution mode of a reconfigurable array

Номер: KR101700406B1
Принадлежит: 삼성전자주식회사

An apparatus and method for dynamically determining the execution mode of a reconfigurable array are provided. According to one aspect of the present invention, performance information about a loop is obtained before or during execution of the loop. The performance information indicates whether it is more advantageous to run the loop in VLIW mode or in CGA mode. In the performance information, the variable is the number of loop iterations of the loop. If the number of loop iterations of the loop is known, the more advantageous mode is selected based on the performance information. If the number of loop iterations is not known, the mode is selected using a predicted value of the loop's execution time. The predicted value can be obtained by measuring the execution time of the loop and cumulatively combining the measured value with the previous predicted value using weights.
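A small C++ sketch of the prediction-based mode choice for the case where the trip count is unknown; the weight, the threshold and the struct name are assumptions, not taken from the entry.

// Mixes the latest measured execution time with the previous prediction using
// a weight, and prefers CGA mode only when the predicted running time is long
// enough to amortize reconfiguration (illustrative values).
struct LoopModePredictor {
    double predicted = 0.0;
    bool   seeded = false;

    void record(double measured, double alpha = 0.5) {
        predicted = seeded ? alpha * measured + (1.0 - alpha) * predicted
                           : measured;
        seeded = true;
    }

    bool prefer_cga(double cga_threshold = 10000.0 /* cycles, illustrative */) const {
        return seeded && predicted > cga_threshold;
    }
};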

Подробнее
30-08-2011 дата публикации

Parallelizing sequential frameworks using transactions

Номер: US8010550B2
Принадлежит: Microsoft Corp

Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. A transactional memory system is provided. A first section of code containing an original sequential loop is transformed into a second section of code containing a parallel loop that uses transactions to preserve an original input to output mapping. For example, the original sequential loop can be transformed into a parallel loop by taking each iteration of the original sequential loop and generating a separate transaction that follows a pre-determined commit order process. At least some of the separate transactions are executed in different threads. When an unhandled exception is detected that occurs in a particular transaction while the parallel loop is executing, state modifications made by the particular transaction and predecessor transactions are committed, and state modifications made by successor transactions are discarded.
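A minimal C++ sketch of the commit-order idea only: iterations run concurrently but publish their state modifications in the original order, preserving the input to output mapping. Conflict detection, re-execution and the unhandled-exception policy of a real transactional memory system are omitted, and the thread-per-iteration structure is purely illustrative.

#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

void ordered_parallel_loop(std::vector<int>& data) {
    std::atomic<std::size_t> next_to_commit{0};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < data.size(); ++i) {
        workers.emplace_back([&, i] {
            int result = data[i] * data[i] + 1;            // speculative work
            while (next_to_commit.load() != i) std::this_thread::yield();
            data[i] = result;                              // commit in original order
            next_to_commit.store(i + 1);
        });
    }
    for (auto& t : workers) t.join();
}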

Подробнее
16-09-2003 дата публикации

Parallel program generating method

Номер: US6622301B1
Принадлежит: HITACHI LTD

When converting a sequential execution source program into a parallel program to be executed by respective processors (nodes) of a distributed shared memory parallel computer, a compiler computer transforms the source program to increase a processing speed of the parallel program. First, a kernel loop having a longest sequential execution time is detected in the source program. Next, a data access pattern equal to that of the kernel loop is reproduced to generate a control code to control first touch data distribution. The first touch control code generated is inserted in the parallel program.

Подробнее
28-03-2019 дата публикации

Loop nest reversal

Номер: WO2019059927A1
Принадлежит: Intel Corporation

Systems, apparatuses and methods may provide for technology to identify in user code, a nested loop which would result in cache memory misses when executed. The technology further reverses an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop. Reversing the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed.

Подробнее
01-08-2018 дата публикации

Staged loop instructions

Номер: EP2680132B1
Принадлежит: Analog Devices Inc

Подробнее
04-05-1999 дата публикации

Method and system for optimizing code

Номер: US5901318A
Автор: Wei Chung Hsu
Принадлежит: Hewlett Packard Co

An optimizing compiler for optimizing code in a computer system having a CPU and a memory. The code has a loop wherein the loop includes statements conditionally executed depending on the evaluation of a control flow statement. The inventive compiler separates the code into a index collection phase and an execution phase. The index collection phase collects array indices indicating whether the control flow statement evaluates true for each particular loop iteration. The execution phase builds self loops without conditional statements. The self loops use the array indices to execute only the loop instructions that should be executed. Since those instruction are predetermined by the index collection phase, performance enhancement features of the CPU, such as branch prediction, pipelining, and a superscalar architecture can be fully exploited.

Подробнее
07-08-2001 дата публикации

Method and apparatus for finding loop-level parallelism in a pointer based application

Номер: US6272676B1
Принадлежит: Intel Corp

A method and apparatus for finding loop_level parallelism in a sequence of instructions. In one embodiment, the method includes the steps of determining if a variable which identifies a memory address of a data structure is an induction variable; and determining if execution of the sequence of instructions terminates in response to a comparison of the variable to an invariant value. If the two conditions of the present invention are found to be true, the respective sequence of instructions is a candidate to be flagged for multi-threading execution, assuming the loop of the instructions terminates.
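An illustrative example in C of the loop shape such an analysis looks for (the type and function names are assumptions): the pointer is advanced by a constant stride each iteration, i.e. it is an induction variable, and the loop terminates when the pointer is compared against a loop-invariant value, making the loop a candidate for multi-threaded execution.

```c
typedef struct record { double value; } record_t;

double sum_records(const record_t *base, const record_t *end /* loop-invariant bound */)
{
    double s = 0.0;
    for (const record_t *p = base; p != end; p++)   /* p: pointer induction variable */
        s += p->value;
    return s;
}
```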

Подробнее
23-02-2016 дата публикации

Methods and systems to vectorize scalar computer program loops having loop-carried dependences

Номер: US9268541B2
Принадлежит: Intel Corp

Methods and systems to convert scalar computer program loops having loop carried dependences to vector computer program loops are disclosed. One example method and system generates a first predicate set associated with a first conditionally executed statement. The first predicate set contains a first set of predicates that cause a variable to be defined in a scalar computer program loop at or before the variable is defined by the first conditionally executed statement. The method and system also generates a second predicate set associated with the first conditionally executed statement. The second predicate set contains a second set of predicates that cause the variable to be used in the scalar computer program loop at or before the variable is defined by the first conditionally executed statement. The method and system determines whether the second predicate set is a subset of the first predicate set and, based on the determination, propagates a vector value in an element of a vector of the variable to a subsequent element of the vector.
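One step of the scheme, emulated in scalar C with names that are not the patent's: when an element of the vector is used before the variable is (re)defined under the loop's predicates, the value from the most recent defining element, or the carry-in from the previous strip, is propagated forward into it.

```c
/* v[j]       : elements of the variable's vector for one strip of vl iterations
   defined[j] : 1 if this element's predicate defines the variable in its iteration
   carry      : value of the variable carried in from the previous strip          */
void propagate_forward(double v[], const int defined[], int vl, double *carry)
{
    for (int j = 0; j < vl; j++) {
        if (defined[j]) *carry = v[j];   /* this element defines a new value            */
        else            v[j]   = *carry; /* use-before-define: take the propagated value */
    }
}
```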

Подробнее
19-09-2002 дата публикации

Hardware supported software pipelined loop prologue optimization

Номер: US20020133813A1
Автор: Alexander Ostanevich
Принадлежит: Elbrus International

A method for optimizing a software pipelineable loop in a software code is provided. The loop comprises one or more pipelined stages and one or more loop operations. The method comprises evaluating an initiation interval time (IN) for a pipelined stage of the loop. A loop operation time latency (Tld) and a number of loop operations (Np) from the pipelined stages to peel based on IN and Tld is then determined. The loop operation is peeled Np times and copied before the loop in the software code. A vector of registers is allocated and the results of the peeled loop operations and a result of an original loop operation is assigned to the vector of registers. Memory addresses for the results of the peeled loop operations and original loop operation are also assigned.
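A scalar C sketch of the peeling idea under assumed parameters (NP standing in for Np, and a simple copy loop as the body): the first NP loads are peeled and copied before the loop into a small vector of registers, and inside the loop each use consumes a load that was issued NP iterations earlier.

```c
#define NP 4                        /* assumed: roughly load latency / initiation interval */

void scaled_copy(const double *a, double *b, int n)
{
    double r[NP];                   /* "vector of registers" holding the peeled loads */
    int pre = n < NP ? n : NP;

    for (int k = 0; k < pre; k++)   /* peeled (prologue) loads, copied before the loop */
        r[k] = a[k];

    for (int i = 0; i < n; i++) {
        double x = r[i % NP];       /* value loaded NP iterations ago */
        if (i + NP < n)
            r[i % NP] = a[i + NP];  /* issue the load destined for iteration i + NP */
        b[i] = x * 2.0;             /* use of the loaded value */
    }
}
```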

Подробнее
16-07-2014 дата публикации

Parallelizing sequential frameworks using transactions

Номер: CN101681272B
Принадлежит: Microsoft Corp

Various techniques and methods are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. A transactional memory system is provided. A first section of code containing an original sequential loop is transformed into a second section of code containing a parallel loop that uses transactions to preserve the original input-to-output mapping. For example, the original sequential loop can be transformed into a parallel loop by taking each iteration of the original sequential loop and generating a separate transaction that follows a predetermined commit-order process. At least some of the separate transactions are executed in different threads. When an unhandled exception is detected in a particular transaction while the parallel loop is executing, the state modifications made by that transaction and its predecessor transactions are committed, and the state modifications made by successor transactions are discarded.

Подробнее
11-12-2016 дата публикации

Patent TWI562065B

Номер: TWI562065B
Автор: Mitsuru Mushano
Принадлежит: Mush A Co Ltd

Подробнее
18-05-2018 дата публикации

Method and apparatus for implementing code versioning for transactional memory region promotion

Номер: CN104572260B
Принадлежит: Globalfoundries Inc

The present invention relates to a method and apparatus for implementing code versioning for transactional memory region promotion. An illustrative embodiment of a computer-implemented process for code versioning for transactional memory region promotion receives a portion of candidate source code and outlines the received portion of candidate source code for parallel execution. The computer-implemented process also wraps a critical region with enter and exit routines so as to enter a speculative sub-process, where the enter and exit routines also gather conflict statistics at run time. The outlined code portion is executed to determine, according to the conflict statistics gathered at run time, which particular loop version of a plurality of loop versions to use.
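A minimal sketch, in C, of how run-time conflict statistics gathered by enter/exit wrappers might drive the choice of loop version; the counter names, warm-up length, and conflict threshold are assumptions, not values from the patent.

```c
static long spec_attempts = 0;      /* incremented by the enter routine */
static long spec_conflicts = 0;     /* incremented when speculation aborts */

void enter_speculation(void) { spec_attempts++; }
void record_conflict(void)   { spec_conflicts++; }

/* pick the loop version from the gathered statistics */
int use_speculative_version(void)
{
    if (spec_attempts < 100)                    /* warm-up: keep speculating */
        return 1;
    return spec_conflicts * 4 < spec_attempts;  /* fall back once conflicts exceed ~25% */
}
```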

Подробнее
10-04-2018 дата публикации

Reuse of decoded instructions

Номер: US9940136B2
Принадлежит: Microsoft Technology Licensing LLC

Systems and methods are disclosed for reusing fetched and decoded instructions in block-based processor architectures. In one example of the disclosed technology, a system includes a plurality of block-based processor cores and an instruction scheduler. A respective core is capable of executing one or more instruction blocks of a program. The instruction scheduler can be configured to identify a given instruction block of the program that is resident on a first processor core of the processor cores and is to be executed again. The instruction scheduler can be configured to adjust a mapping of instruction blocks in flight so that the given instruction block is re-executed on the first processor core without re-fetching the given instruction block.

Подробнее
20-08-2021 дата публикации

Hardware acceleration method, compiler and equipment

Номер: CN110874212B
Принадлежит: Huawei Technologies Co Ltd

Embodiments of the present invention disclose a hardware acceleration method, a compiler, and a device, used to improve code execution efficiency and thereby achieve hardware acceleration. A method in an embodiment of the invention includes: the compiler obtains compilation policy information and source code, where the compilation policy information indicates that a first code type matches a first processor and a second code type matches a second processor; the compiler analyzes code segments in the source code according to the compilation policy information and identifies a first code segment belonging to the first code type or a second code segment belonging to the second code type; the compiler compiles the first code segment into first executable code and sends the first executable code to the first processor, and compiles the second code segment into second executable code and sends the second executable code to the second processor.

Подробнее
13-07-2017 дата публикации

Program conversion device, program conversion method, and recording medium having program for program conversion recorded therein

Номер: WO2017119378A1
Автор: 孝道 宮本
Принадлежит: 日本電気株式会社

Provided is a program conversion device and the like capable of converting a program that contains a loop with a termination check into a program that can be executed at high speed by an information processing device equipped with a vector operation unit. A program conversion device 501 comprises: a program conversion unit 502 that, when a first loop process executing first processes in a certain order accesses storage regions that are not contiguous with respect to that order, and a second loop process repeating the first loop process a plurality of times includes a determination process that ends the second loop process depending on whether a condition is satisfied, converts the program into a third loop process in which the repetition of the second loop process is repeated a first number of times and a fourth loop process that is repeated a second number of times within the third loop process; a loop division unit 503 that converts the process in which the first loop process is repeated the second number of times and the process in which the determination process is repeated the second number of times into processes repeated the first number of times; a variable rearrange unit 505 that converts the first process and the determination process into processes that access storage regions which differ for each fourth loop process and are contiguous in the processing order of the fourth loop processes; and a process exchange unit 504 that exchanges the processing order of the fourth loop process and the first loop process.

Подробнее
11-08-1998 дата публикации

Architectural support for execution control of prologue and epilogue periods of loops in a VLIW processor

Номер: US5794029A
Принадлежит: Elbrus International Ltd

For certain classes of software pipelined loops, prologue and epilogue control is provided by loop control structures, rather than by predicated execution features of a VLIW architecture. For loops compatible with two simple constraints, code elements are not required for disabling garbage operations during prologue and epilogue loop periods. As a result, resources associated with implementation of the powerful architectural feature of predicated execution need not be squandered to service loop control. In particular, neither increased instruction width nor an increased number of instructions in the loop body is necessary to provide loop control in accordance with the present invention. Fewer service functions are required in the body of a loop. As a result, loop body code can be more efficiently scheduled by a compiler and, in some cases, fewer instructions will be required, resulting in improved loop performance. Loop control logic includes a loop control registers having an epilogue counter field, a shift register, a side-effects enabled flag, a current loop counter field, a loop mode flag, and side-effects manual control and loads manual control flags. Side-effects enabling logic and load enabling logic respectively issue a side-effects enabled predicate and a loads enabled predicate to respective subsets of execution units. Software pipelined simple and inner loops are supported.

Подробнее
01-01-2014 дата публикации

Staged loop instructions

Номер: EP2680132A2
Принадлежит: Analog Devices Inc

Loop instructions are analyzed and assigned stage numbers based on dependencies between them and machine resources available. The loop instructions are selectively executed based on their stage numbers, thereby eliminating the need for explicit loop set-up and tear-down instructions. On a Single Instruction, Multiple Data machine, the final instance of each instruction may be executed on a subset of the processing elements or vector elements, dependent on the number of iterations of the original loop.

Подробнее
13-03-2020 дата публикации

Information processing apparatus, compile program, compile method, and cache control method

Номер: JP6665720B2
Автор: 優太 向井
Принадлежит: Fujitsu Ltd

Подробнее
05-09-2007 дата публикации

Compiling method, compiling apparatus and computer system for a loop in a program

Номер: EP1828889A1
Автор: Fan Wu, Yanmeng Sun
Принадлежит: KONINKLIJKE PHILIPS ELECTRONICS NV

A method for compiling a program including a loop is provided. In the program, the loop includes K instructions (K>2) and repeats M times (M>2). The compiling method comprises the following steps: performing resource conflict analysis on the K instructions in the loop; dividing the K instructions in the loop into a first combined instruction section, a connection instruction section and a second combined instruction section, such that there is no resource conflict between the instructions in the first combined instruction section and the instructions in the second combined instruction section; and compiling the program, wherein the instructions in the first combined instruction section of cycle N (N=2, 3, ...M) and the instructions in the second combined instruction section of cycle N-1 are combined and compiled together. A compiling apparatus and a computer system for realizing the above-mentioned compiling method are further provided.
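A schematic C rendering of the overlap (the three sections are stand-in functions; in the compiled code they are instruction groups packed into shared issue slots rather than calls): the first combined section of cycle N is scheduled together with the second combined section of cycle N-1, which is possible because those two sections have no resource conflicts.

```c
void first_section(int n);
void connection_section(int n);
void second_section(int n);

void pipelined_schedule(int M)
{
    first_section(1);
    connection_section(1);
    for (int n = 2; n <= M; n++) {
        /* no resource conflicts between these two sections, so the compiler can
           pack them into the same issue slots; shown here as adjacent calls */
        second_section(n - 1);
        first_section(n);
        connection_section(n);
    }
    second_section(M);        /* drain the last iteration */
}
```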

Подробнее
11-05-2015 дата публикации

Data processing apparatus, data processing system, data structure, recording medium, storage device, and data processing method

Номер: JPWO2013118754A1
Принадлежит: 株式会社Mush−A

The present invention aims to eliminate bottlenecks in loop processing and to perform parallel processing at high speed. Each of a plurality of processing units has: an input/output unit that acquires only packets whose destination information, calculated based on at least part of extended identification information, indicates that processing unit; an operation unit that executes the processing instruction to be executed first among the processing instructions of a packet acquired by the input/output unit, generates a packet in which extended identification information designating the next instruction as the new first instruction is attached to the data produced by that execution, and inputs the generated packet to the input/output unit; a template storage unit in which template information for generating a packet group is registered for the case where the processing instruction to be executed first is an instruction that generates a packet group consisting of a plurality of packets; and a packet generation unit that generates the packet group based on the template information registered in the template storage unit and inputs it to the input/output unit.

Подробнее
26-11-2014 дата публикации

Method for optimising the parallel processing of data on a hardware platform

Номер: EP2805234A1
Принадлежит: Thales SA

The invention relates to a method for optimising the parallel processing of data on a hardware platform comprising at least one calculation unit comprising a plurality of processing units capable of executing a plurality of executable tasks in parallel, wherein all the data to be processed is broken down into subsets of data, a same sequence of operations being carried out on each subset of data. The method of the invention comprises obtaining (50, 52) the maximum number of subsets of data to be processed by a same sequence of operations, and a maximum number of tasks that can be executed in parallel by a calculation unit of the hardware platform, determining (54) at least two processing partitions, each of said processing partitions corresponding to the partition of all the data into a number of data groups, and to the assignment of at least one executable task, capable of executing said sequence of operations, to each subset of data from said data group, and selecting (60, 62) the processing partition that makes it possible to obtain an optimal measurement value depending on a predetermined criterion. Programming code instructions implementing said selected processing partition are then obtained. One use of the method of the invention is the selection of an optimal hardware platform according to a measurement of execution performance.
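A minimal sketch of the selection step in C, under the assumption of a caller-supplied cost estimator: candidate partitions (reduced here to a group size) are enumerated up to the platform's maximum number of parallel tasks, and the partition with the best measured or estimated value is kept.

```c
/* estimate_cost() is a hypothetical measurement/estimation routine supplied
   by the caller; lower values are better according to the chosen criterion. */
int best_group_size(int n_subsets, int max_parallel_tasks,
                    double (*estimate_cost)(int group_size))
{
    int best_g = 1;
    double best_cost = estimate_cost(1);
    for (int g = 2; g <= max_parallel_tasks && g <= n_subsets; g++) {
        double c = estimate_cost(g);
        if (c < best_cost) { best_cost = c; best_g = g; }
    }
    return best_g;
}
```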

Подробнее
22-02-2017 дата публикации

Extracting system architecture in high level synthesis

Номер: CN106462431A
Принадлежит: Xilinx Inc

Extracting a system architecture in high-level synthesis includes determining a first function of a high-level programming language description and a second function contained in a control flow construct of the high-level programming description (210, 215, 220). The second function is determined to be a data-consuming function of the first function (225). Within the circuit design, a port including a local memory is automatically generated (240). The port couples, in the circuit design, a first circuit module implementing the first function and a second circuit module implementing the second function.

Подробнее
05-02-2015 дата публикации

Method and system for automated improvement of parallelism in program compilation

Номер: AU2013290313A1
Автор: Loring CRAYMER
Принадлежит: Individual

A method of program compilation to improve parallelism during the linking of the program by a compiler. The method includes converting statements of the program to canonical form, constructing an abstract syntax tree (AST) for each procedure in the program, and traversing the program to construct a graph by making each non-control-flow statement and each control structure into at least one node of the graph.

Подробнее
22-06-2016 дата публикации

Parallelization and loop optimization method and system for a high-level language of reconfigurable processor

Номер: CN105700933A
Автор: 何卫锋, 田丰硕, 绳伟光
Принадлежит: Shanghai Jiaotong University

The present invention provides a parallelization and loop optimization method and system for a high-level language for a reconfigurable processor, proposing an end-to-end language conversion system for general-purpose reconfigurable processors. For a reconfigurable processor, the core loops of compute-intensive applications must be computed in parallel by the reconfigurable part, and plain C cannot express this parallelism; the serial and parallel parts of the application therefore need to be encapsulated separately and optimized according to the characteristics of the system, finally producing a new language. When determining the data types and lengths of the inputs and outputs of a kernel function, a decls.h file is written, which simplifies the system and greatly improves its applicability. In the loop optimization process, the polyhedral model is used, making the system more widely applicable and easier to port across different architectures.

Подробнее
23-08-2018 дата публикации

Information processing device, information processing method, and information processing program

Номер: WO2018150588A1
Принадлежит: 三菱電機株式会社

A process dividing unit (130) extracts each of the one or more loop processes included in a functional model (210). A parameter extraction unit (140) determines characteristics of each extracted loop process. On the basis of the characteristics of each loop process and on the basis of a computational resource architecture for implementing the functional model (210), a performance calculation basic formula selection unit (150) selects, from among a plurality of processing time calculation procedures for calculating processing time, a processing time calculation procedure for calculating the processing time required for each loop process. A performance estimation unit (160) calculates the processing time required for each loop process using the processing time calculation procedure selected for the loop process by the performance calculation basic formula selection unit (150).

Подробнее
16-03-2018 дата публикации

Reuse of decoded instructions

Номер: CN107810477A
Автор: A·史密斯, D·C·巴格
Принадлежит: Microsoft Technology Licensing LLC

Systems and methods are disclosed for reusing fetched and decoded instructions in block-based processor architectures. In one example of the disclosed technology, a system includes a plurality of block-based processor cores and an instruction scheduler. A respective core is capable of executing one or more instruction blocks of a program. The instruction scheduler can be configured to identify a given instruction block of the program that is resident on a first processor core of the processor cores and is to be executed again. The instruction scheduler can be configured to adjust the mapping of instruction blocks in flight so that the given instruction block is re-executed on the first processor core without being re-fetched.

Подробнее
22-06-1999 дата публикации

Array summary analysis method for a loop containing a loop-exit statement

Номер: JPH11167492A
Принадлежит: HITACHI LTD

(57) [Abstract] [Problem] In a language processing system that generates an object program from a source program, to improve the precision of array summary analysis for loops containing a loop-exit statement and to improve the applicability of array privatization. [Solution] When a loop contains a loop-exit statement and a statement that assigns the value of the loop control variable at the time of exit to a scalar variable, the upper bound of the loop control variable in the array summary of the loop body is replaced with that scalar variable, the loop control variable is then eliminated from the result by variable elimination, and the result is used as the array summary of the loop; the array summary is thus computed without approximation. [Effect] The applicability of array privatization is improved for loops that contain a statement assigning the loop control variable's value at loop exit to a scalar variable.

Подробнее
05-07-2007 дата публикации

Statement shifting to increase parallelism of loops

Номер: US20070157184A1
Принадлежит: Intel Corp

A method for statement shifting to increase the parallelism of loops includes constructing a data dependence graph (DDG) to represent dependences between statements in a loop, constructing a basic equations group from the DDG, constructing a dependence equations group derived in part from the basic equations group, and determining a shifting vector for the loop from the dependence equations group, wherein the shifting vector to represent an offset to apply to each statement in the loop for statement shifting. Other embodiments are also disclosed.
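An illustrative before/after pair in C (the statements and the offset are assumptions, not taken from the patent): shifting statement S2 by one iteration turns the loop-carried dependence on S1 into an intra-iteration dependence, after which the iterations of the shifted loop are mutually independent.

```c
/* Before shifting: S2 uses a[i-1], produced by S1 in the previous iteration,
   so consecutive iterations cannot run independently. */
void before(double *a, double *b, int n)
{
    for (int i = 1; i < n; i++) {
        a[i] = 2.0 * i;         /* S1 */
        b[i] = a[i - 1] + 1.0;  /* S2: loop-carried dependence on S1 */
    }
}

/* After shifting S2 by one iteration: the only remaining dependence is within
   a single iteration, so the iterations of the new loop are independent. */
void after(double *a, double *b, int n)
{
    if (n <= 1) return;
    b[1] = a[0] + 1.0;              /* peeled first instance of S2 */
    for (int i = 1; i < n - 1; i++) {
        a[i] = 2.0 * i;             /* S1(i)                                  */
        b[i + 1] = a[i] + 1.0;      /* S2(i+1): uses a[i] from this iteration */
    }
    a[n - 1] = 2.0 * (n - 1);       /* peeled last instance of S1 */
}
```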

Подробнее
27-05-1997 дата публикации

Device and method for parallelizing compilation optimizing data transmission

Номер: US5634059A
Автор: Koji Zaiki
Принадлежит: Matsushita Electric Industrial Co Ltd

The present invention relates to an optimizing compiler apparatus for converting a source program into an object program for use by a parallel computer, which optimizes the number of data transmissions between processing elements for a parallel computer made up of a plurality of processing elements, composed of a loop retrieval unit for retrieving the loop processes from a program, a data transmission calculation unit for calculating the data transmission count generated when each of the loop processes is parallelized, a parallelization determination unit for determining the loop to be parallelized as the loop, out of all the loops in a multiple loop, with the lowest data transmission count and a code generation unit for generating parallelized object code for the determined loop. The data transmission calculation unit is made up of a right side variable retrieval unit for retrieving the variables on the right side of an equation in the loop retrieved by the loop retrieval unit, a variable information storage unit for storing information relating to array variables which should be distributed among every processing element for the part of the program which comes before the loop retrieved by the loop retrieval unit and a calculation unit for calculating the data transmission count based on the variable information for the retrieved right side variable.

Подробнее
17-01-2023 дата публикации

Systems, media, and methods for identifying loops of or implementing loops for a unit of computation

Номер: US11556357B1
Принадлежит: MathWorks Inc

Systems, media, and methods may identify loops of a unit of computation for performing operations associated with the loops. The system, media, and methods may receive textual program code that includes a unit of computation that comprises a loop (e.g., explicit/implicit loop). The unit of computation may be identified by an identifier (e.g., variable name within the textual program code, text string embedded in the unit of computation, and/or syntactical pattern that is unique within the unit of computation). A code portion and/or a section thereof may include an identifier referring to the unit of computation, where the code portion and the unit of computation may be at independent locations of each other. The systems, media, and methods may semantically identify a loop that corresponds to the identifier and perform operations on the textual program code using the code portion and/or section.

Подробнее
12-08-2021 дата публикации

Patent JPWO2021156955A1

Номер: JPWO2021156955A1
Автор: [UNK]
Принадлежит: [UNK]

Подробнее
08-05-2002 дата публикации

Predicated execution of instructions in processors

Номер: GB2363480B
Автор: Nigel Peter Topham
Принадлежит: Siroyan Ltd

Подробнее
16-02-2016 дата публикации

Efficient implementation of RSA using GPU/CPU architecture

Номер: US9262166B2
Принадлежит: Intel Corp

Various embodiments are directed to a heterogeneous processor architecture comprised of a CPU and a GPU on the same processor die. The heterogeneous processor architecture may optimize source code in a GPU compiler using vector strip mining to reduce instructions of arbitrary vector lengths into GPU supported vector lengths and loop peeling. It may be first determined that the source code is eligible for optimization if more than one machine code instruction of compiled source code under-utilizes GPU instruction bandwidth limitations. The initial vector strip mining results may be discarded and the first iteration of the inner loop body may be peeled out of the loop. The type of operands in the source code may be lowered and the peeled out inner loop body of source code may be vector strip mined again to obtain optimized source code.
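A scalar C sketch of the two transformations named above, with an assumed supported vector length VL and a simple element-wise addition standing in for the RSA kernel: the arbitrary-length operation is strip-mined into VL-wide chunks, and the first chunk is peeled out in front of the loop.

```c
#define VL 8                        /* assumed hardware-supported vector length */

void add_arrays(const unsigned *a, const unsigned *b, unsigned *c, int n)
{
    int head = n < VL ? n : VL;

    /* peeled first strip: handled before the main strip-mined loop */
    for (int j = 0; j < head; j++)
        c[j] = a[j] + b[j];

    /* strip-mined remainder: each outer iteration covers one VL-wide strip */
    for (int i = head; i < n; i += VL) {
        int len = (n - i) < VL ? (n - i) : VL;
        for (int j = 0; j < len; j++)
            c[i + j] = a[i + j] + b[i + j];
    }
}
```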

Подробнее