OpenCSD - CoreSight Trace Decode Library  1.3.3
/usr/src/packages/BUILD/decoder/tests/auto-fdo/autofdo.md
1 AutoFDO and ARM Trace {#AutoFDO}
2 =====================
3 
4 @brief Using CoreSight trace and perf with OpenCSD for AutoFDO.
5 
6 ## Introduction
7 
8 Feedback directed optimization (FDO, also known as profile guided
9 optimization - PGO) uses a profile of a program's execution to guide the
10 optimizations performed by the compiler. Traditionally, this involves
11 building an instrumented version of the program, which records a profile of
12 execution as it runs. The instrumentation adds significant runtime
13 overhead, possibly changing the behaviour of the program and it may not be
14 possible to run the instrumented program in a production environment
15 (e.g. where performance criteria must be met).
16 
17 AutoFDO uses facilities in the hardware to sample the behaviour of the
18 program in the production environment and generate the execution profile.
19 An improved profile can be obtained by including the branch history
20 (i.e. a record of the last branches taken) when generating instruction
21 samples. On Arm systems, the ETM can be used to generate such records.
22 
23 The process can be broken down into the following steps:
24 
25 * Record execution trace of the program
26 * Convert the execution trace to instruction samples with branch histories
27 * Convert the instruction samples to source level profiles
28 * Use the source level profile with the compiler
29 
30 This article describes how to enable ETM trace on Arm targets running Linux
31 and use the ETM trace to generate AutoFDO profiles and compile an optimized
32 program.
33 
34 
35 ## Execution trace on Arm targets
36 
37 Debug and trace of Arm targets is provided by CoreSight. This consists of
38 a set of components that allow access to debug logic, record (trace) the
39 execution of a processor and route this data through the system, collecting
40 it into a store.
41 
42 To record the execution of a processor, we require the following
43 components:
44 
45 * A trace source. The core contains a trace unit, called an ETM that emits
46  data describing the instructions executed by the core.
47 * Trace links. The trace data generated by the ETM must be moved through
48  the system to the component that collects the data (sink). Links
49  include:
50  * Funnels: merge multiple streams of data
51  * FIFOs: buffer data to smooth out bursts
52  * Replicators: send a stream of data to multiple components
53 * Sinks. These receive the trace data and store it or send it to an
54  external device:
55  * ETB: A small circular buffer (64-128 kilobytes) that stores the most
56  recent data
57  * ETR: A larger (several megabytes) buffer that uses system RAM to
58  store data
59  * TPIU: Sends data to an off-chip capture device (e.g. Arm DSTREAM)
60 
61 Each Arm SoC design may have a different layout (topology) of components.
62 This topology is described to the OS drivers by the platform's devicetree
63 or (in future) ACPI firmware.
64 
65 For application profiling, we need to store several megabytes of data
66 within the system, so will use ETR with the capture tool (perf)
67 periodically draining the buffer to a file.
68 
69 Even though we have a large capture buffer, the ETM can still generate a
70 lot of data very quickly - typically an ETM will generate ~1 bit of data
71 per instruction (depending on the workload), which results in roughly
72 250Mbytes per second for a 2GHz core executing one instruction per cycle. This leads to problems storing and
73 decoding such large volumes of data. AutoFDO uses samples of program
74 execution, so we can avoid this problem by using the ETM's features to
75 only record small slices of execution - e.g. collect ~5000 cycles of data
76 every 50M cycles. This reduces the data rate to a manageable level - a few
77 megabytes per minute. This technique is known as 'strobing'.
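
The figures above can be reproduced with a little integer arithmetic. The sketch below assumes one instruction per cycle and ~1 bit of trace per instruction; both are illustrative simplifications, and the function names are made up for this example.

```shell
#!/bin/sh
# Rough ETM data-rate estimate. Assumes one instruction per cycle
# (an illustrative simplification) and uses integer arithmetic only.

# trace_bytes_per_sec FREQ_HZ BITS_PER_INSN
trace_bytes_per_sec() {
    echo $(( $1 * $2 / 8 ))
}

# strobed_bytes_per_min FREQ_HZ BITS_PER_INSN PERIOD
# With strobing, trace is on for 1 window out of every PERIOD windows.
strobed_bytes_per_min() {
    echo $(( $1 * $2 / 8 / $3 * 60 ))
}

trace_bytes_per_sec 2000000000 1          # continuous trace at 2GHz
strobed_bytes_per_min 2000000000 1 10000  # 5000 cycles every 50M cycles
```

Under these assumptions, continuous trace comes to about 250 Mbytes per second, while strobing with a period of 10000 brings it down to about 1.5 Mbytes per minute per core.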
78 
79 
80 ## Enabling trace
81 
82 ### Driver support
83 
84 To collect ETM trace, the CoreSight drivers must be included in the
85 kernel. Some of the driver support is not yet included in the mainline
86 kernel and many targets are using older kernels. To enable CoreSight trace
87 on these targets, Arm have provided backports of the latest CoreSight
88 drivers and ETM strobing patch at:
89 
90  <https://gitlab.arm.com/linux-arm/linux-coresight-backports>
91 
92 This repository can be cloned with:
93 
94 ```
95 git clone https://git.gitlab.arm.com/linux-arm/linux-coresight-backports.git
96 ```
97 
98 You can include these backports in your kernel by either merging the
99 appropriate branch using git or generating patches (using `git
100 format-patch`).
101 
102 For 5.x based kernels onwards, the only patch that needs to be applied is the one enabling strobing - etm4x: `Enable strobing of ETM`.
103 
104 For 4.9 based kernels, use the `coresight-4.9-etr-etm_strobe` branch:
105 
106 ```
107 git merge coresight-4.9-etr-etm_strobe
108 ```
109 
110 or
111 
112 ```
113 git format-patch --output-directory /output/dir v4.9..coresight-4.9-etr-etm_strobe
114 cd my_kernel
115 git am /output/dir/*.patch # or, if not using git: for p in /output/dir/*.patch; do patch -p1 < "$p"; done
116 ```
117 
118 For 4.14 based kernels, use the `coresight-4.14-etm_strobe` branch:
119 
120 ```
121 git merge coresight-4.14-etm_strobe
122 ```
123 
124 or
125 
126 ```
127 git format-patch --output-directory /output/dir v4.14..coresight-4.14-etm_strobe
128 cd my_kernel
129 git am /output/dir/*.patch # or, if not using git: for p in /output/dir/*.patch; do patch -p1 < "$p"; done
130 ```
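
The branch choice described above can be captured in a small helper. This is only a sketch: `backport_branch` is a hypothetical name, and it knows about nothing beyond the two branches named in this article.

```shell
# backport_branch KERNEL_VERSION
# Prints the backport branch named in this article for a kernel version.
# Returns non-zero when no branch is listed for it (e.g. 5.x kernels,
# where only the strobing patch is needed).
backport_branch() {
    case "$1" in
        4.9|4.9.*)   echo "coresight-4.9-etr-etm_strobe" ;;
        4.14|4.14.*) echo "coresight-4.14-etm_strobe" ;;
        *)           return 1 ;;
    esac
}
```

For example, `git merge "$(backport_branch 4.14.98)"` would merge `coresight-4.14-etm_strobe`.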
131 
132 The CoreSight trace drivers must also be enabled in the kernel
133 configuration. This can be done using the configuration menu (`make
134 menuconfig`), selecting `Kernel hacking` / `arm64 Debugging` / `CoreSight Tracing Support` and
135 enabling all options, or by setting the following in the configuration
136 file:
137 
138 ```
139 CONFIG_CORESIGHT=y
140 CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
141 CONFIG_CORESIGHT_SINK_TPIU=y
142 CONFIG_CORESIGHT_SOURCE_ETM4X=y
143 CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
144 CONFIG_CORESIGHT_STM=y
145 CONFIG_CORESIGHT_CATU=y
146 ```
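
Before building, it can be useful to check an existing configuration file against this list. The helper below is a minimal sketch (the function name is an assumption) and treats only `=y` as enabled, matching the list above; a kernel that builds these drivers as modules would need `=m` handled as well.

```shell
# missing_coresight_options CONFIG_FILE
# Prints each option from the list above that is not set to 'y' in
# CONFIG_FILE; prints nothing when all of them are enabled.
missing_coresight_options() {
    for opt in CORESIGHT CORESIGHT_LINK_AND_SINK_TMC CORESIGHT_SINK_TPIU \
               CORESIGHT_SOURCE_ETM4X CORESIGHT_DYNAMIC_REPLICATOR \
               CORESIGHT_STM CORESIGHT_CATU; do
        grep -q "^CONFIG_${opt}=y\$" "$1" || echo "CONFIG_${opt}"
    done
}
```

Running `missing_coresight_options .config` in the kernel build directory lists whatever still needs enabling.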
147 
148 Compile the kernel for your target in the usual way, e.g.
149 
150 ```
151 make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
152 ```
153 
154 Each target may have a different layout of CoreSight components. To
155 collect trace into a sink, the kernel drivers need to know which other
156 devices need to be configured to route data from the source to the sink.
157 This is described in the devicetree (and in future, the ACPI tables). The
158 device tree will define which CoreSight devices are present in the system,
159 where they are located and how they are connected together. The devicetree
160 for some platforms includes a description of the platform's CoreSight
161 components, but in other cases you may have to ask the platform/SoC vendor
162 to supply it or create it yourself (see Appendix: Describing CoreSight in
163 Devicetree).
164 
165 Once the target has been booted with the devicetree describing the
166 CoreSight devices, you should find the devices in sysfs:
167 
168 ```
169 # ls /sys/bus/coresight/devices/
170 etm0 etm2 etm4 etm6 funnel0 funnel2 funnel4 stm0 tmc_etr0
171 etm1 etm3 etm5 etm7 funnel1 funnel3 replicator0 tmc_etf0
172 ```
173 
174 The naming convention for ETM devices can differ depending on the kernel version you are using.
175 For more information about the naming scheme, see the [Linux Kernel Documentation](https://www.kernel.org/doc/html/latest/trace/coresight/coresight.html#device-naming-scheme).
176 
177 If `/sys/bus/coresight/devices/` is empty, check your kernel configuration to make sure the .config file includes the CoreSight drivers and their dependencies, such as the clock.
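
The sysfs check can be scripted so that an empty device directory is reported explicitly. This is a sketch; `list_coresight_devices` is a hypothetical helper, and the optional directory argument exists only so that it can be tried on a machine without CoreSight hardware.

```shell
# list_coresight_devices [DEVICE_DIR]
# Prints the CoreSight device names found in sysfs, one per line.
# Returns non-zero, with a hint on stderr, when none are registered.
list_coresight_devices() {
    dir="${1:-/sys/bus/coresight/devices}"
    found=0
    for dev in "$dir"/*; do
        [ -e "$dev" ] || continue
        basename "$dev"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no CoreSight devices in $dir: check kernel config and devicetree" >&2
        return 1
    fi
}
```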
178 
179 ### Perf tools
180 
181 The perf tool is used to capture execution trace, configuring the trace
182 sources to generate trace, routing the data to the sink and collecting the
183 data from the sink.
184 
185 Arm recommends using the perf version corresponding to the kernel running
186 on the target. This can be built from the same kernel sources with:
187 
188 ```
189 make -C tools/perf CORESIGHT=1 VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
190 ```
191 
192 When specifying CORESIGHT=1, perf will be built using the installed OpenCSD library.
193 If you are cross compiling, then additional setup is required to ensure the build process links against the correct version of the library.
194 
195 If the post-processing (`perf inject`) of the captured data is not being
196 done on the target, then the OpenCSD library is not required for this build
197 of perf.
198 
199 Trace is captured by collecting the `cs_etm` event from perf. The sink
200 to collect data into is specified as a parameter of this event. Trace can
201 also be restricted to user space or kernel space with 'u' or 'k'
202 parameters. For example:
203 
204 ```
205 perf record -e cs_etm/@tmc_etr0/u --per-thread -- /bin/ls
206 ```
207 
208 This records the userspace execution of '/bin/ls' using tmc_etr0 as the sink.
209 
210 ## Capturing modes
211 
212 You can trace a single-threaded program in two different ways:
213 
214 1. By specifying `--per-thread`. In this case the CoreSight subsystem will
215 record only the trace belonging to the given program.
216 
217 2. By NOT specifying `--per-thread`. In this case CPU-wide tracing will
218 be enabled, and the trace will contain both the target program trace
219 and other workloads that were executing on the same CPU.
220 
221 
222 
223 ## Processing trace and profiles
224 
225 perf is also used to convert the execution trace to an instruction profile.
226 This requires a different build of perf, using the version of perf from
227 Linux v4.17 or later, as the trace processing code isn't included in the
228 driver backports. Trace decode is provided by the OpenCSD library
229 (<https://github.com/Linaro/OpenCSD>), v0.9.1 or later. This is packaged
230 for debian testing (install the libopencsd0, libopencsd-dev packages) or
231 can be compiled from source and installed.
232 
233 The autoFDO tool <https://github.com/google/autofdo> is used to convert the
234 instruction profiles to source profiles for the GCC and clang/llvm
235 compilers.
236 
237 
238 ## Recording and profiling
239 
240 Once trace collection using perf is working, we can now use it to profile
241 an application.
242 
243 The application must be compiled to include sufficient debug information to
244 map instructions back to source lines. For GCC, use the `-g1` or `-gmlt`
245 options. For clang/llvm, also add the `-fdebug-info-for-profiling` option.
246 
247 perf identifies the active program or library using the build identifier
248 stored in the elf file. This should be added at link time with the compiler
249 flag `-Wl,--build-id=sha1`.
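
As a sketch, the flags above can be collected per compiler in one place. The helper name is made up for this example, and picking `-g1` for GCC and `-gmlt` for clang is just one of the combinations the text allows.

```shell
# profiling_flags COMPILER
# Prints the debug-info and build-id flags suggested above for "gcc"
# or "clang". Other compilers are rejected with a non-zero status.
profiling_flags() {
    case "$1" in
        gcc)   echo "-g1 -Wl,--build-id=sha1" ;;
        clang) echo "-gmlt -fdebug-info-for-profiling -Wl,--build-id=sha1" ;;
        *)     echo "unknown compiler: $1" >&2; return 1 ;;
    esac
}
```

A build line could then look like `clang -O2 $(profiling_flags clang) -o program program.c`.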
250 
251 The next step is to record the execution trace of the application using the
252 perf tool. The ETM strobing should be configured before running the perf
253 tool. There are two parameters:
254 
255  * window size: A number of CPU cycles (W)
256  * period: Trace is enabled for W cycles every _period_ * W cycles.
257 
258 For example, a typical configuration is to use a window size of 5000 cycles
259 and a period of 10000 - this will collect 5000 cycles of trace every 50M
260 cycles. With these proof-of-concept patches, the strobe parameters are
261 configured via sysfs - each ETM will have `strobe_window` and
262 `strobe_period` parameters in `/sys/bus/coresight/devices/<etm>` and
263 these values will have to be written to each ETM (in a future version, this
264 will be integrated into the drivers and perf tool).
265 The `set_strobing.sh` script in this directory [`<opencsd>/decoder/tests/auto-fdo`] automates this process.
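
To illustrate what writing these values involves, here is a minimal sketch. It is not the `set_strobing.sh` script itself: the function name and the optional directory argument are assumptions, and it relies only on the `strobe_window` and `strobe_period` attribute names given above. On a real target it needs to run as root.

```shell
# write_strobe WINDOW PERIOD [DEVICE_DIR]
# Writes the strobe settings to every ETM that exposes both attributes
# and prints the name of each ETM it configured.
write_strobe() {
    window="$1"
    period="$2"
    dir="${3:-/sys/bus/coresight/devices}"
    for etm in "$dir"/*etm*; do
        # Skip devices without writable strobe attributes.
        [ -w "$etm/strobe_window" ] && [ -w "$etm/strobe_period" ] || continue
        echo "$window" > "$etm/strobe_window"
        echo "$period" > "$etm/strobe_period"
        basename "$etm"
    done
}
```

Calling `write_strobe 5000 10000` would apply the typical configuration described above to all ETMs.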
266 
267 To collect trace from an application using ETM strobing, run:
268 
269 ```
270 sudo ./set_strobing.sh 5000 10000
271 perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
272 ```
273 
274 The raw trace can be examined using the `perf report` command:
275 
276 ```
277 perf report -D -i perf.data --stdio
278 ```
279 
280 Perf needs to be built from the source code of your Linux kernel version, against the OpenCSD library, to be able to read and post-process ETM-gathered samples.
281 If running `perf report` produces an error like:
282 
283 ```
284 0x1f8 [0x268]: failed to process type: 70 [Operation not permitted]
285 Error:
286 failed to process sample
287 ```
288 or
289 
290 ```
291 "file uses a more recent and unsupported ABI (8 bytes extra). incompatible file format".
292 ```
293 
294 then you are probably using a perf build that is not linked against this library. Install OpenCSD, either by compiling v0.9.1 or later from [source code](https://github.com/Linaro/OpenCSD) or from the Debian packages (libopencsd0, libopencsd-dev),
295 and then compile perf against it.
296 
297 
298 For example, a raw trace dump from `perf report -D` looks like this:
299 
300 ```
301 0x1d370 [0x30]: PERF_RECORD_AUXTRACE size: 0x2003c0 offset: 0 ref: 0x39ba881d145f8639 idx: 0 tid: 4551 cpu: -1
302 
303 . ... CoreSight ETM Trace data: size 2098112 bytes
304  Idx:0; ID:12; I_ASYNC : Alignment Synchronisation.
305  Idx:12; ID:12; I_TRACE_INFO : Trace Info.; INFO=0x0
306  Idx:17; ID:12; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
307  Idx:48; ID:14; I_ASYNC : Alignment Synchronisation.
308  Idx:60; ID:14; I_TRACE_INFO : Trace Info.; INFO=0x0
309  Idx:65; ID:14; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
310  Idx:96; ID:14; I_ASYNC : Alignment Synchronisation.
311  Idx:108; ID:14; I_TRACE_INFO : Trace Info.; INFO=0x0
312  Idx:113; ID:14; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
313  Idx:122; ID:14; I_TRACE_ON : Trace On.
314  Idx:123; ID:14; I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000000000407B00; Ctxt: AArch64,EL0, NS;
315  Idx:134; ID:14; I_ATOM_F3 : Atom format 3.; ENN
316  Idx:135; ID:14; I_ATOM_F5 : Atom format 5.; NENEN
317  Idx:136; ID:14; I_ATOM_F5 : Atom format 5.; ENENE
318  Idx:137; ID:14; I_ATOM_F5 : Atom format 5.; NENEN
319  Idx:138; ID:14; I_ATOM_F3 : Atom format 3.; ENN
320  Idx:139; ID:14; I_ATOM_F3 : Atom format 3.; NNE
321  Idx:140; ID:14; I_ATOM_F1 : Atom format 1.; E
322 .....
323 ```
324 
325 The execution trace is then converted to an instruction profile using
326 the perf build with trace decode support. This may be done on a different
327 machine than that which collected the trace (e.g. when cross compiling for
328 an embedded target). Note that if you are using a different device from the one used to collect the profiling data,
329 you'll need to run `perf buildid-cache` first, as described below.
330 
331 The `perf inject` command
332 decodes the execution trace and generates periodic instruction samples,
333 with branch histories:
334 ```
335 perf inject -i perf.data -o inj.data --itrace=i100000il
336 ```
337 
338 The `--itrace` option configures the instruction sample behaviour:
339 
340 * `i100000i` generates an instruction sample every 100000 instructions
341  (only instruction count periods are currently supported, future versions
342  may support time or cycle count periods)
343 * `l` includes the branch histories on each sample
344 * `b` generates a sample on each branch (not used here)
345 
346 Perf requires the original program binaries to decode the execution trace.
347 If running the `inject` command on a different system than the trace was
348 captured on, then the binary and any shared libraries must be added to
349 perf's cache with:
350 
351 ```
352 perf buildid-cache -a /path/to/binary_or_library
353 ```
354 
355 `perf report` can also be used to show the instruction samples:
356 
357 ```
358 perf report -D -i inj.data --stdio
359 .......
360 0x1528 [0x630]: PERF_RECORD_SAMPLE(IP, 0x2): 4551/4551: 0x434b98 period: 3093 addr: 0
361 ... branch stack: nr:64
362 ..... 0: 0000000000434b58 -> 0000000000434b68 0 cycles P 0
363 ..... 1: 0000000000436a88 -> 0000000000434b4c 0 cycles P 0
364 ..... 2: 0000000000436a64 -> 0000000000436a78 0 cycles P 0
365 ..... 3: 00000000004369d0 -> 0000000000436a60 0 cycles P 0
366 ..... 4: 000000000043693c -> 00000000004369cc 0 cycles P 0
367 ..... 5: 00000000004368a8 -> 0000000000436928 0 cycles P 0
368 ..... 6: 000000000042d070 -> 00000000004368a8 0 cycles P 0
369 ..... 7: 000000000042d108 -> 000000000042d070 0 cycles P 0
370 .......
371 ..... 57: 0000000000448ee0 -> 0000000000448f24 0 cycles P 0
372 ..... 58: 0000000000448ea4 -> 0000000000448ebc 0 cycles P 0
373 ..... 59: 0000000000448e20 -> 0000000000448e94 0 cycles P 0
374 ..... 60: 0000000000448da8 -> 0000000000448ddc 0 cycles P 0
375 ..... 61: 00000000004486f4 -> 0000000000448da8 0 cycles P 0
376 ..... 62: 00000000004480fc -> 00000000004486d4 0 cycles P 0
377 ..... 63: 0000000000448658 -> 00000000004480ec 0 cycles P 0
378  ... thread: program1:4551
379  ...... dso: /home/root/program1
380 .......
381 ```
382 
383 The instruction samples produced by `perf inject` are then passed to the
384 autofdo tool to generate source level profiles for the compiler. For
385 clang/LLVM:
386 
387 ```
388 create_llvm_prof -binary=/path/to/binary -profile=inj.data -out=program.llvmprof
389 ```
390 
391 And for GCC:
392 
393 ```
394 create_gcov -binary=/path/to/binary -profile=inj.data -gcov_version=1 -gcov=program.gcov
395 ```
396 
397 The profiles can be viewed with:
398 
399 ```
400 llvm-profdata show -sample program.llvmprof
401 ```
402 
403 Or, for GCC:
404 
405 ```
406 dump_gcov -gcov_version=1 program.gcov
407 ```
408 
409 ## Using profile in the compiler
410 
411 The profile produced by the above steps can then be passed to the compiler
412 to optimize the next build of the program.
413 
414 For GCC, use the `-fauto-profile` option:
415 
416 ```
417 gcc -O2 -fauto-profile=program.gcov -o program program.c
418 ```
419 
420 For Clang, use the `-fprofile-sample-use` option:
421 
422 ```
423 clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
424 ```
425 
426 
427 ## Summary
428 
429 The basic commands to run an application and create a compiler profile are:
430 
431 ```
432 sudo ./set_strobing.sh 5000 10000
433 perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
434 perf inject -i perf.data -o inj.data --itrace=i100000il
435 create_llvm_prof -binary=/path/to/binary -profile=inj.data -out=program.llvmprof
436 clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
437 ```
438 
439 Use `create_gcov` for gcc.
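
The same sequence can be wrapped in a script with a dry-run switch, so the commands can be reviewed before they touch the target. This is a sketch only: `run`, `autofdo_clang` and `DRY_RUN` are names invented for this example, and the sink, strobe values and file names are simply those used in the commands above.

```shell
# run CMD...: print the command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# autofdo_clang BINARY SOURCE
# Records trace for BINARY, decodes it and rebuilds SOURCE with clang.
# Stops at the first step that fails.
autofdo_clang() {
    bin="$1"
    src="$2"
    run sudo ./set_strobing.sh 5000 10000 &&
    run perf record -e cs_etm/@tmc_etr0/u --per-thread -- "$bin" &&
    run perf inject -i perf.data -o inj.data --itrace=i100000il &&
    run create_llvm_prof -binary="$bin" -profile=inj.data -out=program.llvmprof &&
    run clang -O2 -fprofile-sample-use=program.llvmprof -o program "$src"
}
```

With `DRY_RUN=1` set, `autofdo_clang ./program program.c` only prints the five commands.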
440 
441 ## High Level Summary for recording on Arm board and decoding on different host
442 
443 1. (on Arm board)
444 
445  sudo ./set_strobing.sh 5000 10000
446  perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
447  If you specify `-N, --no-buildid-cache`, perf will only record the target binary and nothing will be copied.<br> If you don't specify it, any recorded dynamic library will be copied to ~/.debug on the board.
448 
449 2. (on Arm board) Run `perf archive`, which saves all the libraries it finds in a tar file (internally, it reads the perf.data file and performs a lookup using `perf buildid-list --with-hits`).
450 3. (on host) Use `scp` to copy perf.data and the .tar file generated by `perf archive` from the board.
451 4. (on host) Run `tar xvf perf.data.tar.bz2 -C ~/.debug` to populate the buildid-cache.
452 5. (on host) Double check the setup is correct:
453 
454  a. `perf buildid-list -i perf.data` lists the build IDs of the binaries and dynamic libraries whose trace has been recorded in perf.data.
455  b. `perf buildid-cache --list` lists the binaries and dynamic libraries in the buildid cache that will be used by `perf inject`.
456  Make sure the build IDs in the output of (a) and (b) match for the binaries you are interested in optimizing with AutoFDO.
457 
458 6. (on host) `perf inject -i perf.data -o inj.data --itrace=i100000il` will check for the dynamic libraries using the buildid inside the buildid-cache and post-process the trace.<br> buildids have to be the same, otherwise it won't be possible to post-process the trace.
459 
460 7. (on host) `create_llvm_prof -binary=/path/to/binary -profile=inj.data -out=program.llvmprof` takes the output from `perf inject` and transforms it into a format that the compiler can read.
461 8. (on host) `clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c` to make clang use the produced profile.<br>
462  If you are confident enough that your profile is accurate, you can add the `-fprofile-sample-accurate` flag, which will penalize all the callsites without corresponding profile, marking them as cold.
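
The comparison in step 5 can be automated. The sketch below assumes that both `perf buildid-list` and `perf buildid-cache --list` print one `<buildid> <path>` pair per line, and that their output has been saved to two files; `missing_buildids` is a hypothetical helper name.

```shell
# missing_buildids RECORDED_LIST CACHED_LIST
# Each input holds "<buildid> <path>" lines, for example saved with:
#   perf buildid-list -i perf.data > recorded.txt
#   perf buildid-cache --list      > cached.txt
# Prints the recorded entries whose build ID is absent from the cache.
missing_buildids() {
    awk 'FILENAME == ARGV[1] { cached[$1] = 1; next }
         !($1 in cached)' "$2" "$1"
}
```

Any line printed by `missing_buildids recorded.txt cached.txt` names a binary or library that still has to be added with `perf buildid-cache --add`.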
463 
464 If you are using the same host for both building the binary to be traced and re-building it with afdo:
465 
466 1. You won't need to copy back any dynamic libraries from the board (since you already have them), and can use `--no-buildid-cache` when recording
467 2. You have to make sure the relevant dynamic libraries to be optimized are present in the buildid-cache.
468 
469 You can easily add a dynamic library manually into the build-id cache by running:
470 
471 `perf buildid-cache --add <path/to/library/or/binary> -vvv`
472 
473 You can easily check what is currently contained in your buildid-cache by running:
474 
475 `perf buildid-cache --list`
476 
477 You can check the buildid of a given binary/dynamic library:
478 
479 `file <path/to/dynamic/library>`
480 
481 ## References
482 
483 * AutoFDO tool: <https://github.com/google/autofdo>
484 * GCC's wiki on autofdo: <https://gcc.gnu.org/wiki/AutoFDO>, <https://gcc.gnu.org/wiki/AutoFDO/Tutorial>
485 * Google paper: <https://ai.google/research/pubs/pub45290>
486 * CoreSight kernel docs: Documentation/trace/coresight.txt
487 
488 
489 ## Appendix: Describing CoreSight in Devicetree
490 
491 
492 Each component has an entry in the device tree that describes its:
493 
494 * type: The `compatible` field defines which driver to use
495 * location: A `reg` defines the component's address and size on the bus
496 * clocks: The `clocks` and `clock-names` fields state which clock provides
497  the `apb_pclk` clock.
498 * connections to other components: `port` and `ports` field link the
499  component to ports of other components
500 
501 To create the device tree, some information about the platform is required:
502 
503 * The memory address of the CoreSight components. This is the address in
504  the CPU's address space where the CPU can access each CoreSight
505  component.
506 * The connections between the components.
507 
508 This information can be found in the SoC's reference manual or you may need
509 to ask the platform/SoC vendor to supply it.
510 
511 An ETMv4 source is declared with a section like this:
512 
513 ```
514  etm0: etm@22040000 {
515  compatible = "arm,coresight-etm4x", "arm,primecell";
516  reg = <0 0x22040000 0 0x1000>;
517 
518  cpu = <&A72_0>;
519  clocks = <&soc_smc50mhz>;
520  clock-names = "apb_pclk";
521  port {
522  cluster0_etm0_out_port: endpoint {
523  remote-endpoint = <&cluster0_funnel_in_port0>;
524  };
525  };
526  };
527 ```
528 
529 This describes an ETMv4 attached to core A72_0, located at 0x22040000, with
530 its output linked to port 0 of a funnel. The funnel is described with:
531 
532 ```
533  funnel@220c0000 { /* cluster0 funnel */
534  compatible = "arm,coresight-funnel", "arm,primecell";
535  reg = <0 0x220c0000 0 0x1000>;
536 
537  clocks = <&soc_smc50mhz>;
538  clock-names = "apb_pclk";
539  power-domains = <&scpi_devpd 0>;
540  ports {
541  #address-cells = <1>;
542  #size-cells = <0>;
543 
544  port@0 {
545  reg = <0>;
546  cluster0_funnel_out_port: endpoint {
547  remote-endpoint = <&main_funnel_in_port0>;
548  };
549  };
550 
551  port@1 {
552  reg = <0>;
553  cluster0_funnel_in_port0: endpoint {
554  slave-mode;
555  remote-endpoint = <&cluster0_etm0_out_port>;
556  };
557  };
558 
559  port@2 {
560  reg = <1>;
561  cluster0_funnel_in_port1: endpoint {
562  slave-mode;
563  remote-endpoint = <&cluster0_etm1_out_port>;
564  };
565  };
566  };
567  };
568 ```
569 
570 This describes a funnel located at 0x220c0000, receiving data from 2 ETMs
571 and sending the merged data to another funnel. We continue describing
572 components with similar blocks until we reach the sink (an ETR):
573 
574 ```
575  etr@20070000 {
576  compatible = "arm,coresight-tmc", "arm,primecell";
577  reg = <0 0x20070000 0 0x1000>;
578  iommus = <&smmu_etr 0>;
579 
580  clocks = <&soc_smc50mhz>;
581  clock-names = "apb_pclk";
582  power-domains = <&scpi_devpd 0>;
583  port {
584  etr_in_port: endpoint {
585  slave-mode;
586  remote-endpoint = <&replicator_out_port1>;
587  };
588  };
589  };
590 ```
591 
592 Full descriptions of the properties of each component can be found in the
593 Linux source at Documentation/devicetree/bindings/arm/coresight.txt.
594 The Arm Juno platform's devicetree (arch/arm64/boot/dts/arm) provides an example
595 CoreSight description.
596 
597 Many systems include a TPIU for off-chip trace. While this isn't required
598 for self-hosted trace, it should still be included in the devicetree. This
599 allows the drivers to access it to ensure it is put into a disabled state,
600 otherwise it may limit the trace bandwidth causing data loss.