Steer away from ‘hobbyist’ MCUs, particularly anything RPi: poor documentation, may implement a slow ‘mass storage bootloader’, non-standard mechanisms like mailboxes etc.
Could prevent having to create own PCB if can buy say in a pro-micro form factor
TRRS (tip-ring-ring-sleeve), i.e. 4-pole, i.e. 4 wires
RF-shielded transceiver (transmitter + receiver)
Use neokey 1x4 to test with debouncing, key matrix etc.
TODO: Have browser new-tab page load a local .html with organised frequently visited links, e.g. SEARCH: google; TOOLS: github, stackoverflow; etc.
smart device just means software interaction capabilities to allow automation
qwerty, dvorak and colemak keyboard layouts
TODO: Phil’s lab for embedded hardware
TODO: rxi microui library demo
Interesting: TrenchBroom 2 Quake map file level editor (easy to parse?)
The !! operator converts a value to 0 or 1:
t += ((F32)!!(flags & flag) - t) * rate;
IMPORTANT: dlclose() will only unload if the reference count is 0. So, with thread_local variables, the library will not actually unload
niagara renderer first 3D game: https://www.pardcode.com/cpp-3d-game-tutorial-series
TODO: Windows draws in WM_PAINT when the window is moved/obscured, and in the main loop otherwise; therefore, 2 places to possibly draw. Does X11 have this also?
resilio sync video downloads: https://hero.handmade.network/forums/code-discussion/t/3242-downloading_videos_using_resilio_sync_formerly_bittorrent_sync
hotloading metaprogramming
X11 copy and paste: https://handmade.network/forums/articles/t/8544-implementing_copy_paste_in_x11
font rasteriser: https://handmade.network/forums/wip/t/7610-reading_ttf_files_and_rasterizing_them_using_a_handmade_approach
Often legacy-supporting code adds lots of corner cases, e.g. OpenGL, i.e. doesn’t have consistent performance characteristics
immediate mode and retained mode are about lifetimes. for immediate, the caller does not need to know about the lifetime of an object.
Discovering that for Linux, the docs (Xlib, ALSA) are the source code, and for development may have to add your user to groups (e.g. for uinput)
Plugins can just be .so files that are loaded with dlopen()/dlsym()
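A minimal sketch of that plugin pattern (the path ./plugin.so and the symbol name plugin_update are made up; link with -ldl on glibc):
#include <dlfcn.h>
#include <stdio.h>

typedef void (*plugin_update_fn)(void);

int main(void)
{
  void *handle = dlopen("./plugin.so", RTLD_NOW); // hypothetical plugin path
  if (handle == NULL) { fprintf(stderr, "%s\n", dlerror()); return 1; }

  plugin_update_fn update = (plugin_update_fn)dlsym(handle, "plugin_update"); // hypothetical symbol
  if (update != NULL) update();

  dlclose(handle); // only actually unloads once the reference count hits 0 (see dlclose note above)
  return 0;
}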
Use of discriminated unions and type fields for type sharing
Visualising is essential for debugging insidious problems… print rows of aligned text.
Conversion from single-state to recording data over time will require memory, which we cast to the appropriate state structure, e.g. DebugState; state structures will have a b32 is_initialised field which gates initialising a memory arena.
Then have CounterState and CounterSnapshot (move per-frame data into this). Each frame, copy src over to dst (with a rolling buffer).
In the rendering of the records, to display snapshots, generate statistics like min, max, avg. These are converged into a Statistic struct for each value. Have helper functions for begin, update and end of a statistic with a value (see sketch after this block).
Now that we have max, loop over again to generate a graph height scale and, say, a red colour scale. Alternatively, we could have an absolute height scale, e.g. total / 0.033f.
(So in drawing, will typically have ‘raw’ structures that just have data, and will then loop over these again to generate relationships to actually draw from etc.)
(TODO: UI to have layout and font information…)
Drawing has left_edge, top_edge. Drawing charts, define chart_height, bar_width, bar_spacing etc. in pixels. Can draw a single-pixel-height reference line. Cycle through colours with arr[index % ARRAY_COUNT(arr)].
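A minimal sketch of the begin/update/end statistic helpers described above (Statistic, F32_MAX and the field names are my own guesses at the shape, not the actual implementation):
typedef struct Statistic
{
  r32 min, max, avg; // avg accumulates a sum until end_statistic() divides it
  u32 count;
} Statistic;

void begin_statistic(Statistic *stat)
{
  stat->min = F32_MAX;   // assumes an F32_MAX constant exists
  stat->max = -F32_MAX;
  stat->avg = 0.0f;
  stat->count = 0;
}

void update_statistic(Statistic *stat, r32 value)
{
  if (value < stat->min) stat->min = value;
  if (value > stat->max) stat->max = value;
  stat->avg += value;
  stat->count += 1;
}

void end_statistic(Statistic *stat)
{
  if (stat->count > 0) stat->avg /= (r32)stat->count;
}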
Drawing just takes state and input. After base drawing, look at input and alter if appropriate.
Draw routines take an origin vector and axis vectors (basis vectors). Axes that aren’t perpendicular cause shearing.
Artists create in sRGB space (in Photoshop), so if we do any math on it (like a lerp), we have to convert to linear space and then back to sRGB for the monitor (if we were to just blit directly, it would be fine). To emulate the complicated curves exactly, could table-drive it; we compromise on squaring (x²) and square root (√x) as the conversion. Without gamma correction, the resultant image will look very dim (due to the nature of the monitor gamma curve). So with RGB values, preface names with linear or srgb.
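A sketch of that compromise: square/sqrt as a cheap stand-in for the real ~2.2 gamma curve (channel values assumed normalised to [0, 1]; needs <math.h> for sqrtf):
r32 srgb_to_linear_approx(r32 srgb01) { return srgb01 * srgb01; } // cheap approximation of the decode curve
r32 linear_to_srgb_approx(r32 lin01)  { return sqrtf(lin01); }    // cheap approximation of the encode curve

r32 lerp_srgb_channel(r32 a_srgb, r32 b_srgb, r32 t)
{
  r32 a = srgb_to_linear_approx(a_srgb);
  r32 b = srgb_to_linear_approx(b_srgb);
  r32 lin = a + (b - a) * t;         // do the math in linear space
  return linear_to_srgb_approx(lin); // back to sRGB for the monitor
}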
IMPORTANT: ALWAYS PROVIDE EXAMPLE!
TODO: raylib game programming (perhaps also for byte jams as well)
TODO: IoT embedded project
TODO: incorporate rss reader feeder.co into morning routine
TODO: memfault debugging with IoT https://embeddedartistry.com/wp-content/uploads/2022/07/IoT-Device-Observability-Requirements-Checklist.pdf?mc_cid=582b1ec73e&mc_eid=UNIQID
working with MPU: https://www.state-machine.com/null-pointer-protection-with-arm-cortex-m-mpu?mc_cid=582b1ec73e&mc_eid=UNIQID
RTOS debouncing: https://blog.golioth.io/how-to-debounce-button-inputs-in-a-real-time-operating-system/?mc_cid=582b1ec73e&mc_eid=UNIQID
TODO: wolfsound youtube for some audio DSP
TODO: Fun graphical displays https://tcc.lovebyte.party/ YouTube LoveByte and FieldFX byte jams
TODO: how to best architect timing requirements given that ISR should be small? e.g. this executes every 1 second, this every 500ms etc.
Thermal imaging coupled with camera more ‘deterministic/maintainable’ than AI vision
There are of course international standards (IEC 60601, or IEC 62304) that you have to follow in order to get your device CE/FDA approval. These standards usually require you to provide a lot of documentation and to pass a series of tests in order to verify that your device is working as intended and isn’t dangerous.
Could even be simplex (one way only). Full-duplex can still be serial, requiring at least two wires (duplex meaning bidirectional). Serial (synchronous with clock and data; or asynchronous) vs parallel.
If you put “used Bluetooth” then I would expect you to talk about which transceivers you used, what the api was like (deficiencies it had, how low level, how was it good), and of course a high level overview of where the Bluetooth spec ends and application begins.
Simply being able to say “I don’t know, but I do know that in the spec/experience it says xyz, so maybe it would do abc, but I would have to look for jkl to be sure.” is huge.
BUTTON BOX 3D PRINT:
* print out small test part with holes to test if button/screw/etc. inserts fit correctly
* consider leaving holes in 3D scaffolding for making wiring easier
* for stylistic finish, can print a separate plate with different resin and attach with hot glue
* will have a bottom plate to account for wiring (have threaded inserts (male-to-female) to account for box moving about when pressing a button)
1. solder wires to buttons
2. solder wires to mcu
3. secure mcu
no heap allocation, just use statically allocated pools: https://mcuoneclipse.com/2022/11/06/how-to-make-sure-no-dynamic-memory-is-used/?mc_cid=26981ac7f4&mc_eid=UNIQID
Fast Fourier Transforms (FFTs) are often used on ADC samples to create a spectrum analyser, which allows for subsequent beat detection?
an adapter board would handle power and signal conversion
tiny solid-state batteries more efficient than coin cell batteries; useful for wearables
TODO: go through https://embedded.fm/blog/embedded-wednesdays
Dagger: CI in code, not YAML
Qucs-S for SPICE: free circuit simulator GUI
TODO: possible program usage of syncthing
bipartite ring buffer more efficient? TODO: seems with RTOS emphasis on ‘tasks’, will require understanding job queues https://www.aosabook.org/en/freertos.html
filters relating to a pedometer: https://www.aosabook.org/en/500L/a-pedometer-in-the-real-world.html
TODO: richard braun embedded (useful C has functions) https://www.twitch.tv/videos/233685076?filter=all&sort=time
I only touch low-voltage 5V side, the high ‘dangerous’ voltage side I leave to hardware engineers
TODO: rhymu ‘rusty keyboard’ series
TODO: FreeRTOS spectrum analyzer, i.e. using ‘tasks’: https://www.youtube.com/watch?v=f_zt7zdGJCA
https://iosoft.blog/2020/09/29/raspberry-pi-multi-channel-ws2812/ addressable RGB: each LED has a serial chip with a serial port in and a serial port out that it funnels through the chain (interesting option of having a light display at sunset).
Using PlatformIO (Python under the hood) is very slow. Perhaps, though, it’s useful for finding libraries or possible environment options to explore, like unit testing.
BOARD-SELECTION: (want a board with WiFi and a builtin display?) IMPORTANT: chip cost may be $4, board cost $15. Go with ARM (although ESP32 is much more powerful than Atmel, e.g. RAM, peripherals, frequency, at similar cost).
LEDS: a tricolor LED can be any 3 colours; an RGB LED is specific colours.
We power at a level appropriate from the cable; the board will enter program mode automatically.
Resolution of PWM setup is dependent on the frequency set up for it (e.g. a resolution of 8 bits gives 100% duty cycle at 255).
red_led_duty_cycle(pwm_max_val - red_component);
green_led_duty_cycle(pwm_max_val - green_component);
blue_led_duty_cycle(pwm_max_val - blue_component);
TODO: is brightness of PWM determined by duty cycle? If so, then a PWM LED fade should just be duty cycle, or should it be colour as well?
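Assuming the duty-cycle helpers and pwm_max_val from the snippet above (and delay_ms()), a fade is then just a ramp of the duty cycle; average power, and so apparent brightness, tracks duty cycle (perceived brightness is non-linear, though):
for (u32 duty = 0; duty <= pwm_max_val; ++duty)
{
  red_led_duty_cycle(duty); // higher duty cycle -> more on-time -> brighter
  delay_ms(5);
}
If the LED is wired common-anode as the inversion above suggests, pass pwm_max_val - duty instead.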
DISPLAY: useful to display FPS (might only need to display this, say, every 250ms as opposed to every frame) and power consumption (perhaps unscaled power, i.e. power if not performing alterations; perhaps only a bright LED if power throttling).
Technology, size and connection: as various permutations of these, have various controller permutations, e.g. SH1107_I2C, SSD1306_SPI, etc. (we probably want a library to draw lines, shapes, fonts, etc.).
I2C was developed by Philips to allow multiple chips on a board to communicate with only two signal wires (SDA/SCL, plus ground); an id is passed on the bus (number of devices is limited by address space; 7-bit addressing typically gives 128 addresses).
Price difference between a shape-drawable display and a character display?
// Moire pattern
for (u32 x = 0; x < display_width; ++x)
draw_line(x, 0, display_width - x, display_height);
// Concentric circles
for (u32 r_count = 0; r_count < 3; r_count += 1)
draw_circle_outline(x, y, r_count);
// rainbow
light_led(led_i, colour_val += 10);
// Marquee (draw colour, then draw moving black spots)
for (u32 i = scroll % 5; i < num_leds; ++i) set_led(i, black);
// stars
leds[random(leds_len)] = colours[random(colours_len)];
delay_ms(200);
// might be better to draw a single LED each pass and use a LOCAL_PERSIST to
// keep track of how many times function is called as have delay
// comet
u32 trailer_pixel_fade_amount = 128;
u32 num_core_pixels = 5;
u32 delta_hue = 4;
r32 comet_speed = 0.5f;
// instead of clearing all LEDs, reduce brightness, i.e. incrementally fade to black
led_fade_to_black(leds, i, amount);
u32 hue = HUE_RED;
hue += delta_hue;
i32 direction = -1;
u32 pos = 0;
pos += (direction * comet_speed);
// IMPORTANT: to get a 'smooth' effect, require floating point speed multiplier
// IMPORTANT: so, animation stages are u32, r32, then easing (kinematics/lerp)/cycling(sin) etc.
// IMPORTANT: gradual colour blending to clear, instead of instant colour clear
// IMPORTANT: then move to perhaps altering colour
// IMPORTANT: when calculating values, the natural derivation may prove too large or too small,
// e.g. cur_time - prev_time. so, add something like a 'speed_knob' that will / or *
// IMPORTANT: floating point coordinates only one part of smoothness.
// also require fractional drawing
// this means that drawing 0.25 of a pixel will be 0.25 of its original colour
colour colour_fraction(colour colour_in, r32 fraction)
{
  fraction = min(1.0f, fraction);
  return colour_in.fadeToBlack(255 * (1.0f - fraction));
}
// IMPORTANT: so, in addition to 'sub-pixel' drawing, also have automatic blending
void draw_pixels(r32 fpos, r32 pixel_len, colour col)
{
  r32 first_pixel_avail = 1.0f - (fpos - (u32)fpos); // get fractional part, i.e. coverage left in first pixel
  first_pixel_avail = min(first_pixel_avail, pixel_len);
  r32 remaining = min(pixel_len, strip_len - fpos);
  u32 ipos = (u32)fpos;
  if (remaining > 0.0f)
  {
    // ipos = get_pixel_order();
    set_led(ipos++, colour_fraction(col, first_pixel_avail));
    remaining -= first_pixel_avail;
  }
  while (remaining > 1.0f)
  {
    set_led(ipos++, col);
    remaining -= 1.0f;
  }
  if (remaining > 0.0f)
  {
    set_led(ipos++, colour_fraction(col, remaining));
  }
}
// IMPORTANT: now increment by say 0.1 instead of 1
// draw comet block
// the delay() value will probably be specific to the CPU frequency
// Check how fast we can draw this out over I2C (want something like get_ms())
// use weighted average to prevent value flickering, i.e. jumping (it will take some number of frames to stabilise as starts at 0)
weighted_average = prev_value * 0.9 + new_value * 0.1; (probably use LOCAL_PERSIST)
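A minimal sketch of that smoothing with LOCAL_PERSIST (names are mine):
u32 smoothed_fps(u32 raw_fps)
{
  LOCAL_PERSIST r32 avg = 0.0f;            // starts at 0, so takes a few frames to stabilise
  avg = avg * 0.9f + (r32)raw_fps * 0.1f;  // exponential moving average
  return (u32)avg;
}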
free-fall: v = sqrt(-2 * gravity * distance_fallen);
r32 gravity = -9.81f;
r32 start_height = 1;
r32 impact_velocity = free_fall(start_height);
r32 speed_knob = 4.0f;
struct BouncingBallEffect
{
u32 output_led_strip_len;
u32 fade_rate;
b32 reversed; // common parameter for effects (draw from len inwards; i = size - 1 - i)
b32 mirrored; // common parameter for effects (size = size / 2)
r32 time_since_last_bounce;
r32 ball_speed;
r32 dampening; // value closer to 1 preserves more energy (0.9f - i / pow(num_balls, 2))
u32 colour;
};
time_since_last_bounce = time() - clock_at_last_bounce;
// constant acceleration function (here gravity is decelerating us)
// this could be considered an easing function
ball_height = 0.5 * gravity * pow(time_since_last_bounce, 2.0f) + ball_speed * time_since_last_bounce;
if (ball_height < 0) // bounce
{
ball_height = 0;
ball_speed = dampening * ball_speed;
if (ball_speed < 0.01f) ball_speed = initial_ball_speed;
}
position = (ball_height * (strip_len - 1) / start_height);
set_led(position, colour);
// IMPORTANT: as there's no need for explicit collision detection,
// if we do additive colours (led[i] += colour;) we get mixing
if (mirrored) set_led(strip_len - 1 - position, colour);
print(" " * sin(freq * i) + "ryan") // map sin() to our desired range
// could also have sawtooth(), cubic() etc.
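A sketch of mapping sin() output onto an index range, plus a sawtooth alternative (needs <math.h>; names are mine):
u32 sin_to_index(r32 t, u32 num_leds)
{
  r32 normalised = sinf(t) * 0.5f + 0.5f;          // [-1, 1] -> [0, 1]
  return (u32)(normalised * (r32)(num_leds - 1));  // [0, 1] -> [0, num_leds - 1]
}

r32 sawtooth(r32 t) { return t - floorf(t); } // linear ramp in [0, 1) that wraps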
#define cli() nvic_globalirq_disable()
#define sei() nvic_globalirq_enable()
// fractional drawing required for smooth 'slow-motion' effects
blend_amount_self = 2, blend_amount_neighbour1 = 3 (take 2 parts of itself, 3 parts of neighbour)
blend_total = 5;
// for parameters essentially / for setting relative to length,
// + for offset, and * for a scaling factor
// cool
for (u32 i = 0; i < size; ++i)
heat[i] = max(0L, heat[i] - random(0, ((cooling * 10) / size) + 2));
// move heat up
for (u32 i = 0; i < size; ++i)
heat[i] = (heat[i] * blend_self + heat[(i + 1) % size] * blend_neighbour) / blend_total;
// ignite a spark
for (u32 i = 0; i < num_sparks; ++i)
{
if (random(255) < sparking_threshold_probability)
{
u32 y = size - 1 - random(spark_height);
heat[y] = heat[y] + random(160, 255);
}
}
// convert heat to colour
for (u32 i = 0; i < size; ++i)
{
  colour = map_colour_to_red(heat[i]);
  draw_pixels(i, 1, colour);
}
FAN_STRIP_SIZE = 16;
NUM_FANS = 3;
ZERO_PIXEL_OFFSET = 4; // how much we have to rotate by for the 0th pixel to be on the bottom
enum PIXEL_ORDER
{
PIXEL_ORDER_SEQUENTIAL = 1 << 0,
PIXEL_ORDER_REVERSE = 1 << 1,
PIXEL_ORDER_BOTTOM_UP = 1 << 2,
PIXEL_ORDER_TOP_DOWN = 1 << 3,
PIXEL_ORDER_LEFT_RIGHT = 1 << 4,
PIXEL_ORDER_RIGHT_LEFT = 1 << 5,
};
u32 get_pixel_order(i32 pos, PIXEL_ORDER order)
{
// this allows negative indexing, i.e. -1 is last
while (pos < 0) pos += FAN_SIZE;
u32 offset = (pos + ZERO_PIXEL_OFFSET) % FAN_STRIP_SIZE;
u32 reverse_offset = (pos + FAN_STRIP_SIZE - ZERO_PIXEL_OFFSET) % FAN_STRIP_SIZE;
// start in particular fan, e.g. 0, 16, 32, 48, 64, etc.
u32 fan_base = pos - (pos % FAN_STRIP_SIZE);
switch (order)
{
case PIXEL_ORDER_SEQUENTIAL:
return fan_base + offset;
case PIXEL_ORDER_REVERSE:
return fan_base + FAN_STRIP_SIZE - 1 - reverse_offset;
case PIXEL_ORDER_BOTTOM_UP:
return fan_base + FAN_STRIP_SIZE - 1 - (VERTICAL_LOOKUP[pos % FAN_STRIP_SIZE] + ZERO_PIXEL_OFFSET) % FAN_STRIP_SIZE;
default:
return fan_base + offset; // remaining orders not covered in this note
}
}
// IMPORTANT: connecting multiple strips is daisy chaining, so still same output
// allow for offset from fan_index, and also just passing number treating all fans together
// IMPORTANT: optional arguments, use struct
for (u32 i = 0; i < NUM_FANS; ++i)
  draw_fan_pixels(sin(x), 1, red, PIXEL_ORDER_LEFT_RIGHT, i);
for (u32 i = 0; i < NUM_PIXELS; ++i)
  draw_fan_pixels(sin(x), 1, red, PIXEL_ORDER_LEFT_RIGHT, 0);
boustrophedon: 0 > 1 > 2 > 3 > 4 | 8 < 7 < 6 < 5 <- IMPORTANT: LED matrix arranged as boustrophedon columns
if (x % 2 == 0) return (x * height) + y; else return (x * height) + (height - 1 - y);
FFT divides samples into frequency buckets. A logarithmic scale is employed in a spectrum analyser to account for the high frequency range being more greatly separated than the low frequency range.
wire strippers +/- affects tension which will affect any wire braiding
put heat shrink on before wire soldering
will often twist multiple common wires (e.g. ground) together and solder to single ground source
(have gloves on for this)
wiring can be too fine for holding a particular amperage?
TODO: look into sleeving wires
TODO: something like loctite threadlocker is a liquid that prevents threads loosening due to vibration
Adhesive tape (with conductive pads on the end) on the bottom of the LED strip to stick wires to.
Then solder the end tips to connect bottom to top
(have some electrical tape for this)
Also, buy wires merged together and only pull apart the end points
With multimeter, verify that no short circuit by checking continuity amongst wires
Critical section means non-concurrent access to this code.
Achieved with a mutex obtain and release (lock/unlock) pair?
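A sketch of a critical section using POSIX threads (on an RTOS the equivalent is the scheduler's mutex take/give, or briefly masking interrupts):
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static u32 shared_counter;

void increment_shared(void)
{
  pthread_mutex_lock(&lock);    // enter critical section
  shared_counter += 1;          // only one thread executes this at a time
  pthread_mutex_unlock(&lock);  // leave critical section
}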
Modern OS will have code memory write protected for security reasons. Bare metal can do this however
IMPORTANT: for a preemptive multitasking kernel like Linux, a call to pthread_yield() (allow other threads to run on the CPU) is not necessary. However, for embedded, maybe
LEDS: WS2812B is the standard (NeoPixel is the Adafruit brand). Each chip is an RGB LED with a tiny controller. Probably has a low drop-out regulator onboard so 5V can be dropped to 3.3V. Can buy as grids or strips. Have a library generate the square wave forms, e.g. FastLED.
By powering the board through its pins, can ensure enough current is passing to it. Can simultaneously connect a USB cable and it will defer to the power pins over it?
With power, still have to be careful if, say, setting max brightness and all LEDs to white. Seems it’s common to have GRB as the data format? TODO: only in video 4 are power calculations done? (have a function to limit max watts of power drawn?)
// #define AT_MOST_N_MILLIS(NAME, N) static AtMostNMillis NAME(N); if (NAME)
// (AtMostNMillis provides operator bool() { return ready(); })
8BIT MATH (another reason to inspect assembly):
// qadd8(i, j) == MIN((i + j), 0xFF) (saturated add)
// qsub8(i, j) == MAX((i - j), 0)
// mul8(i, j)  == (i * j) & 0xff
// add8(i, j)  == (i + j) & 0xff
// sub8(i, j)  == (i - j) & 0xff
// sin8(x)     == (sin((x/128.0) * pi) * 128) + 128
// cos8(x)     == (cos((x/128.0) * pi) * 128) + 128
#if QADD8_C == 1
unsigned int t = i + j;
if (t > 255) t = 255;
return t;
#elif QADD8_ARM_DSP_ASM == 1
asm volatile("uqadd8 %0, %0, %1" : "+r" (i) : "r" (j));
return i;
#endif
TODO: gdb scripts https://github.com/espressif/freertos-gdb
TODO: is an IoT cloud data store like ThingSpeak used commercially?
TODO: investigate Renode for embedded testing
TODO: networking with containers: https://iximiuz.com/en/series/computer-networking-fundamentals/
TODO: look at embeddedartistry course developments for ‘building testable embedded systems’
TODO: working with time-series data
LIN protocol for vehicles (seems to be a host of protocols specific to automotives)
trimpot, a.k.a. trimmer potentiometer (a small preset potentiometer)
MEMS (micro-electromechanical systems) motion sensor
TODO: using in-built security features of chip, e.g. AES-256
buck converter steps down DC-DC voltage, while stepping up current (various step-down mechanisms in relation to AC/DC and voltage/current)
although bluetooth LE say 50m distance, a repeater can be used (and really for any RF)
TODO: profiling not your application https://linus.schreibt.jetzt/posts/qemu-9p-performance.html
TODO: https://dev.to/taugustyn/call-stack-logger-function-instrumentation-as-a-way-to-trace-programs-flow-of-execution-419a
MIPI (mobile industry processor interface) developed DSI
TODO: outdoor project: https://hackaday.io/project/186064-green-detect look into further hackaday.io/projects
TODO: embed LEDs onto 3D printed board
contrasting telephoto and wide angle lens
TODO: differences between ribbon cable and normal
TODO: TPM (trusted platform module) for embedded https://www.amazon.com/Trusted-Platform-Module-Basics-Technology/dp/0750679603
TODO: freeRTOS https://www.digikey.com/en/maker/projects/getting-started-with-stm32-introduction-to-freertos/ad275395687e4d85935351e16ec575b1
TODO: bootlin courses
TODO: GPS tracking, LoRaWAN (long range, low power), satellite connection, etc.
TODO: 3D printing; outdoor case enclosure
Digital laser dust sensor (particulates in the air). PM1 being worst as most fine.
TODO: DC vs stepper motor?
TODO: what is the use case for a motor driver such as the L298 Dual H-Bridge Motor Driver and the Tic T500 USB Multi-Interface Stepper Motor Controller (circuitry without MCU?) driver lines are diode protected from back EMF?
Growth of ‘enviro sensor kits’, i.e. test air quality etc. to create a smart home or garden. Growth of ‘IoT’ sensor kits/smart home. Growth of ‘AI sensors’. (Perhaps more relevant to me is power/energy, automation and sensing.) Growth of sensor compounding, e.g. video now with LiDAR to detect depth, gesture detection sensors.
H-bridge is IC that switches voltage polarity, e.g. run DC motors forwards or backwards Rectifier converts AC to DC (transformer is high voltage AC to low voltage AC)
TODO: is something like SerialStudio necessary for visualisation? Could not live stream gnuplot?
TODO: domestica course on creative coding. css, html graphics also for 2D animations?
Marketing refers to some MCU work that react to environment as physical computing
TODO: investigate both AVR github and gists: https://github.com/BobBurns
CFFI to create a python interface for C for something say like a console session?
TODO: Perhaps for a machine just targeting embedded development, use WSL to easily use GUI debugger apps?
TODO: Essentials of putting metadata section in firmware binary, such as version string! (particularly so test runners can utilise this information)
Might see GNSS + INS (inertial navigation system; i.e using IMU as well)
TODO: amplifiers, e.g. class-D etc. MEMs accelerometers for vibration detection in cars
TODO: RTC is external to oscillator?
An analog LED is a single colour. A digital LED allows controlling each colour separately (so, a.k.a. addressable, e.g. APA102 LEDs). This is done through an LED driver chip (think the WS2812B chip for NeoPixel).
A channel in a sensor is a quantity measured. So an acceleration sensor could have 3 channels, one for each axis
DSP: http://www.dspguide.com/pdfbook.htm terms like THD (total harmonic distortion), PFC (power factor correction), HV (high voltage)
circuit design in software (perhaps parallel with Robert Feranec?): https://www.jitx.com/
Go through talks on memfault.com blog
Simulator testing gives much faster turn-around times, can add sanitisers without memory concerns, pass peripheral data in from file, draw to window instead of display etc.
For memory allocator, set to 0 on free in debug builds? Investigate glibc malloc tunables, e.g. in debug build: export MALLOC_PERTURB_=$((RANDOM % 255 + 1))
network game (with some testing tools): https://github.com/TheSandvichMaker/netgame
TODO: how to calculate: “Its power supply system can supply up to 4200 mAh and run for more than 5 hours”
SiC (silicon carbide) power module. Whole area in power management (also leads onto safety regulations, e.g. SIL)
use a time-series database where everything is stored ordered (as opposed to relational, based on set theory, whose order is determined in clauses; also get full SQL support here and can store more data types)
investigate automated ‘agriculture’, e.g. smart agricultural kit, automatic pot plant watering, etc.
databases useful for tracking time series IoT sensor data
how to overclock and underclock? how are these different to adjusting clock scalers?
awesome escape puzzles: https://www.youtube.com/c/PlayfulTechnology
investigate cpu fault handling: https://github.com/tobermory/faultHandling-cortex-m
DC motor: raw PWM signal and ground signal controls speed; high rpm, continuous rotation (e.g. fans, cars)
servo: DC motor + gearing set + control circuit + position sensor; signal controls position; limited to 180°; accurate rotation (e.g. robot arms)
stepper: can be made to move precise, well-defined ‘steps’, i.e. jumps between electromagnets; position fundamental (e.g. 3D printers)
AVR used in lower-end (8bit) as less complex, cheaper than ARM
TODO: working with thumb instructions TODO: best practices monitoring systems in the field (4G?)
note, if using newlib, will still have a _start
May compile for different architectures in embedded for different product lines e.g. low-end fit bit, high-end fitbit
Compile with different compilers to see performance benefits at end. Also possible may have to use a particular compiler for specific hardware.
In reality, don’t want an RTOS if timing is very critical. Most MCUs don’t even have multiple cores, so an RTOS adds software overhead (or is there additional hardware to emulate cores?). Even without an RTOS, will most likely have some timer and separation of tasks. The main overall benefit of an RTOS is consistent driver interfaces across multiple MCUs (useful for, say, complex bluetooth/ethernet stacks etc.). Most MCUs are overpowered for what they do, so using an RTOS is probably a good idea.
Tamper response is usually done with a button on the board that gets activated when the case opens. This will trigger an interrupt handled by the RTC (real time clock).
cpu has modes like Stop mode that is a power saving mode.
battery balancing relevant to multiple cells. TP4056 Lithium Battery Charging Board?
verification: meets requirements
validation: does it solve the problem
authenticate: identity
authorise: privilege
token: something used to authorise
HTML not turing-complete, i.e. can’t perform data manipulations
robot ➞ renode (why not just qemu with shell scripts perhaps?) seems that robot/renode parsing of UART cleaner?
Embedded systems are special purpose, constrained, often real time (the product may be released in a regulated environment with standards, e.g. automotive, rail, defence, medical etc.). Challenges are testability and software/hardware compromises for optimisation problem solving, e.g. bit-banging or a cheaper MCU, external timer or in-built timer; adding hardware increases power consumption, e.g. ray tracing card or just rasterisation. big.LITTLE clusters: 1/4 scalar performance for 1/2 power consumption is a good tradeoff.
what would a segfault ➞ coredump look like on bare-metal?
C type qualifiers are idempotent, e.g. const const const int is fine.
sparse + smatch: static analysis tools that give named address spaces (near, far pointers)
synchronisation constructs:
* lock (only one thread access at a time)
* mutex (lock that can be shared by multiple processes)
* semaphore (can allow many threads access at a time)
is ARM nested interrupt by default?
C atomics just insert barriers?
A function is reentrant if it can be swapped out and its execution re-entered. Functions that operate on global structures and employ lock-based synchronisation (could encounter deadlocks if called from a signal handler) (and are thread-safe), like malloc and printf, are not reentrant
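A small illustration of the distinction (my own example):
// NOT reentrant: hidden static state; an interrupted and re-entered call corrupts it
u32 next_id_not_reentrant(void)
{
  static u32 id = 0;
  return ++id; // read-modify-write on shared state
}

// reentrant: all state is passed in by the caller, nothing global is touched
u32 next_id(u32 *id)
{
  return ++(*id);
}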
qemu useful over native for when word-size different. also, assembly inspection
TODO: DSP, RTOS, wireless + IoT, battery/power, peripheral protocols (USB, LCD, etc.), CODECs, optimisations particular to MCUs, assembly knowledge, SQL, C++ STL, Python, systems testing (continuous integration), bootloaders
ARM, RISC-V, xtensa
https://drh.github.io/lcc/
Garbage collection replaces us with a ‘search’ of our memory and decides when to free. So really only use it if you can’t figure out when to free something. This search is not free.
No language implements a feature that determines a memory footprint, i.e. how much memory we use
Know OS specific forking (process launching), file checking/reading/writing, IPCs
Steps for adding new feature: create a new file, e.g. commands.c and include it in main.c (order it before things that will use it) then do a git commit. This gets us off on a nice feeling In new file, create function with barebones functionality that can then be inserted into location and called/tested
Simple word parser uses at[1] != '\0' && at[1] == 'a'
Simple single-line parser: first separate by whitespace with while (at[0] != '\0') { ... break; } plus eat_spaces(), find_ch_from_left()
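A sketch of that style of parser (eat_spaces() reimplemented here; next_word() is my own name):
void eat_spaces(char **at)
{
  while (**at == ' ' || **at == '\t') (*at)++;
}

// copy the next whitespace-delimited word into word_buf; return 0 when the line is exhausted
b32 next_word(char **at, char *word_buf, u32 word_buf_len)
{
  eat_spaces(at);
  if (**at == '\0') return 0;
  u32 i = 0;
  while (**at != '\0' && **at != ' ' && **at != '\t' && i < word_buf_len - 1)
  {
    word_buf[i++] = *(*at)++;
  }
  word_buf[i] = '\0';
  return 1;
}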
3D printing ideas: https://www.youtube.com/c/3DSage/videos
michael ee for RTOS: https://www.youtube.com/playlist?list=PLLYZoEqwvzM35p2Kc7bk7bkwxLtTVwpvy
A real time scheduling algorithm is deterministic (not necessarily fast), i.e. it absolutely must meet its deadline (soft real time is it should) (real time processing means virtually immediately). So, a higher priority task will preempt lower priority tasks. FreeRTOS will have a default idle task created by the kernel that is always running (this idle task gives an indication of a low-power mode for free?)
middleware extends OS functionality, drivers give OS functionality
freeRTOS makes money through some commercial licenses (with support), middleware (tcp/ip stacks, cli etc.)
TODO: setting freeRTOS interrupt priorities is sometimes done wrong? tasks are usually infinite loops
FreeRTOS is more barebones (only 3 files) and effectively just a scheduler (so has timers, priorities) and communication primitives between threads. In most embedded programs, sensors are monitored periodically; time and functionality are closely related. For small programs a super loop is fine. However, when creating large programs, this time dependency greatly increases complexity. So, a priority based real-time scheduler can be used to reduce this time complexity (priority over time-slice is more efficient in most cases, as if not operating, can go to sleep). In addition, a scheduler allows for logical separation of components (concurrent team development). Also allows easy utilisation of changing hardware, e.g. multiple processor cores. COTS (commercial-off-the-shelf) as opposed to bespoke.
SEPARATE API CALLS TO AVOID LOW PRIORITY CORRUPTING BY BEING INTERRUPTED BY HIGH PRIORITY. Define a context structure that will hold api call information:
typedef struct file_api_params_s
{
  uint8_t api_opcode;
  uint8_t control_opcode;
  uint8_t file_handle;
  semaphore sem;
  void *param1;
  size_t param2;
} file_api_params_t;
Depending on the RTOS you can actually define your messages and message pool ahead of time - or you’ll be providing a pointer to the parameter structure in your message:
file_api_params_t file_api_pool[n]; // where n = number of tasks that can access the file system + 1
Each api function does the following:
- allocates a free param structure from the pool
- populates the parameters from the api call
- initializes the semaphore
- allocates and initializes a message to send to the file system task
- adds the message to the queue
- waits for the semaphore
The filesystem task does the following (see sketch below):
- waits for messages from the message queue
- checks the message validity and that the sender is still valid (for some systems that have bad behavior if the semaphore api does not check for validity)
- performs the requested operation
- increments the semaphore
- goes back to waiting for a message
So, the filesystem task is assigned high priority.
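A hedged sketch of that flow; pool_alloc/pool_free, queue_send/queue_receive and semaphore_init/wait/signal are made-up stand-ins for whatever the actual RTOS provides:
void file_api_read(uint8_t file_handle, void *dst, size_t len)
{
  file_api_params_t *p = pool_alloc(file_api_pool);  // 1. grab a free param structure
  p->api_opcode  = FILE_API_READ;                    //    (FILE_API_READ is illustrative)
  p->file_handle = file_handle;
  p->param1      = dst;
  p->param2      = len;
  semaphore_init(&p->sem, 0);                        // 2. caller will block on this
  queue_send(file_task_queue, &p);                   // 3. hand the request to the file task
  semaphore_wait(&p->sem);                           // 4. sleep until the file task is done
  pool_free(file_api_pool, p);
}

void file_task(void)
{
  for (;;)
  {
    file_api_params_t *p;
    queue_receive(file_task_queue, &p);              // block until a request arrives
    // ... validate the message and perform the requested operation ...
    semaphore_signal(&p->sem);                       // wake the caller
  }
}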
If taskA dependent on taskB, taskB should be of higher priority than taskA
zephyr more of an RTOS a step towards linux with lots of drivers e.g. LVGL, LittleFS, etc. (with this comes a lot more configuration hassle)
TODO: zephyr series (perhaps better to do this first as an RTOS as more modern resources than FreeRTOS?) https://training.golioth.io/ https://blog.golioth.io/zephyr-threads-work-queues-message-queues-and-how-we-use-them/?mc_cid=f6bdf458d1&mc_eid=UNIQID https://blog.golioth.io/taking-the-next-step-debugging-with-segger-ozone-and-systemview-on-zephyr/
lego technics for gear ideas: https://www.youtube.com/playlist?list=PLJTMHMAVxQmvTGatIj-X2rJrN8aGJEaQD
want to make something like stargate wheel
first step in embedded debugging commandments; thou shalt check voltage (e.g. check 5V going to LCD by placing multimeter on soldered pin heads)
We can see in x86 (WikiChip) that instruction and data caches are separate from the L2 cache. On an ARM SoC block diagram (datasheet), see the D-bus and I-bus to RAM. They introduce things like CCM (core coupled memory) and ART (adaptive real-time accelerator) that add some more Harvard-like instruction paths; essentially, more busses instead of more cores like in x86 (i.e. a lot more than just a CPU to be concerned with). Also have more debug hardware. Don’t really know why the Cortex-M4 has an MPU.
AXIM, AHB and APB are ARM specific. Will have a bus matrix (which allows different peripherals to communicate via master and slave ports by requesting and sending data). Off this, have AHB (higher frequency and higher bandwidth); like to think of AHB as the host bus as it feeds into APBs via a bridge. APB1 is normally half the frequency of APB2. We can see that DMA can go directly to APB without going through the bus matrix. So, this is relevant for speed and clock concerns: which bus are we going through? Also, if we are DMA’ing something and CPU’ing something, ensure they are not fighting on the same bus, i.e. spread out the load. Lower power peripherals on lower frequency busses? This level of knowledge further emphasises the need to know the hardware to understand what is going on.
https://go.memfault.com/debugging-embedded-devices-in-production-virtual-panel?mc_cid=32b3cae3e7&mc_eid=UNIQID https://go.memfault.com/embedded-device-observability-metrics-panel-recording?mc_cid=32b3cae3e7&mc_eid=UNIQID
use specifically for understanding mesh networks in context of bluetooth?: https://academy.novelbits.io/register/annual-membership?_gl=115qz5rp_gaNzM2MjQ4ODUzLjE2NTY0OTIwNjc._ga_FTRKLL78BY*MTY1OTU3OTAxNS4xLjEuMTY1OTU3OTIyMy4w&_ga=2.149910452.341656361.1659579015-736248853.1656492067
https://interrupt.memfault.com/blog/ota-delta-updates?utm_campaign=Interrupt%20Blog&utm_medium=email&_hsmi=222505339&_hsenc=p2ANqtz-9kmqPywlKxifduWJneXhUh1h_RQ4bf-v41o2qF8iBciZYc9beFlhwM4EiOVbP3DKUl8kxc_4GOIdpzvkJi5iOGzgwSWA&utm_content=222505339&utm_source=hs_email
automated crash reporting: https://lance.handmade.network/blog/p/8491-automated_crash_reporting_in_basically_one_400-line_function#26634 in embedded, how are metrics transmitted remotely, i.e. via bluetooth to phone than web? too much power if directly to web?
something related to HIL (hardware-in-loop testing) https://blog.golioth.io/golioth-hil-testing-part1/?mc_cid=da33e3796b&mc_eid=UNIQID
something related to systems testing with aardvark SPI/I2C adapter (more tutorials with bus pirate) 10:00 time mark: https://www.youtube.com/watch?v=N60WSQc-G_8&list=PLTG9uzDd_HQ84wVz0DwQ5_mwf1GnpY6LB&index=11 seems indepth bus pirate manual is on git? http://dangerousprototypes.com/docs/Bus_Pirate https://learn.sparkfun.com/tutorials/bus-pirate-v36a-hookup-guide/all (look for device specific tutorials on bus pirate website) http://www.starlino.com/bus_pirate_i2c_tutorial.html
seems that RMII (reduced media-independent interface) is a pin layout to connect MAC devices (flexible in implementation). Can be implemented to support say an RJ45 connector
stm32 datasheet and reference manual (documents of different depths about the same MCU) nomenclature. Will have ‘Application Notes’ that detail specific features like CCM RAM. A datasheet will often be related to a family, e.g. stm32f429xx; therefore, at the front it will have a table comparing memory, number of GPIOs, etc. for particulars.
bit twiddling: http://graphics.stanford.edu/~seander/bithacks.html
interesting courses: https://pikuma.com/courses
is power profiler kit specific to each board necessary, e.g. nordic, stm32?
software-blogs: https://www.gingerbill.org/article/ https://www.rfleury.com/
https://linuxafterdark.net/ podcast
embedded-blogs: https://patternsinthemachine.net/ https://blog.feabhas.com/ https://blog.st.com/ https://dmitry.gr/?r=05.Projects https://tratt.net/laurie/blog/ http://stevehanov.ca/blog/ https://thephd.dev/ https://www.embeddedrelated.com/blogs.php https://lemon.rip/ https://jpieper.com/ https://www.embeddedrelated.com/ https://patternsinthemachine.net/category/general/ https://embeddeduse.com/ https://martinfowler.com/articles/patterns-of-distributed-systems/?mc_cid=3835da293a&mc_eid=UNIQID
memfault
https://embeddedartistry.com/fieldatlas/embedded-software-development-maturity-model/?mc_cid=da33e3796b&mc_eid=UNIQID
Seems that IAR compiler produces smaller, faster code than gcc?
Fallback to: https://grep.app/search when searching for code snippets on github?
UART is a protocol for sending/receiving bits. RS232 specifies voltage levels.
DSP: https://www.youtube.com/playlist?list=PLTNEB0-EzPluXh0d_5zRprbgRfgkrYxfO
Wifi/Ethernet: https://www.youtube.com/watch?v=dumqa78j1sg&t=1046s
Lora/Nrf/IoT connections: https://www.youtube.com/watch?v=mB7LsiscM78
courses: https://www.youtube.com/watch?v=dnfuNT1dPiM&t=25s https://www.udemy.com/course/mastering-microcontroller-with-peripheral-driver-development/?ranMID=39197&ranEAID=YuKpx7UHSEk&ranSiteID=YuKpx7UHSEk-VcxzUKL7wAR1VyB2muQaaQ&LSNPUBID=YuKpx7UHSEk&utm_source=aff-campaign&utm_medium=udemyads https://www.udemy.com/course/microcontroller-dma-programming-fundamentals-to-advanced/?ranMID=39197&ranEAID=YuKpx7UHSEk&ranSiteID=YuKpx7UHSEk-85NcF8rkTsNIoMCfETBU3g&LSNPUBID=YuKpx7UHSEk&utm_source=aff-campaign&utm_medium=udemyads https://www.udemy.com/course/mastering-rtos-hands-on-with-freertos-arduino-and-stm32fx/?ranMID=39197&ranEAID=YuKpx7UHSEk&ranSiteID=YuKpx7UHSEk-qjeQee0Iel.PZ6z63nXsmw&LSNPUBID=YuKpx7UHSEk&utm_source=aff-campaign&utm_medium=udemyads
memory align up: sbrk((x + 7) & ~7); sbrk() is system call malloc uses?
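Generalised form of that round-up (alignment must be a power of two):
#define ALIGN_UP_POW2(x, align) (((x) + ((align) - 1)) & ~((align) - 1))
// ALIGN_UP_POW2(13, 8) == 16, ALIGN_UP_POW2(16, 8) == 16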
silicon mainly obtained from quartz. electric arc when insulator air is supplied enough energy to ionise
(HSPI is high speed parallel interface)
Astable means no stable states, i.e. is not predominantly low or high, e.g. square wave, sine wave etc. An oscillator generates a wave (could be for a carrier or a clock).
RC (resistor-capacitor) oscillator generates sine wave by charging and discharging periodically (555 astable timer) internal mcu oscillators typically RC, so subject to frequency variability
Max out the HCLK in the clock diagram as we are not running off battery. Will have clock sources, e.g. HSI, HSE, PLL. output of these is SYSCLK. SYSCLK is what would use to calculate cpu instruction cycles.
a clock is an oscillator with a counter that records number of cycles since being initialised
A crystal generates a stable frequency. A PLL is a type of clock circuit that allows for high frequency, reliable clock generation (the setup also affords easy clock duplication and clock manipulation). So, a PLL system could have an RC or crystal input. Feeding into it is a reference input (typically a crystal oscillator) which goes into a phase detector and then a voltage controlled oscillator to output the frequency. The feedback of the output frequency into the initial phase detector can be changed. Adding dividers/pre-scalers into this circuit allows getting a programmable frequency. So, a combination of a stable crystal (which however generates a relatively slow signal, e.g. 100MHz) and high frequency RC oscillators (a type of VCO; voltage controlled oscillator).
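A worked example using STM32F4-style PLL naming (HSE crystal, PLLM/PLLN/PLLP dividers) just to show the arithmetic; the values are illustrative:
// f_VCO = f_HSE / PLLM * PLLN ; SYSCLK = f_VCO / PLLP
u32 hse_hz = 8000000;                        // 8 MHz crystal
u32 pllm = 8, plln = 336, pllp = 2;
u32 sysclk_hz = hse_hz / pllm * plln / pllp; // = 168 MHz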
openocd -f /usr/share/openocd/scripts/interface/jlink.cfg -f /usr/share/openocd/scripts/target/stm32f4x.cfg should open a tcp port on 3333 for gdb
The CPU architecture will have an exception (a CPU interrupt) model. Here, reset behaviour will be defined. The 32-bit ARM Cortex-M4 has an FPU (A for application, M for microcontroller, R for high performance real time).
TODO(Ryan): avr vs arm vs risc-v vs x86 vs powerpc vs sparc vs mips (what motivations brought about these architectures?) as often Harvard architecture (why?)
Von Neumann: RAM (variables, data, stack) + ROM (code/constants) + I/O all on the same CPU bus. Harvard has an ICode bus for ROM, and a system bus for RAM + I/O. This allows operations to occur simultaneously. So, why use Von Neumann?
It is labelled as an evaluation board. Different boards use different ICDI (in-circuit debug interfaces) to flash through SWD via USB-B, e.g. Texas Instruments use Stellaris, STM32 ST-Link.
Is fixed point used anymore?
TODO(Ryan): Why is a floating pin also called high impedance? To avoid power dissipation and an unknown state, drive it with an external source, e.g. ground or voltage.
Pull-up/down resistors are used for unconnected input pins to avoid a floating state. So, a pull-down will tie the pin (when in an unconnected state) to ground, i.e. 0V when the switch is not on.
IMPORTANT: Although enabling internal resistors, must look at board schematic as external resistors might overrule
Vdd (drain, power supply to chip)
Vcc (collector, subsection of chip, i.e. supply voltage of circuit)
Vss (sink, ground)
Vee (emitter, ground)
the sparsity of linux can make configuration vary e.g bluez stack -> modify policies
For long range, LoRa or Sigfox; essentially tradeoffs between power and data rate. IEEE 802.11 group for WLANs (WiFi - high data rate), 802.15 for WPANs; 802.15.1 (Bluetooth - LE variant - heavily used in audio), 802.15.4 low data rate (Zigbee implements this standard).
Unlike Windows MSDN, Linux documentation is mostly source code (not good, as if it's not fast/easy, it's not used). So, it's essential to have something like ctags and to compile from source (sed -e "s/-Werror//g" -i *.make). Source and unit tests are the documentation in many Linux projects.
xor with itself sets to 0
“1’s and 0’s” + matej youtube channels
https://embeddedartistry.com/blog/2018/06/04/demystifying-microcontroller-gpio-settings/?mc_cid=c443ecbc14&mc_eid=UNIQID https://embeddedartistry.com/blog/2019/04/08/a-general-overview-of-what-happens-before-main/?mc_cid=c443ecbc14&mc_eid=UNIQID
unit test notes from udemy
Aardvark adapter essential for automated testing (so, an adapter of sorts should always be used for automated testing?)
COMMAND PRESENTATIONAL RESPECT
SELL TO EXISTING CUSTOMERS
42:00 https://www.youtube.com/watch?v=rRzM7MkppEo
Branding/idea and website are essential
Find products from amazon, etsy, ebay, etc. However, only sell with aliexpress as it offers dropshipping methods, i.e. no packaging. Alibaba is wholesale, not dropshipping, i.e. must buy in bulk.
paid ads on facebook, instagram, tiktok
for supplier, can get faster than aliexpress however cost more
add x3-x5 markup on product to pay for ever increasing advertising cost
(buying in bulk will get faster shipping times) manually -> DSers app to automate purchases -> sourcing agent. Be upfront with customers regarding shipping time at the start (we are just testing, i.e. for consistent sales and proof of demand at the start, so long shipping times are fine; just mention in FAQ or shipping policy etc.) (get customisable product later, i.e. branded box, as will have to pay in bulk on Alibaba for them to do this)
Handling Returns: 1. damaged: get user to send photo, have them keep the product, get supplier to ship another 2. ok: ask user to pay for shipping costs back to you. then refund them
TikTok film yourself using a product, see if video can go viral (whole skill in itself)
ads spend $50 per day. around $150-$200 to test a product until find a winner
Front-loaded time investment. Once a good product is found, 30-60min per day on ads (ads manager, metrics, etc.), 2-3 hours a day overall. Spend daily time on product research (DEVELOP SKILLS TO FIND A PRODUCT THAT IS WANTED AS A GOOD PRODUCT IS KEY), copywriting, competitors, etc.
niche store
targeting impulse buyers, not value shoppers (so, sell under $100) show innovative product with wow factor, they buy on the spot
if product out of stock, refund and send them a 10% discount code
digital age allows for microexperiments to test market, instead of wasting capital on something that could fail (i.e. put advertising into market first to see how customers react)
Test market: we are only concerned with determining purchase intent. Customer experience will initially be poor due to long shipping times. Aliexpress -> CJDropshipping, Spocket, Zendrop, uDroppy, WiiO
Sourcing agent (when have 10-20 orders a day) Can reduce pricing, faster shipping, custom packaging, thank you notes, etc.
Local fulfillment: bulk buying and shipping to a 3PL (3rd party logistics)
Does it target a niche customer? (write down target audience and the problems you are solving for them) Get a product that solves a problem or adds value to their life in a meaningful way. The product must have a unique selling point: high perceived value or problem solver, e.g. posture corrector -> solves problem; vegetable slicer -> adds value. Can’t be bought in stores (not commonly advertised, not a basic product in store) (no one will buy it from an unknown store if not unique). Targeted to a customer niche, lightweight and easy to ship, possible to add markup. Pick something that you have expertise in, allowing for easier market research - health (back pain, sun-burn); pet supplies. Needs to have WOW/UNIQUE-factor to GRAB ATTENTION IN AD.
Have 5-10 products on website to build trust. Only put ads in for one HERO PRODUCT at a time
DAILY RESEARCH TO SELECT PRODUCTS: TIME CONSUMING PROCESS TO UNDERSTAND MARKET. REGULAR ACTIVITY, NO OFF SWITCH
- manual (see what products are selling well on websites); Google Trends to verify search volume for product
- social media ads (follow big theme instagram pages, initiate checkout to get ad algorithm to follow)
- (START WITH THIS: spy tools (combine previous 2))
- IMPORTANT: The only way to know if it will work is to test with your own ads. These are just indicators
- identify how much a competitor is selling for by reverse searching the image
As you gain experience, will see products over and over again, i.e. product saturation (Google Trends can show this)
Make website fit your brand before adding ads
Make sure domain is available before setting site name (short word namelix.com) Also ensure domain is verified and web events prioritised before ads?
Tracking essential before running ads, so as to better optimise ads Facebook/tiktok pixels tracking code on website installable through shopify app Google analytics as well
(Zoho) info@storename.com
Shopify payments (credit cards) and paypal (business account)
Shopify apps:
* Fulfillment app: DSers -> aliexpress (CJDropshipping, S-pocket, weho, zendrop all have their own apps)
* Aftership (helps customers track order)
* Klaviyo/SMSBump marketing channels (email/text, e.g. abandoned cart emails etc.)
* Vitals (Product Reviews, Volume Discounts/Product Bundles (Upsells), Currency Converter, Visitor Replays, Wheel of Fortune, Frequently Bought Together, Related Products and Product Description tabs)
(want to save purchase information to allow one-click purchasing if returning, to increase LTV)
(offer discount in post-purchase emails)
(GO INTO SHOPIFY AND MODIFY CONFIRMATION EMAIL TO INCLUDE 24HR DISCOUNT - CONFIRMATION EMAILS HAVE 100% OPEN-RATE)
Put shipping times at bottom of product page, shipping policies and FAQs
Essential pages: Home Page, About Us, Product Pages, FAQs, Legal Policies and Contact Us add a Shipping Policy and Track Your Order page via the Aftership app
Conversion rate optimisation (if not getting sales, one of these needs improving):
* Product Photos (5-10 photos; flat-lays on a white background and lifestyle shots of someone using it) initially source from Aliexpress, Alibaba, Amazon (later down the track can: take photos yourself, 3D realistic model from fiverr/upwork, get professional photos etc.)
* Pricing (Compare at Price on Shopify for discount offering?) (Test free shipping. Perhaps best to offer free shipping over $X to increase AOV)
* Product Description: hero lines describing one core benefit (i.e. how it improves their life) then 5 bullet-points describing more features (i.e. what the product does). MAKE SURE BENEFIT IS TANGIBLE/SPECIFIC AND METRIC BASED IF POSSIBLE, E.G. 1000 songs in pocket. Can look at real reviews and write the description on their pain points
* Product Reviews (import reviews from Aliexpress via Vitals app, ensure grammar is correct and have both good and bad reviews) (hero product should have the most)
* Website Speed (jpgs over pngs (crushpics app), no videos, low number of apps)
Make sure to have a 30-day money-back guarantee. Ensure customers can contact you easily
Anything over 3% good conversion rate
landing page are more suited for digital products (and take a lot longer to make) (can later do with say Shogun, zipify pages etc.)
Human beings are biologically programmed with 8 main desires: Survival, food/drink, freedom from danger, sex, comfortable living conditions, being superior to others, protection of loved ones and social approval. Ads that target these will be subconsciously watched
Paid traffic and organic traffic (TikTok with someone using your product; study and recreate competitor videos). AT LEAST $50 A DAY ON ADS. $150-$200 TO TEST A PRODUCT (does not include $50 for ad creative). Run ads for 4-7 days to test (if they perform terribly on day 1, then probably kill earlier) (have to give time for the ad algorithm to optimise to find your buyers). Should get sales within 1-3 days. After this 4-7 day period, choose to scale up or kill off ad-sets (terrible is 0.5% Link CTR and $3+ Link CPC - maybe only wait 48 hours)
Facebook (includes Instagram) and TikTok (perhaps if targeting younger audience) (tiktok does not require lots of followers to go viral, just good content) Start with one platform, Facebook (king for ecommerce due to data collection and targeted ads) With Facebook, set feedback score to send after 8 weeks to ensure to not being blocked if too many bad reviews
Google ads, Snapchat and Pinterest aren’t for impulse buyers (perhaps explore Google ads when doing a branded campaign, e.g. people search for your website)
If doing influencer marketing stick with theme pages as opposed to personal brands, e.g. advertise on an instagram themed page?
Writing a blog and loading it up with keywords can drive ad-free traffic by ranking in google searches. However, long and difficult process
Need to have video ads first? Hire ad creatives on Fiverr #resources section
Ad Metrics:
INTEREST:
* Link CPC (cost per click): how much it costs for one person to click on your ad (strive for under $1)
* Link CTR: how many people click through to see the website after the ad (strive for over 1-2%) (CPC and CTR correlated)
ULTIMATE METRIC (if this is profitable after 4-7 days, scale product):
* Cost per purchase: how much it costs for someone to purchase on the website
For Facebook ads create a business account from your main profile. Has to be a real account so as to not get banned. If you get banned without doing anything wrong, message them and it should be resolved in 1-2 days.
* Don’t call out people directly, e.g. ‘People have a problem’ over ‘you have a problem’
* No outlandish claims
Add FB pixel helper extension to browser Before launching, click on pages, add to cart, checkout and see if events are firing a pixel?
Let ad platforms optimise age/gender ad settings, so leave these broad
Go after 1M audience size? Optimise for purchase? (ignore warnings asking to optimise for funnel actions)
Start with (ABO) ad-set budget optimisation. Later might do CBO (campaign budget optimisation)
1%, 3% etc. LLA (look-alike-audiences) target for people who have already made purchases (best to do when say 300-500 events) (involves creating a custom audience and targeting for them specifically?)
Improve CTR by testing new ad hooks (first 3 seconds to capture attention; the rest should keep attention till the end; should be replayable; copy competitors). Add more calls to action in the ad copy, i.e. add more links. If you have successful ad-sets, modify their hooks and rerelease
CPM (cost per 1000 impressions) is cost of ads, which is largely out of your control. if you have better ads, i.e. more shares and views, the cost of ads will be cheaper
Not getting sales, look at funnel: 1. Ad metrics (are people sharing, etc.) 2. Website metrics (where’s the drop-off in customer activity) 3. New product
Consider changing creatives or extend ad running time if breaking even or slightly profitable, otherwise move on. don’t get attached to products (a winning product is waiting out there)
Make-A-Video AI seems promising
Good news: first ever drug to slow down the cognitive decline of Alzheimer’s. Also a drug to slow down ALS. Perhaps prolonging drugs are the first step?
DreamFusion 2D-to-3D looks promising
Will future cars be synced with mobile OSs from Apple or Android? Already seeing the start of this with CCC (Car Connectivity Consortium) pushing smartphone car unlocking
Unfortunately malware spreads through low-hanging CVEs or social engineering
Memory market collapse, i.e. a type of chip market
How nice that Apple’s anti-tracking crackdown only applies to third-party apps
With the growth of hardware, really seems that adaptive learning algorithms are going to be used instead of solving the problem explicitly Perhaps AI solving the most-optimal implemention of an algorithm is more appetising for me
More companies branching into GPUs with AI functionality, e.g. Intel Arc, Acer etc. Furthermore, more companies branching into VR, e.g. Lenovo, Facebook etc.
Is a smart ring really any better than a smart watch?
As you would expect, Raptor Lake CPUs faster single thread speed at a lower wattage
Will it be common place for impaired actors like Bruce Willis to sell likeness for deepfakes? In China, using virtual influencers.
Again, YouTube offers another thorn; restricting 4K access to subscribers
MCUs that support OTA firmware updates will typically have built-in key storage
Although decentralisation sounds good (own cell network etc.), it requires user maintenance which people will pay others to do and we’re back to square one in a way
How does Google Tensor G2 chip with various CPU architectures, e.g. Cortex-A78, Cortex-A55, Cortex-X1 work?
OS’s designed for wearables, e.g. WearOS for Android. I suppose wearables have become an established device class target?
It really isn’t tinfoil hat mentality to be wary of updating unless necessary, e.g. most recent kernel patch affected Intel graphics displays
Floating pod homes in Panama no one asked for.
In a literal sense Moore’s law is dead, however chiplets pose interesting alternatives
Character.ai chatbot creator. In fact, AI editor/generators for most artforms seems to exist (video runwayml)
The fluidity of OSs continues with new Ubuntu 22.04 replacing PulseAudio with PipeWire, X11 with Wayland.
Having a live session USB is essential to always give root access to filesystem
Can now get Ubuntu Pro for free, which just extends LTS to 10 years
Fast GUI file searching with fsearch
Xbox streaming game console. High network speeds seems to make cloud gaming more affordable. However, don’t like the idea of constant network connectivity and the power of the provider to shut you out.
The 15th Century saw the greatest appearance of ‘geniuses’. Despite access to information, genius has declined. Seems to require professional tutors at a young age to instill human social engagement
Thanks to the TCC compiler, can compile C in memory and load it, hence using it in some ways like a scripting language. However, this can create serious security holes. Could still use it in a sandboxed process, e.g. with libseccomp. WASM allows running a subset of C++/C (and, well, anything that compiles to WebAssembly) in the browser as a sandbox. Could use with the wasmer library. Indeed, with WASM, can compile a native library and use it in the browser, like SQLite3
With the steady proliferation of VR gaming, it’s a good thing I was not young during this time
DynamicPixelTuning (DPT) promises to make every pixel capable of outputting all colours. Therefore, get 3 times resolution than having combined rgb pixels
Very helpful ranger preview in conjunction with rc.conf and scope.sh
Apple more like a bank with savings account for Apple Card
Interesting new Danish political party with decisions made by AI. I think AI as a collaborator is promising
Despite new phones offering a plethora of new features, don’t assume that they are bug-free, e.g. pixel phone crash detection malfunctioning on roller coaster, not allowing dialling to 000, etc.
Realise you can overclock RAM with Intel XMP (extreme memory profile)
Although you hear new Apple and Microsoft chips, almost certainly ARM under the hood
Intel NUC (next-unit-computing) just marketing term for small form factors
Although largely unnecessary now, Nvidia has lite hash rate which impedes mining performance in order to get GPUs into the hands of gamers
The decline of the lone app, as Microsoft 365 leviathan
I see AR being more useful than VR, e.g. hololens for soldiers
Meta headset will use eye-tracking to position ads
Perhaps an emeritus professor I’m more willing to listen to.
What I used to take as prima facie from news outlets, now no longer
Unfortunately UNSW basketball CC’d not BCC’d
To no one’s surprise, Windows updates nerf ryzen performance
Marketing plug for cloud gaming consoles when in fact standard phones can do the same thing
With improved network performance, perhaps cheaper to offload calculations to powerful servers (VDI: virtual desktop infrastructure)
NB-IoT (narrowband, i.e. low power). Also have CoAP (constrained application protocol) for embedded devices to access Internet
Easy to fall prey to the act of not adding anything useful to a product, but simply adding IoT and calling it smart, e.g. smart condom
HDR (high dynamic range) refers to the colour spectrum, implemented in technologies such as Dolby Vision. Dolby Atmos is a surround sound technology
RCS seems to be better than SMS
Not really decentralisation with cloud, as many services are just operated by a handful of large companies (not what DARPA intended!). Cloud is really just reduced complexity. It excels when the application is simple and low traffic (managing a large application in the cloud is just as difficult as on bare metal), or when your traffic patterns are unpredictable. Otherwise, you’re paying an unjustified premium. Sold as computing on demand (no complexity) when in reality it’s just renting computers at a higher premium. Lend your talents to your own machines, rather than Amazon or Google
Will NFC for door locks take hold?
AI for vaccine development sounds promising
Are the petabit speeds of research optical chip really that useful if hardware can’t process that fast? Will radio only get us so far?
Probably better to use url-shortener service for sharing links
Perhaps explainpaper.com could break into reading programming papers
More firm on not using Apple, e.g. Apple not allowing other app stores, taking a revenue percentage of ads in apps, etc.
Bioengineered plants seem promising in say absorbing air pollutants
Amazon continues to kibosh any semblance of non-monopoly, now getting into home insurance
Raptor Lake overclock to over 8GHz (however with liquid nitrogen…)
With development of cheaper more powerful hardware, companies introducing more ‘gaming’ brands e.g. Phillips new gaming monitor
Another cautionary tale to not update with Apple update nerfing ANC earphones
Xcode cloud subscription. oh no wasting so much time on naming of new processes, e.g. ‘Developer Experience Infrastructure’
Another billionaire investing in a ‘utopian cities’ seems more dystopic than anything. Promulgation of venture capitalists, angel investors, etc.
PICs have a weird instruction set, so generally have to use assembly over a C compiler (no good free compiler). So, use AVR for low-end tasks like just LED driving? e.g. TLC5971
time-of-flight sensor like an infrared radar
over-current and over-voltage detectors for when using charging devices, e.g USB-C charger? so, perhaps investigate/understand voltage regulators? also have UVLO (undervoltage-lockout)
new USB-PD (power delivery) interface delivers more power over USB-C
I can see machine learning ‘appropriate’ in say cleaning up random noise, e.g. brain signals
interesting set of questions to inspect a software engineering workplace: https://neverworkintheory.org/2022/08/30/software-engineering-research-questions.html?utm_source=tldrnewsletter
things like database accelerator library indicative of normal software not being fast
MOSFETs are a type of transistor. different transistors for say quick-switching, low signal, high frequency, amplifier etc.
USB4, PCIe 5, DDR5 emerging standards.
Could buy a GPU for running interesting machine learning applications like Stable Diffusion (prompt engineers, oh dear…) https://www.krea.ai/ GPU structure similar to CPUs, e.g. have cache, GDDR6 memory (simpler parallelisation)
float-toy nice web visualisation tool
interestingly CRT can scale better than fixed-set resolution LCD
New TV/monitor combination. QD-OLED curved monitors less fatiguing as they physically match our eye’s shape. Flexible/bendable monitors no one asked for…
cool looking completely submerged server desktops. pure water is a very good insulator (our tap water will have chlorine for example) obtained via ozone treatment
the Nintendo DSi implemented augmented reality. Virtual reality is completely immersive
how does wireless charging work without a pad, i.e. no induction charging? interesting power sharing from phone to watch
trie -> gzip: still have to decode, and the resulting memory is the same as before compression. Succinct data structures are designed to not have to be decoded, i.e. everything is stored as bits. ∴ uses a lot less memory (however, only for larger datasets)
compound literals have issues with debuggers and maintenance
C11 static assertions, how to use? performance improvements?
static const char sound_signature[] = {
    #embed <sdk/jump.wav>   /* note: #embed is C23 */
};
static_assert((sizeof(sound_signature) / sizeof(*sound_signature)) >= 4,
              "There should be at least 4 elements in this array.");
anonymous structs
int fnc(int arr[static 2])
C23: can prevent VLAs by defining constexpr; true, false keywords; nullptr (it’s a pointer, not a number); enum e : unsigned short { x }; digit separators; function attributes
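A small sketch exercising those listed features, assuming a C23-capable compiler (e.g. recent GCC/Clang with -std=c2x); the names find42 and count are made up:
#include <stdio.h>

enum flags : unsigned short { FLAG_A = 1, FLAG_B = 2 };   /* fixed underlying type */

constexpr int count = 1'000'000;                          /* constexpr object + digit separator */

[[nodiscard]] static const int *find42(const int *arr, int n)  /* standard attribute syntax */
{
    for (int i = 0; i < n; i++)
        if (arr[i] == 42) return &arr[i];
    return nullptr;                                       /* nullptr, not 0/NULL */
}

int main(void)
{
    constexpr int n = 4;
    int data[n] = {1, 2, 42, 4};                          /* constant size, so not a VLA */
    bool hit = (find42(data, n) != nullptr);              /* bool/true/false are keywords */
    printf("%d %d %d\n", hit, count, FLAG_B);
    return 0;
}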
smarter devices not necessarily better, e.g. printers with end-of-life software
lol, ferrocene is a developing rust toolchain
PLC more expensive microcontroller that is more versatile, e.g. handle voltage overload (often used in assembly lines)
Removing shared pointers probably gives later code in the pipeline a small cache boost as all the data is now co-located. Excessive logging is a cause of bad performance. Excessive heap usage is bad for performance if on an OS. Note there are static, stack and heap memory areas
lol, killedbygoogle.com. perhaps not wise to rely on new services
is self-administration the only benefit of an inhaled COVID vaccine?
doesn’t seem like quantum computers will be able to solve any practical tasks (unless the program exploits quantum parallelism to a large extent)
cloud computing with containers growing, e.g. AWS/Azure ➞ Kubernetes ➞ docker
it’s sad, but you really could do a stand-up of modern software projects, e.g. “introducing Goliath, an automatic external dependency manager. under the hood we use a Nextrus package manager. can be scripted with Freasy language extension of Frosty core language”
seems a lot of phones adding satellite connectivity even though it’s much slower
although space travel seems like the future, how to cope with serious health effects like space radiation
opening up web pages from tech news sites is just awful. inundated with floating content-blocking video ads, permanent marquee ads embedded between paragraphs and the browser title bar flashing with bot message notifications, bot message popups, cookie accept bars, …
asciinema useful terminal recording tool with website to host
potential for sim-locking being a thing of the past with a push to eSIM cards
New AI that can edit videos with textual prompts. The dramatic shift in technology that mobile and cloud devices have brought is being realised by natural language processing. As text is seen as a universal input in a lot of Unix programs, interesting possibilities.
eyes convert photons to an electrochemical signal. transduction converts one energy form to another. CCD (charge-coupled device; less noise, more power, lower speed) and CMOS (consumer grade) are common camera sensors. digital cameras convert photons to an array of pixels, represented as voltage levels. quad pixel camera sensors combine four adjacent pixels in this array
round design of new Nvidia GPU may be evidence for the 20-year cyclical nature of fashion
amazing new 6GHz stock speed Intel CPU
although all crypto is a ponzi scheme, it seems the goals of Ethereum to perform secure financial transactions are headed in a better direction than Bitcoin. furthermore, following The Merge, it does not rely on power-hungry mining. this in turn has led to a lot of 2nd-hand GPUs flooding the market
avx512 can’t actually get the performance stated by Intel ‘marketing’ as it heats up and the CPU thermally throttles
good to see some work being done on the interoperability of digital wallets
in general, Occam’s razor approach to muscular issues
smart power homes whereby the source of the power can be discerned, i.e. if it’s clean or not. extends to phone chargers with knowledge of this
interesting Github Copilot Labs able to do rough translations between languages
good to see (in some ways) hardware vendors pushing for AI standards to allow for greater optimisation
ATM machines can be targeted for card-skimmers
possibility of sprite animation in terminal using chafa tool to create ascii block art
Intel new naming scheme confusing using brand name as category, i.e. ‘Intel Processor’ instead of ‘Pentium’
YouTube no respect for customers, running a clandestine experiment showing up to 5 ads at the start instead of spacing them out. Furthermore, Mozilla researchers found that buttons like ‘Stop recommending’, ‘Dislike’ have next to no effect
Growing space economy with NASA ISS becoming privatised
Gaming phones with extended fans/cooling and amenable gripping features
I suppose it would be kind of cool to travel in a plane going Mach 1
There can be such extremes in software, e.g. from programming hardware in FPGA to writing natural-language descriptions for an AI to convert to code
Cool 3D printing pens, although slow, can be used to say repair a chipped brick
Cool USB SAMD boards (possible malware creation)
Amazing creations from AI from still images, videos, digital assistants: https://threadreaderapp.com/thread/1572210382944538624.html Companies like DeepMind, OpenAI. Chatbot AI used in UI testing. GPU DLSS (deep learning super sampling) is AI upscaling
Finally, alleviating the ambiguous USB 4.0 v2.0 naming scheme with devices having clear USB 40Gbps, 240W printed on them
Compelled to investigate a service like paperspace in order to run OpenAI and other AI projects
Promising Framework laptop built with easy repair in mind. However, subpar experience
Teachers are fired for moonlighting
Record-breaking DDoS attacks: 17.2 million requests per second, 3.4 terabits per second, 809 million packets per second
Google have size to challenge Dolby with new HD audio standard. However, whilst seeming altruistic, is just so they don’t have to pay Dolby licensing fees in their hardware
Interesting Domain Brokerage services to allow you to get already used domains
Limiting the number of prompts will not work to prevent MFA fatigue attacks
Record-breaking figures achieved with overclocking can be deceiving as they may employ high-end coolers or even liquid nitrogen
Still Windows updates break things, e.g. NVidia drivers
The inundation of JavaScript web frameworks has provided a learning point for adopting new technologies. In general, stick to familiar technologies and only adopt bleeding edge later
Cool application of IoT: https://hackaday.com/2015/11/24/building-the-infinite-matrix-of-tamagotchis/
Although lower-res, CDs aren’t lossy compressed like Bluetooth or spotify.
Amazing gigabit speeds being sold for ISPs
Sad that bariatrics is even a field of medicine
Serverless is just a term for a caching server closer to clients
In the same unrequested vein as foldable PCs, now have slidable PCs
Perhaps going through graphical ASMR programming videos is the ‘enjoyable’ remedy
More encompassing/combined sensors, e.g. AI vision sensor, gesture detection sensor
Increased power of technology, developing perhaps faster interfaces, e.g. search by photo, speech, etc.
Sextember cornucopia of condoms. Don’t have to be Nostradamus to work out parents parking. In effect, most people’s journey to learning is bespoke. My engineering mind (which has been molded by experience) results in largely ad hoc responses. Many positive social dealings at university are unfortunately pyrrhic
I believe decoding brain waves into functional output is major change in my future
Unikernel has applications bundled inside kernel (so like eBPF) for high performance
Amusing Dead Internet Theory that the Internet died around 2016 and is now largely bots
Oh dear, javascript smart watch
Interesting 3D print to make diffusing sheet for say LEDs to create different ambiences
AI now being used to improve compression algorithms
Amazing potential of ‘molecular computers’ to make drugs with precise traits (could extend to human’s gaining amazing a priori skills)
JPEG XL better compression and lossy + lossless mode
Again, Microsoft updates not working. This time, not actually properly applying vulnerability update
Interesting can just buy off the shelf drone and attach something like an ESP32 to send data back to us
Cool program that sort of gamifies video conferencing allowing for individual chats: gather.town
Like ‘Scrum Certification’ now with Matter have an official certification process
Good work being done in legislating IoT labels, e.g. like nutrition labels that give information about the sensor data collected by the device
Have GPU microarchitectures like RDNA.
Chiplets seem the future of shrinking size and expanding computing power, now seen in GPUs. They can easily be recombined to create custom designs
VSR (virtual super resolution) has the GPU render at 4K and then downscale if necessary to the monitor’s resolution. However, if you want say 8K gaming, will require a DisplayPort cable
Robots beginning to see more everyday use, e.g. waiters. Also, have robots learn in the field, e.g. Texas University have robots walking around campus
Resistive RAM uses analog memory cells to store more information with less energy
GPT-4 upcoming text generation AI. Perhaps text-to-speech, language transcription too diverse to solve without AI, e.g. deepgram program
Although some people experience myocarditis after COVID vaccine, myocarditis long been linked to a number of viral infections
Borg like spacecraft, i.e. cubesat now in orbit
Silicon carbide power supplies more efficient
AI requires high quality data, i.e. created by skilled individuals. Also requires unique data. Might not have enough unique data in the future
Unfortunate that Intel releasing software-defined-silicon, i.e. pay-as-you-go to enable certain hardware features
Not even subtle that ads are no longer targeted for you, but rather shills
Prompt engineers for many AI generators, e.g. text-davinci-3
GPU AI DLSS improve performance by offloading frame generation
Biodegradable sensors made out of organic circuitry proof of concept
Watching me sleep on the floor is watching how the sausage is made in regards to my posture
Serendipitously obtain a bokeh photo with my outdated phone camera
Politics are infiltrating areas of technology: * Rust toolchains on embedded. Rust developers have explicitly tweeted saying technology will always be political * NASA Artemis Accords have first line stating primary focus on women and person of colour on Moon Leads down the path of FSF, and in fact any cultural revolution which thinks it can do whatever it wants in the name of the people. Although perhaps ad hominem, experience has shown that they’re all polite until they’re not.
Seems with big-tech, marketing is often more important than the underlying technology: 170km ‘The Line’ ecotopia, metaverse, Tesla etc. They make wild claims and the general public has no way of knowing the facts behind them (perception vs reality). A scary thought is that in this age, it’s possible for self-sustaining narratives capable of deflecting facts
ARM’s open model allowed vendors to implement custom MPUs, which saw it gain dominance over other RISCs like MIPS and AVR.
Tech companies becoming conglomerate monopolies, e.g. tiktok music, apple tv etc.
Much like the C++ standards committee, concerns of ‘ivory tower’ nature of smart home Matter standards
Seeing consequences of covid semiconductor demand in chip shortages. Exacerbated by increased usage in the automotive industry and large dependence on foundries in Taiwan
Annoyed at the web: * Seemingly lack of awareness of bloated and abstracted infrastructure * New technologies in the sphere are just sensationalised titles with little substance e.g. homescreen social media, css layout model and js frameworks
The importance of programming to a physical machine is paramount. The underlying technology is always changing, e.g. arcade machines, consoles, phones, watches, plastic 4bit processors
EU lawmakers want USB-C for all mobile devices to reduce e-waste. Makes me postulate a technocracy.
Time-of-flight sensors can be used to detect water levels
EMV (europay, mastercard and visa) secure payment technology embedded in credit card chip
Waited until USB-C was becoming standard to introduce reversible USB-A (achieved with a movable plastic divider and duplicated pins). Intel Thunderbolt is faster than USB-C, yet the ports still look very similar
AI in sensors used to alter configuration to optimise power consumption. AI-generated voice, text; Disney can now alter age. AI parsing of voice (natural language processing). In summary, generative AI everywhere. In fact, with ChatGPT being able to explain technical concepts, birth of AGI (artificial general intelligence). Perhaps this could be used as a sort of offline search engine. In fact, ChatGPT can generate prompts for DALLE. Whilst ChatGPT solves problems considering computers as generalised machines, it seems eventually it will get there. So, embedded probably the last to be tackled due to unique systems
Genetic engineering in flora seems more appetising, e.g. drought-resistant wheat, air-purifying plants
Batcat tool is cat with syntax highlighting
Skeptical of announcements made by budget-starved laboratories (e.g. universities) about breakthroughs for technologies decades away, e.g. fusion There are often caveats and furthermore, commerciality is most likely decades away
Seems that bipartisan government action required to fix rats nest of drivers in modern OSs in a similar vein to EU enforcing anti-competitive laws on Apple to allow third-party apps, USB-C etc.
Have to be careful not to engage in technological contempt culture, e.g. language wars. As technology changes rapidly, address changes with temperance
If social media was all RSS (really simple syndication), things would be much simpler
We are in a ‘gig’ economy, i.e. contract work
Example of Apple silicon is M1 chip. Placing some Apple silicon inside new removable monitors to use less power from attached computer
Optical computing, i.e. analog, uses less power and excels at linear algebra, making it ideal for machine learning
Is prompt engineering now an important skill, as opposed to a sad state of affairs?
Would controlling the weather be nice? e.g. releasing sulfur to reflect more sunlight
Medical developments not just better treatments but also easier detection, e.g. Alzheimer’s via blood test. Even still, immune diseases like rheumatoid arthritis are more easily treated with drugs than osteoarthritis
Home appliances with WiFi connectivity allow for remote updates, e.g. LG dishwasher
Seems all phones will be satellite based in the future
Mass production of TSMC advanced 3-nm chip underway. Unfortunately, most likely due to Samsung chips having low yield, they don’t have as high QA for voltage regulation as compared to TSMC
Interesting ‘virovore’ found, an organism that eats viruses
Could the US China chip ban result in China becoming more self-reliant in the future as it’s forced to invest in own chip production?
Further proliferation of OSs, e.g. Ubuntu Touch, Edubuntu, Kubuntu etc.
Google wants RISC-V as tier-1 architecture for Android. Part of further push for open source usage so as to not pay licensing fees, e.g. developing their own video codec etc.
Death of narrator with AI text to speech
Nvidia AI on certain GPUs can upscale older blurry videos
AI training on users that will eventually replace them, e.g. Adobe tracking artists workflow, Github copilot, etc. However, AI does not create innovative results.
Interesting storing application state in base64 url. No server required, and browser history becomes undo-redo
Thread is new low-power protocol for Matter (and therefore IoT devices, i.e. mesh network). Similar to Zigbee and Z-Wave
hierarchy → topology: p2p → (mesh, bus); client-server → star
Advent of more software in cars has led to subscription based services, e.g. heated seats. However, adding software increases complexity: * Will lack of connectivity (e.g., no cell coverage where you’re at) mean that the features are disabled until you get back into cell range? * Can the servers be DOS’d so that nobody’s seat heaters work? * If I pay for a subscription and sell a car, does the subscription stay with the car? Or will it be like Tesla’s approach, where the new owner has to pay to unlock the software features, even if the previous owner paid? * What if there is a bug on the server that incorrectly reports my subscription status? Will I be refunded, fully or partially? * What happens when you can’t get in touch with customer support because your subscription isn’t being properly detected on the hardware? * What happens if the hardware breaks, but you’ve paid for the subscription? Is repair to heated seats covered under terms of the subscription, or will that be pushed to owners? * What happens if this strategy is used by a smaller company than BMW, who suddenly goes out of business and bricks your otherwise perfectly working hardware due to shutting down servers?
Are big-tech companies realising the flaws in their ‘Simplicity Sprints’ culture? Slowing hiring, realising they have way too many employees
BCI (brain computer interfaces) are a real future
Unfortunate that ‘true security’ makes things more complex and inconvenient, e.g. yubikey
Physics head-scratcher: dark matter makes up 80% of universe, yet we can’t detect it?
Meta releasing AI chatbot to wild has again resulted in racist and sexist comments
More evidence of contrasting quality between modern hardware and software with worldwide Google outage caused by software update
Oncall software engineers akin to casual teachers
What are FPGAs and how do they allow the creation of hardware for emulating old games like GBA? EU enforcing USB-C forces Apple to switch to USB-C. Yay!
Oh dear, Java is not dying out (state of the octoverse)
Realise I’m highly misanthropic. Frequent STMicroelectronics newsletter coverage of IoT (and the many, many protocols) and machine learning is indicative of a trend in industry
Hi-fi audio: newer term to mean higher fidelity/data resolution. New OLED TVs (contrast, blacks). QLED/QNED (brightness) adds a ‘quantum dot’ layer into the white LED backlight LCD sandwich
5.1 means 5 speakers, 1 subwoofer. Woofer, subwoofer, speaker, tweeter
Things getting smaller and more data, e.g. ‘normal’ sized VR headsets
I really don’t see fold phones as necessary…
Are we moving down the road of homogeneous programming (e.g. specifically target the CPU or GPU) or heterogeneous programming (e.g. CUDA)?
Although an update pathway is provided from 20.04 to 22.04, I’m reluctant to do so due to possible configuration issues. This is why I chose LTS 5-year support (2025)
Interesting maglev trains reach amazing speeds and produce minimal noise due to lack of friction
Interesting blood transfusions from older mice to younger mice, the younger mice display characteristics of old mice
Bitcoin is a Ponzi scheme as almost no one actually uses it in transactions, and is purely speculative. Does not create anything. Interesting manifestation of capitalism.
Moore’s law: the number of transistors doubles every 2 years. Although not strictly true, the general trend is holding. Proebsting’s law states that compiler improvements will double program performance every 18 years. Therefore, be cautious about the performance benefits a compiler brings; focusing on programmer productivity is more fruitful. In general, newer compilers take longer to compile, but produce slightly faster code, maybe 20% faster.
Cyberdecks are evidence that the trend of going smaller isn’t always aesthetic
Growing trend to have workstations operate in the cloud or containerisation. For testing yes, for development however?
Strive towards much more potent nuclear fusion (100 million °C) reactors as opposed to nuclear fission (neutron splits uranium, same as original nuclear bombs)
Amazing that certain old rpm harddrives were susceptible to crashing when ‘Rhythm Nation’ played as the resonant frequency was the same
Streaming outnumbering cable and broadcast TV
ripgrep a much faster and user friendly grep! unar will automatically handle pesky non-folder archives
Interesting to see if mir wayland will take over xlib x11
Although the open nature of RISC-V gives it some economical advantages, historically the ISA has not been the major driving factor in widespread adoption. Rather, who invests the most in R&D, e.g. many places will develop ARM, with RISC-V go on your own.
Security an ever-present issue, e.g. every Ubuntu weekly newsletter gets a list of security updates
Privacy laws prevent recording keystrokes in app, however can record other information like time between keypresses etc. to identify you, e.g. TikTok
Chiplets connection of chips. So, can build chiplets that aren’t SoC, e.g. just CPUs and SoCs without chiplets Intel R&D into chiplet technology stacking presents it as a future possibility (Apple already uses it with two M1 max chips to M1 ultra)
ACM (Association for Computing Machinery) Turing Award is essentially Nobel Prize for Computer Science. Not applicable for me as awards largely for academic contributions like papers/reports published e.g data abstraction (Liskov substitution, Byzantine fault tolerance), parallel computing (OpenMP standard). In some sense, the modern day Booles and Babbages I’m more concerned with engineering feats in software products.
An unfortunate reality of open tech, AI being used to make paywalls ‘smarter’
Read a Google research project on removing noise in photos. Investigate source to test and am completely put off by the amount of dependencies involved: conda (why not just whole hog and docker), python, jax for TPU (python to tensor processing unit), external repositories This also applied to the ‘amazing’ AI image generator Stable Diffusion (I suppose high VRAM requirements also) Docker has uses in CI
AI-for-everything dogma is becoming more pervasive, with ‘clusters’ to train models. Although Tesla can build a supercomputer to train, like all dogmas, it’s not applicable to everything (readability, debugging go down)
Even air-gapped computers are not safe from sniffing
Seems that any in-demand tech device subject to bot scalping
As Moore’s law is widening, i.e. was 2 years now 4 years, companies creating own hardware, e.g. YouTube chip to handle transcoding
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y install gcc make pkg-config apt-transport-https ca-certificates

if ! [ -f /etc/modprobe.d/blacklist-nouveau.conf ]; then
    echo "nouveau is not blacklisted, doing so and rebooting"
    # Blacklist nouveau and rebuild kernel initramfs
    printf "blacklist nouveau\noptions nouveau modeset=0\n" >> blacklist-nouveau.conf
    sudo mv blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
    sudo update-initramfs -u
    # NOTE: after rebooting we need to run this file again
    sudo reboot
fi

if ! [ -f /usr/bin/nvidia-smi ]; then
    echo "nvidia driver is not installed, installing"
    # Install NVIDIA Linux toolkit 510.54
    wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
    chmod +x NVIDIA-Linux-x86_64-510.54.run
    sudo bash ./NVIDIA-Linux-x86_64-510.54.run
    rm NVIDIA-Linux-x86_64-510.54.run
fi

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda-repo-ubuntu2004-11-3-local_11.3.1-465.19.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-3-local_11.3.1-465.19.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-3-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
ssh-add -L or simply look in ~/.ssh directory (this is essential for private key)
seems that packaged things in the cloud aren’t all that flexible, e.g. ML-in-a-box cannot have independent components updated
important to run apt update on first running
have to use -O and enclose with "" for wget
annoyingly have to remove the open source nouveau driver; also install CUDA not from the apt repository
Based on what it was trained on (LAION 400M internet-scraped image-text pairs, which contain violent and sexual images, as opposed to DALLE-2), output may be biased, e.g. nerd might bias towards wearing glasses
A quadratically scaling solution is intuitive. However, as we know every match is unique, a linearly scaling solution is obtained with a hash map. The C++ STL implementations of hash tables are sets (just keys) and maps. Unordered variants are raw hash maps; ordered variants use a self-balancing red-black tree yielding logarithmic time. Simplest hashing function: (x >> 4 + 12) & (size - 1). Important to keep in mind we are executing on a physical machine and that Big-O assumes a ‘zero-cost abstraction’ world. For example, the extra overhead of introducing a hashmap (memory allocations/copies) will make it slower for small lists (also no dynamic memory allocations in an ISR). This is why the C++ STL uses hybrid introsort. Quadratic insertion/bubble sort preferable for small lists, loglinear divide-and-conquer merge/quick for medium, linear radix for large. In some cases the space and size parameters differ. Can join linear operations: populate and min/max determination.
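A minimal open-addressing sketch in C of the mask trick above (illustrative only: fixed power-of-two size, linear probing, a made-up mixing hash, key 0 reserved as the empty marker, no handling of a full table):
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 64                       /* must be a power of two for the mask to work */

static uint32_t table[TABLE_SIZE];          /* 0 marks an empty slot, so key 0 is reserved */

static uint32_t hash(uint32_t x)
{
    x ^= x >> 16;                           /* cheap mixing; real code would use a stronger hash */
    x *= 0x45d9f3b;
    return x & (TABLE_SIZE - 1);            /* mask instead of modulo */
}

static void insert(uint32_t key)
{
    uint32_t i = hash(key);
    while (table[i] != 0 && table[i] != key)
        i = (i + 1) & (TABLE_SIZE - 1);     /* linear probing */
    table[i] = key;
}

static int contains(uint32_t key)
{
    uint32_t i = hash(key);
    while (table[i] != 0) {
        if (table[i] == key) return 1;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}

int main(void)
{
    insert(42); insert(7);
    printf("%d %d\n", contains(42), contains(9));   /* 1 0 */
    return 0;
}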
GE (12): * gens * phys
Free Electives (36): * COMP8001 (elective) * COMP3331 (networks) * COMP9032 (microprocessors)
Disciplinary/major (96): 66 (core) + 30 (electives) Change to default COMPA1 (Computer Science) from COMPS1 (Embedded Systems) * COMP1511 (Programming Fundamentals) * COMP1521 (Computer Systems Fundamentals) * COMP1531 (Software Engineering Fundamentals) * COMP2521 (Data Structures and Algorithms) * MATH1081 (Discrete Mathematics) * MATH1131 (Mathematics 1A) * MATH1231 (Mathematics 1B)
144UoC total: – COMP3900 (Computer Science Project)
Sap is a fluid that transports nutrients throughout a tree. Gum trees are named as their sap is gum-like, as opposed to say resin-like. They will also typically have smooth bark. Eucalyptus is a type of gum tree. Eucalyptus oil droplets from the forests combine with water vapour to scatter short-wavelength rays of light that are predominantly blue (ROYGBIV in descending wavelengths)
Australia has 5 time zones: Christmas Island (UTC+7), Perth (UTC+8), Adelaide (UTC+9:30), Canberra (UTC+10), Norfolk Island (UTC+11)
Daylight savings incorporates the literal increase in sunlight to timezone So in Spring, the clock springs forward. We lose an hour, i.e 23 hours in that day
In the Fall, the clock falls back. We gain an hour, i.e 25 hours in that day
When using daylight saving time, will be AEDT as oppose to AEST, i.e. different time zones Adelaide, south-east onwards observe daylight savings WA not observed as large part of state is close to tropics, negating effect of tilt of earth
Gregorian calendar does not correspond exactly to one solar year. So, every 4 years February has an extra day (28 to 29), known as a leap year. Also have leap seconds. Gregorian calendar primarily used, however Chinese calendar has each month start on a new lunar phase
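The full Gregorian rule as a small C helper (the every-4-years note above skips the century exception):
#include <stdio.h>

/* Leap year: divisible by 4, except centuries, unless divisible by 400. */
static int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

int main(void)
{
    printf("%d %d %d\n", is_leap_year(2024), is_leap_year(1900), is_leap_year(2000)); /* 1 0 1 */
    return 0;
}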
Solar eclipse when sun is eclipsed by moon Lunar eclipse when moon is eclipsed by earth
Prime meridian is through Greenwich, England. Longitude is vertical lines, indicating east or west from prime meridian. International Date Line is when you cross over 180° longitude, i.e. from UTC+12 to UTC-12 or vice versa
Latitude is horizontal from equator. Places on the equator have equal time of daylight and night time
Tropics are at 23.5°. They are at this amount as this is how much the Earth’s axis is tilted. North line is Cancer, south line is Capricorn. The tropics are the region between the tropic lines. They are hot due to the sunlight they receive all year round. So, the tropics are actually hotter than equatorial regions. The Arctic and Antarctic circles are latitude circles. They represent what areas sunlight will hit. So in summer, they will have 24-hour days, whilst in winter 0-hour days.
Solstices mark the start of summer (longest daylight of year) and winter. Equinox is start of spring and autumn (equal number of daylight and night time)
Timestamp is calendar and time of day and UTC (Coordinated Universal Time) offset
Earth rotates eastward, so sun rises in the east no matter what hemisphere.
CPU contains a clock. Each tick marks a step in the fetch-decode-execute cycle. The signal will be sent along the address bus as specified by program counter and instruction or data will be returned along the data bus.
Von-Neumann has instructions and data share address space. The Von-Neumann bottleneck occurs when having multiple fetches in a single instruction, e.g. ldr Harvard has instructions and data with separate address spaces In reality, all CPUs present themselves as Von-Neumann to the user, however for efficiency they are modified Harvard at the hardware level, i.e. pipeline/cache stage. Specifically, will have separate L1 cache for instructions and data. (also have uOP cache considered L0) Therefore, when an architecture is described as Harvard, almost certainly modified Harvard.
Endianness only relevant when interpreting bytes from a cast. Can really only see the usefulness of big endian for say converting a string to an int. Little endian makes recasting variables of different lengths easier as the starting address does not change
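A small C sketch of the recasting point; the printed values assume a little-endian machine:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t x = 0x11223344;
    uint8_t *bytes = (uint8_t *)&x;   /* recast: same starting address, byte view */
    uint16_t narrow;
    memcpy(&narrow, &x, sizeof narrow); /* read a narrower value from the same address */

    if (bytes[0] == 0x44)             /* little-endian: least significant byte first */
        printf("little-endian, 16-bit view = 0x%04x (the low half)\n", narrow);
    else
        printf("big-endian, 16-bit view = 0x%04x (the high half)\n", narrow);
    return 0;
}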
Although the instruction set supports 64 bits, many CPUs’ address buses don’t support the entire 16 exabytes. As we have no need for 16 exabytes (tera, peta, exa), the physical address size may be 39 bits, and the virtual size 48 bits, to save on unused transistors.
Direct-mapped cache has each memory address mapping to a single cache line. Lookup is instantaneous, however there is a high number of cache misses. Fully-associative cache has each memory address mapping to any cache line. The entire cache has to be searched, however there is a low number of cache misses. Set-associative cache divides the cache into fully-associative blocks (sets); an 8-way cache means 8 cache lines per set. This is best in maintaining a fast lookup speed and a low number of cache misses.
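A rough C calculation of which set an address lands in, assuming an illustrative 32 KiB, 8-way, 64-byte-line cache (assumed parameters, not any particular CPU):
#include <stdint.h>
#include <stdio.h>

enum {
    LINE_SIZE  = 64,
    WAYS       = 8,
    CACHE_SIZE = 32 * 1024,
    NUM_SETS   = CACHE_SIZE / (LINE_SIZE * WAYS)   /* 64 sets */
};

int main(void)
{
    uint64_t addr = 0x7ffdc0de1234;                /* made-up address */
    unsigned set = (unsigned)((addr / LINE_SIZE) % NUM_SETS);
    /* The line may live in any of the 8 ways of this set; a direct-mapped cache
       would instead use % (CACHE_SIZE / LINE_SIZE), giving exactly one candidate line. */
    printf("address 0x%llx -> set %u of %u\n", (unsigned long long)addr, set, (unsigned)NUM_SETS);
    return 0;
}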
Cache sizes will stay relatively small due to the nature of how computers are used. At any point, there is only a small amount of local data the CPU will process next. So, beyond the empirical sweet spot of approximately 64MB, having a larger cache will yield only marginal benefits for cost of SRAM and die-area, i.e. law of diminishing returns.
Check if in L1. If not go check in L2 and mark least recently accessed L1 for move to L2. Bubbles up to L3 until need for memory access which will go to memory controller etc.
Alignment ensures that values don’t straddle cache line boundaries.
CISC gives reduced cache pressure for high-intensity, sustained loops as fewer instructions are required. Instructions will have higher cycle counts. Also typically a register-memory architecture, e.g. can add one value in a register and one in memory together (as opposed to load-store)
TDP (thermal design power) is the maximum amount of heat (measured in watts) at maximum load that is designed to be dissipated (bus sizes of chips smaller, so TDP getting lower). However, the value is rather vague as it could be measured on an overclock and doesn’t take into account ambient conditions
Clock frequency will often be changed by OS scheduler in idle moments or thermal throttling
Hardware scheduler allows for hyperthreading which is the sharing of execution units. Therefore, hyperthreading not a boon in all situations. AMD refers to this as simultaneous multithreading.
Microarchitecture will affect instruction latency and throughput by way of differing execution and control units. For Intel CPUs, i3–i7 of the same generation will have the same microarchitecture, just different cache, hyperthreading, cores, die size etc.
Codec is typically a separate hardware unit that you interact with via a specific API. HEVC (H.265; high efficiency video coding) is a newer version of H.264. VP9 is an open source Google video coding format
When people say vector operations, they mean SIMD. SSE registers (XMM) are 128 bits (4 floats); AVX registers (YMM) are 256 bits (8 floats)
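A minimal SSE sketch in C using compiler intrinsics (SSE is baseline on x86-64, so this should build with a stock GCC/Clang):
#include <stdio.h>
#include <xmmintrin.h>

int main(void)
{
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);    /* one 128-bit XMM register = 4 floats */
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
    float out[4];
    _mm_storeu_ps(out, _mm_add_ps(a, b));             /* one instruction adds all 4 lanes */
    printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]); /* 11 22 33 44 */
    return 0;
}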
Average CPU die-size is 100mm². GPU much larger at 500mm² as derives more benefits from more control units, i.e. parallelisation Common transistor size is 7nm. Low as 2nm Silicon atom is 0.2nm Gleaning a Moore’s law transistor graph, see that average CPU a few billion transistors and high end SOCs around 50 billion transistors.
Memory model outlines the rules regarding the visibility of changes to data stored in memory, i.e. rules relating to memory reads and writes A hardware memory model relates to the state of affairs as the processor executes machine code: * Sequential Consistency: Doesn’t allow instruction reordering, so doesn’t maximise hardware speed * x86-TSO (Total Store Order): All processors agree upon the order in which their write queues are written to memory However, when the write queue is flushed is up to the CPU * ARM (most relaxed/weak) Writes propagate to other processors independently, i.e not all update at same time Furthermore, the order of the writes can be reordered Processors are allowed to delay reads until writes later in the instruction stream
A software memory model, e.g a language memory model like C++ will abstract over the specific hardware memory model it’s implemented on. It will provide synchronisation semantics, e.g. atomics, acquire, release, fence etc. These semantics are used to enforce sequentially consistent behaviour when we want it. However, using intrinsics, we can focus only on hardware memory model.
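A hedged C11 sketch of acquire/release publication using <stdatomic.h> and the optional <threads.h> (not every libc ships C11 threads); the variable names are made up:
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <threads.h>

static int payload;                 /* plain data, published via the flag */
static atomic_bool ready = false;

static int producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, true, memory_order_release); /* writes above cannot sink below */
    return 0;
}

static int consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire)) /* reads below cannot hoist above */
        ;
    printf("%d\n", payload);        /* guaranteed to print 42 */
    return 0;
}

int main(void)
{
    thrd_t p, c;
    thrd_create(&p, producer, NULL);
    thrd_create(&c, consumer, NULL);
    thrd_join(p, NULL);
    thrd_join(c, NULL);
    return 0;
}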
A cache controller implements cache coherency by recording states for each cache line. MESI is the baseline used. Has states: * Modified: Only in this cache and dirty from main memory * Exclusive: Only in this cache and clean from main memory * Shared: Clean and shared amongst other caches * Invalid Intel uses MESIF (Forward is the same as Shared except a designated responder), while Arm uses MOESI (Owned is modified but possibly in other caches) Cache coherency performance issues are difficult to debug, e.g. one value changed in a cache line invalidates it, even though another value in the cache line remains unchanged
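A small C sketch of that last point: aligning per-thread counters to their own cache line (assuming 64-byte lines) so independent writes don't invalidate each other's line:
#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

/* Each counter is forced onto its own 64-byte line (assumed line size). Without the
   alignment, counters[0] and counters[1] would share a line, and every write from one
   core would invalidate the other core's copy (MESI Modified/Invalid ping-pong). */
struct counter {
    alignas(64) uint64_t value;
};

static struct counter counters[2];

int main(void)
{
    printf("sizeof(struct counter) = %zu\n", sizeof(struct counter)); /* 64, not 8 */
    printf("%p %p\n", (void *)&counters[0], (void *)&counters[1]);    /* 64 bytes apart */
    return 0;
}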
Anti-trust laws don’t prevent monopolies, they prevent attempts to monopolise by unfair means, e.g. Microsoft browser market, Apple app store etc.
Technically, any digital work created is automatically protected by copyright. So, without a license, people would have to explicitly ask for permission to use
Permissive (MIT, BSD, Apache, zlib) gives users more freedom to say relicense, include closed source software, etc. They generally just enforce attribution Apache like MIT except must state what files you have changed
Weak copyleft (LGPL) applies to the files of the library, not your entire codebase, i.e. you must still release your version of the library used. So, dynamic linking makes this easier for keeping your source closed. If statically linking, must take a few extra steps to ensure the LGPL parts are available, e.g. publish object files
Copyleft (GPL) enforces the developer’s usage of the code. So, any derivative software must release the whole project as GPL, i.e. it infects your software (and restricts choice of libraries to GPL-compatible ones). Subsequently encounter more licensing restrictions.
Creative Commons licenses are composed of various attributes. The default is attribution. They are typically used in artworks, e.g. images, audio files. Other elements are optional and can be combined together, e.g. no derivatives, non-commercial, must share under same license.
Public domain means no license, so could claim as yours
UUID/GUID (universally/globally unique identifier) is 16 bytes. Version 1 is typically generated by concatenating bits of the MAC address and a timestamp
UEFI firmware interface made to standardise interface between OS and firmware for purposes of booting
Called /dev/sd as originally for SCSI (small computer system interface; standards for transferring data between computers and devices) The preceding letter indicates the order in which it’s found, e.g. /dev/sda first found The preceding number indicates the partition number, e.g. and /dev/sda1, /dev/sda2
UEFI use of GPT (GUID partition table) incorporates CRCs to create a more recoverable boot environment over the BIOS MBR (located in the first sector of the disk). Furthermore, UEFI has more addressable memory in 64-bit mode as opposed to only 16-bit mode. Also, UEFI supports networking. The ESP (EFI system partition) will have EFI boot entries that point to a UUID of where to boot; one of these will be a GRUB/shim binary like shimx64.efi. NOTE: The bootloader is the EFI OS loader and is part of the OS that will load the kernel
ACPI interface to pass information from firmware to OS. This firmware will have hardware information baked into it set by manufacturers
Inodes store file metadata. The metadata stored by an inode is determined by filesystem in use, except for filename which is never stored Typical metadata includes size, permissions, data pointer NOTE: FAT32 won’t store permissions, last modification time, no journaling or soft-links
A filename maps to an inode. Therefore a directory is a mapping of filenames to inodes
A hardlink references an inode, and is therefore impervious to file name change, deletion, etc. A softlink is to a file name
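A POSIX C sketch of the difference (the file names original.txt/hard.txt/soft.txt are made up; run in an empty directory):
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    FILE *f = fopen("original.txt", "w");
    if (!f) return 1;
    fputs("hello\n", f);
    fclose(f);

    link("original.txt", "hard.txt");     /* new filename referencing the same inode */
    symlink("original.txt", "soft.txt");  /* new inode whose data is the target's name */

    struct stat a, b, c;
    stat("original.txt", &a);
    stat("hard.txt", &b);
    lstat("soft.txt", &c);                /* lstat inspects the link itself */
    printf("inodes: original=%lu hard=%lu soft=%lu\n",
           (unsigned long)a.st_ino, (unsigned long)b.st_ino, (unsigned long)c.st_ino);

    /* Deleting or renaming original.txt leaves hard.txt intact but dangles soft.txt. */
    return 0;
}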
Journaling is the process of regularly writing operations that are to be performed (held in RAM) to a disk area known as the journal, then applying these changes to the disk when necessary. This overhead makes journaling filesystems slower, but more robust on crashes, as the journal can be read to ascertain whether certain operations finished performing
FAT for ESP (because FAT simple, open source, low-overhead and supported virtually everywhere) vFAT is driver (typically for FAT32)
EXT4 for system (supports larger file sizes) (NTFS microsoft proprietary) Most filesystems will use a self-balancing tree to index files
Ubuntu distro as compared to debian more user friendly. For example, automatically includes proprietary drivers like WiFi, has PPAs to allow installation of 3rd party applications, and install procedure just works. Also updates more regularly than Debian and provides LTS, so know regular backports provided
Linux is a monolithic kernel, i.e. drivers, file system etc. are all in kernel space. So, more efficient, not as robust to component failure Windows uses a hybrid kernel, moving away from original microkernel due to inefficiencies Using Ubuntu generic kernel (could also have -lowlatency etc.) to not include a lot of modules in kernel to free up RAM usage
Xfce as default Ubuntu GNOME had bug with multiple keyboards. Furthermore, Xfce codebase was readable when inspecting X11 code. In addition, Xfce automatically provided GUI shortcut creation
.deb and .rpm are binary packages. Annoyances arise due to specifying the specific library dependency for each distro version. Flatpaks and Snaps are containerised applications that include the specific libraries and runtimes. AppImages combine the ‘shared’ libraries and runtimes of Flatpaks and Snaps into a giant file. This file can be copied and run on any distro. If a package is being actively maintained, preferable to use .deb as it’s faster and simpler
Linux DRM (direct rendering manager) -> X11 (display server) -> xfce (desktop environment)
Linux ALSA (advanced linux sound architecture) -> pulseaudio (sound server)
Fstab (File System Table) describes filesystems that should be mounted, when they should be mounted and with what options.
SystemV ABI: rdi, rsi, rdx, rcx, r8, r9 (6 integer arguments); xmm0–xmm7 (8 floating-point arguments); remaining arguments pushed right-to-left on the stack; rax holds the return value and syscall number. Stack 16-byte aligned before a function call. SSE2 is baseline for x86-64, so make efficient for __m128
A preemptive scheduler will swap processes based on specific criteria. Round-robin means each process will run for a designated time slice. CFS is a preemptive round-robin scheduler. Time slices are dynamic, computed like ((1/N) * (niceness)). Processes are managed using an RB-tree; therefore the cost of launching a process or a context switch is logarithmic. The kernel will have an internal tick rate that updates waiting threads. Lowering this will increase granularity, however it will increase CPU time and hence reduce battery life as more time is spent in kernel code. The Windows scheduler uses static priorities, so one intensive process can dominate the CPU.
System level refers to in between kernel and userspace, e.g. network manager. Systemd is a collection of system binaries, e.g. udev. Primarily, systemd is a service manager. A service extends the functionality of daemons, e.g. only start after another service, restart on failure after 10s etc. The kernel will launch the systemd init service that will then bootstrap into userspace (hence allowing for the aforementioned service features)
Kernel offers various methods of process isolation, e.g. chroot, cgroups etc. (chroot cannot access files outside its designated tree). A container will utilise one of these options provided by the kernel to achieve: * cannot send signals to processes outside the container * has own networking namespace * resource usage limits
ELF (Executable and Linkable Format) Header contains type, e.g. executable, dynamic, relocatable and entrypoint A section is compile time: * .text (code) * .bss (uninitialised globals) * .data (globals) * .rodata (constants) A segment is a memory location, e.g .dseg (data), .eseg (eeprom) and .cseg (code/program)
Typically stored as crt0.s, this will perform OS specific run-time initialisation. The conditions assumed here will be outlined in ABI, e.g. argc in rdi Some functions include setting up processor registers (e.g. SSE), MMU, caches, threading ABI stack alignment, frame pointer initialisation, C runtime setup (setting .bss to 0), C library setup (e.g. stdout, math precision) The program loader will load from Flash into RAM then call _start (which is crt entrypoint)
Storage device sizes are advertised with S.I units, whilst OS works with binary so will show smaller than advertised (1000 * 10³ < 1024 * 2¹⁰) Also, storage device write speeds are sustained speeds. So, for small file sizes expect a lot less
A flip-flop is a circuit that can have two states and can store state. Various types of flip-flops, e.g. clock triggered, data only etc. A latch is a certain type of flip-flop. Called this as output is latched onto state until another change in input.
Registers and SRAM stored as flip-flops. DRAM is a single transistor and capacitor
SRAM (static) is fast, requires 6 transistors for each bit. So, 3.2billion transistors for 64MB cache. Sizeable percentage of die-area SRAM more expensive, faster as not periodically refreshed.
DRAM (dynamic) is 1 transistor per bit, refreshed periodically. SDRAM synchronises the internal clock and bus clock speed. LPDDR4 (low-power; double pumping on rising and falling edge of clock, increasing bus clock speed while the internal clock typically stays the same, amount prefetched etc.)
DIMM (Dual In-Line Memory Module) is a form factor with a wider bus. SODIMM (Small Outline)
CAS (Column Address Strobe), or CL (CAS latency) is time between RAM controller read and when data is available. RAM frequency gives maximum throughput, however CL affects this also. In addition, RAM access is after cache miss, so direct RAM latency is only a percentage of total latency as time taken to traverse cache and copy to it.
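A rough worked example with illustrative numbers (not from any datasheet): DDR4-3200 transfers at 3200 MT/s on a 1600 MHz clock, so one clock is 1/1600 MHz = 0.625 ns; at CL16 the CAS latency is 16 × 0.625 ns = 10 ns, about the same absolute latency as DDR4-2400 CL12 (12 × 0.833 ns ≈ 10 ns) despite the lower CL number.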
NAND and NOR flash are two types of non-volatile memory. NOR has faster reads, however is more expensive per bit and has slower write/erase. NOR used in BIOS chips (firmware will be from the motherboard manufacturer, e.g. Lenovo). A NAND mass storage device will require a controller chip, i.e. a microcontroller. How the controller accesses the NAND flash, i.e. the protocol under which its Flash Translation Layer operates, will determine what type of storage it is: * SD (secure digital) * eMMC (embedded multimedia card): Typically SD soldered on motherboard (For SD/MMC protocol, will have an RCA, i.e. Relative Card Address for selecting card) * USB (universal serial bus) * SSD (solid state drive): Parallel NAND access, more intelligent wear leveling and block sparing 3D VNAND (Vertical) memory increases memory density by vertically stacking NAND flash
Form factors include M.2 keying and PCIe (Peripheral Component Interconnect) Interface includes SATAIII, NVMe (non-volatile memory host controller) and PCIe SATA (Serial Advanced Technology Attachment) SSD is the lowest grade SSD. A single form factor may support multiple interfaces, so ensure motherboard has appropriate chipset
Each CPU socket has memory banks that are local to it, i.e. can be accessed from it directly. NUMA (non-uniform memory access) means that accessing memory from a non-local bank will not be the same speed. A NUMA-aware OS will try to mitigate these accesses.
RAID (redundant array of independent disks) is method of combining multiple disks together so as to appear like one disk called an array. Various types, e.g. RAID0 (striping) some parts of file in multiple disks, RAID1 (mirroring) each disk is duplicate so could give speed increase etc.
Battery will have two electrodes, say lithium cobalt oxide and graphite. When going through a charging/discharging cycle, ions move between electrodes. So, charging cycles will affect the atomic structure of the electrodes and hence affect battery life.
Circuits based on conventional current, i.e. + to - Cathode is terminal from which conventional current flows out of, i.e. negative
LiPo (lithium-ion polymer) uses polymer electrolyte instead of liquid. Standard lithium-ion has higher energy density, cheaper, not available in small sizes and more dangerous due to liquid electrolyte LiPo more expensive, shorter lifespan, less energy, more robust LiPo battery is structured to allow a current to be passed to it to reverse the process of oxidation (loss of electrons), i.e. is rechargeable
Battery 51 Wh (watt-hours), which is Ah × V. Capacity is not a fixed value, e.g. 1 Ah could run 0.1 A for 10 hours
Petrol cars still use lead-acid as they have lower internal resistance and so can give a higher peak current than an equivalent LiPo (just not for as long)
HDMI(High Definition Multimedia Interface)-A, C (mini), D (micro) carry audio and visual data DisplayPort has superior bandwidth to HDMI
USB-A, USB-B, USB-B (mini). USB-C is the newer reversible connector, commonly carrying USB 3.x
3.5mm audio jack (3 pole number of shafts (internal wires), 4 pole for added microphone)
Ethernet CAT backwards compatible.
Telephone cable called RJ11
IEC (International Electrotechnical Commission) power cords used for connecting power supplies up to 250V, e.g. kettleplug, cloverleaf
DC barrel jack
Touch screen types need some external input to complete a circuit. Resistive works by pressure pushing down plastic <- electric coating -> glass; unresponsive, durable, cheap. Capacitive contains a grid of nodes that store some charge. When our finger touches, charge flows through us and back to the phone, changing the electric current read. We are good conductors due to the impure water ions in us. So, things electrically similar to our fingers will also work, like sausages, banana peels
1080i/p: 1080 references vertical height in pixels. Interlaced means even and odd rows are displayed on alternating fields; due to modern high bandwidth, not used anymore. Progressive will display each row sequentially for a given frame
4k means horizontal resolution of approximately 4000 pixels. standard different for say television and projection industry, e.g. 3840 pixels
Screen density is a ratio between screen size and resolution measured in PPI (Pixels Per Inch)
A voltage is applied to an ionised gas, turning it into superheated matter that is plasma; UV is subsequently released (the basis of plasma displays).
LCD (Liquid Crystal Display) involves a backlight shining through crystals. IPS (In-Plane Switching) and TFT (Thin Film Transistor) are example crystal technologies. For an LED monitor, the LED is the backlight, as opposed to a fluorescent tube. However it still uses an LCD panel, so really LED LCD.
Quantum science deals with quanta, i.e. the smallest unit that comprises something. They behave strangely and don’t have well defined values for common properties like position, energy etc., e.g. the uncertainty principle. A ‘quantum dot’ is a semiconductor nanoparticle that has different properties to larger particles as a result of quantum mechanics. QLED/QNED (Quantum NanoCell) adds a ‘quantum dot’ layer into the white LED backlight LCD sandwich.
OLED is distinct. It produces own light, i.e. current passed through an OLED diode to produce light. LTPO (Low Temperature Polycrystalline Oxide) is a backplane for OLED technology.
E-ink display uses less power than LCD as it only uses power when the arrangement of colours changes.
HDR (High Dynamic Range) and XDR (Extreme Dynamic Range) increase ability to show contrasting colours.
5.1 means 5 speakers, 1 subwoofer. In ascending order of audible frequencies (20Hz–20kHz) have devices: subwoofer, woofer, speaker (midrange) and tweeter.
ARM core ISA, e.g. ARMv8. Will then have profiles, e.g. M-profile (ARMv8-M); R-profile for larger real-time systems like automotive
ARM Holdings implements a profile in its own CPU, e.g. Cortex-A72. This is a synthesisable IP (Intellectual Property) core sold to other semiconductor companies, who make implementation decisions like the amount of cache from the 8–64KB specification.
However, other companies can build own CPU from ISA alone, e.g. Qualcomm Kryo and Nvidia Denver
Then have actual MCUs e.g. STMicroelectronics, NXP or SoCs e.g. Qualcomm Snapdragon, Nvidia Tegra
big.LITTLE is a heterogeneous processing architecture with two types of processors: big cores are designed for maximum compute performance and LITTLE for maximum power efficiency
FFT divides samples (typically from an ADC) into frequency bands. A logarithmic scale is typically employed to account for nonlinear values, whereby a greater proportion fall in the high-frequency bands
DSP instructions may include transforms like FFT, filters like IIR/FIR (Finite Impulse Response; no feedback) and statistical like moving average
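A minimal moving-average FIR sketch in C (4 taps, equal 1/N coefficients, purely illustrative; the fake ADC samples are made up):
#include <stdio.h>

#define TAPS 4

static float fir_moving_average(const float *history)  /* history[0] is the newest sample */
{
    float acc = 0.0f;
    for (int i = 0; i < TAPS; i++)
        acc += history[i] * (1.0f / TAPS);              /* all coefficients equal: 1/N, no feedback */
    return acc;
}

int main(void)
{
    float samples[] = {0, 0, 0, 4, 4, 4, 4, 4};         /* pretend ADC readings: a step input */
    float history[TAPS] = {0};

    for (int n = 0; n < 8; n++) {
        for (int i = TAPS - 1; i > 0; i--)              /* shift the delay line */
            history[i] = history[i - 1];
        history[0] = samples[n];
        printf("%.2f ", fir_moving_average(history));   /* output ramps from 0 to 4 over 4 samples */
    }
    printf("\n");
    return 0;
}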
With the addition of the 64-bit extension (aarch64), the original 32-bit ARM state was retroactively called aarch32. Possible instruction sets include Thumb-1 (16-bit), Thumb-2 (16/32-bit), aarch32 (32-bit instructions), aarch64 (32-bit wide instructions), Neon, MTE (Memory Tagging Extension), etc. aarch64 does not allow Thumb instructions
FPU is a VFPv3-D16 implementation of the ARMv7 floating-point architecture VFP (Vector Floating Point) is floating point extension on ARM architecture. Called vector as initially introduced floating point and vector floating point. Neon is product name for ASE (Advanced SIMD Extension), i.e. SIMD for cortex-A and cortex-R (more recent is SVE (Scalable Vector Extension)) Helium is product name for MVE (M-profile Vector Extension), i.e. SIMD for cortex-M
RSA (Rivest-Shamir-Adleman) is asymmetric, i.e. public and private key. Much slower than AES AES (Advanced Encryption Standard) is symmetric, i.e. one key SHA (Secure Hash Algorithm) is one-way and produces a digest
MPU (Memory Protection Unit) only provide memory protection not virtual memory like an MMU (Memory Management Unit)
Built atop the Android OS, many phones will implement their own custom UI, e.g. Huawei EMUI, Samsung One UI. The ART (Android Runtime) is the Java virtual machine that performs JIT bytecode compilation of an APK (Android Package Kit)
EABI (Embedded ABI) is the newer ARM ABI, named as it suits the needs of embedded applications. An EABI will omit certain abstractions present in an ABI designed for a kernel, e.g. running in privileged mode. From the calling convention part of the ABI we can garner everything from the number of register arguments before stack usage, to alignment requirements.
AAPCS (Arm Architecture Procedure Call Standard): r0-r3, rest stack s0-s7 Stack 4-byte aligned, if on function call, 8-byte aligned
AAPCS64: x0-x7 (w0-w7 for 32-bit), rest stack v0-v7 Stack 16-byte aligned
Most modern ARM architectures will not crash on unaligned accesses
Shader is a GPU program that is run at a particular stage in the rendering pipeline. Nvidia GPU cores are named CUDA cores; AMD calls them stream processors; ARM has shader cores. So, CUDA is a general purpose Nvidia GPU program that can utilise the GPU’s highly parallelised architecture. OpenCL, whilst more supportive, i.e. can run on CPU or GPU, does not yield the same performance benefits. RenderScript is Android specific, heterogeneous in that it will distribute load automatically. OpenGL has a lot of fixed-function legacy (now shader based) and drivers rarely follow the standard in its entirety. OpenGL ES (Embedded Systems) is a subset. Vulkan is low-level and more closely reflects how modern GPUs work
Flux is an arbitrary term used to describe the flow of things, e.g. photon flux, magnetic flux
Lumens is how much total light is produced by an emitter. Candela is the intensity of a light beam produced by an emitter. Lux is how much light hits a receiving surface. Nit is how much light is reflected off a surface and so is what our eyes and cameras pick up. A display will be in nits as it’s a receiving object, as opposed to the backlight LEDs. A higher nit display is more easily viewable in a wider array of lighting conditions, e.g. will combat the sun’s light reflecting off the surface in an outdoor setting. Brightness is subjective, and therefore does not have a value associated with it
Sound waves (20Hz-20kHz): ultrasonic waves are sound waves not audible by humans. SONAR (Sound Navigation And Ranging) is used in maritime settings as radio waves are largely absorbed in seawater due to its conductivity
Radio waves (10Hz-300GHz): microwaves make up the majority of the radio spectrum (300MHz-300GHz). They are divided into bands, e.g. C-band, L-band, etc. RADAR (Radio Detection And Ranging) encompasses the microwave spectrum. Higher frequency gives higher resolution than SONAR; also, being an EM wave, a much faster transmission rate. Allowing for long-range transmission, radio waves bounce off the ionosphere (where Earth's atmosphere meets space)
Infrared waves (300GHz-300THz) can be used in object detection. Heat is the motion of atoms: the faster they move, the more heat is produced. Approximately 50% of solar radiation is infrared.
Visible light: LIDAR (Light Detection And Ranging) (lasers; weather dependent) gives higher accuracy and resolution than radar, but lower range
Ultraviolet: UVA has a longer wavelength and is associated with skin ageing, UVB with skin burning. UVB doesn't pass through glass, however UVA does. UVC is a germicide. SPF (Sun Protection Factor) is how many times longer it takes to burn than with no sunscreen. However, UV can still get through, and sunscreen is water-resistant, not waterproof
Ionising X-Rays
Ionising Gamma-Rays Sterilisation and radiotherapy
ISM (Industrial, Scientific and Medical) bands (900MHz, 2.4GHz, 5GHz) occupy unlicensed RF spectrum. They include WiFi and Bluetooth but exclude telecommunication frequencies
GNSS (Global Navigation Satellite Systems) contain constellations: GPS (US), GLONASS (Russia), Galileo (EU) and BeiDou (China). They all provide location services, however implement different frequencies, etc.
GSM (Global System for Mobile Communications) uses SIM (Subscriber Identification Module) cards to authenticate (identity) and authorise (privilege) access
4G (Generation; 1800MHz) outlines min/max upload/download rates and associated frequencies. Many cell towers cannot fully support the bandwidth capabilities outlined by 4G. As a result, the term 4G LTE (Long Term Evolution) is used to indicate that some of the 4G spec is implemented. More specifically there is 4G LTE cat 13, etc. to indicate the particular features implemented.
SMS (Short Message Service) are stored as clear text by provider SS7 (Signaling System Number 7) protocol connects various phone networks across the world
Between protocols there are tradeoffs between power and data rate. IEEE (Institute of Electrical and Electronic Engineers): * 802.11 group for WLANs (WiFi - high data rate), * 802.15 for WPANs; 802.15.1 (Bluetooth), * 802.15.4 low data rate (ZigBee). LPWANs like LoRa and Sigfox sit outside the IEEE 802 standards
WiFi, Bluetooth, ZigBee are for local networks. LoRa is like a low-bandwidth GSM. LoRa (Long Range) has low power requirements and long distance; AES-128 encrypted by default. LoRa is useful if only sending some data a few times a day. LoRa has configurable bandwidth, so can go up to 500kHz if regulations permit. Lower frequency yields longer range as the longer wavelength won't be reflected off objects; will be called narrowband. Doesn't require IP addresses. LoRaWAN allows large star networks to exist in say a city, but will require at least one IP address for a gateway. Sigfox uses more power.
A BLE (Bluetooth Low Energy) transceiver is only on if being read or written to. GATT (Generic Attribute Profile) is a database that contains keys for particular services and characteristics (the actual data). When communicating with a BLE device, we are querying a particular characteristic of a service
A QR (Quick Response) code is a 2D barcode with more bandwidth. Uses a laser reader. RFID (Radio Frequency Identification) does not require line-of-sight and can read multiple objects at once. Uses RFID tag. NFC (Near Field Communication) is for low-power data transfer. Uses NFC tag
TV standards: Americas use NTSC (30fps, fewer scanlines per frame) 4.4MHz; Europe, Asia, Australia use PAL (Phase Alternating Line) (25fps) 2.5MHz
MEMS (Micro Electro Mechanical Systems) combines mechanical parts with electronics like some IC, i.e. circuitry with moving parts. e.g. microphone (sound waves cause diaphragm to move and cause induction), accelerometer, gyroscope (originally mechanical)
On phone, many sensors implemented as non-wakeup. This means the phone can be in a suspended state and the sensors don’t wake the CPU up to report data
Accelerometer measures rate of change in velocity, i.e. vibrations associated with movement (m/s²) So can check changes in orientation. It will have a housing that is fixed to the surface and a mass that can move about. Detecting the amount of movement in the mass, can determine acceleration in that plane.
A gyroscope measures rotational rate (rad/s), unlike an accelerometer, which is unable to distinguish rotational from linear acceleration. A gyroscope resists changes to its orientation due to the inertial forces of a vibrating mass. So it can detect angular momentum, which can be useful for guidance correction.
A gimbal is a pivoted support that permits rotation about an axis
IMU (Inertial Measurement Unit) is an accelerometer + gyroscope + magnetometer (teslas) The magnetometer is used to correct gyroscope drift as it can provide a point of reference
Quartz is piezoelectric, meaning mechanical stress results in electric charge and vice versa. In an atomic clock, Caesium atoms are used to control the supply of voltage across quartz. This is done, in order to keep it oscillating at the same frequency.
NTP (Network Time Protocol) is a TCP/IP-suite protocol (carried over UDP) for clock synchronisation. It works by comparing with atomic clock servers
IEEE 754 single-precision float layout: (sign 1 bit)-(exponent 8 bits)-(significand/mantissa 23 bits), i.e. value = (-1)^sign * 2^(exponent - 127) * 1.mantissa
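As a minimal sketch (assuming IEEE 754 binary32 and using memcpy for a well-defined type pun), the fields can be pulled apart like so:
#include <stdint.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
  float f = -6.25f;
  uint32_t bits;
  memcpy(&bits, &f, sizeof(bits));         /* well-defined type pun */
  uint32_t sign     = bits >> 31;          /* 1 bit */
  uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127 */
  uint32_t mantissa = bits & 0x7FFFFF;     /* 23 bits, implicit leading 1 */
  /* value = (-1)^sign * 2^(exponent - 127) * 1.mantissa */
  printf("sign=%u exponent=%u mantissa=0x%06X\n",
         (unsigned)sign, (unsigned)exponent, (unsigned)mantissa);
  return 0;
}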
PIC (Position Independent Code) can be executed anywhere in memory by using relative addresses, e.g. shared libraries. The process of converting relative to absolute, i.e. resolving unresolved symbols at runtime, adds a level of indirection
Data deduplication means to remove duplicates
TODO: testing procedures
REST (representational state transfer) is an interface style that outlines how the state of a resource is interacted with. On the web, a URL is an access point to a resource. A RESTful API will have URLs respond to CRUD requests in a standard way: * GET example.com/users returns the list of resources, i.e. all users * POST example.com/users creates a new resource * GET example.com/users/1 returns a single resource * PUT example.com/users/1 updates a single resource * DELETE example.com/users/1 deletes a single resource
OAuth (open authorisation) is a standard that defines a way of authorising access OAuth offers different functionality than SSH by having the ability to ‘scope’ access Typically used by RESTful services These endpoints described in ‘discovery document’: 1. authorisation-server -> authorisation-code 2. authorisation-code -> access token, refresh token 3. resource-server -> resource
RFC (Request For Comments) documents contain technical specifications for Internet technologies, e.g. IP, UDP, etc.
UDP (no head-of-line blocking) + client-server (P2P unreliable as internet paths are optimised for cost/closest exchange point) + dedicated servers (people's home internet doesn't normally have high upload rates); mix of cloud (flexible to just turn up and down, high egress bandwidth charge) and bare metal (fixed bandwidth rate set into price). Matchmaking and host migration are difficult as it's hard to measure which user has a good connection, e.g. what's their NAT type?
MOSFET type of transistor that is voltage controlled
CMOS technology allows the creation of low standby-power devices, e.g. non-volatile CMOS static RAM
EMV (Europay, Mastercard, Visa) chips implement NFC for payments
Various synthetic benchmarks are indicative of performance, e.g. DMIPS (Dhrystone Million Instructions per Second) for integer and WMIPS (Whetstone) for floating point
SPDIF (Sony Philips Digital Interface) carries digital audio over a relatively short distance without having to convert to analog, thereby preserving audio quality.
The polarity of the magnetic field created by power and ground wires will be opposite. So, having the same position in each wire line up will reduce outgoing noise as superposition of their inverse magnetic fields will cancel out. Furthermore, incoming noise will affect each wire similarly Coaxial has the two conductors share an axis with shielding outside. Twisted pair wire is a cheaper way of implementing coaxial Glass fibre optic does not have this issue.
ASIC (Application Specific Integrated Circuit) MCU for specific task
On startup, copy from flash to RAM then jump to the reset handler address. No real need for newlib, just use the standalone mpaland/printf. Some chips have XIP (execute-in-place) which allows running directly from flash
Qi is a wireless charging standard widely supported by mobile devices for distances up to 4cm. FreePower technology allows Qi charging mats to support concurrent device charging
QSPI can be used without CPU with data queues
Chrom-ART Accelerator offers DMA for graphics, i.e. fast copy, pixel conversion, blending etc.
The LED anode is the positive, longer lead
5ATM is 5 atmospheres. 1 atmosphere is about 10m of water (however rated when motionless), e.g. 50m for 10 minutes
MIDI (Musical Instrument Digital Interface) uses 3-byte messages that describe the note type, how hard it was pressed and what channel; useful for sending out of an MCU. FRAM (ferroelectric RAM) is non-volatile and gives the same access properties as RAM
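A minimal sketch of packing a note-on message (the status byte carries the message type in the high nibble and the channel in the low nibble, followed by note number and velocity); uart_send_byte() is a hypothetical transmit routine:
#include <stdint.h>
void uart_send_byte(uint8_t byte); /* hypothetical MCU UART transmit routine */
void midi_note_on(uint8_t channel, uint8_t note, uint8_t velocity)
{
  uart_send_byte(0x90 | (channel & 0x0F)); /* status: note-on, on the given channel */
  uart_send_byte(note & 0x7F);             /* which key */
  uart_send_byte(velocity & 0x7F);         /* how hard it was pressed */
}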
RENDERING: rendering is the process of creating the 2D/3D image, i.e. the drawing onto the monitor. Ray tracing solves transparency issues; it's just substantially slower than standard projection rasterisation (so the future is ray tracing)
3D works by emulating a simplified model of how a single human eye views space
A single point of light from the sun hitting a single object will reflect in multiple straight line directions/paths
rasterisation is taking a polygon and converting it to pixels; eventually we have to convert say a 3D position to 2D (as all rendering is fundamentally this)
lens refract light to central focal point
/dev/urandom is a cryptographically secure pseudo-random number source; /dev/zero supplies zero bytes
creating website: https://threadreaderapp.com/thread/1606219302855745538.html
TODO: what is blockchain and web3?
I decided to try and implement ctime in Bash for pedagogical purposes. My first task was to write and read a binary file. Googling how to do this in Bash returned the consensus, "use another language". Often when I read this, I'm not deterred; I have encountered similar naysayers before when it comes to directly using Xlib. However, when it came to wanting structures to read and write to (essential for binary work), I found Bash was empty. With this understanding, I realised more generally that the usefulness of scripting languages is limited. Specifically, they should be limited to basic tasks that involve searching or external tool usage. I suppose my biggest use case of Bash is for enhancing my terminal (.bashrc) and vim (:.!) interactions. C is a simple language and if you understand its low-level capabilities, you can do many things. However, despite this recognition, I did come upon certain procedures to follow when working with Bash scripts.
# Checking arguments as callee
[ ! $# -ge 3 ] && return 1
[ "$1" = "-arg" ]
# Checking arguments as caller
func || exit 1
# Function returning value
_FUNC_NAME=val
# Informative usage information
printf "App v1.0 by Ryan McClue\n" >&2
printf "Usage:\n" >&2
printf "app -arg <value>\n" >&2
The diverse software landscape of Linux distros means that we don't know what version of each library or tool a user has installed
STM32CubeMX (this just generates code; the IDE is full-fledged) must be downloaded via an emailed link. STM32CubeIDE is woefully slow; maximising to full screen just blurs out half. Once installed, a series of pop-up menus keeps appearing spontaneously as it has to download more to satisfy a simple project creation. To download qtcreator there is a well-known bug where it selects the wrong mirror, 3MBps down to 30kbps. Have to decipher command line arguments and mirror parsing, e.g. ./qt-unified-linux-x64-4.3.0-1-online.run --mirror http://ftp.jaist.ac.jp/pub/qtproject
My god, installing CubeMX is awful: requires an email, and password setting for the account fails.
QtCreator does not honour the system .gdbinit file; have to manually set breakpoints and disassembly flavour. QtCreator doesn't show console output, and ticking 'run in terminal' crashes on startup
Before even using the program, installing is bad. Install pandoc: it requires a PDF engine, e.g. xelatex, that must be separately installed, whose package name is not found with a traditional sudo apt install xelatex; it's in texlive-xetex and is 100MB… Searching for the appropriate package name to install a missing dependency is what I use Ubuntu Stack Exchange mostly for…..
Firefox just forces restart
gpus have introduced a whole host of undocumented NDA annoyances
sound output varies significantly across different hardware. so, to detect possible sound bugs we may need to invest in a good piece of sound hardware
in linux, x11 plays the role of a mixer for drm, and pulse is a mixer for alsa (more flexibility with pulse, e.g. can disable it if only needing one application at a time, or can redirect it to another sound card). audio is a mess in linux
due to the disconnected nature of x11 and scheduling, must vsync. sleeping isn't an option due to the default round-robin scheduling policy (switching to real-time scheduling is not suitable for a general OS). we could improve by setting niceness (however must be sudo). as we're not real time, we can't control a lot of things, e.g. could be USB latency, adobe photoshop in the background, etc. on say a raspberry pi we could program GPIO pins to have an accurate poll for a joystick.
sound on linux is absurd (tribal knowledge). even the people on the mailing list don't know how it works, constantly saying, don't do this, use a library… often times, source code is the documentation, so good luck spending months learning how to operate at a low level. return when threading is involved.
x11 is more complicated than it needs to be. like many OS APIs, it should do a lot of the dirty work for you, as no-one knows all the minutiae. it's bad to think that OSs are getting harder to work with. however, it is unfair to pick on X11, as there are other APIs (even modern ones) that are just as bad, or even worse. x11 is particularly annoying in that if it crashes, even launching a virtual terminal may not work as keys aren't registered
if only the eclipse cdt debugger was good. it terminates for no reason when running (can sometimes be fixed by switching to run mode and then back to debug). sometimes have to restart to fix. run-to-line doesn't work when inside a sub-routine. the memory browser crashes on entering an address. doesn't give information on stack overflows.
firefox will unexpectedly say we updated in the background and you must restart, or a tab crashed, etc.
Online help forums for QTCreator useless. In reality, only core devs would be able to answer your question thoroughly So, this is where I see the benefit of mailing lists
Often Ubuntu audio mixer just stops working and generates fuzz and have to restart
Firefox facebook video chat: for some reason it just mysteriously stops working. says to restart; did that work? nope! both revving every fucking second is probably why.
Unfortunately you just resign yourself to choosing the software with the fewest or most manageable 'quirks' ('bugs' too offensive…)
How on earth can something so frequently used like TeX have such poor/cryptic parsing error information. Furthermore, pandoc doesn't treat text as text: given a different file extension, it will perform different parsing, e.g. inserts a 'hidden' YAML header if a markdown file
FreeCAD have to click in particular order for symmetry, e.g. point-line-point
QTcreator so complicated to run cross-debugger, have to write down steps
Tim Berners-Lee envisioned a universal place for information exchange. Unfortunately the freedom of creativity has resulted in much of the Web being intrusive and hindering
Order of arguments matters in docker; for some reason --rm must be before -it
default firefox credential store overwrote bitwarden password. ugh
Have so many Ubuntu variants when in reality they're just different packages installed (possibly different kernel parameters), e.g. mate, xubuntu, lubuntu, studio
Ad division of companies seem to have greatest control over design
Go on a news website and try to watch a video. after watching an obtrusive ad, video starts, skip to 30% of way, the same ad repeats…
240V/50Hz mains.
oscilloscope default noise is mains (a scope is 200MHz, 1Gsample/sec as opposed to a multimeter which is maybe 10 samples/sec, so really only applicable for perhaps a logic gate or a 0.1Hz square wave)
Ensure the BNC connector is plugged in correctly (affects probe compensation; similar to banana plugs in a multimeter not being plugged in correctly)
IMPORTANT: ensure offset dials are correct first, i.e. at 0 so that 0 is centred. with a menu open, the end-arrows can still be pressed even if not visible. change to 24 mega points of memory when just wanting wave length (not zooming in?)
Continuous triggering enables us to view from a start point, i.e. a static image not free flowing. Will show before and after the trigger point, i.e. starts in the centre of the screen
single-shot triggering: contact bounce first, then ringing as it stabilises (makes it a balancing act between selecting the triggering level for button press and release). verify signal ringing (e.g. clock signal), i.e. inspect ramp-up/down (measure time to completely bottom out)
Normal-mode triggering the best of both worlds (will show black screen unless triggered)
No real issue with the oscilloscope blowing up if testing a battery-powered/isolated DC power supply. With USB powered, ground must be on ground! Otherwise it will short USB and that port will probably break
Using RS232 (recommended standard) decoder functionality (there is also SPI/I2C decoding). https://www.youtube.com/watch?v=SarsWOCMvjg&t=76s Also investigate PWM
math -> decoder on; event table on (make sure zoomed out enough to view multiple packets (this will increase memory automatically?); increase baud also)
The Bus Pirate has a PIC in a SOIC (small outline integrated circuit) package
Multimeter measure power consumption of MCU? (stm32 nucleo boards have convenient IDD jumper)
programming in AVR assembly actually made me think fondly of modern technology (a rare feeling indeed)
TODO: MAKES THESE SUBSECTIONS OF PROJECTS
TACKLING DIFFICULT PROBLEM – HOTLOADING FILE MODIFICATION TIME BEING READ TOO EARLY. MISINTERPRETING TIMESPEC NANOSEC
SREG NOT BEING SAVED IN AVR INTERRUPT
linux raw ALSA can fall into the trap of being so niche and difficult it is neck beard inducing
use of set -e in bash scripts will exit if a subprocess returns an error, even if 2>/dev/null. Cryptic!
xlib scaled, vsync, refresh rate
ell library read test files for documentation
linux input analysing SDL2 source
polling multiple keyboards lags due to a bug in gnome (inspect the gnome git). switched desktop environment to xfce4; also makes creating shortcuts a breeze
Starting from scratch, try to avoid analysis paralysis
Pragmas and binary searching codebase to check where compiler optimises routine wrong
signed comparison/branching/add/sub specifics and code flow jumps in assembly
copy-pasta from a UDP socket attempting to access it like TCP: doesn't fail, just hangs on read()
a break inside a for loop thinking it was inside a switch; an assignment instead of a comparison inside an if; unsigned minus value causing overflow thereby giving larger than actual; mul instruction copying its result to r0, r1
compiler giving the wrong storage class for a function, even though the issue was an unmatched closing parenthesis; an example of an earlier syntax bug giving other false errors
ALWAYS ENABLE ADDRESS SANITISER! (look at niagara user github page) u8_cursor += byte_counter SHOULD-BE u8_cursor = file_mem + byte_counter;
Having -O2 made program crash as was using memory of stack that in debug mode was never modified
https://c3.handmade.network/blog/p/8486-the_case_against_a_c_alternative
instead of 'feature', we introduce patented terminology like 'user stories' so someone has to teach you it; slows down development. 'business logic' …. just means high-order operations in say main()
Incessant unit-testing: why not test the startup assembly then? Falls apart… What not to test, e.g. assume that hex_to_bin() is simple enough to work? Introducing formulas to determine whether or not to automate something….
certain ‘design pattern’ enforce really long names to conform to pattern. any competent programmer can read use-case specific functions with clearer names “test_CommandHardware_CheckForMsg_Should_GetCharAndAddToPacker_When_BytesAvailable”
file for every test, file for every class. leads to awful build times
How can software better serve humanity? e.g. bloatware causes slowness for aus post workers
With UML, if done to a sufficiently complex level to serve as a blueprint, you may as well have just written the program. The idea of iterative design would be followed by literal architects, however for them it is too costly and time consuming; with software, we have the ability to do this. 1. design (urban planner): separation of code (mental clarity and division of labour); design metrics are temporal coupling (physics outputs data to renderer, so format is important), layout coupling (renderer inherits from opengl), ideological coupling (threading, width, memory), fluidity (one change to the system causes a major crash) 2. programming (architect) 3. compiler (builder)
Most design patterns are just utility classes rather than a way to architect a program
I don’t want to fight the language (Java). Higher level languages should allow you to easily express cpu instructions
reject the idea of TDD driving good design. accept that tests validate design. tdd and bdd have good elements in them, but the dogma is not effective.
code shouldn’t take longer than 10 seconds to compile.
in general we don't add security threats that weren't already present, e.g. loading from a shared object could just as easily override the binary if we have write privileges to both
even though a ‘new’ language won’t crash, it can still have buffer overruns. NullPtrExceptions
c++ struct functions implicitly have a this pointer. virtual functions result in a vtable (array of function pointers) being generated for that struct. therefore, a virtual function call will first go to the vtable, then look up the function, so double indirection (so not a zero-cost abstraction). a normal function call is just call 0x1234, however with a vtable it's mov rax, qword ptr [rsp + 20] etc. (dereferencing pointers)
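A rough C sketch of what this effectively amounts to (names illustrative): each object carries a pointer to a table of function pointers, so a 'virtual' call loads the table pointer, then the entry, then calls it.
#include <stdio.h>
typedef struct Shape Shape;
typedef struct { void (*draw)(Shape *); } ShapeVTable; /* one table per 'class' */
struct Shape { ShapeVTable *vtable; int id; };         /* every object carries a vtable pointer */
void circle_draw(Shape *s) { printf("circle %d\n", s->id); }
ShapeVTable circle_vtable = { circle_draw };
int main(void)
{
  Shape c = { &circle_vtable, 1 };
  c.vtable->draw(&c); /* load vtable pointer, load function pointer, then call: double indirection */
  return 0;
}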
an engine makes things that aren’t likely to be difficult easy (except for linux…) important to know low-level to write new tools. we don’t have a wheel yet.
for any non-trivial task, scripting languages become a hindrance: no static type checking, no real debugger, slow, not as capable. there are complications in software we have wrongly convinced ourselves are necessary, e.g. scripting for hotloading. hotloading C is far superior, as C is more powerful and we can use the same debugger (using Lua is a downgrade). build systems can be useful as they allow for incremental builds (however, this negatively reinforces people to only make small changes) (the speed increase may only be noticeable for a large, complex code base). they are also useful for managing cross compilation (libraries have to be pulled in and compiled also). the idea of incorporating a scripting language into a game was a failed experiment of the mid 2000s. things like a visual based interface are fine as they're constrained
with closed source it is often the case that a company employs someone to oversee the experience of the software and have QA; therefore, better quality with this higher layer of checking than open source.
the best bet in safeguarding security is to reduce the attack surface, which means to reduce the number of lines of code.
Don’t restrict right side of bell curve Let your aces be aces Being an ace involves having an opinion Most influential software written largely by one person, e.g Linux, Unix, git etc. Then a team is assigned to maintain it. Fallacy about solo programmer productivity requiring large teams. Design by committee pushes design to middle of bell curve as opposing views average out
templates add complexity in the debugger (no actual names). only really useful if you save a ton of code (not much code to just write each implementation) if templates are necessary then just use meta-programming as it is much more powerful.
many programs treat memory as an infinite resource. allocating memory introduces a failure case, and I'm not a fan of allocation festivals. we can create our application with minimal failure cases (cannot do this with the platform layer).
uml and diagrams in general are a waste of time (its just code you would write and often fails to capture subtleties) you should become more proficient at reading code and understanding its relationship.
oh no, we had a security bug in our development version! (printf and friends. printf %f defaults to double)
understanding history is important; the c runtime library's way of packing return failure information is the reason for inverse truthiness
downloading unverified external tool, always good to get more viruses on my machine… unless playing at EVO with some maxed out razer device, not feasible to hit that hard I guess RTFM is the answer to that…
build tools are more of a hindrance! always asking yourself what flags are being passed (linking and compiling are separate steps), what files it is picking up, what the CWD is, etc.
Go against merge requests from strangers and just automatically allow a group of trusted people. This avoids the problem where you work for hours only to have it blocked
Many online communities are anti-engineering in that they don’t embrace criticism.
Doing anything on the web takes a lot longer than it should, dealing with a myriad of software with different odd conventions. Lack of functionality/integration with hardware will lead to collapse. Many features are lacking, like a type system, which they try to emulate; many features they do have, like garbage collection, they try to avoid.
Scripting languages can cause heap fragmentation. Why not just use a real language, as we want robustness (scripting languages are dependent on interpreter speed) and type checking
Fundamental lack of awareness that there is a better way to program. We all make slow software because of lack of time, however to say it can't be done is a fallacy. The cultural differences with these people make it a fool's errand to try and get them to program correctly, e.g. the time visual studio users think is fast is less than 10 seconds?!
Const rarely finds bugs that I have, i.e. writing to a variable I shouldn't. In saying that, you should use features of the language that help you catch bugs.
Apple store is hardly a free marketplace. They can just block your app for any reason
DRM and engine usage make using games in the future difficult (e.g. museum)
Also, the excessive testing is pushed by web where the poor languages dictate heavy testing. Testing first makes no sense as the app may change
Audiophiles think they hear things that aren’t there
Much like the food industry has organic vs processed, we need a term for games that are made by people who love games and care about the experience as opposed to large companies concerned with making money (indie vs triple a)
Sometimes crashing is good because it signifies a serious problem that can be rectified immediately rather than some other insidious hidden bug. The problem is not mapping an id/pointer (or whatever std:: c++ people would use) to an entity correctly (i.e. a correspondence problem); the symptom of this problem will differ depending on the implementation, e.g. a pointer will crash the program.
Crazy nuttiness of command line parsing UNIX. No one remembers single commands except privileged ls. Plus sign turning off, minus on, etc.
The flip side of high level languages is loss of capability in controlling the cpu Run time languages are slower and more complex Most scripting languages aren’t designed for modern hardware, e.g. simd, threads
Scripting once, deploying everywhere is broken. You must test what you write on each machine. Computers aren’t wonderful abstractions we wish they were
C was created to solve the problem of a portable high-level assembly language for UNIX. C++ is a frankenstein language with bjarne just adding features in, e.g. he just wanted C with classes. Go and Rust were designed for specific purposes, although I don't care about safe memory features or want garbage collection. Complexity creates bugs; C++ is incredibly complex. People just stick to a subset of c++, which may be different to yours. Exceptions defer error handling and also bring about an ethos of more errors when they should just be states of your program.
roughly 90% of games played on PC are pirated. however intensive anti-piracy makes it harder for people to play your game. yet, in line with this, there is no code of ethics in computer science for wasting people's time. so, get more respect from the community? drm just introduces another way for the software to waste people's time (however, there may come a day when it's required…)
Successful ideas are higher-level languages over assembly and compile-time type checking. Success is solving a problem people have talked themselves out of solving. Why does Twitter need 4000 employees? SpaceX is roughly the same size and they put rockets into orbit! They make problems difficult by the engineering culture they create: 18000 classes overflowed a java function pointer buffer; they are deluded in thinking this way. Over-concerned with UML, state diagrams, acronyms, etc. Nightmarish distractions/unproductive
* High level 'don't worry about implementation details' - chrome c++: entering a 1 char created 25000 memory allocations with std::string
* Abstractions - more code makes the program as a whole more difficult to understand; abstractions just hide the low level, which is the most important part and will constrain the system; java needs a huge call stack to make an http request; make lightweight abstractions only
* Functional programming - Haskell has been around since 1990 and hasn't taken over; imperative is clearer; functional breaks down over non-trivial work
* Data hiding - poor cache performance, redundant code
* Excessive inheritance - poor cache performance
* Exceptions - constant cognitive tax on remembering what throws what, and huge verbosity doing so; also don't know what the program will throw at any given time
* Commenting - write good comments and garden them as they can rot easily
* Re-use - only effective when used to a small extent; humans are bad at understanding layers upon layers; you don't make something better by making it more complicated
All aforementioned ideas can be contextually useful
The backbone of the web, tcp/ip, has scaled tremendously well over 60 years. The web software stack is the opposite: browsers rev every 15 minutes, everything is worse than it was before, JavaScript slower than anything before it. Nodejs, php etc. were born out of a good idea to make a particular type of software development rapid. However they fail to effectively utilise system resources to solve a problem that would be easy in C. Often people are only concerned with how fast they can write this, without caring about quality. In the web sphere google, facebook etc. know their competitors aren't going to be concerned with quality, e.g. gmail is incredibly slow and littered with bugs (typing in a name gives the wrong result as it hasn't finished a round trip to the server), so they use these technologies that help get something working quickly but will be janky. I want to write software where quality matters and software is enjoyable to use. The web needs to acknowledge this. They have different thinking, e.g. there's network latency so we don't have to care about performance; no, you need more effort to hide this latency! Although different tradeoffs are made for different contexts, on the whole programming is programming. On the web it's almost impossible to write good software as you have to deal with complexities that don't have to be there. The best minds are being put to making people click ads
Why is it big news regarding the command prompt? Battlefield is orders of magnitude more complex and runs faster. MSDOS could see all physical memory; modern cpus have an mmu
must be able to judge the quality of something from your own criteria, not simply on social cues/norms
memory safety not of that much concern for programming. however, for security I suppose it is? wish security wasn’t an issue https://github.com/KULeuven-COSIC/Starlink-FI/
various daughterboard accoutrement nomenclature, e.g. arduino shield, beaglebone cape, raspberry pi hat, ST Zio/Morpho connectors etc.
discordant packaging naming in ubuntu ‘dummy packages’, e.g. qemu is package name, however binaries are named differently.
still, installing an OS is a cross-your-fingers exercise. A single USB presents multiple boot options; choosing between two seemingly identical uefi entries, only one will give you WIFI during install. ubuntu forums have contemporary posts mentioning install issues. this finicky nature means just have a barebones install and run separate OSs in a virtual machine. chose xfce4 for the gnome repeat-key bug and easy shortcuts
the lack of type systems in Javascript and Python leads to extensions that retrofit them (e.g. TypeScript, type hints)
Vulkan like filling in a centrelink form, so much setup
Any overarching ideology like 'everything is objects' or 'everything is data' is pretty arbitrary. Certain approaches are useful for particular problems. I'm sensitive to input lag in a text editor, so use vim. Intellisense pop-ups are slow
Although large groups can help maintain, the introduction of many can create imperfections. This extends to all software companies: they do a lot of good things, but will always have things they are dysfunctional at, e.g. documentation, disparate build systems, non-uniform design practices
Loss of generational knowledge, i.e. people forgetting the genesis of influential software written by individuals
Growth of machine learning is due to the quantity of computation available. Localised improvements, overall degradation (rendering, update processing etc. far more involved). Wrong perception that performance takes too much time, so we won't do it, but 'we could'. However, if they have never done it, how could they say they could? Growth of startups is just finding a niche, rather than writing revolutionary software
Testing nomenclature so diverse, yet could be boiled down to a handful of terms, e.g. Regression testing just re-running tests on new changes, which is something you do implicitly
JSON is not even robust enough to allow trailing commas that even C89 arrays support …
Now have ‘memory leak’ finder tools for interpreted languages like Javascript…
IPv6 classic example of second-system effect
Windows: have to enable showing of file extensions so you don't unknowingly end up with a double .txt extension
Although I’m in favour of an OS wrapper, these don’t give you all the control you need and you end up having to write your own OS specific code
Unfortunately can’t even use modern GNU AVR assembly with numbered labels, advanced macros and location counter
C stdlib BUFSIZ macro vastly different across OSs, e.g. 1K, 8K, 512bytes
Concept of non-daemonic and daemonic ‘threads’ in python
w3m-img installs to non default path /usr/lib. Have to resort to X11 python ueberzug
difference in size between C and CPP header files for vulkan SDK is huge, leads to much slower compile times. contains some useful things, but not that useful
Unfortunately software like npm is poorly designed, so it will inevitably get 'fast/lightweight' variants
Many tutorials on low-level like Vulkan, X11 can be wrong so need to understand spec
Fibres are green threads, i.e. not OS or hardware threads so not actually faster. They just allow interleaved execution which is used in web to not freeze UI on large linear execution
Unfortunate trend on the Internet for people to ask why aren't you using what I'm using, e.g. VSCode, as opposed to asking why you are doing things your way. Tech news propagates shallow information fast; in aggregate, most of it is not important, e.g. a new bright-blueberry distro release
The distinction between new as in fashion and new as in forward progress can be overlooked in tech. Unfortunately, most is the former
Despite Wayland (and Mir) being far more sane (merging display server and compositor into one), it's still not fully supported
Being able to sift through low-quality tech news essential, e.g. ‘new’ database library, css library, VR headset, distro, container library etc. Amazon, Tesla, Twitter, Netflix endeavours Cancer, alzheimer trial drugs Fusion energy
Have to update Ubuntu version if want to easily get compiled version of modern gcc
Unfortunate mnemonic repetition of nonsensical statements in programming, e.g. manual memory management is hard (overly simplistic and misleading)
Auto, i.e. type inference, is good when the type is not actually known. However, it's often used in place of a name that is too long to type, leading to unclear types and over-complication
Virtually every week a new ‘container-specifier’ project appears on Github
C grew out of existing code. All new languages trying to be a top-down design on the opinions of a single programmer (BDFL) Just have a simple language with ideal metaprogramming/code generation facilities so the programmer can decide, not the language designer
Whilst github codespaces seems cool, the reason for having so many containers is that the build systems for each project are incredibly complex and require many dependencies
Sometimes have to rephrase something for the compiler to generate more optimal code, e.g. div /= 2; div /= 2; may generate more efficient instructions than div >>= 2 for avr-gcc
The advent of AI assistants just gives you more settings to disable at factory-reset time on a phone. In 2022, a modern Samsung Galaxy S22 Ultra won't even be recognised over USB to transfer files on Ubuntu and Windows …
With security-conscious individuals, surely more dependencies is bad from this viewpoint. No matter…
With the proliferation of web technologies, now say AOT (ahead-of-time) compilation for what was normally just compilation
Reason for GUI proliferation is that installation procedures are so complex, e.g. install and use esp-idf requires python virtual environment
vinegar and bicarb for walls and drain cleaning
cool idea of SDR (software defined radio) and RF analyser https://greatscottgadgets.com/sdr/1/ https://github.com/ainfosec/FISSURE
how does quantum computing work?
what frequencies/technologies are regulated like in ISM; by extension what other regulations are there?
why floating point inaccuracies
examples of hardware virtualisation instructions/codecs? (e.g. multiway MMU?)
vulkan renderer: https://www.youtube.com/watch?v=BR2my8OE1Sc&list=PL0JVLUVCkk-l7CWCn3-cdftR0oajugYvd&index=1
how does something like red hat and android get around gplv2 of linux kernel
the SIG introduces bluetooth 5, which states higher data rate, long-range BLE etc. are their decisions informed by technology advances? (seems not to be the case with 4G LTE…)
Understand memory alignment and the cost of unaligned accesses? (the ABI defines alignment of C types?) (is it due to common programmer workflows and fewer transistors required?) for modern hardware, trying to read data from an unaligned address can result in 2 reads and a combine. so, there are compiler extensions to specify alignment (__declspec(align) for the stack and _aligned_malloc for the heap on MSVC; alignas/aligned_alloc in C11). why not just always automatically align things? to conserve space? when to use these alignment extensions? when doing performance timing/tuning?
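As a sketch of the alignment extensions (C11 spellings; MSVC uses __declspec(align(N)) and _aligned_malloc instead; 64-byte cache line assumed):
#include <stdalign.h>
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
  alignas(64) char stack_buffer[256];           /* cache-line align a stack buffer */
  float *heap_buffer = aligned_alloc(64, 1024); /* size must be a multiple of the alignment */
  printf("%p %p\n", (void *)stack_buffer, (void *)heap_buffer);
  free(heap_buffer);
  return 0;
}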
is the distinction between unicast (device to device), multicast (device to some devices) and broadcast (device to all devices) only discernible in the packet format, i.e. must it be parsed by the network card first? is the signal strength changed?
Software laws, e.g. products have 4k in the name but image quality is 1080p; Elon Musk making false claims; most dash cams use the same SoC and image sensor, just different housing; Garmin products require signing up for a Garmin account and agreeing to collection of data. what are the laws for misinformation, e.g. tesla, '4k' dash cams, misleading stats: M1 chip faster (only faster than Intel-based macs), 'we don't track in traditional ways', 'tiktok keylogger technology exists but not using'
TODO: favourites tab for morning viewing videos: common sense skeptic; techquickie; handmade podcast; network next
Resident Name Unit Number / 17 High Street University Terraces Kensington NSW 2033 Australia
chefgood: 20 x $209 mymusclechef: 20 x $219 WELCOME20
dinner + lunch: $220 breakfast: $100 rent: $345 phone: $70
leftover: $265
salary: ≈$52,000
fortnightly: $1400 + quarterly: $4000 = weekly: $1000
communicating with busy people: https://threadreaderapp.com/thread/1562510420644343810.html
how could thread local storage be implemented on a hyper-threaded cpu?
computer science papers https://blog.acolyer.org/
QUESTIONS:
security so vast and not something I want to devote time to: https://leveleffect.referralrock.com/l/JOHNHAMMON07/
the constellation we can't see, as it's blocked by the sun, is the zodiac sign for that month. not new territory to regain interest in the past, e.g. the renaissance grew interest in ancient greek astrology. zodiac is latinized from zodiakos, meaning circle of animals (all zodiac names are latinized greek). vernal equinox is the beginning of the astronomical year; daylight starts getting longer until the summer solstice, hence spring is when the zodiac starts. aries (march 20th) -> ram; taurus (april) -> bull; gemini (may) -> twin brothers; cancer (june) -> crab; leo (july) -> lion; virgo (august) -> woman; libra (september) -> scales of justice; scorpio (october) -> scorpion; sagittarius (november) -> archer's arrow; capricorn (december) -> goat; aquarius (january) -> water bearer; pisces (february) -> opposing fish
IS IT POSSIBLE THAT AN ASSEMBLY INSTRUCTION LIKE RDTSCP COULD BE TRAPPED BY PROGRAM LOADER? https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints
NOTE(importance of reading programming papers…. after handmade finished)
vector math routines (obtaining the cross product from column vector form):
* when drawing vectors in a physical sense, keep in mind they are rooted at the origin (even if drawings show them across space)
* whenever doing vector addition/subtraction, remember the head-to-tail rule (their direction is determined by their sign); could also think that you subtract whenever you want to 'go away' from something
* dot product transpose notation is useful for emulating matrix multiplication
* unit circle, x = cosθ
* the dot product allows us to project a vector's length onto a unit vector
* the dot product allows us to measure a vector on any axis system we want by setting up two unit vectors that are orthogonal to each other
* a simple plane equation with d=0 will pass through the origin (altering d shifts the plane up/down)
* the cross product gives a vector that is orthogonal to the plane that the two original vectors lie on (length is |a|·|b|·sinθ), so it really only works in at least 3 dimensions
* with units, e.g. for a camera, start with an arbitrary 'unit' definition; later move onto more physical things like metres
* by applying a scaling factor to a direction vector, we can move along it
* world space coordinates: the camera position is based on these; the camera will have its own axis system, which we determine, and then use the cross product based on what we want
* understand the dot product's equivalence with the circle equation
* for multiplication of vectors, be explicit with a hadamard function
* (IMPORTANT: have a reciprocal square root approximation, which is there specifically for normalisation; much faster cycle count and latency than a square root) (sketched below)
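A small sketch of the routines above in plain C (the SSE rsqrtss intrinsic supplies the reciprocal square root approximation mentioned):
#include <xmmintrin.h>
typedef struct { float x, y, z; } V3;
static float v3_dot(V3 a, V3 b)      { return a.x*b.x + a.y*b.y + a.z*b.z; }
static V3    v3_hadamard(V3 a, V3 b) { return (V3){ a.x*b.x, a.y*b.y, a.z*b.z }; }
static V3    v3_cross(V3 a, V3 b)    { return (V3){ a.y*b.z - a.z*b.y,
                                                    a.z*b.x - a.x*b.z,
                                                    a.x*b.y - a.y*b.x }; }
static V3 v3_normalise(V3 a)
{
  /* approximate 1/sqrt via rsqrtss: far cheaper than sqrt followed by divide */
  float inv_len = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(v3_dot(a, a))));
  return (V3){ a.x*inv_len, a.y*inv_len, a.z*inv_len };
}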
noise is randomness. white noise is complete randomness. blue noise (harder to generate) is randomness with limitations on how close together points can be (more uniform)
pride cometh before the fall! six of one, half a dozen of the other. have our cake and eat it too. ad infinitum
Borrowing money: make clear you're the right person to give the money to, that you understand what you're doing and the game's process is figured out, and give proof you will complete it. Here is the game, why I'm good at it, why people will like it and why I'm going to succeed at developing it. Interviews: the ability to explain problems you have encountered on projects
time -p; getrusage();
callbacks less CPU intensive than polling
Saying one instruction is faster than another ignores the context of execution, e.g. mul and add have the same latency, however due to pipelining the mul execution unit might be full. TLS vs atomics, e.g. TLS is a series of instructions determined by the OS and compiler; atomics depend on how other cores are run and the synchronising necessary with them. So, must measure which is faster for the particular situation
use $(time) for single line, use $(ctime)
Can pin a thread to a core
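On Linux this might look like the following sketch (pthread_setaffinity_np is a GNU extension, hence _GNU_SOURCE; the core index is illustrative):
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
/* pin the calling thread to the given logical core; returns 0 on success */
int pin_to_core(int core)
{
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(core, &set);
  return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}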
Can run programs in kernel space with eBPF
Frequent context-switching will give terrible cache coherency
adding restrict is also useful to prevent aliasing and thereby might allow the compiler to vectorise say array loops
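For example (a sketch), promising the compiler the two arrays cannot overlap may let it emit SIMD for the loop:
/* without restrict the compiler must assume dst and src may alias,
   forcing conservative element-by-element code instead of vectorised loads/stores */
void scale_add(float *restrict dst, const float *restrict src, float k, int n)
{
  for (int i = 0; i < n; ++i)
  {
    dst[i] += k * src[i];
  }
}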
Before cpus increased in single thread execution speed. Now more cores. It’s a topic of research to convert single threaded into multithreaded for emulation. This is why emulation of something like the GameCube (powerpc) is slow. Furthermore, due to hardware irregularities that programs relied on may take hundreds of instructions to emulate If actually a simple translation, then should run close to native speed. This is reality of emulating hardware with hardware
CISC gives reduced cache pressure for high-intensive, sustained loops
log2(n) number of bits for decimal https://en.algorithmica.org/hpc/cpu-cache/associativity/
using genetic algorithm/machine learning to optimise for us https://zeux.io/2020/01/22/learning-from-data/
CPUs try to guess what instructions lie ahead (speculative execution). The cost of an incorrect guess (pipeline flush) is expensive. So we want to get rid of unpredictable conditional jumps, ideally replacing them with conditional movs or arithmetic branchless techniques. Endianness (register view), twos complement (-1 is all 1s). Branchless programming is essentially SIMD
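A sketch of an arithmetic select replacing an unpredictable branch (compilers may emit a conditional move for either form, but the branchless version makes the intent explicit):
#include <stdint.h>
/* branchy: a mispredicted jump flushes the pipeline */
int32_t select_branchy(int32_t cond, int32_t a, int32_t b)
{
  return cond ? a : b;
}
/* branchless: mask is all 1s when cond is non-zero, else all 0s */
int32_t select_branchless(int32_t cond, int32_t a, int32_t b)
{
  int32_t mask = -(int32_t)(cond != 0);
  return (a & mask) | (b & ~mask);
}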
If variable clock speed, cpu could detect not using all cores and increase single core clock
Not memory bound is best case for hyper threading Intel speeds optimised for GPR arithmetic, boolean and flops Intel deliberately makes mmx slow
Low cache associativity means fast lookup but a lot of misses and thus eviction policy like LRU
Count cycles to counteract possible thermal throttling Hyperthreads useful only if different execution unit Cpu reads memory from cache and ram in cache lines (due to programmer access patterns). Each item in cache set is cache line size
if apple computers use RISC ARM in the M1, why is CISC necessary? (only because of Intel legacy?) the emphasis of CISC is to simplify assembly (e.g. more addressing modes), thereby reducing the size of the binary (fewer instructions) and increasing cache coherency. RISC requires fewer transistors to implement complex hardware, but will it make optimising harder for the compiler? single cycle instructions (reduce cycles per instruction)
when looking at a pointer, to optimise compiler must know whether it can assume it points to a local var or not. so, easier to eliminate aliasing with non-pointers
when viewing from application in a sandboxed environment like a phone, total RAM less than installed as portion reserved for kernel
simplistically RAM 50ns and HDD 10µs? faster to read than write
2.5bln (cycles/sec) * 8 (simd width) * n (execution units) * 2 (cores), assuming instructions have a throughput of 1. so, given say 64 bytes/cycle of L1 bandwidth, 64 / 4 gives how many floats per cycle from L1 cache. in general we are not streaming from memory the entire time (would probably hit a cache bandwidth limit)
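A purely illustrative back-of-envelope in code (made-up machine numbers, not a specific chip):
static double peak_flops_estimate(void)
{
  double clock_hz   = 2.5e9; /* cycles per second           */
  double simd_width = 8.0;   /* floats per SIMD instruction */
  double exec_units = 2.0;   /* independent mul/FMA ports   */
  double cores      = 2.0;
  return clock_hz * simd_width * exec_units * cores; /* = 8e10, i.e. ~80 GFLOP/s */
}
/* similarly, with 64 bytes/cycle of L1 bandwidth: 64 / 4 = 16 floats delivered per cycle */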
// undefined behaviour if not true ALWAYS_INLINE void assume(bool cond) { if (!cond) { __builtin_unreachable(); } } times when manual inlining is required: https://www.youtube.com/watch?v=B2BFbs0DJzw
Unless you're in web where everything takes years to load, as single-threaded performance is largely stagnant you will have to utilise parallelism if you want performance. This is very difficult, and as with single-threaded code, generic libraries (which could actually be very specific) will create bloated, poorly performing code bases. Multithreading is a whole new discipline built on top of single-threaded programming. There are a lot of pitfalls for performance (balancing wanting things local, however must share to utilise cores). If you care about performance for anything, you should care about cache misses. Memory bandwidth and caches are major reasons for a CPU attaining performance. You have to think about where memory actually lives and how it is transferred around
Optimiser allows for lexical scoping of stack variables. However, for optimiser to inline will have to get rid of pointers to prevent aliasing
stack and heap memory are the same physical thing, so only more efficient if memory was hot, i.e. recently touched
Computers are faster than you think, e.g. instructions, clock cycles, cores; very large numbers. With an M.2 drive it should be quick
it may be easier for an optimising compiler to work with things passed by value as opposed to by pointer. so, if we need to modify something we may have to return the value in a functional programming style; however, this is error prone. the compiler cannot optimise pointers, e.g. if setting two values through the same pointer, the compiler cannot assume that both values are the same, as another pointer may point to the same address. this is aliasing.
we see contiguous memory in virtual memory, however in physical memory it is almost certainly going to be fragmented
compilers can auto-vectorise loops for us (and other operations if we perform say 20 of them). so, floats will be twice as fast as doubles (twice as many fit per SIMD register, even though same latency and throughput)
optimising is a very precise process. only do when you have code that is working and you know will keep. games by their very nature are about responsiveness, so optimisation and low-latency is important. I like programming with this as a mentality.
with pure compiler optimisations, i.e. code we have not optimised ourselves, a 2x increase is not unexpected. (code we optimised not as much)
optimise for the worst case (looking out on the whole world) not the best case (in front of a wall, don't render what is behind). we care about the lowest framerate, not the highest.
virtually never use lookup tables as ram memory is often 100x slower (so unless you can’t compute in 100 of instructions)
When many people say too much effort involved in optimisation, they are generally thinking of point 1
sometimes things run slower than they should, however there's only so much we can replace, e.g. the structure of many OSs is based on legacy code, so simply outputting to stdout may go through the shell then the kernel etc. also, we may have to deal with pessimised libraries. in these cases: * isolate the bad code, i.e. draw a hard boundary between your code and theirs by caching calls to them * do as little modification to the data coming in from them as possible (no need to say put it in a string class etc.)
People think that it's slow, but 'it won't crash' (because of the interpreter). Performance is critical in getting people excited about what you do, e.g. windows CE was laggy; the iphone was performant but had fewer features and changed the market. low latency is more desirable, even if fewer features
count number of math ops in function and control flow logic that is evaluated branches can get problematic if they’re unpredictable
when inspecting measurements of a microarchitecture, consider latency and throughput (how soon it can be issued again). so, FMUL may have a latency of 11, however a throughput of 0.5, so can start 2 every cycle, i.e. issue again (cpus these days are incredibly overlapped even if single-threaded). so throughput is more of what we care about for sustained execution, e.g. in a loop
however, these numbers are all assuming the data is in the chip. it’s just as important to see how long it takes to get data to the chip. look at cache parameters for microarchitecture; how many cycles to get from l1 cache, l2 cache, etc. when get to main memory potentially hundreds of cycles
bandwidth of L1 cache say is 80 bytes/cycle, so can get 20 floats per cycle (however, based on size of L1 cache, not really sustainable for large data)
http://igoro.com/archive/gallery-of-processor-cache-effects/
(could just shortcut this and just see if flops is recorded for our chip)
so, using these rough numbers we should be able to look at an algorithm (and dissect what operations it’s performing, like FMULs), know how much data it’s taking, and give a rough estimate as to how long it could optimally take (will never hit optimal however) IT’S CRITICAL TO KNOW HOW FAST SOMETHING SHOULD RUN
REASONS SOFTWARE IS SLOW: 1. No back-of-envelope calculations (people aren't concerned that they are running up to 1000 times slower than what the hardware is capable of). These calculations involve say looking at the number of math ops to be performed in the algorithm and comparing that to the perfect hardware limit 2. Reusing code (20LOC is a lot; using things that do what you want to do, but not in the way you want them done; often piling up code that is ill-fitting to the task, e.g. we know this isn't a regular ray cast, it's a ray cast that is always looking down) 3. When writing, thinking of goals ancillary to the task (not many places teach how to actually write code; all the high-level abstractions about clean code, templates/classes etc. are not about what the computer actually has to do to perform the task) (there is no metric for clean code; it's just some fictional thing people made up)
WHENEVER UNDERSTANDING CODE EXAMPLES: 1. COMPILE AND STEP INTO (NOT OVER) IN THE DEBUGGER AND NOTE THE HIGH-LEVEL STEPS PERFORMED. look at these steps for duplicated/unnecessary work (may pollute the cache) (perhaps even asking why the code was written the way it was). could we gather things up in a prepass, i.e. outside the loop? if allocating memory each cycle, that's game over for performance. do we actually have to perform the same action to get the same result, e.g. a full raycast is not necessary, just a segment on the grid. O(n·m) is multiplicative, not linear O(n); big oh is just an indication of how it scales, and it could be less given some input threshold (big oh ignores constants, hence looking at asymptotic, i.e. limiting, behaviour). now, once the code is reduced, look at minimising the number of ops
cpu front end is figuring out what work it has to do, i.e. instruction decoding
e.g simd struct is 288 bytes, 4.5 cache lines, able to store 8 triangles
understanding assembly language is essential in understanding why the code might not be performing well
branch prediction necessary to ensure that the front-end can keep going and not have to wait on the back-end
execution ports execute uops. however, the days of assembly language registers actually mapping to real registers is gone. instead, the registers from the uops are passed through a register allocation table (if we have say 16 general purpose registers, table has about 192 entries; so a lot more) in the back-end this is because in many programs, things can happen in any order. so to take advantage of this, the register allocation table stores dependency chains of operations (wikichips.org for diagram) from execution port, could be fed back into scheduler or to load/store in actual memory
when looking at assembly, when we say from memory, we actually mean from the L1 cache
xmm is an sse register (4 wide, 16 bytes); m128 is a memory operand of 128 bits; ymm is 8 wide. 1p01 + 1p23 means issue 1 micro-op on either port 0 or port 1 and one micro-op on either port 2 or port 3. so, we could issue the same instruction multiple times per cycle, i.e. throughput of 0.5
micro-op fusion is where a micro-op doesn't count towards your penalty as it's fused with another; with combined memory ops, e.g. vsubps ymm8, ymm3, ymmword ptr [rdx], this is the case. so, if a compiler were to separate this out into a mov and then a sub, not only does this put unnecessary strain on the front-end decoder, it also removes micro-op fusion as they are now separate micro-ops (important to point out that I'm not the world's best optimiser, or the world's best optimiser's assistant, so perhaps best not to outrightly say bad codegen, just say it makes me nervous)
godbolt.org is good for comparing compiler outputs and possibly detecting a spurious load etc.
macro-op fusion is where you have an instruction pair that the front-end will handle for you, e.g. an add and a jne will merge to add+jne, which sends just the 1 micro-op through
uica.uops.info gives the percentage of time an instruction was on a port (useful for determining bottlenecks, e.g. a series of instructions all require ports 1 and 2, so they cannot be parallelised easily). so, although the best case might be, say, finishing the loop body every 4 cycles, this port bottleneck will push the real cycles-per-iteration higher, i.e. lower throughput
some levels of abstraction are necessary and good, e.g. higher level languages to assembly
Optimise: gather stats -> make estimate -> analyse efficiency and performance
file size https://justine.lol/sizetricks https://codegolf.stackexchange.com/questions/215216/high-throughput-fizz-buzz/236630#236630
more important to understand how the CPU and memory work than the language involved. in an OS, you will get given a zeroed page due to security concerns
likely() macros for branch prediction compiler optimisations (https://akkadia.org/drepper/cpumemory.pdf, pg 56)
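A minimal sketch of such macros, assuming the GCC/Clang __builtin_expect extension (LIKELY/UNLIKELY are hypothetical names; on other compilers they compile away to nothing):

    // Hint the compiler which branch is the common path so it can lay out code accordingly.
    #if defined(__GNUC__) || defined(__clang__)
        #define LIKELY(x)   __builtin_expect(!!(x), 1)
        #define UNLIKELY(x) __builtin_expect(!!(x), 0)
    #else
        #define LIKELY(x)   (x)
        #define UNLIKELY(x) (x)
    #endif

    // usage: keep the error path out of the hot fall-through path
    // if (UNLIKELY(bytes_read < 0)) { /* handle error */ }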
recording information: we want to understand where we are slow with vtune, amd uprof, arm performance reports. Next, determine if IO bound, memory bound etc.
To determine performance, must have some stable metric, e.g. ops/sec, to compare against, e.g. measure total time and number of operations. Hyper-threading is useful in alleviating memory latency, e.g. while one thread is waiting to get content from RAM, the other hyper-thread can execute. However, as we are not memory bound (just going through pixel by pixel and not generating anything intermediate; it will all probably stay in L1 cache), we are probably saturating the core’s ALUs, so hyper-threading is not as useful
Inspecting the assembly of our most expensive loop, we see that rand() is not inlined and is a call festival. This must be replaced. Essentially we are looking for mathematical functions in our hot path that could be inlined and aren’t. When you want something to be fast, it should not be calling anything; if it does, you have probably made a mistake. Also note that it is using SIMD instructions, however not to their widest extent, i.e. scalar single ‘ss’ ops; want to replace these with packed single ‘ps’ ops
we have the option of constructor/destructor pairs if we want to determine the best possible time if all caches align etc. ‘hunt for minimum’, e.g. record the minimum execution time over loop iterations, or re-run if a smaller time is yielded. alternatively, we could develop a statistical breakdown of values (could see moments when the kernel switches us out etc.)
(IMPORTANT save out configuration and timing information for various optimisation stages, e.g. ./app > 17-04-2022-image.txt)
Agner Fog’s optimisation website; ‘What’s a Creel?’
threading: Observe that CPU percentage use is not close to 100%. For multithreading, often have to pack into a 64 bit value to perform a single operation on it, e.g. delta = (val1 << 32 | val2); interlocked_add(&val, delta)
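A minimal sketch of that packing, assuming the GCC/Clang __sync_fetch_and_add builtin for the atomic add (bundle_add and the hi/lo counter names are hypothetical; interlocked_add above is the notes’ own wrapper):

    #include <stdint.h>

    // Pack two 32-bit counters into one 64-bit delta so a single atomic add
    // updates both fields at once.
    static inline void interlocked_add_u64(volatile uint64_t *value, uint64_t delta)
    {
        __sync_fetch_and_add(value, delta);   // builtin atomic add
    }

    static inline void bundle_add(volatile uint64_t *value, uint32_t hi, uint32_t lo)
    {
        uint64_t delta = ((uint64_t)hi << 32) | (uint64_t)lo;
        interlocked_add_u64(value, delta);
    }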
When making code multi-threaded, segregate the task by writing a prototype ‘chunk’ function, e.g. render_tile. Then write a for loop combining these chunk functions. Before entering the chunk function, good to have a configuration printout, e.g. num chunks, num cores, chunk dim, chunk size, etc.
When dividing a whole into pieces, an uneven divisor will give less than what’s needed, so use (total + divisor - 1) / divisor to ensure always enough (see the sketch below). We will want this calculation to be in the last dividing operation, e.g. tile_width then tile_count calculated, so use it on tile_count. Associated with this calculation is clamping, to handle the added extra exceeding the original dimensions. For getting the proper place in a chunk, call a function wrapper for the pointer location per row
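A minimal sketch of the rounding-up division plus the clamp, with hypothetical names (image_width, tile_width):

    #include <stdint.h>

    uint32_t image_width  = 1280;
    uint32_t tile_width   = 64;
    // Round the tile count up so an uneven divisor still covers the whole image.
    uint32_t tile_count_x = (image_width + tile_width - 1) / tile_width;
    for (uint32_t tile_x = 0; tile_x < tile_count_x; ++tile_x)
    {
        uint32_t x_min = tile_x * tile_width;
        uint32_t x_max = x_min + tile_width;
        if (x_max > image_width) x_max = image_width;   // clamp the last tile's overshoot
        // ... operate on columns [x_min, x_max)
    }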
(May have to inline functions?) Next we want to pass each chunk onto a queue and then dequeue them from each logical core? So, have a WorkOrder that will store all information required to perform the operation on a chunk, i.e. all parameters of the render_chunk function (may also store entropy for each chunk, i.e. a random number series). Then a WorkQueue that contains an array of WorkOrders, with the total number equalling the number of chunks. So, the original loop iterating over chunks now just populates the WorkOrders. Now, in a while loop that runs while there are still chunks to execute, we call the render_chunk function and pass in the WorkQueue. The render_chunk function will increment next_work_order_index and return true if more is to be done (a sketch follows below)
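A minimal sketch of that shape; the notes only fix the names WorkOrder, WorkQueue, render_chunk and next_work_order_index, so the fields and the __sync_fetch_and_add builtin here are assumptions:

    #include <stdint.h>

    struct WorkOrder            // everything render_chunk needs for one chunk
    {
        uint32_t x_min, x_max, y_min, y_max;
        uint32_t entropy;       // per-chunk random number series seed
    };

    struct WorkQueue
    {
        WorkOrder *work_orders;
        uint32_t work_order_count;
        volatile uint64_t next_work_order_index;   // bumped atomically by workers
    };

    // Returns true while there was still a chunk left to render.
    static bool render_chunk(WorkQueue *queue)
    {
        uint64_t index = __sync_fetch_and_add(&queue->next_work_order_index, 1);
        if (index >= queue->work_order_count) return false;

        WorkOrder *order = queue->work_orders + index;
        // ... render the region [x_min, x_max) x [y_min, y_max) using order->entropy
        return true;
    }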
When spawning the actual worker thread functions, have the same while loop calling the render_chunk function as for core 0 (the number of threads to spawn you would think should equal the number of logical cores? however exceeding them may increase performance?) (this debate of manually prescribing the core count applies to the chunk size as well; perhaps the sweet-spot for my machine in balancing context switching and drain-out is to manually prescribe their size as opposed to computing them off the core count) (collating information into the WorkQueue struct helps for printing out the configuration) (setting up this way, we can easily turn multithreading off)
As creating threads requires platform-specific code, put prototypes in main.h and the implementations in linux_main.cpp. Then include linux_main.cpp based on a macro definition of the platform in the build script at the bottom of main.cpp (a pthread sketch follows below)
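A minimal pthread sketch of what the linux_main.cpp side could look like, reusing the WorkQueue/render_chunk names from the sketch above (worker_proc and spawn_workers are hypothetical names):

    #include <pthread.h>
    #include <stdint.h>

    static void *worker_proc(void *arg)
    {
        WorkQueue *queue = (WorkQueue *)arg;
        while (render_chunk(queue)) {}      // pull chunks until the queue is drained
        return 0;
    }

    static void spawn_workers(WorkQueue *queue, uint32_t thread_count)
    {
        for (uint32_t i = 0; i < thread_count; ++i)
        {
            pthread_t handle;
            pthread_create(&handle, 0, worker_proc, queue);
            pthread_detach(handle);         // no join; core 0 drains the same queue
        }
    }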
hyperthreading: architecture-specific information becomes more important when in a situation where memory is constrained in relation to the cache (hyper-threads share the same L1/L2 caches)
volatile says code other than what is generated by this compiler run could modify this value. it’s required for multithreading, as the compiler may not re-read a value it has cached in a register if it is changed elsewhere. when incrementing volatiles, must use a locked_add_and_return_previous_value (could return the new value, just be clear which)
simd clamp can be re-written as min() and max() combination, which are instructions in SSE Although looking at the system monitor shows cpus maxed out, we could be wasting cycles, e.g. not using SIMD
Define lane width, and divide by this to get the new loop count
Go through the loop and loft used values, e.g. lane_r32, lane_v3, lane_u32 (IMPORTANT at first we are only concerned with getting single values to work, later can worry about n-wide loading of values) (TODO the current code has the slots for each lane generated, rather than unpacked. look at handmade hero for this unpacking mode)
If parameters to functions, loft them also (not functions? just parameters? however we do random_bilateral_lane() so yes to functions?)
If using struct or struct member references, take out the values and loft them also, e.g. sphere.radius == lane_r32 sphere_r; (group struct remappings together)
Remap if statement conditions into a lane_u32 mask and remove the enclosing brace hierarchy (IMPORTANT you can still have if statements if they apply to lanes, e.g. if mask_is_zeroed() break;) (TODO for mask_is_zeroed() we want the masks to be either all 1’s or 0’s) (call mask_is_zeroed() on all masks to early out as often as possible to get a speed up)
Once all if statements are lofted, & all the masks into a single mask (it seems if there are large amounts of code inside the if statements, you don’t want to do it this way and rather check if needing to execute?) (IMPORTANT to only & dependent masks, e.g. if there is an intermediate if like a pick_mask or clamp, then don’t include it, but do the conditional assign directly on this mask)
Then enclose remaining assignments in a conditional assignment function using this single mask? (conditional_assign(&var, final_mask, value); this uses the positive mask to get source and negative mask to get dest?) (also discover the workaround to perform binary operations on floating point numbers)
So, by the end of this, all values operated upon should be a lane type? (can have some scalar types if appropriate)
We may have a situation where some items in a lane finish before others. So, introduce a lane_mask variable that indicates this. To indicate say a break, we can do (lane_mask = lane_mask & (hit_value == 0)); For incrementing, will have to introduce an incrementor value that will be zeroed out for the appropriate lane item that has finished. Have horizontal_add()?
Next, once everything is remapped, create a lane.h. Here, typedef the lane types to their single variants to ensure it works before adding actual SIMD instructions. Also do SIMD helper functions like horizontal_add(), mask_is_zeroed() in one dimension first. Wrap the single-lane helper functions and types in an if depending on the lane width set (IMPORTANT any functions that we are to SIMD, place here. if it comes that we want actual scalar, then rename with func_lane prefix)
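A minimal 4-wide SSE sketch of the conditional_assign idea, assuming (as required above) that masks are all 1’s or all 0’s per lane:

    #include <xmmintrin.h>

    // Keep dest where the mask is 0, take source where the mask is all 1s.
    static inline void conditional_assign(__m128 *dest, __m128 mask, __m128 source)
    {
        *dest = _mm_or_ps(_mm_andnot_ps(mask, *dest), _mm_and_ps(mask, source));
    }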
for simd typically have to organically transition from AOS to SOA
Debug in single lane, single threaded mode (easier and debugger works) However, can increase lane width as needed (threading not so much?)
For bitwise SIMD instructions, the compiler does not need to know how we are segmenting the register, e.g. 4x8, 8x8 etc., as the same result is obtained performing the operation on the entire register at once. So they only provide one version of it, i.e. no epi32, only si128. Naming convention has types __m128 (float), __m128i (integer), __m128d (double), and names in functions: epi32/si128 (integer), ps (float), pd (double)
Overload operators on actual wide lane structs (IMPORTANT remember to do both orders, e.g. (val / scalar) and (scalar / val)). Also have conversion functions. Lane-agnostic functions go at the bottom (like +=, -=, &=, most v3 functionality) (IMPORTANT it seems we can replace logical && and || with the binary versions for the same functionality in SIMD)
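A minimal sketch of the both-orders point, on a hypothetical 4-wide lane_f32 wrapper:

    #include <xmmintrin.h>

    struct lane_f32 { __m128 v; };

    // Both argument orders must be written out; only one comes for free.
    static inline lane_f32 operator/(lane_f32 a, float b)
    { lane_f32 r; r.v = _mm_div_ps(a.v, _mm_set1_ps(b)); return r; }

    static inline lane_f32 operator/(float a, lane_f32 b)
    { lane_f32 r; r.v = _mm_div_ps(_mm_set1_ps(a), b.v); return r; }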
(IMPORTANT simd does not handle unsigned conversions, may have to cut off sign bit, e.g. >> 1)
the process of casting a type to a pointer to access its individual bytes or contained elements (used in file reading too)
(IMPORTANT masks in SIMD will either be all 1’s or 0’s. perhaps have a specific name for this to distinguish?)
(IMPORTANT seems that not all operations are provided in SSE, like !=, so have to implement with some bitwise operations)
SIMD allows divides by zero by default? (because of the nature of SIMD, it has to allow divides by zero?)
To get over the fact that C doesn’t allow & on floating point, use the reinterpret-the-bits paradigm *(u32 *)&a, as opposed to a value-converting cast (IMPORTANT in SIMD, a ‘cast’ is reinterpreting bits, so the opposite of a cast in C)
caching https://akkadia.org/drepper/cpumemory.pdf 1. know cache sizes to have data fit in them 2. know cache line sizes to ensure data is close together (may have to separate components of structs to allow loops to access fewer cache lines), i.e. understand what you operate on frequently. may also have to align structs 3. simple, linear access patterns (or prefetch instructions) for things larger than the cache size
inline assembly (raw syscalls from github). HAVE TO INSPECT/VERIFY ASSEMBLY IS SANE FIRST, THEN LOOK AT TIMING INFORMATION
inspecting compiler-generated assembly loops, look for the JMP to ascertain the looping condition; macro-op fusion is relevant here (to say Skylake), e.g. cmp-jmp
non-programmable instructions could be executed by the cpu; similarly, some instructions exist programmatically but only on the front-end, e.g. an xmm-to-xmm move might just be a renaming in the register allocation table
also, due to concurrent port usage, can identify parts of code as relatively ‘free’
struct access is typically off a [base pointer] in assembly; 1.0f might appear as a large number, e.g. 1065353216; in a loop, repeated instructions may be due to loop unrolling
we might see:
* superfluous loading of values off the stack
* more instructions required, e.g. not efficiently using SIMD (often this exposes the misconception that compilers are better than programmers; so better to handwrite intrinsics)
comparing our unoptimised assembly version to ‘wc’, we see a noticeable speed increase: an example of non-pessimisation
Following the basic principles of non-pessimisation, I make a note of the huge amount of cruft in the C STL: the output buffering, hidden malloc() ‘optimisations’ (uncommitted memory, encountering expensive page faults later; prefer reliability/clarity over edge-performance benefits), OS line-ending conversions, non-obvious use of mutexes etc. Whilst these may seem like minor inconveniences, they can be insidious for performance, e.g. rand() has a huge call-stack that, if we replace it with a simple xorshift, results in a 3x speed up (a sketch follows below). Although easy to criticise, it may be the situation that the CRT had to be that way because of the C standards. To isolate use of the CRT, wrap it in functions so we can hopefully replace it with system calls, intrinsics, etc. Although generally okay to use an STL, it forces you to use its patterns (e.g. memory allocation, locks etc.). This is true for STLs in general across all languages; some may have bloat from other areas, e.g. C++ templates. To avoid the compiler having to generate a large export table of all functions, make them static. To avoid large amounts of linking and ∴ reduce compilation time, have a unity build. Furthermore, guaranteed ability to inline functions (as with multiple translation units, possibly one might only have the function declaration and not the definition) (issues may occur with slower incremental builds when including 3rd party libraries; yet, can still work around this possibly using ccache)
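A minimal sketch of the kind of xorshift that can replace rand() in a hot path (this is Marsaglia’s xorshift32; the state must be seeded non-zero):

    #include <stdint.h>

    static inline uint32_t xorshift32(uint32_t *state)
    {
        uint32_t x = *state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        *state = x;
        return x;       // no call overhead, trivially inlined
    }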
https://marketplace.fedevel.education/itemDetail.html?itemtype=course&dbid=1569757838995&instrid=us-east-2_KpwYC7yK5:45f6c01d-ccc8-43e0-8f33-c5a70caf707f
when creating lid lips, factor in printer tolerance, say 0.4mm
Spreadsheet (have dimensions to make them parametric): if want to access a parametric sketch constraint, <
1.75mm PLA (polylactic acid) into a 0.4mm nozzle
0.15mm layer height (speed/quality); 15% gyroid infill (integrity/speed/usage)
supports required for a model that does not print over itself
export g-code (like a printer ISA?) to SD card
supports necessary for complex geometries like overhangs, greater than 45°, etc. put supports everywhere by default? (perhaps add blockers to this?) maybe just use everywhere to get a feel for it, then manually add enforcers?
can also use printrun usb-b interface to print without SD card?
most solder has a flux core (typically rosin) to remove oxide films, i.e. wetting the metal (to remove dirt/grease will require a cloth or steel brush). Sn/Pb (60/40) has a lower melting point and a shinier finish (cone shaped) than non-leaded. fumes are flux, as the boiling point of lead (≈1700°C) is much higher. fume extractor at top
Don’t restrict the right side of the bell curve. Let your aces be aces. Being an ace involves having an opinion. Most influential software was written largely by one person, e.g. Linux, Unix, git etc.; then a team is assigned to maintain it. Fallacy about solo programmer productivity requiring large teams. Design by committee pushes design to the middle of the bell curve as opposing views average out
CPUs try to guess what instructions are ahead (speculation). The cost of an incorrect guess (pipeline flush) is expensive. So want to get rid of conditional jumps; ideally replace with conditional movs or arithmetic branchless techniques. Endianness (register view), two’s complement (-1 is all 1s). Branchless programming is essentially SIMD
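A minimal sketch of one branchless technique: replacing an if/else select with a mask so there is no conditional jump to mispredict (branchless_max is a hypothetical name; a cmov is what we hope the compiler emits anyway):

    #include <stdint.h>

    // All-ones mask when a > b, all-zeros otherwise, then blend the two inputs.
    static inline int32_t branchless_max(int32_t a, int32_t b)
    {
        int32_t take_a = -(int32_t)(a > b);
        return (a & take_a) | (b & ~take_a);
    }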
What SIMD instructions are available to the cpu? If variable clock speed, the cpu could detect that not all cores are in use and increase the single-core clock
Flops calculated with the best instruction set? Not being memory bound is the best case for hyper-threading. Intel speeds are optimised for GPR arithmetic, boolean ops and flops. Why is the Intel shr instruction so slow? Intel deliberately makes MMX slow
Polymorphism is a single object that can be interpreted as having various types. This can simply be a struct with unions and a type field.
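A minimal sketch of that discriminated-union shape (the entity names here are hypothetical):

    enum EntityType { ENTITY_PLAYER, ENTITY_WALL };

    struct Entity
    {
        EntityType type;          // discriminant: says which union member is live
        union
        {
            struct { float speed;  } player;
            struct { float height; } wall;
        };
    };

    // usage: always check the type field before touching the union
    static float entity_speed(const Entity *e)
    {
        return (e->type == ENTITY_PLAYER) ? e->player.speed : 0.0f;
    }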
Never use setters/getters unless actually doing something. You’re spending your entire day typing. If needing, replace variable name with name_ to see where it was used.
You need to be self critical to be a good engineer
Caches are a way of minimising the round-trip time to RAM by putting memory as close as possible to the core that the CPU thinks will need it. L1 closest, 1/2 cycles, 32k. Wikichip for more info. The CPU will go to the caches first
L1 can supply 2 cache lines per clock. Instructions per clock, number of work components (e.g. number of add units), cache lines per cycle, cache latency and AGUs (address generation units) impose restrictions. A cache miss is simply stalling for an instruction. However, this may not be an issue if we do other work, e.g. a complex algorithm takes many instructions, hiding memory access for an out-of-order cpu. If hyper-threading with two schedulers, a cache miss on one just switches execution to the other. Can really only know if a cache miss incurs a performance penalty by looking at raw numbers from vtune, etc. Because of the scheduler, it’s not as simple as just looking at memory sizes. So, due to the complex overlapped/scheduled nature of modern CPUs, can really only know if a cache miss incurs a penalty with vtune. The uops website displays tables for instructions
Currently good that most things are little endian with 64 byte cache lines; however, some hardware guy is going to come along and change it back to big endian
Should be a code of ethics in software to not create bugs/inconveniences for users that could’ve been avoided
An instruction of throughput 1 means issued every clock. As many instructions take longer than 1 cycle, each core requires a scheduler to see if it can execute something. View cpu as sections where there is some distance to communicate.
Making something good takes time. However, if you have crazy design practices it will also take longer. You have to be reality-based when programming; that is, in an engineering sense, design something that solves the problem you have. People become attached to a way of programming which doesn’t focus on solving the problem; they want to build Rube Goldberg machines. Selectively attacking problems seriously means you have a functional program quicker, whereby you can actually decide if those other problems need to be addressed. Can defer hard decisions to later, as they will be made better when you have more technical expertise and more context to work with
Testing is important. If you don’t write tests, your software doesn’t work. However, write higher level system tests, not excessive unit tests. More efficient and this is where bugs are likely to be. You often have to remove code, so having unit tests just increases the volume of code you have to write. Huge drain on productivity. Maybe for NASA. A new paradigm should weigh up cost-benefit. Almost always the cost is ignored and people gobble them up
To make computers better to use, have to simplify them on all levels
Faster cpu like Apple’s m1 are irrelevant if software bad
Floating point math faster than integers
my style of programming and the problems I enjoy solving are found in embedded, e.g. you’re constrained by the silicon, not like in web where you just build another data centre
Compiler works file by file, so knows nothing about calls across files. Therefore it generates object files, which are partially executable machine code with unresolved symbols. The linker merges these object files and also adds extra header information so that the OS (or more specifically the kernel, e.g. linux) can load our executable
Complete code coverage on the one hand is very thorough, however don’t get a lot of engineering output. Furthermore, most bugs appear in between systems not in units.
Best way to test is to release on early access. This checks hardware and software, user may be running adobe acrobat which hogs cpu so instruct them to kill it before running your game. Or maybe 20000 chrome plugins. This is something a hardware lab can’t tell you
A process is allocated a virtual memory space; the OS has a mapping table that converts these to physical addresses. Part of our process's address space is pre-populated by the kernel program loader, e.g. linux-vdso, environment variables, etc. Kernel tunables: sudo sh -c 'echo kernel.perf_event_paranoid=0 >> /etc/sysctl.d/local.conf' (sysctl). User tunables: ulimit -a. In a virtual address space, have user-space and kernel-space address ranges. A virtual address is mostly page table indexes and the last bits are a static offset (for security, addresses are randomised)
To make an installer, just fwrite your executable and then the data files, appended with a footer. Inside the exe, fread the exe and fseek based on the static offset of the appended resources. Bake resources in for reliability only, really
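A minimal sketch of the read-back side, assuming a fixed-size footer holding the payload size was appended after the resources, and using the executable’s own path (e.g. argv[0]) for simplicity; the Footer layout and function name are hypothetical:

    #include <stdio.h>
    #include <stdint.h>

    struct Footer { uint64_t payload_size; };

    // Reads the appended resource blob back out of our own executable.
    static void read_appended_payload(const char *exe_path)
    {
        FILE *exe = fopen(exe_path, "rb");
        if (!exe) return;

        Footer footer;
        fseek(exe, -(long)sizeof(footer), SEEK_END);    // footer is the last thing in the file
        fread(&footer, sizeof(footer), 1, exe);

        fseek(exe, -(long)(sizeof(footer) + footer.payload_size), SEEK_END);
        // ... fread() footer.payload_size bytes of appended resources here
        fclose(exe);
    }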
Packed files are better, as fewer OS operations on expensive file handles etc.
Programming is about solving the problem; this is overlooked by design philosophies. If you don’t have any functionality, you don’t have a structural problem
const is only useful if you find it catches bugs for you (maybe for globals instead of using #defines). however, in terms of optimisations, const is useless as you can cast const away. therefore, for me, const is mostly just a waste of typing. however, you have to use it for string literals in C++. In a similar vein, VLAs are useful here (note the difference between sizeof(array) and sizeof(pointer) when calculating a string array count)
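A small sketch of that sizeof distinction, together with the ARRAY_COUNT macro mentioned elsewhere in these notes:

    #define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

    char str[] = "hello";          // array: sizeof(str) == 6 (includes the '\0')
    const char *ptr = "hello";     // pointer: sizeof(ptr) is 8 on a 64-bit target
    // ARRAY_COUNT(str) == 6; ARRAY_COUNT(ptr) compiles but gives a wrong answer (pointer, not array)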
distinct areas of memory in assembly are stack, heap and data (globals)
direction of stack growth is often determined by the CPU; if selectable, then by the OS, e.g. x86 is downwards. ulimit -s for stack size (the main executable will have its stack size listed in its headers)
good practice to assign a variable for syntactic reasons, i.e. more readable, e.g. Controller *controller = &evdev_input_device.controller;
getconf can list POSIX system configurable variables
don’t go through a ton of unnecessary stdlib. malloc is probably more optimised for small memory size requests; relative to the overall code base, interfacing with system calls is not that much code (and can be reused)
if we try to access memory from an invalid page (not reserved or committed) we will get a segfault. only have to worry about errors that don’t manifest themselves on every run of the program.
on x86, writing by 32 bits is faster than by 8 bits as there are fewer instructions. in general, the fastest way is to use the widest register that can be operated on at once. the speed of accessing memory from the cache is pretty cheap for nearby regions
don’t make changes for conceptual cleanliness. end of the day, want to make performant, bug free code in the shortest amount of time.
when programming some days you are off. this just means you’re going to be debugging a lot
we want it to be clear what our code can and cannot touch. global variables make this hard (however, can add _ to see where they are all used). however, as many OSs are rather janky and most code will live outside this, it is ok to have some globals here. globals are fine in development; can repackage them into a structure later.
clock speed is not as relevant as improvements in microarchitecture, and the number of cores means it can be more efficient under less duress. also, a lower clock speed may be because it wants to draw less power.
short build times (under 10 seconds) are incredibly important so as not to disincentivise making changes and testing them
function overloading, generalised operator overloading and default arguments are c++ features that can’t easily be implemented with gcc extensions
note that >> on signed values will typically (implementation-defined) perform an arithmetic shift (filling the new MSBs with copies of the sign bit), so it is not always the same as a divide. similarly, sign-extension fills the new MSBs with copies of the sign bit
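A tiny example of why the shift and the divide differ for negative values (C/C++ integer division truncates toward zero):

    int a = -7;
    int by_shift  = a >> 1;   // arithmetic shift on most compilers: -4
    int by_divide = a / 2;    // division truncates toward zero:     -3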
for large cross-platform projects, best to differentiate with filenames, rather than ifdefs. this also gives the ability to have different control flows across the different platforms (essential)
for a game, better to have the game as a service to the OS (not the other way round). this is because the game does not need to know/perform the myriad of possible operations the OS can perform.
most modern cpus have a floating point unit, making floats faster than ints (same latency), e.g. a multiply is one instruction where for ints it is two (multiply and shift). x87 is the FPU instruction set for x86 (there are also the SSE instructions, which are what you ultimately want). however, for multiplayer games, optimisers can give different results when using floating point, e.g. a platform that has operator fusion like a MULADD may give a different result when rounding than a platform that does it separately (fixed point could solve this)
Programming Mentality: always important to know when coding what your goal is. premature optimisation and design are bad. your goal dictates the quality of the code you write, e.g. allowed to be janky as a first pass on an API. write usage code first.
often when having a variable-length array, ask ourselves: do we really need that?
asserts are part of debug program that are used to check that things work that should always work. use them for a condition that must be true that is not explicitly present
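One common minimal form of such an assert is a write-to-null so the debugger stops right at the failing condition; this is a sketch (DEVELOPMENT_BUILD is a hypothetical build flag), not necessarily how any particular codebase defines it:

    // Evaluates the condition in debug builds and crashes at the call site if it is false.
    #if DEVELOPMENT_BUILD
        #define ASSERT(expr) do { if (!(expr)) { *(volatile int *)0 = 0; } } while (0)
    #else
        #define ASSERT(expr)
    #endif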
don’t think about memory management, but rather memory usage. if having to worry about freeing etc. done something wrong.
when writing ‘spec’ code, no need to handle all cases; simply note down the edge cases you should handle later on. stop yourself thinking about whether the code is messy or not; only care once the problem is solved!
it’s not the programming practice but the dogma that gets you. when you start to name things it almost always becomes bad. almost all programming practices have a place, just not used often. so, RAII people: in the case of things that must be released, e.g. a DeviceContext, it’s ok to use a constructor/destructor pair. I’ll throw you a bone there
streaming i/o is almost never a good choice (hard drive slowest, more errors)
compression orientated programming is you code what you need at the time (breaking out into function, combining into struct, etc.) over time the code marches towards a better overall quality
amdahl’s law gives the time taken for execution given a number of cores. for this formula (indeed any formula) we can obtain some property by seeing what happens as a parameter approaches infinity; in this case, the parallelisable part drops out. brooks’ law says that simply adding more people to a problem does not necessarily make it faster; if it requires a great deal of coordination/communication it actually slows things down.
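For reference, a minimal statement of the speedup form of Amdahl’s law (p is the parallelisable fraction of the work, n the core count):

    S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}

which is the limit the note refers to: as n grows, the p/n term drops out and only the serial fraction matters.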
solving a problem: 1. decide what you are doing (this can’t be open-ended) 2. organise groups to achieve this. by making these boundaries, we are presupposing that each part is separate, e.g. tyres team and engine team; we assume tyres and engine cannot be one piece. therefore, the boundaries define what products you can make, i.e. you produce products that are copies of yourself, or of how you are structured. so, in software, if we assign teams for say audio, 2d, 3d, we would expect individual APIs for each. the org chart is the asymptote, so the best case is that we make a product as granular as our org chart; it could be far worse and even more granular. therefore, communication between teams is more costly than communication within teams. the takeaway is that low-cost things can be optimised, high-cost things can’t be (further away on the org chart). note that communication in code could just be someone checking something in and you pulling it. what we are seeing now with modern software is the superposition of org charts due to the use of legacy codebases. now we see org charts in software, where people are artificially creating inheritance hierarchies that limit how the program works. this is very bad. the reason it’s done is for people to create mental models that help them solve the problem, as they can’t keep the complexity in their head. it may be necessary to solve the problem, however it shouldn’t be looked at as good. because it’s done due to a lack of understanding, the delegation/separation is not done with enough information, so you limit the possibilities of the design space. so although libraries, microservices, encapsulation, package managers and engines may be necessary due to our brain capacity (until neuralink or we figure out a better way to do them), they are not good! we may use a hash map, but only in a particular way. they limit optimisation as we have already decided the separation, so always be on the lookout for times when you don’t have to do these. most people just download hundreds of libraries because they know it works and they won’t be worse than anyone else. WE MUST BE LEAN AND FLEXIBLE IN ORG CHARTS, IN COMPANY AND IN SOFTWARE, TO INCREASE DESIGN. some old codebases need to be retired
DO THE ‘MOST CERTAIN’ THING FIRST. THIS COULD EITHER BE THE IMPLEMENTATION OR THE USAGE CODE choose data structures around solving problem
some software is scaffolding, i.e. not shipped with the final product, e.g. editor for games
To make anything alternate over time, just multiply by sine(time);
Data hiding hides what the CPU is doing, which is what we care about
Require machine-specific documentation files to understand the system we are on. System-specific ctags template projects, e.g. linux kernel, glibc, etc. If using a library, have ctags for that project
ALMOST ALWAYS CAST TO FLOAT WHEN DOING DIVISIONS LEADING TO FLOAT
Minimum value starts at max
Spreading out randomness:
final_value += contrib * sample
If debug code (or code that will not be in release) use compile-time macros
Use GLOBAL and global_prefix. If casting is occurring, always be explicit about it! Prefixing functions with sdl2_func() or linux_func() only if intending to be cross-platform
With error handling, bad practice is to allow a lot of errors, which brings in error classes etc. Instead, if it’s something that is actually an error, e.g. missing file, write the code to explicitly handle it. Handling the error in a sense makes it no longer an error, rather a feature of the program
if function is expecting a range between, should we clamp to it?
Refactoring Mentality: Refactoring is essential. You must know what you are trying to achieve so you have some notion of progress, e.g. adding a constraint to the system. (Replacing a variable type in scope gives us all locations where the variable was used, including macros. This is where dynamic languages fail, as you don’t know if you broke something)
You can abstract/encapsulate anything at any time, so why not do it later when you know what you are doing? We want the file format to be simple and binary, unlike json which is general purpose and string-based
modifying (pass as pointer arg), returning (result struct). it is best to put simple types as arguments rather than a group struct, to allow for maximum code reuse. only put a group struct as an argument if the members must go together (in general don’t pack, so you don’t force the user to create a struct when they may already have the values)
if can go functional without sacrificing, saves you complexity down the road. oftentimes simply utilising elements of functional programming is good, e.g. no global state, only operating on parameters etc.
writing code guides you to the right design, e.g. made the same call -> put it into a function; require args in a function -> struct; many related functions together -> organise related functions into their own file (if thinking it could be moved to another file, make comment sections outlining code blocks to ease this process later on) etc. (these are simple, compression changes); complex api -> transient struct, overload functions for internal/external use. There are also large changes that are more difficult, e.g. sections of a single value interpreted differently -> pack 2 variables into 1 (the _ technique is useful here); same operations performed on pairs of variables -> vectors (as opposed to working with them as scalars); vectors are particularly useful as without them repeated actions quickly become intractable. It can get messy at times, but always know that a clean solution is out there and you will refine towards it. Work through the error tree one at a time. Important to throw in asserts for underlying assumptions. Also, for debugging, be aware of ‘copy-pasta’, e.g. copied variables will have the same name for two parameters because you didn’t replace it
we don’t want to orient our code around objects (if anything, algorithm-oriented). how you arrive at some code determines how good it is
excessive pre-planning and setting waypoints is really only useful if you’ve done the problem before (in which case you’re not really designing). instead, we become a good pathfinder, learning to recognise the gradient of each day of the journey. we write the simplest thing and loft it out into good design later (in this explorative phase, if we make any changes for efficiency reasons we have just introduced the possibility of bugs with no benefit)
only break into function when you know what you want, e.g. called multiple times or code finalised and improved readability (in which case a tradeoff is made between understanding functionality and semantics)
Don’t be scared of mass name changing!! Before doing so, see all places where name is used Don’t be scared of long list of compiler errors. Work your way through them
Refactoring with usage code: just write out structures that satisfy the usage code. If major rewrite use #if 0 #endif to allow for successful compiling
refactoring: just copy code into a function in a way that gets it to compile; later, worry about passing information in as a parameter
basic debug and release compiler flags
When refactoring, utilise our vimrc
Debugging Mentality: debugging is stepping through pass by pass, inspecting all variables and parameters and verifying the state of particular ones. make deductions about the state of variables, e.g. overflowed, uninitialised, etc. drop in asserts. draw it out
being able to draw out debug information is very useful. time spent visualising is never wasted (in debugger expressions also)
When debugging, look through variables and see if anything looks ridiculous
When in the debugger, iteratively progress through variable values in the function and see if they look right. We can isolate some area of the code and say this is probably the problem, then investigate the relevant sub-functions, etc. This can be a long process with seemingly little gains. The issue could be subtle, e.g. signed/unsignedness, size, a function called rarely
Configuration files should be copied, not generated (becomes too messy) Symlink to template files from projects
To begin, I ensured that I had a debugger from which I could easily step through the application’s execution. In code, I was able to programmatically set system and user breakpoints.
(mocking of syscalls for unit testing with file i/o)
For handling non-fatal errors, single line check. For fatal errors, nest all preceding code (I have learnt to not be afraid of indentation in this manner). (error handling in general, i.e. reduce ‘errors’ by making them part of normal execution flow)
When performing the common task of grouping data, a few practices to keep in mind. Use fixed sized types to always know about struct padding (in fact, I like to extend this to all my code)
If wanting multiple ways of accessing grouped data, use a union and anonymous structs. Use an int to reference other structs, e.g. plane_index
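A minimal sketch of the union + anonymous struct pattern for multiple access paths (v3 matches the vector naming used elsewhere in these notes; anonymous structs are a widely supported compiler extension in C++):

    union v3
    {
        struct { float x, y, z; };   // named access: p.x
        float e[3];                  // indexed access: p.e[0]
    };
    // Both views alias the same 12 bytes, so p.x and p.e[0] are the same value.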
If the data being grouped can only exist together (e.g. points), use vectors. Put all struct-related typedefs inside their own header file for easy access.
As floats are an approximation, when comparing to 0.0f (say for a denominator check) or to negative (say for a square root) use a tolerance/epsilon less-than/greater-than check. In fact, whenever dividing, one should always ask oneself “can the value be zero?” (a sketch follows below)
To be clear about float-to-int casting, use a macro like truncate/round (think about what happens if the divide is uneven). Due to mixed integer and float arithmetic going to float, calculate integer percentages as val * 100 / total
There is no need to overload the division operator, as you can do (* 1.0f / val)
For easy substitution, use single-letter prefix names like output_h and output_w. Convention for variable arrays, e.g. Planes plane[1], planes and plane_count (use the ARRAY_COUNT macro here)
Put for loop statements on separate lines to help not be afraid of indentation
Iterate over pixel space and then convert to, say, world space for calculations (normalisation and lerp). Aspect ratio correction is simply rearranging a ratio: if we determine one side is larger, scale the other
Use an ASCII control code to print a status indicator. Only use const for char * string literals stored in the data segment
Endianness comes into play when reading/writing from disk (e.g. a file type magic value) and working directly with u8 * (e.g. iterating through the bytes of a u32)
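A small sketch of the epsilon denominator check and the integral percentage, with hypothetical names (safe_divide, percent_done):

    #include <stdint.h>

    #define EPSILON 0.0001f

    static float safe_divide(float numerator, float denominator)
    {
        // tolerance check rather than comparing a float to exactly 0.0f
        if (denominator < -EPSILON || denominator > EPSILON)
            return numerator / denominator;
        return 0.0f;
    }

    static uint32_t percent_done(uint32_t done, uint32_t total)
    {
        // keep the arithmetic integral so nothing silently promotes to float
        return done * 100 / total;
    }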
MARKETING APP: f5bot.com, https://github.com/lawxls/HackerNews-personalized I’m notified when keywords related to “human wants thing, my app can do thing” appear on HN, Reddit and Lobsters. If I can then contribute with information to that discussion, I’ll also leave a link to my app. Don’t just self plug, people (myself included) appreciate more detailed information on how they can solve their own personal problem, instead of being thrown into “here’s an app, figure it out”
even parity means the parity bit makes the total number of 1’s even, i.e. if the data has five 1’s, even parity adds a 1