Construction Notes

Posted under Projects at 2022-07-06 21:57. Last updated 2025-03-30 17:00.
Tags: site‑admin, software

Some technical notes relating to the construction of this site.
(Or, some things you learn when writing your own CMS.)

warning symbol: unidentifiable tentacular entity operating a keyboard in this area

25. I find the switch() control structure in PHP a bit dubious, for more than one reason, but primarily because it uses more space (characters) than if() elseif() else(). But recently I was editing a file with a line like:
```
elseif (in_array(needle,[stack])) <do-thing> ;
```
— and wondered if a switch might actually manage to save space, so I tried it:
```
switch($testval) {
   case 's' :
   case 't' :
   case 'a' :
   case 'c' :
   case 'k' : <do-thing> ;
   }
```
And the answer was, no, it’s still some characters longer even without the break statements. Then I wondered idly whether using in_array() as a case test value might be even shorter. Now, based on all the documentation I remember about switch, which isn’t much, this probably shouldn’t work as I sort of understood that the switch test value is an essential component. But:
```
switch($testval) {
   case 'value-1' : <do-thing-1> ; break ;
   case in_array(needle,[stack]) : <do-thing-2> ;
   }
```
— did in fact work. Now that’s interesting. It seems to be sufficient that the individual case statement evaluates to TRUE, rather than requiring a test $testval == <caseval> to evaluate TRUE. Not what I had understood at all. So does this mean that switch could be operated without a primary test value, if all the individual cases evaluate to booleans?
```
$testval = [1,2,3] ;
switch() {
   case in_array(1, $testval) : echo '1 exists' ; break ;
   case in_array(2, $testval) : echo '2 exists' ; break ;
   case in_array(3, $testval) : echo '3 exists' ;
   }
```
No. We get (in error_log): PHP Parse error: syntax error, unexpected ')'. So it requires a value of some kind in the switch statement even if it’s not going to test against it. How about a random variable name?
```
$testval = [1,2,3] ;
switch($a_val) {
   case in_array(1, $testval) : echo '1 exists' ; break ;
   case in_array(2, $testval) : echo '2 exists' ; break ;
   case in_array(3, $testval) : echo '3 exists' ;
   }
```
No. PHP Notice: Undefined variable: a_val. How about a boolean?
```
$testval = [1,2,3] ;
switch(TRUE) {
   case in_array(1, $testval) : echo '1 exists' ; break ;
   case in_array(2, $testval) : echo '2 exists' ; break ;
   case in_array(3, $testval) : echo '3 exists' ;
   }
```
Yes. 1 exists. Same goes for a plain 1. But not for 0, FALSE, or NULL, though these don’t produce an error message. I think I’m getting the idea. Do 0 or FALSE produce line 1 output if I delete 1 from the array?
```
$testval = [2,3] ;
switch(FALSE) {
   case in_array(1, $testval) : echo '1 doesn’t exist' ; break ;
   case in_array(2, $testval) : echo '2 doesn’t exist' ; break ;
   case in_array(3, $testval) : echo '3 doesn’t exist' ;
   }
```
Yes. 1 doesn’t exist. But if we remove the break statements so that cases 2 & 3 also run, we now get the invalid statements that neither of them exists in the array, in spite of the FALSE. So only the first matching case is being tested against the actual switch value; without a break, execution just falls through the later cases without testing them, which makes FALSE look as if it’s behaving like TRUE. That’s horribly unreliable, but we’re obviously doing this wrong anyway.

In summary, it seems to be safe to use individual case tests that evaluate to TRUE as long as you have a real variable (holding something truthy), TRUE or 1 in the switch(). But is this useful? In the particular case I was looking at it still ended up 11 characters longer than if() elseif() else() and had all these break statements littering the place up, so I didn’t use it. Maybe some day.
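
What the experiments above seem to be showing is that switch compares its argument against each case expression with a loose ==, so switch(TRUE) takes the first case whose expression is truthy and switch(FALSE) the first that is falsy. A minimal sketch, assuming that reading is right; first_present() and first_missing() are made-up names for illustration, not anything from the site code:

```php
<?php
// Hypothetical helper: switch(true) runs the first case whose expression
// is loosely equal (==) to true.
function first_present(array $haystack)
{
    switch (true) {
        case in_array(1, $haystack): return '1 exists';
        case in_array(2, $haystack): return '2 exists';
        case in_array(3, $haystack): return '3 exists';
        default:                     return 'none of them';
    }
}

// With switch(false) the first case whose expression is falsy matches instead.
function first_missing(array $haystack)
{
    switch (false) {
        case in_array(1, $haystack): return '1 is missing';
        case in_array(2, $haystack): return '2 is missing';
        default:                     return 'nothing missing';
    }
}

echo first_present(array(2, 3)), "\n";   // 2 exists
echo first_missing(array(2, 3)), "\n";   // 1 is missing
echo first_missing(array(1, 2)), "\n";   // nothing missing
```

Returning from each case sidesteps the break clutter, though it only helps when the switch sits inside a function.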

24. It turns out that HTTP Status Code 301 is long-outmoded and I should be using 308 for redirects for old URLs. Trying that, however, Firefox seems to take the code seriously and won’t try to use the old URL a second time in the same session. Which is probably good behaviour for a regular user agent but it’s a nuisance in testing.
23. Strange Loops: For some unidentified reason I’m finding that using ksort or krsort on an array prior to adding additional details with a foreach($arr as &$mem) loop prevents a later foreach loop in a display template (in a different file) accessing the second member of the array, instead repeating the first, even though var_export() at that point shows both array members (and with the correct original keys). Doing the ksort after adding the additional data gives the same result from var_export() and writes the second member. I have no clue how that’s happening.
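
One possible cause, offered only as a guess since the template code isn’t shown: after foreach($arr as &$mem) the variable $mem stays bound to the last array member, and a later by-value loop that reuses the same variable name overwrites that member on every pass. The array contents and the reuse of the name $mem below are assumptions for illustration:

```php
<?php
$arr = array('b' => array('n' => 2), 'a' => array('n' => 1));
ksort($arr);                       // order is now a, b

foreach ($arr as &$mem) {          // add details by reference
    $mem['extra'] = true;
}
// $mem is still a reference to $arr['b'] at this point.

$seen = array();
foreach ($arr as $mem) {           // later loop reusing the same name
    $seen[] = $mem['n'];
}
var_export($seen);                 // both entries are 1: the first member repeats

// Same steps, but breaking the reference after the first loop.
$arr2 = array('b' => array('n' => 2), 'a' => array('n' => 1));
ksort($arr2);
foreach ($arr2 as &$mem2) {
    $mem2['extra'] = true;
}
unset($mem2);                      // drop the lingering reference

$seen2 = array();
foreach ($arr2 as $mem2) {
    $seen2[] = $mem2['n'];
}
var_export($seen2);                // 1 then 2, as expected
```

If this is what’s going on, sorting after the by-reference loop might only hide the symptom, because the lingering reference can then point at a member that is no longer last; unset() after the loop would be the more direct fix.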
22. Dropping punctuation: It seems that the CSS ::first-letter pseudo-element selects not only the first letter but any surrounding punctuation. (But not U+02BC.) I get the results that a U+2019 fake-apostrophe is selected after the initial, and therefore it is treated as part of a drop-cap, but a preceding left quote isn’t recognised as a character, so the CSS rules aren’t applied. Apparently a future improvement to CSS will allow sub-selectors inside ::first-letter, but what to do in the interim? I could just use U+02BC, but you’d expect some sort of break to be possible? Well, neither ZWJ nor ZWNJ do it. ZWSP does but that’s not ideal either. <wbr /> does it. Not sure if I can automate anything for this though. (Update: I found a way of automating a break insertion; so that works, but it’s hopefully temporary.)
21.   Howww long have I been using regex? Decades, certainly. And of course I started with BBEdit and have never completely left it. Now it turns out BBEdit does PCRE with the -m flag on by default, which I probably read many times, and also, this matters. And is why I could never work out why some patterns using ^ $ \A \z didn’t work in PHP while others did, but they always did when I tried them in BBEdit. Somehow I only ran into this problem within the last year, and after howevermany months of brainmangling workarounds I’ve found the depressingly simple answer. So RTFM! In great detail. In fact, don’t do any actual work, just spend your life RTFMing until you have understood the words that are written. At least that way you won’t encounter real problems.
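
A small sketch of the difference, using a made-up three-line subject. In PHP the anchors ^ and $ only match at line breaks when the m modifier is given, which the note above says BBEdit switches on by default:

```php
<?php
$text = "alpha\nbeta\ngamma";

// Without /m, ^ and $ only anchor at the start and end of the whole subject.
$plain = preg_match('/^beta$/', $text);

// With /m they also anchor at every line break.
$multi = preg_match('/^beta$/m', $text);

// \A and \z always mean the very start and end, whatever the modifier.
$whole = preg_match('/\Abeta\z/m', $text);

var_dump($plain, $multi, $whole);   // int(0), int(1), int(0)
```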
20. And some last minute things. I’d noticed over several years that OSX Quick Look shows some SVG graphics blacked out even though they work perfectly well in other contexts, but it turns out that recent versions of Firefox (and so perhaps other user agents) do the same. The problem seems to be the <use> element, which has a “shadow DOM” which older versions of Firefox couldn’t address, so they ignored it; but although newer versions don’t seem to be able to address it either (that is, nothing I’ve tried makes any difference) they do process it to the point of showing the used object as a black silhouette, though transparency and transformations still work. So, <use> is currently unusable, and these particular graphics will have to be nearly twice the code size, though perhaps they will draw marginally faster once downloaded.

Also, what on earth is a ..htaccess.swp file? Seeing one in a new directory I went on alert thinking some unknown had placed it there while I wasn’t looking but it turns out to be a temporary file written by nano for a newly declared but not yet written .htaccess file. Right.

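19. And Apache also has some sort of problem trying to handle both GET and POST data from a single form. Normally in mod_rewrite I’d expect to use the flags QSA,R,L for this sort of thing, but it seems I should be using QSA,R=307,L. Not sure I fully comprehend this, but it seems to work: a plain R sends a 302, which browsers generally follow with a GET and no POST body, whereas a 307 tells them to repeat the request with the same method.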
18. It appears Apache has a security limitation that prevents slashes being processed even when encoded as %2F. To get round this it is necessary to encode the slash as %2F (or something else distinct) before encoding the string normally, and decoding it again later. This may also apply to some other characters. PHP doesn’t seem to have an automated way of handling this, like a stillwrigglingurlencode() maybe, and as far as I can see the manual doesn’t mention the issue.

(Update: I’ve found one rare event in the logs which implies that some user agents send the double-encoded %252F as %2F anyway, causing a 404 error. Not much to be done about that I think. At least, not without making a hole in Apache’s security settings.)
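
A minimal sketch of the double-encoding round trip described above. The function names are hypothetical (the site’s own code isn’t shown), and it assumes that no value ever contains a literal %2F of its own:

```php
<?php
// Hypothetical helper names, for illustration only.
function encode_segment($value)
{
    // Pre-encode the slash, then encode normally: "/" ends up as %252F.
    return rawurlencode(str_replace('/', '%2F', $value));
}

function decode_segment($segment)
{
    // Reverse both steps. A value containing a literal "%2F" would be
    // turned into a slash here, hence the assumption stated above.
    return str_replace('%2F', '/', rawurldecode($segment));
}

$encoded = encode_segment('AC/DC');
echo $encoded, "\n";                  // AC%252FDC
echo decode_segment($encoded), "\n";  // AC/DC
```

On a live request the server will normally have undone one level of encoding already, so only the final str_replace() step may be needed there. The Apache setting behind the limitation is presumably AllowEncodedSlashes, which is off by default.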

17. Also, it seems that an external stylesheet for an SVG file may not behave the same as an embedded stylesheet. (I just wanted it separate for convenience while writing the thing.) Specifically, SVG gradients defined in <defs/> which can be referenced by embedded CSS rules cannot be accessed by an external CSS file. I assume this would apply to anything else that uses hrefs, but that’s what I’ve got. It would be possible to rewrite the gradients as CSS, but since CSS doesn’t support macros it might get a bit repetitious; and that wouldn’t work for vector graphics . . . So, back to embedded.

16.   So, after several days of code block, I discover that ECMAScript can’t use document.createElement() to produce new SVG elements, even in standalone SVG files. createElementNS() is needed instead. This page goes into the detail. How trashy is this? We’re working in a single namespace here (except when using xlink), and it’s declared in the root element. Having to add the SVG URL in every element (and attribute) method is one of the most egregious wastes of code I’ve seen in a while. (Slightly improved by setting svgns='http://www.w3.org/2000/svg', but still.)

I don’t see this stated anywhere, but in practice I’m also finding that the NS versions don’t work with hard-coded elements. i.e. you can’t use setAttributeNS() to change the class of an extant SVG element, only a script-created one. Original elements have to be altered with plain setAttribute() etc. Because the original elements are already in that original namespace you can’t separately access? Bit of a mess.

(Update, 2022-12-26: Turns out that xlink has been deprecated in SVG2, which may be part of the problem here . . . but it’s supposed to still be available for backward-compatibility, so maybe not. tbi)

15. Looks like I have a bunch of extra unused sequences in PostgreSQL . . . at a guess, I think what’s been happening is that pgAdmin creates one for every new NOT-NULL integer column if I don’t make a foreign key (or set a default value?) at the time. Which I don’t when I have some data tidying and insertions to do first. Not sure there’s anything to be done about this other than go back and check every time. Or maybe avoid setting them as NOT-NULL until keymaking time; that’s usually just a convenience thing.

And I see all the sequences are bigserial-scale rather than smallserial or serial, which is what I set the related columns as. I’m not even slightly sure why I’d need a 2⁶⁴ value space for anything; that’s more Event Horizon Telescope than Synth Repair Blog stuff. So why not make it serial by default, or just copy the column type? Well, according to the documentation all sequences are based on bigint. I’m not clear whether that means the memory allocation is 8 bytes no matter what you set the limit as, but probably? Not a big issue, as this is just 8 bytes per-sequence, not per-record as it would be with a bigint column. Presumably we don’t need to worry about bytes like we were still using 6800 and Z80 systems . . . do we? It’s even possible that this is optimal, as checking the size might take more time than just allocating and using 8 rather than 4 bytes. But don’t quote me on that.

14. Hmmmtime again. Periods in URLs: I’ve been trying to write the index page, and some of the terms that have come up are (of course) filenames like .htaccess and .DS_Store. Now I’ve been making every indexed word and their initial characters part of an internal URL which can be clicked to see term and page usage. But it seems that URLs ending [path]/. have a problem. Some of which was how I had substitutions set up, but even with that sorted, even with the period clearly appearing in the browser source code, and even if it’s substituted with ⟨%2E⟩, it doesn’t get included in the URL sent from Firefox. Substituting it with ⟨u2E⟩ works. But though I can speculate about nonstandard browser security approaches, I have no real idea why this doesn’t just work. I mean, hyphens are fine, path segments beginning or containing periods are fine, including test., so it’s just the isolated period . . . 

(Addendum:) Similar issue with strpos(), which doesn’t seem to find the initial period in .DS_Store etc. even though it does find medial and terminal periods in the same word list. strpos('.DS_Store', '.') → 0. However, in this case what’s happening is that it’s returning the character position, 0, which is interpreted as FALSE later in the script. The fix is to test the result strictly against FALSE (!== FALSE) rather than as a bare truth value; failing that, use strpbrk() or go back to preg_match().
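
A short sketch of the zero-versus-FALSE trap, with variable names made up for the purpose:

```php
<?php
$word = '.DS_Store';

$pos = strpos($word, '.');        // int 0: found at the first character

// Loose test: 0 is falsy, so this wrongly reports "not found".
$loose = $pos ? 'found' : 'not found';

// Strict test: only a real FALSE means "not found".
$strict = ($pos !== false) ? 'found' : 'not found';

echo $loose, "\n";   // not found
echo $strict, "\n";  // found
```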

Update: Somehow this broke in the first few months of being online.   I found it while doing something else and tweaked it to be correct under the new conditions, but it may be worth keeping an eye on.

Update: And it seems the same applies to ampersands. Replacing with ⟨u26⟩ works.

13. Right. array_reverse() starts keys at 0 even when its input starts at 1. (Unless $preserve_keys=TRUE, and then it doesn’t write new keys anyway.) I just bumped into that. Not actually the problem I’m having, but one to bear in mind.
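
For the record, a quick sketch of both behaviours on a made-up array with integer keys starting at 1:

```php
<?php
$in = array(1 => 'a', 2 => 'b', 3 => 'c');

$renumbered = array_reverse($in);        // keys become 0, 1, 2
$preserved  = array_reverse($in, true);  // keys stay 3, 2, 1

var_export($renumbered);
var_export($preserved);
```

String keys are kept either way; it is only integer keys that get renumbered.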

12. Inescapable curly braces?

Let $s='test 123'; we want to insert curly braces.
preg_replace('/(1)/', '{$1}', $s) → test {1}23
So far so good. But we also want to expand a variable in the replacement pattern. To do this we double quote it. Even without the variable we have a problem. With the replacement pattern "{$1}" we get (in error_log):
PHP Parse error: syntax error, unexpected '1' (T_LNUMBER), expecting variable (T_VARIABLE) or '$'
Try that again with a variable $v='|var|'.
"$v {$1}" → (still a parse error)
To avoid this, the reference must be separated from the opening brace, e.g.:
"$v {_$1}" → test |var| {_1}23
Which is no use. Well, we didn’t want the brackets to do whatever they’re doing; we wanted literal brackets. So, escape them?
"$v \{$1\}" → test |var| \{1\}23
At least it’s not a parse error; but what’s with the backslashes being displayed? I played around with it, and found that substituting \x7B for the opening brace works.
"$v \x7B$1}" → test |var| {1}23 ✓
Not satisfactory though. One more numeric code to remember. (And no, HTML entities weren’t an option here.)

Reading up, I see some suggestion that a string sign can be escaped with another: $$. Not sure I understand that, but here goes.
"$v \{$$1\}" → test |var| \{$1\}23
Right, that escaped it, but not usefully. Still backslashes, and no substitution. Delete the bracket escapes?
"$v {$$1}" → (parse error again)
How about deeper levels of escapes? Nope. We get alternating parse errors (even numbers of backslashes) and pileup (odd numbers).
"$v \\{$1\\}" → (parse error)
"$v \\\\\{$1\\\\\}" → test |var| \\{1\\}23
Oh, and pileup of string signs too. No surprises here.
"$v \{$$$1\}" → test |var| \{$$1\}23
How about preg_quote()? Is this what it’s for?
preg_quote('$v {$1}') → test $v \{$1\}23
preg_quote("$v {$1}") → (parse error)
preg_quote('$v \{$1\}') → test $v \\{$1\\}23
preg_quote("$v \{$1\}") → test \|var\| \\{$1\\}23
Apparently not.

An alternative, in this case at least, is to concatenate and avoid double quoting from the outset: $v.' {$1}' → test |var| {1}23 ✓ Nice and simple. But why "\{" → \{?

Horsemouthtime. The PHP Manual does state that curly braces are inescapable in double quoted strings, but doesn’t give a reason as far as I can see. It suggests in an example that ${v} is equivalent to {$v}, though it seems to contradict itself there. Trying that:
"${v} ${1}" → test |var| 23 with (error log) PHP Notice: Undefined variable: 1
"$\{v\} $\{1\}" → test $\{v\} $\{1\}23
. . . the reference doesn’t work, and backslashes are shown. The Manual also states that the sequence {\$ should produce a “literal {$” . . . which, again, wouldn’t be much use, since we want the reference to expand, right? Well, just for completeness, I tried it:
"$v {\$1}" → test |var| {1}23 ✓
So . . . you can escape the brace . . . with a following backslash?

Well, that’s not really what’s happening here. The backslash escapes the dollar sign, not the brace, so PHP hands preg_replace() a literal {$1}, and preg_replace() then expands the reference itself. And {\$ on an ordinary variable does produce a literal {$ sequence:
"{\$v} $1" → test {$v} 123
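
The two stages can be looked at separately. A small sketch using the same $s and $v as above; the $replacement variable is only there to make the intermediate string visible:

```php
<?php
$s = 'test 123';
$v = '|var|';

// Stage 1: PHP builds the string. $v is interpolated and \$ yields a
// literal dollar sign, so the braces never start PHP's {$...} syntax.
$replacement = "$v {\$1}";
var_dump($replacement === '|var| {$1}');   // bool(true)

// Stage 2: preg_replace() sees $1 in the replacement and expands it.
$result = preg_replace('/(1)/', $replacement, $s);
echo $result, "\n";                        // test |var| {1}23
```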

Incidentally, there’s an old comment on the online manual page that suggests using {{$v}}, but I find that can’t be used with a reference, probably for the same reason: "{{$v}} $1" → test {|var|} 123 "$v {{$1}}" → (parse error)

Also, worth noting all this is in PHP 5.4. Other versions may vary?

And also, along the way I learned that named backreferences aren’t supported in preg_replace() replacement patterns, only in preg_replace_callback(). Maybe one day . . . ?
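
For comparison, a minimal sketch of the preg_replace_callback() route with a named group. The group name digit is made up for the example, and since the braces are added by plain concatenation the quoting problem doesn’t come up:

```php
<?php
$s = 'test 123';

// The named group (?<digit>...) is read from the match array in the callback.
$out = preg_replace_callback(
    '/(?<digit>1)/',
    function ($m) {
        return '{' . $m['digit'] . '}';
    },
    $s
);

echo $out, "\n";   // test {1}23
```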

11. Rather than background graphics on multiple layers it is possible to combine graphics in a single SVG and use that as the background for one, thus avoiding divitis. This can be made to work with different pattern alignments by using patternUnits="userSpaceOnUse" on the patterns and letting everything else default. (The bit I’d missed before.) Trouble is, this is actually slower than multiple nested divs with a background graphic each. And in fact it looks like the graphic isn’t genuinely flexible. So back to divitis.

(Update: Now I look at this some more it seems that modern browsers do work with multiple background layers, and they are genuinely flexible, so . . . goodbye to the divs.   Some things actually do get better with time.)

10. Line Breaking: UAs seem to break lines between <img/>s and a variety of spaces (other than U+20). Adding a ZWJ at the end of the <img/> seems to sort it in some cases but not all; setting a span to white-space:nowrap seems more successful but it’s a bit fiddly.
9. Newlines in CSS data: URIs cause a parsing error; you could eliminate them, but even that doesn’t seem to work; you can leave them in and base64 the whole thing. Also, the data: MIME type for an SVG graphic used this way needs to be image/svg+xml;base64. But SVGs still don’t animate as CSS backgrounds, either using CSS animation or SMIL/animate tag, though they do as embedded SVG graphics (both ways). But don’t as <img>s. That is, in Firefox. This doesn’t seem to be practical for now. n.b. SMIL seems to use significantly more CPU than CSS, so CSS may be better for the hypothetical future.
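
A sketch of the base64 approach, for a static (non-animated) background. The helper name, the CSS selector and the little SVG are all stand-ins, not the site’s own:

```php
<?php
// Hypothetical helper: wraps an SVG string as a base64 data: URI.
function svg_data_uri($svg)
{
    // base64 output contains no newlines, so the CSS parser has nothing to trip on.
    return 'data:image/svg+xml;base64,' . base64_encode($svg);
}

// Stand-in graphic, deliberately written with newlines in it.
$svg = "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"8\" height=\"8\">\n"
     . "  <rect width=\"8\" height=\"8\" fill=\"#369\"/>\n"
     . "</svg>";

$uri = svg_data_uri($svg);
echo 'div.bg { background-image: url("' . $uri . '"); }', "\n";
```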
8. Option Select: And it seems that Firefox does not automatically select <option>s with selected unless something. The something is unclear but it seems to work either if all elements in the <form> are in an unnecessary <fieldset>, or sometimes the individual <select> has to be in its own <fieldset>. Why?
7. Seems that PHP does not parse CDATA sections in markup. Or at least that’s the impression I get, strongly. To the point where there are many comments on the Net about how to actually get it to do valid parsing when you need it. But nowhere do I see a statement out front that by default CDATA is not parsed — which if PHP was an XML parser would be expected behaviour, but it’s not. (Though it has extensions for the purpose.) This is perhaps one of these things that everyone is just supposed to magically know, by virtue of being born. Termites may be fully aware of it for all I know. I had to work it out. Anyhow it means that HTML5-compatible on-page stylesheets could (if I used them) be prevented from being wrecked by the template system by wrapping in CDATA tags. Which is not actually useful because I prefer to import them in the <head>. But good to know, I guess. Trouble being, I don’t actually know, only speculate on the basis of observation and multiple tests. Not good enough!
6. Entities: There is no XHTML5 DTD; consequently the ‘official’ method of doing XHTML5 is to use the plain HTML declaration and serve as XML. Which is fine for most purposes, but this does not call the named entity definitions, so they fail to process. Also somewhat incomprehensibly there are people online trying to justify this on the basis that it’s somehow easier to just type the characters than remember the name codes. That smacks of desperation . . . Or a thousand-character keyboard, but I suspect desperation.

Well, a little conversion script for all the ones I find myself using isn’t so hard. But wouldn’t it be nice if things were made to work?
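
Something along these lines, as a sketch only: the function name and the four-entry map are made up for illustration, and the real list would be whatever names actually get used.

```php
<?php
// Hypothetical conversion: named entities to numeric character references,
// which work in XML without any DTD.
function named_to_numeric($markup)
{
    $map = array(
        '&nbsp;'   => '&#160;',
        '&mdash;'  => '&#8212;',
        '&rsquo;'  => '&#8217;',
        '&hellip;' => '&#8230;',
    );
    // &amp; &lt; &gt; &quot; &apos; are predefined in XML, so they are left alone.
    return strtr($markup, $map);
}

echo named_to_numeric('Wait&hellip; it&rsquo;s R&amp;D&nbsp;time'), "\n";
// Wait&#8230; it&#8217;s R&amp;D&#160;time
```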

5. Is PNG-24 always a lossless format? I’m seeing a noise pattern where I expected a smooth gradient transition. A rather drastic dithering pattern, as if I was trying to produce a 4-bit image in PNG-8. Quite visible on-screen. As it happens I wanted to produce something a little bit roughened, not quite so much, so maybe this saves me a step. At least . . . looking at it in other contexts this seems to be just how the file displays in Firefox. There is no clear information about it online but it may be to do with the transparency model; it may only happen at low transparency? It doesn’t look bad really . . . but, hmmm. It’s going to bug me.
4. Search & Replace! . . . oh right; I forgot. Scotl&   That could be a name for an electrical goods store if it isn’t already. Oh, and Highl&s & Isl&s. Sutherl&, Anniesl&, Orkney Isl&s, Shetl& Isl&s, Pentl&s.
3. BUG . . . substitution processing fails in the sp-vote-count page. Or at least, note processing, which may not be the same . . . And it turns out that emoticon substitution is working here   but not on the sp-votes page  . Actually there are some oddities on that page . . . not just MathML and SVG. And notes work on this page.

. . . right, it’s another include or subdirectory issue . . . the failure is in the included section of the page.

. . . no, it’s more basic than that, and it affects all pages with secondary includes . . . It’s eval() — which I found I had to use (with file_get_contents()) because include() doesn’t work in some situations. iirc there was another possible solution, but eval() was enormously simpler . . . But eval() doesn’t have a return value, so the substitutions used here might not have any input under certain conditions — in projects rather than post or blogroll pages. This in spite of the processing sequence file_get_contents() → /substitutions/ → eval(), in use till now. There are various suggested fixes but the one that seems to work here is to use eval() between ob_start() and ob_get_clean(), which captures the output for the normal function process. So now the sequence is: file_get_contents() ⤷ ob_start() → eval() → ob_get_clean() ⤷ /substitutions/ ⤷ echo — and it works across all page types.
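
The revised sequence, boiled down to a sketch. The code string and the single str_replace() are stand-ins: in the real thing the code comes from file_get_contents() and the substitution step is a good deal bigger.

```php
<?php
// Stand-in for the text fetched with file_get_contents().
$code = 'echo "Hello :-)";';

ob_start();                   // start capturing anything the code prints
eval($code);                  // output goes into the buffer, not the page
$output = ob_get_clean();     // fetch the captured text and stop buffering

// Stand-in for the substitutions step, now with real input to work on.
$output = str_replace(':-)', '[smile]', $output);

echo $output, "\n";           // Hello [smile]
```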

A separate question is why eval() doesn’t have a simple output control option . . . Or, to return to the starting point, why doesn’t include() work in all contexts?

Well, thinking&trying about it, eval() can be used with return in many contexts, e.g. $some_script = 'much code'; echo eval('return $some_script;'); → much code But maybe the above solution is better for this purpose.

(Later . . .  I wonder, does include() have a return value? . . . yes, but unless the included file itself uses a return statement it is only 1 for successful completion or FALSE for failure. So you couldn’t use $a=include('x.inc') and have the file output written into the variable; you’d only get 1 or FALSE. Well, that wasn’t the problem anyway.)
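
A quick sketch of what include() hands back, with temporary files standing in for x.inc (the file contents are made up). Output buffering captures the file output here too:

```php
<?php
// Temporary files stand in for real include files.
$plain = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($plain, 'some output');

$returning = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($returning, '<?php return array("title" => "Notes");');

ob_start();
$result = include $plain;      // the file's output goes into the buffer
$text = ob_get_clean();

$data = include $returning;    // the included file's own return value

var_dump($result);  // int(1)
var_dump($text);    // string(11) "some output"
var_dump($data);    // array with title => Notes

unlink($plain);
unlink($returning);
```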

2. Incoherent RFCs: Kind of amazed a) that this is so and b) that I never noticed before . . . as you’d expect, RFC5321 §4.1.2. “Command Argument Syntax” has a BNF syntax definition; but one of the elements in it, atext, isn’t defined locally. For that you need to go back to RFC2822 §3.2.4, and even there there’s undefined ALPHA and DIGIT elements. Not that a minimally knowledgeable email administrator couldn’t guess their meanings, but to be certain, you have to go back to RFC822 §3.3, or rather, over to RFC5234 where the definition of these (but not the intermediate atext) is referenced in RFC5321. But this is supposed to be machine-readable stuff. How’s the poor computer supposed to know these things?
1. For unidentified reasons I can’t seem to get the tags pages behaving correctly. All urls are written with an extra ‘/tag’. To resolve this, I’ve added a ‘base’ element to the headers . . . but now, some of the SVG graphics . . . actually interactive, scripted ones rather than static (but they’re loaded differently . . . ), don’t display gradients. (All this in Firefox 47.) This should be resolvable by adding the page name in an xml:base attribute in the SVG header . . . but that’s not ideal if you have a graphic appearing in multiple pages. Doable, but I’d like to work out the tag pages problem. Seems to me that I had this problem years ago with another site and found a solution, buuuut what it was?

Next it transpires that no fragment identifiers (in-document links — anything #-prefaced) actually work on any page. So I’ve had to do the doable, and automatically add the local page name to every <base />. Bit of a mess, that.

Update: It turns out that none of this is going to work, because the SVG standards committee decided that in some circumstances — specifically those where you have an html <base /> — SVG fragment identifiers will only relate to the overall document they’re in, which in the case of embedded SVG is the HTML document, rather than something in its own namespace. So you can specify an absolutely unique id for something in your <defs> section but it gets read as html-document#item rather than html-document#svg-id#item, and it turns out SVG (and most UAs) no longer supports xml:base, so SVG cannot be read with its own <defs>; which makes them useless, at least for anything where you need to display the same graphic in different contexts, like in a page and in the blogroll. In order to cope with that I’ve had to undo all the above and automate insertion of the site base address in front of most internal urls. Which is even more messy.

