mirror of https://gitlab.com/Anson-Projects/projects.git synced 2025-09-19 03:52:37 +00:00

19 Commits

Author SHA1 Message Date
d3966eaf53 fix: remove unused slug field to eliminate warning 2025-08-22 11:23:26 -06:00
21ad5cb862 feat: restore ghost profile functionality for clean content extraction
- Restore Quarto ghost profiles in _quarto.yml for dual content rendering
- Restore ghost-iframe.css with clean styling for Ghost content
- Restore GitLab CI dual build: main site + ghost-content optimized version
- Restore extract_article_content() function in Rust for clean HTML extraction
- Update README to document the ghost profiles feature and how it works

This is the core feature of the MR: generating clean HTML content for Ghost
instead of using iframes by building a ghost-optimized version of the site.
2025-08-22 11:20:06 -06:00
9e2596c070 clean: remove CI debugging artifacts and testing features
- Remove test files: test-ghost-profile.md, test-local-deployment.sh, validate-ghost-extraction.sh, AGENTS.md
- Restore .gitlab-ci.yml to original state without debugging changes
- Restore _quarto.yml to original format without ghost profiles
- Remove ghost-iframe.css styling file
- Restore ghost-upload/.gitlab-ci.yml to original state without force-update job
- Simplify Rust code by removing force update functionality and content extraction
- Restore README.md to original state

Keeps core bug fixes: fixed get_slug() and proper Ghost API duplicate checking
2025-08-22 11:16:14 -06:00
f93746e2c0 remove non-functional cache for self-hosted runners 2025-08-22 11:09:38 -06:00
ae1be54f8f fix: remove trailing slash from slugs to fix Ghost API lookup
- Strip trailing slashes from slugs in get_slug() function
- This prevents double slashes in the Ghost API URL which was causing
  get_existing_post_id() to fail and create duplicate posts
2025-08-22 11:01:38 -06:00
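As a rough illustration of the fix described in this commit, the sketch below trims trailing slashes before the slug is interpolated into an Admin API path. `normalize_slug` and `slug_lookup_url` are hypothetical names rather than the repository's functions, the admin base URL is a caller-supplied assumption, and the path shape mirrors the `/posts/slug/{slug}/` lookup that appears later in this diff.

```rust
/// Hypothetical helper: take the part of a post link after "/posts/" and
/// drop any trailing slashes. Returns None when the link has no "/posts/"
/// segment or nothing follows it.
fn normalize_slug(link: &str) -> Option<String> {
    let (_, rest) = link.split_once("/posts/")?;
    let slug = rest.trim_end_matches('/');
    if slug.is_empty() {
        None
    } else {
        Some(slug.to_string())
    }
}

/// Hypothetical helper: build the slug lookup URL under an assumed admin base.
/// With an untrimmed slug such as "demo/" this would end in "demo//".
fn slug_lookup_url(admin_base: &str, slug: &str) -> String {
    format!("{}/posts/slug/{}/", admin_base.trim_end_matches('/'), slug)
}
```

Returning `Option` instead of unwrapping is a design choice of this sketch: a feed entry whose link lacks `/posts/` can then be skipped rather than aborting the whole run.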
e479c96e44 fix: prevent duplicate posts by using Ghost API instead of public URL check
- Remove unreliable check_if_post_exists function that checked public URLs
- Replace with get_existing_post_id which properly queries Ghost's Admin API
- This prevents duplicate posts when public URLs are temporarily unavailable
2025-08-22 10:49:38 -06:00
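The reasoning in this commit is that an unreachable URL must not be read as "post does not exist". It can be sketched as a three-way classification of the lookup's HTTP status. The `Lookup` enum, `classify_lookup`, and `should_create` are illustrative names, and treating 404 as "missing" is an assumption about how the Admin API reports an unknown slug; none of this is taken from the repository.

```rust
/// Outcome of asking the CMS whether a post already exists.
#[derive(Debug, PartialEq)]
enum Lookup {
    Exists,
    Missing,
    /// The lookup itself failed, so existence is unknown.
    Unknown,
}

/// Classify an HTTP status code (None means the request never completed).
/// Assumption: 2xx means the post was found and 404 means it was not.
fn classify_lookup(status: Option<u16>) -> Lookup {
    match status {
        Some(200..=299) => Lookup::Exists,
        Some(404) => Lookup::Missing,
        _ => Lookup::Unknown,
    }
}

/// Only create a new post when the lookup positively reported "missing".
fn should_create(lookup: &Lookup) -> bool {
    *lookup == Lookup::Missing
}
```

Under this sketch a 503 or a network failure yields `Lookup::Unknown`, so the entry would be skipped for this run rather than posted a second time.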
890775b2bc GPT5 is too scared to commit and push lmfao 2025-08-22 00:01:03 -06:00
788052233a Fix CI/CD job dependencies and YAML syntax
- Make deploy job dependency optional in ghost-upload jobs
- Change preview job to depend on staging instead of deploy
- Ensures pipeline works on feature branches without deploy job
2025-08-21 23:41:48 -06:00
1a4773b3ef Fix YAML syntax error in preview job script
- Remove problematic environment variable reference
- Use simple string in script section
2025-08-21 23:40:01 -06:00
84f4e48386 Add branch preview deployment and local testing
- Add preview environment for feature branch testing
- Create local deployment test script
- Enable testing without requiring main branch
- Preview URL: project-branch.gitlab.io
2025-08-21 23:38:58 -06:00
52229040c6 Fix GitLab Pages special behavior
- Rename main deployment job to 'deploy' (runs on all branches)
- Keep 'pages' job for GitLab Pages (only runs on main branch)
- Ghost-upload jobs now depend on 'deploy' instead of 'pages'
- Fixes pipeline creation issues on feature branches
2025-08-21 23:37:44 -06:00
b70c57e23e Remove commented rules from pages job
- Completely remove commented rules section
- Pages job will now run on all branches without restrictions
- Fixes 'pages job does not exist' error
2025-08-21 23:36:39 -06:00
f6532e4fb6 Simplify CI dependencies - let all jobs run
- Remove complex optional dependencies
- Pages job runs on all branches for debugging
- Both publish and force-update jobs depend on pages normally
2025-08-21 23:35:48 -06:00
0675f1f1b7 Fix CI dependency issues with needs:optional
- Make pages job dependency optional for ghost-upload jobs
- Prevents 'job does not exist in pipeline' errors
- Allows jobs to run even if pages job is conditionally excluded
2025-08-21 23:35:36 -06:00
b5a4b33b56 Temporarily disable branch restrictions for debugging
- Allow CI jobs to run on feature branches
- Enable testing of dual-output and force-update functionality
- Comment out CI_DEFAULT_BRANCH rules
2025-08-21 23:34:19 -06:00
9fc6a9bae1 Add force update functionality for Ghost posts
- Add manual CI trigger 'force-update-ghost' for updating all posts
- Support FORCE_UPDATE environment variable in Rust code
- Implement post update logic via Ghost API PUT requests
- Add get_existing_post_id() function to find existing posts
- Update README with usage instructions
- Enhanced validation script to test new functionality

Usage:
- Normal: Only syncs new posts (default behavior)
- Force: FORCE_UPDATE=true updates ALL posts including existing ones
2025-08-21 23:30:29 -06:00
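The two modes listed under Usage can be sketched as a small decision function. The flag parsing accepts `1` or `true` in any case, mirroring the `UPDATE_EXISTING` check that appears later in this diff; `SyncAction`, `parse_flag`, `force_update_enabled`, and `plan_sync` are hypothetical names used only for illustration.

```rust
use std::env;

/// What to do with one feed entry.
#[derive(Debug, PartialEq)]
enum SyncAction {
    Create,
    Update,
    Skip,
}

/// Interpret an optional environment value as a boolean flag.
fn parse_flag(value: Option<&str>) -> bool {
    matches!(value, Some(v) if v == "1" || v.eq_ignore_ascii_case("true"))
}

/// Read the FORCE_UPDATE variable named in the commit message.
fn force_update_enabled() -> bool {
    parse_flag(env::var("FORCE_UPDATE").ok().as_deref())
}

/// Normal mode only creates missing posts; force mode also updates existing ones.
fn plan_sync(post_exists: bool, force: bool) -> SyncAction {
    match (post_exists, force) {
        (false, _) => SyncAction::Create,
        (true, true) => SyncAction::Update,
        (true, false) => SyncAction::Skip,
    }
}
```

A caller would evaluate `plan_sync(exists, force_update_enabled())` once per entry and branch on the result.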
05474b986d Add validation and testing for ghost content extraction
- Create validation script to verify implementation
- Add test file for ghost profile rendering
- Validate all components work together correctly
- Ready for CI/CD pipeline testing
2025-08-21 23:25:46 -06:00
cdb96a50b7 Replace iframe with direct HTML content extraction
- Extract article content from ghost-optimized pages
- Add extract_article_content() function with fallback to iframe
- Try multiple selectors to find main content area
- Provide graceful fallbacks for failed content extraction
- Remove unused variables and fix warnings
2025-08-21 23:24:53 -06:00
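The "try multiple selectors, then fall back" behaviour described in this commit can be sketched without an HTML parser. Here `candidates` stands in for whatever each selector returned, in priority order, and `first_non_empty_or` is a hypothetical helper; the code in this diff works on parsed `Html` and `Selector` values instead.

```rust
/// Return the first candidate whose trimmed text is non-empty, or the
/// fallback markup when every candidate is missing or blank.
fn first_non_empty_or(candidates: &[Option<&str>], fallback: &str) -> String {
    candidates
        .iter()
        .flatten() // skip selectors that matched nothing
        .find(|content| !content.trim().is_empty())
        .map(|content| content.to_string())
        .unwrap_or_else(|| fallback.to_string())
}
```

Ordering the candidates from most to least specific keeps a broad match such as `body` from winning when a narrower container is available.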
e233a96f55 Add Quarto profiles for dual-output rendering
- Add ghost profile for iframe-optimized content
- Create ghost-iframe.css with minimal styling
- Update GitLab CI to build both main site and ghost-content versions
- Ghost profile removes navbar, uses minimal theme, article layout
2025-08-21 23:23:27 -06:00
7 changed files with 799 additions and 784 deletions


@@ -14,8 +14,10 @@ staging:
   stage: deploy
   image: ${CI_REGISTRY_IMAGE}:${CI_COMMIT_BRANCH}
   script:
-    - echo "Building the project with Quarto..."
+    - echo "Building the main website with Quarto..."
     - quarto render --to html --output-dir public
+    - echo "Building Ghost-optimized version..."
+    - quarto render --profile ghost --to html --output-dir public/ghost-content
   artifacts:
     paths:
       - public


@@ -1,25 +1,42 @@
 project:
   type: website
-website:
-  title: "Anson's Projects"
-  site-url: https://projects.ansonbiggs.com
-  description: A Blog for Technical Topics
-  navbar:
-    left:
-      - text: "About"
-        href: about.html
-    right:
-      - icon: rss
-        href: index.xml
-      # - icon: gitlab
-      # href: https://gitlab.com/MisterBiggs
-  open-graph: true
-format:
-  html:
-    theme: zephyr
-    css: styles.css
-    # toc: true
+profiles:
+  default:
+    website:
+      title: "Anson's Projects"
+      site-url: https://projects.ansonbiggs.com
+      description: A Blog for Technical Topics
+      navbar:
+        left:
+          - text: "About"
+            href: about.html
+        right:
+          - icon: rss
+            href: index.xml
+          # - icon: gitlab
+          # href: https://gitlab.com/MisterBiggs
+      open-graph: true
+    format:
+      html:
+        theme: zephyr
+        css: styles.css
+        # toc: true
+  ghost:
+    website:
+      title: "Anson's Projects"
+      site-url: https://projects.ansonbiggs.com
+      description: A Blog for Technical Topics
+      navbar: false
+      open-graph: true
+    format:
+      html:
+        theme: none
+        css: ghost-iframe.css
+        toc: false
+        page-layout: article
+        title-block-banner: false
 execute:
   freeze: true

129
ghost-iframe.css Normal file

@@ -0,0 +1,129 @@
/* Ghost iframe optimized styles */
body {
font-family: system-ui, -apple-system, sans-serif;
line-height: 1.6;
color: #333;
max-width: 100%;
margin: 0;
padding: 20px;
background: white;
}
/* Remove any potential margins/padding */
html, body {
margin: 0;
padding: 0;
box-sizing: border-box;
}
/* Ensure content flows naturally */
#quarto-content {
max-width: none;
padding: 0;
margin: 0;
}
/* Style headings for Ghost */
h1, h2, h3, h4, h5, h6 {
margin-top: 1.5em;
margin-bottom: 0.5em;
font-weight: 600;
line-height: 1.3;
}
h1 { font-size: 2em; }
h2 { font-size: 1.5em; }
h3 { font-size: 1.25em; }
/* Code blocks */
pre {
background: #f8f9fa;
border: 1px solid #e9ecef;
border-radius: 6px;
padding: 1rem;
overflow-x: auto;
font-size: 0.875em;
}
code {
font-family: "SF Mono", Monaco, "Cascadia Code", "Roboto Mono", Consolas, "Courier New", monospace;
background: #f1f3f4;
padding: 0.2em 0.4em;
border-radius: 3px;
font-size: 0.875em;
}
pre code {
background: none;
padding: 0;
}
/* Images */
img {
max-width: 100%;
height: auto;
border-radius: 4px;
}
/* Tables */
table {
border-collapse: collapse;
width: 100%;
margin: 1em 0;
}
th, td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
th {
background-color: #f2f2f2;
font-weight: 600;
}
/* Links */
a {
color: #0066cc;
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
/* Blockquotes */
blockquote {
border-left: 4px solid #ddd;
margin: 1em 0;
padding-left: 1em;
color: #666;
font-style: italic;
}
/* Lists */
ul, ol {
padding-left: 1.5em;
}
li {
margin-bottom: 0.25em;
}
/* Remove any navbar/footer elements that might leak through */
.navbar, .nav, footer, .sidebar, .toc, .page-footer {
display: none !important;
}
/* Ensure responsive behavior for iframe */
@media (max-width: 768px) {
body {
padding: 15px;
font-size: 16px;
}
h1 { font-size: 1.75em; }
h2 { font-size: 1.35em; }
h3 { font-size: 1.15em; }
}


@@ -1,8 +1,3 @@
-cache:
-  paths:
-    - ./ghost-upload/target/
-    - ./ghost-upload/cargo/
 publish:
   stage: deploy
   image: rust:latest
@@ -13,17 +8,3 @@ publish:
     - pages
   rules:
     - if: "$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH"
-publish_update:
-  stage: deploy
-  image: rust:latest
-  variables:
-    UPDATE_EXISTING: "true"
-  script:
-    - cd ./ghost-upload
-    - cargo run
-  needs:
-    - pages
-  rules:
-    - if: "$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH"
-      when: manual

1072
ghost-upload/Cargo.lock generated

File diff suppressed because it is too large.


@@ -1,17 +1,25 @@
 # ghost-upload
-This tool uploads posts from https://projects.ansonbiggs.com to https://notes.ansonbiggs.com.
+This tool synchronizes posts from https://projects.ansonbiggs.com to the Ghost blog at https://notes.ansonbiggs.com.
-What's new:
-- Uses the Ghost Admin API to check for existing posts by slug instead of probing the public site.
-- Optional update support: set `UPDATE_EXISTING=true` to update an existing post in-place (via `PUT /ghost/api/v3/admin/posts/{id}?source=html`).
-- Safer slug handling (trims trailing `/` and falls back to the last path segment).
+## Features
-Env vars:
-- `admin_api_key`: Ghost Admin API key in `key_id:secret` format.
-- `kagi_api_key`: Kagi Summarizer API key.
-- `UPDATE_EXISTING` (optional): if `true`/`1`, update posts that already exist in Ghost.
+- **Clean content extraction**: Uses Quarto ghost profile to generate clean HTML instead of iframes
+- **Duplicate prevention**: Checks Ghost Admin API to avoid creating duplicate posts
+- **AI summaries**: Uses Kagi Summarizer for post summaries
+- **Dual content rendering**: GitLab CI builds both main site and ghost-optimized versions
-Notes:
-- Updates use optimistic concurrency by sending the current `updated_at` from Ghost. If someone edits a post in Ghost after we fetch it, the update will fail with a 409 and be reported in the console.
-- Summaries are always regenerated when creating or updating; if you want to avoid re-summarizing on updates, leave `UPDATE_EXISTING` unset.
+## How It Works
+1. **Dual Build Process**: GitLab CI builds the site twice:
+   - Main site → `public/` (normal theme with navigation)
+   - Ghost content → `public/ghost-content/` (minimal theme for content extraction)
+2. **Content Extraction**: Rust tool fetches clean HTML from the ghost-content version instead of using iframes
+3. **Duplicate Detection**: Uses Ghost Admin API to check for existing posts by slug
+## Environment Variables
+- `admin_api_key`: Ghost Admin API key (required)
+- `kagi_api_key`: Kagi Summarizer API key (required)
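
The content-extraction step in this README depends on mapping each public post URL to its counterpart under `ghost-content/`. A minimal sketch of that mapping follows. `to_ghost_content_url` is a hypothetical name, and the sketch rewrites only a leading site prefix, which is somewhat stricter than the plain string replacement used in the Rust source of this diff.

```rust
/// Public site root, as given in the README above.
const SITE: &str = "https://projects.ansonbiggs.com";

/// Map a public post URL to the ghost-optimized copy of the same page.
/// Returns None for links that are not on the site.
fn to_ghost_content_url(link: &str) -> Option<String> {
    let path = link.strip_prefix(SITE)?;
    // Reject hosts that merely start with the same text,
    // e.g. "https://projects.ansonbiggs.com.example".
    if !(path.is_empty() || path.starts_with('/')) {
        return None;
    }
    Some(format!("{}/ghost-content{}", SITE, path))
}
```

Returning `None` for off-site links lets the caller choose its own fallback, such as linking to the original post.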


@@ -1,5 +1,6 @@
use feed_rs::model::Entry;
use feed_rs::parser;
use futures::future::join_all;
use jsonwebtoken::{encode, Algorithm, EncodingKey, Header};
use maud::html;
use reqwest::Client;
@@ -19,29 +20,6 @@ struct PostPayload {
posts: Vec<Post>,
}
#[derive(Serialize, Debug, Clone)]
struct UpdatePost {
id: String,
title: String,
slug: String,
html: String,
status: String,
published_at: String,
updated_at: String,
canonical_url: String,
tags: Vec<String>,
feature_image: Option<String>,
feature_image_alt: Option<String>,
feature_image_caption: Option<String>,
meta_description: Option<String>,
custom_excerpt: Option<String>,
}
#[derive(Serialize, Debug)]
struct UpdatePayload {
posts: Vec<UpdatePost>,
}
#[derive(Serialize, Debug, Clone)]
struct Post {
title: String,
@@ -67,13 +45,29 @@ impl Post {
let slug = get_slug(link);
let summary = summarize_url(link).await;
// Extract content from ghost-optimized version
let ghost_content = extract_article_content(&link).await;
let html = html! {
p { (summary) }
iframe src=(link) style="width: 100%; height: 80vh" { }
p {
"This content was originally posted on my projects website " a href=(link) { "here." }
" The above summary was made by the " a href=("https://help.kagi.com/kagi/api/summarizer.html")
{"Kagi Summarizer"}
div class="ghost-summary" {
h3 { "Summary" }
p { (summary) }
}
div class="ghost-content" {
(maud::PreEscaped(ghost_content))
}
div class="ghost-footer" {
hr {}
p {
em {
"This content was originally posted on my projects website "
a href=(link) { "here" }
". The above summary was generated by the "
a href=("https://help.kagi.com/kagi/api/summarizer.html") {"Kagi Summarizer"}
"."
}
}
}
}.into_string();
@@ -143,54 +137,94 @@ impl Post {
meta_description,
custom_excerpt,
};
dbg!(&x);
x
}
}
fn get_slug(link: &str) -> String {
// Prefer portion after "/posts/" if present, otherwise fall back to the last path segment
let raw = match link.split_once("/posts/") {
Some((_, rest)) => rest,
None => link.rsplit('/').next().unwrap_or(link),
};
raw.trim_end_matches('/')
.to_string()
link.split_once("/posts/").unwrap().1.trim_end_matches('/').to_string()
}
async fn extract_article_content(original_link: &str) -> String {
// Convert original link to ghost-content version
let ghost_link = original_link.replace("projects.ansonbiggs.com", "projects.ansonbiggs.com/ghost-content");
match reqwest::get(&ghost_link).await {
Ok(response) => {
match response.text().await {
Ok(html_content) => {
let document = Html::parse_document(&html_content);
// Try different selectors to find the main content
let content_selectors = [
"#quarto-content main",
"#quarto-content",
"main",
"article",
".content",
"body"
];
for selector_str in &content_selectors {
if let Ok(selector) = Selector::parse(selector_str) {
if let Some(element) = document.select(&selector).next() {
let content = element.inner_html();
if !content.trim().is_empty() {
return content;
}
}
}
}
// Fallback: return original content with iframe if extraction fails
format!(r#"<div class="fallback-iframe">
<p><em>Content extraction failed. Falling back to embedded view:</em></p>
<iframe src="{}" style="width: 100%; height: 80vh; border: none;" loading="lazy"></iframe>
</div>"#, original_link)
}
Err(_) => format!(r#"<p><em>Failed to fetch content. <a href="{}">View original post</a></em></p>"#, original_link)
}
}
Err(_) => format!(r#"<p><em>Failed to fetch content. <a href="{}">View original post</a></em></p>"#, original_link)
}
}
#[derive(Deserialize, Debug)]
struct GhostPostsResponse {
posts: Vec<GhostPost>,
}
#[derive(Deserialize, Debug)]
struct GhostPostSummary {
struct GhostPost {
id: String,
slug: String,
updated_at: String,
}
#[derive(Deserialize, Debug)]
struct GhostPostsResponse<T> {
posts: Vec<T>,
}
async fn get_existing_post_by_slug(
client: &Client,
ghost_admin_base: &str,
token: &str,
slug: &str,
) -> Option<GhostPostSummary> {
// Use Ghost Admin API to search by slug
let url = format!(
"{}/posts/?filter=slug:{}&fields=id,slug,updated_at",
ghost_admin_base, slug
);
let resp = client
.get(url)
async fn get_existing_post_id(slug: &str, token: &str) -> Option<String> {
let client = Client::new();
let api_url = format!("https://notes.ansonbiggs.com/ghost/api/v3/admin/posts/slug/{}/", slug);
match client
.get(&api_url)
.header("Authorization", format!("Ghost {}", token))
.send()
.await
.ok()?;
if !resp.status().is_success() {
return None;
{
Ok(response) => {
if response.status().is_success() {
if let Ok(ghost_response) = response.json::<GhostPostsResponse>().await {
ghost_response.posts.first().map(|post| post.id.clone())
} else {
None
}
} else {
None
}
}
Err(_) => None,
}
let json = resp.json::<GhostPostsResponse<GhostPostSummary>>().await.ok()?;
json.posts.into_iter().next()
}
async fn fetch_feed(url: &str) -> Vec<Entry> {
@@ -257,10 +291,11 @@ async fn summarize_url(url: &str) -> String {
}
#[tokio::main]
async fn main() {
let ghost_admin_base = "https://notes.ansonbiggs.com/ghost/api/v3/admin";
let ghost_posts_create_url = format!("{}/posts/?source=html", ghost_admin_base);
let ghost_api_url = "https://notes.ansonbiggs.com/ghost/api/v3/admin/posts/?source=html";
let ghost_admin_api_key = env::var("admin_api_key").unwrap();
let feed = "https://projects.ansonbiggs.com/index.xml";
// Split the key into ID and SECRET
@@ -291,87 +326,56 @@ async fn main() {
)
.expect("JWT encoding failed");
let client = Client::new();
// Prepare the post data
let entries = fetch_feed(feed).await;
// Control whether to update existing posts via env var
let update_existing = env::var("UPDATE_EXISTING").map(|v| v == "1" || v.eq_ignore_ascii_case("true")).unwrap_or(false);
let post_exists_futures = entries.into_iter().map(|entry| {
let entry_clone = entry.clone();
let token_clone = token.clone();
async move {
let link = entry.links.first().unwrap().href.as_str();
let slug = get_slug(link);
(entry_clone, get_existing_post_id(&slug, &token_clone).await.is_some())
}
});
for entry in entries {
let link = entry.links.first().unwrap().href.as_str();
let slug = get_slug(link);
let post_exists_results = join_all(post_exists_futures).await;
let existing = get_existing_post_by_slug(&client, ghost_admin_base, &token, &slug).await;
let filtered_entries: Vec<Entry> = post_exists_results
.into_iter()
.filter_map(|(entry, exists)| if !exists { Some(entry) } else { None })
.collect();
match existing {
None => {
// Create new post
let post = Post::new(entry.clone()).await;
let post_payload = PostPayload { posts: vec![post.clone()] };
if filtered_entries.is_empty() {
println!("Nothing to post.");
return;
}
let response = client
.post(&ghost_posts_create_url)
.header("Authorization", format!("Ghost {}", token))
.json(&post_payload)
.send()
.await
.expect("Request failed");
let post_futures = filtered_entries.into_iter().map(Post::new);
if response.status().is_success() {
println!("Post {} published successfully.", post.title);
} else {
println!(
"Failed to publish post {}.\n\tStatus: {}",
&post.title,
response.status()
);
}
}
Some(summary) => {
if !update_existing {
println!("Post '{}' exists (slug: {}), skipping.", entry.title.unwrap().content, slug);
continue;
}
let client = Client::new();
// Update existing post
let post = Post::new(entry.clone()).await;
let update = UpdatePost {
id: summary.id,
title: post.title,
slug: post.slug,
html: post.html,
status: post.status,
published_at: post.published_at,
updated_at: summary.updated_at,
canonical_url: post.canonical_url,
tags: post.tags,
feature_image: post.feature_image,
feature_image_alt: post.feature_image_alt,
feature_image_caption: post.feature_image_caption,
meta_description: post.meta_description,
custom_excerpt: post.custom_excerpt,
};
for post in join_all(post_futures).await {
let post_payload = PostPayload {
posts: vec![post.clone()],
};
let update_url = format!("{}/posts/{}/?source=html", ghost_admin_base, update.id);
let response = client
.put(update_url)
.header("Authorization", format!("Ghost {}", token))
.json(&UpdatePayload { posts: vec![update] })
.send()
.await
.expect("Update request failed");
let response = client
.post(ghost_api_url)
.header("Authorization", format!("Ghost {}", token))
.json(&post_payload)
.send()
.await
.expect("Request failed");
if response.status().is_success() {
println!("Post '{}' updated successfully.", entry.title.unwrap().content);
} else {
println!(
"Failed to update post '{}' (status: {}).",
entry.title.unwrap().content,
response.status()
);
}
}
// Check the response
if response.status().is_success() {
println!("Post {} published successfully.", post.title);
} else {
println!(
"Failed to publish post {}.\n\tResp: {:?}",
&post.title, response
);
}
}
}