How to scrape a news portal with a PHP script
To create a news portal scraper in PHP, you typically use a combination of an HTTP client (like Guzzle or cURL) to fetch the webpage and an HTML parser (like Simple HTML DOM) to extract specific headlines or articles.
1. Recommended PHP Libraries (2025)
For modern and efficient scraping, avoid manual regex. Instead, use these industry-standard tools:
- Guzzle: The most popular HTTP client for sending requests.
- Simple HTML DOM Parser: A beginner-friendly library that allows you to find elements using CSS selectors (e.g., find('h2.headline')).
- Symfony DomCrawler & Panther: DomCrawler parses and traverses HTML you have already fetched, while Panther drives a real browser and can therefore handle JavaScript-heavy news sites.
2. Simple News Scraper Script
This basic example uses the voku/simple_html_dom library to fetch news headlines from a target site.
php
<?php
require 'vendor/autoload.php'; // Install the libraries with Composer first

use voku\helper\HtmlDomParser;

// 1. Target news URL
$url = 'https://example-news-portal.com';

// 2. Fetch the HTML content and stop early if the request fails
$html = file_get_contents($url);
if ($html === false) {
    exit("Could not fetch $url\n");
}
$dom = HtmlDomParser::str_get_html($html);

// 3. Extract headlines (adjust the selector based on the site's structure)
$news_items = [];
foreach ($dom->find('h2.article-title') as $element) {
    $news_items[] = [
        'title' => trim($element->plaintext),
        'link'  => $element->find('a', 0)->href ?? '#',
    ];
}

// 4. Output the results
print_r($news_items);
3. Key Implementation Steps
- Inspect the Site: Right-click a news headline in your browser and select Inspect to find the specific HTML tag (like h2) or class (like .news-card) you need to target.
- Fetch HTML: Use cURL or file_get_contents() to retrieve the raw page data.
- Parse Data: Use your chosen library to loop through the elements and save titles, links, or image URLs into an array.
- Store or Display: Save this data into a MySQL database to build your own portal, or display it immediately on a page, for example in a scrolling headline ticker styled with Bootstrap.
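The inspect, parse, and collect steps above can be sketched without any Composer dependency by using PHP's built-in DOM extension in place of Simple HTML DOM. This is only a minimal sketch: the function name extractHeadlines, the sample markup, and the h2.article-title selector are assumptions for illustration, and a real portal will need its own selector.

```php
<?php
// Hypothetical helper: pull title/link pairs out of raw HTML using
// PHP's built-in DOM extension (no Composer packages required).
function extractHeadlines(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS selector h2.article-title (assumed markup).
    $query = '//h2[contains(concat(" ", normalize-space(@class), " "), " article-title ")]';

    $items = [];
    foreach ($xpath->query($query) as $heading) {
        // First link inside the heading, if there is one.
        $anchor = $xpath->query('.//a[@href]', $heading)->item(0);
        $items[] = [
            'title' => trim($heading->textContent),
            'link'  => $anchor !== null ? $anchor->getAttribute('href') : '#',
        ];
    }
    return $items;
}

// Sample markup standing in for a fetched page (an assumption, not a real site).
$sampleHtml = <<<HTML
<html><body>
  <h2 class="article-title"><a href="/news/1">First headline</a></h2>
  <h2 class="article-title featured"><a href="/news/2"> Second headline </a></h2>
  <h2 class="sidebar">Not an article</h2>
  <h2 class="article-title">No link here</h2>
</body></html>
HTML;

$headlines = extractHeadlines($sampleHtml);
print_r($headlines);
```

In a real scraper, $sampleHtml would be replaced by the string returned from cURL or file_get_contents(), and the resulting array could then be written to a database, for instance with PDO prepared statements.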
4. Ethical & Legal Considerations
- Check robots.txt: Always verify whether the site allows scraping by reading its robots.txt file at the site root (e.g., https://example.com/robots.txt).
- Rate Limiting: Do not spam requests; add a sleep() delay between scrapes to avoid being blocked.
- API Alternative: Some news publishers (such as The Guardian or The New York Times) offer official developer APIs. Where one exists, it is usually faster and safer than scraping.
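The robots.txt and rate-limiting points above can be sketched as follows. This is a deliberately simplified illustration, not a full robots.txt parser: it only honours "User-agent: *" groups and plain prefix Disallow rules, ignoring wildcards and Allow overrides. The names isPathAllowed and fetchPolitely and the stub fetcher are hypothetical.

```php
<?php
// Simplified robots.txt check: handles only "User-agent: *" groups and
// prefix-style "Disallow" rules (no wildcards, no "Allow" overrides).
function isPathAllowed(string $robotsTxt, string $path): bool
{
    $appliesToUs  = false;
    $lastWasAgent = false;

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group.
            $appliesToUs  = ($lastWasAgent && $appliesToUs) || $value === '*';
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;

        if ($field === 'disallow' && $appliesToUs && $value !== ''
            && strpos($path, $value) === 0) {
            return false;
        }
    }
    return true;
}

// Fetch several URLs with a pause between requests. $fetch is any callable
// that takes a URL and returns the page body (e.g. a cURL wrapper).
function fetchPolitely(array $urls, callable $fetch, int $delaySeconds = 2): array
{
    $pages = [];
    foreach (array_values($urls) as $i => $url) {
        if ($i > 0 && $delaySeconds > 0) {
            sleep($delaySeconds); // wait between requests, not before the first
        }
        $pages[$url] = $fetch($url);
    }
    return $pages;
}

// Example rules and a stub fetcher (both assumptions for illustration).
$robots = "User-agent: *\nDisallow: /admin/\nDisallow: /search\n";
$paths  = array_filter(
    ['/news/1', '/admin/login', '/news/2'],
    fn(string $p): bool => isPathAllowed($robots, $p)
);
$pages = fetchPolitely($paths, fn(string $url): string => "stub for $url", 0);
print_r(array_keys($pages));
```

The delay is set to 0 here only so the example runs instantly; against a live site a delay of a second or more between requests is the more considerate choice.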
CirebonDigital
Anonymous
8 hours ago